Gephi boosts its performance with new “GraphStore” core

Gephi is a graph visualization and analysis platform – the entire tool revolves around the graph the user is manipulating. All modules (e.g. filter, ranking, layout etc.) touch the graph in some way or another and everything happens in real-time, reflected in the visualization. It’s therefore extremely important to rely on a robust and fast underlying graph structure. As explained in this article we decided in 2013 to rewrite the graph structure and started the GraphStore project. Today, this project is mostly complete and it’s time to look at some of the benefits GraphStore is bringing into Gephi (which its 0.9 release is approaching).

Performance is critical when analyzing graphs. A lot can be done to optimize how graphs are represented and accessed in the code but it remains a hard problem. The first versions of Gephi didn’t always shine in that area as the graphs were using a lot of memory and some operations such as filter were slow on large networks. A lot was learnt though and when the time came to start from scratch we knew what would move the needle. Compared to the previous implementation, GraphStore uses simpler data structures (e.g. more arrays, less maps) and cache-friendly collections to make common graph operations faster. Along the way, we relied on many micro-benchmarks to understand what was expensive and what was not. As often with Java, this can lead to surprises but it’s a necessary process to build a world-class graph library.

Benchmark

We wanted to compare Gephi 0.8.2 and Gephi 0.9 (development version) so we’ve  created a benchmark to test the most common graph operations. Here is what we found. The table below represents the relative improvement between the two versions. For instance, “2X” means that the operation is twice faster to complete. A benchmarking utility was used to guarantee the measurements precision and each scenario was performed at least 20 times, and up to 600 times in some cases. We used two different classic graphs, one small (1.5K nodes, 19K edges) and one medium (83K nodes, 68K edges) . Larger graphs may be evaluated in a future blog article.

Benchmark / Graph SMALL (n=1490, e=19025) MEDIUM (n=82670, e=67851)
Node Iteration 23.0x 34.6x
Edge Iteration 40.1x 109.4x
Node Lookup 1.6x 2.1x
Edge Lookup 1.2x 2.3x
Get Edge 1.1x 1.2x
Get Degree 2.5x 2.3x
Get Neighbors 3.4x 1.2x
Set Attributes 2.3x 0.1x
Get Attributes 3.3x 4.0x
Add Nodes 6.2x 5.7x
Add & Remove Nodes 1.4x 2.9x
Add Edges 7.7x 3.8x
Add & Remove Edges 3.3x 1.8x
Create View 2851.0x 4762.3x
Iterate Nodes In View 2.7x 1.5x
Iterate Edges In View 11.6x 7.3x
Save Project 2.4x 1.7x
Load Project 0.6x 0.6x
Project File Size 1.9x 1.5x

These benchmarks show pretty remarkable improvements in common operations, especially read ones such as node or edge iteration. For instance, in average it takes 40 to 100 times less CPU to read all the edges in the graph. Although this benchmark focus on low-level graph operations it will bring material improvements to user-level features such as filter or layout. The way GraphStore creates views is different from what we were doing before, and doesn’t require a deep graph copy anymore – explaining the large difference. Finally, only the set attribute is significantly slower but that can be explained by the introduction of inverted indices, which are updated when attributes are set.

And what about memory usage? Saving memory has been one of our obsession and there’s good news to report on that front as well. Below is a quick comparaison between Gephi 0.8.2 and Gephi 0.9 for the same medium graph above.

Benchmark Gephi 0.8.2 Gephi 0.9 Improvement
Simple graph 115MB 52MB 2.2X
Graph with 5 attribute columns 186MB 55MB 3.4X

This benchmark shows a clear reduction of memory usage in Gephi’s next version. How much? It’s hard to say as it really depends on the graph but the denser (i.e. more edges) and the more attributes, the more memory saved as significant improvements have been made in these areas. Dynamic graphs (i.e. graphs that have their topology or attributes change over time) will also see a big boost as we’ve redesigned this part from scratch.

What’s next?

All of the GraphStore project benefits are included in the upcoming 0.9 release and that’s the most important. However, the work doesn’t end and there’s many more features and performance optimization that can be added.

Then, we count on the community’s help to start collaborating with us on the GraphStore library – calling all database and performance experts. GraphStore will continue to live as an all-purpose Java graph library, released under the Apache 2.0 license and independent from Gephi (i.e. Gephi uses GraphStore but not the opposite). We hope to see it used in other projects in the near future.

graphstore-api

GraphStore API, represented as a graph

GSoC: interconnect Gephi Graph Streaming API and GraphStream

My name is Min WU and during this Google Summer of Code I have worked on the project to interconnect Gephi Graph Streaming API and the GraphStream library. My mentors are Yoann Pigné and André Panisson.

This project aims at interconnecting the GraphStream’s dynamic graph event model with Gephi in order to have Gephi to visualize an ongoing graph evolution and measurement. Based on this project, users can model and simulate complex systems with GraphStream while observing the output with the visual tools offered by Gephi.

GraphStream is written in Java. In order to use streams of graph events in other languages, GraphStream provides the NetStream framework, i.e. a network interface, such that other projects written in other language can use GraphStream. The NetStream framework consists of three parts, receiver, sender and the NetStream Protocol. The receiver is responsible for receiving graph events from the network and dispatching them to pipes. It works within only one thread, listening at a given address and port while receiving graph events from several streams, actually several threads or clients. The sender encodes graph events into messages according to the NetStream protocol and send them to a defined receiver with given port, host and stream ID. Every message contains sourceId, timeId and event context, among which the combination of sourceId and timeId is dedicated to distinguish between several streams and solve the synchronization issue. Finally the NetStream protocol specifies the message format at byte level.

Gephi also supports the idea of “streams of graph events”. It has a framework for graph streaming in Gephi plugin built by André Panisson during the 2010 GSoC, through a multi-threaded socket server. Other applications can push graph data to the Gephi server through the network, and have it visualized. In this graph streaming project, operations (a concept similar to event) are invoked through HTTP requests made by the client to the server, based on a JSON format.

Work done

In my project, I interconnected Gephi and GraphStream based on André’s Graph Streaming plugin. Since NetStream on GraphStream side works on NetStream protocol while Graph Streaming API on Gephi side works on JSON protocol, we have to make them compatible with each other. Considering the flexible interoperability and language agnostic properties, I have chosen the JSON protocol to do the interconnection and implement a sender part and a receiver part.

The sender part (JSONSender) is responsible for sending events from GraphStream to Gephi. GrpahStream works as a client and Gephi works as a server. Every time the graph in GraphStream changes, a corresponding event is sent to Gephi. Gephi handles the event and changes its own graph. In this way, the sender part works as a sink of the GraphStream graph, so it must implement the sink interface which contains methods to deal with graph element events and attribute events. In each method, we first encode the event message into a JSON string, and then send it to Gephi. We connect to Gephi and use “updateGraph” operation to send events. The corresponding URL is “http://host:port/workspace?operation=updateGraph”. The host and port must match with the Gephi sever and the workspace is a destination workspace of Gephi, for example an URL can be “http://127.0.0.1:8080/workspace0?operation=updateGraph”. The Gephi server and client are built with the “Graph Streaming API ” in the Gephi-plugin.

The receiver part (JSONReceiver) is responsible for receiving events from Gephi. It listens to Gephi and waits for events. Every time the graph in the Gephi changes, a corresponding event will be send to GraphStream. Then the GraphStream handles the event and changes its graph object. In this way, the receiver part works as a source of the GraphStream graph. In order to listen to Gephi events, we use a URL within “getGraph” operation to connect to Gephi. The corresponding URL is “http://host:port/workspace0?operation=getGraph”.

With these two classes, we can interconnect GraphStream and Gephi in real-time. Two tutorials are given to show how to do real-time connection between GraphStream and Gephi, see the video below. If you are interested in the detail implementation, please refer to the manual page.

The first class is GraphSender, which aims at loading a graph in GraphStream and dynamically displays it on a Gephi workspace. We need to create a graph instance and a JSONSender instance, and plug the JSONSender instance as a sink of the graph instance. Since then, when we generate the graph, or load the graph from a file, Gephi will display it in real-time.

The other class is LinLogLayoutReceiver. The Lin-Log layout in GraphStream is dedicated to find communities in graphs. This tutorial shows the execution of a Lin-Log layout in GraphStream and the sending of the layout information to Gephi in real time. We first load a graph in Gephi, display it and apply some algorithms. Then we send the graph to GraphStream and apply the Lin-Log layout on the graph on the GraphStream side. Meanwhile we visualize the layout process on the Gephi side in real time. To achieve it, we create a graph instance and a JSONReceiver instance, and then get the ThreadProxyPipe instance and plug the graph instance as an ElementSink of the pipe instance. Then we apply the Lin-Log layout, and create a new thread in which to create a JSONSender instance and plugin it as a sink of the graph layout.

Distribution

This project is distributed under MIT license. You can refer to the code on Github. By the way, I feel very appreciative for my mentors’ supervision. Thank you very much!

GSoC mid-term: a new Timeline to explore time-varying networks

daniel

My name is Daniel Bernardes and during this Google Summer of Code I am working on the new Timeline interface.

Dynamic graphs have been the subject of increasing interest, given their potential as a theoretical model and their promising applications. Following this trend, Gephi has incorporated tools to study dynamic networks. From a visualization perspective, a critical tool is the Timeline component, which allows users to select pertinent time intervals and display and explore the corresponding graph. The challenge concerning the timeline was twofold: redesign the component to improve user experience and add extra features and introduce an animation scheme with the possibility to export the resulting video.

Together with my mentors Cezary Bartosiak and Sébastien Heymann, we have proposed a new design for the timeline component featuring a sparkline chart in the background of the interval selection drawer (which is semi transparent): this feature will help the user to focus on particular moments of the evolution of the dynamic graph, like bursts of connections or changes in graph density or other simple graph metrics. Current metrics are the evolution of the number of nodes, the number of edges and the graph density. The sparkline chart was preferred to other chart solutions because it does not add too much visual pollution to the component and adds to the qualitative analysis. The interaction with the drawer remains globally the same of the old timeline, to guarantee a smooth transition for the user.

To implement this feature we have used the chart library JFreeChart (a library already incorporated to Gephi), customizing their XYPlot into a Sparkline chart by modifying their visual attributes. To display the Sparkine, one needs to measure the properties of the graph in several time instants of the global time frame where the dynamic graph exists. This represented a major challenge, since the original architecture did not allow the timeline component to access (and measure) the graph in particular instants of time; the solution was to introduce a slight modification to the DynamicGraph API to provide an object which gave us snapshots of the graph at given instants. Other challenges we dealt with included the automatic selection/switching of real number/time units in the timeline (depending on the nature of the graph in question) and sampling granularity of the timeline.

Another breakthrough of this project was the introduction of the timeline animation. Once the user has selected a time frame with the drawer it can make it slide as the corresponding graph is being displayed on the screen. Besides the technical aspects of interaction between the timeline and the animation controller, there were also an effort to calibrate the animation (ie, in terms of speed and frames) so it would be comfortable and meaningful for the user.

As far as the UI is concerned, the component has gained a new “Reset” button next to the play button which activates the timeline drawer and displays the chart. It also serves to reset the drawer selection to the full interval when the timeline is active. The play button gained its original function, that is, to control the animation of the timeline — instead of activating the selection.

Finally, the animation export to a video format revealed to be more tricky than expected and couldn’t be finished as planned. There were several setbacks to this feature, beginning with the selection of a convenient library to write de movie container: it turns out that the de facto options available are not fully Java-based and need an encoder working in the background. The best alternative I found was Xuggler, which is based on ffmpeg. Also, obtaining screen captures of the graph to were a little bit tricky so I have exported SVG images from the graph corresponding to each frame, converted them to jpeg and than encoded them though Xuggler to a video format. As one might expect, this solution is not very efficient in terms of time, so Mathieu Bastien and my mentors suggested me to wait for the new features from the new Visualization API that would make this process simpler.

In addition to current bugfixes and minor improvements concerning the timeline and the animation, the movie export remains the the next big step to close this project. If you have questions or suggestion, please do not hesitate! The new timeline will be available in the next release of Gephi.

DB

The HTTP Graph plugin

The HTTP Graph plugin provides real-time collection and visualization of HTTP traffic. Using the embeddable Membrane Router, details are extracted from the transaction headers and fed to Gephi for graphing and further analysis. This approach makes the relationships between clients, servers, and resources easily visible.


See the video in HD on Vimeo.

Nodes

There are 4 types of nodes: client, uri, host, domain.

Client: By default, the largest sized nodes with the source IP addresses of clients for labels. If you are the only one pointing to the plugin’s proxy, there will probably be only one of these nodes that says 127.0.0.1. Clients are linked to a domain node of ‘local’ to keep them together on the graph. Another function of the client node is to keep the graph anchored. You may find it interesting to use a filter in Gephi to hide the client type nodes to see a more “free-form” graph of the internet. If you do this, you may see large pieces float away because they didn’t link to the rest of the graph anymore!

URI: By default, the smallest sized nodes with no visible labels. These represent resources like HTML pages, images, javascript, or whatever other resources might be requested through the proxy. The size in bytes and the MIME (Content-Type) reported by the host when returning the resource is available so you can see what it is.

Host: For a given domain (.gephi.org, .google.com, etc.) there can be multiple hosts which serve the different resources. In some cases, you may see the same resource being served from multiple servers in a DNS-based load balancing system. Other interesting details about the underlying architecture of the sites you are viewing can be seen.

Domain: These nodes exist primarily to keep the related hosts close together on the graph. You may want to use a filter in Gephi for this type of node and hide them to see a different perspective.

Edges

HTTP and the web are defined by links, which are essentially directed graph edges, and these occur at the resource level. An HTML page resource will link to CSS, image, and other file resources, both on the same domain, and on remote domains. These inter-domain links are the glue that forms the structure of the world wide web.

Have fun!

~by phreakocious

Get the HTTP Graph plugin on the Plugins Center, or in Gephi go to Tools > Plugins > Available plugins.

The Egyptian Revolution on Twitter

This is a preliminary result of the network of retweets with the hashtag #jan25 at February 11 2011, at the time of the announcement of Mubarak’s resignation. If you retweeted someone, or has been retweeted, it is possible that your username is one of these tinny points (or maybe a bigger one?).

To collect the network data, I used the Gephi Graph Streaming plugin, connected it to a Python web server I made myself. This web server works like a bridge, it connects to the Twitter Streaming API using the statuses/filter service and converts the users and retweets to nodes and edges in a network format that can be read by the Gephi Graph Streaming plugin. Nodes are twitter users, and links appear between the nodes A and B when B retweeted a message of A containing the hashtag #jan25.

The static network visualization is just the final result of about one hour of data collection. It is a dynamic network, and it’s possible to get much more information from the collected data. For example, before the announcement, there were few nodes and edges, sparse in time. But when the announcement arrives, a boom of retweets appears on the network. A video with the flow of retweets is available on YouTube. It shows the dynamic network construction during the hour of data collection, compacted in less than four minutes. During the collection, I run Gephi with the Force Atlas layout just adjusting some parameters from default: repulsion strength to 2000, attraction strength to 0.3 and speed to 10.



I was very lucky to get this data. On February 11 afternoon I was testing the Python server that works as bridge and connected to Twitter. I tried some interesting hashtags to see it working, and at the moment #jan25 seemed to be an active hashtag. I let the application run for some time, adjusted some parameters for visualization, and at some point there was a burst in the activity. I didn’t understood what was happening until I checked again my Twitter account and realized that the Egypt’s vice-president had just made the resignation announcement. After it, I proceeded collecting data, and the final result was this network. It was very interesting to see, in real time, the exact moment when Tahrir Square, from a mass protest demonstration, has been transformed in a giant party, and the burst in the Twitter’s activity. It was like covering in real time a virtual event, a big event that was happening in the Twitter virtual world.

After playing with the data, I found that the data I got through the Twitter Streaming API is only approximately 10% of the total. I’m now working to recover all data and hopeful soon I can make available the full graph of retweets.

Dataset available in a GEXF file here. Download it and play with it with Gephi!

André Panisson / www.

—–
This work is part of a research project involving the Computer Science Department of the University of Turin (www.di.unito.it), the Complex Networks and Systems Group of the ISI Foundation (www.isi.it), and the Informatics department of Indiana University (http://cnets.indiana.edu/).
—-

/seadragon-samples/twitter_jan25/seadragon.html

GSoC 2010 mid-term: Dynamic attributes and statistics

Cezary Bartosiak

During this summer, six students are working on Gephi with the Google Summer of Code. They contribute to Gephi by developing new features that will be integrated in the 0.8 version, released later this year.

 

The project which is done by Cezary Bartosiak focuses special attention on further development of dynamic network analysis (DNA) in Gephi. The aim is to create a framework which would make it possible to build and query a dynamic graph with use of proper API. It has got a practical purpose, for instance analyzing evolution of networks (see in particular M. Argollo de Menezes, A.-L. Barabási Fluctuations in Network Dynamics) or dynamic networks visualization. The article shows the most important features provided by this GSoC project.

 

In the current 0.7 version we can import dynamic graphs written in GEXF syntax and then filter them using Timeline component. Unfortunately, it only filters graphs topologies and that means hiding nodes and/or edges.

The obvious step is make it possible to handle dynamic changes not only of graph topology but also attributes connected with nodes and edges. It can be done by creating a proper API. This API could be used by other modules, like Statistics to make dynamic versions of them. Computing metrics like Degree Distribution or Clustering Coefficient for each time interval in the time series has got a great interest to analyze graphs within time.

So, getting down to brass tacks, the most important tasks are:

  • A data structure to host dynamic attributes efficiently which would make it possible to present them in Data Laboratory module.
  • A Dynamic API which has got the following features: the Dynamic Graph Decorator, that wraps the graph and a time interval, returns static graphs copies for given time intervals, attributes values arrays for given nodes/edges and time intervals.
  • Adapting Metrics framework to use Dynamic API to propose dynamic versions of existing metrics.

There are also additional features, which will be done in the future (probably they will not be included in the nearest release):

  • Dynamic visualization of attributes.
  • Dynamic version of the Ranking module – dynamic visualization attributes transformation.

I’ll try to shortly describe how these features are done.

Dynamic attributes

It is a very interesting task from a programmer’s point of view since it requires implementing a complicated data structure like Interval Tree (see also Antoine Vigneron – Segment trees and interval trees). But also users will judge it necessary. The purpose is to make it possible to read dynamic attributes from GEXF files and host them efficiently. Thanks to that we are able to get values of attributes of different time intervals. It goes without saying how powerful feature it is. To show how it is working, let’s consider one node (written in GEXF syntax):

<node id="1" label="Some node">
<attvalues>
<attvalue for="0" value="abcdefgh"/>
<attvalue for="2" value="1" end="2009-03-01"/>
<attvalue for="2" value="2" start="2009-03-01" end="2009-03-10"/>
<attvalue for="2" value="1" start="2009-03-10"/>
</attvalues>
</node>

As we can see we have got one dynamic attribute (id = 2) which has three different values in different time intervals. The first interval starts in the “negative infinity”. We simply assume that it only ends, never starts. But if we have got some bounds, for instance, a related graph has its start and end times, this attribute would “start” in the same moment as the graph. It is rather intuitive. The second interval exists from 2009-03-01 to 2009-03-10 and the last one exists from 2009-03-10 to “positive infinity” or graph’s bound.

After importing this to Gephi we can simply get values of ANY time interval we want, for example [-inf, +inf]. But we should know how to estimate a final value. In the above example we have got three values: 1, 2 and 1. To solve the problem which of them should be returned, we provide a set of estimators like AVERAGE, MEDIAN, MODE, SUM, MIN, MAX, FIRST and LAST. Each of them has got different behavior that depends on a type of attribute, i.e. for real numbers they behave like in statistics.

So, users will be able to get values of different time intervals on demand, for instance in Data Laboratory module or (in the future) see them on the screen as a part of a rendered graph. For instance we have got some attribute like priority. A potential user will be able to choose between several possibilities like: nothing (it means this attribute should not be visualized), color, stroke, thickness etc. It means, for instance, that if some node has got this attribute close to its upper bound its stroke thickness would be very high. And, on the other hand, if one node has got this attribute close to its lower bound only its internal color could be visualized.

Metrics framework

For now it is possible to count a set of important metrics but all of them take a “static graph” into consideration. The idea of dynamic metrics is then to execute the static ones in a loop, where the graph changes according to time interval. The following screen shows that use of these additional metrics is similar to their static brothers:

Dynamic Metric (click on the image)

In the screen we can see only Dynamic Degree Power Law, but of course every dynamic metric will be implemented (during writing this article this module was still under development – it also means that the final product could differ from this one presented above). So, user inserts important information like time interval etc. and gets a separate report for every time interval. What are the other results?
The result for each node/edge is written in the graph, so one can see this in Data Laboratory.
General result is also written and presented in the report.

Conclusion

Evolution of networks, network dynamics and dynamic network analysis are hot topics nowadays. There is growing interest in studying these issues. It causes that there is bigger and bigger need of DNA analysis tools. In my opinion Gephi is heading towards being one of the best…

Cezary Bartosiak

GSoC 2010 mid-term: Graph Streaming API

andre-panisson

During this summer, six students are working on Gephi with the Google Summer of Code. They contribute to Gephi by developing new features that will be integrated in the 0.8 version, released later this year.

The purpose of the Graph Streaming API project, run by André Panisson, is to build a unified framework for streaming graph objects. Gephi’s data structure and visualization engine has been built with the idea that a graph is not static and might change continuously. By connecting Gephi with external data-sources, we leverage its power to visualize and monitor complex systems or enterprise data in real-time. Moreover, the idea of streaming graph data goes beyond Gephi, and a unified and standardized API could bring interoperability with other available tools for graph and network analysis, as they could start to interoperate with other tools in a distributed and cooperative fashion.

 

With the increasing level of connectivity and cooperation between systems, for a system that aim to be interoperable, it is imperative to comply with the available standards. Graph objects are abstractions that can represent a wide range of real-world structures, from computer networks to human interactions, and there are a lot of standards to exchange graph data in different formats, from text-based formats to xml-based formats. But the real-world structures are constantly changing, and the current formats are not suitable to exchange such type of dynamic data.

A lot of well-established systems already stream data to its users using a streaming API. Twitter for example defined a Streaming API to allow near-realtime access to its data. They are using two different formats: XML and JSON, but JSON is strongly encouraged over XML, as JSON is more compact and parsing is greatly simplified.

We are not the first to implement a Graph Streaming API, and another very interesting experience is the GraphStream Java Library. It is composed of an API that gives a way to add edges and nodes in a graph and make them evolve. The graphs are composed of nodes and edges that can appear, disappear or be modified, and these operations are called events. The sequence of operations that occur in a graph is seen as a stream of events.

So, as other people already had successful experiences with graph streaming, why not start our work based on these experiences? That’s what we are doing, and beyond finding these experiences very useful, we are also trying to be compatible with the available work. The first Gephi Graph Streaming release will use two formats: JSON for flexibility, and a text-based format, based in the GraphStream implementation.

The first version of the Graph Streaming features will be available in the next release of Gephi, but it’s already possible to taste some of these features. To illustrate how simple it will be to connect to a master, the following video shows Gephi connecting to a master and visualizing the received graph data in real time. The graph in this demo is a part of the Amazon.com library, where the nodes represent books and the edges represent their similarities. For each book, a node is added, the similar books are explored, adding the similar ones as nodes and the similarity as an edge.

 

 

The Graph Streaming specification goes beyond the simple fact that a client can pull data from a master: in fact, clients can interact with the master pushing data to it, in a REST architecture. The same data format used by the master to send graph events to the clients is used by clients to interact with the master.

In the next example, we will transform Gephi in a master to provide graph information to its clients. At the Streaming Tab in the Gephi application, you can access all the features of graph streaming. You can connect to a Master by clicking the ‘+’ button, but you can also transform your Gephi in a master by right-clicking the “Master Server” and selecting “Start” (You are not limited to a single master by host: each Gephi workspace can be available as a master). By default, the HTTP server will listen at port 8080 in plain HTTP, and at port 8443 using SSL. The server path depends on your workspace: each workspace uses a different path. You can configure these parameters (and also Basic Authentication) at the “Settings…” button:

 

Graph Steaming Server start

Graph Steaming Settings Panel

 

Now, you can connect to it using some simple HTTP client. For example, you could use curl to see the data flowing. First of all, open a shell window and execute the following command:

curl "http://localhost:8080/workspace0"

With this, you are connecting to your workspace at Gephi. If the workspace is empty, you will receive no data, but you will remain connected, so you will receive all events from now.

Now open another shell prompt, and with the following commands, you could see a triangle appearing at Gephi:

curl "http://localhost:8080/workspace0?operation=updateGraph" -d $'
{"an":{"A":{"size":10,"r":1,"g":0,"b":0,"z":0,"y":500,"x":70}}}r
{"an":{"B":{"size":10,"r":1,"g":0,"b":0,"z":0,"y":90,"x":250}}}r
{"ae":{"AB":{"source":"A","target":"B","weight":10,"r":0,"g":0,"b":0,"directed":false}}}r
{"an":{"C":{"size":10,"r":1,"g":0,"b":0,"z":0,"y":90,"x":-90}}}r
{"ae":{"BC":{"source":"B","target":"C","weight":10,"r":0,"g":0,"b":0,"directed":false}}}r
{"ae":{"CA":{"source":"C","target":"A","weight":10,"r":0,"g":0,"b":0,"directed":false}}}'

At the same time, all events will be sent to your connected client, in the other shell window.

With the following commands you can retrieve some of the data:

curl "http://localhost:8080/workspace0?operation=getNode&id=A"
curl "http://localhost:8080/workspace0?operation=getEdge&id=AB"

And you could start manipulating your graph through command line, as you like. There are other event types for changing and removing edges and nodes, for more information about them see the current status of the JSON Streaming Format, available at this page. We recall that this format is subject to changes, as the API was build to be very flexible and more requirements are being added to it.

But what about connecting two different Gephi instances together? One instance will be master, and the other client. Using the Graph Streaming API, a change in a graph at the master’s workspace should cause a change in the client’s workspace, and a change at the client’s workspace will cause it to send requests to the master to update its graph accordingly. Both instances working in a distributed mode. In fact, different people could work in a distributed mode to construct a graph: it’s the Collaborative Graph Construction.

My personal impressions about it

For me as a researcher, Gephi has the potential to become a de-facto standard for manipulating and visualizing large scale graphs. I believe that the research community is still lacking a high-quality, general-purpose, community-supported framework for exploratory analysis of large-scale dynamical graph data, and I believe that Gephi has the potential to fill this gap. I’m working also in collaboration with ISI Foundation at the SocioPatterns project, an example of research use case that currently uses Gephi for exploratory data analysis and visualization. The support for dynamic networks, the readiness of the Gephi data model for dynamical update of graph topology and attributes and, in a near future, the support for graph streaming are exciting features that suit very well the large-scale real-time data sources we are dealing with. The potential for processing live streams from our experiments is a unique feature that we are eager to see working.

André Panisson