GSoC: interconnect Gephi Graph Streaming API and GraphStream

My name is Min WU, and during this Google Summer of Code I worked on a project to interconnect the Gephi Graph Streaming API and the GraphStream library. My mentors are Yoann Pigné and André Panisson.

This project aims to interconnect GraphStream’s dynamic graph event model with Gephi, so that Gephi can visualize an ongoing graph evolution and its measurements. With it, users can model and simulate complex systems with GraphStream while observing the output with the visual tools offered by Gephi.

GraphStream is written in Java. To make streams of graph events usable from other languages, GraphStream provides the NetStream framework, a network interface that lets projects written in other languages use GraphStream. The NetStream framework consists of three parts: the receiver, the sender and the NetStream protocol. The receiver is responsible for receiving graph events from the network and dispatching them to pipes. It runs in a single thread, listening at a given address and port while receiving graph events from several streams, which in practice are several threads or clients. The sender encodes graph events into messages according to the NetStream protocol and sends them to a given receiver, identified by host, port and stream ID. Every message contains a sourceId, a timeId and the event context; the combination of sourceId and timeId is used to distinguish between streams and to solve synchronization issues. Finally, the NetStream protocol specifies the message format at the byte level.
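To make the role of sourceId and timeId more concrete, here is a minimal Python sketch of one plausible way a receiver could use the pair to discard events it has already seen. It is not the NetStream implementation: the class name EventFilter and the rule "accept only a strictly larger timeId per sourceId" are assumptions made for illustration.

```python
class EventFilter:
    """Hypothetical helper: remembers the highest timeId seen per sourceId."""

    def __init__(self):
        self._last_seen = {}  # sourceId -> highest timeId accepted so far

    def is_new(self, source_id, time_id):
        """Return True if the event is newer than anything seen from this source."""
        last = self._last_seen.get(source_id)
        if last is not None and time_id <= last:
            return False  # already seen (or older): drop it
        self._last_seen[source_id] = time_id
        return True
```

Events coming from different sources are tracked independently, so two streams can reuse the same timeId values without interfering with each other.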

Gephi also supports the idea of “streams of graph events”. The Graph Streaming plugin, built by André Panisson during the 2010 GSoC, provides a graph streaming framework based on a multi-threaded socket server. Other applications can push graph data to the Gephi server through the network and have it visualized. In that graph streaming project, operations (a concept similar to events) are invoked through HTTP requests made by the client to the server, using a JSON format.

Work done

In my project, I interconnected Gephi and GraphStream based on André’s Graph Streaming plugin. Since NetStream on the GraphStream side uses the NetStream protocol while the Graph Streaming API on the Gephi side uses a JSON protocol, we have to make them compatible with each other. Considering its flexible interoperability and language-agnostic nature, I chose the JSON protocol for the interconnection and implemented a sender part and a receiver part.

The sender part (JSONSender) is responsible for sending events from GraphStream to Gephi. GraphStream works as a client and Gephi works as a server. Every time the graph in GraphStream changes, a corresponding event is sent to Gephi. Gephi handles the event and changes its own graph. In this way, the sender part works as a sink of the GraphStream graph, so it must implement the Sink interface, which contains methods to deal with graph element events and attribute events. In each method, we first encode the event message into a JSON string, and then send it to Gephi. We connect to Gephi and use the “updateGraph” operation to send events. The corresponding URL is “http://host:port/workspace?operation=updateGraph”. The host and port must match those of the Gephi server, and the workspace is the destination workspace in Gephi; for example, a URL can be “http://127.0.0.1:8080/workspace0?operation=updateGraph”. The Gephi server and client are built with the “Graph Streaming API” in the Gephi plugin.
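The encode-then-send step can be sketched in a few lines of Python. This is not the JSONSender code (which is Java); it only illustrates the idea. The "an" (add node) and "ae" (add edge) event shapes follow the curl example in the Graph Streaming API post; the function names are hypothetical, and send_event only works if a Gephi master is actually running at the given URL.

```python
import json
import urllib.request


def update_graph_url(host, port, workspace):
    """Build the updateGraph URL for a given Gephi workspace."""
    return f"http://{host}:{port}/{workspace}?operation=updateGraph"


def add_node_event(node_id, **attributes):
    """Encode an 'add node' event as a JSON string."""
    return json.dumps({"an": {node_id: attributes}})


def add_edge_event(edge_id, source, target, directed=False):
    """Encode an 'add edge' event as a JSON string."""
    body = {"source": source, "target": target, "directed": directed}
    return json.dumps({"ae": {edge_id: body}})


def send_event(url, event):
    """POST one JSON event to Gephi (needs a running Gephi master)."""
    request = urllib.request.Request(url, data=event.encode("utf-8"))
    with urllib.request.urlopen(request) as response:
        return response.status
```

For instance, send_event(update_graph_url("127.0.0.1", 8080, "workspace0"), add_node_event("A", size=10)) would push a single node to workspace0.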

The receiver part (JSONReceiver) is responsible for receiving events from Gephi. It listens to Gephi and waits for events. Every time the graph in Gephi changes, a corresponding event is sent to GraphStream. GraphStream then handles the event and changes its graph object. In this way, the receiver part works as a source of the GraphStream graph. To listen to Gephi events, we connect to Gephi using a URL with the “getGraph” operation. The corresponding URL is “http://host:port/workspace0?operation=getGraph”.
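On the receiving side, the core of the work is turning the incoming text into graph changes. The Python sketch below is only an illustration, not the JSONReceiver code: it assumes that each event arrives as one JSON object per line, handles only the "an" and "ae" event types, and stores the graph in two plain dictionaries. The function name apply_events is hypothetical.

```python
import json


def apply_events(stream_text, nodes, edges):
    """Apply 'an' (add node) and 'ae' (add edge) events to two dicts.

    Other event types are ignored in this sketch.
    """
    # splitlines() accepts \r, \n and \r\n as separators.
    for chunk in stream_text.splitlines():
        chunk = chunk.strip()
        if not chunk:
            continue
        event = json.loads(chunk)
        for node_id, attrs in event.get("an", {}).items():
            nodes[node_id] = attrs
        for edge_id, attrs in event.get("ae", {}).items():
            edges[edge_id] = (attrs["source"], attrs["target"])
```

A real receiver would read the HTTP response incrementally and call something like this on every chunk it receives, since the connection stays open.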

With these two classes, we can interconnect GraphStream and Gephi in real time. Two tutorials show how to set up a real-time connection between GraphStream and Gephi; see the video below. If you are interested in the implementation details, please refer to the manual page.

The first class is GraphSender, which loads a graph in GraphStream and dynamically displays it in a Gephi workspace. We need to create a graph instance and a JSONSender instance, and plug the JSONSender instance in as a sink of the graph instance. From then on, when we generate the graph or load it from a file, Gephi displays it in real time.
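GraphStream is a Java library, so the snippet below is not GraphStream code. It is a small Python imitation of the "plug a sink into a graph" pattern, with hypothetical class names (TinyGraph, RecordingSink), meant only to show why every change to the graph automatically produces an event for Gephi once the sink is attached.

```python
import json


class RecordingSink:
    """Stand-in for a sender: records the JSON events instead of sending them."""

    def __init__(self):
        self.events = []

    def node_added(self, node_id):
        self.events.append(json.dumps({"an": {node_id: {}}}))

    def edge_added(self, edge_id, source, target):
        body = {"source": source, "target": target, "directed": False}
        self.events.append(json.dumps({"ae": {edge_id: body}}))


class TinyGraph:
    """Minimal graph that notifies every attached sink of each change."""

    def __init__(self):
        self.sinks = []
        self.nodes = set()
        self.edges = {}

    def add_sink(self, sink):
        self.sinks.append(sink)

    def add_node(self, node_id):
        self.nodes.add(node_id)
        for sink in self.sinks:
            sink.node_added(node_id)

    def add_edge(self, edge_id, source, target):
        self.edges[edge_id] = (source, target)
        for sink in self.sinks:
            sink.edge_added(edge_id, source, target)
```

Replacing RecordingSink with a sink that posts each event to the updateGraph URL gives the behaviour described in the tutorial.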

The other class is LinLogLayoutReceiver. The Lin-Log layout in GraphStream is designed to find communities in graphs. This tutorial shows the execution of a Lin-Log layout in GraphStream and the sending of the layout information to Gephi in real time. We first load a graph in Gephi, display it and apply some algorithms. Then we send the graph to GraphStream and apply the Lin-Log layout to it on the GraphStream side. Meanwhile, we visualize the layout process on the Gephi side in real time. To achieve this, we create a graph instance and a JSONReceiver instance, then get the ThreadProxyPipe instance and plug the graph instance in as an ElementSink of the pipe instance. Then we apply the Lin-Log layout, and create a new thread in which we create a JSONSender instance and plug it in as a sink of the graph layout.

Distribution

This project is distributed under the MIT license. The code is available on GitHub. I am also very grateful to my mentors for their supervision. Thank you very much!

The HTTP Graph plugin

The HTTP Graph plugin provides real-time collection and visualization of HTTP traffic. Using the embeddable Membrane Router, details are extracted from the transaction headers and fed to Gephi for graphing and further analysis. This approach makes the relationships between clients, servers, and resources easily visible.


See the video in HD on Vimeo.

Nodes

There are 4 types of nodes: client, uri, host, domain.

Client: By default, these are the largest nodes, labeled with the source IP addresses of clients. If you are the only one pointing to the plugin’s proxy, there will probably be only one of these nodes, labeled 127.0.0.1. Clients are linked to a domain node of ‘local’ to keep them together on the graph. Another function of the client node is to keep the graph anchored. You may find it interesting to use a filter in Gephi to hide the client type nodes to see a more “free-form” graph of the internet. If you do this, you may see large pieces float away because they are no longer linked to the rest of the graph!

URI: By default, these are the smallest nodes, with no visible labels. They represent resources like HTML pages, images, JavaScript, or whatever other resources might be requested through the proxy. The size in bytes and the MIME type (Content-Type) reported by the host when returning the resource are available, so you can see what it is.

Host: For a given domain (.gephi.org, .google.com, etc.) there can be multiple hosts which serve the different resources. In some cases, you may see the same resource being served from multiple servers in a DNS-based load balancing system. Other interesting details about the underlying architecture of the sites you are viewing can be seen.

Domain: These nodes exist primarily to keep the related hosts close together on the graph. You may want to use a filter in Gephi for this type of node and hide them to see a different perspective.
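As a rough illustration of how one proxied request could map onto the four node types, here is a small Python sketch. It is not the plugin's code: the function name is hypothetical, and the domain is derived naively from the last two labels of the host name, which is an assumption that fails for suffixes such as .co.uk.

```python
from urllib.parse import urlsplit


def nodes_for_request(client_ip, url):
    """Return a {node_id: node_type} mapping for one HTTP request."""
    host = urlsplit(url).hostname
    # Naive domain guess: keep the last two labels, e.g. ".gephi.org".
    domain = "." + ".".join(host.split(".")[-2:])
    return {
        client_ip: "client",
        "local": "domain",  # clients are attached to a 'local' domain node
        url: "uri",
        host: "host",
        domain: "domain",
    }
```

A request from 127.0.0.1 for http://www.gephi.org/index.html would therefore contribute the nodes 127.0.0.1, local, the URL itself, www.gephi.org and .gephi.org.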

Edges

HTTP and the web are defined by links, which are essentially directed graph edges, and these occur at the resource level. An HTML page resource will link to CSS, image, and other file resources, both on the same domain, and on remote domains. These inter-domain links are the glue that forms the structure of the world wide web.

Have fun!

~by phreakocious

Get the HTTP Graph plugin on the Plugins Center, or in Gephi go to Tools > Plugins > Available plugins.

The Egyptian Revolution on Twitter

This is a preliminary result of the network of retweets with the hashtag #jan25 on February 11, 2011, at the time of the announcement of Mubarak’s resignation. If you retweeted someone, or have been retweeted, it is possible that your username is one of these tiny points (or maybe a bigger one?).

To collect the network data, I used the Gephi Graph Streaming plugin, connected to a Python web server I made myself. This web server works like a bridge: it connects to the Twitter Streaming API using the statuses/filter service and converts the users and retweets to nodes and edges in a network format that can be read by the Gephi Graph Streaming plugin. Nodes are Twitter users, and links appear between the nodes A and B when B retweeted a message of A containing the hashtag #jan25.
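The conversion step of such a bridge can be sketched as follows. This is not the original server code, only a minimal Python illustration: the function name, the "label" attribute, the edge id scheme and the edge direction (from the retweeter B to the original author A) are all assumptions. The "an" and "ae" event names follow the JSON streaming format.

```python
import json


def retweet_to_events(original_author, retweeter, seen_users):
    """Turn one retweet into JSON events; add each user node only once."""
    events = []
    for user in (original_author, retweeter):
        if user not in seen_users:
            seen_users.add(user)
            events.append(json.dumps({"an": {user: {"label": user}}}))
    # Assumed direction: the retweeter points to the author being retweeted.
    edge_id = f"{retweeter}->{original_author}"
    body = {"source": retweeter, "target": original_author, "directed": True}
    events.append(json.dumps({"ae": {edge_id: body}}))
    return events
```

The seen_users set is what keeps a user who is retweeted many times from being added to the graph again on every retweet.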

The static network visualization is just the final result of about one hour of data collection. It is a dynamic network, and it’s possible to get much more information from the collected data. For example, before the announcement, there were few nodes and edges, sparse in time. But when the announcement arrives, a boom of retweets appears on the network. A video with the flow of retweets is available on YouTube. It shows the dynamic network construction during the hour of data collection, compressed into less than four minutes. During the collection, I ran Gephi with the Force Atlas layout, adjusting only a few parameters from their defaults: repulsion strength to 2000, attraction strength to 0.3 and speed to 10.



I was very lucky to get this data. On the afternoon of February 11, I was testing the Python server that works as a bridge and connects to Twitter. I tried some interesting hashtags to see it working, and at that moment #jan25 seemed to be an active hashtag. I let the application run for some time, adjusted some parameters for visualization, and at some point there was a burst in the activity. I didn’t understand what was happening until I checked my Twitter account again and realized that Egypt’s vice-president had just made the resignation announcement. After that, I kept collecting data, and the final result was this network. It was very interesting to see, in real time, the exact moment when Tahrir Square was transformed from a mass protest into a giant party, and the burst in Twitter activity. It was like covering a virtual event in real time, a big event that was happening in the Twitter virtual world.

After playing with the data, I found that the data I got through the Twitter Streaming API is only approximately 10% of the total. I’m now working to recover all the data, and hopefully I can soon make the full graph of retweets available.

Dataset available in a GEXF file here. Download it and play with it with Gephi!

André Panisson / www.

—–
This work is part of a research project involving the Computer Science Department of the University of Turin (www.di.unito.it), the Complex Networks and Systems Group of the ISI Foundation (www.isi.it), and the Informatics department of Indiana University (http://cnets.indiana.edu/).
—-


GSoC 2010 mid-term: Graph Streaming API


During this summer, six students are working on Gephi with the Google Summer of Code. They contribute to Gephi by developing new features that will be integrated in the 0.8 version, released later this year.

The purpose of the Graph Streaming API project, run by André Panisson, is to build a unified framework for streaming graph objects. Gephi’s data structure and visualization engine have been built with the idea that a graph is not static and might change continuously. By connecting Gephi with external data sources, we leverage its power to visualize and monitor complex systems or enterprise data in real time. Moreover, the idea of streaming graph data goes beyond Gephi: a unified and standardized API could bring interoperability with other available tools for graph and network analysis, which could then work together in a distributed and cooperative fashion.

 

With the increasing level of connectivity and cooperation between systems, a system that aims to be interoperable must comply with the available standards. Graph objects are abstractions that can represent a wide range of real-world structures, from computer networks to human interactions, and there are many standards for exchanging graph data in different formats, from text-based formats to XML-based formats. But real-world structures are constantly changing, and the current formats are not suitable for exchanging this kind of dynamic data.

A lot of well-established systems already stream data to their users using a streaming API. Twitter, for example, defined a Streaming API to allow near-realtime access to its data. It uses two different formats, XML and JSON, but JSON is strongly encouraged over XML, as JSON is more compact and parsing is greatly simplified.

We are not the first to implement a Graph Streaming API, and another very interesting example is the GraphStream Java library. It is composed of an API that gives a way to add edges and nodes to a graph and make them evolve. The graphs are composed of nodes and edges that can appear, disappear or be modified, and these operations are called events. The sequence of operations that occur in a graph is seen as a stream of events.

So, as other people have already had successful experiences with graph streaming, why not base our work on these experiences? That’s what we are doing, and beyond finding these experiences very useful, we are also trying to be compatible with the available work. The first Gephi Graph Streaming release will use two formats: JSON for flexibility, and a text-based format based on the GraphStream implementation.

The first version of the Graph Streaming features will be available in the next release of Gephi, but it’s already possible to try some of these features. To illustrate how simple it will be to connect to a master, the following video shows Gephi connecting to a master and visualizing the received graph data in real time. The graph in this demo is a part of the Amazon.com library, where the nodes represent books and the edges represent their similarities. For each book, a node is added and its similar books are explored; each similar book is added as a node and each similarity as an edge.
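The exploration described for the demo can be sketched as a breadth-first crawl. The Python code below is only an illustration under stated assumptions: the similar_to dictionary stands in for whatever service returns the similar books, and the function name and the (kind, ...) tuple format of the events are hypothetical.

```python
from collections import deque


def crawl(start, similar_to, max_books):
    """Explore up to max_books books, emitting node and edge events in order."""
    events = [("node", start)]
    seen = {start}
    queue = deque([start])
    explored = 0
    while queue and explored < max_books:
        book = queue.popleft()
        explored += 1
        for other in similar_to.get(book, ()):
            if other not in seen:
                # A similar book seen for the first time becomes a new node.
                seen.add(other)
                events.append(("node", other))
                queue.append(other)
            events.append(("edge", book, other))
    return events
```

Note that a node event is always emitted before the first edge that refers to it, which is the order a streaming client needs. If two books list each other as similar, this sketch emits an edge in both directions.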


The Graph Streaming specification goes beyond the simple fact that a client can pull data from a master: in fact, clients can interact with the master by pushing data to it, in a REST architecture. The same data format used by the master to send graph events to the clients is used by clients to interact with the master.

In the next example, we will turn Gephi into a master that provides graph information to its clients. In the Streaming tab of the Gephi application, you can access all the features of graph streaming. You can connect to a master by clicking the ‘+’ button, but you can also turn your Gephi into a master by right-clicking “Master Server” and selecting “Start” (you are not limited to a single master per host: each Gephi workspace can be made available as a master). By default, the HTTP server listens on port 8080 in plain HTTP, and on port 8443 using SSL. The server path depends on your workspace: each workspace uses a different path. You can configure these parameters (and also Basic Authentication) with the “Settings…” button:

 

Graph Streaming Server start

Graph Streaming Settings Panel

 

Now, you can connect to it using some simple HTTP client. For example, you could use curl to see the data flowing. First of all, open a shell window and execute the following command:

curl "http://localhost:8080/workspace0"

With this, you are connecting to your workspace in Gephi. If the workspace is empty, you will receive no data, but you will remain connected, so you will receive all events from now on.

Now open another shell prompt. With the following command, you can make a triangle appear in Gephi:

curl "http://localhost:8080/workspace0?operation=updateGraph" -d $'
{"an":{"A":{"size":10,"r":1,"g":0,"b":0,"z":0,"y":500,"x":70}}}\r
{"an":{"B":{"size":10,"r":1,"g":0,"b":0,"z":0,"y":90,"x":250}}}\r
{"ae":{"AB":{"source":"A","target":"B","weight":10,"r":0,"g":0,"b":0,"directed":false}}}\r
{"an":{"C":{"size":10,"r":1,"g":0,"b":0,"z":0,"y":90,"x":-90}}}\r
{"ae":{"BC":{"source":"B","target":"C","weight":10,"r":0,"g":0,"b":0,"directed":false}}}\r
{"ae":{"CA":{"source":"C","target":"A","weight":10,"r":0,"g":0,"b":0,"directed":false}}}'

At the same time, all events will be sent to your connected client, in the other shell window.
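If you prefer to generate the payload from a script rather than typing it by hand, the same triangle can be built in a few lines of Python. This is a minimal sketch: the function name is hypothetical, and joining the events with a carriage return is an assumption based on the curl example above. The node and edge attributes are the ones used in that example.

```python
import json


def triangle_payload():
    """Build the request body that adds nodes A, B, C and edges AB, BC, CA."""
    positions = {"A": (70, 500), "B": (250, 90), "C": (-90, 90)}
    events = []
    for node_id, (x, y) in positions.items():
        attrs = {"size": 10, "r": 1, "g": 0, "b": 0, "x": x, "y": y, "z": 0}
        events.append(json.dumps({"an": {node_id: attrs}}))
    for source, target in (("A", "B"), ("B", "C"), ("C", "A")):
        attrs = {"source": source, "target": target, "weight": 10,
                 "r": 0, "g": 0, "b": 0, "directed": False}
        events.append(json.dumps({"ae": {source + target: attrs}}))
    # Assumption: events are separated by a carriage return.
    return "\r".join(events)
```

The resulting string can be posted to the updateGraph URL with any HTTP client. Unlike the curl example, this sketch sends all nodes before the edges, which should make no difference to the final graph.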

With the following commands you can retrieve some of the data:

curl "http://localhost:8080/workspace0?operation=getNode&id=A"
curl "http://localhost:8080/workspace0?operation=getEdge&id=AB"

You can then start manipulating your graph through the command line as you like. There are other event types for changing and removing edges and nodes; for more information about them, see the current status of the JSON Streaming Format, available at this page. Keep in mind that this format is subject to change, as the API was built to be very flexible and more requirements are being added to it.

But what about connecting two different Gephi instances together? One instance will be the master, and the other the client. Using the Graph Streaming API, a change in a graph in the master’s workspace should cause a change in the client’s workspace, and a change in the client’s workspace will cause it to send requests to the master to update its graph accordingly, so both instances work in a distributed mode. In fact, different people could work in a distributed mode to construct a graph: it’s the Collaborative Graph Construction.

My personal impressions about it

For me as a researcher, Gephi has the potential to become a de facto standard for manipulating and visualizing large-scale graphs. I believe that the research community is still lacking a high-quality, general-purpose, community-supported framework for exploratory analysis of large-scale dynamical graph data, and I believe that Gephi has the potential to fill this gap. I am also working in collaboration with the ISI Foundation on the SocioPatterns project, an example of a research use case that currently uses Gephi for exploratory data analysis and visualization. The support for dynamic networks, the readiness of the Gephi data model for dynamical updates of graph topology and attributes and, in the near future, the support for graph streaming are exciting features that suit the large-scale real-time data sources we are dealing with very well. The potential for processing live streams from our experiments is a unique feature that we are eager to see working.

André Panisson