The GSoC rocket launched

The Google Summer of Code 2011 is officially launched! The results have been announced by Google. Congratulations to the students who are joining the Gephi project:

  • Daniel Bernardes – Timeline player and movie creation
  • Ernesto Aneiros – Indexing Attributes API with Lucene
  • Keheliya Gallaba – Automated build & Maven
  • Luiz Ribeiro – Scripting plug-in
  • Urban Škudnik – Network visualization with WebGL
  • Vojtech Bardiovsky – New Visualization Engine
  • Yudi Xue – Preview Refactoring

You put a lot of effort into your applications and demonstrated great motivation in addition to strong technical skills. We are very excited to work with you guys!

This year we are also honored to count on world-class researchers in the mentoring team: Eytan Adar and Christian Tominski! Eytan is an Assistant Professor of Information and Computer Science at the University of Michigan, and is well-known for the graph exploration software GUESS. Christian is a Lecturer and Researcher at the Institute for Computer Science at the University of Rostock. He has authored and co-authored several articles in the field of information visualization.

The Summer Timeline:

* Until May 23: Students get to know mentors, read documentation, get up to speed to begin working on their projects.
* May 23: Students start to code
* July 15: Mid-term evaluation
* August 22: Pencils down

Google Summer of Code 2011

It’s great news: Gephi has been accepted again for the Google Summer of Code. The program is the best way for students around the world to start contributing to an open-source project. The 2009 and 2010 editions were a great success and dramatically boosted Gephi’s development.

What is Gephi?

If you look around, you may notice that networks are everywhere. Social networks (relationships among people), computer networks (links between computers), transportation routes, power grids, email exchanges and the relations between scientific papers are all examples of networks. The ability to analyze, manipulate and represent a network is a key feature for solving difficult problems and boosting knowledge discovery in many disciplines.

The Gephi project aims to build the perfect tool for visualizing and manipulating networks. We focus on usability, performance and modularity:

  • Usability: easy to install, a UI without scripts, and real-time manipulation.
  • Performance: the visualization engine and data structures are built to scale. Supporting ever-larger graphs is an endless challenge!
  • Modularity: an extensible software architecture, built on top of the NetBeans Platform. Add plug-ins with ease.

Learn more about Gephi, watch Introducing Gephi 0.7, then download and try it by following the Quick Start Tutorial.

The Gephi project is young; its growing community is a mixture of engineers and research scientists involved in network science, data visualization and complex networks.

List of ideas

The list of ideas is available on our wiki. The projects cover various skills and levels of difficulty:

* Preview Refactoring: Simplify and modularize the Preview architecture
* Web-based network visualization with WebGL: Start a new project by developing an efficient network visualization library for the web using WebGL
* Timeline player and movie creation: Add a ‘Play’ feature to the timeline component and create animated network movies
* New Visualization Engine: Develop the new visualization engine using shaders on the GPU, add interaction, and aim to release a feature-complete version
* Indexed Attributes API using Lucene: Add index support to the Gephi attributes system
* Scripting Gephi: Develop a scripting language and a console plug-in for Gephi
* Automated build & Maven: Choose and create a deployment server to generate releases automatically

You can also propose your own ideas: please post them on this forum topic. They will be considered and discussed by the community. Also have a look at our long-term Roadmap.

Students, join the network

Students, apply now with a Gephi proposal. Come to the GSoC forum section and say Hi! in this topic. Then fill in and follow the questionnaire. Be careful: the deadline is April 8 (timeline)!

Helder Suzuki, a student in 2009, wrote:
At Gephi students will have the opportunity to produce high impact work on a rapidly growing area and be noted for it.

Have a look at the 2009 pages and Helder’s interview.

Follow gephi on Twitter

Gephi has a Student Program

The Gephi project now has a Student Program. The idea is simple: propose various tasks adapted to computer science students and help them create Gephi plug-ins. Inspired by the Google Summer of Code and our own experience as students, the program offers original ideas and a real open-source experience.

Computer science students are usually asked to complete many small projects during their studies, often in groups. We want to propose original projects whose results can easily be packaged as plug-ins and shared. Graphs and networks are at the heart of a variety of problems, including graph theory, sociology, data mining, statistics and HCI (human-computer interaction). The ability to visualize and manipulate graph structures in a simple way is central to understanding these problems. Moreover, what better way to learn graph theory than to code one of its algorithms?

The wiki page will maintain an open list of ideas and projects. For each interested student, we will provide help to get things done and promote their project. We are also thinking about free t-shirts for all students.

GSoC mid-term: Shader Engine


My name is Antonio Patriarca. The aim of my Shader Engine project is to implement a new visualization engine in Gephi based on modern graphics card capabilities. The current engine is designed to be very portable and only uses legacy features, which are not optimal on modern hardware.

The OpenGL API, which Gephi uses, has changed a lot in recent years to follow the evolution of the hardware. Several parts of the first versions are now considered deprecated and are difficult to implement efficiently in drivers. For this reason, it is often necessary to redesign graphics engines based on old versions to get the best from modern hardware.

In the old days, graphics primitives were rendered using a fixed pipeline implemented in hardware. Programmers had to set several states in order to control the rendering logic, and several tricks had to be invented to achieve unsupported effects. With each new OpenGL version, new states were introduced to give users more freedom and power, and the pipeline soon became very complex and difficult to manage. This is how the current Gephi visualization engine is still implemented.

Inspired by the success of the RenderMan Shading Language, graphics card manufacturers finally introduced some programmable stages into the graphics pipeline. These stages give the ability to precisely and easily define the rendering logic without dealing with a large number of states. At first, the only way to implement the programs running in these programmable stages, called shaders, was an assembly language. But the GLSL language was soon designed to simplify shader implementation in OpenGL, and similar high-level languages were introduced in the other APIs as well. The number of programmable stages in the graphics pipeline has increased since their first introduction, and modern GPUs are now often used as general-purpose stream processors. The new visualization engine will use shaders to render the nodes and edges of the graph in a more efficient way, while achieving a higher quality image.


Current Gephi Visualization Module issues

It is useful to discuss the problems of the current architecture to better understand the new engine design. The current Gephi Visualization Module was designed to be very portable and only uses features available in the first OpenGL versions. It is therefore based on the fixed pipeline, and it sends the geometry to the GPU using immediate mode or display lists. The new engine will still support legacy hardware, but it will also include a more modern pipeline. Some issues will only be solved in the new pipeline.

Each renderable object (node, edge or label) implements a common interface which takes care of how the object is rendered and responds to changes in the graph. In the rendering loop, the engine iterates over each edge, node and label and calls the corresponding method to render it. This is not optimal on modern graphics cards for several reasons.
A modern GPU is composed of several cores which run the same program in parallel (though they may execute different parts of it). Each time a state changes, the GPU may stall, waiting for each core to terminate before updating the state. In the current design, several states change between consecutive renderable objects, leaving the GPU idle most of the time. Moreover, each renderable object is composed of a small number of polygons and is unable to use all the available cores. The best way to render objects on a modern graphics card is to sort them by state change and render them in batches. This is the strategy which will be implemented in the new visualization engine.
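
To make the batching idea concrete, here is a minimal sketch in Java of sorting renderables by the state they need and drawing each group in one pass. The class and method names are invented for this example; this is not the engine's actual code.

// Illustrative sketch only: names are invented, not taken from the Gephi engine.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

interface Renderable {
    String stateKey();   // identifies the GPU state (shader, texture, ...) this object needs
    void draw();         // issues the draw call, assuming its state is already bound
}

final class BatchingLoop {
    void renderFrame(List<Renderable> objects) {
        // Group objects by required state, so each state is bound only once per frame.
        Map<String, List<Renderable>> batches = new HashMap<String, List<Renderable>>();
        for (Renderable r : objects) {
            List<Renderable> batch = batches.get(r.stateKey());
            if (batch == null) {
                batch = new ArrayList<Renderable>();
                batches.put(r.stateKey(), batch);
            }
            batch.add(r);
        }
        // One state change per batch instead of one per object.
        for (Map.Entry<String, List<Renderable>> entry : batches.entrySet()) {
            bindState(entry.getKey());
            for (Renderable r : entry.getValue()) {
                r.draw();
            }
        }
    }

    private void bindState(String stateKey) { /* bind shader, textures, ... */ }
}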

The OpenGL API is optimized to render 3D polygonal meshes, so the only ways to draw vector objects used to be approximating them with polygons or textures. Both approaches have issues once objects get big enough, which can only be mitigated by using additional memory. The current engine draws circles and spheres using a polygonal approximation. Shaders give programmers a lot of additional freedom, and it is now possible to draw general objects by rendering a single polygon. Shapes generated using shaders look good regardless of the size of the objects. The engine will use this method where possible.
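
To illustrate the idea (this is not the engine's actual shader), a GLSL fragment shader that draws an anti-aliased disc on a single quad could look like the following, embedded here as a Java string constant. All uniform and varying names are invented for the example.

public final class NodeShaderSketch {
    // Hypothetical fragment shader: draws an anti-aliased disc on a single quad.
    public static final String DISC_FRAGMENT_SHADER =
          "#version 120\n"
        + "varying vec2 texCoord;   // quad coordinates in [-1, 1]\n"
        + "uniform vec4 fillColor;\n"
        + "uniform vec4 borderColor;\n"
        + "void main() {\n"
        + "    float d = length(texCoord);  // distance from the quad center\n"
        + "    float aa = fwidth(d);        // approximate footprint of one pixel\n"
        + "    if (d > 1.0 + aa) discard;   // outside the disc\n"
        + "    // fill -> border -> transparent, with smooth transitions\n"
        + "    vec4 color = mix(fillColor, borderColor, smoothstep(0.8 - aa, 0.8 + aa, d));\n"
        + "    color.a *= 1.0 - smoothstep(1.0 - aa, 1.0 + aa, d);\n"
        + "    gl_FragColor = color;\n"
        + "}\n";
}

The smooth transitions at the border are also an instance of the per-primitive anti-aliasing mentioned among the new features below.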

New Visualization Module design

The new visualization engine will be composed of three parts: the viewer, the controller and the data manager. The viewer controls the drawable object and the main rendering loop. The controller updates both the camera and the selection area and handles all the window and mouse events. The data manager updates the graph information in the engine and decides what to draw each frame. Each part of the engine runs in a different thread; therefore, it shouldn’t freeze when the graph is modified, as sometimes happens in the current engine.

The rendering system of the new visualization engine will use concrete and immutable representations of the renderable objects. In each data manager frame, after the graph has been updated, the data of each node, edge and label is inserted in a render batch, which is then inserted in the corresponding render queue. All the render queues (one for each renderable object type) are sent with the current camera to the viewer for rendering. The viewer then updates its current render queues and waits for the following display method call.

The rendering logic for each renderable object will be defined in the renderers. Each renderer will only be able to render a specific type of object, and it will render an entire render queue at once; there will be no way to directly render a single renderable object in the new visualization engine. The viewer will maintain a single renderer for each object type and use them to render the current render queues. Two renderers will be implemented for each renderable object type (one using OpenGL 2.0 and one using OpenGL 1.2), and the viewer will decide which renderer to use based on the installed graphics card and user preferences. A detailed description of all the renderers will soon be published in the specification.
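
The renderer abstraction described above can be summarized with a small sketch; the types below are hypothetical stand-ins, not the real engine classes.

// Hypothetical sketch of the viewer/renderer relationship.
import java.util.List;

class Camera { }
class NodeData { }

interface Renderer<T> {
    // Renders a whole queue at once; single objects are never rendered directly.
    void render(List<T> renderQueue, Camera camera);
}

final class Gl2NodeRenderer implements Renderer<NodeData> {
    public void render(List<NodeData> queue, Camera camera) { /* shader-based path */ }
}

final class Gl12NodeRenderer implements Renderer<NodeData> {
    public void render(List<NodeData> queue, Camera camera) { /* fixed-pipeline path */ }
}

final class Viewer {
    private final Renderer<NodeData> nodeRenderer;

    Viewer(boolean modernGpu) {
        // One renderer per object type; the OpenGL 2.0 path is chosen when available.
        this.nodeRenderer = modernGpu ? new Gl2NodeRenderer() : new Gl12NodeRenderer();
    }

    void display(List<NodeData> nodeQueue, Camera camera) {
        nodeRenderer.render(nodeQueue, camera);
        // ... same pattern for the edge and label queues ...
    }
}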

New features

The majority of the changes in the new visualization engine will be invisible to the user, but there will also be some new features. The most important ones are the following (some of them will also require work in other Gephi modules):

  • Different node shapes for each node. It will be possible to define a different shape for each node, and therefore use node shapes to distinguish between different groups of nodes.
  • Images as node shapes. It will be possible to load images to use instead of the predefined node shapes.
  • Wider choice of 2D node shapes. Several node shapes will be supported in addition to circles and rectangles.
  • Starting and ending colors for edges. It will be possible to define a different color for the starting and ending part of an edge and use gradients to indicate edge direction.
  • Clusters implemented as halos or depth-aware borders around nodes. Clusters will be supported by rendering additional halos, or borders of increasing depth, in the cluster color around each node in the cluster.
  • Per-primitive anti-aliasing (PPAA). Shaders allow anti-aliasing strategies independent of multisampling. The new engine will implement these strategies to achieve better image quality. In the figure above you can observe the PPAA technique applied to the transition between the inner part and the border of a node in an old prototype without multisampling. The effect is slightly exaggerated.

Current status and future work

The engine is currently implemented as a standalone application, independent of the rest of Gephi. It has been implemented this way to be able to focus on the most important parts of the engine from the start; Gephi wasn’t in fact independent enough from the current visualization engine to immediately substitute the new one for it. When the project ends, the standalone application will not yet have all the desired features and will not be ready to be included in Gephi. The basic rendering system logic and the general engine architecture should be implemented in time, though.

After the GSoC project ends, the interaction layer with Gephi will be implemented and some parts of the code will be adapted to work with it. The project will be rewritten as a Gephi module, and the required interfaces with the rest of the application will be implemented.

GSoC 2010 mid-term: Adding support for Neo4j in Gephi

Martin Škurla

During this summer, six students are working on Gephi with the Google Summer of Code. They contribute to Gephi by developing new features that will be integrated in the 0.8 version, released later this year.

My name is Martin Škurla, and this summer I worked on a GSoC project called “Adding support for Neo4j in Gephi”. In this article we will look at the implemented features, including those under the hood, pictures of the dialogs, common use cases and future plans.

 

Gephi project

First, a quick introduction to the Gephi project. Gephi is an open-source visualization platform built on top of the NetBeans Platform. It is written in Java, so you can run it on various operating systems, including Windows, Linux and Mac OS. It supports many interesting graph analysis capabilities, including:

  • Real-time visualization
  • Layout
  • Metrics
  • Dynamic network analysis
  • Cartography, clustering and hierarchical graphs
  • Dynamic filtering

The story so far

The main idea of my project is to add support for Neo4j in Gephi: the ability to transform a Neo4j graph into a Gephi graph. The two graph models are different, so the first task was to create a mapping between Neo4j graph items and Gephi graph items, and vice versa.

There was also a mismatch between the types supported in Neo4j and those supported by Gephi. This mismatch was solved by adding new “List” types to Gephi, so now every type in Neo4j has an appropriate counterpart in Gephi.

There were also some changes under the hood which are not visible to the end user but had to be designed and implemented. The most interesting one is the “delegating mechanism”. This mechanism is responsible for getting values from the storage engine (Neo4j) as well as for data manipulation. During the importing process, a representation of the Neo4j graph is created in Gephi, but attribute values are not stored directly; instead, they are queried through the delegating mechanism.
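
The idea behind the delegating mechanism can be sketched in a few lines of Java. The class below is invented for illustration (the real implementation is more elaborate) and relies only on the standard Neo4j getNodeById/getProperty calls:

// Illustrative sketch: the value lives in Neo4j and is fetched on every access.
import org.neo4j.graphdb.GraphDatabaseService;

final class DelegatingValue {
    private final GraphDatabaseService graphDb;
    private final long neo4jNodeId;
    private final String propertyKey;

    DelegatingValue(GraphDatabaseService graphDb, long neo4jNodeId, String propertyKey) {
        this.graphDb = graphDb;
        this.neo4jNodeId = neo4jNodeId;
        this.propertyKey = propertyKey;
    }

    Object get() {
        // Ask the storage engine instead of caching the value in Gephi.
        return graphDb.getNodeById(neo4jNodeId).getProperty(propertyKey, null);
    }
}

Every read therefore reflects the current database content, at the cost of a database lookup per access.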

Other minor tasks were to customize the open dialogs used for importing a local Neo4j database and for debugging the imported database. The open dialog for importing accepts only valid Neo4j database directories. I defined the valid Neo4j database directory structure, and every valid directory is now shown with a Neo4j icon in the open dialog. The user can only open valid Neo4j directories during the import process. The open dialog for debugging accepts only Java class files that can be used for the debugging process. This simply means they have to implement the required interface and have a public no-argument constructor. Every valid class file is shown with a Neo4j icon, and after selecting a valid debug file, the Target and Visualization options are automatically filled based on data from the selected class file.

 

Open Neo4j directory dialog customization

Open Neo4j debug file dialog customization

 

Neo4j integration

Menu integration

All actions start in the menu. As we can see, this is the entry point to import from, export to, and debug a Neo4j graph. Both importing and exporting support local as well as remote Neo4j databases.

Importing

Whole graph import dialog

The importing process supports two approaches:

  • whole import
  • traversal import

The whole graph import dialog is designed for importing the whole graph. We can customize the rules responsible for returning nodes by defining filtering expressions. For example, the dialog above can be used when we want to find all people working on the Gephi project who are at most 30 years old. Only people with at least 5 years of experience and with driver licence types A, B and C will be included.

Let’s have a deeper look at the dialog:

  • Property key is the name of the property we want to filter on.
  • Property value is the value which will be compared to the actual node property value using the chosen operator. Values are automatically converted into the appropriate types, and if a value cannot be converted, the node is not included in the graph. All types supported in Neo4j are supported in this dialog. Note the support for array types in the last filter expression.
  • Operator is applied to the expression, and if the expression evaluates to true, the node is included.
  • Match case means comparing String, char, String[] and char[] types in a case-sensitive way.
  • Restrict mode is used to exclude some nodes. Imagine we have people stored in the database which have only a subset of the property names used in the filtering expressions. If Restrict mode is on, only nodes which have all the property names, and for which all filtering expressions evaluate to true, are included. If Restrict mode is off, every node which has any subset of the required property names (even the empty subset) is included, provided all the filtering expressions applicable to that subset evaluate to true.

All the filtering expressions are combined using AND, and the list of currently supported operators consists of: ==, !=, <, <=, >, >=.
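
As a rough sketch of these semantics (the types are invented for the example, not the actual plug-in code):

// Sketch: AND-combined filter expressions with Restrict mode.
import java.util.List;
import org.neo4j.graphdb.Node;

interface FilterExpression {
    String propertyKey();
    boolean evaluate(Object actualValue);   // applies ==, !=, <, <=, > or >=
}

final class FilterEvaluator {
    static boolean accepts(Node node, List<FilterExpression> expressions, boolean restrictMode) {
        for (FilterExpression e : expressions) {
            if (!node.hasProperty(e.propertyKey())) {
                if (restrictMode) return false;  // Restrict mode: all keys must be present
                continue;                        // otherwise skip non-applicable expressions
            }
            if (!e.evaluate(node.getProperty(e.propertyKey()))) return false;  // AND semantics
        }
        return true;
    }
}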

In fact, gauging the usefulness of new operators, OR support and other import options is the main idea behind the questionnaire that is part of this article.

The traversal graph import dialog is designed for importing any subgraph using the traversal capabilities of Neo4j 1.1. Traversal import adds these options (a code sketch follows the list):

  • Start node can be set in two ways: either by its id or by an index key and value pair
  • Order can be set to depth-first or breadth-first traversal
  • Max depth can be set to a concrete number or to the end of the graph
  • Relationships can be restricted too. We can set any combination of relationship types and directions which the traversal should include. The list of relationship types is dynamically filled from the database with existing values.
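
Here is roughly how these options map to the Neo4j Java traversal API introduced in the 1.x series. Treat it as a hedged sketch: the relationship type KNOWS and the lookup of the start node by id are assumptions made for the example.

import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.kernel.Traversal;

final class TraversalImportSketch {
    Iterable<Node> subgraphNodes(GraphDatabaseService graphDb, long startNodeId, int maxDepth) {
        Node start = graphDb.getNodeById(startNodeId);          // Start node: by id
        return Traversal.description()
                .breadthFirst()                                 // Order: breadth first
                .prune(Traversal.pruneAfterDepth(maxDepth))     // Max depth
                .relationships(DynamicRelationshipType.withName("KNOWS"),
                               Direction.OUTGOING)              // Relationships restriction
                .traverse(start)
                .nodes();
    }
}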

 

Traversal graph import dialog

That was a quick summary of the Neo4j importing capabilities implemented in the project. We worked on more features, and one of them is support for exporting. We can export any loaded graph into a local or remote Neo4j database. The exporting process can be customized in a similar way to importing.

Exporting

Export dialog

Exporting is the opposite process to importing. The dialog above shows the exporting options as well as validation. We can customize the exporting process by setting:

  • From column is used to fill the RelationshipType with appropriate values from any of the Gephi edge columns. When importing a Neo4j graph, a column named “Neo4j Relationship Type” is automatically created.
  • Default value is used when the processed Gephi edge has no value in the selected From column
  • Export Node columns is the set of Gephi node table columns which will be exported
  • Export Edge columns is the set of Gephi edge table columns which will be exported

Remote importing/exporting

The only difference between local and remote importing/exporting is the Remote dialog, where we need to set the following connection information:

  • Remote database URL
  • Login
  • Password

All of them must be filled in order to successfully import/export a remote graph.

Remote import/export dialog

Delegation process

Node values exploration (click on the image to enlarge)

As we can see in the picture above, we can easily explore all node and edge values. This is exactly where the delegating mechanism is used. Values are not stored directly in memory in a Gephi data structure; instead, the storage engine (Neo4j) is asked for the actual values every time we need them.

Debugging

Debugging in action

We can see debugging in action in the picture above. The dialog is initialized with data from the chosen debug class file, but we can also change all the values at runtime. Any change in the options automatically updates the graph visualization. We can change the visibility of nodes and edges as well as the colors of both. The user proceeds to the next step of the debugging/traversal by clicking the Next button.

Use cases

That was a quick summary of all the implemented features; now we can look at the common use cases users may be interested in.

Visualizing Neo4j graphs

One of the main ideas of my project was to implement the ability to visualize Neo4j graphs, even big ones. As we saw in the dialog pictures, we have many options to customize the importing process, including filtering. After the import, we can use all the rich graph analysis features Gephi provides.

Analyzing only part of the whole graph

A quite common use case is to analyze only part of the graph, which is possible in Gephi too. We can take advantage of traversal import, where we can set the starting node and other traversal options. After that, we can visualize and analyze just that part of the graph.

Exporting graphs stored in text files/databases into Neo4j

Another use case is exporting graphs stored in graph text files or relational databases into Neo4j. In fact, every graph loaded into Gephi can easily be exported to a Neo4j database. The importable formats depend on Gephi itself; currently the following formats are supported:

  • Text formats: GEXF, GDF, GML, GraphML, Pajek NET, GraphViz DOT, CSV, UCINET DL, Tulip TPL, XGMML
  • Relational databases: MySQL, PostgreSQL, SQL Server

Future plans

There are more things we want to implement, including:

  • support for the Gephi Toolkit, a set of Gephi core libraries which you can use in your own Java projects for graph visualization and manipulation
  • implementing a proof-of-concept web application using both the Gephi Toolkit and Neo4j to manipulate a Neo4j database and show the results (probably using GWT)
  • more features, bug fixing and performance optimizations

Questionnaire

One of the big advantages of Gephi is the fact that it is developed as an open-source project. We want to add features according to user requests and opinions. That’s why we created a questionnaire focusing on the usefulness of the proposed additions. We will be very happy if you fill in the questionnaire, because it is a very valuable source of information and lets us focus on the features Neo4j users find useful. Please fill in the questionnaire.

Conclusion

I am very happy that I can be part of the Gephi developer community and introduce the integration with Neo4j. During this summer I learned a lot, and I am proud that I was chosen as a GSoC student. None of these features could have been done without the great help of my mentors, so a big thank you to both of them: Mathieu Bastian & Tobias Ivarsson.

If you are interested and want to test the code, you can download the sources from my branch using: bzr branch lp:~bujacik/gephi/support-for-neo4j

All the pictures were made with data stored in a testing Neo4j database, which can be created using a Java SE project you can download with:
bzr branch lp:~bujacik/+junk/testing-new-neo4j-traversal-api

 

Martin Škurla

Download this article in PDF.

GSoC 2010 mid-term: Dynamic attributes and statistics

Cezary Bartosiak

During this summer, six students are working on Gephi with the Google Summer of Code. They contribute to Gephi by developing new features that will be integrated in the 0.8 version, released later this year.

 

The project by Cezary Bartosiak focuses on further development of dynamic network analysis (DNA) in Gephi. The aim is to create a framework which makes it possible to build and query a dynamic graph through a proper API. It has practical purposes, for instance analyzing the evolution of networks (see in particular M. Argollo de Menezes, A.-L. Barabási, Fluctuations in Network Dynamics) or dynamic network visualization. This article shows the most important features provided by this GSoC project.

 

In the current 0.7 version we can import dynamic graphs written in GEXF syntax and then filter them using the Timeline component. Unfortunately, it only filters the graph topology, that is, it hides nodes and/or edges.

The obvious next step is to make it possible to handle dynamic changes not only of the graph topology but also of the attributes connected with nodes and edges. This can be done by creating a proper API, which could then be used by other modules, like Statistics, to make dynamic versions of them. Computing metrics like Degree Distribution or Clustering Coefficient for each time interval in the time series is of great interest for analyzing graphs over time.

So, getting down to brass tacks, the most important tasks are:

  • A data structure to host dynamic attributes efficiently, which makes it possible to present them in the Data Laboratory module.
  • A Dynamic API with the following features: a dynamic graph decorator that wraps the graph and a time interval, and returns static graph copies for given time intervals as well as attribute value arrays for given nodes/edges and time intervals (see the sketch after this list).
  • Adapting the Metrics framework to use the Dynamic API, in order to offer dynamic versions of existing metrics.
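
The decorator could be pictured as follows. This is a hypothetical sketch of the API surface, not the final interfaces; Graph and Node are stand-ins for Gephi's graph types.

// Hypothetical sketch of the Dynamic API surface.
class Graph { }
class Node { }

interface DynamicGraph {
    /** A static copy of the graph restricted to the interval [low, high]. */
    Graph getSnapshot(double low, double high);

    /** The values taken by a node attribute during [low, high]. */
    Object[] getAttributeValues(Node node, String attributeId, double low, double high);
}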

There are also additional features which will be done in the future (they will probably not be included in the next release):

  • Dynamic visualization of attributes.
  • Dynamic version of the Ranking module – dynamic transformation of visualization attributes.

I’ll briefly describe how these features are implemented.

Dynamic attributes

It is a very interesting task from a programmer’s point of view, since it requires implementing a complicated data structure, the interval tree (see also Antoine Vigneron – Segment trees and interval trees). But users will also find it necessary. The purpose is to make it possible to read dynamic attributes from GEXF files and host them efficiently. Thanks to that, we are able to get the values of attributes over different time intervals. It goes without saying how powerful a feature this is. To show how it works, let’s consider one node (written in GEXF syntax):

<node id="1" label="Some node">
<attvalues>
<attvalue for="0" value="abcdefgh"/>
<attvalue for="2" value="1" end="2009-03-01"/>
<attvalue for="2" value="2" start="2009-03-01" end="2009-03-10"/>
<attvalue for="2" value="1" start="2009-03-10"/>
</attvalues>
</node>

As we can see, we have one dynamic attribute (id = 2) which has three different values in different time intervals. The first interval starts at “negative infinity”: we simply assume that it only ends, never starts. But if we have some bounds, for instance if the related graph has its own start and end times, this attribute “starts” at the same moment as the graph, which is rather intuitive. The second interval exists from 2009-03-01 to 2009-03-10, and the last one exists from 2009-03-10 to “positive infinity” or the graph’s bound.

After importing this into Gephi, we can simply get values for ANY time interval we want, for example [-inf, +inf]. But we need to know how to estimate the final value: in the above example we have three values, 1, 2 and 1. To decide which of them should be returned, we provide a set of estimators: AVERAGE, MEDIAN, MODE, SUM, MIN, MAX, FIRST and LAST. Each of them behaves differently depending on the type of the attribute; for real numbers, for instance, they behave as in statistics.
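
A minimal sketch of how an estimator could reduce the overlapping values (invented code, for illustration only):

import java.util.Collections;
import java.util.List;

// Sketch: reducing the attribute values that overlap the queried interval.
enum Estimator { AVERAGE, SUM, MIN, MAX, FIRST, LAST }

final class Estimation {
    static double estimate(List<Double> values, Estimator estimator) {
        switch (estimator) {
            case FIRST: return values.get(0);
            case LAST:  return values.get(values.size() - 1);
            case MIN:   return Collections.min(values);
            case MAX:   return Collections.max(values);
            default:    // SUM and AVERAGE
                double sum = 0.0;
                for (double v : values) sum += v;
                return estimator == Estimator.SUM ? sum : sum / values.size();
        }
    }
}

For the node above queried over [-inf, +inf], the overlapping values are 1, 2 and 1, so LAST returns 1, MAX returns 2 and AVERAGE returns about 1.33 (this simple sketch ignores interval lengths; a real estimator may weight values by interval duration).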

So, users will be able to get values for different time intervals on demand, for instance in the Data Laboratory module, or (in the future) see them on screen as part of the rendered graph. Suppose we have an attribute like priority. A user will be able to choose between several possibilities: nothing (the attribute is not visualized), color, stroke, thickness, etc. It means, for instance, that if some node has this attribute close to its upper bound, its stroke thickness will be very high, while if the attribute is close to its lower bound, only the node’s internal color might be visualized.

Metrics framework

For now it is possible to compute a set of important metrics, but all of them consider a “static graph”. The idea of dynamic metrics is to execute the static ones in a loop, where the graph changes according to the time interval. The following screenshot shows that these additional metrics are used much like their static siblings:

Dynamic Metric (click on the image)

In the screenshot we can see only Dynamic Degree Power Law, but of course every dynamic metric will be implemented (while this article was being written, the module was still under development, so the final product could differ from the one presented above). The user enters the important information, like the time interval, and gets a separate report for every time interval. What are the other results?
The result for each node/edge is written in the graph, so it can be seen in the Data Laboratory.
The general result is also written and presented in the report.
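
In code, the loop could look like the following sketch, reusing the hypothetical DynamicGraph and Graph types sketched earlier; the Metric interface is likewise invented for the example.

// Sketch: a dynamic metric as a static metric executed once per time window.
interface Metric { double compute(Graph snapshot); }

final class DynamicMetric {
    static double[] run(DynamicGraph dynamicGraph, Metric metric,
                        double start, double end, double window) {
        int steps = (int) Math.ceil((end - start) / window);
        double[] results = new double[steps];
        for (int i = 0; i < steps; i++) {
            double low = start + i * window;
            Graph snapshot = dynamicGraph.getSnapshot(low, Math.min(low + window, end));
            results[i] = metric.compute(snapshot);   // one report entry per interval
        }
        return results;
    }
}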

Conclusion

Evolution of networks, network dynamics and dynamic network analysis are hot topics nowadays, and there is growing interest in studying these issues. This creates a growing need for DNA analysis tools. In my opinion, Gephi is heading towards being one of the best…

Cezary Bartosiak

GSoC 2010 mid-term: Graph Streaming API


During this summer, six students are working on Gephi with the Google Summer of Code. They contribute to Gephi by developing new features that will be integrated in the 0.8 version, released later this year.

The purpose of the Graph Streaming API project, run by André Panisson, is to build a unified framework for streaming graph objects. Gephi’s data structure and visualization engine have been built with the idea that a graph is not static and might change continuously. By connecting Gephi to external data sources, we leverage its power to visualize and monitor complex systems or enterprise data in real time. Moreover, the idea of streaming graph data goes beyond Gephi: a unified and standardized API could bring interoperability with other available tools for graph and network analysis, as they could start to interoperate in a distributed and cooperative fashion.

 

With the increasing level of connectivity and cooperation between systems, it is imperative for a system that aims to be interoperable to comply with the available standards. Graph objects are abstractions that can represent a wide range of real-world structures, from computer networks to human interactions, and there are a lot of standards to exchange graph data, from text-based to XML-based formats. But real-world structures are constantly changing, and the current formats are not suitable for exchanging this type of dynamic data.

A lot of well-established systems already stream data to their users through a streaming API. Twitter, for example, defined a Streaming API to allow near-realtime access to its data. It supports two different formats, XML and JSON, but JSON is strongly encouraged over XML, as JSON is more compact and parsing is greatly simplified.

We are not the first to implement graph streaming; another very interesting experience is the GraphStream Java library. It is composed of an API that provides a way to add edges and nodes to a graph and make them evolve. Graphs are composed of nodes and edges that can appear, disappear or be modified; these operations are called events, and the sequence of operations that occurs in a graph is seen as a stream of events.

So, as other people have already had successful experiences with graph streaming, why not build on them? That’s what we are doing, and beyond finding these experiences very useful, we are also trying to stay compatible with the existing work. The first Gephi Graph Streaming release will use two formats: JSON for flexibility, and a text-based format based on the GraphStream implementation.

The first version of the Graph Streaming features will be available in the next release of Gephi, but it’s already possible to try some of them. To illustrate how simple it will be to connect to a master, the following video shows Gephi connecting to a master and visualizing the received graph data in real time. The graph in this demo is a part of the Amazon.com library, where nodes represent books and edges represent their similarities. For each book, a node is added, then its similar books are explored, adding them as nodes and each similarity as an edge.

 

 

The Graph Streaming specification goes beyond a client pulling data from a master: clients can also interact with the master by pushing data to it, in a REST architecture. The same data format used by the master to send graph events to clients is used by clients to interact with the master.

In the next example, we will turn Gephi into a master to provide graph information to its clients. In the Streaming tab of the Gephi application, you can access all the graph streaming features. You can connect to a master by clicking the ‘+’ button, but you can also turn your Gephi into a master by right-clicking “Master Server” and selecting “Start” (you are not limited to a single master per host: each Gephi workspace can be made available as a master). By default, the HTTP server will listen on port 8080 in plain HTTP, and on port 8443 using SSL. The server path depends on your workspace: each workspace uses a different path. You can configure these parameters (and also Basic Authentication) via the “Settings…” button:

 

Graph Streaming Server start

Graph Streaming Settings Panel

 

Now you can connect to it using any simple HTTP client. For example, you can use curl to see the data flowing. First, open a shell window and execute the following command:

curl "http://localhost:8080/workspace0"

With this, you are connected to your workspace in Gephi. If the workspace is empty you will receive no data, but you will remain connected, so you will receive all events from now on.

Now open another shell prompt; with the following command, you will see a triangle appear in Gephi:

curl "http://localhost:8080/workspace0?operation=updateGraph" -d $'
{"an":{"A":{"size":10,"r":1,"g":0,"b":0,"z":0,"y":500,"x":70}}}r
{"an":{"B":{"size":10,"r":1,"g":0,"b":0,"z":0,"y":90,"x":250}}}r
{"ae":{"AB":{"source":"A","target":"B","weight":10,"r":0,"g":0,"b":0,"directed":false}}}r
{"an":{"C":{"size":10,"r":1,"g":0,"b":0,"z":0,"y":90,"x":-90}}}r
{"ae":{"BC":{"source":"B","target":"C","weight":10,"r":0,"g":0,"b":0,"directed":false}}}r
{"ae":{"CA":{"source":"C","target":"A","weight":10,"r":0,"g":0,"b":0,"directed":false}}}'

At the same time, all events will be sent to your connected client, in the other shell window.

With the following commands you can retrieve some of the data:

curl "http://localhost:8080/workspace0?operation=getNode&id=A"
curl "http://localhost:8080/workspace0?operation=getEdge&id=AB"

And you can start manipulating your graph through the command line as you like. There are other event types for changing and removing edges and nodes; for more information about them, see the current status of the JSON Streaming Format, available at this page. Note that this format is subject to change, as the API was built to be very flexible and more requirements are being added to it.
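
To give an idea of how simple a client can be, here is a minimal Java sketch that connects to the workspace and prints incoming events. The event handling is deliberately naive (string matching instead of a real JSON parser), and the URL assumes the default settings shown above.

// Minimal streaming client sketch: one JSON object per event, separated by line breaks.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class StreamingClientExample {
    public static void main(String[] args) throws Exception {
        URL master = new URL("http://localhost:8080/workspace0");
        BufferedReader in = new BufferedReader(new InputStreamReader(master.openStream(), "UTF-8"));
        String event;
        while ((event = in.readLine()) != null) {
            if (event.startsWith("{\"an\"")) {        // "an" = add node
                System.out.println("add node: " + event);
            } else if (event.startsWith("{\"ae\"")) { // "ae" = add edge
                System.out.println("add edge: " + event);
            } else if (event.length() > 0) {
                System.out.println("other event: " + event);
            }
        }
    }
}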

But what about connecting two different Gephi instances together? One instance will be the master, and the other the client. Using the Graph Streaming API, a change in the graph in the master’s workspace causes a change in the client’s workspace, and a change in the client’s workspace causes it to send requests to the master to update its graph accordingly. Both instances work in a distributed mode. In fact, different people could work in a distributed way to construct a graph: this is Collaborative Graph Construction.

My personal impressions

For me as a researcher, Gephi has the potential to become a de-facto standard for manipulating and visualizing large-scale graphs. I believe that the research community still lacks a high-quality, general-purpose, community-supported framework for exploratory analysis of large-scale dynamic graph data, and I believe that Gephi has the potential to fill this gap. I’m also working in collaboration with the ISI Foundation on the SocioPatterns project, an example of a research use case that currently uses Gephi for exploratory data analysis and visualization. The support for dynamic networks, the readiness of the Gephi data model for dynamic updates of graph topology and attributes and, in the near future, the support for graph streaming are exciting features that suit very well the large-scale real-time data sources we are dealing with. The potential for processing live streams from our experiments is a unique feature that we are eager to see working.

André Panisson