Google Summer of Code 2013

It’s a great news, Gephi has been accepted again for the Google Summer of Code for the 5th year! The program is the best way for students around the world to start contributing to an open-source project. Since 2009, each edition is a great success and dramatically boosted Gephi’s project development.

What is Gephi?

Networks are everywhere: email systems, financial transaction systems and gene-protein interaction networks are just a few examples. Gephi began as a university student project four years ago and has quickly become an open source software leader in the visualization and analysis of large networks. It is an important contribution to the ecosystem of tools used by researchers and big data analysts to explore and extract value from the deluge of relational data and disseminate a better understanding for people to think about a “connected” world.

Gephi is a “Photoshop” for graphs: designed to make data navigation and manipulation easy, it covers the entire process from data importing to aesthetics refinements and communication. Users interact with the visualization and manipulate structures, shapes and colors to reveal the properties of complex and messy data. The goal is to help data analysts make hypotheses and intuitively discover patterns or errors in large data collections.

Gephi’s project aims at providing the perfect tool to visualize and analyze networks. We focus on usability, performance and modularity:

  • Usability: Easy to install, an UI without scripts and real-time manipulation.
  • Performance: Visualization engine and data structures are built scalable. Supporting always-larger graphs is an endless challenge!
  • Modularity: Extensible software architecture, built on top of Netbeans Platform. Add plug-ins with ease.

Learn more about Gephi, watch Introducing Gephi 0.7, download and try it by following Quick Start Tutorial.

Gephi’s project is young, the growing community is composed of engineers and scientists involved in network science, datavis and complex networks.

List of ideas

List of ideas are availabe on our wiki. They cover various skills and level of difficulties:

* Completing Legend moduleComplete the Legend module, which was started in last year GSoC.
* GraphStore benchmark and tuningOptimize and tune GraphStore based on a serie of new well-defined benchmarks.

Please also propose your ideas on the forum. They will be considered and discussed by the community. Have a look at our long-term Roadmap.

Students, join the network

Students, apply now for Gephi proposals. Join us on the forum and fill in the questionnaire. Be careful, deadline for submitting proposals is May 3 (timeline)!

Hélder Suzuki, student for Gephi in 2009 and now software engineer at Google, wrote:
At Gephi students will have the opportunity to produce high impact work on a rapidly growing area and be noted for it.

View our previous Google Summer of Code projects here and read former students interviews.

Follow Gephi on Twitter

GSoC: interconnect Gephi Graph Streaming API and GraphStream

My name is Min WU and during this Google Summer of Code I have worked on the project to interconnect Gephi Graph Streaming API and the GraphStream library. My mentors are Yoann Pigné and André Panisson.

This project aims at interconnecting the GraphStream’s dynamic graph event model with Gephi in order to have Gephi to visualize an ongoing graph evolution and measurement. Based on this project, users can model and simulate complex systems with GraphStream while observing the output with the visual tools offered by Gephi.

GraphStream is written in Java. In order to use streams of graph events in other languages, GraphStream provides the NetStream framework, i.e. a network interface, such that other projects written in other language can use GraphStream. The NetStream framework consists of three parts, receiver, sender and the NetStream Protocol. The receiver is responsible for receiving graph events from the network and dispatching them to pipes. It works within only one thread, listening at a given address and port while receiving graph events from several streams, actually several threads or clients. The sender encodes graph events into messages according to the NetStream protocol and send them to a defined receiver with given port, host and stream ID. Every message contains sourceId, timeId and event context, among which the combination of sourceId and timeId is dedicated to distinguish between several streams and solve the synchronization issue. Finally the NetStream protocol specifies the message format at byte level.

Gephi also supports the idea of “streams of graph events”. It has a framework for graph streaming in Gephi plugin built by André Panisson during the 2010 GSoC, through a multi-threaded socket server. Other applications can push graph data to the Gephi server through the network, and have it visualized. In this graph streaming project, operations (a concept similar to event) are invoked through HTTP requests made by the client to the server, based on a JSON format.

Work done

In my project, I interconnected Gephi and GraphStream based on André’s Graph Streaming plugin. Since NetStream on GraphStream side works on NetStream protocol while Graph Streaming API on Gephi side works on JSON protocol, we have to make them compatible with each other. Considering the flexible interoperability and language agnostic properties, I have chosen the JSON protocol to do the interconnection and implement a sender part and a receiver part.

The sender part (JSONSender) is responsible for sending events from GraphStream to Gephi. GrpahStream works as a client and Gephi works as a server. Every time the graph in GraphStream changes, a corresponding event is sent to Gephi. Gephi handles the event and changes its own graph. In this way, the sender part works as a sink of the GraphStream graph, so it must implement the sink interface which contains methods to deal with graph element events and attribute events. In each method, we first encode the event message into a JSON string, and then send it to Gephi. We connect to Gephi and use “updateGraph” operation to send events. The corresponding URL is “http://host:port/workspace?operation=updateGraph”. The host and port must match with the Gephi sever and the workspace is a destination workspace of Gephi, for example an URL can be “http://127.0.0.1:8080/workspace0?operation=updateGraph”. The Gephi server and client are built with the “Graph Streaming API ” in the Gephi-plugin.

The receiver part (JSONReceiver) is responsible for receiving events from Gephi. It listens to Gephi and waits for events. Every time the graph in the Gephi changes, a corresponding event will be send to GraphStream. Then the GraphStream handles the event and changes its graph object. In this way, the receiver part works as a source of the GraphStream graph. In order to listen to Gephi events, we use a URL within “getGraph” operation to connect to Gephi. The corresponding URL is “http://host:port/workspace0?operation=getGraph”.

With these two classes, we can interconnect GraphStream and Gephi in real-time. Two tutorials are given to show how to do real-time connection between GraphStream and Gephi, see the video below. If you are interested in the detail implementation, please refer to the manual page.

The first class is GraphSender, which aims at loading a graph in GraphStream and dynamically displays it on a Gephi workspace. We need to create a graph instance and a JSONSender instance, and plug the JSONSender instance as a sink of the graph instance. Since then, when we generate the graph, or load the graph from a file, Gephi will display it in real-time.

The other class is LinLogLayoutReceiver. The Lin-Log layout in GraphStream is dedicated to find communities in graphs. This tutorial shows the execution of a Lin-Log layout in GraphStream and the sending of the layout information to Gephi in real time. We first load a graph in Gephi, display it and apply some algorithms. Then we send the graph to GraphStream and apply the Lin-Log layout on the graph on the GraphStream side. Meanwhile we visualize the layout process on the Gephi side in real time. To achieve it, we create a graph instance and a JSONReceiver instance, and then get the ThreadProxyPipe instance and plug the graph instance as an ElementSink of the pipe instance. Then we apply the Lin-Log layout, and create a new thread in which to create a JSONSender instance and plugin it as a sink of the graph layout.

Distribution

This project is distributed under MIT license. You can refer to the code on Github. By the way, I feel very appreciative for my mentors’ supervision. Thank you very much!

GSoC: Legend module

My Name is Eduardo Gonzalo Espinoza Carreon and during this summer I developed the new Legend Module for Gephi, with the mentoring of Eduardo Ramos and Sébastien Heymann. This article will give you an overview of the work done.

Problem statement

Currently Gephi offers the possibility of visualizing graphs, but what about legends? Legends provide basic and extra information related to the graph and they are useful when interpreting any kind of network map. If a person is not familiar with the content of a graph, missing or wrong legends could lead to misleading interpretations and sometimes wrong decisions. When a visualization is used by multiple people for discussing, analyzing or communicating data, legends are of great importance.

For instance, the following graph represents the coappearance of characters in the novel Les Miserables. After performing a visual analysis we could only conclude that the graph has 9 groups. This is probably a little of the information the creator wanted to transmit. The graph has no information related to the number of nodes explored, or what the groups represent and how many elements each group has, etc.

A current workaround to solve this problem is to export the graph as an image, and then manually add the legends using Inkscape, Adobe Illustrator or another graphics editor. However this task is time-consuming and can be automated. The new Legend Module proposes a solution to this problem.

Solution

We propose an extension to the Preview module for generating legend items. The following legend items are available: Table, Text, Image, Groups and Description. They can be added using the Legend Manager, which is shown in a new tab under the Preview Settings:

After selecting a type of legend, the user chooses a sub-type builder, e.g. “Table” > “Partition interaction table”, or “Top 10 nodes with greatest degree”, as shown in the following figure:

When a new Legend item is added, it is displayed in the list of active legend items, where the user can edit its properties. The user can also edit its label and assign a user-friendly name to remember the content of the legend easily.

Every item has a set of common properties: label, position, width, height, background color and border, title, description; and also each type of item has its own properties and data. The values of those properties are editable through a Property Editor like the one used in the preview settings.

Some properties like scale and translation can be modified using the mouse like most of the graphic design applications. All legend items are designed with a smart way of autoresize. It’s not the common scale feature, e.g. if the text included in the Text Item is bigger than the size assigned, then the Text Renderer overrides the text font defined by the user and decreases the font size until the text is able to fit in the specified dimensions. The results of this feature are shown in the next figure:

Workflow

The legend builder retrieves the graph data (partitions, node labels, edge labels, etc) and creates a new Legend item for each of them. Then a legend renderer makes use of these information, plus the properties set by the user, to render the Legend item to the specified target: PNG, PDF or SVG.

For developers

The renderers can be extended. For instance, the default Group Renderer is:

Using external libraries like JFreeChart, we can extend it to create a Pie Chart Renderer like as follows:

Other types of items can be created by combining other available Legend Items or by extending Legend Item, Legend Item Builder and Legend Item Renderer.

The Legend Module also provides a save/load feature. So you can save your legends for future editing.

Limitations

Currently there are some limitations like selecting a specific renderer for each type of item, and also exporting legends to SVG format is not done automatically like PNG and PDF, e.g. Exporting an Image (they will be embedded in the SVG file).

Conclusions

I would like to thank Eduardo Ramos and Sébastien Heymann for their support and feedback, which was critical during the development of this new module. The Legend module will be available as core feature in next Gephi release.

This GSoC was a great opportunity to learn and it also represents my first important contribution to the open-source community.

GSoC: Force Directed Edge Bundling

My name is Taras Klaskovsky and during this Summer Of Code I have implemented the Force Directed Edge Bundling algorithm.

Force Directed Edge Bundling (FDEB) is an edge layout algorithm. Gephi already has node layouts, which are placing the nodes (usually using force-directed algorithms). FDEB helps to further improve graph visualization by changing shapes and colors of the edges to merge them in bundles, analogous to the way electrical wires and network cables are merged into bundles along their joint paths and fanned out again at the end. This reduces visual clutter, inevitable in large graphs, allowing to find high-level edge patterns and get an overview of the graph just by looking at it. As example, US flights graph below, with nodes as airports and edges as flights.

The algorithm

Edges are modeled as flexible springs that can attract each other while node positions remain fixed. A force-directed technique is used to calculate the bundling and it is relatively slow. On small graphs it works pretty fast due to optimizations, but consumes large amount of memory to store precomputed similar pairs of edges; for average and large graphs a special slow low-memory mode implemented. After every iteration preview is being refreshed, so it’s possible to observe formation of bundles in real-time.

Full algorithm description can be found on this research paper.

original
mode-1

mode-2
mode-3

Renderer Modes

FDEB has 3 renderer modes:

  • Simple renderer: Draws all edges with the same color and transparency (color, transparency and thickness are set in the preview settings in the bottom-left panel), bundles are emphasized by combined transparency.
  • Gradient renderer: Draws all edges with color from gradient slider. Edges that are similar to higher number of other edges get higher color.
  • Gradient slow renderer: Uses more precise method to determine intensity of edge (needs precalculate points checkbox and to be re-runned) to set personal color for every segment.

FDEB can be customized with lots of options and they all have descriptions, some of the most influential on result are:

  • Use inverse-qudratic model: Makes more localized bundling
  • Compatibility threshold: Ignore pairs of edges that are not similar enough, which makes FDEB faster (ignored in low-memory mode) and bundling become more localized.

To make FDEB faster it’s also possible to decrease number of cycles/subdivision points increase rate.

gui

The Edge Layout API

An edge layout API has been created to simplify the integration of other edge layouts into Gephi. This API is very similar to the existing node layout API, with the following additions. Since edge layouts do not only change shapes of edges, but are also responsible for their visualization, a modifyAlgo() is called each time when the Preview is refreshed, to control the modification of parameters. The edge layout data, which is accessible for each edge, provides polyline points and it’s colors for all renderer modes.

How to get the FDEB algorithm

It will be available in the next release of Gephi. The current source code is available on Github.

The GSoC rocket launched

GSoC_2012_logo_250pxThe Google Summer of Code 2012 has officially started! The results have been announced by Google. Congratulations to the students who join the Gephi project:

  • Eduardo Espinoza – Legend Module in Preview
  • Romain Yon – Cloud Gephi
  • Taras Klaskovsky – Force Directed Edge Bundling algorithm
  • Vikash Anand – Statistics Reports and HTML5 Charts
  • Min WU – Interconnect Graph Streaming API and GraphStream

You put a lot of attention on doing the bests applications and demonstrate great motivation in addition to strong technical skills. We are very excited to work with you guys!

This year we are also honored to count on world-class researchers as mentors: Yoann Pigné is an Assistant Professor at the university of Le Havre, France, and is a leader of the GraphStream project. We will co-mentor the Graph Streaming project with André Panisson, our former Google Summer of Code student. André got his Ph.D. recently, and authored the video of the Egyptian Revolution on Twitter. Finally, Christian Tominski, who mentored the Preview refactoring last year, will mentor the Force Directed Edge Bundling project. He is a Lecturer and Researcher at the Institute for Computer Science at the University of Rostock. He has authored several articles in the field of information visualization.

Former Google Summer of Code students will also mentor and advise students, like Luiz Ribeiro.

The Summer Timeline:

* Until May 21: Students get to know mentors, read documentation, get up to speed to begin working on their projects.
* May 21: Students starts to code
* July 13: Mid-term evaluation
* August 24: Final evaluation

Google Summer of Code 2012

GSoC_2012_logo_250pxIt’s a great news, Gephi has been accepted again for the Google Summer of Code. The program is the best way for students around the world to start contributing to an open-source project. Since 2009, each edition is a great success and dramatically boosted Gephi’s project development.

What is Gephi?

Networks are everywhere: email systems, financial transaction systems and gene-protein interaction networks are just a few examples. Gephi began as a university student project four years ago and has quickly become an open source software leader in the visualization and analysis of large networks. It is an important contribution to the ecosystem of tools used by researchers and big data analysts to explore and extract value from the deluge of relational data and disseminate a better understanding for people to think about a “connected” world.

Gephi is a “Photoshop” for graphs: designed to make data navigation and manipulation easy, it covers the entire process from data importing to aesthetics refinements and communication. Users interact with the visualization and manipulate structures, shapes and colors to reveal the properties of complex and messy data. The goal is to help data analysts make hypotheses and intuitively discover patterns or errors in large data collections.

Gephi’s project aims at providing the perfect tool to visualize and analyze networks. We focus on usability, performance and modularity:

  • Usability: Easy to install, an UI without scripts and real-time manipulation.
  • Performance: Visualization engine and data structures are built scalable. Supporting always-larger graphs is an endless challenge!
  • Modularity: Extensible software architecture, built on top of Netbeans Platform. Add plug-ins with ease.

Learn more about Gephi, watch Introducing Gephi 0.7, download and try it by following Quick Start Tutorial.

Gephi’s project is young, the growing community is composed of engineers and scientists involved in network science, datavis and complex networks.

List of ideas

List of ideas are availabe on our wiki. They cover various skills and level of difficulties:

* Legend moduleIntegrate a legend in the Preview module
* Flexible Table ImporterCreate a generic network creation wizard from data tables
* Cloud GephiBuild an online gallery and bring some of Gephi’s features to the cloud
* Force Directed Edge BundlingImplement Force Directed Edge Bundling algorithm
* Statistics Reports and HTML5 ChartsImprove statistic report and port existing charts to HTML5+Javascript
* Statistics Unit TestsAdd unit tests to the statistical algorithms
* Graph StreamingImprove Graph Streaming API and interconnect GraphStream’s dynamic graph event model with Gephi

Please also propose your ideas on the forum. They will be considered and discussed by the community. Have a look at our long-term Roadmap.

Students, join the network

Students, apply now for Gephi proposals. Join us on the forum and fill in the questionnaire. Be careful, deadline for submitting proposals is April 6 (timeline)!

Hélder Suzuki, student for Gephi in 2009 and now software engineer at Google, wrote:
At Gephi students will have the opportunity to produce high impact work on a rapidly growing area and be noted for it.

View our previous Google Summer of Code projects here and read former students interviews.

Follow Gephi on Twitter

screenshot_gephi1903

How 3 years of Google Summer of Code made us great

seb

This post was originally posted on the Google Open Source blog by Sébastien Heymann, co-founder of the Gephi project and Google Summer of Code administrator.

Networks are everywhere: email systems, financial transaction systems and gene-protein interaction networks are just a few examples. Gephi began as a university student project four years ago and has quickly become an open source software leader in the visualization and analysis of large networks. It is an important contribution to the ecosystem of tools used by researchers and big data analysts to explore and extract value from the deluge of relational data and disseminate a better understanding for people to think about a “connected” world.

Gephi is a “Photoshop” for such data: designed to make data navigation and manipulation easy, it covers the entire process from data importing to aesthetics refinements and communication. Users interact with the visualization and manipulate structures, shapes and colors to reveal the properties of complex and messy data. The goal is to help data analysts make hypotheses and intuitively discover patterns or errors in large data collections.

separator

Our success was made much faster thanks to the Google Summer of Code. The timing of our acceptance into our first Google Summer of Code in 2009 was perfect: we were at the point where we could make the project really open in the way our infrastructure could scale code, and our human organization was ready to welcome contributors. Participating in the program gave us a boost of fame helping us promote the project and created an international community for Gephi.

We met many people and learned a lot, but this is the most important lesson to share: though students are paid stipends for their work during the program, money should not be the first incentive. To encourage students to stick with the project, we talk with each of them to find their deeper motivations in working on Gephi and try and develop a win-win situation. And it works! Many of the students continue to contribute to the project for at least a few months after the end of the Google Summer of Code program, and others have gone on to become members of our team.

We recognize this long-term investment by promoting their work, like André Panisson who released a plug-in in 2010, which connects Gephi to a graph stream and visualizes it in real-time. André made this amazing video of the Egyptian Revolution on Twitter, when he monitored the hashtag #jan25. More recently, Martin Škurla presented his work at FOSDEM 2012 and talked about his plug-in which connects Gephi to the graph database Neo4j. He started his project during the Google Summer of Code 2010 and continued his work until the release. We really appreciated the effort, so the Gephi Consortium and Neo Technologies Inc. paid his expenses to attend the conference. Finally, I must talk about Eduardo Ramos, who we rejected as a student two years ago for Google Summer of Code but who was so motivated that he decided to contribute to Gephi anyway, becoming one of the project leaders, a Google Summer of Code mentor… and a friend!

To learn more about Gephi, watch our madness screencast and view our previous Google Summer of Code projects here. Want to apply for Gephi? Join us on the forum.

GSoC mid-term: Attributes Disk Store

empty

My name is Ernesto Aneiros and during this Google Summer of Code I am working on the Attributes Disk Store.

The problem

In Gephi, Attributes are the data that is associated with nodes and edges. As graphs grow larger and larger, attributes occupy more memory even though many times they are not essential to the end-user when he is only applying transformations or algorithms to the graphs. These attributes can be of different types, from simple Java primitives (byte, char, int, String, etc) to Gephi’s internal data types (lists of primitives or versioned data). The idea for the project was to have a combined memory/disk cache system to partially off-load these attributes to disk. The system should have a well-designed cache system to handle heavy read access on the most-accessed elements.

The Solution (1st iteration)

Lucene is one of the most popular text searching engines and a flagship Java open source project. Lucene is capable of handling and indexing millions of records while remaining performant, and when the idea for the Attributes API cache was born, Lucene was first considered for the role of the data store, and as added bonus Gephi will get full-text search capabilities with almost no extra effort. When analyzing the problem, the following criteria were developed to judge a possible data store:

  1. Reliable (resistant to corrupted disk data, failed transactions, unexpected errors)
  2. Fast
  3. Transparent (minimum complexity exposed to the end-user)

While Lucene complies with items 2 and 3, the approach when dealing with corrupted indices in Lucene is to rebuild from scratch therefore failing item 1. This doesn’t pose a problem to Lucene because in the context where it is supposed to be used (indexing of external information), input data is always available separately from the index and can be accessed if needed. In Gephi, however, this is not the case. Once Attributes are loaded from disk they remain in memory until saved back to file. If an error occurs during a disk store transaction the end-user can end up losing a day’s work, certainly not acceptable.

The Solution (2nd iteration)

After Lucene was ruled out as a contender for a data store, several options were considered, including using embedded SQL databases and using a combination of Ehcache plus BerkeleyDb. Both options bring a lot to the table and embedded databases in Java have achieved impressive results in performance when compared to other mainstream database systems (see projects H2 and HSQL for example). Ehcache + BerkeleyDb however win when complexity is considered since they introduce almost no translation layers between Gephi and the cache. Both solutions are good fits for the problem but in the end the balance tilted in favor of Ehcache + BDB because the complexity consideration.

Optimizing Ehcache and BerkeleyDB

Even though Ehcache provides a great deal of functionality and features, it was relatively easy getting up to speed with it. The documentation provided online was very complete with code samples available and detailed explanations. In almost no time an in-memory cache was up, running and being tested. Traditionally cache sizes have been specified as the amount of max elements that they can hold. In the 2.5 BETA of Ehcache a new feature was introduced that allowed sizing the caches by memory consumed instead of elements held. For our project this is a killer feature since we can now expose a single option to the user, letting him specify how much memory the cache should consume. Even though using the new feature proved a little more complicated than expected we obtained great feedback from the Ehcache community, specially from alexsnaps and Mike Allen, which helped us to solve the issues we were having.

BerkeleyDB on the other hand, is a very complex piece of software. With years of development under the belt, BDB has evolved to be a very robust and flexible database. In fact, it is so flexible that can be used as full blown database supporting queries, a simple key/value datastore or with a front-end that exposes a Java collections map that greatly simplifies its use. All of this flexibility does not come free though, configuring and optimizing BerkeleyDB requires delving into details about transactions, buffers, log file sizes and BDB internals. However the tools are there and the information provided is quite good, especially the FAQ and the optimization section.

Integration with Gephi

Since ease of use and transparency are important considerations for the end-user of Gephi, only the minimal configuration options are exposed in the preference panel of the disk cache, but an Advanced tab provides more control for those who want it.

general_settings_tab
The General settings tab, where cache can be enabled or disabled and the memory usage configured.


The advanced settings tab allows a more advanced user to configure several of BerkeleyDB’s options.

The Disk Store in Action


Memory consumption without the disk store. It reaches 400MB.


Memory consumption with the store, after load it drops below 400 MB. Note how load time increased due to disk operations, a trade-off to consider when using the store.

Known issues

The project is still in development. Being memory saving the main goal of the disk store project, results are not good enough yet because of several reasons.

While BerkeleyDB provides a very convenient way of storing bytes in disk, it is still a database oriented software and therefore it is not the most suitable solution for out project because of large memory usage to caching data, building and maintaining its index (features desirable for databases but not for this project).

Trying to reduce BerkeleyDB memory usage with its settings will produce quite different results in different systems or even in the same system. The benchmark above shows not bad results but it is not always the case. A better control of maximum heap growth can be observed but still with memory usage peaks that prevent better saving.

The conclusion is that it is a priority to replace BerkeleyDB with other disk persistence system or create one specifically designed for Gephi disk store.

It is also known that graphs with more complex data like strings or lists will always benefit more from a disk storage system than graphs with simple data like integers or booleans. An idea is to always store simple data in memory because indexing in in the disk is going to need as much memory anyway, or even more.

On the other hand, Gephi works and was designed with in-memory data structures in mind. Adding a cache/disk store to the system is bound to create integration issues with other parts of the codebase. For example the GEXF file importer tends to load large portions of the graph file to memory while parsing it, which is not so good in memory constrained environments and using the cache here will not make a difference. One of these issues is regarding the handling of data in files with .gephi format. Due to the way that .gephi files are imported, some integration problems still need to be debugged in the disk store to work properly.

Looking to the future

This GSOC project is only scratching the surface of what a memory + disk cache system can achieve. In the future BerkeleyDB could be replaced with other persistence provider, and it doesn’t necessarily has to persist locally to the disk. For example replacing BerkeleyDB with a datastore like Cassandra, or maybe some RDBMS.

Conclusion

While the Data Store API introduced by this project is still taking its first steps and can be significantly evolved, it has helped ironing out many issues and has paved the way for bigger and better improvements. Working during this summer has been a great experience and I have been able to share with great mentors like Eduardo Ramos, who knows the Gephi codebase in and out. I hope the work of all of Gephi’s GSOC’ers becomes the starting point for many new features and enhancements that the community will surely appreciate. Happy coding and see you next summer!

GSoC mid-term: a new Timeline to explore time-varying networks

daniel

My name is Daniel Bernardes and during this Google Summer of Code I am working on the new Timeline interface.

Dynamic graphs have been the subject of increasing interest, given their potential as a theoretical model and their promising applications. Following this trend, Gephi has incorporated tools to study dynamic networks. From a visualization perspective, a critical tool is the Timeline component, which allows users to select pertinent time intervals and display and explore the corresponding graph. The challenge concerning the timeline was twofold: redesign the component to improve user experience and add extra features and introduce an animation scheme with the possibility to export the resulting video.

Together with my mentors Cezary Bartosiak and Sébastien Heymann, we have proposed a new design for the timeline component featuring a sparkline chart in the background of the interval selection drawer (which is semi transparent): this feature will help the user to focus on particular moments of the evolution of the dynamic graph, like bursts of connections or changes in graph density or other simple graph metrics. Current metrics are the evolution of the number of nodes, the number of edges and the graph density. The sparkline chart was preferred to other chart solutions because it does not add too much visual pollution to the component and adds to the qualitative analysis. The interaction with the drawer remains globally the same of the old timeline, to guarantee a smooth transition for the user.

To implement this feature we have used the chart library JFreeChart (a library already incorporated to Gephi), customizing their XYPlot into a Sparkline chart by modifying their visual attributes. To display the Sparkine, one needs to measure the properties of the graph in several time instants of the global time frame where the dynamic graph exists. This represented a major challenge, since the original architecture did not allow the timeline component to access (and measure) the graph in particular instants of time; the solution was to introduce a slight modification to the DynamicGraph API to provide an object which gave us snapshots of the graph at given instants. Other challenges we dealt with included the automatic selection/switching of real number/time units in the timeline (depending on the nature of the graph in question) and sampling granularity of the timeline.

Another breakthrough of this project was the introduction of the timeline animation. Once the user has selected a time frame with the drawer it can make it slide as the corresponding graph is being displayed on the screen. Besides the technical aspects of interaction between the timeline and the animation controller, there were also an effort to calibrate the animation (ie, in terms of speed and frames) so it would be comfortable and meaningful for the user.

As far as the UI is concerned, the component has gained a new “Reset” button next to the play button which activates the timeline drawer and displays the chart. It also serves to reset the drawer selection to the full interval when the timeline is active. The play button gained its original function, that is, to control the animation of the timeline — instead of activating the selection.

Finally, the animation export to a video format revealed to be more tricky than expected and couldn’t be finished as planned. There were several setbacks to this feature, beginning with the selection of a convenient library to write de movie container: it turns out that the de facto options available are not fully Java-based and need an encoder working in the background. The best alternative I found was Xuggler, which is based on ffmpeg. Also, obtaining screen captures of the graph to were a little bit tricky so I have exported SVG images from the graph corresponding to each frame, converted them to jpeg and than encoded them though Xuggler to a video format. As one might expect, this solution is not very efficient in terms of time, so Mathieu Bastien and my mentors suggested me to wait for the new features from the new Visualization API that would make this process simpler.

In addition to current bugfixes and minor improvements concerning the timeline and the animation, the movie export remains the the next big step to close this project. If you have questions or suggestion, please do not hesitate! The new timeline will be available in the next release of Gephi.

DB

GSoC mid-term: GraphGL, network visualization with WebGL

urban-škudnik

My name is Urban Škudnik and during this Google Summer of Code I develop GraphGL, an open source network visualization library for the Web.

Introduction

GraphGL is a network visualization library designed for rendering (massive) graphs in web browsers and puts dynamic graph exploration on the web another step forward. In short, it calculates the layout of the graph in real time and is therefore suitable for static files (exported GraphML/GEXF files) and for dynamic files (LinkedIn InMaps would be one such example).

As such, it is both a replacement for Gephi and a complimentary tool, similar to Seadragon, providing another method for displaying graphs in a Web browser.

Google-ChromeScreenSnapz001

null Static demo on the Java dataset.
null Static demo on a random graph with 100 nodes and 500 edges.
null Static demo on a random graph with 10,000 nodes and 50,000 edges.
null Static demo on a random graph with 100,000 nodes and 200,000 edges.
null Dynamic demo on Java dependencies dataset.

Commands: mouse left-button to pan, mouse wheel to zoom.

Alternatives

While having Gephi (renderer, at least) in the browser would be nice, such alternatives are not really realistic – for one, Java in Web browser is not welcomed by many users as it alone is a large resource hog. Another issue that can be raised is it’s integration with the rest of the web environment and issues that a developer can face with integration into his web application. It’s benefit however would be almost native-application performance.

Flash can also be considered for our problem as it supports 3D hardware accelerated graphics but being a proprietary technology it is not particularly attractive, especially for a library that wants to be based on open and standard technologies.

An alternative is aforementioned Seadragon plugin that builds image tiles of the rendered graph and provides interactivity components similar to those found at Google Maps or any other mapping site. As calculating graph layout and rendering itself can be very resource intensive this method can still be encouraged at graphs where unreasonably large amounts of RAM and CPU are required. It’s issue is interactivity and dynamics – after graph is rendered and exported, it can not be easily changed, especially not in real time.

WebGL and Web Workers

However, WebGL and WebWorkers present a solution, that can circumvent the issues of interactivity and dynamics and at the same time offer good performance.

3D graphics on the Web was always a bit tricky and was only possible if you had Java or Flash plugin. WebGL origins can be traced to Canvas 3D experiments at Mozilla in 2006, but it was in 2009 that Mozilla and Khronos, consortium that is focused (among other) on creating and maintaining open standard for graphics, started WebGL Working Group. It’s first stable specification was released in March 2011.

Since then, it has been touted both as a solution to 3D graphics problem on the web as well as a huge security vulnerability that provides a completely new vector of attack – access to kernel-mode graphics drivers and hardware.

The WebGL API is based on OpenGL ES 2.0 (with slight changes) and is exposed through HTML Canvas element. OpenGL ES 2.0, in turn, is a subset of OpenGL, primarily target at embedded devices and enables fully programmable 3D graphics with a vertex and fragment shader exposed to the developer.

Web Workers are a lot less controversial technology. Basically, they are an API for starting, running and terminating Javascript scripts in the background (separate thread) and thus allow web application to perform long-running calculations that could otherwise be interrupted either by user actions or by browsers timeout limits for Javascript.

WebGL and Web Workers are supported by Firefox (enabled by default since 4), Safari (disabled by default in 5.1), Chrome (enabled by default since 9) and Opera (though for Windows at the moment there is only a development build).

Microsoft has already indicated that they do not plan to support WebGL in its current form due to security issues, but there is a plugin, IEWebGL, that adds support for it.

Basically, all of this boils down to this: if your users are relatively tech savvy and therefor have relatively modern web browser and that browser is not Internet Explorer, you can give GraphGL a serious consideration. If your target audience will include a large proportion of IE users that will not or can not install a plugin, this might not be your optimal solution.

GraphGL

GraphGL’s objective is to be an open source network exploration tool for the Web. Built with open technologies, easily extensible (e.g. with other layout algorithms), easy to integrate with existing web applications, it enables easy adoption in your application and rapid development of any missing features also for developers that are not familiar with OpenGL and GLSL (shader language of OpenGL).

To achieve all of these objectives, GraphGL is built with the help of three.js, an awesome library for WebGL that abstracts-away low-level graphic calls. This means that Javascript developers should not have too much trouble giving a helping hand to the project.

Currently, data is imported with JSON (JavaScript Object Notation) converted to internal representation and displayed. Basic interactivity, such as panning, zooming and selecting node and its connections are already implemented, with further additions for selection possible.

Use cases

Another factor to consider is what you are trying to achieve. As mentioned, if you have a multi-million node graph, calculating its layout in real time might be a bit too heavy-weight for your average computer. It’s current best use case would be when you do not have a too large graph so that layout can be calculated on a client side.

One such example could be graphs that change frequently or are dependent on the per-rendering settings: interconnections between particular Twitter users’ followers, where, if Twitter would provide such a tool, calculating all layouts would be extremely expensive for Twitter, while for most average users this wouldn’t present any problem if layout would be calculated on client side when user would visit this tool.

LinkedIn is doing something similar with it’s InMaps service.

What to expect in term of performance

Performance varies greatly, as could be expected from such a library. On a modern computer one should not have problem calculating layout and rendering thousands, if not tens of thousands of nodes, while on older hardware (lower) thousands of nodes should still be rendered, but performance may not be super-smooth. In the future, further optimizations should give us even a higher FPS (Frames Per Second).

If, however, you are dealing with static graph (meaning, exported GraphML file, converted to JSON), we can easily render tens of thousands of nodes and edges and actual file size gets the biggest limitation.

To put things a bit into perspective: On my notebook (Summer 2007 Macbook Pro – 2.2GHz Core2Duo, 4GB RAM, GF8600M) I can render the Java dependency dataset that comes with Gephi (1.5k nodes, 8k edges) with about 40FPS, 10k nodes with 50k edges with around 10 to 15FPS and 100k nodes with 200k edges with around 3-5FPS. However, 100k nodes and 200k edges file comes at almost 22MB. At one time I tested with 2k nodes and 900k edges, file came at almost 37MB and sent Chrome belly up (though I haven’t tested that dataset with latest branch that supports static layouts).

I hope we (my hopes are that more developers join in the effort) still have some space to optimize and render even larger graphs.

Limitations

As said, support for WebGL is not universal and this can present a show stopper for you. Further limitation for the time being can be layout calculations and the strain it can put on resources of your users. Along with that, one should also keep in mind a very real issue of file size – large datasets are large not just by number of nodes but also by megabytes.

Technicals

What follows is a more technical discussion of implementation and issues for those that are interested in development of GraphGL.

Theory

Web is always a bit of a tricky environment due to a rather restrictive environment in which you must operate. Not only you have to share resources with other applications, but you also share resources with other web applications which on times have memory leaks or just burn through CPU cycles like there is no tomorrow (though GraphGL will fall into later category – but with layout processing and heavy rendering that is somewhat expected).

Along with these usual restrictions there is also a browser limit on the duration of execution of Javascript code, performance of Javascript itself (no call-by-value), practical file size limitations, recursion limits, etc.

As said, WebGL and Web Workers were utilized to try to circumvent these limitations. Using three.js to abstract low-level graphic calls has its advantages and potential problems, but in general advantages out-weight problems.

Advantages of faster development and wider developer base have already been pointed out, so I’ll just point out the biggest possible problem (and advantages at the same time). With three.js, the abstraction removes low-level control over details of implementation and optimization for our use case.

At the beginning of Summer of Code I also looked at other libraries but at the end three.js won over the rest primarily due to a lot more active developer community around it. Most of other libraries in general provide tools to help with things like loading shaders and how to send attributes and uniforms to the shader and leave majority of graphic calls to programmer. None of them also provided any particular advantage over each other so at the end the deciding factor was really a number of semi-active developers as my hope is that GraphGL becomes de facto the open source network visualization library for the Web for the foreseeable future and for that it needs a foundation that will not be unmaintained.

Implementation

Library imports JSON (GEXF and GraphML were considered, but are unfeasible – as they are XML, they can only be properly parsed (i.e. not with Regex) in the main window, which would lock the browser at graph of any meaningful size).

At this very time, there are two implementations – one which relies on meshes for rendering of nodes and one that relies on three.js‘s particle system. Later is not yet quite as stable and therefore still in separate branch.

For the “stable” relase: nodes are rendered as Meshes – each one a plane – with a shader drawing a circle by determining whether pixel should be colored or not, i.e., whether it satisfies the equation x^2 + y^2 – r^2 < 0.

As for "particlesystem" branch: Every node is a particle, rendered as a gl.POINT, determining its size with gl.PointSize. Coloring and shape are yet to be implemented, but will follow the same rule.

Edges are rendered as a single Line object – three.js translates this to WebGLs gl.LINES – to efficiently render large number of lines. Arches (disabled at the moment) – are, as nodes, rendered as planes with each one being colored by shader – if pixel lies in a certain range of values and therefore satisfies an implicit equation.

Currently only one color of edges is supported.

As for layout – it is calculated in a Web Worker that (at the moment) uses a not-quite-finished-yet version of Force Atlas 1 algorithm. Me and Julian Bilcke (my mentor) are in the process of re-writing Force Atlas 2 into Javascript but for all practical purposes my library should be easily understandable to anyone to write any desired algorithm into Javascript – if not, do not hesitate to contact me for help/explanation/suggestions.

Future

For what remains of Summer of Code I plan to fix bugs, write documentation, maybe finish Force Atlas 2.

Currently labels are also missing but should be implemented in the near future. I just have to decide if I should implement them with HTML or as text in WebGL. First option gives us easy copy-and-paste and greater flexibility for (custom) styling, second gives performance. One take would be to do it with HTML and only show labels when you are close enough and remove those that are not in the view or only show labels of a node and it’s neighbors when you select it.

My long term (and at the moment still uncertain) goal is to also try to move layout calculations to GPU, though this presents serious challenges. I tried to implement this in the middle of GSoC but stumbled upon a couple of technical issues that prevented practical implementation. Since then I came upon several demos that overcame those specific issues, making me hopeful that it shouldn’t be impossible.

While implementing it with WebGL will be hard, it should be a lot easier to achieve with WebCL. Hopefully, WebCL adoption will head the same way as WebGL (meaning, generally about a year or two).

Summary

I hope this text provided good introduction into GraphGL, what technologies it uses, how it is built, what are it’s objective and for what kind of problems it is best suitable for. If you have a use case already but don’t see a particular feature do not hesitate to request it – it just might bump it up the priority list.

And remember, the point of GraphGL is customization and easy changes that can be done by everyone.

Feature requests? Comments? Suggestions? Opinions?

Comments or urban.skudnik@gmail.com or github – just fork it! 😉