GSoC mid-term: GraphGL, network visualization with WebGL

Urban Škudnik

My name is Urban Škudnik, and during this Google Summer of Code I am developing GraphGL, an open source network visualization library for the Web.

Introduction

GraphGL is a network visualization library designed for rendering (massive) graphs in web browsers, taking dynamic graph exploration on the web another step forward. In short, it calculates the layout of the graph in real time and is therefore suitable both for static files (exported GraphML/GEXF files) and for dynamic data (LinkedIn InMaps would be one such example).

As such, it is both a replacement for Gephi and a complementary tool, similar to Seadragon, providing another method for displaying graphs in a web browser.


Static demo on the Java dataset.
Static demo on a random graph with 100 nodes and 500 edges.
Static demo on a random graph with 10,000 nodes and 50,000 edges.
Static demo on a random graph with 100,000 nodes and 200,000 edges.
Dynamic demo on the Java dependencies dataset.

Controls: left mouse button to pan, mouse wheel to zoom.

Alternatives

While having Gephi (its renderer, at least) in the browser would be nice, such alternatives are not really realistic. For one, Java in the web browser is not welcomed by many users, as it alone is a large resource hog. Another issue is its integration with the rest of the web environment and the problems a developer faces integrating it into a web application. Its benefit, however, would be almost native-application performance.

Flash can also be considered, as it supports hardware-accelerated 3D graphics, but being a proprietary technology it is not particularly attractive, especially for a library that wants to be built on open, standard technologies.

An alternative is the aforementioned Seadragon plugin, which builds image tiles of the rendered graph and provides interactivity similar to Google Maps or any other mapping site. As calculating a graph layout and rendering it can be very resource intensive, this method is still encouraged for graphs that would require unreasonably large amounts of RAM and CPU. Its issue is interactivity and dynamics – after the graph is rendered and exported, it cannot be easily changed, especially not in real time.

WebGL and Web Workers

However, WebGL and Web Workers present a solution that can circumvent the issues of interactivity and dynamics while still offering good performance.

3D graphics on the Web has always been a bit tricky and was only possible with the Java or Flash plugin. WebGL's origins can be traced to Canvas 3D experiments at Mozilla in 2006, but it was in 2009 that Mozilla and Khronos, the consortium focused (among other things) on creating and maintaining open graphics standards, started the WebGL Working Group. The first stable specification was released in March 2011.

Since then, it has been touted both as the solution to 3D graphics on the web and as a huge security vulnerability that opens a completely new attack vector – access to kernel-mode graphics drivers and hardware.

The WebGL API is based on OpenGL ES 2.0 (with slight changes) and is exposed through the HTML canvas element. OpenGL ES 2.0 is, in turn, a subset of OpenGL primarily targeted at embedded devices; it enables fully programmable 3D graphics, with the vertex and fragment shaders exposed to the developer.
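For readers new to the stack, this is all a page needs to do to get at the API (a minimal sketch; the "experimental-webgl" name was the common fallback in 2011-era browsers):

    // Minimal sketch: obtaining a WebGL rendering context from a <canvas> element.
    var canvas = document.createElement('canvas');
    canvas.width = 800;
    canvas.height = 600;
    document.body.appendChild(canvas);

    var gl = canvas.getContext('webgl') || canvas.getContext('experimental-webgl');
    if (!gl) {
      // WebGL is unavailable or disabled – fall back or inform the user.
      console.error('WebGL is not supported in this browser.');
    }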

Web Workers are a lot less controversial technology. Basically, they are an API for starting, running and terminating JavaScript scripts in the background (in a separate thread), allowing a web application to perform long-running calculations that would otherwise be interrupted by user actions or by the browser's timeout limits for JavaScript.
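As a minimal sketch (the file name and the heavyComputation function are placeholders, not part of GraphGL), moving work off the main thread looks like this:

    // main thread – spawn a background worker and exchange messages with it
    var worker = new Worker('layout-worker.js');   // placeholder script name

    worker.onmessage = function (event) {
      // Results arrive asynchronously, without blocking the UI.
      console.log('result received', event.data);
    };

    worker.postMessage({ command: 'start', iterations: 1000 });
    // worker.terminate();  // call when the computation is no longer needed

    // layout-worker.js – runs in a separate thread, with no DOM access
    self.onmessage = function (event) {
      var result = heavyComputation(event.data);   // placeholder for the real work
      self.postMessage(result);
    };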

WebGL and Web Workers are supported by Firefox (enabled by default since version 4), Safari (disabled by default in 5.1), Chrome (enabled by default since version 9) and Opera (though for Windows there is currently only a development build).

Microsoft has already indicated that they do not plan to support WebGL in its current form due to security issues, but there is a plugin, IEWebGL, that adds support for it.

Basically, all of this boils down to this: if your users are relatively tech savvy and therefore have a relatively modern web browser that is not Internet Explorer, you can give GraphGL serious consideration. If your target audience includes a large proportion of IE users who will not or cannot install a plugin, this might not be your optimal solution.

GraphGL

GraphGL's objective is to be an open source network exploration tool for the Web. Built with open technologies, easily extensible (e.g. with other layout algorithms) and easy to integrate with existing web applications, it enables easy adoption in your application and rapid development of any missing features, even by developers who are not familiar with OpenGL and GLSL (OpenGL's shading language).

To achieve these objectives, GraphGL is built with the help of three.js, an awesome library for WebGL that abstracts away low-level graphics calls. This means that JavaScript developers should not have too much trouble giving the project a helping hand.

Currently, data is imported as JSON (JavaScript Object Notation), converted to an internal representation and displayed. Basic interactivity, such as panning, zooming and selecting a node and its connections, is already implemented, with further additions for selection possible.
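The exact field names are defined by the library itself; purely as an illustration of the kind of node/edge list GraphGL consumes, the JSON might be shaped roughly like this (the names below are an assumption, not the documented format):

    {
      "nodes": {
        "n0": { "label": "Node 0", "x": 12.3, "y": -4.5, "size": 1.0 },
        "n1": { "label": "Node 1", "x": -7.8, "y": 9.1, "size": 2.5 }
      },
      "edges": [
        { "source": "n0", "target": "n1", "weight": 1.0 }
      ]
    }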

Use cases

Another factor to consider is what you are trying to achieve. As mentioned, if you have a multi-million node graph, calculating its layout in real time might be a bit too heavyweight for the average computer. The current best use case is a graph that is not too large, so that the layout can be calculated on the client side.

One such example could be graphs that change frequently or depend on per-rendering settings: the interconnections between a particular Twitter user's followers, for instance. If Twitter provided such a tool, pre-calculating all the layouts would be extremely expensive for Twitter, while for most users it would be no problem if the layout were calculated on the client side when they visited the tool.

LinkedIn is doing something similar with its InMaps service.

What to expect in terms of performance

Performance varies greatly, as can be expected from such a library. On a modern computer one should have no problem calculating the layout and rendering thousands, if not tens of thousands, of nodes, while on older hardware (lower) thousands of nodes should still render, though not necessarily smoothly. In the future, further optimizations should give us an even higher FPS (frames per second).

If, however, you are dealing with a static graph (meaning an exported GraphML file converted to JSON), GraphGL can easily render tens of thousands of nodes and edges, and the actual file size becomes the biggest limitation.

To put things a bit into perspective: on my notebook (Summer 2007 MacBook Pro – 2.2GHz Core 2 Duo, 4GB RAM, GeForce 8600M) I can render the Java dependency dataset that comes with Gephi (1.5k nodes, 8k edges) at about 40 FPS, 10k nodes with 50k edges at around 10 to 15 FPS, and 100k nodes with 200k edges at around 3-5 FPS. However, the 100k nodes and 200k edges file comes in at almost 22MB. At one point I tested with 2k nodes and 900k edges; the file came in at almost 37MB and sent Chrome belly up (though I haven't tested that dataset with the latest branch that supports static layouts).

I hope we (my hope is that more developers join the effort) still have some room to optimize and render even larger graphs.

Limitations

As said, support for WebGL is not universal, and this can be a show stopper for you. A further limitation for the time being is layout calculation and the strain it can put on your users' resources. Along with that, one should also keep in mind the very real issue of file size – large datasets are large not just in number of nodes but also in megabytes.

Technicals

What follows is a more technical discussion of the implementation and its issues, for those who are interested in the development of GraphGL.

Theory

The web is always a bit tricky due to the rather restrictive environment in which you must operate. Not only do you have to share resources with other applications, you also share them with other web applications, which at times have memory leaks or burn through CPU cycles like there is no tomorrow (though GraphGL will fall into the latter category – with layout processing and heavy rendering that is somewhat expected).

Along with these usual restrictions there are also the browser's limit on how long JavaScript code may execute, the performance of JavaScript itself, practical file size limitations, recursion limits, etc.

As said, WebGL and Web Workers are utilized to try to circumvent these limitations. Using three.js to abstract low-level graphics calls has its advantages and potential problems, but in general the advantages outweigh the problems.

The advantages of faster development and a wider developer base have already been pointed out, so I will just point out the biggest possible problem (and advantage at the same time): with three.js, the abstraction removes low-level control over implementation details and over optimization for our specific use case.

At the beginning of Summer of Code I also looked at other libraries, but in the end three.js won out, primarily due to the much more active developer community around it. Most of the other libraries only provide tools for things like loading shaders and sending attributes and uniforms to the shader, leaving the majority of graphics calls to the programmer. None of them provided any particular advantage over the others, so in the end the deciding factor was really the number of semi-active developers, as my hope is that GraphGL becomes the de facto open source network visualization library for the Web for the foreseeable future, and for that it needs a foundation that will not go unmaintained.

Implementation

The library imports JSON (GEXF and GraphML were considered, but are unfeasible – as they are XML, they can only be properly parsed (i.e. not with regexes) in the main window, which would lock the browser for a graph of any meaningful size).
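This is also the practical argument for JSON: a worker has no DOM and therefore no DOMParser, while JSON.parse is available everywhere. A sketch (the data shape is the same illustrative one as above):

    // Inside a Web Worker: parsing the graph data off the main thread.
    // DOMParser (needed for GraphML/GEXF) does not exist here; JSON.parse does.
    self.onmessage = function (event) {
      var graph = JSON.parse(event.data);   // event.data is the raw JSON string
      self.postMessage({ nodeCount: Object.keys(graph.nodes).length });
    };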

At this very moment there are two implementations – one that relies on meshes for rendering nodes and one that relies on three.js's particle system. The latter is not yet quite as stable and therefore still lives in a separate branch.

For the "stable" release: nodes are rendered as meshes – each one a plane – with a shader drawing a circle by determining whether a pixel should be colored or not, i.e. whether it satisfies the equation x^2 + y^2 - r^2 < 0.
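Roughly, the idea looks like this – not the actual GraphGL shader, just a sketch using three.js's ShaderMaterial and the attributes/uniforms it injects (position, uv, projectionMatrix, modelViewMatrix):

    // Sketch: a plane per node, with the fragment shader carving out a circle.
    var nodeMaterial = new THREE.ShaderMaterial({
      vertexShader: [
        'varying vec2 vUv;',
        'void main() {',
        '  vUv = uv;',
        '  gl_Position = projectionMatrix * modelViewMatrix * vec4(position, 1.0);',
        '}'
      ].join('\n'),
      fragmentShader: [
        'varying vec2 vUv;',
        'void main() {',
        '  vec2 p = vUv - vec2(0.5);   // offset from the centre of the plane',
        '  float r = 0.5;',
        '  if (p.x * p.x + p.y * p.y - r * r > 0.0) discard;   // outside the circle',
        '  gl_FragColor = vec4(1.0, 0.4, 0.1, 1.0);            // node colour',
        '}'
      ].join('\n')
    });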

As for the "particlesystem" branch: every node is a particle, rendered as a point primitive (gl.POINTS), with its size set via gl_PointSize in the vertex shader. Coloring and shape are yet to be implemented, but will follow the same rule.
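Again only a sketch: with point sprites the vertex shader sets gl_PointSize and the fragment shader uses gl_PointCoord to apply the same circle test (wiring the size attribute to the material is omitted here):

    // Sketch of the point-sprite variant: one vertex per node, sized in the shader.
    var pointVertexShader = [
      'attribute float size;',
      'void main() {',
      '  gl_Position = projectionMatrix * modelViewMatrix * vec4(position, 1.0);',
      '  gl_PointSize = size;   // node size in pixels',
      '}'
    ].join('\n');

    var pointFragmentShader = [
      'void main() {',
      '  vec2 p = gl_PointCoord - vec2(0.5);   // offset from the sprite centre',
      '  if (dot(p, p) > 0.25) discard;        // x^2 + y^2 - r^2 < 0 with r = 0.5',
      '  gl_FragColor = vec4(1.0, 0.4, 0.1, 1.0);',
      '}'
    ].join('\n');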

Edges are rendered as a single Line object – three.js translates this to WebGL's gl.LINES – to efficiently render a large number of lines. Arcs (curved edges, disabled at the moment) are, like nodes, rendered as planes, each one colored by the shader if the pixel lies in a certain range of values and therefore satisfies an implicit equation.
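A rough sketch of the single-Line approach (API names vary between three.js revisions – older ones use THREE.LinePieces, newer ones THREE.LineSegments; nodes and edges here are assumed, already-parsed data structures):

    // Sketch: pack every edge into one geometry so it draws as a single gl.LINES call.
    var edgeGeometry = new THREE.Geometry();
    edges.forEach(function (edge) {
      var a = nodes[edge.source], b = nodes[edge.target];
      edgeGeometry.vertices.push(new THREE.Vector3(a.x, a.y, 0));
      edgeGeometry.vertices.push(new THREE.Vector3(b.x, b.y, 0));
    });

    var edgeMaterial = new THREE.LineBasicMaterial({ color: 0x888888 });
    scene.add(new THREE.Line(edgeGeometry, edgeMaterial, THREE.LinePieces));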

Currently only one color of edges is supported.

As for layout – it is calculated in a Web Worker that (at the moment) uses a not-quite-finished version of the Force Atlas 1 algorithm. Julian Bilcke (my mentor) and I are in the process of rewriting Force Atlas 2 in JavaScript, but for all practical purposes the library should be easy enough to understand for anyone who wants to write their own algorithm in JavaScript – if not, do not hesitate to contact me for help, explanations or suggestions.
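To give a feel for the structure – this is not the Force Atlas code, just a naive O(n^2) force-directed sketch of what such a worker does:

    // layout-worker.js – a naive force-directed step, NOT the actual Force Atlas code.
    var nodes, edges;

    self.onmessage = function (event) {
      nodes = event.data.nodes;   // [{x, y}, ...]
      edges = event.data.edges;   // [{source, target}, ...] – indices into nodes
      for (var iteration = 0; iteration < 50; iteration++) {
        step();
        self.postMessage(nodes);  // stream intermediate positions for live rendering
      }
    };

    function step() {
      // Repulsion between every pair of nodes (quadratic – acceptable only for a sketch).
      for (var i = 0; i < nodes.length; i++) {
        for (var j = i + 1; j < nodes.length; j++) {
          var dx = nodes[i].x - nodes[j].x;
          var dy = nodes[i].y - nodes[j].y;
          var d2 = dx * dx + dy * dy || 0.01;
          nodes[i].x += dx / d2; nodes[i].y += dy / d2;
          nodes[j].x -= dx / d2; nodes[j].y -= dy / d2;
        }
      }
      // Attraction along edges.
      edges.forEach(function (e) {
        var a = nodes[e.source], b = nodes[e.target];
        var ax = (b.x - a.x) * 0.01, ay = (b.y - a.y) * 0.01;
        a.x += ax; a.y += ay;
        b.x -= ax; b.y -= ay;
      });
    }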

Future

For what remains of Summer of Code I plan to fix bugs, write documentation and maybe finish Force Atlas 2.

Labels are currently missing but should be implemented in the near future. I just have to decide whether to implement them in HTML or as text in WebGL. The first option gives us easy copy-and-paste and greater flexibility for (custom) styling; the second gives performance. One approach would be to do it with HTML and only show labels when you are zoomed in close enough, removing those that are not in view, or to only show the labels of a node and its neighbors when you select it.
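For the HTML option, the approach would be to project each node's position to screen space every frame and absolutely position a small element there, hiding labels outside the view. A sketch with the projector API of older three.js revisions (newer ones use vector.project(camera)):

    // Sketch: position an absolutely-placed HTML label over a node each frame.
    var projector = new THREE.Projector();

    function placeLabel(node, labelElement, camera, canvasWidth, canvasHeight) {
      var position = new THREE.Vector3(node.x, node.y, 0);
      projector.projectVector(position, camera);   // to normalized device coordinates (-1..1)

      var onScreen = position.x >= -1 && position.x <= 1 &&
                     position.y >= -1 && position.y <= 1;
      labelElement.style.display = onScreen ? 'block' : 'none';
      if (!onScreen) return;

      labelElement.style.left = ((position.x + 1) / 2 * canvasWidth) + 'px';
      labelElement.style.top  = ((1 - position.y) / 2 * canvasHeight) + 'px';
    }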

My long-term (and at the moment still uncertain) goal is to also try to move layout calculations to the GPU, though this presents serious challenges. I tried to implement this in the middle of GSoC but stumbled upon a couple of technical issues that prevented a practical implementation. Since then I have come across several demos that overcame those specific issues, making me hopeful that it shouldn't be impossible.

While implementing it with WebGL will be hard, it should be a lot easier to achieve with WebCL. Hopefully, WebCL adoption will follow the same path as WebGL's (meaning, roughly a year or two).

Summary

I hope this text provided a good introduction to GraphGL: what technologies it uses, how it is built, what its objectives are and what kinds of problems it is best suited for. If you already have a use case but don't see a particular feature, do not hesitate to request it – that just might bump it up the priority list.

And remember, the point of GraphGL is customization and easy changes that anyone can make.

Feature requests? Comments? Suggestions? Opinions?

Comments, urban.skudnik@gmail.com, or GitHub – just fork it! 😉

Graph visualization on the web with Gephi and Seadragon

The project takes another big step forward and brings dynamic graph exploration on the web one click away from Gephi with the Seadragon Web Export plugin.

Mathieu Bastian and Julian Bilcke worked on a Seadragon export plugin. Directly export large graph pictures and put them on the web. Seadragon is pure JavaScript and works in all modern browsers. As it uses image tiles (like Google Maps), there is no graph size limit.

Go to your Gephi installation and then to the Plugin Center (Tools > Plugin) to install the plugin. You can also manually download the plugin archive or get the source code.

/seadragon-samples/diseasome/seadragon.html

Sample with Diseasome Network dataset directly exported from Gephi

Communication about (large) graphs is currently limited because it is not easy to put them on the web. Graph visualization has very much the same aims as other types of visualization and needs powerful web support. We have been thinking for a long time about the best way to do this and found that there is no perfect solution. We need efficiency, interactivity and portability at the same time. The simplicity of making and hacking the system is also important, as we want developers to be able to improve it easily.

By comparing technologies we found that Seadragon is the best short-term solution, with minimum effort and maximum results. However, it still has a serious limitation: interactivity. No search and no clicking on nodes are possible for the moment. But as it is JavaScript, I don't see any hurdles to adding these features in the future – help is welcome.

The table below shows our conclusions on the technologies we are considering. We are very much eager to discuss it on the forum. As performance is the most important requirement, WebGL is a serious candidate, but development would require time and resources. We plan to start a WebGL visualization engine prototype next summer, for Google Summer of Code 2011, but we would like to discuss the specifications with anyone interested and build this together.

The technologies compared on portability, efficiency, effort and interactivity were Flash, Java2D/Processing, Canvas (Processing.js/RaphaelJS), WebGL and Seadragon.

Figure: Comparing technologies able to display networks on the web.

How to use the plugin?

Install the plugin from Gephi: go to "Tools > Plugin" and find Seadragon Web Export. After restarting Gephi, the plugin is available in the export menu. Load a sample network and try the plugin. Go to the Preview tab to configure rendering settings like colors, labels and edges.

Export directly from Gephi Export menu

The settings ask for a valid directory to export the files to, and for the size of the canvas. The bigger the canvas, the more you can zoom in, but the longer it takes to generate and to load.

Export settings, configure the size of the image

Note that the result on the local hard drive can't be viewed with Chrome, due to a bug. Run Chrome with the "--allow-file-access-from-files" option to make it work.

Kudos to Microsoft Live Labs for this great library, released under the Ms-PL open source license. Thank you to Franck Cuny for the CPAN Explorer project that inspired this plugin. Other interesting projects are GEXF Explorer, a Flash-based dynamic widget, and gexf4js, which loads GEXF files into Protovis.

Mozilla Drumbeat – Map the web

The Mozilla Drumbeat initiative is an open project to build a better web. It gathers communities around various projects to discuss technology and the way we will use the web in the future. It is also possible to submit your own project ideas.

But there is one that already interests us in particular: Map the Web.

Map the Web uses art, design and data to map the internet — to help all of us understand the web, how it works and what it means.

The objectives of the project so far, quoted from the project page:

  1. Transform the big, abstract internet into something simple and emotional that busy people can understand by …
  2. Building a community of artists, designers and data nerds passionate about mapping the internet who …
  3. Create tools and maps that help people understand how the internet works and where they fit in.
  4. Over time: use the insights from these maps to generate other kinds of tools and projects that add to users’ experiences of the web.

At Gephi we believe in the power of maps to tell stories and help users understand complex, unsorted data. For instance, why not use maps for bookmarks? Represented as a network of associated URLs and tags, bookmarks would naturally appear in thematic clusters. Gary Flake recently showed interesting ideas about using data visualization in the browser.

Gephi is a desktop Java application, but in the coming months our aim is to launch a web canvas project as well. The idea is to lead or participate in building a standard network visualization library for the web, simple enough to be used in various applications; we propose to do this using WebGL. A data visualization library must be efficient, and OpenGL is clearly suitable for this task. We had considered starting a Google Summer of Code project about it this year, but we finally decided to wait a bit longer. WebGL is getting lots of support and development and promises to become the standard, as Google recently dropped O3D. We think this canvas project shares many interests with the 'Map the Web' Drumbeat project, and we therefore naturally propose to help.

Let’s start the discussion and contribute to this project! Who’s joining?