Label Adjust

Label Adjust is a special type of layout algorithm. It is available from the Spatialization menu, but instead of working with node positions alone, it works with labels. Its aim is to automatically prevent labels from overlapping.

Gephi is built to produce readable maps that can be published or printed. In practice, a network with more than 1,000 nodes becomes hard to read, and even more so when labels are displayed. With the Label Adjust algorithm, the tedious work of manually moving each node of the network disappears.

While running, the algorithm slightly moves nodes whose labels overlap. With long labels such as URLs, for instance, this feature saves a lot of time, and it is easy to use: display the labels the way you want (font, size, color, …) and start the algorithm. It stops automatically when it detects no more overlapping labels, but you can also stop it by hand.
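
To make the idea concrete, here is a minimal sketch of label overlap removal. It assumes each label is an axis-aligned box centred on its node and that label sizes are already known from the font settings; the `LabeledNode` class, the `label_adjust` function and its `margin` parameter are hypothetical, and this is not Gephi's own implementation. Each pass pushes every overlapping pair apart along the axis where they overlap least, and the loop stops by itself once a pass finds no overlap, mirroring the automatic stop described above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LabeledNode:
    # Hypothetical node: a position plus the size of its rendered label.
    x: float
    y: float
    label_w: float
    label_h: float


def overlap_xy(a: LabeledNode, b: LabeledNode) -> Tuple[float, float]:
    """Per-axis overlap depth of two label boxes centred on their nodes."""
    ox = (a.label_w + b.label_w) / 2 - abs(a.x - b.x)
    oy = (a.label_h + b.label_h) / 2 - abs(a.y - b.y)
    return ox, oy


def labels_overlap(a: LabeledNode, b: LabeledNode) -> bool:
    ox, oy = overlap_xy(a, b)
    return ox > 0 and oy > 0


def label_adjust(nodes: List[LabeledNode], margin: float = 0.01,
                 max_passes: int = 1000) -> int:
    """Nudge nodes until no labels overlap; return the number of moving passes."""
    for passes in range(max_passes):
        shifts = [[0.0, 0.0] for _ in nodes]
        moved = False
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                a, b = nodes[i], nodes[j]
                ox, oy = overlap_xy(a, b)
                if ox <= 0 or oy <= 0:
                    continue  # these two labels do not touch
                moved = True
                # Separate along the cheaper axis; each node takes half the push.
                axis, depth = (0, ox) if ox < oy else (1, oy)
                ahead = (a.x >= b.x) if axis == 0 else (a.y >= b.y)
                step = (depth + margin) / 2 * (1.0 if ahead else -1.0)
                shifts[i][axis] += step
                shifts[j][axis] -= step
        if not moved:
            return passes  # converged: no overlapping labels left
        # Apply all shifts at once so pair order within a pass does not matter.
        for node, (dx, dy) in zip(nodes, shifts):
            node.x += dx
            node.y += dy
    return max_passes
```

For example, two wide labels placed almost on top of each other, `[LabeledNode(0, 0, 4, 1), LabeledNode(1, 0, 4, 1)]`, are separated vertically in a single pass, because the vertical overlap is the smaller one. The all-pairs test is quadratic; a version meant for large networks would first bucket labels in a spatial grid so that only nearby labels are compared.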

Here is a short demo video of the feature running. Needless to say, the algorithm is designed for much larger networks than the one shown.

http://vimeo.com/moogaloop.swf?clip_id=2242916&server=vimeo.com&show_title=1&show_byline=1&show_portrait=0&color=&fullscreen=1

This feature is important when exporting maps from Gephi. A typical workflow for publishing a network map with Gephi looks like this:
1. Spatialize the network, for instance with the Force Atlas algorithm.
2. Use filters to set node color and size according to the network data.
3. Display labels and configure the text settings.
4. Use Label Adjust to make all labels readable.
5. Export.

Performance and scalability

In this article and a few following ones, I’ll focus on the application design and explain the technical points I think are relevant to understanding our approach.

Today’s subject is performance and scalability in the visualization. Although other modules also need good performance, visualizing thousands of nodes and edges remains the major challenge. For visualization-centered software like Gephi, it is a key feature to which we attach great importance.

Other network visualization software tends to offer either a poor visualization module or a stunning look without efficiency. For instance, Pajek has a very efficient core and you can achieve a lot with it, but problems start when you want to visualize your network. With GUESS you can produce nice maps, but the render engine starts to suffer seriously above 2,000 nodes. Gephi tries to combine an efficient render engine with good-looking results.

When we started designing the current version of Gephi in 2007, we wanted to create a new generation of network visualization software, and so we made some choices that I will try to explain here.

Use multi-core
Already in 2007, and even more so now, multi-core processors impose new rules on software development. They bring appealing features but also some risks. However, the technology is maturing: all of today’s Top 10 video games were designed to be multi-threaded from the beginning. Multi-core brings performance, but does it bring scalability as well? I would say YES for Gephi, because no matter how many processors you have, whatever can be parallelized will be parallelized. Graphics cards are not able to parallelize yet, but we expect this to change in the future.
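
The "whatever can be parallelized will be parallelized" idea can be sketched as a data-parallel map: independent per-node work is cut into one chunk per core and the chunks run concurrently. This is a minimal illustration, not Gephi's threading code; `split_evenly` and `parallel_map` are hypothetical helpers built on the standard `concurrent.futures` module.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_evenly(items: Sequence[T], parts: int) -> List[Sequence[T]]:
    """Hypothetical helper: cut items into `parts` contiguous slices
    whose sizes differ by at most one."""
    size, extra = divmod(len(items), parts)
    slices, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        slices.append(items[start:end])
        start = end
    return slices


def parallel_map(fn: Callable[[T], R], items: Sequence[T], workers: int = 0) -> List[R]:
    """Hypothetical helper: apply fn to every item, one chunk per worker,
    preserving input order. workers=0 means one worker per CPU core."""
    workers = workers or os.cpu_count() or 1
    chunks = [c for c in split_evenly(items, workers) if len(c) > 0]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields the chunk results in submission order.
        per_chunk = pool.map(lambda chunk: [fn(x) for x in chunk], chunks)
        return [r for chunk_result in per_chunk for r in chunk_result]
```

For instance, with `adjacency = {0: [1, 2], 1: [0], 2: [0]}`, calling `parallel_map(lambda n: len(adjacency[n]), list(adjacency))` returns the node degrees `[2, 1, 1]`. The code scales with the core count only because each item is processed independently. Note that in CPython, threads do not execute Python bytecode in parallel because of the global interpreter lock, so for CPU-bound pure-Python work you would switch to `ProcessPoolExecutor` (same `map` interface) and replace the per-chunk lambda with a module-level function so it can be pickled.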

Use GPU
You may notice we drew some inspiration from video game development. By using graphics card features, Gephi’s render engine leaves the processor free for other computation and uses GPU acceleration to speed up rendering. Besides enabling 3D graphs, the GPU speeds up many drawing operations in Gephi. I would say the only problem is compatibility, due to the large number of different graphics cards on the market.

Architecture
The architecture of the visualization package is a compromise between flexibility and performance. In 3D engine design it is nearly impossible to have both at the same time, so our engine is flexible only where flexibility doesn’t harm efficiency.

These choices give good visualization performance, and I would say this is only the beginning. Currently, up to 50,000 nodes or even more can be visualized, but this depends on the number of edges and on how your graph is spatialized. This is because we use techniques that avoid computing the parts of the graph that are off screen:

Figure: Octree cubes partition on a 3D graph

The graph space is cut into fixed volumes in a structure called an octree. It is easy for the render engine to determine which cubes are hidden and which are visible, and only 3D objects in visible cubes are computed. As a consequence, performance depends mainly on how many nodes you are currently viewing rather than on how many nodes your network has. So even with huge graphs, zooming in and exploring parts of them remains fast.
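
The culling idea can be illustrated with a small octree sketch. It assumes all nodes lie inside the root cube, and it uses an axis-aligned view box as a simplified stand-in for the camera frustum (a real 3D engine tests cubes against the frustum planes). The `Octant` class and its `capacity`/`max_depth` parameters are hypothetical, not Gephi's engine code. The key property is in `visible`: a cube that does not touch the view is skipped together with everything inside it, so the work done follows the visible part of the graph.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


class Octant:
    """One cube of a hypothetical octree; splits into 8 sub-cubes when crowded."""

    def __init__(self, lo: Point, size: float, depth: int = 0):
        self.lo = lo        # minimum corner of the cube
        self.size = size    # edge length of the cube
        self.depth = depth
        self.items: List[Tuple[int, Point]] = []
        self.children: Optional[List["Octant"]] = None

    def _child_for(self, p: Point) -> "Octant":
        half = self.size / 2
        ix, iy, iz = (int(p[k] >= self.lo[k] + half) for k in range(3))
        return self.children[ix * 4 + iy * 2 + iz]

    def insert(self, node_id: int, p: Point, capacity: int = 8, max_depth: int = 8) -> None:
        if self.children is not None:
            self._child_for(p).insert(node_id, p, capacity, max_depth)
            return
        self.items.append((node_id, p))
        if len(self.items) > capacity and self.depth < max_depth:
            # Too crowded: split into 8 equal sub-cubes and redistribute.
            half = self.size / 2
            self.children = [
                Octant((self.lo[0] + dx * half, self.lo[1] + dy * half,
                        self.lo[2] + dz * half), half, self.depth + 1)
                for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)
            ]
            items, self.items = self.items, []
            for nid, q in items:
                self._child_for(q).insert(nid, q, capacity, max_depth)

    def intersects(self, box_lo: Point, box_hi: Point) -> bool:
        return all(self.lo[k] < box_hi[k] and self.lo[k] + self.size > box_lo[k]
                   for k in range(3))

    def visible(self, box_lo: Point, box_hi: Point, out: Optional[List[int]] = None) -> List[int]:
        """Collect node ids from every leaf cube that touches the view box."""
        if out is None:
            out = []
        if not self.intersects(box_lo, box_hi):
            return out  # cube entirely off screen: skip it and all its contents
        if self.children is None:
            out.extend(nid for nid, _ in self.items)
        else:
            for child in self.children:
                child.visible(box_lo, box_hi, out)
        return out
```

For example, after inserting a 10×10×10 grid of 1,000 nodes into `Octant((0.0, 0.0, 0.0), 100.0)`, a query for a small corner box only descends into the few cubes that touch that box. The result is conservative: it may include nodes near the box edges, but it never misses a node inside the box.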

Besides the current 3D engine, which is intended to work on all configurations, a new one will be developed in 2009. By using the latest graphics card features, it may push the network size limit to around 200,000 nodes.