Guest blog post by Dr. Tominski, who agreed to review Gephi 0.7alpha4 for us.
Christian Tominski received his diploma (MCS) from the University of Rostock in 2002. In 2006 he received his doctoral degree (Dr.-Ing.) from the same university. Currently, Christian is working as a lecturer and researcher at the Institute for Computer Science at the University of Rostock. Christian has authored and co-authored several articles in the field of information visualization. His main interests concern visualization of multivariate data in time and space, visualization of graph structures, and visualization on mobile devices. In his research, he places a special focus on interactivity, including novel interaction methods and implications for software engineering.
Recently, I stumbled upon the Gephi Project – an open source graph visualization system. As I’ve done some research in the area of interactive graph visualization, I was eager to see how Gephi works and if it brings some new concepts or if it’s yet another graph visualization system. I’ll share my thoughts on Gephi from three perspectives. The first one is the user perspective. I’ll take the role of a user who is interested in getting a visual depiction of some graphs. Secondly, I’ll take the role of a developer and shed some light on the aspect of software engineering. And finally, I’ll be a scientist and try to foresee if and in which regard Gephi might have some impact on visualization research.
The User’s Perspective
Gephi has been designed with the users and their needs in mind. The system welcomes its users with a familiar look and feel. It is quite easy to load graph data into the system. Many of the known file formats for graphs are supported, for instance DOT, GML, GraphML, and Tulip’s file format TLP. A nice thing about the data import is that an import report provides essential information about the import process (e.g., number of nodes and edges, edge directedness, and potential problems). Once imported, the graph is shown as nodes and links in a main view, and several complementary views provide additional information.
The main view is the core for visual graph exploration. It allows users to zoom in, to select nodes, to adjust node size and color, to find shortest paths, and to access attributes of nodes and edges. In addition to letting users set sizes and colors manually, the system can also set these automatically based on attributes associated with nodes and edges. What is called “Partition” in Gephi is used to assign unique colors to nodes and edges based on qualitative data attributes (e.g., class affiliation). Quantitative data values can be mapped to size and color of nodes, edges, and labels using the “Ranking” tool. All these tools are customizable. It is worth mentioning that Gephi provides some nice user controls to parameterize the color coding.
Gephi also supports graph editing, i.e., insertion and deletion of nodes and edges as well as manipulation of attribute values. What is missing in terms of editing the data is the possibility to add (and delete) attributes, for instance to generate derived data values using simple formulas.
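To make the difference between the two mappings concrete, here is a minimal Python sketch. It is not Gephi’s code or API: the function names `partition_colors` and `rank_sizes`, the evenly spaced hue palette, and the linear size scale are all assumptions chosen for illustration.

```python
import colorsys


def partition_colors(values):
    """Assign one distinct RGB color to each distinct qualitative value.

    Hypothetical helper: hues are spread evenly around the color wheel.
    """
    categories = sorted(set(values))
    palette = {}
    for i, category in enumerate(categories):
        r, g, b = colorsys.hsv_to_rgb(i / len(categories), 0.8, 0.9)
        palette[category] = (round(r * 255), round(g * 255), round(b * 255))
    return palette


def rank_sizes(values, min_size=4.0, max_size=20.0):
    """Map quantitative values linearly onto [min_size, max_size].

    Hypothetical helper: expects a non-empty list of numbers.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so every element gets the middle size.
        return [(min_size + max_size) / 2 for _ in values]
    scale = (max_size - min_size) / (hi - lo)
    return [min_size + (v - lo) * scale for v in values]
```

A partition answers "which class does this node belong to?", so it only needs colors that are easy to tell apart. A ranking answers "how much?", so it needs an ordered scale such as size or a color gradient.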
A key aspect in graph exploration is the layout of nodes and edges. As it is usually unclear what will be the best layout for a given graph, Gephi offers various layout algorithms to choose from. While a layout is being computed, the main view constantly updates itself to provide feedback on the progress made. A big plus is that users can interrupt the layout algorithm once they deem the result to be OK or if they find that it might be more suitable to use the current result as the initial setup for another algorithm. This way users can easily tune the layout to fit the graph and their particular needs. Users may put the finishing touches to the layout by moving nodes manually in the main view.
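The idea of an interruptible layout whose intermediate result can seed another algorithm can be sketched in a few lines of Python. This is an illustration under my own assumptions, not one of Gephi’s layout algorithms: `spring_step` is a deliberately simple relaxation that only pulls edge lengths toward a rest length, and `run_layout` and `should_stop` are hypothetical names.

```python
def spring_step(pos, edges, rest=1.0, rate=0.1):
    """One relaxation step: nudge every edge toward the rest length."""
    new = {node: list(p) for node, p in pos.items()}
    for a, b in edges:
        dx = pos[b][0] - pos[a][0]
        dy = pos[b][1] - pos[a][1]
        dist = (dx * dx + dy * dy) ** 0.5 or 1e-9  # avoid division by zero
        # Positive when the edge is too long, negative when it is too short.
        f = rate * (dist - rest) / dist
        new[a][0] += f * dx
        new[a][1] += f * dy
        new[b][0] -= f * dx
        new[b][1] -= f * dy
    return {node: tuple(p) for node, p in new.items()}


def run_layout(pos, edges, step, should_stop, max_iter=1000):
    """Yield the layout after every iteration until should_stop(i, pos) is true.

    The caller can redraw on each yielded layout, and can pass the last one
    as the starting positions of a different step function.
    """
    for i in range(max_iter):
        pos = step(pos, edges)
        yield pos
        if should_stop(i, pos):
            return
```

Because every intermediate layout is handed back to the caller, stopping early and continuing with another algorithm amounts to calling `run_layout` again with the last positions and a different `step`.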
Once a suitable visual representation has been created, the final step is to export nice pictures of the graph. To this end, Gephi follows the philosophy of providing a dedicated export interface with many options to create high-quality printouts.
People who have worked with larger graphs know that some computations on graphs (including layout computation) are quite complex and take some time. While other systems are blocked during computation and in the best case provide a progress bar, Gephi is different: long-running calculations run concurrently with the main application. From my point of view, this is one of Gephi’s strongest points: the system does not block during costly computations. The benefit for the users is that they can always interact, for instance to initiate some other computations or to cancel running ones when they recognize that a re-parameterization would yield better results.
Concurrency is also how Gephi offers the computation of statistics about the graph. Currently, Gephi supports a variety of classic graph statistics, including degree distribution, number of connected components, and others. Based on data attributes and computed statistics, the graph can be filtered to reduce nodes and edges to those that fulfill the filter criteria. In a dynamic filtering UI, several filters can be combined using drag and drop, and thresholds can be manipulated easily, for instance via sliders. Besides using filtering for data reduction, Gephi also provides basic support for graph clustering. However, the currently implemented MCL algorithm is still experimental. It is also possible to manually group nodes to build a hierarchical structure on top of the visualized graph, yet this is quite cumbersome for larger graphs. Additional tools are needed to support the user in creating a navigable hierarchy on top of a graph. Configurable clustering pipelines that combine several strategies for clustering (e.g., based on attributes or based on bi-connected components), in addition to a clustering wizard user interface, would be helpful.
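For readers who have not built such a system, the general pattern behind a non-blocking, cancellable computation looks roughly like the following Python sketch. It is only an illustration of the pattern, not Gephi’s implementation: the class `BackgroundTask` and its methods are hypothetical names.

```python
import threading
import time


class BackgroundTask:
    """Run step(i) repeatedly on a worker thread until done or cancelled."""

    def __init__(self, step, max_steps):
        self._step = step
        self._max_steps = max_steps
        self._cancel = threading.Event()
        self._lock = threading.Lock()
        self._progress = 0
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def cancel(self):
        # Only sets a flag; the worker checks it between steps.
        self._cancel.set()

    def join(self, timeout=None):
        self._thread.join(timeout)

    def is_running(self):
        return self._thread.is_alive()

    @property
    def progress(self):
        with self._lock:
            return self._progress

    def _run(self):
        for i in range(self._max_steps):
            if self._cancel.is_set():
                return
            self._step(i)
            with self._lock:
                self._progress = i + 1


if __name__ == "__main__":
    task = BackgroundTask(lambda i: time.sleep(0.001), max_steps=100_000)
    task.start()
    time.sleep(0.05)   # the caller stays free to do other work here
    task.cancel()
    task.join()
    print("steps completed before cancel:", task.progress)
```

The important design choice is cooperative cancellation: the worker is never killed from outside, it simply stops at the next step boundary, so the data it was working on stays in a consistent state.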
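Combining several filters, as described above, boils down to keeping the nodes that satisfy all criteria and the edges whose endpoints both survive. The Python sketch below shows this under my own assumptions: `filter_graph`, `at_least`, and `equals` are hypothetical helpers, and a graph is represented as a plain dictionary of node attributes plus a list of edge pairs rather than Gephi’s data structures.

```python
def at_least(attr, threshold):
    """Build a filter that keeps nodes whose attribute is >= threshold."""
    return lambda attrs: attrs.get(attr, 0) >= threshold


def equals(attr, value):
    """Build a filter that keeps nodes whose attribute equals value."""
    return lambda attrs: attrs.get(attr) == value


def filter_graph(nodes, edges, *predicates):
    """Keep nodes passing every predicate and edges between kept nodes.

    nodes maps a node id to its attribute dictionary; edges is a list of
    (source, target) pairs.
    """
    kept = {n for n, attrs in nodes.items()
            if all(p(attrs) for p in predicates)}
    kept_edges = [(a, b) for a, b in edges if a in kept and b in kept]
    return kept, kept_edges
```

Moving a threshold slider then corresponds to rebuilding one predicate, for example `at_least("degree", 3)`, and calling `filter_graph` again.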
In summary, I see much potential in Gephi; the overall shape of the system impressed me – me as a user. I personally found it easy to work with Gephi and to explore some of my own data sets and some provided on Gephi’s website. Given that the version I’ve worked with is 0.7 alpha, there is also much room for improvement. First of all, I would like to mention the navigation of the graph. The main view provides just basic zoom and pan navigation, which is even imprecise in some situations. Navigation tools like those provided in Google Earth and navigation based on paths through a graph would be really helpful. Moreover, I was missing the concept of linking between views: selecting an element (node or edge) in one view should highlight that element in all other views. Right now this is not really an issue, as the number of views seen in parallel is quite low. But once additional views are needed, for instance to focus on data attributes in a Parallel Coordinates Plot or to visualize the cluster hierarchy in a dedicated view, or when one and the same graph is shown in parallel in two or more main views for comparing different analytic results, linking will be crucial for the user experience. These things are not too complex, though, and should be easy to integrate in future versions of Gephi. Another aspect concerns highlighting in the main view: instead of marking the selected node, Gephi fades out all non-selected nodes to focus on the selected one. This implies rather big visual changes, because all nodes but one change their appearance whenever a single node is selected or deselected.
The Developer’s Perspective
Now let me switch to the developer’s view. Gephi is open source software, so everybody can participate in improving the system or adapt it to personal or business needs. Gephi seems to be very well designed on the back-end. The project is based on the NetBeans platform and the Java language. It is subdivided into a number of modules that define several APIs and SPIs and that provide implementations of these interfaces. Thanks to the modular structure, Gephi can be extended quite easily. The best way to do so is to implement plugins. Plugins can be used, for instance, to add further layout or clustering algorithms, statistical computations, filter components, or export methods. The modular structure also allows for using only specific parts of the Gephi project in one’s own projects. The Gephi Toolkit is a good example. It is not an end-user desktop application, but a class library that provides all the functionality of Gephi to those who want to reuse Gephi’s functionality and data structures in different ways.
As I’ve mentioned in the user perspective, the way Gephi deals with long-running computations is a big plus. Given that multi-threading has been inherent in the system from the very beginning and is manifested at the system’s core, I sincerely hope – no, I’m quite sure – that Gephi will not run into all the problems that are likely to occur when multi-threading is integrated into an existing single-threaded system, as I have experienced myself. I also conjecture that others will find it much easier to implement concurrent, non-blocking extensions of the system simply by following the way existing code handles things in Gephi.
As Gephi is split up into many different modules, it took me a while to get accustomed to the system and to learn which functionality can be found in which module. But I have to add that I had no prior experience in NetBeans platform development and the module concept that is used there. I also found that the code documentation could be improved in several parts of Gephi’s sources. On the other hand, the Gephi website provides informative wiki pages with various examples and tutorials.
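The split into APIs, SPIs, and plugin implementations follows a general pattern that can be shown in miniature. The Python sketch below is only an analogy and does not reproduce Gephi’s Java interfaces or the NetBeans module system: `LayoutPlugin`, `register`, `create`, and `GridLayout` are hypothetical names.

```python
import math
from abc import ABC, abstractmethod


class LayoutPlugin(ABC):
    """Hypothetical service-provider interface that every layout implements."""

    name = ""

    @abstractmethod
    def apply(self, nodes):
        """Return a dictionary mapping each node to an (x, y) position."""


_REGISTRY = {}


def register(plugin_cls):
    """Class decorator: make a plugin discoverable under its name."""
    _REGISTRY[plugin_cls.name] = plugin_cls
    return plugin_cls


def create(name):
    """Look up a registered plugin by name and instantiate it."""
    return _REGISTRY[name]()


@register
class GridLayout(LayoutPlugin):
    """Example plugin: place nodes row by row on a square grid."""

    name = "grid"

    def apply(self, nodes):
        cols = max(1, math.ceil(math.sqrt(len(nodes))))
        return {node: (i % cols, i // cols) for i, node in enumerate(nodes)}
```

The host application only ever calls `create("grid").apply(...)`; it does not need to know which module contributed the implementation, which is what makes adding a new layout a matter of shipping one more class.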
My view from the developer’s perspective can be summarized as the following pros and cons:
The Scientist’s Perspective
As a scientist I’m not so much interested in developing fully fledged end-user software, but in developing solutions to scientific questions and in publishing the results. A difficulty in interactive visualization is that one usually needs a broad basis of fundamental functionality to be able to develop such solutions. Previous attempts at establishing a common infrastructure for interactive data exploration made notable progress, but eventually did not fully succeed or are no longer actively maintained. This is because a single researcher usually does not have the time to do decent research and at the same time maintain a larger software project.
I personally feel that Gephi can become such a fundamental infrastructure. Maintained by an active community, the system allows researchers to focus on solutions in the form of plugins, while they can utilize the functionality that the system provides. Visualization researchers will be happy if they can simply plug in new visualization techniques as additional views, test new layout algorithms, and experiment with new clustering methods. Moreover, new solutions can be easily disseminated to real users in the community. This might prove beneficial when it comes to acquiring early user feedback or when more formal user evaluation is needed prior to publishing new techniques and concepts.
A big issue in visualization research is visual analytics, that is, the combination of analytical, interactive, and visual means to facilitate making sense of large volumes of data. In terms of analytic means, a goal is to break analytic black boxes and make analysis algorithms interactively steerable. With the architecture of Gephi, where parameterizable algorithms run concurrently and provide feedback in the form of intermediate results, I believe this goal can be reached in the future. A thing that I’m curious about is whether it is also possible to come up with concepts that allow for plugging in new interaction techniques. As interaction is usually quite tightly bound to a view, I wonder if interaction could be implemented as independent plugins as well, and if novel interaction concepts (e.g., touch interaction) will be supported in the future. Furthermore, aspects of interactive collaboration of multiple users working to solve a common analysis problem could be of interest. A question related to the visual side is whether it is possible to use Gephi with different displays and display environments such as tabletop displays, display walls, smart phones, or multi-display environments.
A facet of graph visualization that I did not mention in the user’s perspective, as I felt it was better suited here, is dealing with dynamically changing graphs. Visualization of time-varying graphs is a hot research topic, and Gephi is about to face this challenge. There is preliminary support for exploring time-dependent graphs via a time slider. But there is more to this than just browsing in time. Concepts have to be integrated to support easy comparison of multiple snapshots of a graph and to highlight significant changes in the development of a graph’s history.
Let me try to put my thoughts into a pros and cons list:
Since I got my hands on Gephi, I’ve been infected. Maybe I’m dazzled by the beautiful demo video or the nice pictures that have been generated using Gephi, but in my opinion Gephi has the potential to become a big player in interactive visual graph exploration and analysis. From all the perspectives that I’ve taken I see many positive things – and plenty of room for improvements or additional features. I do hope that the people behind Gephi will continue their work to the benefit of all users, developers, and researchers.
There are many other systems and frameworks out there that do a great job in interactive graph visualization or in supporting it as a toolkit. I would like to give credit to these systems, because they can be the source of many ideas and much inspiration:
- Tulip – A great visualization system
- Cytoscape – Sophisticated network visualization for bio-medicine
- GUESS – Interactive graph exploration at its best
- Pajek – Graph analysis at its best
- prefuse – The well known information visualization framework
- JUNG – A library for graph structures and algorithms
- JGraphT – Another library for graphs