Gephi boosts its performance with new “GraphStore” core

Gephi is a graph visualization and analysis platform – the entire tool revolves around the graph the user is manipulating. All modules (e.g. filter, ranking, layout, etc.) touch the graph in some way or another, and everything happens in real time, reflected in the visualization. It's therefore extremely important to rely on a robust and fast underlying graph structure. As explained in this article, we decided in 2013 to rewrite the graph structure and started the GraphStore project. Today, this project is mostly complete, and it's time to look at some of the benefits GraphStore brings to Gephi as its 0.9 release approaches.

Performance is critical when analyzing graphs. A lot can be done to optimize how graphs are represented and accessed in code, but it remains a hard problem. The first versions of Gephi didn't always shine in that area: graphs used a lot of memory, and some operations such as filtering were slow on large networks. We learnt a lot, though, and when the time came to start from scratch we knew what would move the needle. Compared to the previous implementation, GraphStore uses simpler data structures (e.g. more arrays, fewer maps) and cache-friendly collections to make common graph operations faster. Along the way, we relied on many micro-benchmarks to understand what was expensive and what was not. As often with Java, this can lead to surprises, but it's a necessary process to build a world-class graph library.

Benchmark

We wanted to compare Gephi 0.8.2 and Gephi 0.9 (development version), so we created a benchmark to test the most common graph operations. Here is what we found. The table below shows the relative improvement between the two versions. For instance, “2X” means that the operation completes twice as fast. A benchmarking utility was used to guarantee measurement precision, and each scenario was performed at least 20 times, and up to 600 times in some cases. We used two classic graphs, one small (1.5K nodes, 19K edges) and one medium (83K nodes, 68K edges). Larger graphs may be evaluated in a future blog article.

Benchmark                 SMALL (n=1490, e=19025)   MEDIUM (n=82670, e=67851)
Node Iteration            23.0x                     34.6x
Edge Iteration            40.1x                     109.4x
Node Lookup               1.6x                      2.1x
Edge Lookup               1.2x                      2.3x
Get Edge                  1.1x                      1.2x
Get Degree                2.5x                      2.3x
Get Neighbors             3.4x                      1.2x
Set Attributes            2.3x                      0.1x
Get Attributes            3.3x                      4.0x
Add Nodes                 6.2x                      5.7x
Add & Remove Nodes        1.4x                      2.9x
Add Edges                 7.7x                      3.8x
Add & Remove Edges        3.3x                      1.8x
Create View               2851.0x                   4762.3x
Iterate Nodes In View     2.7x                      1.5x
Iterate Edges In View     11.6x                     7.3x
Save Project              2.4x                      1.7x
Load Project              0.6x                      0.6x
Project File Size         1.9x                      1.5x

These benchmarks show pretty remarkable improvements in common operations, especially read operations such as node or edge iteration. For instance, on average it takes 40 to 100 times less CPU time to read all the edges in the graph. Although this benchmark focuses on low-level graph operations, it will bring material improvements to user-level features such as filtering or layout. The way GraphStore creates views differs from what we were doing before and no longer requires a deep graph copy, which explains the large difference. Finally, only setting attributes is significantly slower, which can be explained by the introduction of inverted indices that are updated when attributes are set.

And what about memory usage? Saving memory has been one of our obsessions, and there's good news to report on that front as well. Below is a quick comparison between Gephi 0.8.2 and Gephi 0.9 for the same medium graph as above.

Benchmark                        Gephi 0.8.2   Gephi 0.9   Improvement
Simple graph                     115MB         52MB        2.2X
Graph with 5 attribute columns   186MB         55MB        3.4X

This benchmark shows a clear reduction of memory usage in Gephi's next version. How much? It's hard to say, as it really depends on the graph, but the denser the graph (i.e. the more edges) and the more attributes it has, the more memory is saved, as significant improvements have been made in these areas. Dynamic graphs (i.e. graphs whose topology or attributes change over time) will also see a big boost, as we've redesigned this part from scratch.

What’s next?

All of the GraphStore project's benefits are included in the upcoming 0.9 release, and that's the most important part. However, the work doesn't end there: there are many more features and performance optimizations that can be added.

We also count on the community's help to start collaborating with us on the GraphStore library – calling all database and performance experts. GraphStore will continue to live as an all-purpose Java graph library, released under the Apache 2.0 license and independent from Gephi (i.e. Gephi uses GraphStore but not the opposite). We hope to see it used in other projects in the near future.


GraphStore API, represented as a graph

Improving the Gephi User Experience

This is an effort to rethink the design of Gephi authored by Donato Ricci, co-founder of Density Design and senior designer at the Sciences Po médialab in Paris, and me, creator of Gephi and an engineer at the same lab (note that I am not Mathieu Bastian, our lead developer and actual powerhouse of the project for the past 10 years).

While this text presents possible improvements and practical solutions, it does not address practical considerations of available labor. Also, be aware that this is not a formal roadmap for future releases but rather a way to open the current state of our reflections for brainstorming. So feel free to share your ideas and comments with us.

There are five main categories that structure the improvements that we currently envision:

  • Design strategy: Ensure that a coherent design philosophy is applied across the entirety of the project
  • User interface: Identify and correct user-facing errors
  • Network-focus: Re-focus design and architecture around the network’s position of primary importance
  • Filling in the Gaps: Providing expected functionality
  • Miscellany: Other minor issues

In addition, we have drafted a UI mockup illustrating some of our propositions.

Design strategy

Gephi was built by engineers without a comprehensive design strategy. This situation is fairly common: engineers approach design in an ad-hoc fashion, learning by trial and error and through casual user observation, but without a formal user-testing protocol. When such a tool succeeds, it is mostly because its utility triumphs over the pain of use. Gephi in its current state is an embodiment of this phenomenon. Some computer scientists may find it simple, partly because using terrible interfaces is a part of their job, and geeks of a masochistic tendency may love the tool as a result of digital Stockholm Syndrome, but the bulk of users that could benefit from Gephi find it confusing and opaque. In our defense, developing desktop applications is heavily constrained, and the Java technology did not help us overcome this difficulty. What could a designer do to alleviate this situation? Apply a strategy.

A designer does more than treat the symptoms of poor usability; he or she approaches user experience from its fundamentals. Improving Gephi requires rethinking some of its longest-standing features from a new standpoint. A design strategy is the solid foundation upon which we build both a satisfying user experience and the underlying software architecture.

Our design strategy fits in five basic points: obtaining substantial and organized user feedback, giving Gephi a clear workflow, implementing a facet-oriented interface layout, reordering panels from the user standpoint, and removing unnecessary features. Each point is explained in further detail along with practical guidelines for implementing potential solutions in the future.

User feedback

We cannot build a sustainable user interface without a quantitative measure of user activity. These data are necessary to support and validate design choices. One approach to obtain this information is to log and collect feedback about interface usage.

An optional logger could be implemented in Gephi to allow users to opt-in to the collection of logs in order to improve the software. Data harvesting can be done as a campaign: for example, we may ask some users to activate it for one month to evaluate the usage of a new interface paradigm that we are testing.

A clear workflow

Users need a clear and visible path to start with Gephi, in particular when opening a new file. We need to remove information to let users focus on what is important.

Gephi involves not only the software itself, but also the installer, website and documentation. Our ultimate goal is to make the entry process as simple as possible by coordinating these different elements. We begin by focusing here on just the software itself. We propose to consider that there are only two proper ways to enter Gephi:

  • Opening a file (constructing a network from a pre-generated file)
  • Connecting to a data source (embedded scraper or API connection)

We also need to clarify the roles of the “open” and “import” functions:

  • If the user has a file and needs to see it in Gephi, then “Open” is the right answer
  • If the user has an external data source he or she wishes to connect to, there is a distinct menu option for this function

Use case: we have observed that some users try to get into Gephi with a table of nodes and a table of links, and do not succeed in finding the right path. The problem is that it is not explicit that one must create an empty document, go to the data table, and then import the tables. Since the pattern is “I have files and I want to see them in Gephi”, the answer should be under the “Open” menu item.

Facet-oriented interface layout

Rethinking the overall design has the virtue of allowing the interface to be reorganized from a user-centric perspective. The current interface relies on the panel system provided by the Netbeans Platform, which provides some beneficial properties for design. We were inspired by Ben Shneiderman's motto of “overview first, zoom and filter, then details-on-demand”, and it has been quite successful. However, the different views are not articulated in a coherent way, and features sometimes struggle to find the right place in terms of visibility.

We propose three simple guidelines for a better organization of the panels:

  • The global hierarchy of containers should reflect the generality of the features
  • Some panels are not facet-dependent: they should not change with the facet
  • The network should occupy a single place whatever its facet, since it is always the same object

Panels guidelines

These guidelines have two consequences. First: facet-dependent panels should be contained inside facet-independent panels; which is to say that there is a single container for all facet-dependent features. Second: the facet selector (currently the three tabs on top) has to be inside this container.

We illustrate this with a comparison between a representation of the current layout and a new simplified structure.

Web

Reordering panels from the user standpoint

A part of our design strategy is to reduce visual clutter by grouping panels that are not used simultaneously. Though it may not be intuitive, we prioritize separating panels that work well together over grouping similar features. For instance, the following panels should be placed in different groups so that they can be used in combination:

  • Filters + Layout
  • Filters + Partition or Ranking
  • Timeline + almost anything else

With the current panels there are at least three obvious groups: one with filters, another with layout, and the third with the timeline. Generic contextual information is a fourth possible group, but could be placed in a non-intrusive location like the footer. Some panels, like statistics, could theoretically be at home in any group or even as a separate window that could be invoked from a menu.

Collapsing panels concept

Panels and groups of panels create two levels, so the “window” menu should have two levels too.

Removing unnecessary features

Reducing complexity can also be accomplished by removing features. We have detected at least one clear candidate for removal, but we may find more unnecessary complications to remove.

The “preview” panel of Gephi has been increasingly simplified over iterative updates. The goal of this feature is to provide a quick way to export cartographies. Users with competence in design tend to rely on third-party tools that provide finer-grained control over the visualization, like Illustrator and Scriptographer. Thus, the focus in Gephi is to provide a quick way to export images that can be manipulated in other tools.

We propose to further streamline the preview functionality by removing some advanced label features: they are infrequently used, complicated, and at times internally inconsistent with other preview settings. Furthermore, it is not necessary to facilitate changes to features like label and node size and color when such adjustments can be made much more easily using other tools.

User Interface

Donato Ricci has identified various flaws in the Gephi UI. Fixing them is a priority for the future.

User-centric features: reordering workflow

Users think in terms of the results they want to obtain. They have an action in mind and search for how to do it. By displaying features according to their result, we can both improve user orientation and reduce the tool's learning curve. A few examples follow.

We propose to aggregate Partition and Ranking under the more accessible term “Appearance”, and to reverse the order of what is asked of the user. The current interface is organized in the following way: if the user has metadata that can rank the nodes, then the user can visualize it using different attributes like color and size. The new interface inverts this approach: if the user wants to color nodes, then he or she chooses which metadata to use. The panel may progress like a wizard to reduce cognitive load by drawing attention only to information that is necessary for a given step.

Unified panel appearance

Collapsing advanced layout settings

The current design of Gephi does not respect the general principle of drawing attention to information that is commonly used while obscuring information that is infrequently used.

Tools of the Overview: many problems to fix

The small tool buttons on the left side of the overview panel have a number of problems including:

  • Confusing icons that do not easily communicate the use of tools
  • Indistinct icons that do not sufficiently distinguish between different tools
  • Missing tools that are commonly expected

These issues are compounded by evidence that the tools themselves do not provide sufficient utility for common use.

We propose to alleviate some of these shortcomings by putting most of these tools in a collapsible panel, and by dedicating a normal panel to the settings of each tool. We also propose to implement a default tool cursor that draws on common mouse-usage paradigms to provide intuitive functionality to users:

  • A click-drag starting on the background (or an edge) of the view makes a rectangle selection
  • A click-drag starting on a node moves this node
  • A click on a node selects the node, the shift key is used for multiple selection
  • A click on the background deselects
  • A set of meta-keys changes the click function, for instance the spacebar to switch to the view-panning tool (i.e. the hand in Photoshop)
  • The secondary-click works the same

Fixing highlight colors

Highlighting works by tweaking colors so that some nodes get more contrast than others. The contrast should depend on the background color, but this is not how it is currently implemented. At the very least, the following should be done:

  • White background: highlighted nodes darker and other nodes lighter
  • Black background: highlighted nodes lighter and other nodes darker

Network-focus

As a network analysis tool, the network itself plays an obvious central role in Gephi. We have explored different ways to incorporate the network into the software’s presentation and have developed some suggestions for modifications that would increase interface coherency.

A different layout for the panels: network as background

In this approach, the network is contained by a “background sheet” and floating panels support functions. Such a philosophy has been successfully implemented in other systems like Photoshop or Google Maps. Using the root panel for the network, like grouping facet-dependent panels, fosters the feeling that we always deal with the same object, the network. This operates on the metaphor that we are always manipulating a primary canvas that consists of the network to be analyzed.

Network as background

Statistics as an invoked panel or window

Statistics tend to be used on demand, and thus do not need to be displayed permanently. Rather, a discrete menu or button could invoke the statistics panel when needed. Removing this visual information leaves more room to focus on what is important, i.e. the network.

Workspaces: more visible, on top

The workspaces need more attention. We propose to show them as tabs on the top of Gephi. It is more natural to have the workspaces above the facet selector in the hierarchy of panels. This is consistent with the prevalence of the “tab” paradigm in the browser space.

Workspaces-on-top

Filling in the Gaps: Providing Expected Functionality

In addition to the different aspects listed above, users need some well-known common features such as an “undo” function, even if they are complex to implement.

History and undo: feasible if limited to network structure

A visible trace of previous steps, like a proverbial breadcrumb trail, provides users with a sense of orientation and confidence when exploring and manipulating data in a speculative fashion. This also accelerates the learning process by alleviating cognitive load by not forcing users to have to remember a series of unfamiliar steps. This works in tandem with an “undo” feature, which facilitates experimentation without fear of permanently corrupting data.

History and undo are complex to implement and burden the development of plugins and modules, as these functions tend to be deeply embedded in a piece of software's architecture. This partly explains why they are not currently available in Gephi. However, a prudent approach in Gephi would be to limit the recording and reversal of changes to the structure of the network: nodes and edges, and their attributes (including color, size and coordinates), but not the state of panels such as filters or statistics.

An initial approach would be to cover only a minimal set of modifications of the network structure. The history would then contain information about the type of the modification, but not its exact content nor the way it was done (manually, filter, data table…). For instance:

  • Modifying attribute X for node N / n nodes / all nodes
  • Modifying color / size / position for node N / n nodes / all nodes
  • Adding / removing: node N / n nodes / all nodes
  • Adding / removing: node attribute X / n attributes / all attributes
  • …and the same for links

The history would not include operations such as exporting files, taking screenshots, modification of views, changes to settings, or other changes that do not directly affect the structure of the network.

Protecting irreversible operations

Some operations are irreversible: removing nodes, edges and attributes (and possibly more). Because these operations are definitive and may cause the loss of a certain amount of work, they should be protected. A classical solution is to ask for confirmation before any definitive operation. This is a simple guideline, but the result is quite user-hostile. We propose a better solution, as implemented in Photoshop: once an irreversible operation has been performed, when the user tries to save the network, the “Save as…” window appears instead and proposes a different name (with a number suffix or “Copy of X”).

Miscellany

A few additional points deserve to be listed, in no particular order.

Manual versioning

A basic versioning feature would be appreciated: just the option to save while incrementing, or adding, a number suffix:

  • “My Network.gexf” is saved as “My Network 01.gexf”
  • “My Network 11.gexf” is saved as “My Network 12.gexf”

A common shortcut for this is Ctrl+Alt+S.
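The incrementing behavior described above could be sketched as follows; the class and method names are illustrative, not actual Gephi code:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the proposed "save with version suffix" behavior.
public class VersionedName {

    // Matches "<base> <digits><.extension>" at the end of a file name
    private static final Pattern SUFFIX = Pattern.compile("^(.*) (\\d+)(\\.[^.]+)$");

    public static String next(String fileName) {
        Matcher m = SUFFIX.matcher(fileName);
        if (m.matches()) {
            // Existing suffix: increment it, preserving zero-padding width
            int n = Integer.parseInt(m.group(2)) + 1;
            String padded = String.format("%0" + m.group(2).length() + "d", n);
            return m.group(1) + " " + padded + m.group(3);
        }
        // No suffix yet: insert " 01" before the extension
        int dot = fileName.lastIndexOf('.');
        return fileName.substring(0, dot) + " 01" + fileName.substring(dot);
    }

    public static void main(String[] args) {
        System.out.println(next("My Network.gexf"));    // My Network 01.gexf
        System.out.println(next("My Network 11.gexf")); // My Network 12.gexf
    }
}
```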

Generalizing zoom options (more internal consistency)

We can currently set how the zoom affects text labels. The same feature would be useful for edges, for instance to keep 1-pixel lines whatever the zoom, as well as for nodes, for instance to keep small points whatever the zoom.

Generalized zoom options
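The idea of a zoom-invariant size boils down to dividing the desired on-screen size by the zoom factor before drawing in world coordinates. A minimal sketch, with illustrative names:

```java
// Sketch of zoom-invariant sizing: keeping an edge at a constant
// on-screen thickness regardless of the zoom level.
public class ZoomInvariant {

    // Width to use in world coordinates so the edge renders at
    // pixelWidth on screen (screen size = world size * zoom).
    public static double worldWidth(double pixelWidth, double zoom) {
        return pixelWidth / zoom;
    }

    public static void main(String[] args) {
        // At any zoom level the resulting on-screen width stays at 1 pixel
        for (double zoom : new double[]{0.5, 1.0, 4.0}) {
            System.out.println(worldWidth(1.0, zoom) * zoom); // 1.0
        }
    }
}
```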

Size nodes according to area

Our eyes perceive areas. Applying a ranking to the diameter of nodes is less intuitive than applying it to their area. We propose to offer an option to customize the ranking by either diameter or area, with area as the default.
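Ranking by area means interpolating the value in area space and converting back to a radius, so the radius grows with the square root of the value. A sketch, with illustrative names:

```java
// Sketch: mapping a ranking value to node size by area rather than diameter.
public class AreaSizing {

    public static double radiusByArea(double value, double minRadius, double maxRadius,
                                      double minValue, double maxValue) {
        // Normalize the value to [0, 1]
        double t = (value - minValue) / (maxValue - minValue);
        // Interpolate in area space...
        double minArea = Math.PI * minRadius * minRadius;
        double maxArea = Math.PI * maxRadius * maxRadius;
        double area = minArea + t * (maxArea - minArea);
        // ...then convert the area back to a radius
        return Math.sqrt(area / Math.PI);
    }
}
```

Note that the midpoint value maps to a radius of about 7.1 when the range is 1–10, not 5.5 as linear (diameter) interpolation would give: half the value really corresponds to half the ink on screen.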

Removing unnecessary settings about labels

Node labels and edge labels should help the user identify nodes. However, using the color or size of labels to visualize attributes is confusing. Gephi presently contains settings to manipulate labels in this way; these settings should be removed and replaced with a simpler interface.

UI Mockup sample (work in progress)

We present here a possible approach to integrate some of the different suggestions made in this document. Consider the following image as a way to help imagine the future of the Gephi user experience.

MockupA-01

As we stated earlier, the purpose of this document is to open up the floor to brainstorming ideas about improving the Gephi UX. Please share your ideas in the comments!

PS: Thanks to Niranjan Sivakumar for his excellent proof-reading 🙂

Rebuilding Gephi’s core for the 0.9 version

This is the first article about the future Gephi 0.9 version. Our objective is to prepare the ground for a future 1.0 release and to focus on solving some of the most difficult problems. It all starts with the core of Gephi, and today we're giving a preview of the upcoming changes in that area. In fact, we're rewriting the core modules from scratch to improve performance and stability and to add new features. The core modules represent and store the graph and attributes in memory so they are available to the rest of the application. Rewriting Gephi's core is like replacing the engine of a truck: it involves adapting a lot of interconnected pieces. Gephi's current graph structure engine was designed in 2009 and hasn't changed much across multiple releases. Although it works, it doesn't have the level of quality we want for Gephi 1.0 and needs to be overhauled. The aim is to complete the new implementation and integrate it in the 0.9 version.

In November 2012, we started to develop a completely new in-memory graph structure implementation for Gephi based on what we’ve learnt over the years and our desire to design a solution that will last. The project code-name is GraphStore and we focus on four main things:

  • Performance: The graph structure is so important to the rest of the application that it has to be fast and memory efficient.
  • Stability: The new code will be the most heavily unit-tested in the history of Gephi.
  • Simplicity: The Graph API should be documented and easy to use for developers.
  • Openness: If possible, we want GraphStore to be used in other projects and keep the code free of Gephi-specific concepts.

Gephi is known to use a large amount of memory, especially for very large networks. We want to challenge ourselves and tackle this issue by redesigning the way graphs are encoded and stored. Besides memory usage, we carefully analyzed possible solutions to improve read/write performance and optimize throughput. Stability and simplicity are like food and shelter: whatever we try to do at Gephi should be simple to use and stable. As we move towards a 1.0 version, we're putting more and more effort into testing and code quality.

Since November 2012, we have been working on GraphStore separately from Gephi's codebase and will start the integration fairly soon. The Graph API is very similar to the existing API. However, it isn't entirely compatible: several core things have changed, such as attributes, views and dynamic networks, and this will require a lot of work in some modules. On the other hand, because the GraphStore code is decoupled, it could be leveraged in other projects. For instance, it could serve as a Blueprints implementation as an alternative to TinkerGraph.

Graph structure

A graph (also called a network) is a pair of a set of nodes and a set of edges. Edges can be undirected, or directed if the direction of the relation matters. Edges may have weights to represent a value attached to them, like the strength of a connection or a flow capacity. Edges may also connect a node to itself (i.e. self-loops). Gephi currently supports these features, but they are not sufficient to describe the variety of problems graphs can help with. Multigraphs permit several relationships between the same pair of nodes and are commonly used, for instance, to represent RDF graphs. Multigraphs with properties (i.e. the ability to attach any property to nodes and edges) have recently become the standard representation for graph databases.

The next version of Gephi will support multigraphs and therefore allow multiple edges between nodes to be imported. The rollout will be done in two phases. The first phase is to allow this new type of graph to be imported, filtered and exported. We will update the importers and add new options to support these graphs. The second phase is to update the visualization and the way multiple edges between nodes are displayed.

Hierarchical graphs

Since the 0.7 version released in 2009, Gephi has supported hierarchical graphs. Hierarchical graphs let the user group or ungroup nodes so that the graph forms a tree. Nodes which contain other nodes are called meta-nodes, and their edges are collapsed into meta-edges. Groups obtained from a clustering algorithm (e.g. modularity) can also easily be collapsed into meta-nodes in order to study the network at a higher level. We initially recognized the potential of this idea for network analysis and developed a hierarchy-enabled data structure. However, we realized we didn't completely fulfill the vision, as we never provided all the tools needed to fully explore and manage hierarchical networks. Although the data structure allows it, the software still lacks many features to really make hierarchical networks explorable.

Recently, we have been more focused on networks that change over time, and we plan to continue in that direction. In past years, users have shown steady and continuous interest in dynamic networks, while we haven't really seen a strong interest in hierarchical networks. Therefore, we propose to remove this feature from the next releases. On the developer side, cutting this feature will greatly simplify the code and improve performance.

Dynamic networks

Networks that change over time are some of the most interesting to visualize and analyze. We have heavily invested in supporting this type of network, for instance by developing the Timeline component. However, dynamic graph support was added after the current graph structure implementation was conceived and therefore remains suboptimal and difficult to scale. Now that we have enough hindsight, we can rethink how this should be done and make it simpler.

One pain point is the way we decided to represent time. Essentially, there are two ways to represent time for a particular element in a graph: timestamps or intervals. Timestamps are a list of points in time at which the element exists, while intervals have a beginning and an end. For multiple reasons, we thought intervals would be easier to manipulate and more efficient than a (possibly very large) set of timestamps. By talking to our users, however, we found that intervals are rarely used in real-world data. On the code side, we also found that they make things much more complex and not that efficient in the end.

In future versions, we’ll remove support for intervals and add timestamps instead. We considered supporting both intervals and timestamps but decided that it would add too much complexity and confusion.
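A timestamp-based element could be as simple as a sorted primitive array per node or edge, queried with binary search. This is only a sketch of the idea described above, not the GraphStore API:

```java
import java.util.Arrays;

// Sketch: each node/edge keeps a sorted array of timestamps at which it exists.
public class TimestampSet {

    private double[] timestamps = new double[0];

    public void add(double t) {
        int i = Arrays.binarySearch(timestamps, t);
        if (i >= 0) {
            return; // timestamp already present
        }
        int insert = -i - 1; // insertion point keeps the array sorted
        double[] next = new double[timestamps.length + 1];
        System.arraycopy(timestamps, 0, next, 0, insert);
        next[insert] = t;
        System.arraycopy(timestamps, insert, next, insert + 1, timestamps.length - insert);
        timestamps = next;
    }

    // Does the element exist anywhere in [from, to]?
    public boolean existsIn(double from, double to) {
        int i = Arrays.binarySearch(timestamps, from);
        if (i >= 0) {
            return true; // exact hit on "from"
        }
        int next = -i - 1; // first timestamp greater than "from"
        return next < timestamps.length && timestamps[next] <= to;
    }
}
```

A primitive `double[]` keeps the per-element overhead at 8 bytes per timestamp, in the spirit of the memory savings discussed earlier.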

Graph structure internals

Graph structure design is an interesting problem to solve. The objective is quite simple, yet challenging: how to best represent an interconnected graph so that it's fast to query and compact in space? And how to keep it simple while serving a large number of features at the same time?

Graph storage

Our goal is to develop a thread-safe, in-memory graph structure implementation in Java suitable for real-time analysis. You may ask how this differs from a graph database or a distributed graph analysis package. In a few words, one can say the requirements are quite different.

Graph databases like Neo4j, OrientDB or Titan store the graph on local disk or in a cluster and are optimized for large graphs and a large number of concurrent users. Typically, the networks are much larger than what can fit in memory, and these databases mostly focus on answering traversal queries. In the environments where graph databases operate, most needs can be converted into some sort of traversal query (e.g. friends of X, tweets of Y). Traversal queries are also the reason why graph databases scale to billions of nodes: for each traversal, only a subset of the graph is accessed. This is quite different from Gephi, which, being an analysis tool by nature, needs to access the complete graph. For instance, when a layout is running, Gephi needs to read the X,Y position of each node as quickly as possible. Although reading from disk can be very quick as well (e.g. GraphChi), it's limited to sequential access and things become more complex that way.

Because of the real-time requirements, we want to keep our graph data in memory accessible at all time. However, we want to make it easy to connect to external data sources, and graph databases in particular.

Reducing overhead

In computer science, overhead is any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to attain a particular goal.

GraphStore relies heavily on Java primitives, arrays and efficient collection libraries such as fastutil. We reduce overhead simply by avoiding too many Java objects, which are very costly. Instead of maps, trees or lists, nodes and edges are stored in large arrays which can be dynamically resized in blocks. For instance, iterating over the graph should be extremely fast because the CPU caches array blocks. This may sound obvious, but performance optimizations are tricky in Java because of the JVM and the uncertainty of what makes a difference and what doesn't. In his book “Effective Java”, Joshua Bloch writes “Don't guess, measure”, and that's still true today. For our project, we rely on well-defined micro-benchmarks to see where the bottlenecks are and how to make our data structure more cache-friendly and more compact in memory. When the graph contains millions of edges, every byte saved per edge can make a large difference in the end.
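The kind of measurement discussed above can be illustrated by timing iteration over a primitive array versus a list of boxed Integers. This is only a sketch: a real micro-benchmark should use a harness such as JMH to account for JIT warm-up and dead-code elimination.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration: primitive arrays vs boxed collections for iteration.
public class IterationBench {

    public static long sumArray(int[] data) {
        long sum = 0;
        for (int v : data) sum += v; // contiguous memory, cache-friendly
        return sum;
    }

    public static long sumList(List<Integer> data) {
        long sum = 0;
        for (int v : data) sum += v; // unboxing and pointer chasing per element
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int[] array = new int[n];
        List<Integer> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) { array[i] = i; list.add(i); }

        long t0 = System.nanoTime();
        long a = sumArray(array);
        long t1 = System.nanoTime();
        long b = sumList(list);
        long t2 = System.nanoTime();
        System.out.println("array: " + (t1 - t0) + " ns, list: " + (t2 - t1)
                + " ns, sums equal: " + (a == b));
    }
}
```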

In terms of speed, we focused on optimizing the most common operations: iterating over all the elements and consulting a node's neighbors. Typically, a layout algorithm needs to read the neighbors of every node at each iteration. Neighbors can't simply be stored in an unsorted list because of removal complexity: to remove a node, you need to know where it is. The current Gephi graph structure uses a binary tree to store a node's neighbors. Although the complexity is logarithmic, every node in the tree takes extra memory, and logarithmic complexity is still suboptimal. After isolating the problem in a benchmark, we found that a doubly linked list is the best solution for our requirements and achieves O(1) complexity, as it allows both quick iteration and quick updates. Here is a snapshot of the solution:

Every edge has four integer pointers to the next and previous in/out edges, and a separate dictionary helps find the right edge based on its source and destination. Each node has a pointer to the first edge in the linked list (i.e. the head). Node ids are integers (32 bits), so one can easily create a long -> Edge dictionary by encoding the source and destination nodes into a single long number (64 bits). The diagram intentionally leaves out multigraph support for simplicity. In reality, nodes can have multiple head pointers, one for each edge type. Each edge type is represented by an integer index.
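As a sketch, the long-key dictionary mentioned above can be built by packing the two 32-bit node ids into one 64-bit key. The class and method names below are illustrative, not GraphStore’s actual API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of a long -> edge dictionary keyed on (source, target).
public class EdgeIndex {
    private final Map<Long, Integer> edgeMap = new HashMap<>();

    // Pack two 32-bit node ids into one 64-bit key.
    static long key(int source, int target) {
        return ((long) source << 32) | (target & 0xFFFFFFFFL);
    }

    public void addEdge(int source, int target, int edgeId) {
        edgeMap.put(key(source, target), edgeId);
    }

    public Integer getEdge(int source, int target) {
        return edgeMap.get(key(source, target));
    }
}
```

Because the key preserves direction, (source, target) and (target, source) map to different entries, which is what a directed graph needs.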

Views

Views are one of the most useful aspects of Gephi’s graph structure and are mainly used behind the scenes in the Filter module. A view is a graph subset (i.e. a subgraph) which remains connected to the main structure, so if a node is removed from the graph, it’s removed from the views as well. For instance, when users create a ‘Degree Filter’, Gephi creates a view and removes all the nodes which don’t fulfill the degree threshold. Multiple views can co-exist at any time in the graph structure. In the current graph structure, a complete copy of the node tree is made for every view, and we found that this can be very inefficient.

In the new version, the way views are implemented is very different and should yield better performance. Instead of copying the nodes, we maintain bit-vectors for nodes and edges. Because these elements are stored in large arrays with a unique identifier, it’s easy to create and maintain a bit-vector. When developers obtain the ‘Graph’ object for a particular view, the bit-vectors are used behind the scenes to adapt iterators and accessors. This solution should make filtering much quicker for large graphs. One drawback is that whereas the current implementation copies and then trims the view, GraphStore works with bit-vectors but continues to access the complete graph. In other words, if the view represents only 1% of the original graph, it still needs to iterate over the 100% to find which elements are the 1%. Even though this sounds bad, our benchmarks show it’s a very fast operation and we win overall because of the reduced overhead of duplicating the graph. Moreover, we can introduce some caching later to optimize this further.
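The idea can be illustrated with `java.util.BitSet` (names here are hypothetical, not GraphStore’s actual API): a degree filter reduces to setting bits for the elements that pass, and view iterators simply skip cleared bits.

```java
import java.util.BitSet;

// Sketch of bit-vector based views: elements live in one backing array
// indexed by id, and a view is just a BitSet over those ids.
public class ViewSketch {
    // Build a degree-filter view: keep nodes whose degree meets the threshold.
    static BitSet degreeFilter(int[] nodeDegrees, int threshold) {
        BitSet view = new BitSet(nodeDegrees.length);
        for (int id = 0; id < nodeDegrees.length; id++) {
            if (nodeDegrees[id] >= threshold) {
                view.set(id);
            }
        }
        return view;
    }

    // A view iterator walks the set bits, skipping filtered-out elements.
    static int[] visibleIds(BitSet view) {
        int[] ids = new int[view.cardinality()];
        int j = 0;
        for (int id = view.nextSetBit(0); id >= 0; id = view.nextSetBit(id + 1)) {
            ids[j++] = id;
        }
        return ids;
    }
}
```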

Inverted Index

When you’re using the Partition module in Gephi, you’re manipulating some sort of inverted index. Nodes and edges have properties like ‘gender’, ‘age’ or ‘country’, and these properties are contained within the node and edge objects. An index is a simple data structure which allows retrieving the list of elements for a particular value. For instance, the Partition module needs to know the number of ‘male’ or ‘female’ nodes for the ‘gender’ column. When the column is a number like ‘age’, it also needs to know the minimum and maximum values. Unlike the Ranking module and its auto-apply feature, the Partition module is not refreshed in real-time and is therefore difficult to use when the graph is changing a lot. We have decided to invest in this feature for the upcoming release and are building a real column inverted index in the graph structure. The index will simply keep track of which values exist for each column and which elements hold each value. The index will be updated in real-time as elements are added, removed or updated.
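A minimal sketch of such a column index could look like the following (illustrative names, not the actual GraphStore interface): it maps each value to the set of element ids holding it, and counting elements per value becomes a constant-time lookup.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative column inverted index: value -> set of element ids.
public class ColumnIndex<V> {
    private final Map<V, Set<Integer>> index = new HashMap<>();

    // Called when an element is added or its value is set.
    public void put(int elementId, V value) {
        index.computeIfAbsent(value, k -> new HashSet<>()).add(elementId);
    }

    // Called when an element is removed or its value changes.
    public void remove(int elementId, V value) {
        Set<Integer> s = index.get(value);
        if (s != null && s.remove(elementId) && s.isEmpty()) {
            index.remove(value); // drop values no longer held by any element
        }
    }

    // e.g. how many 'male' nodes exist for the 'gender' column.
    public int count(V value) {
        Set<Integer> s = index.get(value);
        return s == null ? 0 : s.size();
    }

    public Set<V> values() {
        return index.keySet();
    }
}
```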

The ability to quickly retrieve elements and counts based on specific values will be very useful in many different modules like Filters, Partition or Data Laboratory. New APIs will be added for developers to use the newly created index interface. As we’re working on attributes storage and manipulation, we’ll also merge the Attributes and Graph API because they are so interconnected that it doesn’t really make sense to have them separate. The interfaces that developers are familiar with like Table or Columns will remain the same.

Events

In software programming, events are a common way to inform other modules that something changed. In Gephi, we also use events to convey graph updates, informing other modules about updated nodes or edges. In the new GraphStore, we’ll stop using events to transport graph modifications because of the large overhead of creating event objects. Indeed, when 10K nodes are added to the graph, the existing structure literally creates 10K event objects and puts them in a queue. Although the event queue compresses objects of the same type, the overhead of creating, queuing, sending and destroying large amounts of small Java objects is too large.

Instead of a push model (i.e. the emitter pushes updates), we want to promote a pull model (i.e. the listener pulls updates from time to time) for future releases. A similar system is already in place to link the graph and the visualization module, and it has been working without a glitch. We’ll develop the tools to easily calculate graph differences between a listener module and the graph structure. By removing this bottleneck, write performance should greatly improve.
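The pull model can be sketched with a simple version counter (hypothetical names; the real tooling will compute actual graph differences rather than a boolean flag): writes only bump an integer, and each listener decides at its own pace whether anything changed since it last looked.

```java
// Writes allocate no event objects; they just bump a version number.
public class VersionedStore {
    private int version = 0;

    public void addNode(int id) {
        // ...mutate the graph structure here...
        version++;
    }

    public int version() {
        return version;
    }
}

// A listener polls the store (e.g. once per frame) instead of receiving events.
class PullListener {
    private int seenVersion = 0;

    boolean pullNeeded(VersionedStore store) {
        if (store.version() != seenVersion) {
            seenVersion = store.version();
            return true; // time to compute the graph difference
        }
        return false;
    }
}
```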

Timestamps

As said earlier, we’ll add timestamp support to represent dynamic networks. Instead of using a time interval, a timestamp array will be associated with nodes and edges. For element (node/edge) visibility, each timestamp represents the presence of the element at that time. For example, if a network snapshot is collected every month for a year, each node will have up to 12 different timestamps. The timestamp itself is a real number and can therefore represent an epoch time, but also any other value in a different context. For a dynamic attribute, time+value is simply represented as a list of (time, value) pairs.
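Assuming a sorted timestamp array per element (an assumption of this sketch, not a statement about the final implementation), presence checks reduce to binary search over a flat `double[]`, with no per-entry object overhead:

```java
import java.util.Arrays;

// Sketch: an element's presence as a sorted array of timestamps, with a
// visibility check against a [min, max] window (e.g. a timeline selection).
public class TimestampSet {
    private final double[] timestamps; // sorted, e.g. one epoch value per monthly snapshot

    public TimestampSet(double[] sorted) {
        this.timestamps = sorted;
    }

    // Was the element present at exactly this instant?
    public boolean existsAt(double t) {
        return Arrays.binarySearch(timestamps, t) >= 0;
    }

    // Visible if any timestamp falls inside the window.
    public boolean existsIn(double min, double max) {
        int i = Arrays.binarySearch(timestamps, min);
        if (i < 0) i = -i - 1; // insertion point: first timestamp >= min
        return i < timestamps.length && timestamps[i] <= max;
    }
}
```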

To support the timeline and dynamic network algorithms, we’re developing an inverted index for timestamps so we can make time filtering very quick. One good thing about intervals is that it’s very easy to know if two intervals overlap. With a flat list of timestamps, one can’t avoid going through the entire list. The index will essentially map timestamps to the node and edge elements in the graph and therefore solves this issue. The interval tree implementation we currently use to store intervals is based on a binary tree and is very costly in memory because of all the Java object overhead. Using simple arrays should reduce overhead and improve performance for large dynamic networks. When computing a dynamic network algorithm (e.g. clustering coefficient over time), we use a sliding window over the graph, so the ability to quickly filter is critical as it impacts how fast the graph refreshes.

Saving/Loading

Saving and loading the graph structure into/from a file (or a stream) is another critical feature. When a user saves a project in Gephi, the graph data structure is serialized in XML and compressed into a .gephi file. If you have worked with project files in Gephi, you may have experienced corrupted files or errors when loading. We’ve done our best to fix these problems, but some still remain. We’re rethinking how this should be done in GraphStore and have made the call to rewrite the code from scratch. Our approach will rely on extensive unit tests to make sure the code is stable, so we don’t repeat the same issues in future versions. Please note that this concerns .gephi files only; existing importers (e.g. GEXF, GraphML) will remain the same.

Concerning the GraphStore serialization, we’re abandoning XML in favor of pure byte arrays. That should yield better performance and a reduced project file size. We’ll create a custom reader for previous Gephi versions so you can still open your existing projects. Other modules like Filters or Preview will continue to use XML, as it’s working just fine.
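As a toy illustration of the XML-to-binary direction (this is not the actual .gephi format, which is much richer), an edge list can be written to and read back from a compact byte array with `DataOutputStream`/`DataInputStream`:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Toy binary serializer for an edge list: a count header, then source/target
// id pairs, 4 bytes each.
public class BinarySerializer {
    static byte[] writeEdges(int[][] edges) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(edges.length); // edge count header
            for (int[] e : edges) {
                out.writeInt(e[0]);     // source id
                out.writeInt(e[1]);     // target id
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    static int[][] readEdges(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int[][] edges = new int[in.readInt()][2];
            for (int[] e : edges) {
                e[0] = in.readInt();
                e[1] = in.readInt();
            }
            return edges;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```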

Next steps

This is the first post about the Gephi 0.9 version and more will come soon. We’re excited about the current developments and hope to hear from you. Please join the gephi-dev mailing list to learn more about ongoing projects and contribute. We need your ideas!

Follow us on Twitter!

GSoC: Legend module

My name is Eduardo Gonzalo Espinoza Carreon, and during this summer I developed the new Legend Module for Gephi, with the mentoring of Eduardo Ramos and Sébastien Heymann. This article will give you an overview of the work done.

Problem statement

Currently Gephi offers the possibility of visualizing graphs, but what about legends? Legends provide basic and extra information related to the graph and they are useful when interpreting any kind of network map. If a person is not familiar with the content of a graph, missing or wrong legends could lead to misleading interpretations and sometimes wrong decisions. When a visualization is used by multiple people for discussing, analyzing or communicating data, legends are of great importance.

For instance, the following graph represents the co-appearance of characters in the novel Les Misérables. After a visual analysis, we could only conclude that the graph has 9 groups. This is probably only a fraction of the information the creator wanted to convey. The graph gives no information about the number of nodes explored, what the groups represent, how many elements each group has, etc.

A current workaround is to export the graph as an image and then manually add the legends using Inkscape, Adobe Illustrator or another graphics editor. However, this task is time-consuming and can be automated. The new Legend Module proposes a solution to this problem.

Solution

We propose an extension to the Preview module for generating legend items. The following legend items are available: Table, Text, Image, Groups and Description. They can be added using the Legend Manager, which is shown in a new tab under the Preview Settings:

After selecting a type of legend, the user chooses a sub-type builder, e.g. “Table” > “Partition interaction table”, or “Top 10 nodes with greatest degree”, as shown in the following figure:

When a new Legend item is added, it is displayed in the list of active legend items, where the user can edit its properties. The user can also edit its label and assign a user-friendly name to remember the content of the legend easily.

Every item has a set of common properties: label, position, width, height, background color and border, title, description; and also each type of item has its own properties and data. The values of those properties are editable through a Property Editor like the one used in the preview settings.

Some properties like scale and translation can be modified using the mouse, as in most graphic design applications. All legend items are also designed with smart autoresizing. It’s not the common scale feature: e.g. if the text in a Text item is bigger than the assigned size, the Text renderer overrides the user-defined font and decreases the font size until the text fits in the specified dimensions. The results of this feature are shown in the next figure:

Workflow

The legend builder retrieves the graph data (partitions, node labels, edge labels, etc.) and creates a new legend item for each of them. Then a legend renderer uses this information, plus the properties set by the user, to render the legend item to the specified target: PNG, PDF or SVG.

For developers

The renderers can be extended. For instance, the default Group Renderer is:

Using external libraries like JFreeChart, we can extend it to create a Pie Chart Renderer as follows:

Other types of items can be created by combining other available Legend Items or by extending Legend Item, Legend Item Builder and Legend Item Renderer.

The Legend Module also provides a save/load feature. So you can save your legends for future editing.

Limitations

Currently there are some limitations: it is not yet possible to select a specific renderer for each type of item, and exporting legends to SVG is not done automatically as with PNG and PDF; e.g. when exporting an image, it will be embedded in the SVG file.

Conclusions

I would like to thank Eduardo Ramos and Sébastien Heymann for their support and feedback, which was critical during the development of this new module. The Legend module will be available as a core feature in the next Gephi release.

This GSoC was a great opportunity to learn and it also represents my first important contribution to the open-source community.

GSoC: Force Directed Edge Bundling

My name is Taras Klaskovsky and during this Summer Of Code I have implemented the Force Directed Edge Bundling algorithm.

Force Directed Edge Bundling (FDEB) is an edge layout algorithm. Gephi already has node layouts, which place the nodes (usually using force-directed algorithms). FDEB further improves graph visualization by changing the shapes and colors of the edges to merge them into bundles, analogous to the way electrical wires and network cables are merged into bundles along their joint paths and fanned out again at the ends. This reduces the visual clutter inevitable in large graphs, making it possible to find high-level edge patterns and get an overview of the graph just by looking at it. As an example, see the US flights graph below, with nodes as airports and edges as flights.

The algorithm

Edges are modeled as flexible springs that can attract each other while node positions remain fixed. A force-directed technique is used to calculate the bundling, and it is relatively slow. On small graphs it works quite fast thanks to optimizations, but it consumes a large amount of memory to store precomputed pairs of similar edges; for medium and large graphs, a special, slower low-memory mode is implemented. After every iteration the preview is refreshed, so it’s possible to observe the formation of bundles in real-time.

The full algorithm description can be found in this research paper.
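A heavily stripped-down sketch of the core force step might look like this. The published FDEB algorithm additionally weights attraction by an edge-compatibility measure and by inverse distance, and runs several cycles with increasing subdivision counts and a decreasing step size; this toy version assumes all edges are fully compatible and only applies spring forces between neighboring subdivision points plus attraction between corresponding points of the other edges:

```java
// Toy single iteration of FDEB-style forces on edge subdivision points.
public class FdebSketch {
    // points[e][i] = {x, y}: subdivision point i of edge e; endpoints fixed.
    static void iterate(double[][][] points, double springK, double step) {
        int edges = points.length, n = points[0].length;
        double[][][] force = new double[edges][n][2];
        for (int e = 0; e < edges; e++) {
            for (int i = 1; i < n - 1; i++) {     // skip fixed endpoints
                for (int d = 0; d < 2; d++) {
                    // Spring force towards the neighboring subdivision points.
                    force[e][i][d] += springK
                            * (points[e][i - 1][d] + points[e][i + 1][d] - 2 * points[e][i][d]);
                    // Simplified electrostatic attraction to the corresponding
                    // point of every other (assumed compatible) edge.
                    for (int o = 0; o < edges; o++) {
                        if (o != e) {
                            force[e][i][d] += points[o][i][d] - points[e][i][d];
                        }
                    }
                }
            }
        }
        // Apply all forces after computing them, so the update is simultaneous.
        for (int e = 0; e < edges; e++)
            for (int i = 1; i < n - 1; i++)
                for (int d = 0; d < 2; d++)
                    points[e][i][d] += step * force[e][i][d];
    }
}
```

Running this repeatedly on two parallel edges pulls their midpoints together, which is the bundling effect in miniature.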

[Figures: the original graph and FDEB renderer modes 1–3]

Renderer Modes

FDEB has 3 renderer modes:

  • Simple renderer: draws all edges with the same color and transparency (color, transparency and thickness are set in the Preview settings in the bottom-left panel); bundles are emphasized by the combined transparency.
  • Gradient renderer: draws all edges with a color from the gradient slider. Edges that are similar to a higher number of other edges get a more intense color.
  • Gradient slow renderer: uses a more precise method to determine the intensity of an edge (requires the precalculate points checkbox and a re-run) to set an individual color for every segment.

FDEB can be customized with lots of options, all of which have descriptions. Some of the most influential on the result are:

  • Use inverse-quadratic model: makes bundling more localized.
  • Compatibility threshold: ignores pairs of edges that are not similar enough, which makes FDEB faster (ignored in low-memory mode) and bundling more localized.

To make FDEB faster, it’s also possible to decrease the number of cycles or the rate at which subdivision points increase.

[Figure: the FDEB options panel]

The Edge Layout API

An edge layout API has been created to simplify the integration of other edge layouts into Gephi. This API is very similar to the existing node layout API, with the following additions. Since edge layouts not only change the shapes of edges but are also responsible for their visualization, modifyAlgo() is called each time the Preview is refreshed, to control the modification of parameters. The edge layout data, accessible for each edge, provides the polyline points and their colors for all renderer modes.
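A hypothetical shape for such an API, by analogy with the node layout API (the actual Gephi interface and method names may differ; only modifyAlgo() is named in the text above):

```java
// Hypothetical edge layout contract, mirroring a node-layout-style lifecycle.
public interface EdgeLayout {
    void initAlgo();     // prepare layout data for each edge
    boolean canAlgo();   // are there more iterations to run?
    void goAlgo();       // one iteration: move subdivision points
    void modifyAlgo();   // called on every Preview refresh to adjust parameters
    void endAlgo();      // release resources
}

// Trivial driver showing how a caller would run such a layout.
class LayoutRunner {
    static int run(EdgeLayout layout, int maxIterations) {
        layout.initAlgo();
        int done = 0;
        while (layout.canAlgo() && done < maxIterations) {
            layout.goAlgo();
            done++;
        }
        layout.endAlgo();
        return done; // number of iterations actually executed
    }
}
```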

How to get the FDEB algorithm

It will be available in the next release of Gephi. The current source code is available on Github.

GSoC mid-term: a new Timeline to explore time-varying networks


My name is Daniel Bernardes and during this Google Summer of Code I am working on the new Timeline interface.

Dynamic graphs have been the subject of increasing interest, given their potential as a theoretical model and their promising applications. Following this trend, Gephi has incorporated tools to study dynamic networks. From a visualization perspective, a critical tool is the Timeline component, which allows users to select pertinent time intervals and display and explore the corresponding graph. The challenge concerning the timeline was twofold: redesign the component to improve user experience and add extra features and introduce an animation scheme with the possibility to export the resulting video.

Together with my mentors Cezary Bartosiak and Sébastien Heymann, we have proposed a new design for the timeline component, featuring a sparkline chart in the background of the (semi-transparent) interval selection drawer. This feature helps the user focus on particular moments in the evolution of the dynamic graph, like bursts of connections or changes in graph density or other simple graph metrics. Current metrics are the evolution of the number of nodes, the number of edges and the graph density. The sparkline chart was preferred to other chart solutions because it does not add too much visual pollution to the component and supports qualitative analysis. Interaction with the drawer remains globally the same as in the old timeline, to guarantee a smooth transition for the user.

To implement this feature we used the chart library JFreeChart (already incorporated into Gephi), customizing its XYPlot into a sparkline chart by modifying its visual attributes. To display the sparkline, one needs to measure properties of the graph at several time instants within the global time frame where the dynamic graph exists. This represented a major challenge, since the original architecture did not allow the timeline component to access (and measure) the graph at particular instants; the solution was to introduce a slight modification to the DynamicGraph API to provide an object that gives snapshots of the graph at given instants. Other challenges included the automatic selection/switching of real-number/time units in the timeline (depending on the nature of the graph in question) and the sampling granularity of the timeline.

Another breakthrough of this project was the introduction of the timeline animation. Once the user has selected a time frame with the drawer, they can make it slide while the corresponding graph is displayed on the screen. Besides the technical aspects of the interaction between the timeline and the animation controller, there was also an effort to calibrate the animation (i.e., in terms of speed and frames) so it would be comfortable and meaningful for the user.

As far as the UI is concerned, the component has gained a new “Reset” button next to the play button, which activates the timeline drawer and displays the chart. It also serves to reset the drawer selection to the full interval when the timeline is active. The play button regained its original function, that is, to control the animation of the timeline instead of activating the selection.

Finally, exporting the animation to a video format proved trickier than expected and couldn’t be finished as planned. There were several setbacks, beginning with the selection of a convenient library to write the movie container: it turns out that the de facto options available are not fully Java-based and need an encoder working in the background. The best alternative I found was Xuggler, which is based on ffmpeg. Also, obtaining screen captures of the graph was a little tricky, so I exported SVG images of the graph corresponding to each frame, converted them to JPEG and then encoded them through Xuggler into a video format. As one might expect, this solution is not very efficient in terms of time, so Mathieu Bastian and my mentors suggested I wait for the new features of the new Visualization API that will make this process simpler.

In addition to current bugfixes and minor improvements concerning the timeline and the animation, the movie export remains the next big step to close this project. If you have questions or suggestions, please do not hesitate! The new timeline will be available in the next release of Gephi.

DB

GSoC mid-term: new Preview API

My name is Yudi Xue, and during this Google Summer of Code I am glad to work on the core evolution of Gephi.

The current API in the Preview module provides too many granular methods and classes, and developers are left clueless about how to extend the component. In this project, we do not seek to expand what the Preview module already has to offer. Rather, we focus on making the Preview module easy to learn, easy to use and easy to extend for Gephi developers. The new API will allow developers to focus on particular parts of the module. They may specify a new visual algorithm just by implementing a new type of Renderer, such as edge bundling or convex hulls. They may also extend RenderTarget to display or export visualizations on different platforms.

The user story

We took the infovis reference model into consideration when we started designing the new infrastructure. The infrastructure aims to support a visualization-preview workflow:

raw data -> the data builder -> renderers -> render targets.

In particular, the raw data is the graph associated with the current Gephi workspace. The data builder (DataBuilder) interprets information associated with the nodes and edges and generates Item objects for Preview use. Item objects are immutable objects that are either node items (NodeItem), edge items (EdgeItem) or item groups (GroupItem) specified from the graph workspace or the Data Laboratory. We append “Item” to indicate that they are data rather than display objects. After the data has been imported, the preview controller (PreviewController) associates each type of item with Renderer objects. Renderer objects are functional procedures that describe how an item should be drawn. While we tell a Renderer object what it is going to draw, we also tell it which RenderTarget it will use. By default, we provide ProcessingRenderTarget, PDFRenderTarget and SVGRenderTarget. All RenderTarget objects contribute to the RenderTarget API, which provides granular drawing functions that developers can use to build advanced visual algorithms. In addition to the workflow, we will provide a flexible properties structure for the Preview module, so it may be used to provide listeners for user interface commands. Properties will allow dynamic dependencies, where grouped properties can listen to a single parent property.

The code below demonstrates how the Renderer for a particular Item type could be updated at runtime.

Code sample:

PreviewController prc = (PreviewController) Lookup.getDefault()
        .lookupItem(PreviewController.DEFAULT_IMPL).getInstance();
// Load the graph from the workspace
prc.loadGraph();
prc.updateRenderer(NodeItem.class, new Renderer() {
    // How I want to draw a node, edge, or other item type.
    // Specify your procedural visualization algorithm here.
    @Override
    public void render(Item item, RenderTarget rt) {
        NodeItem ni = (NodeItem) item;
        rt.drawImage(..);
        rt.drawLine(..);
    }
    // The RenderTarget will pick up the properties and draw the rest.
});

The big picture

Speaking of API flexibility, the Preview API goes from constrained to flexible in the direction from DataBuilder to RenderTarget. Here is the big picture:

Current progress

  • done:
    • a working copy based on the new architecture
    • added ProcessingRenderTarget
    • added GroupRenderer (Convex hull)
    • added ImageRenderer
    • basic unit testing
    • basic functional testing against updating Renderer in Preview API at runtime
  • in progress:
    • Property support
    • Self-loop and curved edge drawing
    • PDF and SVG RenderTarget implementation

Here is a screenshot of the new system with convex-hull enabled:

Code practice

The code base is under active development at https://code.launchpad.net/~yudi-xue/gephi/gephi-preview. It includes the PreviewAPI module and the PreviewImpl module.

Lookup API

We make use of the NetBeans Lookup API to instantiate singletons, and use Lookup.Template to ensure the correct implementation is called.

For example, to call the default PreviewController constructor, we call:

/*
* DEFAULT_IMPL is defined in the interface.
* It refers to default implementation class
* "org.gephi.preview.PreviewControllerImpl"
*/
(PreviewController)Lookup.getDefault().lookupItem(PreviewController.DEFAULT_IMPL).getInstance();

Accordingly, you may choose to use the API with your own implementation by creating a Template that points to your implementation class.

Functional Tests

During development, we are creating functional tests against our own API for the purposes of both flexibility and stability; see “PreviewAPIFunctionalTest”.

Conclusions

Our goal is to bring modularity and extensibility to the Preview module. We aim to deliver the freedom to define your own visual algorithms (Renderer) and user interactions (Property), and to make use of the API without thinking about the detailed mechanisms. I would like to thank Dr. Christian Tominski, Mathieu Bastian and Sébastien Heymann for their support and feedback, which has been critical during the development of the new architecture.

GSoC mid-term: new Visualization API

My name is Vojtech Bardiovsky and I am working on the new Visualization API. This is done together with the new visualization engine based on shaders.

API design

The aim of the project was to design a clean and usable API for the new engine. It exposes only as much as necessary, but enough to make customization of visualization possible. The following four API classes are all services and can be retrieved through ‘Lookup’.

Visualization controller

This is the most important class in the API and can be used to retrieve the ‘Camera’, the ‘Canvas’ used for visualization display, and, very importantly, the instances of the active ‘VizModel’ and ‘VizConfig’ classes, which both contain many settings that help control the visualization. It will also allow making direct changes to the visualization, like setting the frame rate or centering the camera according to different parameters. The ‘Camera’ class can be used to get data about its position or to perform actions such as translation or zooming.

Event and Selection managers

The Event manager can be used to register listeners to user events exactly as in the old engine. This is very important for the tools. The selection manager provides methods to retrieve all currently selected nodes or edges, to select nodes or edges and to control the selection state of the UI (dragging, rectangle selection, etc).

Motion manager

Apart from listening to all user induced events and their most basic handling (selection, translation, zoom), this class provides information about current mouse position in both screen and world coordinates.

New features

There are many changes the new engine will bring, and although it is not finished yet, there are already some new user-side features.

Complex selection

In the old visualization engine, only rectangular and direct (one node) selection were possible. The new API allows implementing any reasonable shape. At the moment it supports rectangles, ellipses and polygons.

Thanks to the selection shape variability and changes in the mouse event system, it is possible to make incremental/decremental selections using the Shift and Ctrl keys. As opposed to only one node at a time, the whole selection can now be dragged and moved.

Background image

It is now possible to change and configure the background image. Settings are similar to the CSS properties such as ‘background position’ or ‘background repeat’.

Node shapes

It is possible to have a different shape for every node in the graph. Basic shapes include ‘circle’, ‘triangle’, ‘square’, etc., but also up to 8 custom images that can be imported by the user. Nodes can have their shapes defined in the import file or set directly through the context menu.

Better 3D

Work has been done on a better way to control the scene in 3D. Graphs are not naturally suited to 3D; for example, adding new nodes or moving them will never be perfectly intuitive. But for displaying the graph, some enhancements can be made.

Current status

The engine is still under development, but the API is slowly nearing its final state. The next step for the API will be to include as many configuration possibilities as the engine allows. The underlying data structures will be optimized for performance.
As the project consists of two parts, API and engine, Antonio Patriarca, the mentor for this GSoC project and implementor of the engine, will write an article about rendering details in the near future.

(The rendering pipeline for edges is not fully finished, so the images shown are not the actual new look of Gephi.)

Gephi maps exhibited at the International Design Biennale

The Saint-Étienne International Design Biennial, held from 20 November to 5 December, is a unique event in the domain of design, due to the exhibitions shown as well as the diversity of its attendees. The Biennial democratizes design and makes it accessible to all kinds of audiences, proving that this creative discipline can take many forms and is often driven by human aspects, including its uses by humans.

The theme of the 2010 Biennial is teleportation. It explores paths of discovery that, taken to their extreme, tend toward a possible teleportation as the dematerialization of movement, a notion that is incredibly revealing of our era.

Sebastien Heymann will exhibit maps of designers’ conceptual world, placed at the center of the “Prédiction” exhibition. Made in collaboration with Benjamin Loyauté, curator of the event, these inscriptions are a proposal to reveal the state of knowledge sharing in Design today.

Useful information on how to get there.

 

You may contact Sebastien by email to arrange a meeting during the first weekend.

Ten free tickets will be given to the first comments of this post!

EDIT: photos are available on the Facebook page of Gephi.

Scientist Christian Tominski about Gephi

Guest blog post from Dr. Tominski who accepted to review Gephi 0.7alpha4 for us.

Christian Tominski received his diploma (MCS) from the University of Rostock in 2002, and in 2006 he received his doctoral degree (Dr.-Ing.) from the same university. Currently, Christian is working as a lecturer and researcher at the Institute for Computer Science at the University of Rostock. Christian has authored and co-authored several articles in the field of information visualization. His main interests concern the visualization of multivariate data in time and space, visualization of graph structures, and visualization on mobile devices. In his research, a special focus is set on interactivity, including novel interaction methods and implications for software engineering.

Recently, I stumbled upon the Gephi Project – an open source graph visualization system. As I’ve done some research in the area of interactive graph visualization, I was eager to see how Gephi works and if it brings some new concepts or if it’s yet another graph visualization system. I’ll share my thoughts on Gephi from three perspectives. The first one is the user perspective. I’ll take the role of a user who is interested in getting a visual depiction of some graphs. Secondly, I’ll take the role of a developer and shed some light on the aspect of software engineering. And finally, I’ll be a scientist and try to foresee if and in which regard Gephi might have some impact on visualization research.

The User’s Perspective

Gephi has been designed with the users and their needs in mind. The system welcomes its users with a familiar look and feel. It is quite easy to load graph data into the system. Many of the known file formats for graphs are supported, as for instance, DOT, GML, GraphML, or Tulip’s file format TLP. A nice thing about the data import is that an import report provides essential information about the import process (e.g., number of nodes and edges, edge-directedness, potential problems, etc.). Once imported, the graph is shown as nodes and links in a main view, and several complementary views provide additional information.

The main view is the core for visual graph exploration. It allows users to zoom in, to select nodes, to adjust node size and color, to find shortest paths, and to access attributes of nodes and edges. In addition to letting users set sizes and colors manually, the system can also set these automatically based on attributes associated with nodes and edges. What is called “Partition” in Gephi is used to assign unique colors to nodes and edges based on qualitative data attributes (e.g., class affiliation). Quantitative data values can be mapped to the size and color of nodes, edges, and labels using the “Ranking” tool. All these tools are customizable. It is worth mentioning that Gephi provides some nice user controls to parameterize the color coding.

Gephi also supports graph editing, i.e., the insertion and deletion of nodes and edges as well as the manipulation of attribute values. What is missing in terms of editing the data is the possibility to add (and delete) attributes, for instance to generate derived data values using simple formulas.

A key aspect in graph exploration is the layout of nodes and edges. As it is usually unclear which layout will work best for a given graph, Gephi offers various layout algorithms to choose from. While a layout is being computed, the main view constantly updates itself to provide feedback on the progress made. A big plus is that users can interrupt the layout algorithm once they deem the result acceptable, or when they find it more suitable to use the current result as the initial setup for another algorithm. This way users can easily tune the layout to fit the graph and their particular needs. Users may put the finishing touches on the layout by moving nodes manually in the main view.

Once a suitable visual representation has been created, the final step is to export nice pictures of the graph. To this end, Gephi follows the philosophy of providing a dedicated export interface with many options to create high quality printouts.

People who have been working with larger graphs might know that some computations on graphs (including layout computation) are quite complex and take some time. While other systems are blocked during computation and in the best case provide a progress bar, Gephi is different. Long-running calculations run concurrently with the main application. From my point of view, this is one of the strongest points of Gephi: the system does not block during costly computations. The benefit for users is that they can always interact, for instance to initiate other computations or to cancel running ones when they recognize that a re-parameterization would yield better results.

Concurrency is also Gephi’s answer to offering computations of statistics about the graph. Currently, Gephi supports a variety of classic graph statistics, including the degree distribution, the number of connected components, and others. Based on data attributes and computed statistics, the graph can be filtered to reduce nodes and edges to those that fulfill the filter criteria. In a dynamic filtering UI, several filters can be combined using drag’n’drop, and thresholds can be manipulated easily, for instance via sliders. Besides using filtering for data reduction, Gephi also provides basic support for graph clustering. However, the currently implemented MCL algorithm is still experimental. There is also the possibility to manually group nodes to build a hierarchical structure on top of the visualized graph, yet this is quite cumbersome for larger graphs. Additional tools are needed to support the user in creating a navigable hierarchy on top of a graph. Configurable clustering pipelines that combine several clustering strategies (e.g., based on attributes or on bi-connected components), in addition to a clustering wizard user interface, would be helpful.

In summary, I see much potential in Gephi; the overall shape of the system impressed me as a user. I personally found it easy to work with Gephi and to explore some of my own data sets as well as some provided on Gephi’s website. Given the fact that the version I’ve worked with is 0.7 alpha, there is also much room for improvement. In the first place I would like to mention the navigation of the graph. The main view provides just basic zoom and pan navigation, which is even imprecise in some situations. Navigation tools like those provided in Google Earth, as well as navigation based on paths through a graph, would be really helpful. Moreover, I was missing the concept of linking between views: selecting an element (node or edge) in one view should highlight that element in all other views. Right now this is not really an issue, as the number of views seen in parallel is quite low. But once additional views are needed, for instance to focus on data attributes in a parallel coordinates plot or to visualize the cluster hierarchy in a dedicated view, or when one and the same graph is shown in two or more main views for comparing different analytic results, linking will be crucial for the user experience. These things are not too complex and should be easy to integrate in future versions of Gephi. Another aspect regards highlighting in the main view: instead of marking the selected node, all non-selected nodes are faded out to focus on the selected node. This implies rather big visual changes, because all but one node change their appearance whenever a single node gets selected or deselected.

Pros:
  • Easy graph import and export
  • Many options for visual encoding
  • Various layout algorithms to choose from
  • Support for dynamic filtering
  • Computation of graph statistics
  • Basic support for graph clustering
  • System does not block during long-running computations

Cons:
  • Graph navigation can be improved
  • No linking among views
  • A few visual glitches
  • Still an alpha version with bugs here and there

The Developer’s Perspective

Now let me switch to the developer’s view. Gephi is open source software, so everybody can participate in improving the system or can adapt it to personal or business needs. Gephi seems to be very well designed on the back-end. The project is based on the NetBeans Platform and the Java language. It is subdivided into a number of modules that define several APIs and SPIs and that provide implementations of these interfaces. Thanks to the modular structure, Gephi can be extended quite easily. The best way to do so is to implement plugins. Plugins can be used, for instance, to add further layout or clustering algorithms, statistical computations, filter components, or export methods. The modular structure also allows for using only specific parts of the Gephi project in one’s own projects. The Gephi Toolkit is a good example. It is not an end-user desktop application, but a class library that provides all the functionality of Gephi to those who want to reuse Gephi’s functionality and data structures in different ways.
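To give an impression of how compact working with the Gephi Toolkit can be, here is a minimal headless sketch, assuming the Toolkit API roughly as documented in its tutorials at the time (the class name `ToolkitDemo` and the file name `mygraph.gexf` are placeholders of my own; it requires the Gephi Toolkit jar on the classpath):

```java
import java.io.File;

import org.gephi.graph.api.GraphController;
import org.gephi.graph.api.GraphModel;
import org.gephi.io.importer.api.Container;
import org.gephi.io.importer.api.ImportController;
import org.gephi.io.processor.plugin.DefaultProcessor;
import org.gephi.project.api.ProjectController;
import org.gephi.project.api.Workspace;
import org.openide.util.Lookup;

public class ToolkitDemo {
    public static void main(String[] args) throws Exception {
        // Initialize a project and workspace headlessly (no GUI involved).
        ProjectController pc = Lookup.getDefault().lookup(ProjectController.class);
        pc.newProject();
        Workspace workspace = pc.getCurrentWorkspace();

        // Import a graph file; "mygraph.gexf" is a placeholder path.
        ImportController importer = Lookup.getDefault().lookup(ImportController.class);
        Container container = importer.importFile(new File("mygraph.gexf"));
        importer.process(container, new DefaultProcessor(), workspace);

        // Obtain the same graph model the desktop application's modules work on.
        GraphModel model = Lookup.getDefault().lookup(GraphController.class).getModel();
        System.out.println("Nodes: " + model.getGraph().getNodeCount()
                + ", edges: " + model.getGraph().getEdgeCount());
    }
}
```

Note how the services are obtained through the NetBeans Lookup mechanism rather than constructed directly; this is the same decoupling via APIs and SPIs that makes the desktop application extensible.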

As I’ve mentioned in the user perspective, the way Gephi deals with long-running computations is a big plus. Given the fact that aspects of multi-threading have been inherent in the system from the very beginning and are manifested at the system’s core, I sincerely hope – no, I’m quite sure – that Gephi will not run into all the problems that are likely to occur when multithreading is integrated into an existing single-threaded system, as I have experienced myself. I also conjecture that others will find it much easier to implement concurrent, non-blocking extensions of the system simply by following how the existing code handles things in Gephi.

As Gephi is split up into many different modules, it took me a while to get accustomed to the system and to learn which functionality can be found in which module. But I have to add that I had no prior experience with NetBeans Platform development and the module concept used there. I also found that the code documentation could be improved in several parts of Gephi’s sources. On the other hand, the Gephi website provides informative wiki pages with various examples and tutorials.

My view from the developer’s perspective can be summarized as the following pros and cons:

Pros:
  • Open source
  • Modular structure
  • Well-defined interfaces
  • Extensible via plugins
  • Inherently multithreaded

Cons:
  • In-code documentation can be improved

The Scientist’s Perspective

As a scientist I’m not so much interested in developing fully-fledged end-user software, but in developing solutions to scientific questions and in publishing the results. A difficulty in interactive visualization is that one usually needs a broad basis of fundamental functionality to be able to develop such solutions. Previous attempts at establishing a common infrastructure for interactive data exploration made notable progress, but eventually did not fully succeed or are no longer actively maintained. This is due to the fact that a single researcher usually does not have the time to do decent research and maintain a larger software project at the same time.

I personally feel that Gephi can become such a fundamental infrastructure. Maintained by an active community, the system allows researchers to focus on solutions in the form of plugins, while they can utilize the functionality that the system provides. Visualization researchers will be happy if they can simply plug in new visualization techniques as additional views, test new layout algorithms, and experiment with new clustering methods. Moreover, new solutions can be easily disseminated to real users in the community. This might prove beneficial when it comes to acquiring early user feedback or when a more formal user evaluation is needed prior to publishing new techniques and concepts.

A big issue in visualization research is visual analytics, that is, the combination of analytical, interactive, and visual means to facilitate making sense of large volumes of data. In terms of analytic means, a goal is to break open analytic black boxes and make analysis algorithms interactively steerable. With the architecture of Gephi, where parameterizable algorithms run concurrently and provide feedback in the form of intermediate results, I believe this goal can be reached in the future. One thing I’m curious about is whether it is also possible to come up with concepts that allow for plugging in new interaction techniques. As interaction is usually quite tightly bound to a view, I wonder whether interaction could be implemented as independent plugins as well, and whether novel interaction concepts (e.g., touch interaction) will be supported in the future. Furthermore, aspects of interactive collaboration among multiple users working to solve a common analysis problem could be of interest. A related question on the visual side is whether it is possible to use Gephi with different displays and display environments, such as tabletop displays, display walls, smart phones, or multi-display environments.

A facet of graph visualization that I did not mention in the user’s perspective, as I felt it more suited to be mentioned here, is dealing with dynamically changing graphs. The visualization of time-varying graphs is a hot research topic, and Gephi is about to face this challenge. There is preliminary support for exploring time-dependent graphs via a time slider. But there is more to this than just browsing in time. Concepts have to be integrated to support the easy comparison of multiple snapshots of a graph and to highlight significant changes over the course of a graph’s history.

Let me try to put my thoughts into a pros and cons list:

Pros:
  • Potential infrastructure for visualization research
  • Researchers can focus on solutions in the form of plugins
  • Potential to use the community for user feedback and evaluation
  • Partial results for current research questions (graph clustering, steerable algorithms, dynamic graphs)
  • Nice playground for experimentation and testing new ideas

Cons:
  • Unclear if new and alternative technologies will be supported

Summary

Since I’ve put my hands on Gephi, I’m infected. Maybe I’m dazzled by the beautiful demo video or the nice pictures that have been generated using Gephi, but in my opinion Gephi has the potential to become a big player in interactive visual graph exploration and analysis. From all the perspectives I’ve taken, I see many positive things – and plenty of room for improvements or additional features. I do hope that the people behind Gephi will continue their work to the benefit of all users, developers, and researchers.

Related Stuff

There are many other systems and frameworks out there that do a great job in interactive graph visualization or in supporting it as a toolkit. I would like to give credit to these systems, because they can be the source of many ideas and much inspiration:

To go further about Gephi design, see also this article about semiotics.