Improving the Gephi User Experience

This is an effort to rethink the design of Gephi authored by Donato Ricci, co-founder of Density Design and senior designer at the Sciences Po médialab in Paris, and me, creator of Gephi and an engineer at the same lab (note that I am not Mathieu Bastian, our lead developer and actual powerhouse of the project for the past 10 years).

While this text presents possible improvements and practical solutions, it does not address practical considerations of available labor. Also, be aware that this is not a formal roadmap for future releases but rather a way to open the current state of our reflections for brainstorming. So feel free to share your ideas and comments with us.

There are five main categories that structure the improvements that we currently envision:

  • Design strategy: Ensure that a coherent design philosophy is applied across the entirety of the project
  • User interface: Identify and correct user-facing errors
  • Network-focus: Re-focus design and architecture around the network’s position of primary importance
  • Filling in the Gaps: Providing expected functionality
  • Miscellany: Other minor issues

In addition, we have drafted a UI mockup illustrating some of our propositions.

Design strategy

Gephi was built by engineers without a comprehensive design strategy. This situation is fairly common: engineers approach design in an ad-hoc fashion learning by trial and errors and through casual user observation but without a formal user-testing protocol. Should the tool succeed, it is mostly because the utility triumphs over the pain of use. Gephi is an embodiment of this phenomenon in its current state. Some computer scientists may find it simple, partly because using terrible interfaces is a part of their job, but for many users Gephi is confusing. Geeks of a masochistic tendency may love the tool as a result of digital Stockholm Syndrome, but the bulk of users that could benefit from Gephi find it to be confusing and opaque. In our defense, developing desktop applications is heavily constrained and the Java technology was not helping us to overcome this difficulty. What could a designer do to alleviate this situation? Apply a strategy.

A designer does more than treating the symptoms of poor usability; he or she approaches user experience from its fundamentals. Improving Gephi requires rethinking some of its longest standing features from a new standpoint. A design strategy is the solid foundation upon which we build both a satisfying user experience and underlying software architecture.

Our design strategy fits in five basic points: obtaining substantial and organized user feedback, giving Gephi a clear workflow, implementing a facet-oriented interface layout, reordering panels from the user standpoint, and removing unnecessary features. Each point is explained in further detail along with practical guidelines for implementing potential solutions in the future.

User feedback

We cannot build a sustainable user interface without a quantitative measure of user activity. These data are necessary to support and validate design choices. One approach to obtain this information is to log and collect feedback about interface usage.

An optional logger could be implemented in Gephi to allow users to opt-in to the collection of logs in order to improve the software. Data harvesting can be done as a campaign: for example, we may ask some users to activate it for one month to evaluate the usage of a new interface paradigm that we are testing.

A clear workflow

Users need a clear and visible path to start with Gephi, in particular when opening a new file. We need to remove information to allow users focusing on what is important.

Gephi involves not only the software itself, but also the installer, website and documentation. Our ultimate goal is to make the entry process as simple as possible by coordinating these different elements. We begin by focusing here on just the software itself. We propose to consider that there are only two proper ways to enter in Gephi:

  • Opening a file (constructing a network from a pre-generated file)
  • Connecting to a data source (embedded scraper or API connection)

We also need to clarify the roles of the “open” and “import” functions. We have to clarify that:

  • If the user has a file and needs to see it in Gephi, then “Open” is the right answer
  • If the user has an external data source he or she wishes to connect to, there is distinct menu option for this function

Use case: we have observed that some users try to get in Gephi with a table of nodes and a table of links, and do not succeed in finding the right path. The problem there is that it is not explicit that it is necessary to create an empty document, go to the data table, and then import the tables. Since the pattern is “I have files and I want to see them in Gephi”, then the answer should be under the “Open” menu item.

Facet-oriented interface layout

Rethinking overall design has the virtue of allowing for the reorganization of the interface from a user-centric perspective. The current interface relies on the panels system provided by the Netbeans Platform, which provides some beneficial properties for design. We were inspired by Ben Shneiderman’s motto of “overview first, zoom and filter, then details-on-demand” and it has been quite successful. However the different views are not articulated in a coherent way and the features sometimes struggle to find the right place in terms of visibility.

We propose three simple guidelines for a better organization of the panels:

  • The global hierarchy of containers should reflect the generality of the features
  • Some panels are not facet-dependent: they should not change with the facet
  • The network should occupy a single place whatever its facet, since it is always the same object

Panels guidelines

These guidelines have two consequences. First: facet-dependent panels should be contained inside facet-independent panels; which is to say that there is a single container for all facet-dependent features. Second: the facet selector (currently the three tabs on top) has to be inside this container.

We illustrate this with a comparison between a representation of the current layout and a new simplified structure.

Web

Reordering panels from the user standpoint

A part of our design strategy is to reduce visual clutter by grouping panels that are not used simultaneously. Though it is not intuitive, we prioritize separating panels that work well together over grouping similar features. For instance the following panels should be placed in different groups in order to be used combined:

  • Filters + Layout
  • Filters + Partition or Ranking
  • Timeline + almost anything else

With the current panels there are at least three obvious groups: one with filters, another with layout, and the third with the timeline. Generic contextual information is a fourth possible group, but could be placed in a non-intrusive location like the footer. Some panels, like statistics, could theoretically be at home in any group or even as a separate window that could be invoked from a menu.

Collapsing panels concept

Panels and groups of panels create two levels, so the “window” menu should have two levels too.

Removing unnecessary features

Reducing complexity can also be accomplished by removing features. We have detected at least one clear candidate for removal, but we may find more unnecessary complications to remove.

The “preview” panel of Gephi has been increasingly simplified over iterative updates. The goal of this feature is to provide a quick way to export cartographies. Users with competence in design tend to rely on third-party tools that provide finer-grained control over the visualization, like Illustrator and Scriptographer. Thus, the focus in Gephi is to provide a quick way to export images that can be manipulated in other tools.

We propose to further streamline preview functionality by removing some advanced label features: they infrequently used, complicated, and at times internally inconsistent with other preview settings. Furthermore, it is not necessary to facilitate changes to features like label and node size and color when such adjustments can be made much more easily using other tools.

User Interface

Donato Ricci has identified various flaws in the Gephi UI. Fixing them is a priority for the future.

User-centric features: reordering workflow

Users think in terms of results they want to obtain. They have an action in mind and they search how to do it. By displaying features according to their result, we can both improve user orientation and reduce the tool’s learning curve. A few examples of follow.

We propose to aggregate Partition and Ranking under the more accessible term “Appearance”, and to reverse the order of what is asked to the user. The current interface is organized in the following way: if the user has metadata that can rank the nodes then the user can visualize it using different attributes like color and size. The new interface inverts this approach: if the user wants to color nodes then he or she chooses which metadata to use. The panel may progress like a wizard to reduce cognitive load by drawing attention only to information that is necessary for a given step.

Unified panel appearance

Collapsing advanced layout settings

The current design of Gephi does not respect the general principle of drawing attention to information that is commonly used while obscuring information that is infrequently used.

Tools of the Overview: many problems to fix

The small tool buttons on the left side of the overview panel have a number of problems including:

  • Confusing icons that do not easily communicate the use of tools
  • Indistinct icons that do not sufficiently distinguish between different tools
  • Missing tools that are commonly expected

These issues are compounded by evidence that the tools themselves do not provide sufficient utility for common use.

We propose to alleviate some of these shortcomings by putting most of these tools in a collapsed panel and to have a normal panel dedicated to the settings of each tool. We also propose to implement a default tool cursor that draws on common mouse usage paradigms to provide intuitive functionality to users:

  • A click-drag starting on the background (or an edge) of the view makes a rectangle selection
  • A click-drag starting on a node moves this node
  • A click on a node selects the node, the shift key is used for multiple selection
  • A click on the background deselects
  • A set of meta-keys changes the click function, for instance the spacebar to switch to the view-panning tool (i.e. the hand in Photoshop)
  • The secondary-click works the same

Fixing highlight colors

Highlighting works by tweaking colors so that some nodes get more contrast than others. The contrast should depend on the background color, but this is not current implementation. At the very leastthe following should be done:

  • White background: highlighted nodes darker and other nodes lighter
  • Black background: highlighted nodes lighter and other nodes darker

Network-focus

As a network analysis tool, the network itself plays an obvious central role in Gephi. We have explored different ways to incorporate the network into the software’s presentation and have developed some suggestions for modifications that would increase interface coherency.

A different layout for the panels: network as background

In this approach, the network is contained by a “background sheet” and floating panels support functions. Such a philosophy has been successfully implemented in other systems like Photoshop or Google Maps. Using the root panel for the network, like grouping facet-dependent panels, fosters the feeling that we always deal with the same object, the network. This operates on the metaphor that we are always manipulating a primary canvas that consists of the network to be analyzed.

Network as background

Statistics as an invoked panel or window

Statistics tend to be used on demand, and thus do not need to be displayed permanently. Rather, a discrete menu or button could invoke the statistics panel when needed. Removing this visual information leaves more room to focus on what is important, i.e. the network.

Workspaces: more visible, on top

The workspaces need more attention. We propose to show them as tabs on the top of Gephi. It is more natural to have the workspaces above the facet selector in the hierarchy of panels. This is consistent with the prevalence of the “tab” paradigm in the browser space.

Workspaces-on-top

Filling in the Gaps: Providing Expected Functionality

In addition to the different aspects listed above, users need some well-known common features such as an “undo” function, even if they are complex to implement.

History and undo: feasible if limited to network structure

A visible trace of previous steps, like a proverbial breadcrumb trail, provides users with a sense of orientation and confidence when exploring and manipulating data in a speculative fashion. This also accelerates the learning process by alleviating cognitive load by not forcing users to have to remember a series of unfamiliar steps. This works in tandem with an “undo” feature, which facilitates experimentation without fear of permanently corrupting data.

History and Undo are complex to implement and burden the development of plugins and modules as these functions tend to be deeply embedded in a piece of software’s architecture. This partly explains why they are not currently available in Gephi. However a prudent approach in Gephi would be to focus recording and reversal of changes to the structure of the network: Nodes and edges, their attributes (including color, size and coordinates), but not the state of panels such as filters, statistics…

An initial approach would be to cover only a minimal set of modifications of the network structure. The history would then contain information about the type of the modification, but not its exact content nor the way it was done (manually, filter, data table…). For instance:

  • Modifying attribute X for node N / n nodes / all nodes
  • Modifying color / size / position for node N / n nodes / all nodes
  • Adding / removing: node N / n nodes / all nodes
  • Adding / removing: node attribute X / n attributes / all attributes
  • …and the same for links

The history would not include operations such as exporting files, taking screenshots, modification of views, changes to settings, or other changes that did not directly affect the structure of the network

Protecting irreversible operations

Some operations are irreversible: removing nodes, edges and attributes (and possibly more). Because these operations are definitive and may cause the loss of a certain quantity of work, they should be protected. A classical solution is to ask confirmation for any definitive operation. This is a simple guideline but the result is quite user hostile. We propose a better solution, as implemented in Photoshop: when an irreversible operation has been done, when the user tries to save the network the “Save as…” window appears instead and proposes a different name (with a suffix number or “Copy of X”).

Miscellany

A few additional points deserve to be listed, and are done so in no particular order.

Manual versioning

A basic versioning feature would be appreciated: just the opportunity to save with incrementing / adding a number suffix:

  • “My Network.gexf” is saved as “My Network 01.gexf”
  • “My Network 11.gexf” is saved as “My Network 12.gexf”

A common shortcut for this is Ctrl+Alt+S.

Generalizing zoom options (more internal consistency)

We can currently set how the zoom impacts text labels. The same feature would be useful for edges, for instance to keep 1 pixel lines whatever the zoom, as well as for nodes, for instance to keep small points whatever the zoom.

Generalized zoom options

Size nodes according to area

Our eyes perceive areas. Setting a ranking to the diameter of nodes is less intuitive than to apply it to the area of nodes. We propose to offer an option to customize ranking by either diameter or area, but set the default to area.

Removing unnecessary settings about labels

Node labels and edge labels should help the user identifying nodes. However, using the color or size of labels to visualize attributes is confusing. Gephi presently contains settings to manipulate labels in this way, these settings should be removed and replaced with a simpler interface.

UI Mockup sample (work in progress)

We present here a possible approach to integrate some of the different suggestions made in this document. Consider the following image as a way to help imagine the future of the Gephi user experience.

MockupA-01

As we stated earlier, the purpose of this document is to open up the floor to brainstorming ideas about improving the Gephi UX. Please share your ideas in the comments!

PS: Thanks to Niranjan Sivakumar for his excellent proof-reading 🙂

Rebuilding Gephi’s core for the 0.9 version

This is the first article about the future Gephi 0.9 version. Our objective is to prepare the ground for a future 1.0 release and focus on solving some of the most difficult problems. It all starts with the core of Gephi and we’re giving today a preview of the upcoming changes in that area. In fact, we’re rewriting the core modules from scratch to improve performance, stability and add new features. The core modules represent and store the graph and attributes in memory so it’s available to the rest of the application. Rewriting Gephi’s core is like replacing the engine of a truck and involves adapting a lot of interconnected pieces. Gephi’s current graph structure engine was designed in 2009 and didn’t change much in multiple releases. Although it’s working, it doesn’t have the level of quality we want for Gephi 1.0 and needs to be overhauled. The aim is to complete the new implementation and integrate it in the 0.9 version.

In November 2012, we started to develop a completely new in-memory graph structure implementation for Gephi based on what we’ve learnt over the years and our desire to design a solution that will last. The project code-name is GraphStore and we focus on four main things:

  • Performance: The graph structure is so important to the rest of the application that is has to be fast and memory efficient.
  • Stability: The new code will be the most heavily unit-tested in the history of Gephi.
  • Simplicity: The Graph API should be documented and easy to use for developers.
  • Openness: If possible, we want GraphStore to be used in other projects and keep the code free of Gephi-specific concepts.

Gephi is known to use a large amount of memory, especially for very large networks. We want to challenge ourselves and tackle this issue by redesigning the way graphs are encoded and stored. Besides memory usage, we carefully analyzed possible solutions to improve read/write performance and optimize the throughput. Stability and simplicity are like food and shelter, and whatever we try to do at Gephi should be simple to use and stable. As we’re going towards a 1.0 version, we’re putting more and more efforts to testing and code quality.

Since November 2012, we have been working on GraphStore separately from Gephi’s codebase and will start the integration fairly soon. The Graph API is very similar to the existing API. However, it isn’t entirely compatible and several core things changed like attributes, views or dynamic networks and will require a lot of work in some modules. On the other hand, because the GraphStore code is decoupled, it could be leveraged in other projects. For instance, it could serve as a Blueprints implementation as an alternative to TinkerGraph.

Graph structure

A graph (also called network) is a pair of a set of nodes and a set of edges. Edges can be undirected, or directed if the direction of the relation matters. Edges may also have weights to represent a value attached to the edges, like the strength of a connection or the flow capacity. Edges may also point to the same node (i.e. self-loops). Gephi currently supports these features, but they are not sufficient to describe the variety of problems graphs can be helpful with. Multigraphs permit several relationships between nodes and is for instance commonly used to represent RDF graphs. Multigraphs with properties (i.e. ability to attach any property to nodes and edges) have recently become the standard representation for graph databases.

The next version of Gephi will support multigraphs and therefore allow multiple edges between nodes to be imported. The rollout will be done in two phases. The first phase is to allow this new type of graph to be imported, filtered and exported. We will update the importers and add new options to support these graphs. The second phase is to update the visualization and the way multiple edges between nodes look like.

Hierarchical graphs

Since the 0.7 version released in 2009, Gephi has supported hierarchical graphs. Hierarchical graphs let the user group or ungroup nodes so it forms a tree. Nodes which contain other nodes are named meta-nodes and edges are collapsed into meta-edges. Groups obtained from clustering algorithm (e.g. modularity) could also easily be collapsed into meta-nodes in order to study the network at a higher level. We initially recognized the potential of this idea for network analysis and developed a hierarchy-enabled data structure. However, we realized we didn’t completely fulfill the vision by not providing all the tools to fully explore and manage hierarchical networks. Although the data structure allows it, the software still lacks many features to really make hierarchical networks explorable.

Recently, we are more focused on networks over time and plan to continue to do so. In the past years, users have shown steady and continuous interest in dynamic networks and we haven’t really seen a strong interest in hierarchical networks. Therefore, we propose to remove this feature from next releases. On the developer side, cutting this feature will greatly simplify the code and improve performance.

Dynamic networks

Networks that change over time are some of the most interesting to visualize and analyze. We have heavily invested in supporting this type of network, for instance by developing the Timeline component. However, dynamic graph support was added after the current graph structure implementation was conceived and therefore remains suboptimal and difficult to scale. Now that we have enough hindsight, we can rethink how this should be done and make it simpler.

One pain point is the way we decided to represent the time. Essentially, there are two ways to represent time for a particular node in a graph: timestamps or intervals. Timestamps are a list of points where the particular nodes exist and intervals have a beginning and an end. For multiple reasons, we thought intervals would be easier to manipulate and more efficient than a (possibly very large) set of timestamps. By talking to our users, we found that intervals are rarely used in real-world data. On the code side, we also found that it makes things much more complex and not that efficient at the end.

In future versions, we’ll remove support for intervals and add timestamps instead. We considered supporting both intervals and timestamps but decided that it would add too much complexity and confusion.

Graph structure internals

Graph structures design is an interesting problem to solve. The objective is quite simple, yet challenging: how to best represent an interconnected graph so it’s fast to query and compact in space? Also, how to keep it simple and serve a large number of features at the same time?

Graph storage

Our goal is to develop a thread-safe, in-memory graph structure implementation in Java suitable for real-time analysis. You may ask how this differs from a graph database or a distributed graph analysis package. In a few words, one can say the requirements are quite different.

Graph databases like Neo4j, OrientDB or Titan store the graph on local disk or in a cluster and are optimized for large graphs and large number of concurrent users. Typically, the networks are much larger than what can fit in memory and these databases mostly focus on answering traversal queries. In the environment where graph databases operate most of the needs can be converted in some sort of traversal query (e.g. friends of X, tweets of Y). Traversal queries are also the reason why graph databases scale to billions of nodes. Indeed, for each traversal, only a subset of the graph is accessed. This is quite different from Gephi, which by its nature of being an analysis software needs to access the complete graph. For instance, when a layout is running Gephi needs to read the X,Y position of each node as quickly as possible. Although reading from the disk can be very quick as well (e.g. GraphChi), it’s limited to sequential access and things become more complex that way.

Because of the real-time requirements, we want to keep our graph data in memory accessible at all time. However, we want to make it easy to connect to external data sources, and graph databases in particular.

Reducing overhead

In computer science, overhead is any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to attain a particular goal.

GraphStore heavily relies on Java primitives, arrays and efficient collections library like fastutil. We are reducing overhead by simply avoiding using too many Java objects, which are very costly. Instead of using maps, trees or lists, Nodes and Edges are stored in large arrays which can be dynamically resized in blocks. For instance, iterating over the graph should be extremely fast because the CPU caches array blocks. This may sounds obvious but performance optimizations are tricky in Java because of the JVM and the uncertainty of what makes a difference and what doesn’t. In his “Effective Java” book, Joshua Bloch writes “Don’t guess, measure” and that’s still true today. For our project, we rely on well-defined micro-benchmarks to see where the bottlenecks are and how to make our data-structure more cache-friendly and more compact in memory. When the graph contains millions of edges, every byte saved per edge can make a large difference at the end.

In terms of speed, we focused on optimizing the most common operations, which are iterating over all the elements and consult nodes’ neighbors. Typically, a layout algorithm needs to read the neighbors of every node at each iteration. Neighbors can’t simply be an unsorted list because of the removal complexity: to remove a node, you need to know where it is. The current Gephi graph structure uses a binary tree to store the node’s neighbors. Although the complexity is logarithmic, every node in the tree takes extra memory and logarithmic complexity is still suboptimal. After isolating the problem in a benchmark, we found that using a double linked-list is the best solution for our requirements and achieves a O(1) complexity, as it fulfills both a quick iteration and quick update. Here is a snapshot of the solution:

Every edge has 4 integer pointers to the next in/out predecessor and successors and a separate dictionary would help to find the right edge based on the source and destination pointers. Each node has a pointer to the first edge in the linked list (i.e the head). Node ids are integers (32 bits) so one can easily create a long->Edge dictionary by encoding the source and destination node into a single long number (64 bits). The diagram intentionally leaves out the multigraph support for simplicity. In reality, nodes can have multiple head pointers, one for each edge type. Each edge type is represented by a integer index.

Views

Views are one of the most useful aspects of Gephi’s graph structure and are mainly used behind the scenes in the Filter module. A view is a graph subset (i.e. a subgraph) which remains connected to the main structure, so if a node is removed from the graph, it’s removed from the views as well. For instance, when users create a ‘Degree Filter’, Gephi creates a view and removes all the nodes which don’t fulfill the degree threshold. Multiple views can co-exist at any time in the graph structure. In the current graph structure, a node tree complete copy is done for every view and we found that this can be very inefficient.

In the new version, the way views are implemented is very different and should yield to better performance. Instead of doing a copy of the nodes, we maintain bit-vectors for nodes and edges. Because these elements are stored in large arrays with a unique identifier, it’s easy to create and maintain a bit-vector. When developers obtain the ‘Graph’ object for a particular view, the bit-vectors are used behind the scenes to adapt iterators and accessors. This solution should make filtering for large graphs much quicker. One drawback is that whereas the current implementation copies and then trims the view, GraphStore work with bit-vectors but continues to access the complete graph. In other words, if the view represents only 1% of the original graph, it still needs to iterate over the 100% to find which elements are the 1%. Even though this sounds bad, our benchmarks show it’s a very fast operation and we win overall because of the reduced overhead of duplicating the graph. Moreover, we can introduce some caching later to optimize this further.

Inverted Index

When you’re using the Partition module in Gephi, you’re manipulating some sort of inverted index. Nodes and edges have properties like ‘gender’, ‘age’ or ‘country’, and these properties are contained within the nodes and edges objects. An index is a simple data structure which allows to retrieve the list of elements for a particular value. For instance, the partition module needs to know what is the number of ‘male’ or ‘female’ nodes for the ‘gender’ column. When the column is a number like ‘age’, it also needs to know what is the maximum and minimum value. Unlike the Ranking module and its auto-apply feature, the Partition module is not refreshed in real-time and therefore difficult to use when the graph is changing a lot. We have decided to invest in this feature for the future release and are building a real column inverted index in the graph structure. The index will simply keep track of which values exist for each column and which elements are holding this value. The index will be updated in real-time as elements are added, removed or updated.

The ability to quickly retrieve elements and counts based on specific values will be very useful in many different modules like Filters, Partition or Data Laboratory. New APIs will be added for developers to use the newly created index interface. As we’re working on attributes storage and manipulation, we’ll also merge the Attributes and Graph API because they are so interconnected that it doesn’t really make sense to have them separate. The interfaces that developers are familiar with like Table or Columns will remain the same.

Events

In software programming, events are a common way to inform other modules that something changed. In Gephi, we also use events to convey graph updates events to inform other modules about updated nodes or edges. In the new GraphStore, we’ll stop using events to transport graph modifications because of the large overhead due to the creation of event objects. Indeed, when 10K nodes are added to the graph, the existing structure literally creates 10K event objects and puts them in a queue. Although the event queue is compressing objects of the same type, the overhead to create, queue, send and destroy large amount of small Java objects is too large.

Instead of a push model (i.e. the emitter is pushing updates), we want to rather promote a pull model (i.e. the listener pull updates from time to time) for future releases. A similar system is already in place to link the graph and the visualization module and it has been working without a glitch. We’ll develop the tools to easily calculate graph differences between a listener module and the graph structure. By removing the bottleneck, write performance should greatly improve.

Timestamps

As said earlier, we’ll add timestamps support to represent dynamic networks. Instead of using a time interval, a timestamp array will be associated with nodes and edges. For element (node/edge) visibility, each timestamp represents the presence of the element at that time. For example if a network snapshot is collected every month for a year, each node will have up to 12 different timestamps. The timestamp itself is a real number and can therefore represents an epoch time but also any other value in a different context. For a dynamic attribute, the time+value is simply represented as a list of (time, value) pairs.

To support the timeline and dynamic networks algorithm, we’re developing an inverted index for timestamps so we can make time filtering very quick. One good thing about intervals is that it’s very easy to know if two intervals overlaps with each other. With a flat list of timestamps, one can’t avoid to go through the entire list. The index will essentially map timestamps to the nodes and edges elements in the graph and therefore solves this issue. The Interval tree implementation which we are currently using to store intervals is based on a binary tree and is very costly in memory because of all the Java objects overhead. Using simple arrays should reduce overhead and improve performance for large dynamic networks. When computing a dynamic network algorithm (ex: Clustering coefficient over time), we’re using a sliding window over the graph so the ability to quickly filter is critical as it impacts how fast the graph refreshes.

Saving/Loading

Saving and loading the graph structure into into/from a file (or a stream) is another critical feature. When a user saves a project in Gephi, the graph data structure is serialized in XML and compressed into a .gephi file. If you worked with project files in Gephi, you may have experienced corrupted files issues or errors when loading a file. We’ve done our best to fix these problems but some still remain. We’re rethinking how this should be done in GraphStore and are making a call to rewrite the code from scratch. Our approach will rely on a lot of unit tests to make sure the code is stable so we don’t repeat the same issues in future versions. Please note that this concerns the .gephi files only and existing importers (e.g. GEXF, GraphML) will remain the same.

Concerning the GraphStore serialization, we’re abandoning XML in favor of pure byte arrays. That should yield to better performance and reduced project file size. We’ll create a custom reader for previous Gephi versions so you can still open your existing projects. Other modules like Filters or Preview will continue to use XML as it’s working just fine.

Next steps

This is the first post about the Gephi 0.9 version and more will come soon. We’re excited about the current developments and hope to hear from you. Please join the gephi-dev mailing list to learn more about ongoing projects and contribute. We need your ideas!

Follow us on Twitter!

Making up after the 100-Day Plan

After more than four months of intense activity, and after the end of the 100-Day Plan it’s time to sum up what was achieved since the release of the 0.7 version. Most of the objectives have been carried out and Gephi has been downloaded more than 17K times since February. The future looks bright and more and more people are interested using and developing the software. Networks are gaining a lot of momentum in the research community and in the industry, by being a generic and extensible platform we position Gephi as a reference tool. If you’re interested funding us, please let us know.

To beta version

Four versions have been released and a lot of bugs have been fixed during this period. The current version is 0.7 alpha 4, released last month. New features were also developed, with the help of the developer community and were quickly deployed, for instance PDF Export and new Metrics. Not later than yesterday, new bug-fixes were deployed and available through Gephi updates.

The milestone date is also fixed for the 0.7 beta: August 14, 2010. The aim is to fix remaining bugs until this date. If you notice one, please consider reporting it.

Documentation

A Quick Start Guide and a Visualization Tutorial have been written. The community made great tutorials also, which made a huge difference. Kudos to them. The forum has also showed great ability to connect with users and provide quick support. Many efforts are still needed in that way and hope to get more support writing tutorials.

Manifesto

We completed the Gephi Manifesto, to understand the project’s goals and aims.

Communication

The video Introducing Gephi 0.7 had a huge success, viewed more than 12K times. It was done to promote the release of the 0.7 version and succeeded in this job. Gephi has now its place among graph visualization software and is already recognized for its easiness and efficiency. But above all, the audience see great potential in Gephi and many people are thinking how they could use or reuse Gephi for achieving bigger tasks. That is very positive and we are cheering developers to code plug-ins.

Follow the #madewithgephi hashtag on Twitter to see recent comments. The Gephi team also attended the IEEE EuroVis conference in June and will provide software demonstration at Sunbelt XXX next week, on Friday July 2.

Development

Thought the 0.7 beta is still in preparation, the new developments still continue and are now in a very active period. Indeed, six Google Summer of Code students are working hard and are preparing outstanding improvements. The whole code is also profiting from the toolkit project, where essential modules are built together in a single JAR in order to be reused as a Java library. Good progress is made on this project. It is very important for many developers who wants to reuse Gephi features in other Java applications. So stay tuned about GSoC updates and gephi-toolkit!

The roadmap and blueprints page also got a lifting.

We would like also to reinforce interoperability with other tools and develop connectors to new file formats.

Developers tutorials

New help pages for developers were created: Checkout Code, Configuring NetBeans and Plugin Quick Start. HowTo for extending Gephi features have been written also, including layout, metric and import and were already used by third-parties developers to create new plugins.

The next tutorial will concentrate on the gephi-toolkit project and how to reuse Gephi as a Java library.

Many other tasks are on the way, notably translating Gephi in French and Spanish and preparing Gephi Student Program. We are very interested involving CS students and propose to them challenging tasks for a semester or a quartile. We hope to interest professors about that.

One more thing, after discussing with the community members we decided to move to AGPL licence for the beta version. The GNU Affero General Public License is a modified version of the ordinary GNU GPL version 3. It has one added requirement: if you run the program on a server and let other users communicate with it there, your server must also allow them to download the source code corresponding to the program that it’s running. If what’s running there is your modified version of the program, the server’s users must get the source code as you modified it. It is specifically designed to protect the Gephi Toolkit.

Gephi’s 100-Day Plan

Gephi team is very proud to announce 0.7 version of the software, after almost one year of hard, yet passionate work. You can download and try the alpha version with various datasets. Please give us all feedback on our new forum.

Gephi is turning now in a fully collaborative free software project, as its software architecture allows collaborative development. The 0.7 project was a complete rewriting of the code in a modular way. Features were spitted in modules that can be developed and managed by different developers more easily. The Google Summer of Code 09 was a perfect way to experience our architecture. The modules developed at that time are now fully integrated in the 0.7alpha release. We plan to have many developers interested by creating plugins for Gephi. We oriented our new website in that way, have a look on the Plugin Center. Plugins could be anything, from a new layout algorithm to the connector that build graphs from your enterprise data. If you’re interested, please join the discussion on the forum.

We hope to collaborate with the entire community to help drive network visualization forward. Gephi’s aim is not only proposing unseen features, but above all build a professional tool that works and serve community’s needs.

Official 100-Day plan, presented as top priorities first

  • Fix all reported bugs and improve usability. The 0.7 beta version will be released.
  • Fill users’ Quick Start guide and create a simple tutorial how to use Gephi.
  • Documentation, documentation and more documentation.
  • Make a feature-madness video, to let people know us.
  • Create tutorial pages for plugin development. At least layout, metrics and import.
  • Fill the Plugin Portal documentation on the wiki.
  • Create a Manifest and fill “Goals and Aims” on the website. Share our passion for graph visualization and highlight most challenging tasks.
  • Start Gephi Student Program, by proposing a set of programming tasks that can be managed in a semester or a quartile by computer science students.
  • Post blog articles about our approach and how Gephi is different (and great!).
  • Reorganize 0.7 specifications and build a clear roadmap and a discussion space for future developments. Invite people to join the specification team.
  • Finish documenting Gephi’s API.
  • Give more insight which dataset Gephi can deal with, get closer from users’ needs by learning which networks they wants to visualize and support them in this task.

Please consider joining us for achieving these tasks! See the wiki.

Follow us on Twitter

Version 0.7 launched

I’m pleased to inform the community that we are working on a new Gephi core. As it is explained in recent news about 0.6 beta2 release, we won’t publish an official 0.6 version and move directly to an entire new architecture. Though the 0.6 architecture is pretty stable and has been used for worth case studies (see our Exhibition section or WebAtlas gallery), it doesn’t fit with various new needs, like clustering support and complete dynamic network (DNA) support. The 0.6 core was used since September 2007 from now, mostly in an experimental context. As we receive many positive comments and support, we are adopting a long term view and enter a new development cycle. When increasing quality and modularity of the application, we expect more developers helping and a lower learning curve. Therefore the new architecture will be based on the awesome Netbeans Platform, currently the most advanced open-source framework for large Java modular applications. Plugins development will be eased when based on this framework.

Besides the framework, the most serious change will be located in data structures, with a brand new technology. Developed by us for more than 6 months, the DHNS (Durable Hierarchical Network Structure) is a very efficient data structure for storing hierarchical and dynamic networks. Thanks to this new module, new hard features like a clustering algorithm framework or hierarchical network navigation could be hatched more easily. I am currently writing a paper about these researches.

Other features will be added or improved within 0.7 version, you can have a look on our 2009 roadmap. It’s hard to say now exactly how priorities will be managed but I will publish regularly news about the progress for keeping you informed. For now the top priority is building the new architecture around Netbeans Platform.

Until the end of summer, Gephi project is hosted by RTGI SAS in Paris. Besides supporting our project, they plan to use Gephi daily and thus we have feedbacks and opportunities improving user interface and ergonomics.

Let’s go back to work and build the best open-source network visualization software.