Gephi updates with 0.9.1 version

splash091A new Gephi version has been released and can be downloaded from gephi.org. This version is an update from the 0.9.0 version released last December and mostly addresses issues discovered since.

One notable improvement is a new localization: German! Gephi is now localized in nine languages (English, French, Spanish, Japanese, Brazilian Portuguese, Russian, Chinese, Czech and German) and we hope to continue the momentum on this effort in the future.

new_german_l10n

Other notable improvements include a better support for parallel edges, appending to existing workspaces and how filters are saved in .gephi files. More than 60 bugs were fixed with a majority of them reported by the community. Thanks to all users who took the time to help! The complete list of bugfixes and improvements can be found in the changelog on GitHub.

In the next few weeks we would like to focus on documentation as there’s still many features brought in the 0.9.0 version without up-to-date documentation. This is especially important for more complex features such as dynamic graphs, which got a major upgrade.

As usual, please share your experience/feedback on our Facebook group or on Twitter.

Gephi 0.9 released: Play with network data again

splash

We’re proud to announce the release of the next major version of Gephi! This 0.9.0 version has been more than three years in the making but today brings an exciting new life to this project, and the graph/network analytics community at large. You can download it here for Windows, Mac OS X and Linux.

Gephi is the leading graph visualization software – known as the “Photoshop for networks” and is open-source and free. It has been downloaded more than a million times and is used by many scholars and data scientists around the world. This new release brings new features in the area of dynamic networks (i.e. network over time) and major compatibility and performance improvements.

Since the last release in 2013, users were facing compatibility issues with Java, which have been resolved with this release. Development had slow down three years ago but had never stopped. In fact, in March 2013 the time had come to think about what Gephi 1.0 would look like and realize it needed a new core. This was by far the most complex project the team had to overcome but developers had a long-term vision and know that future developments will now rely on a robust and extensible core, with world-class performances.

The world is increasingly complex and interconnected. Gephi’s purpose is to unfold this complex relational data in a way anyone can understand them. It allows you to visualize graph data as a map and create the visualizations to support your narratives. State-of-the-art algorithms make readable layouts, highlighting communities or influential nodes. Visual tools tweak colors and shapes to reveal hidden patterns in the data, helping solving complex problems. More and more network-maps are pictured in online, offline press and other communication media. They spread from science to business, art and activism. People are increasingly exposed to them and learn how to read them. Gephi aims to accelerate this commoditization process by providing free and easy-to-use tools.

What’s new in Gephi 0.9?

The list is too long! The complete changelog for this version can be found on GitHub’s release page.

Next steps

There are a few immediate next steps coming up right after this release. Following-up on the recent plugin development announcement we’ll get in touch with plugin developers and start migrating plugins to this version. There’s more than 80 plugins to update!

Then, we’ll identify and resolve new issues that appeared with this version. A future Gephi 0.9.1 release will come next year to address those.

A Gephi Toolkit release will also be made very soon so developers can update their application built on top of Gephi’s modules. In the meantime, we’re interested in users’ feedback and want to hear from you on Twitter or Facebook. Issues can directly be reported on GitHub as well, where the developers are.

Finally, thanks to all contributors and the community for supporting this project!

Plugin development gets new tools and opens-up to the community

Since the introduction of the Gephi Marketplace and tools such as the Plugins Bootcamp we’ve seen more and more plugins being developed. Even developers with little experience with Java give it a try and succeed in creating their first plugin. We want developers to be productive and make it as easy as possible to get started with plugin development and find help along the way. As the release of the 0.9 version is near, it’s time to review our plan on that matter and upcoming improvements. Here’s the summary:

  • The gephi-plugins base repository (i.e. repository plugin developers fork) is now using Maven for building and is simpler. It contains only 4 files versus 890 for the Ant-based system.
  • All Gephi modules are published on Maven central, making it very easy to inspect and extend.
  • Introduction of a custom Maven plugin designed to facilitate plugin development.
  • The submission and review of plugins will be entirely based of GitHub, making it more scalable and transparent.
  • A new online portal for plugins is coming up with an easier edit experience and new features.

From Ant to Maven

Before diving into plugins, let’s first review what has changed on how Gephi is compiled, built and packaged – as this directly affects plugins as well. Since the Gephi 0.8.2 version we have migrated our build system from Ant to Maven. This is in line with what the Netbeans Platform (i.e. which Gephi is based on) community recommends. It already has increased the level of automation we’re capable of as a result. The main benefits are (compared to Ant):

  • Maven is great at dependencies management. It’s now very clear what version of what library Gephi depends on, making it simpler to integrate. Dependencies are also downloaded automatically instead of being checked in the codebase
  • Unlike the Ant-based system, it’s independent from Netbeans. This allows developers not using Netbeans to develop Gephi and produce a build entirely from the command-line.
  • Gephi modules can now be placed on Maven Central (i.e. global repository where Maven finds its dependencies). This allows plugins to automatically find the Gephi dependencies online, reducing the manual steps at each Gephi upgrade.

Build assistant

There are a few critical steps we want to help plugin developers with and as a result started the development of a custom Maven plugin. This new tool will work behind the scenes when developers build their plugin. No installation or configuration is needed as it comes already as dependency of the gephi-plugins module. It already addresses common pain points and hope to automate more and more of the steps in the future. This is what it can do as of today:

  • Plugin validation: The assistant reviews the plugin configuration and metadata at each build. This allows for instance to check if the plugin depends on the correct Gephi version or remind the developer to define an author or license in its configuration.
  • Run Gephi with plugins: A single command allows to run Gephi with the plugins pre-installed. This makes testing faster than ever when developing plugins.
  • New plugin generator: A step-by-step command-line tool that creates the correct folder structure and configuration to get started.

In the future, we want to rely on this build assistant to further automate the process and for instance do easy migration or code generation. For instance, you could ask to generate a Layout plugin code and configuration. Afterwards, all needed would be to fill in the blanks in the code.

A new way to review and submit plugins

As the number of plugins grows, it’s important to have a clear process how plugins are reviewed and updated. We also want this process to be transparent and open to the community. So far, the process was based on the submission of the plugin binaries with a manual review done by the team. This helped us get where we are today, but we want to get it to the next level and propose to entirely move this process to GitHub – using the pull-request mechanism. This has multiple advantages, listed below:

  • Reviewing new/updated plugins can scale because any developer can read the code and contribute to the pull requests.
  • Developers are already asked to fork the gephi-plugins repository so submitting the plugin via GitHub is a natural extension to it.
  • There’s a clear history of each version, comment and what code has changed from one version to another.
  • It makes it easier to test plugins and detect issues before the plugin is approved.

As part of this migration, we’ll no longer add plugins with closed source code but all existing plugins for Gephi 0.8.2 will remain available. For security and stability reasons, it’s essential that each plugin’s code can be inspected before approval. In order for this to work, all existing plugins not already on GitHub or not forking the gephi-plugins repository will need to migrate. For those already set up, the migration will be easier but Ant-based plugins will still need to migrate to Maven.

To summarize, this is what the new 4-steps process looks like for developers:

gephi-new-plugin-development-process

In the current submission process we ask for additional information such as description, author or license as well as allow the upload of images. Going forward with GitHub, all of these data will directly be defined in the plugin’s configuration making it easier to update.

A new home for plugins (again)

Plugins are currently available online from the Gephi Marketplace, where users could also reach people providing teachings and support.  We have ideas on how to improve these community services and will be migrating them to a new architecture, starting with the plugins. We will tell you more about these changes in an upcoming post but for now our focus is on developing a new lightweight plugin portal that can directly be connected with the data source on GitHub.

Here is a preview of what it will look like for plugin pages:

new-plugin-frontend-preview

 

The content of this website will be automatically updated when plugins are published or updated. The way it works is with Travis CI (i.e. continuous integration platform) simply refreshing the JSON file after changes to the plugin repository on GitHub. Developers can even embed images and write the description in Markdown. This will remove entirely the need for plugin developers to login to the marketplace, update NBMs and metadata.

Migrating plugins

This new Maven-based repository along with the new submission process will be introduced with the Gephi 0.9 release. Let’s review what plugin developers need to know to bring their plugin to this new major version.

As with all major Gephi release, plugins compatibility needs to be evaluated as APIs may have changed. In fact, given this new version is based on an entirely redeveloped core it’s very likely code changes will be required. Hopefully, these changes will often be minor and actually simplify things (i.e less, more efficient code). Documentation will be published on these API changes and core developers will be available to answer questions as well.

Plugin developers will also get contacted regarding moving their code to GitHub with a step-by-step guide. We’re considering adding a migrate command to the new Gephi Maven plugin to facilitate the transition from Ant but that’s an unfunded project at the moment (if you’re interested contributing to that, please let us know). Stay tuned for details right after the release on the path to migration.

And again, thanks for all your hard work on bringing your ideas to life though new Gephi plugins!

 

Announcing Gephi 0.9 release date

Gephi 0.9

Gephi has an amazing community of passionate users and developers. In the past few years, they have been very dedicated creating tutorials, developing new plugins or helping out on GitHub. They also have been patiently waiting for a new Gephi release! Today we’re happy to share with you that the wait will come to an end December 20th with the release of Gephi 0.9 for Windows, MacOS X and Linux.

We’re very excited about this upcoming release and developers are hard at work to deliver its roadmap before the end of 2015. This release will resolve a serie of compatibility issues as well as improve features and performance.

Our vision for Gephi remains focused on a few fundamentals, which were already outlined in our Manifesto back in 2009. Gephi should be a software for everyone, powerful yet easy to learn. In many ways, we still have the impression that we’ve only scratched the surface and want to continue to focus on making each module of Gephi better. As part of this release, we’ve undertaken one of the most difficult project we’ve worked on and completely rewrote the core of Gephi. Although not very visible for the end-user, this brings new capabilities, better performance and a level of code quality we can be proud of. This ensure a very solid foundation for the future of this software and paves the way for a future 1.0 version.

Below is an overview of the new features and improvements the 0.9 version will bring.

Java and MacOS compatibility

This release will restore Gephi’s compatibility with the latest Java versions on all platforms. This will resolve issues our users encounter with Java 7 and 8. Compatibility issues with Mac OS will also be resolved and full support for Retina display screens added.

New redeveloped core

This release will improve performance and reliability by a large margin. The graph structure at Gephi’s core has been redeveloped from scratch and will bring multiple new features, better performance and lower memory consumption. On benchmarks, simple operations such as reading nodes or setting attributes see performance improvements ranging from 2X to 100X. This new core will make many operations in Gephi faster and push the envelope even further in large graphs exploration. Reducing memory usage has also been an area of focus and we have measured a 2X reduction compared to Gephi 0.8.2 on a medium-size graph.

New Appearance module

We’re introducing a new user module named Appearance designed to combine and replace Ranking and Partition modules. Appearance will group in one place all controls acting on the node or edge appearance. The partition capabilities will also greatly improve as part of this new module and a new palette selector is being added. In addition of the default palettes, we’re also adding a cool palette generator designed to find the optimal colors (i.e. partitions can be differentiated from each other). Moreover, it will be possible to “Auto-apply” partitions as well, which is a feature that was only available for Ranking so far.

Appearance Partition Palette selector Palette Generator

Timestamp support

This 0.9 release adds a new way to represent networks over time: timestamps. So far, users could only represent time using intervals and that was cumbersome when the underlying network data was collected at fixed time intervals (e.g. one network per day). Starting with this release, Gephi will support both intervals and timestamps to represent evolving network topology and/or evolving attribute values.

GEXF 1.3 support

The GEXF format is also evolving to its 1.3 version. This version improves the support for graphs over time and introduces the ability to represent time using timestamps rather than intervals. In addition, it’s now possible to set a timestamp or an interval for the entire graph, allowing building collections of GEXF files where each represents a “slice”. This is a common request from the community and we hope this will greatly facilitate the exploration of longitudinal networks.

Multiple files import

With the 0.9 version users will be able to import multiple files in Gephi at the same time. Once the files have been read, two choices are offered, either to import each file into a separate workspace or merge them into the same workspace. The latter is a powerful option when used with dynamic graphs. Indeed, a collection of GEXF files representing the same network over time will be imported in Gephi in a single step.

Import Report Multiple Graphs

Multi-graph support

Multi-graph exampleThis release will bring support for multi-graphs, where multiple edges can exist between two nodes each with a different relationship. Users will be able to import, filter and run algorithms on these graphs but the support for visualizing these graphs will come in a further release.

New workspace selection UI

We’ve heard users’ feedback and the workspace selection user interface will be improved. The new interface will be a “tab-style” interface where each workspace is a tab and switching from one workspace to another only requires a single click. Tabs will be located at the top just under the perspective selection. The previous interface is located at the bottom right corner and will be entirely removed.Workspace Tabs

Gephi Toolkit release

A new release of the Gephi Toolkit, based on the 0.9 version will be made soon after December 20th.

Bug fixes

We’ve done a serious bug squash and already addressed many difficult issues, more to come until the release date.

separator

Follow us on Twitter or join the Facebook group to get the latest news. If you want to know more about this upcoming release, or want to help out please send us a note.

Gephi is asleep and up to awaken

gephi round graph

Gephi has been almost inactive since quite a long time: we did not release, we did not fix issues, we did not post on the blog. This lack of recent updates creates an increasing amount of difficulties, including installing Gephi on a recent Mac computer. A lot of users ask if the project is still alive. We understand why you wonder and decided to write this post to explain where we’re at and provide to the community a preview of what’s next in Gephi’s lifecycle.

In short, Gephi is still alive yet asleep, but its reawakening is in sight. However, a series of issues prevents us from doing better right now.

The ambitious yet incomplete 0.9 release

The next planned release is Gephi 0.9 and promises to be a major release with a complete rewrite of Gephi’s core module. Performance, and especially memory usage for large graphs has been a lingering issue since the first version of the software. As explained in this article written by Mathieu Bastian – Gephi’s lead developer – the solution resides in a more efficient graph structure implementation that we named “GraphStore”. This technology brings many new features and significantly reduces the memory usage but is a large development effort and requires all modules to be adapted. Indeed, the module which stores and manages the graph is pretty much used by every other module (e.g layout, filters, preview etc.).

This work on the core graph module was initiated as part of a larger vision focused towards a 1.0 release, which aims to address a much larger set of problems, missing measures and bugs. As you may know, the current version has a various set of problems. Some issues are preventing the normal use of the software, like the difficulties to install it on a recent Mac OS X (Yosemite and Maverick). Others are incomplete or missing features, such as various user interface design issues or the improper management of categories’ colors. Finally, some internal problems are hidden in the code but nevertheless real. For example, the technology used to code the user interface (Swing) has been replaced by a more modern technology named JavaFX. For the most part, these problems require a deep rework of the code. The good news is that the most difficult part in this 1.0 vision is probably the rewriting of Gephi’s core graph module, which is what the 0.9 version focus on already.

The current 0.9 developments have reached around 80% completion and many modules, but not all, have already been adapted to GraphStore. A stable version can’t be released until this reaches 100% and all the modules are converted to the new core implementation. Other important issues such as installation issues on recent Macs have already been addressed in this development version. Finally, a series of bugs will be fixed along with minor features and improvements. Finishing the last modules and releasing the 0.9 version is our current priority.

Limited resources

Like many other open source projects, Gephi’s development is for the most part unpaid and remains an activity on the side for all contributors. The notable exception is the Google Summer of Code, which sponsored students multiple years in a row to work on the project. Therefore, the project’s progress depends on the contributors’ professional and personal situations. Although individuals are ready and willing, time is limited and there was just not enough of it lately to make significant progress. Mathieu Bastian is Gephi’s architect and has been behind the software’s key iterations since 2007. This time again he holds the keys to its future and has been involved in the GraphStore project. This complex project requires all of his knowledge of Gephi’s code and is hardly a task someone else could do at this point. Therefore, a part of our development depends on his free time, and we accept it. This situation is temporary though. Indeed, Mathieu will eventually obtain more time to conclude the work on the 0.9 release and Gephi’s development will be less dependent on him in the future.

In addition, we are working on stabilizing some resources in the long run, but our strategy requires a readjustment. Gephi needs time and energy from good java developers, clear-minded designers, and seasoned software architects. We have to entice skilled people, support their involvement and get the best from their contributions. We aim to improve the management of our limited workforce to make the development more attractive and dynamic. This evolution is organized by our team but benefits from external support. For instance, the Sciences Po médialab, the institution I belong to, provides resources for organizing the project, rethinking the user interface and some coding. These changes may not be immediately visible but we’re committed for the medium and long term.

What is next

Releasing the Gephi 0.9 version is the immediate next step. This version will include compatibility fixes and the whole new core based on GraphStore. Then, an important project to rework the overall user experience will be kicked off. It requires a technology switch (from Swing to JavaFX) and the overhaul of a majority of the modules but aims to make Gephi simpler and more intuitive to use. We already have a good diagnostic of the user experience issues in Gephi but need to explore different designs. In an upcoming blog post, I will explain our thought process on this topic with the help of Professore Donato Ricci, senior interaction designer. Eventually, the 1.0 version will be worked on and released.

Gephi is almost 10 years old. It is usable but still plagued with many well-known issues. Though sometimes frustrating, it allows users to do incredible things. We think Gephi is still relevant to research, journalism, civil society and more. We are going to give it the renewal it deserves.

Mathieu Jacomy

Google Summer of Code 2013

It’s a great news, Gephi has been accepted again for the Google Summer of Code for the 5th year! The program is the best way for students around the world to start contributing to an open-source project. Since 2009, each edition is a great success and dramatically boosted Gephi’s project development.

What is Gephi?

Networks are everywhere: email systems, financial transaction systems and gene-protein interaction networks are just a few examples. Gephi began as a university student project four years ago and has quickly become an open source software leader in the visualization and analysis of large networks. It is an important contribution to the ecosystem of tools used by researchers and big data analysts to explore and extract value from the deluge of relational data and disseminate a better understanding for people to think about a “connected” world.

Gephi is a “Photoshop” for graphs: designed to make data navigation and manipulation easy, it covers the entire process from data importing to aesthetics refinements and communication. Users interact with the visualization and manipulate structures, shapes and colors to reveal the properties of complex and messy data. The goal is to help data analysts make hypotheses and intuitively discover patterns or errors in large data collections.

Gephi’s project aims at providing the perfect tool to visualize and analyze networks. We focus on usability, performance and modularity:

  • Usability: Easy to install, an UI without scripts and real-time manipulation.
  • Performance: Visualization engine and data structures are built scalable. Supporting always-larger graphs is an endless challenge!
  • Modularity: Extensible software architecture, built on top of Netbeans Platform. Add plug-ins with ease.

Learn more about Gephi, watch Introducing Gephi 0.7, download and try it by following Quick Start Tutorial.

Gephi’s project is young, the growing community is composed of engineers and scientists involved in network science, datavis and complex networks.

List of ideas

List of ideas are availabe on our wiki. They cover various skills and level of difficulties:

* Completing Legend moduleComplete the Legend module, which was started in last year GSoC.
* GraphStore benchmark and tuningOptimize and tune GraphStore based on a serie of new well-defined benchmarks.

Please also propose your ideas on the forum. They will be considered and discussed by the community. Have a look at our long-term Roadmap.

Students, join the network

Students, apply now for Gephi proposals. Join us on the forum and fill in the questionnaire. Be careful, deadline for submitting proposals is May 3 (timeline)!

Hélder Suzuki, student for Gephi in 2009 and now software engineer at Google, wrote:
At Gephi students will have the opportunity to produce high impact work on a rapidly growing area and be noted for it.

View our previous Google Summer of Code projects here and read former students interviews.

Follow Gephi on Twitter

Rebuilding Gephi’s core for the 0.9 version

This is the first article about the future Gephi 0.9 version. Our objective is to prepare the ground for a future 1.0 release and focus on solving some of the most difficult problems. It all starts with the core of Gephi and we’re giving today a preview of the upcoming changes in that area. In fact, we’re rewriting the core modules from scratch to improve performance, stability and add new features. The core modules represent and store the graph and attributes in memory so it’s available to the rest of the application. Rewriting Gephi’s core is like replacing the engine of a truck and involves adapting a lot of interconnected pieces. Gephi’s current graph structure engine was designed in 2009 and didn’t change much in multiple releases. Although it’s working, it doesn’t have the level of quality we want for Gephi 1.0 and needs to be overhauled. The aim is to complete the new implementation and integrate it in the 0.9 version.

In November 2012, we started to develop a completely new in-memory graph structure implementation for Gephi based on what we’ve learnt over the years and our desire to design a solution that will last. The project code-name is GraphStore and we focus on four main things:

  • Performance: The graph structure is so important to the rest of the application that is has to be fast and memory efficient.
  • Stability: The new code will be the most heavily unit-tested in the history of Gephi.
  • Simplicity: The Graph API should be documented and easy to use for developers.
  • Openness: If possible, we want GraphStore to be used in other projects and keep the code free of Gephi-specific concepts.

Gephi is known to use a large amount of memory, especially for very large networks. We want to challenge ourselves and tackle this issue by redesigning the way graphs are encoded and stored. Besides memory usage, we carefully analyzed possible solutions to improve read/write performance and optimize the throughput. Stability and simplicity are like food and shelter, and whatever we try to do at Gephi should be simple to use and stable. As we’re going towards a 1.0 version, we’re putting more and more efforts to testing and code quality.

Since November 2012, we have been working on GraphStore separately from Gephi’s codebase and will start the integration fairly soon. The Graph API is very similar to the existing API. However, it isn’t entirely compatible and several core things changed like attributes, views or dynamic networks and will require a lot of work in some modules. On the other hand, because the GraphStore code is decoupled, it could be leveraged in other projects. For instance, it could serve as a Blueprints implementation as an alternative to TinkerGraph.

Graph structure

A graph (also called network) is a pair of a set of nodes and a set of edges. Edges can be undirected, or directed if the direction of the relation matters. Edges may also have weights to represent a value attached to the edges, like the strength of a connection or the flow capacity. Edges may also point to the same node (i.e. self-loops). Gephi currently supports these features, but they are not sufficient to describe the variety of problems graphs can be helpful with. Multigraphs permit several relationships between nodes and is for instance commonly used to represent RDF graphs. Multigraphs with properties (i.e. ability to attach any property to nodes and edges) have recently become the standard representation for graph databases.

The next version of Gephi will support multigraphs and therefore allow multiple edges between nodes to be imported. The rollout will be done in two phases. The first phase is to allow this new type of graph to be imported, filtered and exported. We will update the importers and add new options to support these graphs. The second phase is to update the visualization and the way multiple edges between nodes look like.

Hierarchical graphs

Since the 0.7 version released in 2009, Gephi has supported hierarchical graphs. Hierarchical graphs let the user group or ungroup nodes so it forms a tree. Nodes which contain other nodes are named meta-nodes and edges are collapsed into meta-edges. Groups obtained from clustering algorithm (e.g. modularity) could also easily be collapsed into meta-nodes in order to study the network at a higher level. We initially recognized the potential of this idea for network analysis and developed a hierarchy-enabled data structure. However, we realized we didn’t completely fulfill the vision by not providing all the tools to fully explore and manage hierarchical networks. Although the data structure allows it, the software still lacks many features to really make hierarchical networks explorable.

Recently, we are more focused on networks over time and plan to continue to do so. In the past years, users have shown steady and continuous interest in dynamic networks and we haven’t really seen a strong interest in hierarchical networks. Therefore, we propose to remove this feature from next releases. On the developer side, cutting this feature will greatly simplify the code and improve performance.

Dynamic networks

Networks that change over time are some of the most interesting to visualize and analyze. We have heavily invested in supporting this type of network, for instance by developing the Timeline component. However, dynamic graph support was added after the current graph structure implementation was conceived and therefore remains suboptimal and difficult to scale. Now that we have enough hindsight, we can rethink how this should be done and make it simpler.

One pain point is the way we decided to represent the time. Essentially, there are two ways to represent time for a particular node in a graph: timestamps or intervals. Timestamps are a list of points where the particular nodes exist and intervals have a beginning and an end. For multiple reasons, we thought intervals would be easier to manipulate and more efficient than a (possibly very large) set of timestamps. By talking to our users, we found that intervals are rarely used in real-world data. On the code side, we also found that it makes things much more complex and not that efficient at the end.

In future versions, we’ll remove support for intervals and add timestamps instead. We considered supporting both intervals and timestamps but decided that it would add too much complexity and confusion.

Graph structure internals

Graph structures design is an interesting problem to solve. The objective is quite simple, yet challenging: how to best represent an interconnected graph so it’s fast to query and compact in space? Also, how to keep it simple and serve a large number of features at the same time?

Graph storage

Our goal is to develop a thread-safe, in-memory graph structure implementation in Java suitable for real-time analysis. You may ask how this differs from a graph database or a distributed graph analysis package. In a few words, one can say the requirements are quite different.

Graph databases like Neo4j, OrientDB or Titan store the graph on local disk or in a cluster and are optimized for large graphs and large number of concurrent users. Typically, the networks are much larger than what can fit in memory and these databases mostly focus on answering traversal queries. In the environment where graph databases operate most of the needs can be converted in some sort of traversal query (e.g. friends of X, tweets of Y). Traversal queries are also the reason why graph databases scale to billions of nodes. Indeed, for each traversal, only a subset of the graph is accessed. This is quite different from Gephi, which by its nature of being an analysis software needs to access the complete graph. For instance, when a layout is running Gephi needs to read the X,Y position of each node as quickly as possible. Although reading from the disk can be very quick as well (e.g. GraphChi), it’s limited to sequential access and things become more complex that way.

Because of the real-time requirements, we want to keep our graph data in memory accessible at all time. However, we want to make it easy to connect to external data sources, and graph databases in particular.

Reducing overhead

In computer science, overhead is any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to attain a particular goal.

GraphStore heavily relies on Java primitives, arrays and efficient collections library like fastutil. We are reducing overhead by simply avoiding using too many Java objects, which are very costly. Instead of using maps, trees or lists, Nodes and Edges are stored in large arrays which can be dynamically resized in blocks. For instance, iterating over the graph should be extremely fast because the CPU caches array blocks. This may sounds obvious but performance optimizations are tricky in Java because of the JVM and the uncertainty of what makes a difference and what doesn’t. In his “Effective Java” book, Joshua Bloch writes “Don’t guess, measure” and that’s still true today. For our project, we rely on well-defined micro-benchmarks to see where the bottlenecks are and how to make our data-structure more cache-friendly and more compact in memory. When the graph contains millions of edges, every byte saved per edge can make a large difference at the end.

In terms of speed, we focused on optimizing the most common operations, which are iterating over all the elements and consult nodes’ neighbors. Typically, a layout algorithm needs to read the neighbors of every node at each iteration. Neighbors can’t simply be an unsorted list because of the removal complexity: to remove a node, you need to know where it is. The current Gephi graph structure uses a binary tree to store the node’s neighbors. Although the complexity is logarithmic, every node in the tree takes extra memory and logarithmic complexity is still suboptimal. After isolating the problem in a benchmark, we found that using a double linked-list is the best solution for our requirements and achieves a O(1) complexity, as it fulfills both a quick iteration and quick update. Here is a snapshot of the solution:

Every edge has 4 integer pointers to the next in/out predecessor and successors and a separate dictionary would help to find the right edge based on the source and destination pointers. Each node has a pointer to the first edge in the linked list (i.e the head). Node ids are integers (32 bits) so one can easily create a long->Edge dictionary by encoding the source and destination node into a single long number (64 bits). The diagram intentionally leaves out the multigraph support for simplicity. In reality, nodes can have multiple head pointers, one for each edge type. Each edge type is represented by a integer index.

Views

Views are one of the most useful aspects of Gephi’s graph structure and are mainly used behind the scenes in the Filter module. A view is a graph subset (i.e. a subgraph) which remains connected to the main structure, so if a node is removed from the graph, it’s removed from the views as well. For instance, when users create a ‘Degree Filter’, Gephi creates a view and removes all the nodes which don’t fulfill the degree threshold. Multiple views can co-exist at any time in the graph structure. In the current graph structure, a node tree complete copy is done for every view and we found that this can be very inefficient.

In the new version, the way views are implemented is very different and should yield to better performance. Instead of doing a copy of the nodes, we maintain bit-vectors for nodes and edges. Because these elements are stored in large arrays with a unique identifier, it’s easy to create and maintain a bit-vector. When developers obtain the ‘Graph’ object for a particular view, the bit-vectors are used behind the scenes to adapt iterators and accessors. This solution should make filtering for large graphs much quicker. One drawback is that whereas the current implementation copies and then trims the view, GraphStore work with bit-vectors but continues to access the complete graph. In other words, if the view represents only 1% of the original graph, it still needs to iterate over the 100% to find which elements are the 1%. Even though this sounds bad, our benchmarks show it’s a very fast operation and we win overall because of the reduced overhead of duplicating the graph. Moreover, we can introduce some caching later to optimize this further.

Inverted Index

When you’re using the Partition module in Gephi, you’re manipulating some sort of inverted index. Nodes and edges have properties like ‘gender’, ‘age’ or ‘country’, and these properties are contained within the nodes and edges objects. An index is a simple data structure which allows to retrieve the list of elements for a particular value. For instance, the partition module needs to know what is the number of ‘male’ or ‘female’ nodes for the ‘gender’ column. When the column is a number like ‘age’, it also needs to know what is the maximum and minimum value. Unlike the Ranking module and its auto-apply feature, the Partition module is not refreshed in real-time and therefore difficult to use when the graph is changing a lot. We have decided to invest in this feature for the future release and are building a real column inverted index in the graph structure. The index will simply keep track of which values exist for each column and which elements are holding this value. The index will be updated in real-time as elements are added, removed or updated.

The ability to quickly retrieve elements and counts based on specific values will be very useful in many different modules like Filters, Partition or Data Laboratory. New APIs will be added for developers to use the newly created index interface. As we’re working on attributes storage and manipulation, we’ll also merge the Attributes and Graph API because they are so interconnected that it doesn’t really make sense to have them separate. The interfaces that developers are familiar with like Table or Columns will remain the same.

Events

In software programming, events are a common way to inform other modules that something changed. In Gephi, we also use events to convey graph updates events to inform other modules about updated nodes or edges. In the new GraphStore, we’ll stop using events to transport graph modifications because of the large overhead due to the creation of event objects. Indeed, when 10K nodes are added to the graph, the existing structure literally creates 10K event objects and puts them in a queue. Although the event queue is compressing objects of the same type, the overhead to create, queue, send and destroy large amount of small Java objects is too large.

Instead of a push model (i.e. the emitter is pushing updates), we want to rather promote a pull model (i.e. the listener pull updates from time to time) for future releases. A similar system is already in place to link the graph and the visualization module and it has been working without a glitch. We’ll develop the tools to easily calculate graph differences between a listener module and the graph structure. By removing the bottleneck, write performance should greatly improve.

Timestamps

As said earlier, we’ll add timestamps support to represent dynamic networks. Instead of using a time interval, a timestamp array will be associated with nodes and edges. For element (node/edge) visibility, each timestamp represents the presence of the element at that time. For example if a network snapshot is collected every month for a year, each node will have up to 12 different timestamps. The timestamp itself is a real number and can therefore represents an epoch time but also any other value in a different context. For a dynamic attribute, the time+value is simply represented as a list of (time, value) pairs.

To support the timeline and dynamic networks algorithm, we’re developing an inverted index for timestamps so we can make time filtering very quick. One good thing about intervals is that it’s very easy to know if two intervals overlaps with each other. With a flat list of timestamps, one can’t avoid to go through the entire list. The index will essentially map timestamps to the nodes and edges elements in the graph and therefore solves this issue. The Interval tree implementation which we are currently using to store intervals is based on a binary tree and is very costly in memory because of all the Java objects overhead. Using simple arrays should reduce overhead and improve performance for large dynamic networks. When computing a dynamic network algorithm (ex: Clustering coefficient over time), we’re using a sliding window over the graph so the ability to quickly filter is critical as it impacts how fast the graph refreshes.

Saving/Loading

Saving and loading the graph structure into into/from a file (or a stream) is another critical feature. When a user saves a project in Gephi, the graph data structure is serialized in XML and compressed into a .gephi file. If you worked with project files in Gephi, you may have experienced corrupted files issues or errors when loading a file. We’ve done our best to fix these problems but some still remain. We’re rethinking how this should be done in GraphStore and are making a call to rewrite the code from scratch. Our approach will rely on a lot of unit tests to make sure the code is stable so we don’t repeat the same issues in future versions. Please note that this concerns the .gephi files only and existing importers (e.g. GEXF, GraphML) will remain the same.

Concerning the GraphStore serialization, we’re abandoning XML in favor of pure byte arrays. That should yield to better performance and reduced project file size. We’ll create a custom reader for previous Gephi versions so you can still open your existing projects. Other modules like Filters or Preview will continue to use XML as it’s working just fine.

Next steps

This is the first post about the Gephi 0.9 version and more will come soon. We’re excited about the current developments and hope to hear from you. Please join the gephi-dev mailing list to learn more about ongoing projects and contribute. We need your ideas!

Follow us on Twitter!