Google Summer of Code 2013

Great news: Gephi has been accepted into the Google Summer of Code for the 5th year! The program is the best way for students around the world to start contributing to an open-source project. Since 2009, every edition has been a great success and has dramatically boosted the development of the Gephi project.

What is Gephi?

Networks are everywhere: email systems, financial transaction systems and gene-protein interaction networks are just a few examples. Gephi began as a university student project four years ago and has quickly become a leading open-source software for the visualization and analysis of large networks. It is an important contribution to the ecosystem of tools used by researchers and big data analysts to explore and extract value from the deluge of relational data, and it helps people better understand a “connected” world.

Gephi is a “Photoshop” for graphs: designed to make data navigation and manipulation easy, it covers the entire process from data importing to aesthetic refinement and communication. Users interact with the visualization and manipulate structures, shapes and colors to reveal the properties of complex and messy data. The goal is to help data analysts make hypotheses and intuitively discover patterns or errors in large data collections.

The Gephi project aims to provide the perfect tool to visualize and analyze networks. We focus on usability, performance and modularity:

  • Usability: Easy to install, a UI without scripts and real-time manipulation.
  • Performance: The visualization engine and data structures are built to scale. Supporting ever-larger graphs is an endless challenge!
  • Modularity: Extensible software architecture, built on top of the Netbeans Platform. Add plug-ins with ease.

Learn more about Gephi, watch Introducing Gephi 0.7, then download and try it by following the Quick Start Tutorial.

The Gephi project is young; its growing community is composed of engineers and scientists involved in network science, datavis and complex networks.

List of ideas

A list of ideas is available on our wiki. The ideas cover various skills and levels of difficulty:

* Completing Legend module: Complete the Legend module, which was started in last year’s GSoC.
* GraphStore benchmark and tuning: Optimize and tune GraphStore based on a series of new, well-defined benchmarks.

Please also propose your ideas on the forum. They will be considered and discussed by the community. Have a look at our long-term Roadmap.

Students, join the network

Students, apply now for Gephi proposals. Join us on the forum and fill in the questionnaire. Be careful: the deadline for submitting proposals is May 3 (timeline)!

Hélder Suzuki, student for Gephi in 2009 and now software engineer at Google, wrote:
At Gephi students will have the opportunity to produce high impact work on a rapidly growing area and be noted for it.

View our previous Google Summer of Code projects here and read interviews with former students.

Follow Gephi on Twitter

Rebuilding Gephi’s core for the 0.9 version

This is the first article about the future Gephi 0.9 version. Our objective is to prepare the ground for a future 1.0 release and to focus on solving some of the most difficult problems. It all starts with the core of Gephi, and today we’re giving a preview of the upcoming changes in that area. In fact, we’re rewriting the core modules from scratch to improve performance and stability and to add new features. The core modules represent and store the graph and its attributes in memory so they are available to the rest of the application. Rewriting Gephi’s core is like replacing the engine of a truck: it involves adapting a lot of interconnected pieces. Gephi’s current graph structure engine was designed in 2009 and hasn’t changed much over multiple releases. Although it works, it doesn’t have the level of quality we want for Gephi 1.0 and needs to be overhauled. The aim is to complete the new implementation and integrate it into the 0.9 version.

In November 2012, we started to develop a completely new in-memory graph structure implementation for Gephi based on what we’ve learnt over the years and our desire to design a solution that will last. The project code-name is GraphStore and we focus on four main things:

  • Performance: The graph structure is so important to the rest of the application that it has to be fast and memory efficient.
  • Stability: The new code will be the most heavily unit-tested in the history of Gephi.
  • Simplicity: The Graph API should be documented and easy to use for developers.
  • Openness: If possible, we want GraphStore to be used in other projects and keep the code free of Gephi-specific concepts.

Gephi is known to use a large amount of memory, especially for very large networks. We want to challenge ourselves and tackle this issue by redesigning the way graphs are encoded and stored. Besides memory usage, we carefully analyzed possible solutions to improve read/write performance and optimize throughput. Stability and simplicity are like food and shelter: whatever we do at Gephi should be simple to use and stable. As we’re heading towards a 1.0 version, we’re putting more and more effort into testing and code quality.

Since November 2012, we have been working on GraphStore separately from Gephi’s codebase and will start the integration fairly soon. The Graph API is very similar to the existing API. However, it isn’t entirely compatible: several core concepts such as attributes, views and dynamic networks have changed, which will require a lot of work in some modules. On the other hand, because the GraphStore code is decoupled, it could be leveraged in other projects. For instance, it could serve as a Blueprints implementation, as an alternative to TinkerGraph.

Graph structure

A graph (also called a network) is a pair made of a set of nodes and a set of edges. Edges can be undirected, or directed if the direction of the relation matters. Edges may also have weights to represent a value attached to the edges, like the strength of a connection or the flow capacity. Edges may also point to the same node (i.e. self-loops). Gephi currently supports these features, but they are not sufficient to describe the variety of problems graphs can help with. Multigraphs allow several edges between the same pair of nodes and are, for instance, commonly used to represent RDF graphs. Multigraphs with properties (i.e. the ability to attach any property to nodes and edges) have recently become the standard representation for graph databases.

The next version of Gephi will support multigraphs and will therefore allow multiple edges between nodes to be imported. The rollout will be done in two phases. The first phase is to allow this new type of graph to be imported, filtered and exported. We will update the importers and add new options to support these graphs. The second phase is to update the visualization and how multiple edges between nodes look.

Hierarchical graphs

Since the 0.7 version released in 2009, Gephi has supported hierarchical graphs. Hierarchical graphs let the user group or ungroup nodes so that they form a tree. Nodes which contain other nodes are named meta-nodes, and edges are collapsed into meta-edges. Groups obtained from clustering algorithms (e.g. modularity) could also easily be collapsed into meta-nodes in order to study the network at a higher level. We initially recognized the potential of this idea for network analysis and developed a hierarchy-enabled data structure. However, we realized we didn’t completely fulfill the vision, as we didn’t provide all the tools needed to fully explore and manage hierarchical networks. Although the data structure allows it, the software still lacks many features to make hierarchical networks truly explorable.

Recently, we have focused more on networks over time and plan to continue doing so. Over the past years, users have shown steady and continuous interest in dynamic networks, whereas we haven’t really seen strong interest in hierarchical networks. Therefore, we propose to remove this feature from the next releases. On the developer side, cutting this feature will greatly simplify the code and improve performance.

Dynamic networks

Networks that change over time are some of the most interesting to visualize and analyze. We have heavily invested in supporting this type of network, for instance by developing the Timeline component. However, dynamic graph support was added after the current graph structure implementation was conceived and therefore remains suboptimal and difficult to scale. Now that we have enough hindsight, we can rethink how this should be done and make it simpler.

One pain point is the way we decided to represent time. Essentially, there are two ways to represent time for a particular node in a graph: timestamps or intervals. Timestamps are a list of points in time at which the node exists, whereas intervals have a beginning and an end. For multiple reasons, we thought intervals would be easier to manipulate and more efficient than a (possibly very large) set of timestamps. By talking to our users, we found that intervals are rarely used in real-world data. On the code side, we also found that they make things much more complex and are not that efficient in the end.

In future versions, we’ll remove support for intervals and add timestamps instead. We considered supporting both intervals and timestamps but decided that it would add too much complexity and confusion.

Graph structure internals

Graph structure design is an interesting problem to solve. The objective is simple, yet challenging: how do we best represent an interconnected graph so it’s fast to query and compact in memory? And how do we keep it simple while serving a large number of features?

Graph storage

Our goal is to develop a thread-safe, in-memory graph structure implementation in Java suitable for real-time analysis. You may ask how this differs from a graph database or a distributed graph analysis package. In a few words, the requirements are quite different.

Graph databases like Neo4j, OrientDB or Titan store the graph on local disk or in a cluster and are optimized for large graphs and large numbers of concurrent users. Typically, the networks are much larger than what can fit in memory, and these databases mostly focus on answering traversal queries. In the environments where graph databases operate, most needs can be expressed as some sort of traversal query (e.g. friends of X, tweets of Y). Traversal queries are also the reason why graph databases scale to billions of nodes: for each traversal, only a subset of the graph is accessed. This is quite different from Gephi, which, being analysis software, needs to access the complete graph. For instance, when a layout is running, Gephi needs to read the X,Y position of each node as quickly as possible. Although reading from disk can be very quick as well (e.g. GraphChi), it’s limited to sequential access and makes things more complex.

Because of the real-time requirements, we want to keep our graph data in memory and accessible at all times. However, we want to make it easy to connect to external data sources, and graph databases in particular.

Reducing overhead

In computer science, overhead is any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to attain a particular goal.

GraphStore heavily relies on Java primitives, arrays and efficient collection libraries like fastutil. We reduce overhead simply by avoiding too many Java objects, which are very costly. Instead of using maps, trees or lists, nodes and edges are stored in large arrays which can be dynamically resized in blocks. For instance, iterating over the graph should be extremely fast because the CPU caches array blocks. This may sound obvious, but performance optimizations are tricky in Java because of the JVM and the uncertainty about what makes a difference and what doesn’t. In his “Effective Java” book, Joshua Bloch writes “Don’t guess, measure” and that’s still true today. For our project, we rely on well-defined micro-benchmarks to see where the bottlenecks are and how to make our data structure more cache-friendly and more compact in memory. When the graph contains millions of edges, every byte saved per edge can make a large difference in the end.
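As a rough illustration of storing elements in arrays that grow in blocks, here is a minimal Java sketch. It is not GraphStore’s code: the class name `BlockPositionStore`, the block size of 1024 and the decision to store only X/Y positions are assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical sketch, not GraphStore code: node X/Y positions kept in
// primitive float arrays split into fixed-size blocks.
class BlockPositionStore {
    static final int BLOCK_SIZE = 1024; // assumed block size

    private float[][] xBlocks = new float[0][];
    private float[][] yBlocks = new float[0][];
    private int size = 0;

    /** Appends a node position and returns its id, which is its index. */
    int add(float x, float y) {
        int block = size / BLOCK_SIZE;
        if (block == xBlocks.length) {
            // Grow by one block: only the outer arrays of block references are copied.
            xBlocks = Arrays.copyOf(xBlocks, block + 1);
            yBlocks = Arrays.copyOf(yBlocks, block + 1);
            xBlocks[block] = new float[BLOCK_SIZE];
            yBlocks[block] = new float[BLOCK_SIZE];
        }
        int offset = size % BLOCK_SIZE;
        xBlocks[block][offset] = x;
        yBlocks[block][offset] = y;
        return size++;
    }

    float x(int id) { return xBlocks[id / BLOCK_SIZE][id % BLOCK_SIZE]; }
    float y(int id) { return yBlocks[id / BLOCK_SIZE][id % BLOCK_SIZE]; }
    int size() { return size; }

    /** Sums all X positions with a tight loop over primitive arrays (no boxing). */
    double sumX() {
        double sum = 0;
        for (int b = 0; b < xBlocks.length; b++) {
            int limit = Math.min(BLOCK_SIZE, size - b * BLOCK_SIZE);
            float[] block = xBlocks[b];
            for (int i = 0; i < limit; i++) {
                sum += block[i];
            }
        }
        return sum;
    }
}
```

Growing by blocks means an append never copies existing positions, only the small outer array of block references, while a layout-style loop such as `sumX()` still reads contiguous primitive memory.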

In terms of speed, we focused on optimizing the most common operations, which are iterating over all the elements and consulting nodes’ neighbors. Typically, a layout algorithm needs to read the neighbors of every node at each iteration. Neighbors can’t simply be stored in an unsorted list because of the removal complexity: to remove a neighbor, you first need to find where it is. The current Gephi graph structure uses a binary tree to store each node’s neighbors. Although the complexity is logarithmic, every node in the tree takes extra memory, and logarithmic complexity is still suboptimal. After isolating the problem in a benchmark, we found that a doubly linked list is the best solution for our requirements: it achieves O(1) insertion and removal while keeping iteration fast. Here is a snapshot of the solution:

Every edge has four integer pointers, to its predecessor and successor in the source node’s outgoing list and in the target node’s incoming list, and a separate dictionary helps find the right edge based on the source and destination ids. Each node has a pointer to the first edge of its lists (i.e. the head). Node ids are integers (32 bits), so one can easily create a long->Edge dictionary by encoding the source and destination nodes into a single long number (64 bits). The diagram intentionally leaves out multigraph support for simplicity. In reality, nodes can have multiple head pointers, one for each edge type. Each edge type is represented by an integer index.
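The linked-list layout described above can be sketched with parallel `int` arrays, using -1 as the “no edge” pointer. This is a hypothetical simplification, not GraphStore’s API: it ignores edge types, uses fixed capacities, never reuses edge slots, and uses `java.util.HashMap<Long, Integer>` for the dictionary, where a primitive map such as fastutil’s `Long2IntOpenHashMap` would avoid boxing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: adjacency as doubly linked lists threaded through
// primitive arrays. -1 means "none". Single edge type, fixed capacities.
class LinkedAdjacency {
    private final int[] outHead, inHead;                  // per node: first edge
    private final int[] src, dst;                         // per edge: endpoints
    private final int[] prevOut, nextOut, prevIn, nextIn; // per edge: 4 pointers
    private final Map<Long, Integer> edgeByKey = new HashMap<>();
    private int edgeCount = 0;

    LinkedAdjacency(int maxNodes, int maxEdges) {
        outHead = filled(maxNodes);
        inHead = filled(maxNodes);
        src = new int[maxEdges];
        dst = new int[maxEdges];
        prevOut = filled(maxEdges);
        nextOut = filled(maxEdges);
        prevIn = filled(maxEdges);
        nextIn = filled(maxEdges);
    }

    private static int[] filled(int n) {
        int[] a = new int[n];
        Arrays.fill(a, -1);
        return a;
    }

    /** Packs two 32-bit node ids into one 64-bit dictionary key. */
    static long key(int source, int target) {
        return ((long) source << 32) | (target & 0xFFFFFFFFL);
    }

    /** O(1): pushes the new edge at the head of both lists. */
    int addEdge(int s, int t) {
        int e = edgeCount++;
        src[e] = s;
        dst[e] = t;
        nextOut[e] = outHead[s];
        if (outHead[s] != -1) prevOut[outHead[s]] = e;
        outHead[s] = e;
        nextIn[e] = inHead[t];
        if (inHead[t] != -1) prevIn[inHead[t]] = e;
        inHead[t] = e;
        edgeByKey.put(key(s, t), e); // a duplicate (s, t) would overwrite: simple graph only
        return e;
    }

    /** O(1): finds the edge through the dictionary and unlinks it from both lists. */
    boolean removeEdge(int s, int t) {
        Integer found = edgeByKey.remove(key(s, t));
        if (found == null) return false;
        int e = found;
        if (prevOut[e] != -1) nextOut[prevOut[e]] = nextOut[e]; else outHead[s] = nextOut[e];
        if (nextOut[e] != -1) prevOut[nextOut[e]] = prevOut[e];
        if (prevIn[e] != -1) nextIn[prevIn[e]] = nextIn[e]; else inHead[t] = nextIn[e];
        if (nextIn[e] != -1) prevIn[nextIn[e]] = prevIn[e];
        return true;
    }

    /** Walks the out-list of a node: O(out-degree). */
    List<Integer> successors(int node) {
        List<Integer> result = new ArrayList<>();
        for (int e = outHead[node]; e != -1; e = nextOut[e]) result.add(dst[e]);
        return result;
    }

    /** Walks the in-list of a node: O(in-degree). */
    List<Integer> predecessors(int node) {
        List<Integer> result = new ArrayList<>();
        for (int e = inHead[node]; e != -1; e = nextIn[e]) result.add(src[e]);
        return result;
    }
}
```

Adding or removing an edge touches a constant number of array cells, and listing a node’s neighbors follows pointers in O(degree). Because new edges are pushed at the head, neighbors come back in reverse insertion order.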

Views

Views are one of the most useful aspects of Gephi’s graph structure and are mainly used behind the scenes in the Filter module. A view is a graph subset (i.e. a subgraph) which remains connected to the main structure, so if a node is removed from the graph, it’s removed from the views as well. For instance, when users create a ‘Degree Filter’, Gephi creates a view and removes all the nodes which don’t fulfill the degree threshold. Multiple views can co-exist at any time in the graph structure. In the current graph structure, a complete copy of the node tree is made for every view, and we found that this can be very inefficient.

In the new version, views are implemented very differently, which should yield better performance. Instead of copying the nodes, we maintain bit-vectors for nodes and edges. Because these elements are stored in large arrays with a unique identifier, it’s easy to create and maintain a bit-vector. When developers obtain the ‘Graph’ object for a particular view, the bit-vectors are used behind the scenes to adapt iterators and accessors. This solution should make filtering large graphs much quicker. One drawback is that whereas the current implementation copies and then trims the view, GraphStore works with bit-vectors but continues to access the complete graph. In other words, if the view represents only 1% of the original graph, it still needs to iterate over the whole graph to find which elements belong to that 1%. Even though this sounds bad, our benchmarks show it’s a very fast operation, and we win overall because we no longer pay the overhead of duplicating the graph. Moreover, we can introduce some caching later to optimize this further.
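A view backed by a bit-vector can be sketched with `java.util.BitSet`, one bit per node id. The class `NodeView` and its methods are hypothetical names for illustration; the sketch covers nodes only (edges would work the same way) and assumes the main store calls `onNodeRemoved` on every view so that removals propagate.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

// Hypothetical sketch: a view is one bit per node id of the main store,
// instead of a copy of the nodes.
class NodeView {
    private final BitSet nodes = new BitSet();

    /** Scans every node id of the main store and keeps those passing the predicate. */
    static NodeView filter(int nodeCount, IntPredicate keep) {
        NodeView view = new NodeView();
        for (int id = 0; id < nodeCount; id++) {
            if (keep.test(id)) view.nodes.set(id);
        }
        return view;
    }

    boolean contains(int id) { return nodes.get(id); }

    /** Assumed to be called by the main store when a node is removed from the graph. */
    void onNodeRemoved(int id) { nodes.clear(id); }

    int size() { return nodes.cardinality(); }

    /** Node ids in the view, in increasing order. */
    int[] ids() { return nodes.stream().toArray(); }
}
```

Creating a view costs one pass over the node ids and one bit per node, rather than a copy of the node tree.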

Inverted Index

When you’re using the Partition module in Gephi, you’re manipulating some sort of inverted index. Nodes and edges have properties like ‘gender’, ‘age’ or ‘country’, and these properties are contained within the node and edge objects. An index is a simple data structure which allows retrieving the list of elements for a particular value. For instance, the Partition module needs to know the number of ‘male’ or ‘female’ nodes for the ‘gender’ column. When the column is a number like ‘age’, it also needs to know the minimum and maximum values. Unlike the Ranking module and its auto-apply feature, the Partition module is not refreshed in real time and is therefore difficult to use when the graph changes a lot. We have decided to invest in this feature for the future release and are building a real column inverted index in the graph structure. The index will simply keep track of which values exist for each column and which elements hold each value. The index will be updated in real time as elements are added, removed or updated.
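The index described above can be sketched as a map from each value to the ids of the elements holding it. `ColumnIndex` is a hypothetical class, not the future Gephi API; the sketch assumes non-null, comparable values (a `TreeMap` rejects `null` keys) and keeps values sorted so that `min()` and `max()` are cheap.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of a per-column inverted index: value -> element ids.
class ColumnIndex<V extends Comparable<V>> {
    private final TreeMap<V, Set<Integer>> elementsByValue = new TreeMap<>();
    private final Map<Integer, V> valueByElement = new HashMap<>();

    /** Sets or updates an element's value and keeps the index in sync. */
    void set(int element, V value) {
        remove(element); // drop the previous value first so counts stay correct
        valueByElement.put(element, value);
        elementsByValue.computeIfAbsent(value, v -> new HashSet<>()).add(element);
    }

    /** Removes an element; a value with no remaining element disappears. */
    void remove(int element) {
        V old = valueByElement.remove(element);
        if (old == null) return;
        Set<Integer> holders = elementsByValue.get(old);
        holders.remove(element);
        if (holders.isEmpty()) elementsByValue.remove(old);
    }

    /** Number of elements holding the value, e.g. how many nodes are 'female'. */
    int count(V value) {
        return elementsByValue.getOrDefault(value, Collections.emptySet()).size();
    }

    /** Distinct values currently present in the column, sorted. */
    Set<V> values() { return elementsByValue.keySet(); }

    V min() { return elementsByValue.isEmpty() ? null : elementsByValue.firstKey(); }
    V max() { return elementsByValue.isEmpty() ? null : elementsByValue.lastKey(); }
}
```

Because `set` first removes the element’s previous value, counts stay correct when an attribute is updated, which is what a Partition view refreshed in real time would need.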

The ability to quickly retrieve elements and counts based on specific values will be very useful in many different modules like Filters, Partition or Data Laboratory. New APIs will be added for developers to use the newly created index interface. As we’re working on attributes storage and manipulation, we’ll also merge the Attributes and Graph API because they are so interconnected that it doesn’t really make sense to have them separate. The interfaces that developers are familiar with like Table or Columns will remain the same.

Events

In software programming, events are a common way to inform other modules that something changed. In Gephi, we also use events to inform other modules about updated nodes or edges. In the new GraphStore, we’ll stop using events to transport graph modifications because of the large overhead caused by creating event objects. Indeed, when 10K nodes are added to the graph, the existing structure literally creates 10K event objects and puts them in a queue. Although the event queue compresses events of the same type, the overhead of creating, queuing, sending and destroying large numbers of small Java objects is too high.

Instead of a push model (i.e. the emitter pushes updates), we want to promote a pull model (i.e. the listener pulls updates from time to time) for future releases. A similar system is already in place to link the graph and the visualization module, and it has been working without a glitch. We’ll develop the tools to easily calculate graph differences between a listener module and the graph structure. By removing this bottleneck, write performance should greatly improve.
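A pull model can be sketched with a version counter: writes only increment a number and stamp the modified node, and a listener later asks which nodes changed since the version it last saw. `VersionedStore` and `PollingListener` are hypothetical names; the sketch is single-threaded, has a fixed capacity and only tracks additions and modifications, not removals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a pull model: no event object is created on write.
class VersionedStore {
    private long version = 0;
    private final long[] modifiedAt; // per node id: version of its last modification
    private int nodeCount = 0;

    VersionedStore(int capacity) { modifiedAt = new long[capacity]; }

    int addNode() {
        int id = nodeCount++;
        modifiedAt[id] = ++version;
        return id;
    }

    void touchNode(int id) { modifiedAt[id] = ++version; }

    long version() { return version; }

    /** Node ids modified after the given version; costs one scan over the nodes. */
    List<Integer> modifiedSince(long seen) {
        List<Integer> changed = new ArrayList<>();
        for (int id = 0; id < nodeCount; id++) {
            if (modifiedAt[id] > seen) changed.add(id);
        }
        return changed;
    }
}

// A listener that polls the store when it is ready, instead of receiving events.
class PollingListener {
    private final VersionedStore store;
    private long lastSeen = 0;

    PollingListener(VersionedStore store) { this.store = store; }

    /** Returns the nodes changed since the previous poll (empty if nothing changed). */
    List<Integer> poll() {
        if (store.version() == lastSeen) return List.of();
        List<Integer> changed = store.modifiedSince(lastSeen);
        lastSeen = store.version();
        return changed;
    }
}
```

Adding 10K nodes here creates no event objects at all; the cost moves to the listener, which pays for one scan whenever it decides to poll.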

Timestamps

As mentioned earlier, we’ll add timestamp support to represent dynamic networks. Instead of time intervals, an array of timestamps will be associated with nodes and edges. For element (node/edge) visibility, each timestamp represents the presence of the element at that time. For example, if a network snapshot is collected every month for a year, each node will have up to 12 different timestamps. The timestamp itself is a real number and can therefore represent an epoch time but also any other value in a different context. For a dynamic attribute, the values over time are simply represented as a list of (time, value) pairs.
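The per-element timestamps and dynamic attributes described above can be sketched with sorted `double` arrays and binary search. `TimestampSet` and `DynamicDouble` are hypothetical classes, and the sketch assumes the (time, value) pairs are supplied already sorted by time.

```java
import java.util.Arrays;

// Hypothetical sketch: an element's presence as a sorted array of timestamps.
class TimestampSet {
    private final double[] timestamps; // sorted copy of the input

    TimestampSet(double... times) {
        double[] sorted = times.clone();
        Arrays.sort(sorted);
        this.timestamps = sorted;
    }

    /** True if the element exists at exactly this timestamp: O(log n). */
    boolean contains(double t) {
        return Arrays.binarySearch(timestamps, t) >= 0;
    }

    int size() { return timestamps.length; }
}

// Hypothetical sketch: a dynamic attribute as parallel (time, value) arrays.
class DynamicDouble {
    private final double[] times;  // assumed sorted by the caller
    private final double[] values; // values[i] holds at times[i]

    DynamicDouble(double[] times, double[] values) {
        this.times = times;
        this.values = values;
    }

    /** Value at exactly time t, or NaN if there is no pair for t. */
    double valueAt(double t) {
        int i = Arrays.binarySearch(times, t);
        return i >= 0 ? values[i] : Double.NaN;
    }
}
```

With monthly snapshots, a node present in three of the twelve months stores just three doubles.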

To support the timeline and dynamic network algorithms, we’re developing an inverted index for timestamps so that filtering by time is very quick. One good thing about intervals is that it’s very easy to know whether two intervals overlap. With a flat list of timestamps, one can’t avoid going through the entire list. The index will essentially map timestamps to the node and edge elements in the graph and therefore solve this issue. The interval tree implementation which we currently use to store intervals is based on a binary tree and is very costly in memory because of the Java object overhead. Using simple arrays should reduce overhead and improve performance for large dynamic networks. When computing a dynamic network algorithm (e.g. clustering coefficient over time), we use a sliding window over the graph, so the ability to filter quickly is critical as it determines how fast the graph refreshes.
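A timestamp inverted index can be sketched as a sorted map from each timestamp to a bit-vector of the element ids present at that time, so a window query only visits the timestamps inside the window. `TimestampIndex` is a hypothetical class; element removal and thread-safety are left out.

```java
import java.util.BitSet;
import java.util.TreeMap;

// Hypothetical sketch: timestamp -> bit-vector of element ids present at that time.
class TimestampIndex {
    private final TreeMap<Double, BitSet> elementsAt = new TreeMap<>();

    void add(int element, double timestamp) {
        elementsAt.computeIfAbsent(timestamp, t -> new BitSet()).set(element);
    }

    /** Ids of elements present at any timestamp within [from, to]; requires from <= to. */
    BitSet window(double from, double to) {
        BitSet result = new BitSet();
        // subMap restricts the scan to the timestamps inside the window.
        for (BitSet ids : elementsAt.subMap(from, true, to, true).values()) {
            result.or(ids);
        }
        return result;
    }
}
```

A sliding window for an algorithm over time would call `window(t, t + width)` for successive values of `t`, each call touching only the timestamps inside the window.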

Saving/Loading

Saving and loading the graph structure to/from a file (or a stream) is another critical feature. When a user saves a project in Gephi, the graph data structure is serialized in XML and compressed into a .gephi file. If you have worked with project files in Gephi, you may have experienced corrupted files or errors when loading a file. We’ve done our best to fix these problems, but some remain. We’re rethinking how this should be done in GraphStore and have decided to rewrite the code from scratch. Our approach will rely on a lot of unit tests to make sure the code is stable so we don’t repeat the same issues in future versions. Please note that this concerns the .gephi files only; existing importers (e.g. GEXF, GraphML) will remain the same.

Concerning GraphStore serialization, we’re abandoning XML in favor of pure byte arrays. That should yield better performance and smaller project files. We’ll create a custom reader for previous Gephi versions so you can still open your existing projects. Other modules like Filters or Preview will continue to use XML as it’s working just fine.
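To show what a byte-array format can look like compared with XML, here is a minimal sketch using `DataOutputStream` and `DataInputStream`. The layout (a format version, the edge count, then source, target and weight per edge) and the names `EdgeList` and `EdgeListCodec` are invented for illustration; this is not the .gephi format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical container for an edge list stored as parallel primitive arrays.
final class EdgeList {
    final int[] sources;
    final int[] targets;
    final double[] weights;

    EdgeList(int[] sources, int[] targets, double[] weights) {
        this.sources = sources;
        this.targets = targets;
        this.weights = weights;
    }
}

// Hypothetical binary codec: version, edge count, then (source, target, weight) per edge.
class EdgeListCodec {
    static final int FORMAT_VERSION = 1; // hypothetical version number

    static byte[] write(EdgeList edges) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(FORMAT_VERSION);
            out.writeInt(edges.sources.length);
            for (int i = 0; i < edges.sources.length; i++) {
                out.writeInt(edges.sources[i]);
                out.writeInt(edges.targets[i]);
                out.writeDouble(edges.weights[i]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    static EdgeList read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int version = in.readInt();
            if (version != FORMAT_VERSION) {
                // A reader for older formats would branch here instead of failing.
                throw new IllegalArgumentException("Unsupported format version: " + version);
            }
            int count = in.readInt();
            int[] sources = new int[count];
            int[] targets = new int[count];
            double[] weights = new double[count];
            for (int i = 0; i < count; i++) {
                sources[i] = in.readInt();
                targets[i] = in.readInt();
                weights[i] = in.readDouble();
            }
            return new EdgeList(sources, targets, weights);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // e.g. truncated data (EOFException)
        }
    }
}
```

Each edge takes a fixed 16 bytes, and the leading version number is one simple way for a reader to recognize files written by older versions.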

Next steps

This is the first post about the Gephi 0.9 version and more will come soon. We’re excited about the current developments and hope to hear from you. Please join the gephi-dev mailing list to learn more about ongoing projects and contribute. We need your ideas!

Follow us on Twitter!

rgexf: An R library to work with GEXF graph files

 

George Vega is an economist working at the Chilean Pension Supervisor and cofounder of nodoschile.org. His research interests are Statistical Computing, HPC, Complex Systems and Public Policy.

The first R library to work with GEXF files, rgexf allows both writing (exporting) and reading (importing) .gexf files.

Features:

  • Writing and reading GEXF files
  • Writing dynamic graphs
  • Writing graphs with attributes (boolean, numeric, char)
  • Writing graphs with VIZ attributes (color, size, shape)
  • Building GEXF graphs from scratch (node/edge by node/edge)

rgexf is written in such a way that it is not necessary to have knowledge about XML.

Some examples:

# Installing from CRAN and loading
install.packages("rgexf", dependencies=TRUE)
library(rgexf)

# Reading lesmiserables graph (and summarizing)
lesmiserables <- read.gexf("http://gephi.org/datasets/LesMiserables.gexf")
summary(lesmiserables)

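# Building a GEXF class object (includes data frames of nodes/edges +
# XML representation of it) from two two-column data.frames.
# people and relations are small hypothetical examples:
# nodes as (id, label), edges as (source, target)
people <- data.frame(id = 1:3, label = c("Ana", "Bob", "Carla"))
relations <- data.frame(source = c(1, 1, 2), target = c(2, 3, 3))
mygraph <- write.gexf(nodes=people, edges=relations)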

# Exporting to some place
print(mygraph, output="mygraph.gexf", replace=TRUE)

# Creating a GEXF object from scratch (and adding a node)
mynewgraph <- new.gexf.graph()
mynewgraph <- add.gexf.nodes(mynewgraph, id=1, label="George")

The source code plus more examples can be found on the project website.

For suggestions, bug reports or any kind of support, ask me through Twitter (@gvegayon) or just write me an email at george [dot] vega [at] nodoschile [dot] org

George Vega Yon

Graph visualization meet-up in Paris

Meetup on graph visualization: join us on the 24th of January in Paris

Neo4j, the leading graph database software, will be organizing a meetup on graph visualization. It’s free; if you want to come, you just have to register here.

Graph databases are a new way to store and access data by representing it as nodes and connections. They are particularly useful for dealing with highly connected data, as in social networks, recommendation engines, music discovery or anti-fraud systems. Graph databases give data scientists exciting opportunities.

You will learn how to combine Neo4j and Gephi using the Neo4j plugin of Gephi. You will also discover Linkurious, a novel web-based application to explore graph data easily, which was co-founded by Sébastien Heymann.

The workshop will be held in English and French.

Register on meetup.com

Date: 24th of January, from 7:30 PM to 10:15 PM
Place: Zenika office, 10 rue de Milan, Paris

0.8.2 beta released

The latest version of Gephi has been released; download it for Windows, Mac OS and Linux. This release capitalizes on the bug fixes and stability improvements we have made over the last few months. It also greatly improves Mac OS X compatibility with Gatekeeper and Retina Display support. Gephi should now start right away when double-clicking on the app on a Gatekeeper-enabled computer. However, if you have an older version of Gephi on your computer, you should uninstall it and remove the user directory; see the installation instructions.

This release is the first one based on our new Continuous Integration environment. This new system makes it easy for developers to create a new release and for beta-testers to test an early version. Users eventually get software that has been tested much more heavily, and by a larger population, than previous releases.

Plugins need to be checked for compatibility. They will reappear on the Plugin Center in the coming days, as they are verified. Thanks for your patience.

Consult the release notes and the new Javadoc for more information.

New and Noteworthy

* Data Laboratory’s time interval merge strategy now supports custom date format
* Improved parser for dynamic types. Literal strings are now supported.
* Add Retina Display support to the Mac OS X version

Bug fixes

* Filters ‘Duplicate’ does not work (Issue 176)
* Exporting SVG File throws DomException due to invalid stroke-widths (Issue 697)
* File name entered is lost when changing folder (Issue 463)
* Datalab: can’t export all columns (Issue 628)
* NullPointerException when importing from database (Issue 691)
* Filter on column created with regex causes crash (Issue 663)
* “Long cannot be cast to a String” when either exporting a graph or saving a gephi file (Issue 679)
* Mapping of Escape Keyboard Shortcut for “Save changes before closing?” dialog box (Issue 686)
* DataLab: filling edge weight column doesn’t work when dynamic (Issue 619)
* Spreadsheet import of dynamic data: support of “infinity” (Issue 631)
* Missing license headers (Issue 664)
* Spreadsheet import and self-loops (Issue 683)
* Timeline interval set in infinite loop (Issue 712)
* OutDegreeRange broken (Issue 651)
* Shortest path on filtered graphs fails (Issue 650)
* Weighted degree computation fails (all values 0) when a filter is applied (Issue 636)
* Dot parser fixes (Issue 621)
* Filtering leads to Null Pointer exception when saving (Issue 617)
* Partition percentages incorrectly composed across filters (Issue 637)
* Start/end attributes are always imported using DATETIME format (Issue 649)
* HeatMap / Shortest Path on undirected Graph wrongly paints / calculates (Issue 630)
* Typo fix connetion to connection (Issue 642)
* Timeline sparkline bug (Issue 615)
* Fixes calculation of clustering coefficient on graph (Issue 625)
* Unix timestamp support (Issue 612)
* GML loading cannot accept scientific notation for float-type edge property (Issue 300)
* GML export does not respect specs (Issue 604)
* GEXF export outputs incorrect files (Issue 570)
* Problems with export of PNG files (Issue 601)
* Edge pencil: unable to set edge directedness (Issue 549)
* Presets of “Preview settings” are incorrectly / not saved (Issue 575)
* Option “Time intervals as dates” in Timeline (Issue 613)
* 8.1 and 8.0 both freeze at start-up when on network (Issue 592)
* Data laboratory max columns unintuitive (Issue 590)
* Modularity with Edge Weight Causes Array Out-of-Bounds (Issue 577)

Do Gephi technologies matter for your research or business? You can support us by donating to the Gephi Consortium, or becoming a member to have an impact on our roadmap.

The next version will start a new series (0.9) and will bring a strengthened core and new features. Stay tuned.

Feel free to reach out to us if you are willing to organize events (meetups, workshops, hackathons, etc.); we will support them.

Talk on community management at Inria fOSSa 2012

The Gephi Consortium will participate in the fourth edition of the fOSSa Conference, taking place from December 4 to 6, 2012 in Lille (France).

The aim of the fOSSa (Free Open Source Academia Conference) is to reaffirm the underlying values of Open Source software: innovation & research in software development.

While the first edition aimed at providing valuable information on the Open Source model at large, the second edition focused on specific key aspects of FOSS such as tech innovation, upcoming issues & challenges in the open development context, and how open activities, collaboration and knowledge sharing are beneficial to academia, education & industry. The third edition looked at the future of Open Source (ecosystem, trends, new territories, etc.).

The fourth edition will address, in an open-minded style:
– Digital Geographic Strategies & the Native Generation,
– FLOSS History with the movie “Revolution OS”, followed by a debate,
– Open Art, collaboration between art & science,
– Licenses in real life: no lawyers’ speeches, only facts & lessons learned,
– Workshops to learn how to develop code for debian, gnome, apache, robotics ROS …
And, of course, the usual fOSSa topics (Education & Community management).

On this occasion, Sébastien Heymann will give a presentation about Motivations in Free Software communities, on 6th Dec at 2pm in the Community Management track.

“What marks the difference between fOSSa and other events is the air that you breath there. An event organized by passionate people, with passionate attendees as well … and great speakers. Every year you can get some presentations of greater international events in advance (I remember the year of Arduino, to give you an example).” — Gabriele Ruffatti — SpagoWorld Blog 2012.

fOSSa days are open to everyone and registration is free!
More information at http://fossa.inria.fr

EDIT: slides of the presentation
http://fr.slideshare.net/slideshow/embed_code/15531802

Continuous Integration at Gephi

We recently finished deploying a continuous integration environment at Gephi, and I’m excited to share some of the highlights.

The Gephi developer team has been hard at work to change the way we iterate and create releases at Gephi. Developer productivity has been an important theme this year, and we have already made several improvements. At the end of last year we migrated our code to GitHub and improved the documentation. We then focused on plugin developers and made it really easy to create new plugins with the Plugin Bootcamp and the new gephi-plugins repository. Finally, we’re now introducing a completely automated build and release production system.

Our objective was to automate the way releases are created and tested. Previously, creating a release was a manual process and included error-prone tasks like updating configuration files, unzipping translations in the right folder or creating installers. Open-source tools like Maven, Jenkins and Nexus can help to make this process seamless and always have the latest deliverables available.

Maven migration

We migrated our code base from Ant to Maven. Gephi is based on the Netbeans Platform and has more than 80 different modules with dependencies and third-party libraries. Maven makes it easy to manage a large number of dependencies and put all configuration parameters in one place. Maven also has a large number of plugins and is very well integrated in the Netbeans and Eclipse IDEs.

Highlights:

  • A full application package, all Javadocs and sources are now produced and uploaded online with a single command.
  • Dependencies are all defined in one place. It is also much easier to update to the latest version of the Netbeans Platform.
  • All library JARs are dependencies to Maven Central or 3rd party repositories. No library JARs are directly included in the sources anymore.
  • The Gephi project is now a standard multi-module Maven project. It can therefore be opened and built in Eclipse or IntelliJ, as well as Netbeans, out of the box.
  • It facilitates module reuse in other projects like the Gephi Toolkit. Any other project can easily depend on any (or all) Gephi modules.

Jenkins server

Jenkins is the continuous integration server we chose to automate building and testing Gephi. It is configured to build and test Gephi every night based on the latest version of the code on GitHub. If the build fails, developers are notified that something needs to be fixed.

Highlights:

  • Fully automated build in a stable environment. If something is wrong, it must be the code.
  • In addition to Gephi itself, we’re also building the Gephi Toolkit every night. Eventually, we’ll be able to build and test plugins as well.
  • Artifacts produced are uploaded to Nexus.

Nexus

Nexus is a repository for artifacts, which can be either libraries Gephi uses or release binaries like the latest release. At any time, beta testers can download the nightly build and test new features. We just announced a new beta testing program, which wouldn’t be possible without the availability of the nightly build.

Highlights:

  • All third-party libraries have been uploaded to Nexus, and Maven uses Nexus as a source for libraries.
  • The nightly build packages are available for download.
  • It also hosts the latest set of NBMs and Javadocs.


We learnt a lot during this project and will continue to strengthen the developer and beta-tester environment to scale up Gephi development. So far, we’ve done the Maven migration in a separate GitHub repository, but we’ll soon convert the main repository and shortly afterwards release Gephi 0.8.2. We’ve created a new Continuous Integration section on the Dev Portal and documented this project.

Plugin development remains the same for now, and all plugins should be compatible with the new code base. In the next few months we would like to bring continuous integration to plugin developers as well. Testing a large number of plugins at scale with each new Gephi version remains a challenge, and we would like to improve that. We’ve also seen issues where different plugins use different versions of the same library and eventually cause crashes. Stay tuned for some news on that.

In the next few weeks we’ll update the documentation in various places to explain how to build Gephi and work with the code. Developers interested in trying this new system should follow the instructions on GitHub or reach out to us on the developer mailing list.

Last but not least, we would like to say kudos to Maven, Jenkins and Nexus contributors for their huge and excellent work!

Beta Tester program starts

Hi all, today we are announcing a new program, and it’s all about testing the latest versions of Gephi. Anyone can join the program to test the development version, send feedback and discuss features. We want to build a team of beta testers the developers can work with to detect issues before the software reaches regular users.

So far, testing has been done by a small group of developers and users, but we would like to extend it to a larger audience. Gephi supports many different versions of Windows, Mac OS X and Linux. Testers will help detect compatibility issues specific to a single platform and, more generally, help test new features.

To make this effort successful, we’re making it super easy to test the latest development version without requiring any knowledge of programming or of how Gephi is built. We’re introducing a nightly build package that is updated automatically every night with the latest version of the code. Once downloaded and installed, this version of Gephi will ask you to update itself every time a new version is available, so you don’t have to download and install Gephi over and over again. If you’re already familiar with Gephi’s auto-update capability, this uses the same system.

How to get started?

    1. Join the gephi-tests@lists.gephi.org mailing list

Developers and testers will hold their discussions on this list.

    2. Fill in this online questionnaire.

A couple of questions about your hardware and software configuration.

Questions? Feel free to stop by this forum thread.

GSoC: interconnect Gephi Graph Streaming API and GraphStream

My name is Min WU, and during this Google Summer of Code I worked on a project to interconnect the Gephi Graph Streaming API and the GraphStream library. My mentors are Yoann Pigné and André Panisson.

This project aims to interconnect GraphStream’s dynamic graph event model with Gephi so that Gephi can visualize an ongoing graph evolution and its measurements. With this project, users can model and simulate complex systems with GraphStream while observing the output with the visual tools offered by Gephi.

GraphStream is written in Java. To allow streams of graph events to be used from other languages, GraphStream provides the NetStream framework, a network interface through which projects written in other languages can use GraphStream. The NetStream framework consists of three parts: the receiver, the sender and the NetStream protocol. The receiver is responsible for receiving graph events from the network and dispatching them to pipes. It runs in a single thread, listening on a given address and port, while receiving graph events from several streams (that is, several threads or clients). The sender encodes graph events into messages according to the NetStream protocol and sends them to a receiver defined by a given host, port and stream ID. Every message contains a sourceId, a timeId and the event context; the combination of sourceId and timeId is used to distinguish between several streams and to solve synchronization issues. Finally, the NetStream protocol specifies the message format at the byte level.
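The paragraph above says the (sourceId, timeId) pair is what lets NetStream tell streams apart and keep them synchronized. The following is only an illustration of that idea and assumes each source numbers its events with an increasing timeId. Under that assumption, a receiver can use the pair to drop duplicated events and events echoed back to their origin. `StreamEvent` and `SourceTimeFilter` are hypothetical classes. They do not reproduce NetStream’s actual code or its byte-level format.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative event carrying the two identifiers discussed above.
// Hypothetical class, not part of NetStream.
final class StreamEvent {
    final String sourceId; // which graph/stream emitted the event
    final long timeId;     // per-source counter, assumed to increase with each event
    final String payload;  // event body, e.g. "addNode A"

    StreamEvent(String sourceId, long timeId, String payload) {
        this.sourceId = sourceId;
        this.timeId = timeId;
        this.payload = payload;
    }
}

// Keeps the last timeId seen per source and rejects anything not newer,
// so duplicated or echoed events are applied only once.
final class SourceTimeFilter {
    private final String ownSourceId;
    private final Map<String, Long> lastTimeId = new HashMap<>();

    SourceTimeFilter(String ownSourceId) {
        this.ownSourceId = ownSourceId;
    }

    /** Returns true if the event should be applied to the local graph. */
    boolean accept(StreamEvent event) {
        // Our own events coming back through a loop are ignored.
        if (event.sourceId.equals(ownSourceId)) {
            return false;
        }
        Long last = lastTimeId.get(event.sourceId);
        if (last != null && event.timeId <= last) {
            return false; // duplicate or stale event from this source
        }
        lastTimeId.put(event.sourceId, event.timeId);
        return true;
    }
}
```

A map keyed by sourceId keeps each stream’s clock independent. Two clients can therefore both send an event with timeId 1 without one being mistaken for a duplicate of the other.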

Gephi also supports the idea of “streams of graph events”. Its Graph Streaming plugin, built by André Panisson during GSoC 2010, provides a graph streaming framework based on a multi-threaded socket server. Other applications can push graph data to the Gephi server through the network and have it visualized. In this Graph Streaming plugin, operations (a concept similar to events) are invoked through HTTP requests made by the client to the server, using a JSON format.

Work done

In my project, I interconnected Gephi and GraphStream based on André’s Graph Streaming plugin. NetStream on the GraphStream side uses the NetStream protocol, while the Graph Streaming API on the Gephi side uses a JSON protocol, so we had to make them compatible with each other. Given its flexible interoperability and language-agnostic nature, I chose the JSON protocol for the interconnection and implemented a sender part and a receiver part.

The sender part (JSONSender) is responsible for sending events from GraphStream to Gephi. GraphStream acts as the client and Gephi as the server. Every time the graph in GraphStream changes, a corresponding event is sent to Gephi, which handles the event and updates its own graph. In this way, the sender works as a sink of the GraphStream graph, so it must implement the sink interface, which contains methods to handle graph element events and attribute events. In each method, we first encode the event as a JSON string and then send it to Gephi. We connect to Gephi and use the “updateGraph” operation to send events. The corresponding URL is “http://host:port/workspace?operation=updateGraph”. The host and port must match those of the Gephi server, and workspace is the destination workspace in Gephi. For example, a URL can be “http://127.0.0.1:8080/workspace0?operation=updateGraph”. The Gephi server and client are built with the “Graph Streaming API” in the Gephi plugin.
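The sender’s job described above can be sketched as a sink whose callbacks encode one JSON event each and POST it to the updateGraph URL. The sketch makes several assumptions. It uses the "an" (add node) and "ae" (add edge) keys of the Graph Streaming plugin’s JSON format as we understand it. `SimpleSink` and `JsonEventSender` are hypothetical, and the real GraphStream Sink interface and JSONSender class have many more methods. The JSON escaping is deliberately minimal.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical, reduced sink interface; GraphStream's real Sink has many more
// callbacks (attribute events, graph cleared, steps, ...).
interface SimpleSink {
    void nodeAdded(String nodeId);
    void edgeAdded(String edgeId, String source, String target, boolean directed);
}

// Sketch of a JSONSender-like sink: each callback encodes the event as a JSON
// object and POSTs it to the workspace's updateGraph operation.
final class JsonEventSender implements SimpleSink {
    private final String host;
    private final int port;
    private final String workspace;

    JsonEventSender(String host, int port, String workspace) {
        this.host = host;
        this.port = port;
        this.workspace = workspace;
    }

    String updateGraphUrl() {
        return "http://" + host + ":" + port + "/" + workspace + "?operation=updateGraph";
    }

    // Minimal JSON string quoting: escapes only backslashes and double quotes.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String encodeNodeAdded(String nodeId) {
        return "{\"an\":{" + quote(nodeId) + ":{}}}";
    }

    static String encodeEdgeAdded(String edgeId, String source, String target, boolean directed) {
        return "{\"ae\":{" + quote(edgeId) + ":{\"source\":" + quote(source)
                + ",\"target\":" + quote(target) + ",\"directed\":" + directed + "}}}";
    }

    @Override
    public void nodeAdded(String nodeId) {
        post(encodeNodeAdded(nodeId));
    }

    @Override
    public void edgeAdded(String edgeId, String source, String target, boolean directed) {
        post(encodeEdgeAdded(edgeId, source, target, directed));
    }

    // Sends one JSON event to Gephi; requires a running Graph Streaming server.
    private void post(String json) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(updateGraphUrl()).toURL().openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(json.getBytes(StandardCharsets.UTF_8));
            }
            conn.getResponseCode(); // wait for the server to process the request
            conn.disconnect();
        } catch (IOException e) {
            throw new IllegalStateException("Could not send event to " + updateGraphUrl(), e);
        }
    }
}
```

For the example URL in the text, the sender would be created as `new JsonEventSender("127.0.0.1", 8080, "workspace0")`. Opening one connection per event keeps the sketch simple but would be slow for large graphs. A production sender would more likely batch events or reuse connections.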

The receiver part (JSONReceiver) is responsible for receiving events from Gephi. It listens to Gephi and waits for events. Every time the graph in Gephi changes, a corresponding event is sent to GraphStream, which handles the event and updates its graph object. In this way, the receiver works as a source of the GraphStream graph. To listen to Gephi events, we connect to Gephi with a URL that uses the “getGraph” operation. The corresponding URL is “http://host:port/workspace0?operation=getGraph”.
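On the receiving side, the essential loop reads the getGraph response and forwards each event as it arrives. The sketch below assumes the stream carries one JSON event object per line. It hands each raw line to a callback, which would parse it and apply it to the GraphStream graph. `JsonEventReader` is a hypothetical helper and not the actual JSONReceiver implementation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Sketch of the receiving side: reads a getGraph response body, assumed to carry
// one JSON event object per line, and hands each event to a callback that would
// apply it to the local graph (JSON parsing is left to the callback).
final class JsonEventReader {
    static int readEvents(InputStream in, Consumer<String> onEvent) {
        int count = 0;
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String event = line.trim();
                if (event.isEmpty()) {
                    continue; // skip blank separator lines
                }
                onEvent.accept(event);
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

Against a live Gephi server, the input stream could come from `URI.create("http://127.0.0.1:8080/workspace0?operation=getGraph").toURL().openStream()`. The loop then presumably blocks while Gephi keeps the connection open, so a reader like this would typically run in its own thread.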

With these two classes, we can interconnect GraphStream and Gephi in real time. Two tutorials show how to set up a real-time connection between GraphStream and Gephi; see the video below. If you are interested in the implementation details, please refer to the manual page.

The first class is GraphSender, which loads a graph in GraphStream and dynamically displays it in a Gephi workspace. We create a graph instance and a JSONSender instance, and plug the JSONSender instance in as a sink of the graph instance. From then on, whenever we generate the graph or load it from a file, Gephi displays it in real time.

The other class is LinLogLayoutReceiver. The Lin-Log layout in GraphStream is designed to reveal communities in graphs. This tutorial shows how to run a Lin-Log layout in GraphStream and send the layout information to Gephi in real time. We first load a graph in Gephi, display it and apply some algorithms. Then we send the graph to GraphStream and apply the Lin-Log layout to it on the GraphStream side, while visualizing the layout process on the Gephi side in real time. To achieve this, we create a graph instance and a JSONReceiver instance, get the ThreadProxyPipe instance and plug the graph instance in as an ElementSink of the pipe. Then we apply the Lin-Log layout and create a new thread in which we create a JSONSender instance and plug it in as a sink of the graph layout.
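The ThreadProxyPipe mentioned above lets events that arrive on the network thread reach a graph that is laid out in another thread. The sketch below illustrates that general pattern under our own simplifying assumptions: events are plain strings, they wait in a lock-free queue, and the consumer calls an explicit `pump()` method. `SimpleProxyPipe` is hypothetical and much simpler than GraphStream’s actual ThreadProxyPipe.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical sketch of the thread-proxy-pipe idea: the producer thread
// (e.g. the receiver reading from Gephi) only enqueues events, and the
// consumer thread (the one running the layout) replays them to its sinks
// when it calls pump(), so the graph is only modified from one thread.
final class SimpleProxyPipe {
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    private final List<Consumer<String>> sinks = new CopyOnWriteArrayList<>();

    // Safe to call from any thread.
    void publish(String event) {
        pending.add(event);
    }

    void addSink(Consumer<String> sink) {
        sinks.add(sink);
    }

    // Call from the consumer thread: delivers, in order, everything queued so far.
    int pump() {
        int delivered = 0;
        String event;
        while ((event = pending.poll()) != null) {
            for (Consumer<String> sink : sinks) {
                sink.accept(event);
            }
            delivered++;
        }
        return delivered;
    }
}
```

With this pattern, the layout loop would call `pump()` once per iteration, so incoming changes are applied between layout steps rather than in the middle of one.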

Distribution

This project is distributed under the MIT license, and the code is available on GitHub. Finally, I am very grateful to my mentors for their supervision. Thank you very much!

GSoC: Legend module

My name is Eduardo Gonzalo Espinoza Carreon, and during this summer I developed the new Legend Module for Gephi, mentored by Eduardo Ramos and Sébastien Heymann. This article gives an overview of the work done.

Problem statement

Currently Gephi offers the possibility of visualizing graphs, but what about legends? Legends provide basic and extra information related to the graph and they are useful when interpreting any kind of network map. If a person is not familiar with the content of a graph, missing or wrong legends could lead to misleading interpretations and sometimes wrong decisions. When a visualization is used by multiple people for discussing, analyzing or communicating data, legends are of great importance.

For instance, the following graph represents the co-appearance of characters in the novel Les Misérables. From a visual analysis alone, we can only conclude that the graph has 9 groups, which is probably a small part of the information the creator wanted to convey. The graph gives no information about the number of nodes, what the groups represent, how many elements each group has, etc.

A current workaround is to export the graph as an image and then manually add the legends using Inkscape, Adobe Illustrator or another graphics editor. However, this task is time-consuming and could be automated. The new Legend Module proposes a solution to this problem.

Solution

We propose an extension to the Preview module for generating legend items. The following legend items are available: Table, Text, Image, Groups and Description. They can be added using the Legend Manager, which is shown in a new tab under the Preview Settings:

After selecting a type of legend, the user chooses a sub-type builder, e.g. “Table” > “Partition interaction table”, or “Top 10 nodes with greatest degree”, as shown in the following figure:

When a new Legend item is added, it is displayed in the list of active legend items, where the user can edit its properties. The user can also edit its label and assign a user-friendly name to remember the content of the legend easily.

Every item has a set of common properties: label, position, width, height, background color and border, title, and description. In addition, each type of item has its own properties and data. The values of these properties are editable through a Property Editor like the one used in the Preview Settings.

Some properties, like scale and translation, can be modified with the mouse, as in most graphic design applications. All legend items also support smart autoresizing, which differs from ordinary scaling. For example, if the text in a Text item is bigger than the assigned size, the Text Renderer overrides the font size defined by the user. It decreases the font size until the text fits within the specified dimensions. The results of this feature are shown in the next figure:
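The autoresize behaviour described above is essentially a shrink-to-fit search. The sketch below makes three assumptions: the font shrinks in fixed steps, it never goes below a minimum size, and text size comes from a pluggable measuring function. A real renderer would use font metrics for that last part. `TextFitter` and its parameters are hypothetical and are not the Text Renderer’s actual code.

```java
// Hypothetical sketch of "shrink to fit": start from the user's font size and
// reduce it step by step until the measured text fits in the item's box.
final class TextFitter {

    // Assumed measuring function; a real renderer would use font metrics.
    interface TextMeasure {
        /** Returns {width, height} of the text drawn at the given font size. */
        double[] measure(String text, float fontSize);
    }

    static float fitFontSize(String text, float userSize, float minSize, float step,
                             double boxWidth, double boxHeight, TextMeasure measure) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        float size = userSize;
        while (size > minSize) {
            double[] wh = measure.measure(text, size);
            if (wh[0] <= boxWidth && wh[1] <= boxHeight) {
                return size; // largest tried size that fits
            }
            size -= step;
        }
        return minSize; // never shrink below a readable minimum
    }
}
```

With a crude measure such as width ≈ characters × size × 0.6, a 14-character title in a 100-unit-wide box shrinks from 20 to 11. A linear search is enough here because only a few sizes are ever tried. A binary search would only pay off for very fine steps.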

Workflow

The legend builder retrieves the graph data (partitions, node labels, edge labels, etc.) and creates a new legend item for each of them. A legend renderer then uses this information, plus the properties set by the user, to render the legend item to the specified target: PNG, PDF or SVG.

For developers

The renderers can be extended. For instance, the default Group Renderer is:

Using external libraries like JFreeChart, we can extend it to create a Pie Chart Renderer as follows:

Other types of items can be created by combining other available Legend Items or by extending Legend Item, Legend Item Builder and Legend Item Renderer.
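The builder → item → renderer split described in this section can be illustrated with a small Groups example. In this sketch, a builder turns partition data into an item, and two interchangeable renderers present the same item differently. One lists the groups; the other computes pie-slice angles instead of drawing them. `GroupsItem`, `GroupsItemBuilder`, `GroupsRenderer` and both renderers are hypothetical names. They are not the Legend Module’s actual classes.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical legend item: a title plus the size of each group (partition).
final class GroupsItem {
    final String title;
    final Map<String, Integer> groupSizes; // group label -> number of nodes

    GroupsItem(String title, Map<String, Integer> groupSizes) {
        this.title = title;
        this.groupSizes = Collections.unmodifiableMap(new LinkedHashMap<>(groupSizes));
    }
}

// Builder step: turns partition data (node id -> group label) into an item.
final class GroupsItemBuilder {
    static GroupsItem build(String title, Map<String, String> nodeToGroup) {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        for (String group : nodeToGroup.values()) {
            sizes.merge(group, 1, Integer::sum);
        }
        return new GroupsItem(title, sizes);
    }
}

// Renderer step: different renderers can present the same item differently.
interface GroupsRenderer {
    String render(GroupsItem item);
}

// Plain list, one line per group.
final class ListGroupsRenderer implements GroupsRenderer {
    @Override
    public String render(GroupsItem item) {
        StringBuilder sb = new StringBuilder(item.title).append('\n');
        item.groupSizes.forEach((group, n) -> sb.append(group).append(": ").append(n).append('\n'));
        return sb.toString();
    }
}

// Pie-style variant: computes each group's slice angle. A real renderer would
// draw the slices (e.g. with JFreeChart) onto the PNG, PDF or SVG target.
final class PieGroupsRenderer implements GroupsRenderer {
    @Override
    public String render(GroupsItem item) {
        int total = item.groupSizes.values().stream().mapToInt(Integer::intValue).sum();
        StringBuilder sb = new StringBuilder(item.title).append('\n');
        item.groupSizes.forEach((group, n) -> sb.append(group).append(": ")
                .append(String.format(Locale.ROOT, "%.1f", 360.0 * n / total))
                .append(" deg\n"));
        return sb.toString();
    }
}
```

The item holds only data, so swapping the list renderer for the pie renderer does not touch the builder. This mirrors the separation that lets new renderers be added without changing how legend items are created.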

The Legend Module also provides a save/load feature, so you can save your legends and edit them later.

Limitations

There are currently some limitations. For example, it is not yet possible to select a specific renderer for each type of item. Exporting legends to SVG is also not as automatic as exporting to PNG and PDF, for instance when exporting an Image item (the image will be embedded in the SVG file).

Conclusions

I would like to thank Eduardo Ramos and Sébastien Heymann for their support and feedback, which were critical during the development of this new module. The Legend module will be available as a core feature in the next Gephi release.

This GSoC was a great opportunity to learn and it also represents my first important contribution to the open-source community.