Posts by Mathieu Jacomy

Creator of Gephi (but not lead developer!). Engineer and researcher at Sciences Po médialab. I specialize in digital methods in the social sciences, sometimes called digital sociology or digital humanities.

Gephi Lite

Authors: Mathieu Jacomy, Alexis Jacomy, Paul Girard, and Benoît Simard

It is a spin-off

For a long time, one could think of Gephi as both a piece of software and a project; the purpose of the project would be to develop and maintain the tool. But from the start, our project was about more than Gephi’s code: websites, tutorials, forums, plugins, events, social media, fundraising… We now take this further with an unprecedented decision (for us): the Gephi project will host 2 different tools.

Gephi Lite will be a web version of Gephi, with the same basic features, but reduced to a minimalistic package. It will remain similar to Gephi and be compatible with it, but will have fewer options. It will not be able to open big networks, but it will be simpler and more ergonomic.

It will not dilute our efforts to maintain and improve Gephi. If you think of the energy we can afford to dedicate to Gephi as a cake, you may think that we will now have to split the cake into 2 shares, one for Gephi and one for Gephi Lite. But in fact, we will have a bigger cake. Indeed, the development is handled by a different team, namely Alexis Jacomy, Paul Girard and Benoît Simard from OuestWare.

Bootstrapping Gephi Lite during the Gephi Week

The Gephi community has known OuestWare for a long time thanks to their work on network visualization on the web, to which we must add Guillaume Plique from the Sciences Po médialab. Let us call them the JS community in this context. Over the years, these developers have been contributing to a long series of libraries, prototypes, and tools adjacent to Gephi. Here is a quick list of their collective contributions to network visualization:

  • Libraries: Graphology, to handle graphs; and SigmaJS, to render them on the web.
  • Prototypes: ManyLines, to share networks as explorable slides; MiniVan, to share networks online as browsable documents; and a top-secret project to make up networks quickly.
  • Tools: Retina, an online network visualizer with filtering and search; and Gephisto, a one-click network visualizer for teaching.

We have also given a joint talk about Gephi and JS at the FOSDEM. We mention all of this to highlight that there was a fertile ground for something new. The skills were there, but everyone had pushed their own projects and explored different directions. Could we coordinate our efforts in the future?

Long story short: we met at the Gephi Week, it happened, and the outcome is Gephi Lite. Here is an account of the project after the Gephi Week and a follow-up sprint at OuestWare.

A very early prototype of Gephi Lite: the graph rendering is already functional thanks to Graphology and Sigma.

What makes Gephi Gephi?

We explored that question because we had to answer this: what should Gephi Lite be like? Indeed, what makes Gephi Lite different from the tools above is that it tries to stick to the Gephi recipe. But what is that recipe?

We distilled Gephi to this feature set:

  • Load data
  • Render layouts
  • Compute metrics
  • Apply filters
  • Select data
  • Set the semiotics (appearance panel)
  • Save data
  • Export as images
  • Export on the web
  • Manual intervention (create and edit data, e.g. attributes)
  • Plugins (Note: we will leave this aside for Gephi Lite for now)

To which we add:

  • Gephi Lite must interoperate with Gephi
  • Reuse the Gephi look and feel when possible (consistency)

We used this list as a starting point to decide Gephi Lite’s scope.

Sketching the scope of Gephi Lite

About the name: what does “Lite” mean?

Lite means that Gephi Lite will always be at a higher level than Gephi desktop. Further from the metal, as we say. More blackboxed, with more layers of software. More usable, but at the core, less efficient. This is why it will have lower scaling capacities in terms of size of graph.

Lite also means that we aim at less complex usages than Gephi. This principle has to be taken as a general guideline and not a strict rule. Indeed, Gephi Lite also differs in that it is on the web. It is a drawback at times, but it also brings opportunities to do things differently and add new features. So Lite does not mean that Gephi Lite is Gephi with missing pieces. It has its own feature set.

We considered the name “Gephi Web” but we decided that making it clear that it would not be as scalable as Gephi would help manage people’s expectations. The discussion is not entirely closed, though.

Data-driven rendering

Here is where Gephi Lite will differ from Gephi. The semiotic work on the visualization (node color and size, edge thickness…) will always be tied to the data. That is why we call it “data-driven”.

In Gephi, you can do whatever you want. You can manually paint a bunch of nodes in red. You can paint some nodes with a gradient of colors representing their degree, and other nodes with a color representing a cluster they belong to. You can play with this feature. You can do art. You can do things so weird you could not even explain. It’s flexible and powerful, but it comes with complications. And notably, you cannot always have a caption, because Gephi cannot keep track of what the colors or sizes mean.

Gephi Lite has a limited set of features, and it sometimes creates opportunities. We decided to allow no manual coloring of the nodes and edges, and to use a rule-based mapping of colors and sizes. For example, you can only apply colors according to an attribute or a simple combination of attributes. Think of the rule as something like “Democrat blogs in blue and Republicans in red”, or something a bit more complicated but not too much. In Gephi Lite, the appearance of nodes and edges will be fully determined by such rules. As a benefit, it can keep track of what the colors mean, apply them dynamically, and build a caption. For most users, it will be simpler.

For the record, to make this system work, we settled on this set of features:

  • No manual coloring.
  • Add quali/quanti tags on node/edge attributes to help users make meaningful semiotic choices.
  • Node/edge appearance is dynamic: we watch attribute changes.
  • Appearance is determined by rules which are always applied, even if not in the filtered version. In other words, the modalities taken into account to set colors account for hidden nodes as well. It does not depend on how the network is currently filtered.
  • Missing values are systematically handled.
  • Gephi Lite will be able to draw a caption.
  • Gephi Lite will use the new GEXF 1.3 spec and contribute to its specifications by prototyping it.
  • When no caption is present (GEXF <= 1.2), the original GEXF viz attributes will be stored as special data attributes with a “gexf_viz” prefix, so that they can be reused at export and serve as defaults in the default appearance state.
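
To make the idea of a rule-based appearance more concrete, here is a minimal sketch in Python. It is not Gephi Lite's code (which builds on JavaScript libraries); the class `ColorRule`, its fields and the sample data are hypothetical names chosen for illustration. It only shows how a single rule can color nodes, fall back to a default for missing values, and produce a caption at the same time.

```python
from dataclasses import dataclass


@dataclass
class ColorRule:
    """Hypothetical rule: one attribute, one color per attribute value."""
    attribute: str
    palette: dict              # attribute value -> hex color
    default: str = "#999999"   # color for missing or unlisted values

    def color_of(self, node_attrs: dict) -> str:
        # A node without the attribute yields None, which is not in the
        # palette, so it falls back to the default color.
        return self.palette.get(node_attrs.get(self.attribute), self.default)

    def caption(self) -> list:
        # The caption is derived from the rule itself, never from the nodes.
        entries = [(f"{self.attribute} = {value}", color)
                   for value, color in self.palette.items()]
        entries.append((f"{self.attribute}: other or missing", self.default))
        return entries


# Illustrative data only.
nodes = {
    "blog_a": {"party": "Democrat"},
    "blog_b": {"party": "Republican"},
    "blog_c": {},  # missing value
}
rule = ColorRule("party", {"Democrat": "#1f77b4", "Republican": "#d62728"})
colors = {node_id: rule.color_of(attrs) for node_id, attrs in nodes.items()}
```

Because the colors are computed from the rule every time, changing an attribute value or the palette is enough to update both the map and the caption.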

Appearance panel

Like in Gephi, a dedicated space in the user interface will allow setting the semiotics of the map. In Gephi, it is the appearance panel. A similar space will exist in Gephi Lite (whether that’s a panel or something else).

As we have just seen, the semiotics are rule-based and dynamic. In addition to this, we decided the following:

  • Appearance gathers all visual variables to draw nodes, edges and their labels (size, color).
  • Node and edge label sizing will be dealt with in the appearance block.
  • Considered feature: applying different rules to different parts of the graph (see “partitions” later).
  • Appropriately handle missing values, anomalous values (ex: a string among numbers), unexpected values (ex: negative weight), and errors.
  • Always ask the user how to render undefined values. Undefined values could be the cases above or valid values that have not been set for different reasons. Those can typically be dealt with using a default color and size.
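
The handling of undefined values can be sketched as follows. This is an illustration under our own assumptions, not the actual implementation: the function names, the category labels and the linear scale are hypothetical. Values that are missing, anomalous or unexpected all receive a default size, while valid values are scaled between a minimum and a maximum.

```python
import math


def classify(value):
    """Return 'missing', 'anomalous', 'unexpected' or 'valid' (hypothetical labels)."""
    if value is None:
        return "missing"
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return "anomalous"    # e.g. a string among numbers
    if math.isnan(value) or value < 0:
        return "unexpected"   # e.g. a negative weight
    return "valid"


def node_sizes(values, min_size=2.0, max_size=20.0, default_size=1.0):
    """Map valid values linearly to [min_size, max_size]; others get default_size."""
    valid = [v for v in values.values() if classify(v) == "valid"]
    low, high = (min(valid), max(valid)) if valid else (0.0, 0.0)
    span = high - low
    sizes = {}
    for node_id, value in values.items():
        if classify(value) != "valid":
            sizes[node_id] = default_size
        elif span == 0:
            sizes[node_id] = min_size   # all valid values are equal
        else:
            sizes[node_id] = min_size + (value - low) / span * (max_size - min_size)
    return sizes


# Illustrative data only.
weights = {"a": 0, "b": 10, "c": None, "d": "n/a", "e": -3}
sizes = node_sizes(weights)
```

In a real interface, the default size would be the answer the user gives when asked how to render undefined values.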

Something we have been discussing but we have not solved yet: where is the caption? We could generate a caption on-demand, but since the appearance is fully dynamic, we could as well have a caption accessible at all times. Is the caption part of the appearance panel? If not, is it redundant? We will be iterating over this question.

The appearance panel in our earliest work-in-progress.

Filtering

The filters UI in Gephi is both too complex for the scope of Gephi Lite, and inconsistent with web UX design. We have an opportunity to do better, albeit simpler. We chose a new abstraction that is less flexible but much simpler to manipulate:

  • Filters are a stack (and not a tree like in Gephi)
  • Each filter is applied on the graph resulting from the previous filter (they cascade)
  • The filters can be on nodes or edges
  • A filter can be related to an attribute, a custom script (written or pasted by the user), or a topological filter (ex: the main component filter).
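
The stack abstraction can be illustrated with a short Python sketch. The graph representation (a dict of nodes and a list of edges) and the helper names are assumptions made for this example; Gephi Lite itself relies on Graphology. Each filter takes the graph produced by the previous one, so the order of the stack matters.

```python
def node_filter(predicate):
    """Build a filter keeping the nodes for which predicate(node_id, attrs) is true."""
    def apply(nodes, edges):
        kept = {n: a for n, a in nodes.items() if predicate(n, a)}
        # Edges that lost an endpoint are dropped with it.
        kept_edges = [(s, t, a) for s, t, a in edges if s in kept and t in kept]
        return kept, kept_edges
    return apply


def edge_filter(predicate):
    """Build a filter keeping the edges for which predicate(source, target, attrs) is true."""
    def apply(nodes, edges):
        return nodes, [(s, t, a) for s, t, a in edges if predicate(s, t, a)]
    return apply


def drop_isolated(nodes, edges):
    """A simple topological filter: remove nodes that have no edge left."""
    connected = {s for s, _, _ in edges} | {t for _, t, _ in edges}
    return {n: a for n, a in nodes.items() if n in connected}, edges


def apply_stack(nodes, edges, stack):
    """Apply each filter to the result of the previous one (they cascade)."""
    for current_filter in stack:
        nodes, edges = current_filter(nodes, edges)
    return nodes, edges


# Illustrative data only.
nodes = {"a": {"kind": "blog"}, "b": {"kind": "blog"},
         "c": {"kind": "news"}, "d": {"kind": "blog"}}
edges = [("a", "b", {"weight": 3}), ("b", "c", {"weight": 1}), ("c", "d", {"weight": 5})]

stack = [
    edge_filter(lambda s, t, attrs: attrs["weight"] >= 2),   # attribute filter on edges
    node_filter(lambda n, attrs: attrs["kind"] == "blog"),   # attribute filter on nodes
    drop_isolated,                                           # topological filter
]
result_nodes, result_edges = apply_stack(nodes, edges, stack)
```

Here node "d" disappears only because the previous filter removed "c": moving `drop_isolated` to the top of the stack would keep it.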

We have to experiment with the design of filtering, but let us acknowledge that filtering has to be simple for the user. Our priority is to keep our user experience straightforward.

Statistics

Gephi Lite will feature statistics (computable metrics), although with fewer choices than in Gephi. The available statistics are those included in Graphology, and if we add new ones they will also be included in Graphology. We want to make it possible to choose the name of the attribute where the output is generated (with a warning if it already exists).
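
As an illustration of the last point, here is a small Python sketch of a metric whose output attribute can be named by the user. The function `compute_degree` and its parameter `output_attribute` are hypothetical and do not reflect the Graphology API; the sketch only shows the intended behavior, including the warning when the attribute already exists.

```python
import warnings


def compute_degree(nodes, edges, output_attribute="degree"):
    """Store each node's degree in the attribute chosen by the user."""
    if any(output_attribute in attrs for attrs in nodes.values()):
        warnings.warn(
            f"Attribute '{output_attribute}' already exists and will be overwritten."
        )
    degree = {node_id: 0 for node_id in nodes}
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    for node_id, attrs in nodes.items():
        attrs[output_attribute] = degree[node_id]
    return nodes


# Illustrative data only.
nodes = {"a": {}, "b": {}, "c": {}}
edges = [("a", "b"), ("a", "c")]
compute_degree(nodes, edges, output_attribute="my_degree")
```

Running the same call a second time would emit the warning, since "my_degree" now exists on the nodes.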

Layout

Similarly to statistics, we will use the Graphology layout algorithms and possibly extend them.

This is not only related to the layout, but let us share with you a complicated question that can give you a practical idea of the design challenges we face. In Gephi, the node size is relative to the layout. They use the same coordinate system. By contrast, in other contexts like with Sigma, the node size may vary independently. The scaling of the layout is, after all, arbitrary. But because Gephi Lite is a companion to Gephi, we want to enforce consistency between them. Therefore the node size should behave similarly to Gephi. It turns out that this behavior conflicts with the current architecture of Sigma, due to how the renderer layer behaves. We see no other solution than to introduce a breaking change in Sigma (the v3 is therefore in preparation).

Buckets

The simplification of the filters and appearance systems gets in the way of some popular scenarios that we want to make possible in Gephi Lite. We are therefore adding a new concept to support these advanced uses, in a way that would be transparent to beginners. We call this feature buckets.

A bucket is a set of nodes and edges. A subgraph, technically. But you may think of it as a partition of your network, or a layer, or a selection. It’s just a way to handle a subset of your network in a few places where we think it is necessary.

The users who do not understand this concept can ignore it completely. It is not required to know what a bucket is in most situations. However, you may need it if you feel limited by the way filters and appearance settings work. For example if you have a two-mode network and you want to set the size of the nodes according to their degree, but with a different scale for each type of node, because one type consists of a few nodes with many links (they would be too big) and the other of many nodes with a few links (they would be too small). If you meet this kind of problem, then using buckets is the way to go.
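
The two-mode example can be sketched in Python as follows. The data, the attribute names and the helper `scale_sizes` are hypothetical, and the bucket is reduced here to a plain group of nodes; the point is only that each bucket gets its own size scale.

```python
def scale_sizes(degrees, min_size, max_size):
    """Linearly map the degrees of one bucket to [min_size, max_size]."""
    low, high = min(degrees.values()), max(degrees.values())
    span = high - low
    return {
        node_id: min_size if span == 0
        else min_size + (degree - low) / span * (max_size - min_size)
        for node_id, degree in degrees.items()
    }


# Illustrative two-mode network: few journals with many links, many authors with few.
nodes = {
    "author_1": {"type": "author", "degree": 1},
    "author_2": {"type": "author", "degree": 2},
    "author_3": {"type": "author", "degree": 3},
    "journal_1": {"type": "journal", "degree": 40},
    "journal_2": {"type": "journal", "degree": 120},
    "journal_3": {"type": "journal", "degree": 200},
}

# One bucket per node type.
buckets = {}
for node_id, attrs in nodes.items():
    buckets.setdefault(attrs["type"], {})[node_id] = attrs["degree"]

# The same size rule is applied separately inside each bucket.
sizes = {}
for bucket_degrees in buckets.values():
    sizes.update(scale_sizes(bucket_degrees, min_size=4.0, max_size=16.0))
```

With a single scale over the whole network, all authors would be squeezed near the minimum size; with one scale per bucket, both types use the full size range.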

We are aware that this feature makes the memory structure of Gephi Lite a bit more complicated to write, but we consider that it is worth the effort.

UX organization

We identified a few problems. For instance:

  • The appearance panel will be heavier than in Gephi
  • It is just too heavy to display all panels all the time
  • We cannot have tabs because the browser already has tabs
  • We do not want to split the app into pages (screens) for similar reasons

The solution we found (currently) entails:

  • A compact sidebar on the left with shortcuts to different panels: metrics, layout, appearance and filter.
  • When clicking on a shortcut in the compact sidebar, the corresponding panel opens in an additional sidebar next to the compact sidebar (the panel unfolds in a collapsible column).
  • A column on the right with graph context (visible nodes and edges) and contextual information and actions (for instance, what is selected etc.)

The organization of the sidebars as it currently exists in our prototype.

Our approach is to design by drafting (no mockups) but leaving aside all graphic design choices for the moment. Those will be designed later on in “live wireframes” (or as we call it, “ugly soulless prototypes”).

We have three personas in mind when making our design decisions:

  • The cartographer
  • The data scientist
  • The collaborator: someone who wants to share the exploration of a network

Cloud file management

Because we are on the web, it can be really useful to save the project’s file on the cloud. To achieve that we identified the following needs:

  • To sign-in
  • To list and/or search the files that are compatible with Gephi Lite
  • To load a file
  • To save a file

The first implementation that we want to provide is GitHub Gist. Gephi has a plugin to publish a graph on the web that generates a GEXF file and saves it in GitHub as a Gist (we will post about that at a later point). GitHub Gist allows CORS (a major constraint of this approach), so a web application can load a gist file like Retina does.

How we see the lifecycle of a file in Gephi Lite:

  1. Use Gephi
  2. Export on the web
  3. Use Gephi Lite
  4. Import the GEXF into Gephi

The last part (importing a remote GEXF file in Gephi) doesn’t exist yet, but it’s easy to develop as a plugin. Using GitHub Gist also gives us the opportunity to see revisions of a file (history management, rollback…). This system is compatible with other providers that we could add in the future like Nextcloud, Google Drive, Dropbox, etc.

MVP

Our design intentions, as stated in this post, can be seen as a long-term road map for the project. Our short-term goal is the MVP, the “minimum viable product”. The MVP is the smallest version of Gephi Lite that can be useful, the point before which it makes no sense to release the tool. Therefore, developing the features of the MVP is the priority. The rest is “nice to have” because it requires the MVP to work. But deciding what belongs to the MVP is not just a matter of technical constraints, it is also a subjective call about what “necessary” and “useful” mean in the context of network analysis.

No final choice has been made for the moment, and we need a better view of the implementation complexity of the various components we have to build. But some things are already clear to us.

We do want the following features in the MVP:

  • Load and save a GEXF file, local or Gist
  • Visualize (zoom, pan the view, search)
  • Appearance (node color and size)
  • Filter (at least one)
  • Statistics (at least one)
  • Layout (Force Atlas 2)

The following can wait until later:

  • Buckets
  • Custom scripts
  • Data editing

Roadmap

The next work iteration on Gephi Lite should happen in early 2023. There will be no prototype release before then. We will communicate on our advances at that point. See you there!

– Mathieu, Alexis, Paul and Benoît

Gephi Week 2022: debriefing

From 29 August to 2 September 2022, about 20 people met in Paris and online to make the Gephi codebase more sustainable, discuss the project, experiment with potential features, improve the design, and get closer to the 1.0 version. It was a follow-up to the 2021 code sustainability retreat, and its theme was community detection. In this post we present what we have done.

The event was sponsored by the SoBigData++ project, hosted by the Sciences Po médialab, and live-streamed by Nicolas Bouchaib from First Link. Tommaso Venturini, Axel Meunier and Simon Bourdieu-Apartis carried the burden of organization. We warmly thank all of them for having made this event possible!

Work done

We have covered a broad spectrum of topics. Find the list below. Just keep in mind that not every project could be finalized during the week. About half of the contributions will need some time to be released to the users. The forthcoming 0.10.0 version will include the rest, and is expected by the end of the year (2022).

These features and experimentations will be developed in upcoming blog posts:

  • Gephi Lite, an upcoming web version of Gephi. Alexis Jacomy has been paving the road map and leading the discussion about its features.
  • Revamping the icons in Gephi. Côme Brocas reworked the icons system and Mathieu Bastian reworked the implementation. This will also contribute to the upcoming dark mode!
  • New web export based on OuestWare’s Retina, with a plugin developed by Clément Levallois and Alexis Jacomy.
  • New Neo4J plugin, developed by an expert of that technology, Benoît Simard.
  • Rethinking how we visualize community detection in Gephi, notably when it comes to the ambiguity about the groups with which each node can be associated. Tommaso Elli, Andrea Benedetti, Mathieu Jacomy and Guillaume Plique reflected on visualizing the process of the algorithm. Benjamin Ooghe-Tabanou, Étienne Côme and Guillaume reflected on metrics and visualizations to assess the ambiguity itself.
  • Video presenting the codebase, by Mathieu Bastian. To be released soon!

These features are developed below in this post:

  • Allowing the export of node coordinates as columns in the data, an often demanded feature added by Sukankana Chakraborty.
  • Making the edge types editable in the data laboratory, which matters to multigraphs, by Matthieu Totet.
  • Exporting the same node borders as in the overview, by Roberto Luna-Garcia.
  • Exporting with a transparent background, by Roberto Luna-Garcia.
  • Adding arrows to curved edges when you export an image. Mathieu Jacomy tackled this seemingly simple issue, but it was more complicated than it looked.
  • Revamping the online documentation for developers notably, by Mathieu Bastian and Matthieu Totet.

Takeaways

A few general points before diving into the details.

Gephi is expanding to the web. We will develop this in an upcoming post, but in short, we are committed to stabilizing a web version of Gephi, with a reduced scope but a more modern UX, called Gephi Lite. The team at OuestWare (Alexis Jacomy, Benoît Simard and Paul Girard) has taken the lead of this branch of our project. It will be based on Graphology and SigmaJS, and benefit from the invaluable help of Guillaume Plique.

Gephi is popular, and many people are willing to help the project. This second edition attracted more participants than the last. More varied people, too: designers, researchers, data analysts, content creators, OSINT practitioners, and developers. Those categories are not mutually exclusive.

It is hard to recruit Java developers. One of the reasons seems to be that Java Desktop and Swing are not sexy, but more importantly, we are not that well connected to the Java dev world. We find our contributors either through plugins, or in the overlap of data science and dev: people who use Gephi and happen to also know how to dev. We will keep communicating about our need to stabilize a community of developers, and we believe that a lively non-dev community around Gephi (users, content creators, designers…) contributes indirectly to a more lively dev community.

We still need to stabilize the codebase. We are not ready yet to move to finalizing a version 1.0, at least because we still need to rework the visualization engine to get rid of unmaintained dependencies. This will require a separate effort later this year.

The Gephi Week was very beneficial to the project. Although we struggle to be attractive to Java developers, the Gephi Week was an occasion for everyone to improve their knowledge about the codebase. Some contributors like myself were rusty, and it was for us an opportunity to exercise our coding muscles again, under the excellent coaching of Mathieu Bastian, who also recorded a guided tour of the codebase. Newcomers could also learn the basics, and the codebase received more scrutiny. Little by little, we build the ability to help and support each other, and improve our autonomy. And beyond the central concern about the sustainability of the codebase, the project immensely improved in many unexpected directions, such as rethinking the design of popular features like community detection, revamping big parts of the visual identity (icons), and building a web sibling to the Java version, Gephi Lite. Even beyond these developments, the coding retreat spawned satellite events like a meeting with the local OSINT community (many Gephi users!), live-streaming with YouTubers, and discussing with renowned researchers. Around the coding retreat, something like a mini-festival is growing by itself.

We remain committed to keeping this event yearly, and we expect it to grow again next year.

Wrap-up videos

As our wrap-up was live streamed, we had the opportunity to share that moment with you. The stream has been cleaned up and sliced. We published it on our YouTube channel as a playlist (see below). The playlist, about 100 minutes long, is in the order it was recorded. For a more thematic approach, each video will be featured separately as we explain what we have done during the Gephi Week, starting in the next section.

Playlist of the wrap-up on YouTube. 100 minutes.

More about what we have done

Allowing the export of node coordinates

An often demanded feature, added by Sukankana “Schuh” Chakraborty. The (x,y) coordinates of nodes are native to Gephi. They are not like any other attribute insofar as they are used to draw the layout. As an unfortunate consequence, they used to be omitted during the export of data, which is a problem, notably if you want to draw the nodes in another environment like Tableau. Schuh addressed this issue and added the option.

Screenshot of the settings panel
Schuh (Sukankana Chakraborty) explains her work on making the node coordinates exportable.

Making the type of edges editable

In multigraphs, each edge has a given “type”, also sometimes called “kind”. Those differentiate the represented relations, for example mother, sister, niece… Like for Schuh’s issue just before, the edge type was a special attribute, and we could not change it in the data laboratory. As Mathieu Bastian explains below, Matthieu Totet addressed the issue (he could not be present during the wrap-up).

Mathieu Bastian explains his work with Matthieu Totet on making edge kind editable for multigraphs in Gephi.

Exporting the same node borders as in the overview

You may have noticed that the nodes have a different look in the Overview and in the Preview. The Preview (the image exporter) can generally do more than the Overview, but one feature was missing: having node borders colored with a darker version of the node color. Roberto Luna-Garcia added this option to the settings.

Roberto Luna-Garcia showcases his work on adding node borders in the export.

Exporting a PDF with transparent background

Roberto also addressed the need to export a network map with a transparent background:

Roberto Luna-Garcia showcases his work on making the background transparent when you export a Gephi visualization.

Adding arrows to curved edges

When Mathieu Jacomy picked this seemingly simple issue, he thought it would take a few hours. Alas, deep down the rabbit hole, a much more fearsome beast awaited. Bezier curves had to be replaced with circle arcs, which came with their own share of implementation weirdness, as the renderers speak three different languages: SVG, PDF, and Java2D.

Mathieu Jacomy shows his work on adding arrows to curved edges during the Gephi Week.

Revamping the online documentation

How to write accessible documentation for developers? Matthieu Totet and Mathieu Bastian drew inspiration from the OpenRefine community, and reworked the system around Gephi, with a good share of automation.

Mathieu Bastian explains his role in the Gephi Week and showcases his work with Matthieu Totet on improving the online Gephi documentation.

Photos

Most of the week consisted of this collective workshopping that one would totally expect. Andrea, Guillaume, Étienne, Benjamin, Mathieu B, Matthieu T and Nicolas.
It’s not just coding, it’s also thinking without coding. Guillaume and Benjamin enjoy the vibe while Mathieu J contemplates despair.
As expected, one could see networks. Here Guillaume is tinkering with semantic zooming.
Clément skimming through the book recommended by Mathieu B to understand the codebase better: The Definitive Guide to NetBeans Platform 7, by Heiko Böck.
Côme at work redesigning icons that are too specific to be present in existing libraries.
Nicolas hosting a stream with Viviane (Scilabus on YouTube) and Mathieu J.
Nicolas installing the streaming setup for the wrap-up session.
It was exhausting but very enjoyable. Mathieu B, Matthieu T, Nicolas, Clément and Roberto.

This event is supported by the European Union – Horizon 2020 Program under the scheme “INFRAIA-01-2018-2019 – Integrating Activities for Advanced Communities”, Grant Agreement n.871042, “SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics” (http://www.sobigdata.eu).

Call for participants: Gephi code sustainability retreat 2022

We are organizing a second code sustainability retreat (check the first one), and we are looking for Java developers willing to contribute to Gephi’s core codebase over the next few years. It will be one of the two tracks of a broader event, the Gephi Week, the other one being dedicated to the question of visualizing community structure in networks (we will make a dedicated post for that).

Our goal: Make Gephi’s codebase sustainable, and beyond this, recruit a team of developers into the project.

When: 29 August to 2 September (one week, Monday to Friday)

How long: 4 to 5 days. Let’s see how travel goes for everyone.

Where: In Paris, France.

How many participants: We aim at about 5 Java developers, not counting the Gephi core team (~3 people).

Funding: We pay for travel and accommodation thanks to the sponsoring of the SoBigData++ project. We will also offer a small compensation for your time and effort (~100€/day).

What we will do during the retreat: Our lead developer will share knowledge about the codebase. We will get an overview of the state of Gephi, set up a more technical road map (identify the main challenges, decide on the best course of action) and code part of it – in short, we will push the cart further. Furthermore, we will get to know each other better and have a good time together.

HOW TO APPLY
Send an email to the organiser
mathieu.jacomy@gmail.com

What’s next: We will probably meet you online for a quick talk and check that we are on the same page. If too many people apply, we will make a choice and inform you of the result. We will deal with travel and accommodation, and then meet you in Paris!

Feel free to ask if you have any questions (to the email above, in comments, or via Twitter to @Gephi).

To know more about this, you can check the report on the Gephi code sustainability retreat 2021.


This event is supported by the European Union – Horizon 2020 Program under the scheme “INFRAIA-01-2018-2019 – Integrating Activities for Advanced Communities”, Grant Agreement n.871042, “SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics” (http://www.sobigdata.eu).

Transition to semantic versioning

You may have noticed two things.

First, we released several versions recently. Gephi 0.9.2 was released in September 2017. Then version 0.9.3 in March 2022, after a hiatus of more than 4 years. Then 0.9.4 three weeks ago, in April 2022. Then 0.9.5 a week ago, in May. What is going on?

Second, the splash screen. Version 0.9.1 and 0.9.2 were like this:

Then version 0.9.3 changed color, to mark the end of the hiatus. It was somehow a big change.

…but since version 0.9.4 it just features the version number 0.9 (without the last bit):

What is going on?

We are transitioning to semantic versioning

Semantic versioning is a certain logic to attribute version numbers. It uses three numbers:

  1. The major version number. It is incremented when the new version breaks compatibility. It matters to the user because they may not want to upgrade, or not yet, because it may break for them.
  2. The minor version number. It is incremented when features are added, but in a compatible way. The user generally wants to upgrade, but may not like the changes, so we need a way to refer to that specific version.
  3. The patch number. It is incremented when it is just bug fixes. The user always wants to upgrade.

So far, Gephi was not versioned that way. In part because numbers have a cultural meaning. For example, 0.9 feels like we are getting close to version 1.0, and in many ways we are getting closer, but at the same time this does not have to do with the fact that we have released about 10 major versions. If it takes us 15 steps, then so be it. The next minor version will be 0.10 – which is not like version 0.1! Weird, but version numbers do not work like decimal numbers.
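
A short Python sketch may help to see why version numbers do not behave like decimal numbers. The helpers `parse_version` and `bump` are our own illustrative names, not part of any Gephi tooling; they compare versions component by component and apply the three increment rules listed above.

```python
def parse_version(text):
    """Turn '0.10.0' into (0, 10, 0) so that components compare as integers."""
    return tuple(int(part) for part in text.split("."))


def bump(version, change):
    """Return the next (major, minor, patch) for a 'breaking', 'feature' or 'fix' change."""
    major, minor, patch = version
    if change == "breaking":
        return (major + 1, 0, 0)
    if change == "feature":
        return (major, minor + 1, 0)
    if change == "fix":
        return (major, minor, patch + 1)
    raise ValueError(f"Unknown kind of change: {change}")


# As version numbers, 0.10.0 comes after 0.9.5...
newer = parse_version("0.10.0") > parse_version("0.9.5")
# ...whereas as decimal numbers, 0.10 is smaller than 0.9.
smaller_as_decimal = float("0.10") < float("0.9")

next_patch = bump(parse_version("0.9.4"), "fix")
next_minor = bump(parse_version("0.9.5"), "feature")
```

Comparing tuples of integers is what makes 0.10 the successor of 0.9 rather than a step back to 0.1.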

Semantic versioning is a reasonable and safe way to version, and we are getting there, but not in one go. We will only fully do that from version 1.0.0, which requires meeting a number of goals on our road map. Nevertheless, we need to be able to push bug fixes to everyone, and for that, the best is to use the patch number. Which is why we had two more versions in just a month. But it remains less dramatic than the move from 0.9.2 to 0.9.3, which is why we removed the mention of the patch number from the splash screen. Like for semantic versioning, new patch versions should not feel like new versions to the user. And importantly, patch versions are pushed to all users through the automatic update mechanism. So to recap, it works like this:

As a final word, you may wonder: why not have the post-hiatus version numbered 0.10.0 instead of 0.9.3? You’d be totally right! It should have been. We did not realize it soon enough, unlike the most savvy members of our community. Oops! But we’re doing it now. Better late than never 🙂

Gephi 0.9.3

Gephi 0.9.3 is here! Download it from http://gephi.org. No crazy new features, but many improvements and bug fixes. Here are 5 highlights:

1. No more Java installation required.

Gephi runs on top of Java, so you had to install Java before Gephi. On some computers, a bug related to the Java path caused the infamous “Cannot find Java 1.8” issue. This does not happen anymore, as you do not have to install Java anymore! It is packaged with Gephi.

2. New look and feel

Gephi now has a flat look and feel. This is much better for Linux users, as well as some Mac users who had issues with the appearance (one did not see which tab was selected etc.).

3. New community detection algorithm via statistical inference

Tiago Peixoto attended the code sustainability retreat 2021 and we implemented a version of his approach to community detection. It uses the same convergence heuristic as the Louvain algorithm (“modularity” in Gephi), also looks for assortative structures, but optimizes a different criterion, based on Bayesian inference.

You can look at Tiago’s blog for more information about it, or the two papers it is based on:

4. GEXF 1.3

The file format often used with Gephi, GEXF, has been updated to 1.3. This version is more mature and reliable than the previous one, and is implemented in Gephi. Check the announcement for the final specification there:
http://gexf.net/history.html

5. High DPI screens

High resolution screens are mainstream. We corrected a number of issues to fully support them.

List of improvements

You can check our changelog in the release page:
https://github.com/gephi/gephi/releases/tag/v0.9.3

For plugin maintainers

Check a specific announcement about your plugin right there:
https://github.com/gephi/gephi-plugins/discussions/245

More about this update

This update is a follow-up to the code sustainability retreat 2021. You can read our report in our last blog post. Not all the features we discussed and worked on are included in this update. We are working on it, but it’s just better to release updates as soon as usable improvements are ready.

Check our road map as of Summer 2021 to have a better idea of where we are and where we go. We are still working on big features such as a new graphic engine and the infamous undo feature.

We will also have a code sustainability retreat in 2022 with funded travel and accommodation, if you are interested in contributing to Gephi. We will almost certainly hold it the week of the 29th of August. A call for participation will come soon.

As usual, please share your experience/feedback on our Facebook group or on Twitter.

Gephi code sustainability retreat 2021: debriefing

From 29 November to 3 December 2021, 6 people met in Copenhagen to make the Gephi codebase more sustainable, discuss the project, pave the way to the long-awaited version 1.0, and improve the tool overall. Before the event even started, it had already tackled its main goal. In this post we present what we have done.

The event was sponsored by the Aalborg University TANT Lab in Copenhagen, whom we warmly thank, because they basically funded the whole thing, and hosted us. The participants featured a whopping 50% of Mathieux, and more importantly, a healthy mix of skills:

  • Mathieu Bastian, who was Gephi’s lead dev for many years, and knows the codebase like no one else;
  • Mathieu Jacomy, who designed Gephi’s UX and some of its algorithms, and organized the event;
  • Matthieu Totet, who authored the Gephi Twitter plugin;
  • Eduardo Ramos Ibáñez, Gephi’s current lead developer;
  • Tiago Peixoto, world-class expert on community detection;
  • and Antonin Delpeuch, the main developer of Open Refine.

We spent the first two days discussing and preparing stuff, and the last three coding. The coding went surprisingly well, compared to many comparable situations, for example hackathons. That is why I wrote above that it had tackled its main goal before it even started: the codebase is sustainable. I know someone who cleans their house before the house cleaning service comes, because they don’t want to expose the actual messiness of their lives. Similarly, Mathieu Bastian and Eduardo had prepared the codebase so well in anticipation of the retreat that there was nothing left to do on that front. Bravo!

Code sustainability was our main goal, but that is not where it ends. We changed our perspective in the process. Let’s start with the main takeaways: goals met or not, and what we have learned.

Takeaways

Code sustainability. The codebase is in good shape and developers can healthily engage with it. However, we can improve the documentation and entry points for aspiring developers.

Enrolling new developers. This did not work so well, but we will get there. We hoped for more new developers to come to this retreat. We had room and funding for two more people, so our call for participation did not work great. However, some developers came forward during the event, when we started communicating about the retreat on Twitter. We aim for more participants next time.

Anticipating the fundraising phase. We learned a lot from the Open Refine project through Antonin. They have a fiscal sponsor: a structure that represents them legally and manages money, allowing them to collect funds and pay developers. This is what we need, and our next step will be to seek one.

Goodwill. There is still a huge amount of it around the project. The response of the public to this event was outstandingly positive. This is important, notably for raising funds.

Dev infrastructure. The GitHub issues system, which we use to track and fix bugs, does not work that great for us in practice. We are thinking of an alternative.

Web presence. Reflecting on our website, blog, and online tools was an explicit non-goal: we decided to focus on that another time. But we could see that it definitely required a good reworking. We are aware of it.

Governance. As the project involves funds and more people, we will need to change our model. We want to talk about it with potential fiscal sponsors.

Mailing list. We have to make one, dedicated to developers, at least for the moment.

Achieved

Here is a summary of what we have achieved during the three coding days. Note that it does not directly translate into a release. You will have to wait for Christmas at least for that (it requires more work).

  • We triaged a lot of issues and tested the contribution process.
  • We fixed a bunch of bugs on our bug bash.
  • We defined and enforced code style on the repository, making it simpler going forward to collaborate between developers.
  • We made the project saving/loading more resilient, preventing users from losing their work due to corrupted .gephi files.
  • We embedded the Java runtime (JRE) in the Windows and Linux installations, so that users don’t need to install Java by themselves anymore.
  • We migrated the localization system from Transifex to Weblate, making it easier to translate Gephi.
  • We made unit testing easier so that developers are more productive.
  • We integrated the new visualization engine into the Gephi desktop app. We got it working the first day, but without workspace switch support.
  • The following days we implemented the workspace switch, added support for High DPI screens, and got most of the interactivity and tools working fine.
  • We implemented Tiago Peixoto’s statistical inference algorithm for community detection and its unit tests (in progress).
  • We sketched a specification for the undo/redo feature.

Discussed

We discussed the state of the project in various ways during the first two days: its community of developers and users, its scientific state, the infamous (lack of) undo, our road map, governance, funding… We learned a lot from Antonin (Open Refine), and on community detection algorithms from Tiago. That part is hard to transcribe here, but I will write down a few knowledge points we have established.

Our community is broad. It consists of developers and users, and we find those in various scientific fields: digital humanities, SNA (social network analysis), network science (notably teaching). Outside of research, we also find users in data journalism, activism, and in the industry: SEO (search engine optimization), social media listening, patent and paper analysis, intelligence (OSINT), and cybersecurity. We noted that Open Refine does user surveys to know their community better (we did so in 2016).

Gephi is not a commercial product. By that, we mean that we do not want to make Gephi for a specific public. We do not aim at normalizing or formatting usage. We just want to help different kinds of people, even when they want different things. [Mathieu Jacomy’s note: as I am writing this I realize that part of our audience will rightfully remark that we do have methodological commitments and that we necessarily shape usage. Hence this clarification:] In other words, contrary to a company whose interest might be to serve certain consumers to the detriment of others, for example because they have higher purchasing power, we do not have a fixed persona in mind when we make Gephi. We aim at satisfying the existing users, including those who have marginal needs. In short, the features are decided on the pragmatic ground of usefulness to people versus implementation difficulty.

Do people leave Gephi and why? We don’t think we have a “users leave Gephi” problem. Here is what we believe: many users naturally move to more advanced tools, yet they may go back to Gephi on various occasions, because it’s easy to use. Using Gephi is sporadic anyway (one does not need to use it every day). That being said, some developers leave Gephi when they write code, because scripting is easier elsewhere (e.g. in Python).

Web presence. Nice things we want to have: an introductory video to Gephi; a list of the best tutorials produced by the community; a simpler website, because there is too much irrelevant information; content for developers moved somewhere else (e.g. GitHub); removal of the content about the Gephi Consortium (obsolete); a unified navigation bar over our different online spaces; a YouTube channel; and a way to promote the good content produced by the community.

Book. We would like to write one. Or a MOOC. The Gephi 1.0 release would be the ideal moment.

Community detection. Tiago Peixoto presented his work on the topic. He champions a Bayesian inference approach and considers modularity maximization an obsolete (disproven) method. He also acknowledged that not everyone in the research community necessarily agrees. He documented his perspective in a series of blog posts (1, 2 & 3). We collectively agreed that we would keep the current popular Louvain method, add the more recent Leiden method, and add Tiago’s statistical inference approach. We will also group them in a specific section of the statistics panel, so that users can identify them as alternatives, and exercise their own critical thinking by engaging with them.
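For readers unfamiliar with the criterion under discussion, here is a minimal sketch of how the modularity of a given partition is computed, using the standard textbook definition for an undirected, unweighted graph. It is not Gephi's implementation, it does not search for the best partition (that is what Louvain or Leiden do), and it says nothing about the Bayesian criterion used in Tiago's approach. The function name and the toy graph are ours.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")

    degree = defaultdict(int)
    internal_edges = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal_edges[community[u]] += 1

    degree_sum = defaultdict(int)
    for node, k in degree.items():
        degree_sum[community[node]] += k

    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "all" for node in range(6)}

q_two = modularity(edges, two_groups)  # 5/14, about 0.357
q_one = modularity(edges, one_group)   # 0.0
```

Splitting the toy graph into its two triangles scores higher than keeping everything in one group, which is the kind of assortative structure both families of methods look for; they differ in the criterion they optimize.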

Undo/redo. We learned how it is done currently in Open Refine, the problems it creates, and how it could be done better. The feature is doable in Gephi, and we sketched an architecture.
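As a rough illustration of the general idea of an action stack, here is a minimal undo/redo sketch in Python. It is not the architecture we sketched during the retreat, nor how Open Refine does it: the `History` class, the `set_color` helper and the dictionary of node colors are hypothetical names chosen for the example.

```python
class History:
    """Minimal undo/redo history based on two stacks of (do, undo) pairs."""

    def __init__(self):
        self._undo_stack = []
        self._redo_stack = []

    def apply(self, do, undo):
        """Run an action and remember how to revert it."""
        do()
        self._undo_stack.append((do, undo))
        # A new action invalidates whatever could have been redone.
        self._redo_stack.clear()

    def undo(self):
        """Revert the latest action; return False if there is none."""
        if not self._undo_stack:
            return False
        do, undo = self._undo_stack.pop()
        undo()
        self._redo_stack.append((do, undo))
        return True

    def redo(self):
        """Re-run the latest undone action; return False if there is none."""
        if not self._redo_stack:
            return False
        do, undo = self._redo_stack.pop()
        do()
        self._undo_stack.append((do, undo))
        return True


def set_color(history, colors, node, new_color):
    """Hypothetical undoable edit: change one node color in a dict."""
    missing = object()
    old_color = colors.get(node, missing)

    def do():
        colors[node] = new_color

    def undo():
        if old_color is missing:
            colors.pop(node, None)
        else:
            colors[node] = old_color

    history.apply(do, undo)


colors = {"a": "red"}
history = History()
set_color(history, colors, "a", "blue")
history.undo()  # colors["a"] is "red" again
history.redo()  # colors["a"] is "blue" again
```

Each edit stores its own inverse, which keeps the history small but requires every undoable operation to know how to revert itself; that is the main design cost of this kind of approach.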

Funding. From Antonin’s feedback on the Open Refine project, we realized that what we needed was a fiscal sponsor, for instance Code for Science and Society. We also aggregated a list of possible funders: crowdfunding, Google Summer of Code (good to engage devs over the long run), Outreachy (idem), Chan Zuckerberg Initiative…

Feedback

Here is what we thought of how it went, to remember for next time.

To be improved: organization of travel and accommodation; we should identify decisions when we make them and note them apart; we should record some of the talks, notably the introduction to the codebase; it would be nice to have a (social) occasion to interact with Gephi users; the big map of the source code was not useful.

Went well: the coding was well prepared, thanks to Mathieu Bastian and Eduardo; the coding often went beyond our expectations; knowledge exchange across Gephi and Open Refine was great thanks to the meeting being in person; the live tweeting was engaging to our community including developers; the T-shirts are nice.

To consider for next time: live-streaming moments; a hybrid format or possibly a 100% online edition; preparing explainer videos.

Photos

Here is a small selection of pictures of us at work, to give you an idea of what it looked like. We hope it makes you consider joining next time! Also, in the meantime, consider proposing a talk and attending the Open Research Tools and Technologies devroom where, notably, the Gephi team met Antonin: it’s a great place to meet like-minded people.

Mathieu Jacomy explaining stuff
Tiago Peixoto presenting his approach to community detection. On the left, Antonin Delpeuch at work.
Mathieu Bastian discussing how the Gephi architecture could support an undo/redo à la Open Refine (action stack).

In short, this is the Gephi codebase.

Coding session with Matthieu Totet (left), Eduardo Ramos Ibáñez (center), and Mathieu Bastian (right).

Tiago Peixoto monitoring his algo.

Antonin Delpeuch sieving Gephi issues. Behind, a map of the Gephi source code.
Attempts to track the steps of Tiago’s community detection algorithm during a debugging session.

A social event in the company of Ann-Sofie and Martin Grandjean, who visited us.

The end of the retreat intersected with the university’s Christmas party. More social events!

During this party, Matthieu Totet became a Danish legend (here with Anders Munk on the right).

One cannot fully know what will happen at a Gephi coding retreat. Consider applying in 2022!

Call for participants: Gephi code sustainability retreat 2021

We are organizing a code sustainability retreat, and we are looking for Java developers willing to contribute to Gephi’s core codebase over the next few years.

Our goal: Make Gephi’s codebase sustainable, and beyond this, recruit a team of developers into the project in anticipation of a fundraising phase. We believe that Gephi deserves care, that there is enough interest to fund it, and this is our first step to get there.

When: November or December 2021. Exact dates to be announced in September.

How long: About one full week (4 or 5 days).

Where: In Copenhagen, Denmark, and online.

How many participants: We aim for about 5 Java developers, not counting the Gephi core team (2-3 people).

Funding: We pay for travel and accommodation thanks to the sponsorship of Aalborg University. We will also offer a small compensation for the work (~100€/day).

What we will do during the retreat: Our lead developer will share knowledge about the codebase. We will get an overview of the state of Gephi, set up a more technical road map (identify the main challenges, decide on the best course of action) and code part of it. In short, we will push the cart further. Furthermore, we will get to know each other better and have a good time together.

HOW TO APPLY: Send an email before September 15 (2021) to the organiser: mathieu.jacomy@gmail.com
Note from MJ: Some of you have already applied, thanks a lot! If I’ve answered you, you’re in the candidates’ list.

What’s next: We will select participants for this first edition, tell each of you whether you have made it or not, settle the dates with participants, and prepare the retreat together. This is a first for us, but we plan to do it again next year and beyond; see our road map for more info.

Feel free to ask if you have any questions (to my email above, in comments, or via Twitter to @Gephi).

Note: We will also organize some sort of side event bridging over the dev/academia demarcation, because the retreat is hosted by a university and because Gephi naturally drives hybrid interest. If you’re interested in the research side of this, it might be even more interesting to you. More on that later on!

Gephi road map, Summer 2021

This road map states, in short, Gephi’s priorities, long-term and short-term goals, challenges in various areas, and the way forward.

Project vision

Gephi is multiplatform, open source, installable, extensible by the community, and works with local files.

Gephi is an opinionated take on network analysis, and is not intended to be the only network analysis tool. Its focus is visual interaction, and a scalable workflow from 10 to 10,000,000 nodes (assuming enough computing power). Its core features are visualizing, filtering networks, and computing statistics. Gephi is exploration-oriented: visualize primarily for yourself, secondarily for others. More info on the community of Gephi users in this post.

Priorities

  1. Sustainability. Notably maintenance: Gephi needs to work before anything else. This includes: being easy to install (including Java) on all platforms, having the UI work in various screen resolutions and sizes, being stable, fixing major bugs, and having a codebase clear and documented enough that multiple developers can understand it and contribute.
  2. Version 1.0, i.e. current Gephi with a consolidated set of features. We want to release a coherent version of today’s Gephi before discussing new directions to explore.
  3. Stabilizing core contributors. This entails institutional support, fundraising, and discussing governance.
  4. Other. Community tools and online presence (forum, website…). Plugins. Web integration (Gephi JS). Evolution of Gephi. Documentation, tutorials and teaching material. Dev community (code examples). Keeping Gephi state-of-the-art over the long term.

Project road map

Until Winter 2021: Gephi dev campaign.

Goal: enrol new developers in the project.

Fall 2021: Gephi codebase sustainability retreat.

Goal: train new Gephi developers, iterate over the technical road map to Gephi 1.0 and discuss its implementation. Set concrete sustainability goals for 2022.

We will invite ~5 developers for a 1-week code retreat in Copenhagen, compensation 100€/day.

2022: Fundraising for Gephi v1.0

Goal: explore opportunities, small (Google Summer of Code, Outreachy) and big (institutional funding, crowdfunding).

2022: Reach Gephi’s sustainability goals

Goal: make Gephi sustainable again.

Fall 2022: Gephi codebase sustainability retreat, 2nd edition

Goal: train new Gephi developers and iterate over the technical road map to Gephi 1.0

2022-2023: prepare and release Gephi V1.0

Goal: get through Gephi’s technical road map to version 1.0, with the help of the newly trained developers, and the funding. Release Gephi 1.0.

2023: Gephi 1.0 workshop

Goal: celebrate the release of Gephi 1.0. Recruit new contributors. Iterate over the road map. Prepare the future.

Technical road map to Gephi 1.0

This technical road map was largely established in 2018; more on that in this post. Design guidelines are also presented in that post.

  • UNDO feature, limited to the “GEXF scope”: network data, metadata, positions, sizes, colors…
  • Default save to GEXF. More stable than “.gephi” though it does not save the state of the user interface.
  • Activity log, possibly coordinated to undo, possibly stored in the GEXF. A plugin is already exploring that direction.
  • Parallel edges. The GraphStore supports it but not the rest of Gephi.
  • New OpenGL engine. Eduardo already prototyped it. It performs better and also solves maintenance issues.
  • Curved edges in visual exploration. These are important because they help identify edge orientation.
  • Quick search in nodes and metadata. It turns out it should be pretty easy to implement.
  • New icons. Many resources are now available to do better and the technical part is trivial.
  • Cleaner data laboratory
  • Update to the latest Netbeans Platform
  • Embed Java: no more hassle with installing the right Java version.
  • Install from MacStore. Easier for Mac users.
  • Fix filter composition.
  • Revamp appearance (label color & size, sliders). For instance incentivize rankings as opposed to default unitary mode.
  • GDPR compliance (bug reports contain PII at the moment)
  • Logging (much more logs to facilitate debugging from crash reports)
  • Instrumentation (opt-in statistics about feature usage and crashes)
  • Unit testing (the Gephi codebase has 0 unit tests; only GraphStore has some. Cover the basics like .gephi i/o, filters…)
  • Better statistics reports in HTML5.
  • Label anchor (start, middle, end)… and possibly some jitter.
  • Better label adjust (one that works better). Possibly with label jitter.

The Gephi paper gets the ICWSM Test of Time Award

Today at the 13th International AAAI Conference on Web and Social Media (ICWSM) the “Gephi paper”, published ten years ago in the same conference, obtained the Test of Time Award.

I (Mathieu Jacomy) attended the conference and received the award on our behalf. I had the occasion to say a few words, which I share here in a slightly redacted form.

Let me fix a misunderstanding, and pay my debt by acknowledging three persons.

This paper is the “Gephi paper”. If it is still cited 10 years later, it is not because its content is decisive. It is because researchers use Gephi. The paper is a proxy. I thank these researchers, their citations matter to us. And the people who get this award, really, are the Gephi contributors.

It also matters that Gephi has been made by software engineers, not computer scientists. Mathieu Bastian, 1st author, is CTO of a startup in Berlin. Sébastien Heymann, 2nd author, is CEO of his own startup. The award goes to us, but also secretly to Eduardo Ramos Ibáñez. He is not an author of the paper, but the current lead developer of Gephi, and his invisible work has been crucial to maintaining Gephi to this date. As for me, the designer of Gephi (to make it short), after 10 years as a research engineer I finally decided to get a PhD, in a techno-anthropology lab, and I enjoy the irony of receiving a test of time award during that time.

I think a few things deserve to be stated on the occasion of this award. Science is not only done by researchers, of course. Research engineers also do science, although sometimes indirectly. Some designers as well. And some researchers also do engineering or design. This non-strictly-academic work is not so visible. Having a proxy paper for Gephi, and getting this award, help to make this work visible.

The reasoning of the ICWSM committee was pretty much the same, which I greatly appreciate. On behalf of the Gephi team, we sincerely thank the academic community for its outstanding support.

PS: We dedicate this award to our beloved professor, Franck Ghitalla, who passed away in December 2018. We have not left the path to knowledge he showed us.

Exploring the dystopian future of a Javascript Gephi

Despite Graph Commons, Graphistry, Linkurious or Keylines, there is no equivalent of Gephi in web technologies – notably free and open source. But what if?

We gave a talk at FOSDEM 2019 on that matter. “We” is Eduardo Ramos Ibáñez, our lead developer, and Mathieu Jacomy, co-founder and network science researcher, teaming up with experts of Javascript network visualization Alexis Jacomy (Sigma.js) and Guillaume Plique (Graphology). FOSDEM (Free and Open Source Developers’ European Meeting) is a two-decade-old conference hosting about 4000 visitors every year. The 2019 edition featured 711 speakers, 746 events, and 62 tracks. It is a major moment of the European open source community. You can watch our ~40 min talk in the video below, served with its slides.

Our slides:
https://docs.google.com/presentation/d/1SAvbDRgDVLOt5VO_hu0QPDc1OL55yVUlyFGBmj7UrSQ

A quick summary of our talk

We have to face it: the multiplatform is moving from Java to web technologies. Oracle wants a Java that powers backends, not a user interface framework. Gephi’s OpenGL engine has maintenance issues (JOGL is not maintained anymore). At the same time, modern Javascript is powerful and online graph visualization is a thing. So, should we move to web technologies?

Eduardo has developed from scratch a new rendering engine that fixes our current OpenGL issues and improves the performance of Gephi. It has a lower CPU overhead, which provides better scalability, and it better leverages the GPU power. It can be used as a library, and it crashes less thanks to its ability to fall back on supported features on older graphics cards. A key to these benefits is the shader-based architecture. Though the engine is still lacking some features (labels…), a demo is available on GitHub (you need to build it yourself).

On the Javascript side of the situation, graph visualization can be surprisingly efficient, but it comes with specific challenges. On the bright side, developing interfaces for the web is easy; that is even what it is made for! Web apps are portable, can work on mobiles and tablets, and can even be packaged as applications (Electron). But on the flip side, memory boundaries are unpredictable (we cannot tell when an app will crash because of RAM usage), there is no proper multi-threading, and WebGL is only a subset of OpenGL. Gephi is in a specific spot because it is not a simple app (graph visualization has its own requirements) but at the same time we want to benefit from traditional web app development to improve the user experience. Because of that, the classic web development strategies are not sufficient, but we do not want to embrace the “web as a JVM” perspective of compiling C++ or Rust to WebAssembly. “Gephi JS” would need a hybrid approach. It would also require rethinking the current Gephi, but this is something we are going to do anyway.

We have made a small indicative benchmark comparing the current Gephi engine to the new OpenGL engine and to a WebGL engine (Sigma.js v2 alpha). It turns out that the current Gephi engine is noticeably outperformed even by the WebGL engine, as you can see below!

As you can see below, all engines experience a performance drop around 10 to 100 thousand nodes or edges. The intensity of this drop varies, but it is pretty clear that after 10 million items, a normal computer cannot display a network smoothly enough to allow interactions (it lags too much). That being said, scaling up to hundreds of thousands of nodes/edges is quite a lot already!

Ultimately, we believe that web technologies are the new multiplatform for graph visualization. They come with very real challenges, but they are also a perfectly valid option. It does not mean that we will drop the Java Gephi, but that we are starting to think of Gephi as a project hosting multiple tools and not only as a single piece of software, and that web technologies will be part of its future.

A screenshot of the new OpenGL rendering engine