Posts by Mathieu Jacomy

Creator of Gephi (but not lead developer!). Engineer and researcher at Sciences po médialab. I'm specialized in digital methods in social sciences, sometimes called digital sociology or digital humanities.

Transition to semantic versioning

You may have noticed two things.

First, we released several versions recently. Gephi 0.9.2 was released in September 2017. Then the version 0.9.3 in March 2022, with a 4+ years hiatus. Then 0.9.4 three weeks ago, in April 2022. Then 0.9.5 a week ago, in May. What is going on?

Second, the splash screen. Version 0.9.1 and 0.9.2 were like this:

Then version 0.9.3 changed color, to mark the end of the hiatus. It was somehow a big change.

…but since version 0.9.4 it just features the version number 0.9 (without the last bit):

What is going on?

We are transitioning to semantic versioning

Semantic versioning is a certain logic to attribute version numbers. It uses three numbers:

  1. The major version number. It is incremented when the new version breaks compatibility. It matters to the user because they may not want to upgrade, or not yet, because it may break for them.
  2. The minor version number. It is incremented when features are added, but in a compatible way. The user generally wants to upgrade, but may not like the changes, so we need a way to refer to that specific version.
  3. The patch number. It is incremented when it is just bug fixes. The user always wants to upgrade.

So far, Gephi was not versioned that way. In part because numbers have a cultural meaning. For example, 0.9 feels like we are getting close to version 1.0, and in many ways we are getting closer, but at the same time this does not have to do with the fact that we have release about 10 major versions. If it takes us 15 steps, then so be it. The next minor version will be 0.10 – which is not like version 0.1! Weird, but version numbers do not work like decimal numbers.

Semantic versioning is a reasonable and safe way to version, and we are getting there, but not in one go. We will only fully do that from version 1.0.0, which requires meeting a number of goals on our road map. Nevertheless, we need to be able to push bug fixes to everyone, and for that, the best is to use the patch number. Which is why we had two more versions in just a month. But it remains less dramatic than the move from 0.9.2 to 0.9.3, which is why we removed the mention of the patch number from the splash screen. Like for semantic versioning, new patch versions should not feel like new versions to the user. And importantly, patch versions are pushed to all users through the automatic update mechanism. So to recap, it works like this:

As final word, you may wonder: why not have the post-hiatus version numbered 0.10.0 instead of 0.9.3? You’d be totally right! It should have been. We did not realize it soon enough, unlike the most savvy members of our community. Oops! But we’re doing it now. Better late than never 🙂

Gephi 0.9.3

Gephi 0.9.3 is here! Download it from http://gephi.org. No crazy new features, but many improvements and bug fixes. Here are 5 highlights:

1. No more Java installation required.

Gephi is installed on top of Java, so you had to install it before Gephi. On some computers, a bug related to theJava path caused the infamous “Cannot find Java 1.8” issue. This does not happen anymore, as you do not have to install Java anymore! It is packaged with Gephi.

2. New look and feel

Gephi now has a flat look and feel. This is much better for Linux users, as well as some Mac users who had issues with the appearance (one did not see which tab was selected etc.).

3. New community detection algorithm via statistical inference

Tiago Peixoto attended the code sustainability retreat 2021 and we implemented a version of his approach to community detection. It uses the same convergence heuristic as the Louvain algorithm (“modularity” in Gephi), also looks for assortative structures, but optimizes a different criterion, based on Bayesian inference.

You can look at Tiago’s blog for more information about it, or the two papers it is based on:

4. GEXF 1.3

The file format often used with Gephi, GEXF, has been updated to 1.3. This version is more mature and reliable than the previous one, and is implemented in Gephi. Check the announcement for the final specification there:
http://gexf.net/history.html

5. High DPI screens

High resolution screens are mainstream. We corrected a number of issues to fully support them.

List of improvements

You can check our changelog in the release page:
https://github.com/gephi/gephi/releases/tag/v0.9.3

For plugin maintainers

Check a specific announcement about your plugin right there:
https://github.com/gephi/gephi-plugins/discussions/245

More about this update

This update is a follow-up to the code sustainability retreat 2021. You can read our report in our last blog post. Not all the features we discussed and worked on are included in this update. We are working on it, but it’s just better to release updates as soon as usable improvements are ready.

Check our road map as of Summer 2021 to have a better idea of where we are and where we go. We are still working on big features such as a new graphic engine and the infamous undo feature.

We will also have a code sustainability retreat in 2022 with funded travel and accommodation, if you are interested in contributing to Gephi. We will almost certainly hold it the week of the 29th of August. A call for participation will come soon.

As usual, please share your experience/feedback on our Facebook group or on Twitter.

Gephi code sustainability retreat 2021: debriefing

From 29 November to 3 December 2021, 6 people met in Copenhagen to make the Gephi codebase more sustainable, discuss the project, pave the way to the long-waited version 1.0, and improve the tool overall. Before the event even started, it had already tackled its main goal. In this post we present what we have done.

The event was sponsored by the Aalborg University TANT Lab in Copenhagen, whom we warmly thank, because they basically funded the whole thing, and hosted us. The participants featured a whooping 50% of Mathieux, and more importantly, a healthy mix of skills:

  • Mathieu Bastian, who was Gephi’s lead dev for many years, and knows the codebase like none;
  • Mathieu Jacomy, who designed Gephi’s UX and some of its algorithms, and organized the event;
  • Matthieu Totet, who authored the Gephi Twitter plugin;
  • Eduardo Ramos Ibáñez, Gephi’s current lead developer;
  • Tiago Peixoto, world-class expert on community detection;
  • and Antonin Delpeuch, the main developer of Open Refine.

We spend the first two days discussing and preparing stuff, and the last three to code. The coding went surprisingly well, compared to many comparable situations, for example hackathons. That is why I wrote above that it had tackled its main goal before it even started: the codebase is sustainable. I know someone who cleans their house before the house cleaning service comes, because they don’t want to expose the actual messiness of their lives. Similarly, Mathieu Bastian and Eduardo had so well prepared the codebase in anticipation of the retreat that there was nothing left to do on that front. Bravo!

Code sustainability was our main goal, but that is not where it ends. We changed our perspective in the process. Let’s start with the main takeaways: goals met or not, and what we have learned.

Takeaways

Code sustainability. The codebase is in a good shape and developers can healthily engage with it. However, we can improve the documentation and entry points for aspiring developers.

Enrolling new developers. This did not work so well, but we will get there. We hoped for more new developers to come to this retreat. We had room and funding for two more people, so our call for participation did not work great. However, some developers manifested themselves during the event, when we started communicating about the retreat on Twitter. We aim at more participants next time.

Anticipating the fundraising phase. We learned a lot from the Open Refine project through Antonin. They have a fiscal sponsor: a structure that represents them legally and manages money, allowing them to collect funds and pay developers. This is what we need, and our next step will be to seek one.

Goodwill. There is still a huge amount of it around the project. The response of the public to this event was outstandingly positive. This is important notably to raising funds.

Dev infrastructure. The GitHub issues system, that we use to track and fix bugs, does not work that great for us in practice. We are thinking of an alternative.

Web presence. Reflecting on our website, blog, and online tools was an explicit non-goal: we decided to focus on that another time. But we could see that it definitely required a good reworking. We are aware of it.

Governance. As the project involves funds and more people, we will need to change our model. We want to talk about it with potential fiscal sponsors.

Mailing list. We have to make one, dedicated to developers, at least for the moment.

Achieved

Here is a summary of what we have achieved during the three coding days. Note that it does not directly translate into a release. You will have to wait for Christmas at least for that (it requires more work).

  • We triaged a lot of issues and tested the contribution process.
  • We fixed a bunch of bugs on our bug bash.
  • We defined and enforced code style on the repository, making it simpler going forward to collaborate between developers.
  • We made the project saving/loading more resilient, preventing users from losing their work due to corrupted .gephi files.
  • We embedded the Java JRE on the Windows and Linux installation, so that users don’t need to install Java by themselves anymore.
  • We migrated the localization system from Transifex to Weblate, making it easier to translate Gephi.
  • We made unit testing easier so that developers are more productive.
  • We integrated the new visualization engine into the Gephi desktop app. We got it working the first day, but without workspace switch support.
  • The following days we implemented the workspace switch, added support for High DPI screens, and got most of the interactivity and tools working fine.
  • We implemented Tiago Peixoto’s statistical inference algorithm for community detection and its unit tests (in progress).
  • We sketched a specification for the undo/redo feature.

Discussed

We discussed the state of the project in various ways during the first two days: its community of developers and users, its scientific state, the infamous (lack of) undo, our road map, governance, funding… We learned a lot from Antonin (Open Refine), and on community detection algorithms from Tiago. That part is hard to transcribe here, but I will write down a few knowledge points we have established.

Our community is broad. It consists of developers and users, and we find those in various scientific fields: digital humanities, SNA (social network analysis), network science (notably teaching). Outside of research, we also find users in data journalism, activism, and in the industry: SEO (search engine optimization), social media listening, patents and papers analysis, intelligence (OSINT), and cybersecurity. We noted that Open Refine does user surveys to know their community better (we’ve done so in 2016).

Gephi is not a commercial product. By that, we mean that we do not want to make Gephi for a specific public. We do not aim at normalizing or formatting usage. We just want to help different kinds of people, even when they want different things. [Mathieu Jacomy’s note: as I am writing this I realize that part of our audience will rightfully remark that we do have methodological commitments and that we necessarily shape usage. Hence this precision:] In other words, contrary to a company whose interest might be to serve certain consumers to the detriment of others, for example because they have higher purchasing powers, we do not have a fixed persona in mind when we make Gephi. We aim at satisfying the existing users including those who have marginal needs. In short, the features are decided on the pragmatic ground of usefulness to people versus implementation difficulty.

Do people leave Gephi and why? We don’t think we have a “users leave Gephi” problem. Here is what we believe: many users naturally move to more advanced tools, yet they may go back to Gephi on various occasions, because it’s easy to use. Using Gephi is sporadic anyway (one does not need to use it every day). That being said, some developers leave Gephi when they write code, because it’s easier to script (e.g. Python).

Web presence. Nice things we want to have: an introductory video to Gephi, a list of the best tutorials produced by the community, a simpler website because there is too much irrelevant information, the content for developers should be moved somewhere else (e.g. GitHub), remove the content about the Gephi Consortium (obsolete), a unified navigation bar over our different online spaces, a YouTube channel, and a way to promote the good content produced by the community.

Book. We would like to write one. Or a MOOC. The Gephi 1.0 release would be the ideal moment.

Community detection. Tiago Peixoto presented his work on the topic. He champions a Bayesian inference approach and considers modularity maximization as an obsolete (disproven) method. He also acknowledged that not everyone necessarily agrees in the research community. He documented his perspective in a series of blog posts (1, 2 & 3). We collectively agreed that we would keep the current popular Louvain method, add the more recent Leiden method, and add Tiago’s statistical inference approach. We will also group them in a specific section of the statistics panel, so that the users can identify them as alternatives, and muscle their own critical thinking by engaging with them.

Undo/redo. We learned how it is done currently in Open Refine, the problems it creates, and how it could be done better. The feature is doable in Gephi, and we sketched an architecture.

Funding. From Antonin’s feedback on the Open Refine project, we realized that what we needed was a fiscal sponsor, for instance Code for Science and Society. We also aggregated a list of possible funders: crowdfunding, Google Summer of Code (good to engage devs over the long run), Outreachy (idem), Chan Zuckerberg Initiative…

Feedback

Here is what we thought of how it went, to remember for next time.

To be improved: organization of travel and accommodation; we should identify decisions when we make them and note them apart; we should record some of the talks, notably the introduction to the codebase; it would be nice to have a (social) occasion to interact with Gephi users; the big map of the source code was not useful.

Went well: the coding was well prepared, thanks to Mathieu Bastian and Eduardo; the coding went often beyond our expectation; knowledge exchange across Gephi and Open Refine was great thanks to the meeting being in person; the live tweeting was engaging to our community including developers; the T-shirts are nice.

To consider for next time: live-streaming moments; a hybrid format or possibly a 100% online edition; preparing explainer videos.

Photos

Here is a small selection ofpictures of us at work, to get you an idea of what it looked like. We hope it makes you consider joining next time! Also, in the meanwhile, consider proposing a talk and attending the Open Research Tools and Technologies devroom where, notably, the Gephi team met Antonin: it’s a great place to meet like-minded people.

Mathieu Jacomy explaining stuff
Tiago Peixoto presenting his approach to community detection. On the left, Antonin Delpeuch at work.
Mathieu Bastian discussing how the Gephi architecture could support an undo/redo à la Open Refine (action stack).

In short, this is the Gephi codebase.

Coding session with Matthieu Totet (left), Eduardo Ramos Ibáñez (center), and Mathieu Bastian (right).

Tiago Peixoto monitoring his algo.

Antonin Delpeuch sieving Gephi issues. Behind, a map of the Gephi source code.
Attempts to track the steps of Tiago’s community detection algorithm during a debugging session.

A social event in the company of Ann-Sofie and Martin Grandjean, who visited us.

The end of the retreat intersected with the university’s Christmas party. More social events!

During this party, Matthieu Totet became a Danish legend (here with Anders Munk on the right).

One cannot fully know what will happen at a Gephi coding retreat. Consider applying in 2022!

Call for participants: Gephi code sustainability retreat 2021

We are organizing a code sustainability retreat, and we are looking for Java developers willing to contribute to Gephi’s core codebase over the next few years.

Our goal: Make Gephi’s codebase sustainable, and beyond this, recruit a team of developers into the project in anticipation of a fundraising phase. We believe that Gephi deserves care, that there is enough interest to fund it, and this is our first step to get there.

When: November or December 2021. Exact dates to be announced in September.

How long: About one full week (4 or 5 days).

Where: In Copenhagen, Denmark, and online.

How many participants: We aim at about 5 Java developers, not counting the Gephi core team (2-3 people).

Funding: We pay for travel and accommodation thanks to the sponsoring of Aalborg University. We will also offer a small compensation for the work (~100€/day).

What we will do during the retreat: Our lead developer will share knowledge about the codebase. We will get an overview of the state of Gephi, set up a more technical road map (identify the main challenges, decide of the best course of action) and code part of it – in short, we will push the cart further. Furthermore, we will get to know each other better and have some good time together.

HOW TO APPLY: Send an email before September 15 (2021) to the organiser: mathieu.jacomy@gmail.com
Note from MJ: Some of you have already applied, thanks a lot! If I’ve answered you, you’re in the candidates’ list.

What’s next: We will select participants for this first issue, tell each of you whether you have made it or not, settle the dates with participants, and prepare the retreat together. That’s a first time for us but we plan to do it again next year and on, see our road map for more info.

Feel free to ask if you have any question (to my email above, in comments, or via Twitter to @Gephi).

Note: We will also organize some sort of side event bridging over the dev/academia demarcation, because the retreat is hosted by a university and because Gephi naturally drives hybrid interest. If you’re interested in the research side of this, it might be even more interesting to you. More on that later on!

Gephi road map, Summer 2021

This road map states, in short, Gephi’s priorities, long-term and short-term goals, challenges in various areas, and way to go.

Project vision

Gephi is multiplatform, open source, installable, extensible by the community, and with local-based files.

Gephi is an opinionated take on network analysis, and is not intended to be the only network analysis tool. Its focus is visual interaction, and a scalable workflow from 10 to 10,000,000 nodes (assuming enough computing power). Its core features are visualizing, filtering networks, and computing statistics. Gephi is exploration-oriented: visualize primarily for yourself, secondarily for others. More info on the community of Gephi users in this post.

Priorities

  1. Sustainability. Notably maintenance: Gephi needs to work before anything else. This includes: being easy to install (including Java) on all platforms, having the UI work in various screen resolutions and sizes, stability, fix major bugs, and have a sufficiently clear and documented codebase that multiple developers can understand it and contribute.
  2. Version 1.0, i.e. current Gephi with a consolidated set of features. We want to release a coherent version of today’s Gephi before discussing new directions to explore.
  3. Stabilizing core contributors. This entails institutional support, fundraising, and discussing governance.
  4. Other. Community tools and online presence (forum, website…). Plugins. Web integration (Gephi JS). Evolution of Gephi. Documentation, tutorials and teaching material. Dev community (code examples). Keeping Gephi state-of-the-art over the long term.

Project road map

Until Winter 2021: Gephi dev campaign.

Goal: enrol new developers in the project.

Fall 2021: Gephi codebase sustainability retreat.

Goal: train new Gephi developers, iterate over the technical road map to Gephi 1.0 and discuss its implementation. Set concrete sustainability goals for 2022.

We will invite ~5 developers for a 1-week code retreat in Copenhagen, compensation 100€/day.

2022: Fundraising for Gephi v1.0

Goal: explore opportunities, small (Google Summer of Code, Outreachy) and big (institutional funding, crowdfunding).

2022: Reach Gephi’s sustainability goals

Goal: make Gephi sustainable again.

Fall 2022: Gephi codebase sustainability retreat, 2nd edition

Goal: train new Gephi developers and iterate over the technical road map to Gephi 1.0

2022-2023: prepare and release Gephi V1.0

Goal: get through Gephi’s technical road map to version 1.0, with the help of the newly trained developers, and the funding. Release Gephi 1.0.

2023: Gephi 1.0 workshop

Goal: celebrate the release of Gephi 1.0. Recruit new contributors. Iterate over the road map. Prepare the future.

Technical road map to Gephi 1.0

This technical road map was largely established in 2018, more on that in this post. Additionally, design guidelines presented in that post.

  • UNDO feature, limited to the “GEXF scope”: network data, metadata, positions, sizes, colors…
  • Default save to GEXF. More stable than “.gephi” though it does not save the state of the user interface.
  • Activity log, possibly coordinated to undo, possibly stored in the GEXF. A plugin is already exploring that direction.
  • Parallel edges. The GraphStore supports it but not the rest of Gephi.
  • New OpenGL engine. Eduardo already prototyped it. It is better but also solves maintenance issues.
  • Curved edges in visual exploration. These are important because they help identifying edge orientation.
  • Quick search in nodes and metadata. It turns out it should be pretty easy to implement.
  • New icons. Many resources are now available to do better and the technical part is trivial.
  • Cleaner data laboratory
  • Update to the latest Netbeans Platform
  • Embed Java: no more hassle with installing the right Java version.
  • Install from MacStore. Easier for Mac users.
  • Fix filter composition.
  • Revamp appearance (label color & size, sliders). For instance incentivize rankings as opposed to default unitary mode.
  • GDPR compliance (bug reports contain PII at the moment)
  • Logging (much more logs to facilitate debugging from crash reports)
  • Instrumentation (opt-in statistics about feature usage and crashes)
  • Unit testing (Gephi codebase has 0 unit tests, only Graphstore. Cover the basics like .gephi i/o, filters…)
  • Better statistics reports in HTML5.
  • Label anchor (start, middle, end)… and possibly some jitter.
  • Better label adjust (one that works better). Possibly with label jitter.

The Gephi paper gets the ICWSM Test of Time Award

Today at the 13th International AAAI Conference on Web and Social Media (ICWSM) the “Gephi paper”, published ten years ago in the same conference, obtained the Test of Time Award.

I (Mathieu Jacomy) attended the conference and received the award on our behalf. I had the occasion to say a few words, which I share here in a slightly redacted form.

Let me fix a misunderstanding, and pay my debt by acknowledging three persons.

This paper is the “Gephi paper”. If it is still cited 10 years later, it is not because its content is decisive. It is because researchers use Gephi. The paper is a proxy. I thank these researchers, their citations matter to us. And the people who get this award, really, are the Gephi contributors.

It also matters that Gephi has been made by software engineers, not computer scientists. Mathieu Bastian, 1st author, is CTO of a startup in Berlin. Sébastien Heymann, 2nd author, is CEO of his own startup. The award goes to us, but also secretly to Eduardo Ramos Ibáñez. He is not an author of the paper, but the current lead developer of Gephi, and his invisible work has been crucial to maintaining Gephi to this date. As for me, the designer of Gephi (to make it short), after 10 years as a research engineer I finally decided myself to get a PhD, in a techno-anthropology lab, and I enjoy the irony of receiving a test of time award during that time.

I think a few things deserve to be stated on the occasion of this award. Science is not only done by researchers, of course. Research engineers also do science, although sometimes indirectly. Some designers as well. And some researchers also do engineering or design. This non-strictly-academic work is not so visible. Having a proxy paper for Gephi, and getting this award, help to make this work visible.

The reasoning of the ICWSM committee was pretty much the same, which I greatly appreciate. On behalf of the Gephi team, we sincerely thank the academic community for its outstanding support.

PS: We dedicate this award to our beloved professor, Franck Ghitalla, who passed away in December 2018. We did not left the way to knowledge he showed us.

Exploring the dystopian future of a Javascript Gephi

Despite Graph Commons, Graphistry, Linkurious or Keylines, there is no equivalent of Gephi in web technologies – notably free and open source. But what if?

We gave a talk at the FOSDEM 2019 on that matter. “We” is Eduardo Ramos Ibáñez, our lead developer, and Mathieu Jacomy, co-founder and network science researcher, teaming up with experts of Javascript network visualization Alexis Jacomy (Sigma.js) and Guillaume Plique (Graphology). The FOSDEM (Free and Open Source Developers’ European Meeting), is a two decades old conference hosting about 4000 visitors every year. The 2019 edition featured 711 speakers, 746 events, and 62 tracks. It is a major moment of the European open source community. You can look at our ~40 min talk in video below, served with its slides.

Our slides:
https://docs.google.com/presentation/d/1SAvbDRgDVLOt5VO_hu0QPDc1OL55yVUlyFGBmj7UrSQ

A quick summary of our talk

We have to face it: the multiplatform is moving from Java to web technologies. Oracle wants a Java that powers backends, not a user interface framework. Gephi’s OpenGL engine has maintenance issues (JOGL is not maintained anymore). At the same time, modern Javascript is powerful and online graph visualization is a thing. So, should we move to web technologies?

Eduardo has developed from scratch a new rendering engine that fixes our current OpenGL issues and improves the performance of Gephi. It has a lower CPU overhead, which provides a better scalability and better leverages the GPU power. It can be used as a library, and it crashes less thanks to its ability to fall back on supported features on older graphic cards. A key to these benefits is the shader-based architecture. Though the engine is still lacking some features (labels…) a demo is available on GitHub, (requires you to build).

On the Javascript side of the situation, graph visualization can be surprisingly efficient, but it comes with a specific kind of challenges. On the bright side, developing interfaces for the web is easy, that is even what it is made for! Web apps are portable, can work on mobiles and tablets, and even be packaged as applications (Electron). But on the flip side, memory boundaries are unpredictable (we cannot tell when an app will crash because of RAM usage), there is no proper multi-threading, and WebGL is only a subset of OpenGL. Gephi is in a specific spot because it is not a simple app (graph visualization has its own requirements) but at the same time we want to benefit from the traditional web app development to improve the user experience. Because of that the classic web development strategies are not sufficient, but we do not want to embrace the “web as a JVM” perspective of compiling C++ or Rust to WebAssembly. “Gephi JS” would need a hybrid approach. It would also require to rethink current Gephi, but this is something we are going to do anyway.

We have made a small indicative benchmark comparing current Gephi engine to the new OpenGL engine and to a WebGL engine (Sigma.js v2 alpha). It turns out that the current Gephi engine is sensibly outperformed even by the WebGL engine, as you can see below!

As you can see below, all engines experience a performance drop around 10-100 thousands nodes or edges. The intensity of this drop varies, but it is pretty clear that after 10 million items, a normal computer cannot display a network smoothly enough to allow interactions (it lags too much). That being said, scaling up to hundred thousands nodes/edges is quite a lot already!

Ultimately, we believe that web technologies are the new multiplatform for graph visualization. It comes with very real challenges, but it is also a perfectly valid option. It does not mean that we will drop the Java Gephi, but that we are starting to think Gephi as a project hosting multiple tools and not only as a single piece of software, and that the web technologies will be part of its future.

A screenshot of the new OpenGL rendering engine

Meet Eduardo, our new lead developer

Mathieu Bastian has been our lead developer for more than ten years. He is now the proud father of an adorable little girl, congratulations! 🍾🎈🎉 At this occasion he decided to step down from his leading role in Gephi development, and hand over the reins to Eduardo Ramos Ibáñez.

Mathieu has been the true architect of Gephi’s source code. Not only is he its most prolific author, but also the engineer reflecting on its structure, drawing its blueprints, and building the foundations. He transformed my clunky 2007 prototype into an actual software over half a dozen complete refactorings, never drawing back from facing challenges. We owe him everything that makes it work: the ability to be installed, to be maintained, to have plugins… There would be no Gephi without him. Today is a good occasion to write thank you Mathieu for your years of service to the project! Fortunately he will stay an active member of the community – we would be lost without his invaluable knowledge on the most intricate depths of Gephi’s source code.

So who is Eduardo? Let him present himself:

I am a spanish software developer, currently living in Madrid and I have been helping to maintain Gephi for several years. I love to create interesting software and trying to push its limits, specially data visualization!

I am kind of a progressive music fan, and a cat lover 🙂

You can follow or contact me on twitter @eduramiba

Eduardo is the person who knows best Gephi’s source code after Mathieu, and it is only natural that he is the next in line to lead development efforts. You already know his work since he almost entirely developed the data laboratory, but as often an important part of his contribution is not so visible – maintaining the source code, fixing this bug… This is how he became an expert of Gephi development over the years. He is now developing a new OpenGL engine for Gephi 1.0. Welcome Eduardo, and thank you for stepping up to this new role!

If you want to know more about the situation and future of Gephi, we wrote about it in this blog post.

Is Gephi obsolete? Situation and perspectives.

Note: because it’s a bit long, you can also read it on Medium.

Despite years of collaboration, for the first time, Eduardo, Mathieu and I sit together at the same place. The Gephi community mainly exists online, and its members have few occasions to see each other in person. But we have to talk. Mathieu Bastian is Gephi’s lead developer and currently lives in Berlin. Eduardo Ramos Ibáñez is the second most prolific contributor after Mathieu and the only other one to know Gephi’s core in depth. He lives in Madrid. As for me who started our project, Mathieu Jacomy, I live in Paris. We just arrived in Berlin to have an in-depth talk about Gephi: state of the project, its relevance, its future. Our goal is to question the Gephi project and reevaluate our commitment to it. We need a picture of the different options. We start by the elephant in the room: is the project still worth it? Here is our answer.

What is wrong with the Gephi project

We aim at identifying the project’s strengths and weaknesses. It is not only about evaluating if its benefits counterbalance its issues, but also about finding the right course of actions. Let us start with the problems.

A common issue to niche open source projects, our most limited resource is technical leadership. What does it mean? It is a consequence of Gephi’s code being fairly complicated. Fortunately this is not an issue for all contributors, for instance it is pretty easy to implement a statistics plugin. Many parts of Gephi could be improved by plugin developers, but not all parts. Sometimes we need to modify architecture itself, or a deep and specific part like the GraphStore engine. When it comes up, only a few community members are competent. Namely Mathieu and Eduardo. Coding these parts would not require crazy skills, but a fair amount of Gephi-specific knowledge. Unfortunately that knowledge is imprisoned in the brains of two people (well it’s still better than one!). This is what we call the bottleneck of technical leadership. We may choose to fix core issues ourselves or disseminate the knowledge to other developers, but both scenarios require the precious time of Eduardo and/or Mathieu.

Technology is changing, we must adapt, and it wears out technical leadership. It is obvious to developers but not to users: we cannot just produce a version of Gephi that works well and let it be. It would stop working because technology changes. New versions of Java, new operating systems would break features that work well in today’s environment. Sadly when incompatibilities arise, it is generally for the core developers to deal with. We were in such situation before version 0.9, at a moment when the new GraphStore engine was not ready yet but the Java compatibility broke and during that time, Mac users were not able to use Gephi without a convoluted turnaround. We are not sure to be able to keep up efficiently with these changes because of our limited technical leadership.

Technology evolves in an unfavorable direction. User experience is at the center of the Gephi project. Unfortunately it seems that the Java language tends to drift away from user interface design and development. Admittedly, it has never been a strength of Java. This technology does not support modern UI design – I feel like Java assumes that the UI will be developed by an engineer rather than a designer. It may become even worse. With the obsolescence of OpenGL on Mac and the removal of JavaFX from the runtime environment, we could live in a world where multiplatform softwares have a Java brain and a web face. Gephi is based on the JOGL library whose development is increasingly uncertain, which forces us to consider alternatives like WebGL. We understand that it makes sense to delegate modern UI design to a well-established environment (HTML5 and friends). However WebGL is far from OpenGL stability and performance. We think that from the user stance, Gephi is a lot about forging one’s network exploration and analysis skills on small and easy cases, and scaling them up to larger, more complicated cases. Thanks to its OpenGL engine, it is able to work almost as well for networks of hundred thousands nodes than of tens of nodes. If the ability to visualize huge datasets is key to Gephi, then web technologies are not a viable alternative. We have no definite solution to this issue and we might be facing a technological dead-end in a not so distant future.

Gephi is not only about tech. As a projet it must also face the changes in the lives of its key contributors. Mathieu just had his first child, and more generally our careers follow their own paths that do not always align with the needs of the project. On the one hand we become more efficient at what we are doing, but on the other hand we have less and less time to dedicate to the project. In fact, we just have less spare time. We do not want Gephi to die but we are at risk of becoming tired of the burden it represents. We did not lose our desire for this unexpected journey, but reality often knocks on the door and it would be dishonest to omit this aspect of the situation.

Finally user needs are also changing. Users can access many other systems for network analysis and visualization. A market of web-based solutions emerged and each system found a niche to settle in. A landscape of network tools. Gephi is not necessary anymore, if it ever was. Complex networks were once the most fashionable trick of social science pioneers in a big data world, but now they have UMAP and deep neural networks. Complex networks entered a “business as usual” era. They ceased to draw the attention of the most creative minds. Complex networks had their moment, and it passed. We do not think that it is bad or sad, it might actually be a chance. Nevertheless the context has changed and it is possible that Gephi is not anymore what people need. So what do they need?

What is right with Gephi

We believe that Gephi actually still meets some needs, sometimes in its own unique way. Note that these ideas are not the outcome of a systematic study, but stem out of our empirical contacts with users, during workshops, online, or in our everyday lives. Eduardo, Mathieu and I were pretty convergent in our feelings.

First of all Gephi still has a public and it lies mostly in the sphere of education and research. The Facebook community is active and often features the visualization of digital data in a social science perspective, such as Twitter networks. Since it is the main place where to ask for help, it also attracts a certain amount of exotic tinkering and experimentation. The Gephi community is more than just about using the software, it is also a space where people share what they have done, discuss various topics, and get feedback. It has something of a subculture. We believe that Gephi has some appeal to curious minds, and that it helps a certain public getting engaged with network analysis. Following who mentions Gephi on Twitter also made us realize that “Gephi” is sometimes used as a label to refer to a visual exploration. This seems to be particularly the case in social network analysis (SNA), the community where Gephi spread the most. Since they emerged, digital humanities also made a wide use of Gephi. From what we observe Gephi tends to be more used in social science and by beginners, but it is nevertheless used in natural sciences and by advanced users like data scientists. We can measure its success in the research sector by its 3780 citations (counted in Google Scholar). This public probably finds something in Gephi that it does not find elsewhere, even if just that it is free. This fairly large amount of users is still a good reason to keep maintaining and developing it.

Gephi also has some specificities that could be lost with it in the unfortunate event that its development comes to an end. It has its niche and many users value it for what makes it special. We believe that this specificity comes in three parts. (1) It is a free software that you can install easily on multiple platform. This make it one of the few inexpensive options for teaching, workshops etc. (2) It approaches network analysis from a graphical and interactive perspective that is more intuitive than the math equations of graph theory. It can be understood by non experts such as students and data journalists or social science researchers reaching out of their core competencies. (3) It allows you to scale up your network analysis and exploration skills to much bigger networks. Its learning curve bridges small qualitative networks with large quantitative datasets. The effects of complexity and the way you explore data will be very different but the basic tools at your disposal will stay the same (layout, statistics, filtering…). Gephi is an all-around tool that allows beginners to understand the gist of network exploration. It is at its best in a pedagogical setting where people will leverage practice to improve their data analysis skills.

I want to mention that some of the things that make Gephi appealing are not, in our views, essential. We are well aware that Gephi allows to produce impressive images and that the sight of a spatialization layout unfolding a network have something fascinating. They certainly are an important factor in its success. They also play a role in user engagement with data, which is key to progressing in data science. However these attractive features only make sense insofar as they lead users to improving their network analysis skills. Though Gephi may be used to produce “data porn”, we believe it does not end there. Toying is just the first step towards the ability to get insights out of networks. Other devices might produce evocative visualizations, but Gephi is one of the few that actively leverage play to arouse interest for science (in the field of network analysis).

Where the Gephi project currently stands

Gephi is not the only software for network analysis and more importantly, it does not want to be. Depending on one’s style and skills, other options might be preferable. NetworkX might be more flexible if you know Python. To draw diagrams you should head for GraphViz. As a biologist, Cytoscape is the tool your community is using… and have you tried NodeXL? Different devices do different things and Gephi does not want to be all of that. In the past we have been tempted to build a generic tool for any kind of network, even the dreaded dynamic hierarchical mixed weighted graph. We now want to focus on what Gephi does best and articulate it with other tools that have specific benefits.

We think that Gephi’s niche is visual, interactive exploration of common types of networks with a set of features that are not too specific, and that scale to large number of nodes and edges. We have observed that most users tend to explore networks of multiple orders of magnitude: from 10 to 10K nodes, or from 100 to 10M nodes… We believe that it is a key feature. Conversely we do not believe that producing a static map is its main mission. Other tools are in a better position for that task, and we prioritize exploration features over graphic outputs. Instant visual feedback central to Gephi’s identity. What it is in the best position to do, is making things visible when users apply an algorithm to their network. Fostering this kind of awareness helps users reflect on their method, make sense of their activity, and streamline their workflow.

The Gephi Toolkit has lost most of its relevance. Graph processing libraries like NetworkX have matured and feature most if not all operations you can do in Gephi. The toolkit is basically a separate branch of the project that requires a certain amount of maintenance. It drains forces from the main project. Considering that Gephi’s source code is open and that it is possible to tinker experiments without the Toolkit, we believe that it would make sense to discontinue it – though we did not officially pull the trigger so far.

Refocusing Gephi is not only about removing parts, but also filling holes. For instance though we will deprecate hierarchical graphs because they are not so common, we consider supporting parallel edges, well represented in datasets. In the same spirit, because spatialization layouts are so central to user experience, we consider adding algorithms evaluating the quality of a layout and other features supporting visual network analysis. For instance we believe that edges visualization should be improved in the exploration panel. Last but not least, refocusing Gephi is also about reordering the general user interface to put emphasis on what is important and simplifying what is not. Reflexions about Gephi’s future user interface have already been presented in a previous blog post.

Finally it is worth talking about the project. We like that Gephi is opinionated, multiplatform, free, and open source. We do not want to change any of that. We will not go as far as writing a manifesto but we state here that Gephi is not a company, we do not want it to be company, and it will not become one. This does not mean that there can be no economic activity involving Gephi, but that when it happens it is not hosted by the project. So what is the Gephi project? An informal network of contributors that involves multiple individuals at various degrees, with no clear boundaries, and where anyone can bring their own thing to the project. However being free and open does not mean that we have no structure: the GPL 3 licence protects the project, codes and contents have authors, and different persons have different roles. Gephi is not only software and plugins but also website, blog, Facebook community… A good part of people’s energy goes to producing contents. There is a Gephi project around the Gephi software, and it might become increasingly important.

As a conclusion to this section, lets us summarize what Gephi is and will remain:

  • Free
  • Open source
  • Extensible by the community
  • Multiplatform
  • Installable as a normal software
  • With local based files (no cloud hosting, works offline)
  • User centric
  • Focused on exploration
  • Beginner friendly (as much as possible)
  • Opinionated – it will not always do what other tools do.

Gephi’s future: version 1.0 and beyond

An important part of our discussion revolves around future features. It is not only about what Gephi should focus on, but also what we can do in today’s and tomorrow’s context. As explained above, we have a limited technical leadership and we are constrained by the evolution of Java and OpenGL. This leads us to consider which features can be considered in the current state of Gephi and which features would require a paradigm change. We are not only imagining future Gephi but also future future Gephi (what our project could be if we challenged a number underlying assumptions). We have two different horizons: Gephi 1.0, a focused version of today’s software, and Gephi 2, a possible future on a different ground.

For Gephi 2 we are anticipating that Java is not fully supporting our needs, and we are considering porting a part (and possibly all) of the software in a different platform. Current technological context incentivizes us to use a Java brain behind a web-based face, but WebGL is still a bottleneck for big networks. We have no good solution but it might emerge in time. We are also acknowledging the blooming of the network analysis ecosystem and we believe that a single software might not be the best solution to address a constellation of user needs. For instance if Gephi focuses more on exploration, it leaves room for a different tool about network publication. This tool might be a part of our project and not be the software itself. It might not sound dramatic but for us it is an decisive psychological step to think of the project as multiple tools and not just the Java software. It brings clarity to our intentions and opens new possibilities to address difficult problems.

Future features: fragments of road map

Gephi 1.0 can feature a number of changes that make sense as a natural extension of today’s Gephi, while the more dramatic changes are postponed to Gephi 2. We have no clear picture of what Gephi 2 might be, but its existence helps us select the right features for a close future. Here is a list of improvements we would like to implement before moving to a different paradigm.

  • UNDO feature, limited to the “GEXF scope”: network data, metadata, positions, sizes, colors…
  • Default save to GEXF. More stable than “.gephi” though it does not save the state of the user interface.
  • Activity log, possibly coordinated to undo, possibly stored in the GEXF. A plugin is already exploring that direction.
  • Parallel edges. The GraphStore supports it but not the rest of Gephi.
  • New OpenGL engine. Eduardo already prototyped an alpha version.
  • Curved edges in visual exploration. These are important because they help identifying edge orientation.
  • Quick search in nodes and metadata. It turns out it should be pretty easy to implement.
  • New icons. Many resources are now available to do better and the technical part is trivial.
  • Cleaner data laboratory
  • Embed Java: no more hassle with installing the right Java version.
  • Install from MacStore. Easier for Mac users.
  • Fix filter composition.
  • Better statistics reports in HTML5.
  • Revamp appearance, label color & size, sliders… For instance incentivize rankings as opposed to default unitary mode.
  • Label anchor (start, middle, end)… and possibly some jitter.
  • Better label adjust (one that works better). Possibly with label jitter.

In conclusion

Gephi is not obsolete, and we have a good hope to make its strengths more apparent by refocusing our development efforts towards version 1.0. As an additional outcome of our discussion, we now welcome Eduardo as our new lead developer, but more on that in a separate blog post. Thank you for your support and cheers from Berlin!

Eduardo Ramos Ibáñez, Mathieu Bastian, and Mathieu Jacomy

Gephi 0.9.2 : a new CSV importer

A new version of Gephi has been released! Thanks to Eduardo’s relentless issue fixing, Gephi’s overall stability has been improved. Eduardo is the author of the Data Laboratory, and at this occasion he revamped its CSV importer for a more flexible and straightforward user experience.

The new CSV/spreadsheet importer

Did you know that Gephi can export and import just the table of nodes or the table of edges? This feature is useful in many situations, for instance to produce charts in Excel or to clean data in Open Refine. Below we will showcase the new features and more generally explain how to import a spreadsheet as a list of nodes.

To import a spreadsheet you have to reach the Data Laboratory and click on Import Spreadsheet. In the example below a network is already loaded: we will decide later whether the imported nodes will be merged into the existing ones or not.

Tuto01

Gephi is now able to recognize the type of file you upload, and the support of Excel files has been added. Choosing the right separator is crucial since improperly separated columns would compromise the data. In the example below Gephi recognized that the separator is the Comma (as in a properly formatted CSV file).

Tuto02

The encoding of the file is a common issue, notably with languages using accents and special characters. Gephi can guess the encoding and you can manually edit it if necessary. In the example below Gephi correctly guessed the UTF-8 encoding.

Tuto04

Selecting a different encoding would produce errors. Fortunately the Preview table allows you to see them and fix the encoding. In the screenshot below, see how the wrong encoding produces exotic characters in the data.

Tuto03

When you validate these settings, Gephi now opens the exact same panel as when you open a new network. I personally love this addition since it brings more consistency to the user experience. It allows Gephi to provide a number of useful informations like the number of nodes detected or the issues found during the import process.

Tuto05bis

Do not miss an important feature here: in this panel you decide either to create a new workspace with the imported data or to merge the new nodes with the old ones. This very useful feature was already present at the opening of a new network, but many users still ignore it exists. Mind to select the Append option if you intend to merge the nodes. In that case when an imported node has the same Id than an already present one, the new node data will override the old one.

Tuto06

More info

Take a look at the full list of improvements there:
https://github.com/gephi/gephi/releases/tag/v0.9.2

How do I get this release?

  • If you have a recent Gephi, the update will be automatically proposed to you
  • If you have an older version (0.8 or before) you have to download and install manually
  • This update can be downloaded from http://gephi.org