Note: because it’s a bit long, you can also read it on Medium.
Despite years of collaboration, for the first time, Eduardo, Mathieu and I sit together at the same place. The Gephi community mainly exists online, and its members have few occasions to see each other in person. But we have to talk. Mathieu Bastian is Gephi’s lead developer and currently lives in Berlin. Eduardo Ramos Ibáñez is the second most prolific contributor after Mathieu and the only other one to know Gephi’s core in depth. He lives in Madrid. As for me who started our project, Mathieu Jacomy, I live in Paris. We just arrived in Berlin to have an in-depth talk about Gephi: state of the project, its relevance, its future. Our goal is to question the Gephi project and reevaluate our commitment to it. We need a picture of the different options. We start by the elephant in the room: is the project still worth it? Here is our answer.
What is wrong with the Gephi project
We aim at identifying the project’s strengths and weaknesses. It is not only about evaluating if its benefits counterbalance its issues, but also about finding the right course of actions. Let us start with the problems.
A common issue to niche open source projects, our most limited resource is technical leadership. What does it mean? It is a consequence of Gephi’s code being fairly complicated. Fortunately this is not an issue for all contributors, for instance it is pretty easy to implement a statistics plugin. Many parts of Gephi could be improved by plugin developers, but not all parts. Sometimes we need to modify architecture itself, or a deep and specific part like the GraphStore engine. When it comes up, only a few community members are competent. Namely Mathieu and Eduardo. Coding these parts would not require crazy skills, but a fair amount of Gephi-specific knowledge. Unfortunately that knowledge is imprisoned in the brains of two people (well it’s still better than one!). This is what we call the bottleneck of technical leadership. We may choose to fix core issues ourselves or disseminate the knowledge to other developers, but both scenarios require the precious time of Eduardo and/or Mathieu.
Technology is changing, we must adapt, and it wears out technical leadership. It is obvious to developers but not to users: we cannot just produce a version of Gephi that works well and let it be. It would stop working because technology changes. New versions of Java, new operating systems would break features that work well in today’s environment. Sadly when incompatibilities arise, it is generally for the core developers to deal with. We were in such situation before version 0.9, at a moment when the new GraphStore engine was not ready yet but the Java compatibility broke and during that time, Mac users were not able to use Gephi without a convoluted turnaround. We are not sure to be able to keep up efficiently with these changes because of our limited technical leadership.
Technology evolves in an unfavorable direction. User experience is at the center of the Gephi project. Unfortunately it seems that the Java language tends to drift away from user interface design and development. Admittedly, it has never been a strength of Java. This technology does not support modern UI design – I feel like Java assumes that the UI will be developed by an engineer rather than a designer. It may become even worse. With the obsolescence of OpenGL on Mac and the removal of JavaFX from the runtime environment, we could live in a world where multiplatform softwares have a Java brain and a web face. Gephi is based on the JOGL library whose development is increasingly uncertain, which forces us to consider alternatives like WebGL. We understand that it makes sense to delegate modern UI design to a well-established environment (HTML5 and friends). However WebGL is far from OpenGL stability and performance. We think that from the user stance, Gephi is a lot about forging one’s network exploration and analysis skills on small and easy cases, and scaling them up to larger, more complicated cases. Thanks to its OpenGL engine, it is able to work almost as well for networks of hundred thousands nodes than of tens of nodes. If the ability to visualize huge datasets is key to Gephi, then web technologies are not a viable alternative. We have no definite solution to this issue and we might be facing a technological dead-end in a not so distant future.
Gephi is not only about tech. As a projet it must also face the changes in the lives of its key contributors. Mathieu just had his first child, and more generally our careers follow their own paths that do not always align with the needs of the project. On the one hand we become more efficient at what we are doing, but on the other hand we have less and less time to dedicate to the project. In fact, we just have less spare time. We do not want Gephi to die but we are at risk of becoming tired of the burden it represents. We did not lose our desire for this unexpected journey, but reality often knocks on the door and it would be dishonest to omit this aspect of the situation.
Finally user needs are also changing. Users can access many other systems for network analysis and visualization. A market of web-based solutions emerged and each system found a niche to settle in. A landscape of network tools. Gephi is not necessary anymore, if it ever was. Complex networks were once the most fashionable trick of social science pioneers in a big data world, but now they have UMAP and deep neural networks. Complex networks entered a “business as usual” era. They ceased to draw the attention of the most creative minds. Complex networks had their moment, and it passed. We do not think that it is bad or sad, it might actually be a chance. Nevertheless the context has changed and it is possible that Gephi is not anymore what people need. So what do they need?
What is right with Gephi
We believe that Gephi actually still meets some needs, sometimes in its own unique way. Note that these ideas are not the outcome of a systematic study, but stem out of our empirical contacts with users, during workshops, online, or in our everyday lives. Eduardo, Mathieu and I were pretty convergent in our feelings.
First of all Gephi still has a public and it lies mostly in the sphere of education and research. The Facebook community is active and often features the visualization of digital data in a social science perspective, such as Twitter networks. Since it is the main place where to ask for help, it also attracts a certain amount of exotic tinkering and experimentation. The Gephi community is more than just about using the software, it is also a space where people share what they have done, discuss various topics, and get feedback. It has something of a subculture. We believe that Gephi has some appeal to curious minds, and that it helps a certain public getting engaged with network analysis. Following who mentions Gephi on Twitter also made us realize that “Gephi” is sometimes used as a label to refer to a visual exploration. This seems to be particularly the case in social network analysis (SNA), the community where Gephi spread the most. Since they emerged, digital humanities also made a wide use of Gephi. From what we observe Gephi tends to be more used in social science and by beginners, but it is nevertheless used in natural sciences and by advanced users like data scientists. We can measure its success in the research sector by its 3780 citations (counted in Google Scholar). This public probably finds something in Gephi that it does not find elsewhere, even if just that it is free. This fairly large amount of users is still a good reason to keep maintaining and developing it.
Gephi also has some specificities that could be lost with it in the unfortunate event that its development comes to an end. It has its niche and many users value it for what makes it special. We believe that this specificity comes in three parts. (1) It is a free software that you can install easily on multiple platform. This make it one of the few inexpensive options for teaching, workshops etc. (2) It approaches network analysis from a graphical and interactive perspective that is more intuitive than the math equations of graph theory. It can be understood by non experts such as students and data journalists or social science researchers reaching out of their core competencies. (3) It allows you to scale up your network analysis and exploration skills to much bigger networks. Its learning curve bridges small qualitative networks with large quantitative datasets. The effects of complexity and the way you explore data will be very different but the basic tools at your disposal will stay the same (layout, statistics, filtering…). Gephi is an all-around tool that allows beginners to understand the gist of network exploration. It is at its best in a pedagogical setting where people will leverage practice to improve their data analysis skills.
I want to mention that some of the things that make Gephi appealing are not, in our views, essential. We are well aware that Gephi allows to produce impressive images and that the sight of a spatialization layout unfolding a network have something fascinating. They certainly are an important factor in its success. They also play a role in user engagement with data, which is key to progressing in data science. However these attractive features only make sense insofar as they lead users to improving their network analysis skills. Though Gephi may be used to produce “data porn”, we believe it does not end there. Toying is just the first step towards the ability to get insights out of networks. Other devices might produce evocative visualizations, but Gephi is one of the few that actively leverage play to arouse interest for science (in the field of network analysis).
Where the Gephi project currently stands
Gephi is not the only software for network analysis and more importantly, it does not want to be. Depending on one’s style and skills, other options might be preferable. NetworkX might be more flexible if you know Python. To draw diagrams you should head for GraphViz. As a biologist, Cytoscape is the tool your community is using… and have you tried NodeXL? Different devices do different things and Gephi does not want to be all of that. In the past we have been tempted to build a generic tool for any kind of network, even the dreaded dynamic hierarchical mixed weighted graph. We now want to focus on what Gephi does best and articulate it with other tools that have specific benefits.
We think that Gephi’s niche is visual, interactive exploration of common types of networks with a set of features that are not too specific, and that scale to large number of nodes and edges. We have observed that most users tend to explore networks of multiple orders of magnitude: from 10 to 10K nodes, or from 100 to 10M nodes… We believe that it is a key feature. Conversely we do not believe that producing a static map is its main mission. Other tools are in a better position for that task, and we prioritize exploration features over graphic outputs. Instant visual feedback central to Gephi’s identity. What it is in the best position to do, is making things visible when users apply an algorithm to their network. Fostering this kind of awareness helps users reflect on their method, make sense of their activity, and streamline their workflow.
The Gephi Toolkit has lost most of its relevance. Graph processing libraries like NetworkX have matured and feature most if not all operations you can do in Gephi. The toolkit is basically a separate branch of the project that requires a certain amount of maintenance. It drains forces from the main project. Considering that Gephi’s source code is open and that it is possible to tinker experiments without the Toolkit, we believe that it would make sense to discontinue it – though we did not officially pull the trigger so far.
Refocusing Gephi is not only about removing parts, but also filling holes. For instance though we will deprecate hierarchical graphs because they are not so common, we consider supporting parallel edges, well represented in datasets. In the same spirit, because spatialization layouts are so central to user experience, we consider adding algorithms evaluating the quality of a layout and other features supporting visual network analysis. For instance we believe that edges visualization should be improved in the exploration panel. Last but not least, refocusing Gephi is also about reordering the general user interface to put emphasis on what is important and simplifying what is not. Reflexions about Gephi’s future user interface have already been presented in a previous blog post.
Finally it is worth talking about the project. We like that Gephi is opinionated, multiplatform, free, and open source. We do not want to change any of that. We will not go as far as writing a manifesto but we state here that Gephi is not a company, we do not want it to be company, and it will not become one. This does not mean that there can be no economic activity involving Gephi, but that when it happens it is not hosted by the project. So what is the Gephi project? An informal network of contributors that involves multiple individuals at various degrees, with no clear boundaries, and where anyone can bring their own thing to the project. However being free and open does not mean that we have no structure: the GPL 3 licence protects the project, codes and contents have authors, and different persons have different roles. Gephi is not only software and plugins but also website, blog, Facebook community… A good part of people’s energy goes to producing contents. There is a Gephi project around the Gephi software, and it might become increasingly important.
As a conclusion to this section, lets us summarize what Gephi is and will remain:
- Open source
- Extensible by the community
- Installable as a normal software
- With local based files (no cloud hosting, works offline)
- User centric
- Focused on exploration
- Beginner friendly (as much as possible)
- Opinionated – it will not always do what other tools do.
Gephi’s future: version 1.0 and beyond
An important part of our discussion revolves around future features. It is not only about what Gephi should focus on, but also what we can do in today’s and tomorrow’s context. As explained above, we have a limited technical leadership and we are constrained by the evolution of Java and OpenGL. This leads us to consider which features can be considered in the current state of Gephi and which features would require a paradigm change. We are not only imagining future Gephi but also future future Gephi (what our project could be if we challenged a number underlying assumptions). We have two different horizons: Gephi 1.0, a focused version of today’s software, and Gephi 2, a possible future on a different ground.
For Gephi 2 we are anticipating that Java is not fully supporting our needs, and we are considering porting a part (and possibly all) of the software in a different platform. Current technological context incentivizes us to use a Java brain behind a web-based face, but WebGL is still a bottleneck for big networks. We have no good solution but it might emerge in time. We are also acknowledging the blooming of the network analysis ecosystem and we believe that a single software might not be the best solution to address a constellation of user needs. For instance if Gephi focuses more on exploration, it leaves room for a different tool about network publication. This tool might be a part of our project and not be the software itself. It might not sound dramatic but for us it is an decisive psychological step to think of the project as multiple tools and not just the Java software. It brings clarity to our intentions and opens new possibilities to address difficult problems.
Future features: fragments of road map
Gephi 1.0 can feature a number of changes that make sense as a natural extension of today’s Gephi, while the more dramatic changes are postponed to Gephi 2. We have no clear picture of what Gephi 2 might be, but its existence helps us select the right features for a close future. Here is a list of improvements we would like to implement before moving to a different paradigm.
- UNDO feature, limited to the “GEXF scope”: network data, metadata, positions, sizes, colors…
- Default save to GEXF. More stable than “.gephi” though it does not save the state of the user interface.
- Activity log, possibly coordinated to undo, possibly stored in the GEXF. A plugin is already exploring that direction.
- Parallel edges. The GraphStore supports it but not the rest of Gephi.
- New OpenGL engine. Eduardo already prototyped an alpha version.
- Curved edges in visual exploration. These are important because they help identifying edge orientation.
- Quick search in nodes and metadata. It turns out it should be pretty easy to implement.
- New icons. Many resources are now available to do better and the technical part is trivial.
- Cleaner data laboratory
- Embed Java: no more hassle with installing the right Java version.
- Install from MacStore. Easier for Mac users.
- Fix filter composition.
- Better statistics reports in HTML5.
- Revamp appearance, label color & size, sliders… For instance incentivize rankings as opposed to default unitary mode.
- Label anchor (start, middle, end)… and possibly some jitter.
- Better label adjust (one that works better). Possibly with label jitter.
Gephi is not obsolete, and we have a good hope to make its strengths more apparent by refocusing our development efforts towards version 1.0. As an additional outcome of our discussion, we now welcome Eduardo as our new lead developer, but more on that in a separate blog post. Thank you for your support and cheers from Berlin!