Today at the 13th International AAAI Conference on Web and Social Media (ICWSM) the “Gephi paper”, published ten years ago in the same conference, obtained the Test of Time Award.
I (Mathieu Jacomy) attended the conference and received the award on our behalf. I had the opportunity to say a few words, which I share here in a lightly edited form.
Let me fix a misunderstanding, and pay my debt by acknowledging three people.
This paper is the “Gephi paper”. If it is still cited 10 years later, it is not because its content is decisive. It is because researchers use Gephi. The paper is a proxy. I thank these researchers; their citations matter to us. And the people who get this award, really, are the Gephi contributors.
It also matters that Gephi has been made by software engineers, not computer scientists. Mathieu Bastian, 1st author, is CTO of a startup in Berlin. Sébastien Heymann, 2nd author, is CEO of his own startup. The award goes to us, but also secretly to Eduardo Ramos Ibáñez. He is not an author of the paper, but the current lead developer of Gephi, and his invisible work has been crucial to maintaining Gephi to this date. As for me, the designer of Gephi (to make it short), after 10 years as a research engineer I finally decided to pursue a PhD, in a techno-anthropology lab, and I enjoy the irony of receiving a test of time award during that time.
I think a few things deserve to be stated on the occasion of this award. Science is not only done by researchers, of course. Research engineers also do science, although sometimes indirectly. Some designers as well. And some researchers also do engineering or design. This non-strictly-academic work is not so visible. Having a proxy paper for Gephi, and getting this award, help to make this work visible.
The reasoning of the ICWSM committee was pretty much the same, which I greatly appreciate. On behalf of the Gephi team, we sincerely thank the academic community for its outstanding support.
PS: We dedicate this award to our beloved professor, Franck Ghitalla, who passed away in December 2018. We have not left the path to knowledge he showed us.
Eduardo has developed from scratch a new rendering engine that fixes our current OpenGL issues and improves the performance of Gephi. It has a lower CPU overhead, which provides better scalability, and it makes better use of the GPU. It can be used as a library, and it crashes less thanks to its ability to fall back on supported features on older graphics cards. A key to these benefits is the shader-based architecture. Though the engine is still lacking some features (labels…), a demo is available on GitHub (you need to build it yourself).
We have made a small indicative benchmark comparing the current Gephi engine to the new OpenGL engine and to a WebGL engine (Sigma.js v2 alpha). It turns out that the current Gephi engine is noticeably outperformed even by the WebGL engine, as you can see below!
All engines experience a performance drop around 10,000 to 100,000 nodes or edges. The intensity of this drop varies, but it is pretty clear that after 10 million items, a normal computer cannot display a network smoothly enough to allow interactions (it lags too much). That being said, scaling up to hundreds of thousands of nodes/edges is quite a lot already!
Ultimately, we believe that web technologies are the new multiplatform for graph visualization. They come with very real challenges, but they are also a perfectly valid option. This does not mean that we will drop the Java Gephi, but that we are starting to think of Gephi as a project hosting multiple tools and not only as a single piece of software, and that web technologies will be part of its future.
Mathieu Bastian has been our lead developer for more than ten years. He is now the proud father of an adorable little girl, congratulations! 🍾🎈🎉 On this occasion he decided to step down from his leading role in Gephi development, and hand over the reins to Eduardo Ramos Ibáñez.
Mathieu has been the true architect of Gephi’s source code. Not only is he its most prolific author, but he is also the engineer reflecting on its structure, drawing its blueprints, and building the foundations. He transformed my clunky 2007 prototype into actual software over half a dozen complete refactorings, never shying away from challenges. We owe him everything that makes it work: the ability to be installed, to be maintained, to have plugins… There would be no Gephi without him. Today is a good occasion to say: thank you, Mathieu, for your years of service to the project! Fortunately he will stay an active member of the community: we would be lost without his invaluable knowledge of the most intricate depths of Gephi’s source code.
So who is Eduardo? Let him introduce himself:
I am a Spanish software developer, currently living in Madrid, and I have been helping to maintain Gephi for several years. I love creating interesting software and trying to push its limits, especially in data visualization!
I am kind of a progressive music fan, and a cat lover 🙂
You can follow or contact me on Twitter @eduramiba
Eduardo is the person who knows Gephi’s source code best after Mathieu, and it is only natural that he is next in line to lead development efforts. You already know his work since he almost entirely developed the data laboratory, but as is often the case, an important part of his contribution is not so visible: maintaining the source code, fixing this or that bug… This is how he became an expert in Gephi development over the years. He is now developing a new OpenGL engine for Gephi 1.0. Welcome Eduardo, and thank you for stepping up to this new role!
If you want to know more about the situation and future of Gephi, we wrote about it in this blog post.
Despite years of collaboration, this is the first time that Eduardo, Mathieu and I sit together in the same place. The Gephi community mainly exists online, and its members have few occasions to see each other in person. But we have to talk. Mathieu Bastian is Gephi’s lead developer and currently lives in Berlin. Eduardo Ramos Ibáñez is the second most prolific contributor after Mathieu and the only other one to know Gephi’s core in depth. He lives in Madrid. As for me, Mathieu Jacomy, who started the project, I live in Paris. We just arrived in Berlin to have an in-depth talk about Gephi: the state of the project, its relevance, its future. Our goal is to question the Gephi project and reevaluate our commitment to it. We need a picture of the different options. We start with the elephant in the room: is the project still worth it? Here is our answer.
What is wrong with the Gephi project
We aim to identify the project’s strengths and weaknesses. It is not only about evaluating whether its benefits counterbalance its issues, but also about finding the right course of action. Let us start with the problems.
As is common in niche open source projects, our most limited resource is technical leadership. What does it mean? It is a consequence of Gephi’s code being fairly complicated. Fortunately this is not an issue for all contributors: for instance, it is pretty easy to implement a statistics plugin. Many parts of Gephi could be improved by plugin developers, but not all parts. Sometimes we need to modify the architecture itself, or a deep and specific part like the GraphStore engine. When that comes up, only a few community members are competent, namely Mathieu and Eduardo. Coding these parts would not require crazy skills, but it does require a fair amount of Gephi-specific knowledge. Unfortunately that knowledge is imprisoned in the brains of two people (well, it’s still better than one!). This is what we call the bottleneck of technical leadership. We may choose to fix core issues ourselves or disseminate the knowledge to other developers, but both scenarios require the precious time of Eduardo and/or Mathieu.
Technology is changing, we must adapt, and it wears out technical leadership. It is obvious to developers but not to users: we cannot just produce a version of Gephi that works well and let it be. It would stop working because technology changes. New versions of Java and new operating systems would break features that work well in today’s environment. Sadly, when incompatibilities arise, it generally falls to the core developers to deal with them. We were in such a situation before version 0.9: the new GraphStore engine was not ready yet but Java compatibility broke, and during that time Mac users were not able to use Gephi without a convoluted workaround. We are not sure we can keep up efficiently with these changes because of our limited technical leadership.
Technology evolves in an unfavorable direction. User experience is at the center of the Gephi project. Unfortunately it seems that the Java language tends to drift away from user interface design and development. Admittedly, it has never been a strength of Java. This technology does not support modern UI design; I feel like Java assumes that the UI will be developed by an engineer rather than a designer. It may become even worse. With the obsolescence of OpenGL on Mac and the removal of JavaFX from the runtime environment, we could live in a world where multiplatform software has a Java brain and a web face. Gephi is based on the JOGL library, whose development is increasingly uncertain, which forces us to consider alternatives like WebGL. We understand that it makes sense to delegate modern UI design to a well-established environment (HTML5 and friends). However, WebGL is far from matching OpenGL’s stability and performance. We think that from the user’s standpoint, Gephi is a lot about forging one’s network exploration and analysis skills on small and easy cases, and scaling them up to larger, more complicated cases. Thanks to its OpenGL engine, it is able to work almost as well for networks of hundreds of thousands of nodes as for networks of tens of nodes. If the ability to visualize huge datasets is key to Gephi, then web technologies are not a viable alternative. We have no definite solution to this issue and we might be facing a technological dead-end in the not so distant future.
Gephi is not only about tech. As a project it must also face the changes in the lives of its key contributors. Mathieu just had his first child, and more generally our careers follow their own paths that do not always align with the needs of the project. On the one hand we become more efficient at what we are doing, but on the other hand we have less and less time to dedicate to the project. In fact, we just have less spare time. We do not want Gephi to die but we are at risk of becoming tired of the burden it represents. We did not lose our desire for this unexpected journey, but reality often knocks on the door and it would be dishonest to omit this aspect of the situation.
Finally, user needs are also changing. Users can access many other systems for network analysis and visualization. A market of web-based solutions emerged and each system found a niche to settle in, forming a landscape of network tools. Gephi is not necessary anymore, if it ever was. Complex networks were once the most fashionable trick of social science pioneers in a big data world, but now they have UMAP and deep neural networks. Complex networks entered a “business as usual” era. They ceased to draw the attention of the most creative minds. Complex networks had their moment, and it passed. We do not think that it is bad or sad; it might actually be an opportunity. Nevertheless the context has changed and it is possible that Gephi is no longer what people need. So what do they need?
What is right with Gephi
We believe that Gephi actually still meets some needs, sometimes in its own unique way. Note that these ideas are not the outcome of a systematic study, but stem from our firsthand contact with users, during workshops, online, or in our everyday lives. Eduardo, Mathieu and I largely agreed in our impressions.
First of all, Gephi still has an audience, and it lies mostly in the sphere of education and research. The Facebook community is active and often features the visualization of digital data from a social science perspective, such as Twitter networks. Since it is the main place to ask for help, it also attracts a certain amount of exotic tinkering and experimentation. The Gephi community is about more than just using the software: it is also a space where people share what they have done, discuss various topics, and get feedback. It has something of a subculture. We believe that Gephi has some appeal to curious minds, and that it helps a certain audience engage with network analysis. Following who mentions Gephi on Twitter also made us realize that “Gephi” is sometimes used as a label to refer to a visual exploration. This seems to be particularly the case in social network analysis (SNA), the community where Gephi spread the most. Since they emerged, the digital humanities have also made wide use of Gephi. From what we observe, Gephi tends to be used more in social science and by beginners, but it is nevertheless used in natural sciences and by advanced users like data scientists. We can measure its success in the research sector by its 3780 citations (counted in Google Scholar). This audience probably finds something in Gephi that it does not find elsewhere, even if just that it is free. This fairly large number of users is still a good reason to keep maintaining and developing it.
Gephi also has some specificities that could be lost with it in the unfortunate event that its development comes to an end. It has its niche and many users value it for what makes it special. We believe that this specificity comes in three parts. (1) It is free software that you can install easily on multiple platforms. This makes it one of the few inexpensive options for teaching, workshops etc. (2) It approaches network analysis from a graphical and interactive perspective that is more intuitive than the math equations of graph theory. It can be understood by non-experts such as students and data journalists or social science researchers reaching out of their core competencies. (3) It allows you to scale up your network analysis and exploration skills to much bigger networks. Its learning curve bridges small qualitative networks with large quantitative datasets. The effects of complexity and the way you explore data will be very different but the basic tools at your disposal will stay the same (layout, statistics, filtering…). Gephi is an all-around tool that allows beginners to understand the gist of network exploration. It is at its best in a pedagogical setting where people will leverage practice to improve their data analysis skills.
I want to mention that some of the things that make Gephi appealing are not, in our view, essential. We are well aware that Gephi lets users produce impressive images and that the sight of a spatialization layout unfolding a network has something fascinating about it. These certainly are an important factor in its success. They also play a role in user engagement with data, which is key to progressing in data science. However, these attractive features only make sense insofar as they lead users to improve their network analysis skills. Though Gephi may be used to produce “data porn”, we believe it does not end there. Toying is just the first step towards the ability to get insights out of networks. Other devices might produce evocative visualizations, but Gephi is one of the few that actively leverage play to arouse interest in science (in the field of network analysis).
Where the Gephi project currently stands
Gephi is not the only software for network analysis and, more importantly, it does not want to be. Depending on one’s style and skills, other options might be preferable. NetworkX might be more flexible if you know Python. To draw diagrams you should head for GraphViz. If you are a biologist, Cytoscape is the tool your community is using… and have you tried NodeXL? Different devices do different things and Gephi does not want to be all of that. In the past we have been tempted to build a generic tool for any kind of network, even the dreaded dynamic hierarchical mixed weighted graph. We now want to focus on what Gephi does best and articulate it with other tools that have specific benefits.
We think that Gephi’s niche is visual, interactive exploration of common types of networks with a set of features that are not too specific, and that scale to large numbers of nodes and edges. We have observed that most users tend to explore networks of multiple orders of magnitude: from 10 to 10K nodes, or from 100 to 10M nodes… We believe that it is a key feature. Conversely we do not believe that producing a static map is its main mission. Other tools are in a better position for that task, and we prioritize exploration features over graphic outputs. Instant visual feedback is central to Gephi’s identity. What it is best positioned to do is make things visible when users apply an algorithm to their network. Fostering this kind of awareness helps users reflect on their method, make sense of their activity, and streamline their workflow.
The Gephi Toolkit has lost most of its relevance. Graph processing libraries like NetworkX have matured and feature most if not all operations you can do in Gephi. The toolkit is basically a separate branch of the project that requires a certain amount of maintenance. It drains forces from the main project. Considering that Gephi’s source code is open and that it is possible to tinker with experiments without the Toolkit, we believe that it would make sense to discontinue it, though we have not officially pulled the trigger yet.
Refocusing Gephi is not only about removing parts, but also about filling holes. For instance, though we will deprecate hierarchical graphs because they are not so common, we are considering supporting parallel edges, which are well represented in datasets. In the same spirit, because spatialization layouts are so central to user experience, we are considering adding algorithms that evaluate the quality of a layout and other features supporting visual network analysis. For instance we believe that edge visualization should be improved in the exploration panel. Last but not least, refocusing Gephi is also about reordering the general user interface to put emphasis on what is important and simplifying what is not. Reflections about Gephi’s future user interface have already been presented in a previous blog post.
Finally it is worth talking about the project. We like that Gephi is opinionated, multiplatform, free, and open source. We do not want to change any of that. We will not go as far as writing a manifesto but we state here that Gephi is not a company, we do not want it to be a company, and it will not become one. This does not mean that there can be no economic activity involving Gephi, but that when it happens it is not hosted by the project. So what is the Gephi project? An informal network of contributors that involves multiple individuals at various degrees, with no clear boundaries, and where anyone can bring their own thing to the project. However, being free and open does not mean that we have no structure: the GPL 3 licence protects the project, code and content have authors, and different people have different roles. Gephi is not only software and plugins but also a website, a blog, a Facebook community… A good part of people’s energy goes to producing content. There is a Gephi project around the Gephi software, and it might become increasingly important.
To conclude this section, let us summarize what Gephi is and will remain:
Extensible by the community
Installable like normal software
Based on local files (no cloud hosting, works offline)
Focused on exploration
Beginner friendly (as much as possible)
Opinionated – it will not always do what other tools do.
Gephi’s future: version 1.0 and beyond
An important part of our discussion revolves around future features. It is not only about what Gephi should focus on, but also what we can do in today’s and tomorrow’s context. As explained above, we have limited technical leadership and we are constrained by the evolution of Java and OpenGL. This leads us to consider which features are feasible in the current state of Gephi and which features would require a paradigm change. We are not only imagining future Gephi but also future future Gephi (what our project could be if we challenged a number of underlying assumptions). We have two different horizons: Gephi 1.0, a focused version of today’s software, and Gephi 2, a possible future on a different ground.
For Gephi 2 we are anticipating that Java will not fully support our needs, and we are considering porting a part (and possibly all) of the software to a different platform. The current technological context incentivizes us to use a Java brain behind a web-based face, but WebGL is still a bottleneck for big networks. We have no good solution but one might emerge in time. We are also acknowledging the blooming of the network analysis ecosystem and we believe that a single piece of software might not be the best solution to address a constellation of user needs. For instance if Gephi focuses more on exploration, it leaves room for a different tool about network publication. This tool might be a part of our project and not be the software itself. It might not sound dramatic but for us it is a decisive psychological step to think of the project as multiple tools and not just the Java software. It brings clarity to our intentions and opens new possibilities to address difficult problems.
Future features: fragments of road map
Gephi 1.0 can feature a number of changes that make sense as a natural extension of today’s Gephi, while the more dramatic changes are postponed to Gephi 2. We have no clear picture of what Gephi 2 might be, but its existence helps us select the right features for the near future. Here is a list of improvements we would like to implement before moving to a different paradigm.
UNDO feature, limited to the “GEXF scope”: network data, metadata, positions, sizes, colors…
Default save to GEXF. More stable than “.gephi” though it does not save the state of the user interface.
Activity log, possibly coordinated to undo, possibly stored in the GEXF. A plugin is already exploring that direction.
Parallel edges. The GraphStore supports it but not the rest of Gephi.
New OpenGL engine. Eduardo already prototyped an alpha version.
Curved edges in visual exploration. These are important because they help identify edge orientation.
Quick search in nodes and metadata. It turns out it should be pretty easy to implement.
New icons. Many resources are now available to do better and the technical part is trivial.
Cleaner data laboratory
Embed Java: no more hassle with installing the right Java version.
Install from the Mac App Store. Easier for Mac users.
Fix filter composition.
Better statistics reports in HTML5.
Revamp appearance, label color & size, sliders… For instance incentivize rankings as opposed to default unitary mode.
Label anchor (start, middle, end)… and possibly some jitter.
Better label adjust (one that works better). Possibly with label jitter.
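To make the “GEXF scope” mentioned in the list above more concrete, here is a minimal Python sketch of what a GEXF file can carry: nodes, edges, positions, sizes and colors. It is an illustration under assumptions, not Gephi’s own exporter. The function name `build_gexf` and the input dictionaries are hypothetical, and the element names follow the GEXF 1.2draft schema and its viz extension as we understand them. Two edges share the same source and target to show how parallel edges can be told apart by their ids.

```python
import xml.etree.ElementTree as ET

# Namespaces of the GEXF 1.2draft format and its "viz" extension.
GEXF_NS = "http://www.gexf.net/1.2draft"
VIZ_NS = "http://www.gexf.net/1.2draft/viz"


def build_gexf(nodes, edges):
    """Serialize nodes and edges to a GEXF string (hypothetical helper).

    nodes: list of dicts with keys id, label, x, y, size, color (r, g, b).
    edges: list of (source_id, target_id) pairs; duplicates are kept,
           each one getting its own edge id.
    """
    ET.register_namespace("", GEXF_NS)
    ET.register_namespace("viz", VIZ_NS)

    root = ET.Element(f"{{{GEXF_NS}}}gexf", {"version": "1.2"})
    graph = ET.SubElement(
        root, f"{{{GEXF_NS}}}graph", {"defaultedgetype": "directed"}
    )

    nodes_el = ET.SubElement(graph, f"{{{GEXF_NS}}}nodes")
    for n in nodes:
        node_el = ET.SubElement(
            nodes_el, f"{{{GEXF_NS}}}node", {"id": n["id"], "label": n["label"]}
        )
        # Visual attributes: what an undo limited to this scope would cover.
        ET.SubElement(
            node_el,
            f"{{{VIZ_NS}}}position",
            {"x": str(n["x"]), "y": str(n["y"]), "z": "0.0"},
        )
        ET.SubElement(node_el, f"{{{VIZ_NS}}}size", {"value": str(n["size"])})
        r, g, b = n["color"]
        ET.SubElement(
            node_el, f"{{{VIZ_NS}}}color", {"r": str(r), "g": str(g), "b": str(b)}
        )

    edges_el = ET.SubElement(graph, f"{{{GEXF_NS}}}edges")
    for i, (source, target) in enumerate(edges):
        ET.SubElement(
            edges_el,
            f"{{{GEXF_NS}}}edge",
            {"id": str(i), "source": source, "target": target},
        )

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample_nodes = [
        {"id": "a", "label": "A", "x": 0.0, "y": 0.0, "size": 10, "color": (255, 0, 0)},
        {"id": "b", "label": "B", "x": 50.0, "y": 20.0, "size": 5, "color": (0, 0, 255)},
    ]
    # Two edges between the same pair of nodes: parallel edges.
    sample_edges = [("a", "b"), ("a", "b")]
    print(build_gexf(sample_nodes, sample_edges))
```

Note that nothing about the state of the user interface appears in such a file, which matches the trade-off described in the list: GEXF holds the network and its appearance, not the workspace.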
Gephi is not obsolete, and we have good hope of making its strengths more apparent by refocusing our development efforts towards version 1.0. As an additional outcome of our discussion, we now welcome Eduardo as our new lead developer, but more on that in a separate blog post. Thank you for your support and cheers from Berlin!