Today at the 13th International AAAI Conference on Web and Social Media (ICWSM) the “Gephi paper”, published ten years ago at the same conference, received the Test of Time Award.
I (Mathieu Jacomy) attended the conference and received the award on our behalf. I had the occasion to say a few words, which I share here in a slightly edited form.
Let me fix a misunderstanding, and pay my debt by acknowledging three persons.
This paper is the “Gephi paper”. If it is still cited 10 years later, it is not because its content is decisive. It is because researchers use Gephi. The paper is a proxy. I thank these researchers; their citations matter to us. And the people who get this award, really, are the Gephi contributors.
It also matters that Gephi has been made by software engineers, not computer scientists. Mathieu Bastian, 1st author, is CTO of a startup in Berlin. Sébastien Heymann, 2nd author, is CEO of his own startup. The award goes to us, but also secretly to Eduardo Ramos Ibáñez. He is not an author of the paper, but he is the current lead developer of Gephi, and his invisible work has been crucial to maintaining Gephi to this day. As for me, the designer of Gephi (to make it short), after 10 years as a research engineer I finally decided to pursue a PhD, in a techno-anthropology lab, and I enjoy the irony of receiving a test of time award during that time.
I think a few things deserve to be stated on the occasion of this award. Science is not only done by researchers, of course. Research engineers also do science, although sometimes indirectly. Some designers as well. And some researchers also do engineering or design. This non-strictly-academic work is not so visible. Having a proxy paper for Gephi, and getting this award, help to make this work visible.
The reasoning of the ICWSM committee was pretty much the same, which I greatly appreciate. On behalf of the Gephi team, we sincerely thank the academic community for its outstanding support.
PS: We dedicate this award to our beloved professor, Franck Ghitalla, who passed away in December 2018. We have not left the path to knowledge he showed us.
Eduardo has developed from scratch a new rendering engine that fixes our current OpenGL issues and improves the performance of Gephi. It has a lower CPU overhead, which provides better scalability, and it makes better use of the GPU. It can be used as a library, and it crashes less thanks to its ability to fall back on supported features on older graphics cards. A key to these benefits is the shader-based architecture. Though the engine still lacks some features (labels…), a demo is available on GitHub (you need to build it yourself).
We have made a small indicative benchmark comparing the current Gephi engine to the new OpenGL engine and to a WebGL engine (Sigma.js v2 alpha). It turns out that the current Gephi engine is noticeably outperformed even by the WebGL engine, as you can see below!
As you can see below, all engines experience a performance drop around 10 to 100 thousand nodes or edges. The intensity of this drop varies, but it is pretty clear that after 10 million items, a normal computer cannot display a network smoothly enough to allow interactions (it lags too much). That being said, scaling up to hundreds of thousands of nodes/edges is quite a lot already!
Ultimately, we believe that web technologies are the new multiplatform environment for graph visualization. They come with very real challenges, but they are also a perfectly valid option. It does not mean that we will drop the Java Gephi, but that we are starting to think of Gephi as a project hosting multiple tools and not only as a single piece of software, and that web technologies will be part of its future.
Mathieu Bastian has been our lead developer for more than ten years. He is now the proud father of an adorable little girl, congratulations! 🍾🎈🎉 On this occasion he decided to step down from his leading role in Gephi development, and hand over the reins to Eduardo Ramos Ibáñez.
Mathieu has been the true architect of Gephi’s source code. Not only is he its most prolific author, but he is also the engineer reflecting on its structure, drawing its blueprints, and building the foundations. He transformed my clunky 2007 prototype into actual software over half a dozen complete refactorings, never backing away from a challenge. We owe him everything that makes it work: the ability to be installed, to be maintained, to have plugins… There would be no Gephi without him. Today is a good occasion to write: thank you, Mathieu, for your years of service to the project! Fortunately he will stay an active member of the community – we would be lost without his invaluable knowledge of the most intricate depths of Gephi’s source code.
So who is Eduardo? Let him present himself:
I am a Spanish software developer, currently living in Madrid, and I have been helping to maintain Gephi for several years. I love creating interesting software and trying to push its limits, especially in data visualization!
I am kind of a progressive music fan, and a cat lover 🙂
You can follow or contact me on Twitter @eduramiba
Eduardo is the person who knows Gephi’s source code best after Mathieu, and it is only natural that he is next in line to lead development efforts. You already know his work since he almost entirely developed the data laboratory, but as is often the case, an important part of his contribution is not so visible – maintaining the source code, fixing this or that bug… This is how he became an expert in Gephi development over the years. He is now developing a new OpenGL engine for Gephi 1.0. Welcome Eduardo, and thank you for stepping up to this new role!
If you want to know more about the situation and future of Gephi, we wrote about it in this blog post.
Despite years of collaboration, this is the first time that Eduardo, Mathieu and I are sitting together in the same place. The Gephi community mainly exists online, and its members have few occasions to see each other in person. But we have to talk. Mathieu Bastian is Gephi’s lead developer and currently lives in Berlin. Eduardo Ramos Ibáñez is the second most prolific contributor after Mathieu and the only other one to know Gephi’s core in depth. He lives in Madrid. As for me, Mathieu Jacomy, who started our project, I live in Paris. We just arrived in Berlin to have an in-depth talk about Gephi: the state of the project, its relevance, its future. Our goal is to question the Gephi project and reevaluate our commitment to it. We need a picture of the different options. We start with the elephant in the room: is the project still worth it? Here is our answer.
What is wrong with the Gephi project
We aim to identify the project’s strengths and weaknesses. It is not only about evaluating whether its benefits counterbalance its issues, but also about finding the right course of action. Let us start with the problems.
As is common in niche open source projects, our most limited resource is technical leadership. What does it mean? It is a consequence of Gephi’s code being fairly complicated. Fortunately this is not an issue for all contributors; for instance it is pretty easy to implement a statistics plugin. Many parts of Gephi could be improved by plugin developers, but not all parts. Sometimes we need to modify the architecture itself, or a deep and specific part like the GraphStore engine. When that comes up, only a few community members are competent, namely Mathieu and Eduardo. Coding these parts would not require crazy skills, but a fair amount of Gephi-specific knowledge. Unfortunately that knowledge is imprisoned in the brains of two people (well it’s still better than one!). This is what we call the bottleneck of technical leadership. We may choose to fix core issues ourselves or disseminate the knowledge to other developers, but both scenarios require the precious time of Eduardo and/or Mathieu.
Technology is changing, we must adapt, and it wears out technical leadership. It is obvious to developers but not to users: we cannot just produce a version of Gephi that works well and let it be. It would stop working because technology changes. New versions of Java and new operating systems would break features that work well in today’s environment. Sadly, when incompatibilities arise, it is generally up to the core developers to deal with them. We were in such a situation before version 0.9: the new GraphStore engine was not ready yet but Java compatibility broke, and during that time Mac users were not able to use Gephi without a convoluted workaround. We are not sure we can keep up efficiently with these changes because of our limited technical leadership.
Technology evolves in an unfavorable direction. User experience is at the center of the Gephi project. Unfortunately it seems that the Java language tends to drift away from user interface design and development. Admittedly, it has never been a strength of Java. This technology does not support modern UI design – I feel like Java assumes that the UI will be developed by an engineer rather than a designer. It may become even worse. With the obsolescence of OpenGL on Mac and the removal of JavaFX from the runtime environment, we could live in a world where multiplatform software has a Java brain and a web face. Gephi is based on the JOGL library, whose development is increasingly uncertain, which forces us to consider alternatives like WebGL. We understand that it makes sense to delegate modern UI design to a well-established environment (HTML5 and friends). However, WebGL is far from matching the stability and performance of OpenGL. We think that from the user’s standpoint, Gephi is a lot about forging one’s network exploration and analysis skills on small and easy cases, and scaling them up to larger, more complicated cases. Thanks to its OpenGL engine, it is able to work almost as well for networks of hundreds of thousands of nodes as for networks of tens of nodes. If the ability to visualize huge datasets is key to Gephi, then web technologies are not a viable alternative. We have no definite solution to this issue and we might be facing a technological dead end in the not so distant future.
Gephi is not only about tech. As a project it must also face the changes in the lives of its key contributors. Mathieu just had his first child, and more generally our careers follow their own paths that do not always align with the needs of the project. On the one hand we become more efficient at what we are doing, but on the other hand we have less and less time to dedicate to the project. In fact, we just have less spare time. We do not want Gephi to die but we are at risk of becoming tired of the burden it represents. We did not lose our desire for this unexpected journey, but reality often knocks on the door and it would be dishonest to omit this aspect of the situation.
Finally, user needs are also changing. Users can access many other systems for network analysis and visualization. A market of web-based solutions emerged and each system found a niche to settle in: a landscape of network tools. Gephi is not necessary anymore, if it ever was. Complex networks were once the most fashionable trick of social science pioneers in a big data world, but now they have UMAP and deep neural networks. Complex networks entered a “business as usual” era. They ceased to draw the attention of the most creative minds. Complex networks had their moment, and it passed. We do not think that it is bad or sad; it might actually be an opportunity. Nevertheless the context has changed and it is possible that Gephi is no longer what people need. So what do they need?
What is right with Gephi
We believe that Gephi actually still meets some needs, sometimes in its own unique way. Note that these ideas are not the outcome of a systematic study, but stem from our empirical contact with users, during workshops, online, or in our everyday lives. Eduardo, Mathieu and I largely converged in our impressions.
First of all, Gephi still has an audience, and it lies mostly in the sphere of education and research. The Facebook community is active and often features the visualization of digital data from a social science perspective, such as Twitter networks. Since it is the main place to ask for help, it also attracts a certain amount of exotic tinkering and experimentation. The Gephi community is about more than just using the software; it is also a space where people share what they have done, discuss various topics, and get feedback. It has something of a subculture. We believe that Gephi has some appeal to curious minds, and that it helps a certain audience get engaged with network analysis. Following who mentions Gephi on Twitter also made us realize that “Gephi” is sometimes used as a label to refer to a visual exploration. This seems to be particularly the case in social network analysis (SNA), the community where Gephi spread the most. Since they emerged, the digital humanities have also made wide use of Gephi. From what we observe, Gephi tends to be used more in social science and by beginners, but it is nevertheless used in natural sciences and by advanced users like data scientists. We can measure its success in the research sector by its 3780 citations (counted in Google Scholar). This audience probably finds something in Gephi that it does not find elsewhere, even if just that it is free. This fairly large number of users is still a good reason to keep maintaining and developing it.
Gephi also has some specificities that could be lost with it in the unfortunate event that its development comes to an end. It has its niche and many users value it for what makes it special. We believe that this specificity comes in three parts. (1) It is free software that you can install easily on multiple platforms. This makes it one of the few inexpensive options for teaching, workshops etc. (2) It approaches network analysis from a graphical and interactive perspective that is more intuitive than the math equations of graph theory. It can be understood by non-experts such as students and data journalists, or social science researchers reaching out of their core competencies. (3) It allows you to scale up your network analysis and exploration skills to much bigger networks. Its learning curve bridges small qualitative networks with large quantitative datasets. The effects of complexity and the way you explore data will be very different but the basic tools at your disposal will stay the same (layout, statistics, filtering…). Gephi is an all-around tool that allows beginners to understand the gist of network exploration. It is at its best in a pedagogical setting where people will leverage practice to improve their data analysis skills.
I want to mention that some of the things that make Gephi appealing are not, in our view, essential. We are well aware that Gephi allows users to produce impressive images and that the sight of a spatialization layout unfolding a network has something fascinating about it. These certainly are an important factor in its success. They also play a role in user engagement with data, which is key to progressing in data science. However these attractive features only make sense insofar as they lead users to improve their network analysis skills. Though Gephi may be used to produce “data porn”, we believe it does not end there. Toying is just the first step towards the ability to get insights out of networks. Other devices might produce evocative visualizations, but Gephi is one of the few that actively leverage play to arouse interest in science (in the field of network analysis).
Where the Gephi project currently stands
Gephi is not the only software for network analysis and more importantly, it does not want to be. Depending on one’s style and skills, other options might be preferable. NetworkX might be more flexible if you know Python. To draw diagrams you should head for GraphViz. If you are a biologist, Cytoscape is the tool your community is using… and have you tried NodeXL? Different devices do different things and Gephi does not want to be all of that. In the past we have been tempted to build a generic tool for any kind of network, even the dreaded dynamic hierarchical mixed weighted graph. We now want to focus on what Gephi does best and articulate it with other tools that have specific benefits.
We think that Gephi’s niche is the visual, interactive exploration of common types of networks with a set of features that are not too specific, and that scale to large numbers of nodes and edges. We have observed that most users tend to explore networks spanning multiple orders of magnitude: from 10 to 10K nodes, or from 100 to 10M nodes… We believe that it is a key feature. Conversely we do not believe that producing a static map is its main mission. Other tools are in a better position for that task, and we prioritize exploration features over graphic outputs. Instant visual feedback is central to Gephi’s identity. What it is in the best position to do is make things visible when users apply an algorithm to their network. Fostering this kind of awareness helps users reflect on their method, make sense of their activity, and streamline their workflow.
The Gephi Toolkit has lost most of its relevance. Graph processing libraries like NetworkX have matured and feature most if not all operations you can do in Gephi. The toolkit is basically a separate branch of the project that requires a certain amount of maintenance. It drains forces from the main project. Considering that Gephi’s source code is open and that it is possible to tinker and experiment without the Toolkit, we believe that it would make sense to discontinue it – though we have not officially pulled the trigger so far.
Refocusing Gephi is not only about removing parts, but also about filling holes. For instance, though we will deprecate hierarchical graphs because they are not so common, we are considering support for parallel edges, which are well represented in datasets. In the same spirit, because spatialization layouts are so central to user experience, we are considering adding algorithms that evaluate the quality of a layout and other features supporting visual network analysis. For instance we believe that edge visualization should be improved in the exploration panel. Last but not least, refocusing Gephi is also about reordering the general user interface to put emphasis on what is important and simplifying what is not. Reflections about Gephi’s future user interface have already been presented in a previous blog post.
Finally it is worth talking about the project. We like that Gephi is opinionated, multiplatform, free, and open source. We do not want to change any of that. We will not go as far as writing a manifesto but we state here that Gephi is not a company, we do not want it to be a company, and it will not become one. This does not mean that there can be no economic activity involving Gephi, but that when it happens it is not hosted by the project. So what is the Gephi project? An informal network of contributors that involves multiple individuals to various degrees, with no clear boundaries, and where anyone can bring their own thing to the project. However being free and open does not mean that we have no structure: the GPL 3 licence protects the project, code and content have authors, and different people have different roles. Gephi is not only software and plugins but also a website, a blog, a Facebook community… A good part of people’s energy goes to producing content. There is a Gephi project around the Gephi software, and it might become increasingly important.
As a conclusion to this section, let us summarize what Gephi is and will remain:
Extensible by the community
Installable as normal software
With local files (no cloud hosting, works offline)
Focused on exploration
Beginner friendly (as much as possible)
Opinionated – it will not always do what other tools do.
Gephi’s future: version 1.0 and beyond
An important part of our discussion revolves around future features. It is not only about what Gephi should focus on, but also what we can do in today’s and tomorrow’s context. As explained above, we have limited technical leadership and we are constrained by the evolution of Java and OpenGL. This leads us to ask which features are feasible in the current state of Gephi and which features would require a paradigm change. We are not only imagining future Gephi but also future future Gephi (what our project could be if we challenged a number of underlying assumptions). We have two different horizons: Gephi 1.0, a focused version of today’s software, and Gephi 2, a possible future on a different ground.
For Gephi 2 we are anticipating that Java will not fully support our needs, and we are considering porting a part (and possibly all) of the software to a different platform. The current technological context incentivizes us to use a Java brain behind a web-based face, but WebGL is still a bottleneck for big networks. We have no good solution but one might emerge in time. We are also acknowledging the blooming of the network analysis ecosystem and we believe that a single piece of software might not be the best solution to address a constellation of user needs. For instance if Gephi focuses more on exploration, it leaves room for a different tool about network publication. This tool might be a part of our project and not be the software itself. It might not sound dramatic but for us it is a decisive psychological step to think of the project as multiple tools and not just the Java software. It brings clarity to our intentions and opens new possibilities to address difficult problems.
Future features: fragments of road map
Gephi 1.0 can feature a number of changes that make sense as a natural extension of today’s Gephi, while the more dramatic changes are postponed to Gephi 2. We have no clear picture of what Gephi 2 might be, but its existence helps us select the right features for the near future. Here is a list of improvements we would like to implement before moving to a different paradigm.
UNDO feature, limited to the “GEXF scope”: network data, metadata, positions, sizes, colors…
Default save to GEXF. More stable than “.gephi” though it does not save the state of the user interface.
Activity log, possibly coordinated to undo, possibly stored in the GEXF. A plugin is already exploring that direction.
Parallel edges. The GraphStore supports it but not the rest of Gephi.
New OpenGL engine. Eduardo already prototyped an alpha version.
Curved edges in visual exploration. These are important because they help identify edge orientation.
Quick search in nodes and metadata. It turns out it should be pretty easy to implement.
New icons. Many resources are now available to do better and the technical part is trivial.
Cleaner data laboratory
Embed Java: no more hassle with installing the right Java version.
Install from the Mac App Store. Easier for Mac users.
Fix filter composition.
Better statistics reports in HTML5.
Revamp appearance, label color & size, sliders… For instance incentivize rankings as opposed to default unitary mode.
Label anchor (start, middle, end)… and possibly some jitter.
Better label adjust (one that works better). Possibly with label jitter.
Gephi is not obsolete, and we have good hope of making its strengths more apparent by refocusing our development efforts towards version 1.0. As an additional outcome of our discussion, we now welcome Eduardo as our new lead developer, but more on that in a separate blog post. Thank you for your support and cheers from Berlin!
A new version of Gephi has been released! Thanks to Eduardo’s relentless issue fixing, Gephi’s overall stability has been improved. Eduardo is the author of the Data Laboratory, and on this occasion he revamped its CSV importer for a more flexible and straightforward user experience.
The new CSV/spreadsheet importer
Did you know that Gephi can export and import just the table of nodes or the table of edges? This feature is useful in many situations, for instance to produce charts in Excel or to clean data in Open Refine. Below we will showcase the new features and more generally explain how to import a spreadsheet as a list of nodes.
To import a spreadsheet you have to reach the Data Laboratory and click on Import Spreadsheet. In the example below a network is already loaded: we will decide later whether the imported nodes will be merged into the existing ones or not.
Gephi is now able to recognize the type of file you upload, and the support of Excel files has been added. Choosing the right separator is crucial since improperly separated columns would compromise the data. In the example below Gephi recognized that the separator is the Comma (as in a properly formatted CSV file).
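To make the separator question concrete, here is a minimal sketch in Python of how a separator can be guessed from a sample of the file. It is not Gephi’s implementation (this post does not describe the detection logic); the function name guess_separator, the list of candidate separators and the two sample tables are assumptions made for illustration. It only relies on csv.Sniffer from the Python standard library.

```python
import csv


def guess_separator(sample: str, candidates: str = ",;\t|") -> str:
    """Guess which candidate character separates the columns of `sample`."""
    # csv.Sniffer applies heuristics to the sample, for instance checking
    # whether a candidate occurs the same number of times on every line.
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


# Hypothetical node tables (illustrative data, not from a real dataset).
comma_table = "Id,Label,Country\nn1,Alice,FR\nn2,Bob,US\n"
semicolon_table = "Id;Label;Country\nn1;Alice;FR\nn2;Bob;US\n"

print(repr(guess_separator(comma_table)))
print(repr(guess_separator(semicolon_table)))
```

Restricting the guess to a short list of plausible separators keeps ordinary letters that happen to appear once per line from being mistaken for a separator, which is one reason a manual override remains useful.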
The encoding of the file is a common issue, notably with languages using accents and special characters. Gephi can guess the encoding and you can manually edit it if necessary. In the example below Gephi correctly guessed the UTF-8 encoding.
Selecting a different encoding would produce errors. Fortunately the Preview table allows you to see them and fix the encoding. In the screenshot below, see how the wrong encoding produces exotic characters in the data.
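The effect of a wrong encoding is easy to reproduce outside Gephi. The sketch below, in Python, decodes the same bytes twice and also shows one naive way to guess an encoding by trying candidates in order. The helper names decode_preview and guess_encoding, the candidate list and the sample text are assumptions for illustration; the post does not say how Gephi guesses.

```python
def decode_preview(raw: bytes, encoding: str) -> str:
    """Decode raw file bytes the way a preview table would display them."""
    # errors="replace" keeps the preview readable even when bytes are invalid.
    return raw.decode(encoding, errors="replace")


def guess_encoding(raw: bytes, candidates=("utf-8", "cp1252", "latin-1")) -> str:
    """Return the first candidate encoding that decodes `raw` without error."""
    for encoding in candidates:
        try:
            raw.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fits the data")


# Hypothetical CSV cell content with accented characters, stored as UTF-8.
raw = "Sébastien,Ibáñez".encode("utf-8")

right = decode_preview(raw, "utf-8")    # accents preserved
wrong = decode_preview(raw, "latin-1")  # each accent becomes two odd characters

print(right)
print(wrong)
print(guess_encoding(raw))
```

Trying UTF-8 first is a common choice because arbitrary non-UTF-8 text rarely happens to be valid UTF-8, whereas Latin-1 accepts any byte sequence and therefore only makes sense as a last resort.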
When you validate these settings, Gephi now opens the exact same panel as when you open a new network. I personally love this addition since it brings more consistency to the user experience. It allows Gephi to provide useful information such as the number of nodes detected or the issues found during the import process.
Do not miss an important feature here: in this panel you decide either to create a new workspace with the imported data or to merge the new nodes with the old ones. This very useful feature was already present when opening a new network, but many users are still unaware that it exists. Be sure to select the Append option if you intend to merge the nodes. In that case, when an imported node has the same Id as an already present one, the new node data will override the old one.
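The Append behavior can be pictured with a small Python sketch in which node tables are dictionaries keyed by Id. This is only an illustration of the rule stated above, not Gephi’s code. The function name append_nodes and the sample data are hypothetical, and the sketch assumes that overriding happens column by column, so columns missing from the imported table keep their old values; the post does not specify that detail.

```python
def append_nodes(existing: dict, imported: dict) -> dict:
    """Merge imported nodes into existing ones, matching on node Id."""
    # Copy the existing table so the original workspace data is left untouched.
    merged = {node_id: dict(attrs) for node_id, attrs in existing.items()}
    for node_id, attrs in imported.items():
        # Same Id: imported values override the old ones (assumed per column).
        # New Id: the node is simply added.
        merged.setdefault(node_id, {}).update(attrs)
    return merged


# Hypothetical workspace content and imported spreadsheet rows.
existing = {
    "n1": {"Label": "Alice", "Country": "FR"},
    "n2": {"Label": "Bob", "Country": "US"},
}
imported = {
    "n2": {"Label": "Robert"},  # same Id as an existing node
    "n3": {"Label": "Carol"},   # new node
}

merged = append_nodes(existing, imported)
print(merged)
```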
A new Gephi version has been released and can be downloaded from gephi.org. This version is an update from the 0.9.0 version released last December and mostly addresses issues discovered since.
One notable improvement is a new localization: German! Gephi is now localized in nine languages (English, French, Spanish, Japanese, Brazilian Portuguese, Russian, Chinese, Czech and German) and we hope to continue the momentum on this effort in the future.
Other notable improvements include a better support for parallel edges, appending to existing workspaces and how filters are saved in .gephi files. More than 60 bugs were fixed with a majority of them reported by the community. Thanks to all users who took the time to help! The complete list of bugfixes and improvements can be found in the changelog on GitHub.
In the next few weeks we would like to focus on documentation, as there are still many features brought in the 0.9.0 version without up-to-date documentation. This is especially important for more complex features such as dynamic graphs, which got a major upgrade.
Last December we asked Gephi users to participate in a survey. The survey’s main objective was to better understand who users are and what kind of projects they work on. One important dimension we wanted to explore was the diversity of the user community. Through the projects we’ve seen in research and on the web we knew that Gephi users were diverse, but we wanted to quantify it. Ultimately, we aim to make the tool better so it supports users’ needs, but this is a process that first requires a good understanding of who the audience is and what their objectives are. Below we summarize our findings about the profile of users, the types of networks they work with and finally useful usage statistics the community can reflect on.
The largest share of Gephi users work in academia. The project started in the academic sphere, from where it has spread into business, artistic and non-profit domains as well. Working at a profit organization is the second most common occupation, which confirms that network analysis is no longer reserved for scientists.
Q12. What is your occupation? n=285; multiple choice
Given that the largest group of users works in academia, it is not surprising that the most common title among Gephi users is researcher.
Q14. What is your title? n=285; multiple choice
The user community is also widely spread around the world. Users from 46 different countries participated in this study. This confirms the importance of localization in as many languages as possible (Gephi currently supports eight). While many countries were represented by only a handful of participants in the study, the largest concentrations of users are, as expected, in the US (23%) and in France (15%). The significant presence in France is explained by Gephi’s presence in the universities and businesses within which it was originally founded.
Social networks are by far the most commonly analyzed type of network when using Gephi. 70% say that they typically analyze social networks when using Gephi. Social media and semantic network analysis are also common and typically analyzed by 46% and 43%, respectively. The rest of the networks are less common, with ecological networks analyzed by about 5% of users.
Despite SNA (Social Network Analysis) being the dominant use, there is a large variety of other uses as well. That said, networks can be analyzed only if the data are accessible and we (the community) still have work to do to ease network collection and formatting.
We always wondered if given occupations are more likely to work with specific types of networks. Based on this study, some differences exist, but they are not as prominent as we expected. We found that people working at profit organizations are more likely to use Gephi to analyze business and financial networks. While in total 24% use Gephi to analyze business networks, the figure is 44% among those who work in a profit organization compared to only 12% among those who do not. Differences for other types of networks were not conclusive.
Q5. What type(s) of network do you typically analyze using Gephi? n=285; multiple choice
Gephi users commonly deal with a wide range of network sizes. Although the typical network has between 100 and 10K nodes, every size from <100 nodes to 1M nodes represents at least 10% of users. In total that is more than 5 orders of magnitude difference in data size, and without taking edges into consideration!
Q6. What is/are the graph size(s) you deal with when working with Gephi? n=285; multiple choice
Q7. And what is the TYPICAL size of a graph that you manipulate with Gephi? n=285; single choice
While more than half of Gephi users have never used Gephi to analyse dynamic networks, the vast majority of the community is likely to use it for that purpose in the future. This confirms the importance of the set of features related to dynamic networks, which has long been one of Gephi’s primary focuses.
Q8. Have you ever used Gephi to work with dynamic networks (networks over time)? n=285; single choice
Q9. How likely are you to use Gephi to analyze dynamic networks (networks over time) in the future? n=285; single choice
Both online and offline sources are important touch points through which people learn about Gephi for the first time. While web search is the most common way people find Gephi, word of mouth remains an important channel and is not to be underestimated.
Q2. How did you first learn about Gephi? n=285, single choice
The community is very diverse when it comes to usage frequency, which suggests that Gephi users are likely to have diverse needs. Occasional users are likely to have different expectations of the software than regular users. About one third use Gephi at least once a week, which confirms that there is a relatively large base of heavy users who use Gephi regularly.
Q3. On average, how often do you use Gephi? n=285; single choice
Online tutorials and online forums are key sources for users to learn about Gephi. This confirms the importance of creating and updating online tutorials. It also suggests that the community is engaged enough for users to answer one another's questions on online forums and groups.
Q4. What source(s) have you used/are you using to learn how to use Gephi? n=285, multiple choice
This survey is a first, yet important step in understanding the Gephi user community at large. It also gives a general overview of the network visualization and analytics field and we hope this can be useful for others as well. But for us – the Gephi leadership team – this will help us in our future community management efforts. It will also help design a better tool in the future as we better understand its user community.
In addition, talking about what kinds of projects users work on also helps shape the understanding of what network analytics is used for, and ultimately brings more people to the community. In the near future we want to double down on this topic and start a series of articles highlighting the most interesting projects. Many of the respondents indicated their willingness to share what they have worked on, so there’s already plenty to choose from.
Finally, we believe the diversity of users simply reflects the fact that networks are everywhere. Analyzing networks brings insights and answers to many different problems.
The survey was conducted among the Gephi user community. While the results provide a unique view into the Gephi community, it is important to clarify that they are not meant to be representative of the entire community worldwide.
The survey invitations were distributed throughout the week of Dec 1st 2015 via email, Twitter and Facebook.
The final data set contains responses collected between Dec 1st 2015 and Dec 23rd 2015.
We’re proud to announce the release of the next major version of Gephi! This 0.9.0 version has been more than three years in the making but today brings an exciting new life to this project, and the graph/network analytics community at large. You can download it here for Windows, Mac OS X and Linux.
Gephi is the leading graph visualization software, known as the “Photoshop for networks”, and it is open-source and free. It has been downloaded more than a million times and is used by many scholars and data scientists around the world. This new release brings new features in the area of dynamic networks (i.e. networks over time) and major compatibility and performance improvements.
Since the last release in 2013, users have been facing compatibility issues with Java, which are resolved with this release. Development had slowed down three years ago but never stopped. In fact, in March 2013 the time had come to think about what Gephi 1.0 would look like, and to realize it needed a new core. This was by far the most complex project the team had to tackle, but the developers had a long-term vision and knew that future developments would rely on a robust and extensible core with world-class performance.
The world is increasingly complex and interconnected. Gephi’s purpose is to unfold this complex relational data in a way anyone can understand. It allows you to visualize graph data as a map and create the visualizations to support your narratives. State-of-the-art algorithms make readable layouts, highlighting communities or influential nodes. Visual tools tweak colors and shapes to reveal hidden patterns in the data, helping to solve complex problems. More and more network maps appear in the online and offline press and in other communication media. They spread from science to business, art and activism. People are increasingly exposed to them and learn how to read them. Gephi aims to accelerate this commoditization process by providing free and easy-to-use tools.
What’s new in Gephi 0.9?
The list is too long! The complete changelog for this version can be found on GitHub’s release page.
There are a few immediate next steps coming up right after this release. Following up on the recent plugin development announcement, we’ll get in touch with plugin developers and start migrating plugins to this version. There are more than 80 plugins to update!
Then, we’ll identify and resolve new issues that appeared with this version. A future Gephi 0.9.1 release will come next year to address those.
A Gephi Toolkit release will also be made very soon so developers can update their applications built on top of Gephi’s modules. In the meantime, we’re interested in users’ feedback and want to hear from you on Twitter or Facebook. Issues can also be reported directly on GitHub, where the developers are.
Finally, thanks to all contributors and the community for supporting this project!
Since the introduction of the Gephi Marketplace and tools such as the Plugins Bootcamp we’ve seen more and more plugins being developed. Even developers with little experience with Java give it a try and succeed in creating their first plugin. We want developers to be productive and make it as easy as possible to get started with plugin development and find help along the way. As the release of the 0.9 version is near, it’s time to review our plan on that matter and upcoming improvements. Here’s the summary:
The gephi-plugins base repository (i.e. the repository plugin developers fork) now uses Maven for building and is simpler. It contains only 4 files versus 890 for the Ant-based system.
All Gephi modules are published on Maven Central, making them very easy to inspect and extend.
The submission and review of plugins will be entirely based on GitHub, making it more scalable and transparent.
A new online portal for plugins is coming up with an easier edit experience and new features.
From Ant to Maven
Before diving into plugins, let’s first review what has changed in how Gephi is compiled, built and packaged, as this directly affects plugins as well. Since the Gephi 0.8.2 version we have migrated our build system from Ant to Maven. This is in line with what the community around the Netbeans Platform (which Gephi is based on) recommends. It has already increased the level of automation we’re capable of. The main benefits compared to Ant are:
Maven is great at dependency management. It’s now very clear what version of what library Gephi depends on, making it simpler to integrate. Dependencies are also downloaded automatically instead of being checked into the codebase.
Unlike the Ant-based system, it’s independent from Netbeans. This allows developers not using Netbeans to develop Gephi and produce a build entirely from the command-line.
Gephi modules can now be placed on Maven Central (i.e. global repository where Maven finds its dependencies). This allows plugins to automatically find the Gephi dependencies online, reducing the manual steps at each Gephi upgrade.
There are a few critical steps we want to help plugin developers with, and as a result we started the development of a custom Maven plugin. This new tool works behind the scenes when developers build their plugin. No installation or configuration is needed, as it already comes as a dependency of the gephi-plugins module. It already addresses common pain points, and we hope to automate more and more of the steps in the future. This is what it can do as of today:
Plugin validation: The assistant reviews the plugin configuration and metadata at each build. For instance, it can check whether the plugin depends on the correct Gephi version, or remind the developer to define an author or license in the plugin's configuration.
Run Gephi with plugins: A single command runs Gephi with the plugins pre-installed. This makes testing faster than ever when developing plugins.
New plugin generator: A step-by-step command-line tool that creates the correct folder structure and configuration to get started.
In the future, we want to rely on this build assistant to further automate the process, for example with easy migration or code generation. You could ask it to generate the code and configuration for a Layout plugin, and all that would remain is to fill in the blanks in the code.
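To make the validation step described above more concrete, here is a minimal Java sketch of the kind of checks mentioned (target Gephi version, author, license). It is not the code of the Gephi Maven plugin: the `PluginMetadata` and `PluginValidator` classes, their fields and the `validate` method are hypothetical names invented for illustration, and the real tool may work quite differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: NOT the actual Gephi Maven plugin code.
// All class, field and method names are invented for illustration.
class PluginMetadata {
    final String name;
    final String gephiVersion; // Gephi version the plugin declares it depends on
    final String author;       // may be null or blank if the developer forgot it
    final String license;      // may be null or blank if the developer forgot it

    PluginMetadata(String name, String gephiVersion, String author, String license) {
        this.name = name;
        this.gephiVersion = gephiVersion;
        this.author = author;
        this.license = license;
    }
}

class PluginValidator {

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Returns human-readable problems; an empty list means all checks passed.
    static List<String> validate(PluginMetadata plugin, String expectedGephiVersion) {
        List<String> problems = new ArrayList<>();
        if (!expectedGephiVersion.equals(plugin.gephiVersion)) {
            problems.add(plugin.name + ": depends on Gephi " + plugin.gephiVersion
                    + " but " + expectedGephiVersion + " is expected");
        }
        if (isBlank(plugin.author)) {
            problems.add(plugin.name + ": no author defined");
        }
        if (isBlank(plugin.license)) {
            problems.add(plugin.name + ": no license defined");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Made-up plugin metadata, used only to exercise the checks.
        PluginMetadata plugin = new PluginMetadata("MyLayout", "0.8.2", "Jane Doe", "");
        for (String problem : validate(plugin, "0.9.0")) {
            System.out.println("WARNING " + problem);
        }
    }
}
```

Returning a list of problems rather than failing on the first one matches the idea of a reminder at each build: the developer sees everything that needs attention in one pass.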
A new way to review and submit plugins
As the number of plugins grows, it’s important to have a clear process for how plugins are reviewed and updated. We also want this process to be transparent and open to the community. So far, the process was based on the submission of the plugin binaries with a manual review done by the team. This helped us get where we are today, but we want to take it to the next level and propose to move this process entirely to GitHub, using the pull-request mechanism. This has multiple advantages, listed below:
Reviewing new/updated plugins can scale because any developer can read the code and contribute to the pull requests.
Developers are already asked to fork the gephi-plugins repository so submitting the plugin via GitHub is a natural extension to it.
There’s a clear history of each version, comment and what code has changed from one version to another.
It makes it easier to test plugins and detect issues before the plugin is approved.
As part of this migration, we’ll no longer add plugins with closed source code but all existing plugins for Gephi 0.8.2 will remain available. For security and stability reasons, it’s essential that each plugin’s code can be inspected before approval. In order for this to work, all existing plugins not already on GitHub or not forking the gephi-plugins repository will need to migrate. For those already set up, the migration will be easier but Ant-based plugins will still need to migrate to Maven.
To summarize, this is what the new 4-step process looks like for developers:
In the current submission process we ask for additional information such as description, author or license as well as allow the upload of images. Going forward with GitHub, all of these data will directly be defined in the plugin’s configuration making it easier to update.
A new home for plugins (again)
Plugins are currently available online from the Gephi Marketplace, where users could also reach people providing teachings and support. We have ideas on how to improve these community services and will be migrating them to a new architecture, starting with the plugins. We will tell you more about these changes in an upcoming post but for now our focus is on developing a new lightweight plugin portal that can directly be connected with the data source on GitHub.
Here is a preview of what it will look like for plugin pages:
The content of this website will be automatically updated when plugins are published or updated. It works by having Travis CI (a continuous integration platform) refresh the JSON file after each change to the plugin repository on GitHub. Developers can even embed images and write the description in Markdown. This will entirely remove the need for plugin developers to log in to the marketplace and update NBMs and metadata.
This new Maven-based repository along with the new submission process will be introduced with the Gephi 0.9 release. Let’s review what plugin developers need to know to bring their plugin to this new major version.
As with all major Gephi releases, plugin compatibility needs to be evaluated as APIs may have changed. In fact, given that this new version is based on an entirely redeveloped core, it’s very likely code changes will be required. Hopefully, these changes will often be minor and actually simplify things (i.e. less, more efficient code). Documentation will be published on these API changes and core developers will be available to answer questions as well.
Plugin developers will also be contacted about moving their code to GitHub, with a step-by-step guide. We’re considering adding a migrate command to the new Gephi Maven plugin to facilitate the transition from Ant, but that’s an unfunded project at the moment (if you’re interested in contributing to that, please let us know). Stay tuned for details on the path to migration right after the release.
And again, thanks for all your hard work on bringing your ideas to life through new Gephi plugins!
Gephi is a graph visualization and analysis platform: the entire tool revolves around the graph the user is manipulating. All modules (e.g. filter, ranking, layout etc.) touch the graph in some way or another and everything happens in real-time, reflected in the visualization. It’s therefore extremely important to rely on a robust and fast underlying graph structure. As explained in this article we decided in 2013 to rewrite the graph structure and started the GraphStore project. Today, this project is mostly complete and it’s time to look at some of the benefits GraphStore is bringing into Gephi (whose 0.9 release is approaching).
Performance is critical when analyzing graphs. A lot can be done to optimize how graphs are represented and accessed in the code but it remains a hard problem. The first versions of Gephi didn’t always shine in that area, as the graphs used a lot of memory and some operations such as filtering were slow on large networks. A lot was learnt though, and when the time came to start from scratch we knew what would move the needle. Compared to the previous implementation, GraphStore uses simpler data structures (e.g. more arrays, fewer maps) and cache-friendly collections to make common graph operations faster. Along the way, we relied on many micro-benchmarks to understand what was expensive and what was not. As often with Java, this can lead to surprises but it’s a necessary process to build a world-class graph library.
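As an illustration of the "more arrays, fewer maps" idea, the sketch below keeps nodes in a plain array indexed by a dense integer id and recycles freed slots, so that iteration becomes a linear scan over contiguous memory. This is a toy written under our own assumptions, not GraphStore's actual code; `ArrayNodeStore` and all of its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Objects;

// Illustrative toy only: NOT GraphStore's implementation. Names are hypothetical.
class ArrayNodeStore {
    private String[] labels = new String[16];                    // slot index == node id
    private final Deque<Integer> freeSlots = new ArrayDeque<>(); // ids of removed nodes
    private int nextId = 0; // first never-used slot
    private int size = 0;   // number of live nodes

    // Adds a node and returns its integer id, reusing a freed slot when possible.
    int add(String label) {
        Objects.requireNonNull(label, "label"); // null marks an empty slot
        int id;
        if (!freeSlots.isEmpty()) {
            id = freeSlots.pop();
        } else {
            if (nextId == labels.length) {
                labels = Arrays.copyOf(labels, labels.length * 2); // grow by doubling
            }
            id = nextId++;
        }
        labels[id] = label;
        size++;
        return id;
    }

    // Removes a node by id; its slot is remembered for later reuse.
    void remove(int id) {
        if (id < 0 || id >= nextId || labels[id] == null) {
            throw new IllegalArgumentException("No node with id " + id);
        }
        labels[id] = null;
        freeSlots.push(id);
        size--;
    }

    // Lookup is a bounds check plus an array access, with no hashing involved.
    String get(int id) {
        return (id >= 0 && id < nextId) ? labels[id] : null;
    }

    int size() {
        return size;
    }

    // Iteration is a linear scan over a contiguous array, skipping empty slots.
    String[] liveLabels() {
        String[] out = new String[size];
        int k = 0;
        for (int id = 0; id < nextId; id++) {
            if (labels[id] != null) {
                out[k++] = labels[id];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ArrayNodeStore store = new ArrayNodeStore();
        int a = store.add("a");
        store.add("b");
        store.remove(a);
        int c = store.add("c"); // takes over the slot freed by "a"
        System.out.println("c reuses id " + c + ", live nodes: " + store.size());
    }
}
```

The design choice here is to trade a little bookkeeping (the free-slot stack) for ids that stay dense, which keeps the backing array compact even after many removals.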
We wanted to compare Gephi 0.8.2 and Gephi 0.9 (development version), so we created a benchmark to test the most common graph operations. Here is what we found. The table below shows the relative improvement between the two versions. For instance, “2X” means that the operation completes twice as fast. A benchmarking utility was used to guarantee the precision of the measurements, and each scenario was run at least 20 times, and up to 600 times in some cases. We used two different classic graphs, one small (1.5K nodes, 19K edges) and one medium (83K nodes, 68K edges). Larger graphs may be evaluated in a future blog article.
Graphs: SMALL (n=1490, e=19025) and MEDIUM (n=82670, e=67851)
Benchmarks: Add & Remove Nodes, Add & Remove Edges, Iterate Nodes In View, Iterate Edges In View, Project File Size
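To read figures of this kind: a relative improvement such as “2X” is simply the old running time divided by the new one. The helper below is a minimal sketch of that arithmetic, together with a naive repeat-and-average timing loop. It is not the benchmarking utility used for the measurements above, and the names `Speedup`, `averageNanos`, `ratio` and `format` are hypothetical. For real measurements a dedicated benchmarking harness is preferable, since JIT warm-up and garbage collection can distort a simple loop like this one.

```java
import java.util.Locale;

// Illustrative helper only; names are hypothetical and not part of Gephi.
class Speedup {

    // Average wall-clock time of `task` over `runs` executions, in nanoseconds.
    static double averageNanos(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            total += System.nanoTime() - start;
        }
        return (double) total / runs;
    }

    // Relative improvement: 2.0 means the new version takes half the time.
    static double ratio(double oldNanos, double newNanos) {
        return oldNanos / newNanos;
    }

    // Formats a ratio in the "2.0X" style.
    static String format(double ratio) {
        return String.format(Locale.ROOT, "%.1fX", ratio);
    }

    public static void main(String[] args) {
        // Made-up timings, used only to show the arithmetic.
        double oldNanos = 200.0;
        double newNanos = 100.0;
        System.out.println(format(ratio(oldNanos, newNanos)));
    }
}
```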
These benchmarks show pretty remarkable improvements in common operations, especially read operations such as node or edge iteration. For instance, on average it takes 40 to 100 times less CPU to read all the edges in the graph. Although this benchmark focuses on low-level graph operations, it will bring material improvements to user-level features such as filter or layout. The way GraphStore creates views is different from what we were doing before and no longer requires a deep graph copy, which explains the large difference. Finally, only the set attribute operation is significantly slower, but that can be explained by the introduction of inverted indices, which are updated when attributes are set.
And what about memory usage? Saving memory has been one of our obsessions and there’s good news to report on that front as well. Below is a quick comparison between Gephi 0.8.2 and Gephi 0.9 for the same medium graph above.
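The trade-off around attributes can be sketched as follows: maintaining an inverted index (attribute value to the elements holding it) adds work to every write, but turns a lookup by value into a single map access instead of a scan over all elements. This is a simplified illustration under our own assumptions, not GraphStore's code; `IndexedColumn` and its methods are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Simplified illustration: NOT GraphStore's implementation. Names are hypothetical.
class IndexedColumn {
    private final Map<Integer, String> valueByNode = new HashMap<>();       // node id -> value
    private final Map<String, Set<Integer>> nodesByValue = new HashMap<>(); // value -> node ids

    // Every write also maintains the index: drop the old entry, add the new one.
    void set(int nodeId, String value) {
        Objects.requireNonNull(value, "value");
        String old = valueByNode.put(nodeId, value);
        if (old != null) {
            Set<Integer> oldNodes = nodesByValue.get(old);
            oldNodes.remove(Integer.valueOf(nodeId));
            if (oldNodes.isEmpty()) {
                nodesByValue.remove(old); // keep the index free of empty buckets
            }
        }
        nodesByValue.computeIfAbsent(value, v -> new HashSet<>()).add(nodeId);
    }

    // Lookup by value is a single map access thanks to the index.
    Set<Integer> nodesWith(String value) {
        Set<Integer> nodes = nodesByValue.get(value);
        return nodes == null
                ? Collections.<Integer>emptySet()
                : Collections.unmodifiableSet(nodes);
    }

    // Number of distinct values currently held by at least one node.
    int distinctValues() {
        return nodesByValue.size();
    }

    public static void main(String[] args) {
        IndexedColumn color = new IndexedColumn();
        color.set(1, "red");
        color.set(2, "red");
        color.set(1, "blue"); // moves node 1 from the "red" bucket to "blue"
        System.out.println("red nodes: " + color.nodesWith("red"));
        System.out.println("distinct values: " + color.distinctValues());
    }
}
```

A filter on an attribute value is one example of a feature that can benefit: it can ask the index for the matching nodes directly rather than testing every node.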
Graph with 5 attribute columns
This benchmark shows a clear reduction of memory usage in Gephi’s next version. How much? It’s hard to say as it really depends on the graph but the denser (i.e. more edges) and the more attributes, the more memory saved as significant improvements have been made in these areas. Dynamic graphs (i.e. graphs that have their topology or attributes change over time) will also see a big boost as we’ve redesigned this part from scratch.
All of the GraphStore project's benefits are included in the upcoming 0.9 release, and that’s the most important thing. However, the work doesn’t end here and there are many more features and performance optimizations that can be added.
Then, we count on the community’s help to start collaborating with us on the GraphStore library – calling all database and performance experts. GraphStore will continue to live as an all-purpose Java graph library, released under the Apache 2.0 license and independent from Gephi (i.e. Gephi uses GraphStore but not the opposite). We hope to see it used in other projects in the near future.