Gephi code sustainability retreat 2021: debriefing

From 29 November to 3 December 2021, 6 people met in Copenhagen to make the Gephi codebase more sustainable, discuss the project, pave the way to the long-waited version 1.0, and improve the tool overall. Before the event even started, it had already tackled its main goal. In this post we present what we have done.

The event was sponsored by the Aalborg University TANT Lab in Copenhagen, whom we warmly thank, because they basically funded the whole thing, and hosted us. The participants featured a whooping 50% of Mathieux, and more importantly, a healthy mix of skills:

  • Mathieu Bastian, who was Gephi’s lead dev for many years, and knows the codebase like none;
  • Mathieu Jacomy, who designed Gephi’s UX and some of its algorithms, and organized the event;
  • Matthieu Totet, who authored the Gephi Twitter plugin;
  • Eduardo Ramos Ibáñez, Gephi’s current lead developer;
  • Tiago Peixoto, world-class expert on community detection;
  • and Antonin Delpeuch, the main developer of Open Refine.

We spend the first two days discussing and preparing stuff, and the last three to code. The coding went surprisingly well, compared to many comparable situations, for example hackathons. That is why I wrote above that it had tackled its main goal before it even started: the codebase is sustainable. I know someone who cleans their house before the house cleaning service comes, because they don’t want to expose the actual messiness of their lives. Similarly, Mathieu Bastian and Eduardo had so well prepared the codebase in anticipation of the retreat that there was nothing left to do on that front. Bravo!

Code sustainability was our main goal, but that is not where it ends. We changed our perspective in the process. Let’s start with the main takeaways: goals met or not, and what we have learned.

Takeaways

Code sustainability. The codebase is in a good shape and developers can healthily engage with it. However, we can improve the documentation and entry points for aspiring developers.

Enrolling new developers. This did not work so well, but we will get there. We hoped for more new developers to come to this retreat. We had room and funding for two more people, so our call for participation did not work great. However, some developers manifested themselves during the event, when we started communicating about the retreat on Twitter. We aim at more participants next time.

Anticipating the fundraising phase. We learned a lot from the Open Refine project through Antonin. They have a fiscal sponsor: a structure that represents them legally and manages money, allowing them to collect funds and pay developers. This is what we need, and our next step will be to seek one.

Goodwill. There is still a huge amount of it around the project. The response of the public to this event was outstandingly positive. This is important notably to raising funds.

Dev infrastructure. The GitHub issues system, that we use to track and fix bugs, does not work that great for us in practice. We are thinking of an alternative.

Web presence. Reflecting on our website, blog, and online tools was an explicit non-goal: we decided to focus on that another time. But we could see that it definitely required a good reworking. We are aware of it.

Governance. As the project involves funds and more people, we will need to change our model. We want to talk about it with potential fiscal sponsors.

Mailing list. We have to make one, dedicated to developers, at least for the moment.

Achieved

Here is a summary of what we have achieved during the three coding days. Note that it does not directly translate into a release. You will have to wait for Christmas at least for that (it requires more work).

  • We triaged a lot of issues and tested the contribution process.
  • We fixed a bunch of bugs on our bug bash.
  • We defined and enforced code style on the repository, making it simpler going forward to collaborate between developers.
  • We made the project saving/loading more resilient, preventing users from losing their work due to corrupted .gephi files.
  • We embedded the Java JRE on the Windows and Linux installation, so that users don’t need to install Java by themselves anymore.
  • We migrated the localization system from Transifex to Weblate, making it easier to translate Gephi.
  • We made unit testing easier so that developers are more productive.
  • We integrated the new visualization engine into the Gephi desktop app. We got it working the first day, but without workspace switch support.
  • The following days we implemented the workspace switch, added support for High DPI screens, and got most of the interactivity and tools working fine.
  • We implemented Tiago Peixoto’s statistical inference algorithm for community detection and its unit tests (in progress).
  • We sketched a specification for the undo/redo feature.

Discussed

We discussed the state of the project in various ways during the first two days: its community of developers and users, its scientific state, the infamous (lack of) undo, our road map, governance, funding… We learned a lot from Antonin (Open Refine), and on community detection algorithms from Tiago. That part is hard to transcribe here, but I will write down a few knowledge points we have established.

Our community is broad. It consists of developers and users, and we find those in various scientific fields: digital humanities, SNA (social network analysis), network science (notably teaching). Outside of research, we also find users in data journalism, activism, and in the industry: SEO (search engine optimization), social media listening, patents and papers analysis, intelligence (OSINT), and cybersecurity. We noted that Open Refine does user surveys to know their community better (we’ve done so in 2016).

Gephi is not a commercial product. By that, we mean that we do not want to make Gephi for a specific public. We do not aim at normalizing or formatting usage. We just want to help different kinds of people, even when they want different things. [Mathieu Jacomy’s note: as I am writing this I realize that part of our audience will rightfully remark that we do have methodological commitments and that we necessarily shape usage. Hence this precision:] In other words, contrary to a company whose interest might be to serve certain consumers to the detriment of others, for example because they have higher purchasing powers, we do not have a fixed persona in mind when we make Gephi. We aim at satisfying the existing users including those who have marginal needs. In short, the features are decided on the pragmatic ground of usefulness to people versus implementation difficulty.

Do people leave Gephi and why? We don’t think we have a “users leave Gephi” problem. Here is what we believe: many users naturally move to more advanced tools, yet they may go back to Gephi on various occasions, because it’s easy to use. Using Gephi is sporadic anyway (one does not need to use it every day). That being said, some developers leave Gephi when they write code, because it’s easier to script (e.g. Python).

Web presence. Nice things we want to have: an introductory video to Gephi, a list of the best tutorials produced by the community, a simpler website because there is too much irrelevant information, the content for developers should be moved somewhere else (e.g. GitHub), remove the content about the Gephi Consortium (obsolete), a unified navigation bar over our different online spaces, a YouTube channel, and a way to promote the good content produced by the community.

Book. We would like to write one. Or a MOOC. The Gephi 1.0 release would be the ideal moment.

Community detection. Tiago Peixoto presented his work on the topic. He champions a Bayesian inference approach and considers modularity maximization as an obsolete (disproven) method. He also acknowledged that not everyone necessarily agrees in the research community. He documented his perspective in a series of blog posts (1, 2 & 3). We collectively agreed that we would keep the current popular Louvain method, add the more recent Leiden method, and add Tiago’s statistical inference approach. We will also group them in a specific section of the statistics panel, so that the users can identify them as alternatives, and muscle their own critical thinking by engaging with them.

Undo/redo. We learned how it is done currently in Open Refine, the problems it creates, and how it could be done better. The feature is doable in Gephi, and we sketched an architecture.

Funding. From Antonin’s feedback on the Open Refine project, we realized that what we needed was a fiscal sponsor, for instance Code for Science and Society. We also aggregated a list of possible funders: crowdfunding, Google Summer of Code (good to engage devs over the long run), Outreachy (idem), Chan Zuckerberg Initiative…

Feedback

Here is what we thought of how it went, to remember for next time.

To be improved: organization of travel and accommodation; we should identify decisions when we make them and note them apart; we should record some of the talks, notably the introduction to the codebase; it would be nice to have a (social) occasion to interact with Gephi users; the big map of the source code was not useful.

Went well: the coding was well prepared, thanks to Mathieu Bastian and Eduardo; the coding went often beyond our expectation; knowledge exchange across Gephi and Open Refine was great thanks to the meeting being in person; the live tweeting was engaging to our community including developers; the T-shirts are nice.

To consider for next time: live-streaming moments; a hybrid format or possibly a 100% online edition; preparing explainer videos.

Photos

Here is a small selection ofpictures of us at work, to get you an idea of what it looked like. We hope it makes you consider joining next time! Also, in the meanwhile, consider proposing a talk and attending the Open Research Tools and Technologies devroom where, notably, the Gephi team met Antonin: it’s a great place to meet like-minded people.

Mathieu Jacomy explaining stuff
Tiago Peixoto presenting his approach to community detection. On the left, Antonin Delpeuch at work.
Mathieu Bastian discussing how the Gephi architecture could support an undo/redo à la Open Refine (action stack).

In short, this is the Gephi codebase.

Coding session with Matthieu Totet (left), Eduardo Ramos Ibáñez (center), and Mathieu Bastian (right).

Tiago Peixoto monitoring his algo.

Antonin Delpeuch sieving Gephi issues. Behind, a map of the Gephi source code.
Attempts to track the steps of Tiago’s community detection algorithm during a debugging session.

A social event in the company of Ann-Sofie and Martin Grandjean, who visited us.

The end of the retreat intersected with the university’s Christmas party. More social events!

During this party, Matthieu Totet became a Danish legend (here with Anders Munk on the right).

One cannot fully know what will happen at a Gephi coding retreat. Consider applying in 2022!

4 Comments

Leave a comment