A close look at the Gephi user community

Last December we asked Gephi users to participate in a survey. The survey’s main objective was to better understand who the users are and what kinds of projects they work on. One important dimension we wanted to explore was the diversity of the user community. Through the projects we’ve seen in research and on the web we knew that Gephi users were diverse, but we wanted to quantify it. Ultimately, we aim to make the tool better so it supports users’ needs, but this first requires a good understanding of who the audience is and what their objectives are. Below we summarize our findings about the profile of users, the types of networks they work with and, finally, usage statistics the community can reflect on.

Profile

The largest share of Gephi users work in academia. The project started in the academic sphere, from where it has spread into business, artistic and non-profit domains as well. Working at a for-profit organization is the second most common occupation, which confirms that network analysis is no longer reserved for scientists.

surveyq12

Q12. What is your occupation? n=285; multiple choice

Given that the largest group of users works in academia, it is not surprising that the most common title among Gephi users is researcher.

surveyq14

Q14. What is your title? n=285; multiple choice

The user community is also spread widely around the world. Users from 46 different countries participated in this study. This confirms the importance of localization into as many languages as possible (Gephi currently supports eight). While many countries were represented by only a handful of participants, the largest concentrations of users are, as expected, in the US (23%) and in France (15%). The significant presence in France is explained by Gephi’s presence in the universities and businesses within which it was originally founded.

Networks

Social networks are by far the most commonly analyzed type of network in Gephi: 70% of respondents say that they typically analyze social networks. Social media and semantic networks are also common, typically analyzed by 46% and 43% of respondents, respectively. The remaining network types are less common, with ecological networks analyzed by about 5% of users.

Despite SNA (Social Network Analysis) being the dominant use, there is a large variety of other uses as well. That said, networks can be analyzed only if the data are accessible, and we (the community) still have work to do to ease network collection and formatting.

We always wondered whether given occupations are more likely to work with specific types of networks. Based on this study, some differences exist, but they are not as prominent as we had expected. We found that people working at for-profit organizations are more likely to use Gephi to analyze business and financial networks. While 24% of all respondents use Gephi to analyze business networks, the figure is 44% among those who work at a for-profit organization, compared to only 12% among those who do not. Differences for other types of networks were not conclusive.
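
As a rough consistency check on the three business-network percentages above, the overall rate can be read as a weighted average of the two group rates. The sketch below solves for the weight, that is, the share of respondents at profit organizations that the rounded figures imply. This is a back-of-the-envelope derivation, not a number reported by the survey, and the variable names are illustrative only.

```python
# Rounded percentages quoted in the text (business-network analysis).
overall = 0.24        # all respondents
among_profit = 0.44   # respondents working at a profit organization
among_others = 0.12   # respondents not working at a profit organization
n_respondents = 285   # survey sample size

# Weighted average: overall = p * among_profit + (1 - p) * among_others.
# Solving for p gives the implied share of profit-organization respondents.
p = (overall - among_others) / (among_profit - among_others)

# Approximate head count implied by that share (an estimate, not survey data).
implied_respondents = round(p * n_respondents)

print(f"implied share: {p:.1%}, about {implied_respondents} respondents")
```

Because the inputs are rounded to whole percentages, the implied share is only approximate: shifting each input by half a point moves it between roughly 34% and 41%.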

surveyq5

Q5. What type(s) of network do you typically analyze using Gephi? n=285; multiple choice

Gephi users commonly deal with a wide range of network sizes. Although the typical network has between 100 and 10K nodes, every size from <100 nodes to 1M nodes is used by at least 10% of users. In total that is a difference of more than 5 orders of magnitude in data size, and that is without taking edges into consideration!

surveyq6q7

Q6. What is/are the graph size(s) you deal with when working with Gephi? n=285; multiple choice
Q7. And what is the TYPICAL size of a graph that you manipulate with Gephi? n=285; single choice

While more than half of Gephi users have never used Gephi to analyse dynamic networks, the vast majority of the community is likely to do so in the future. This confirms the importance of the features related to dynamic networks, which have long been one of Gephi’s primary focuses.

surveyq8

Q8. Have you ever used Gephi to work with dynamic networks (networks over time)? n=285; single choice

surveyq9

Q9. How likely are you to use Gephi to analyze dynamic networks (networks over time) in the future? n=285; single choice

Usage

Both online and offline sources are important touch points through which people learn about Gephi for the first time. While web search is the most common way people find Gephi, word of mouth remains an important channel and is not to be underestimated.

surveyq2

Q2. How did you first learn about Gephi? n=285, single choice

The community is very diverse when it comes to usage frequency, which suggests that Gephi users are likely to have diverse needs. Occasional users are likely to have different expectations of a piece of software than regular users. About one third use Gephi at least once a week, which confirms that there is a relatively large base of heavy users.

surveyq3

Q3. On average, how often do you use Gephi? n=285; single choice

Online tutorials and online forums are key sources for users to learn about Gephi. This confirms the importance of creating and updating online tutorials. It also suggests that the community is engaged enough for users to answer one another’s questions on online forums and groups.

surveyq4

Q4. What source(s) have you used/are you using to learn how to use Gephi? n=285, multiple choice

Conclusion

This survey is a first, yet important, step in understanding the Gephi user community at large. It also gives a general overview of the network visualization and analytics field, and we hope it can be useful for others as well. For us, the Gephi leadership team, it will help in our future community management efforts. It will also help us design a better tool, as we now better understand its user community.

In addition, talking about the kinds of projects users work on helps shape the understanding of what network analytics is used for, and ultimately brings more people to the community. In the near future we want to double down on this topic and start a series of articles highlighting the most interesting projects. Many of the respondents indicated their willingness to share what they have worked on, so there’s already plenty to choose from.

Finally, we believe the diversity of users simply reflects the fact that networks are everywhere. Analyzing networks brings insights and answers to many different problems.


Appendix
  • The survey was conducted among the Gephi user community. While the results provide a unique view into the Gephi community, it is important to clarify that they are not meant to be representative of the entire community worldwide.
  • The survey invitations were distributed throughout the week of Dec 1st 2015 via email, Twitter and Facebook
  • Final data set contains responses collected between Dec 1st 2015 and Dec 23rd 2015
  • A total of 285 participants completed the survey

"Everything looks like a graph, but almost nothing should ever be drawn as one."

seb

I was irked by this statement made by Ben Fry in the book ‘Visualizing Data’ (2008). Although I have great respect for Ben Fry’s work, and his position may have evolved since then, I want to temper this statement so that data explorers like danbri can form their own opinion.

Ben Fry in ‘Visualizing Data’:

Graphs can be a powerful way to represent relationships between data, but they are also a very abstract concept, which means that they run the danger of meaning something only to the creator of the graph. Often, simply showing the structure of the data says very little about what it actually means, even though it’s a perfectly accurate means of representing the data. Everything looks like a graph, but almost nothing should ever be drawn as one.

There is a tendency when using graphs to become smitten with one’s own data. Even though a graph of a few hundred nodes quickly becomes unreadable, it is often satisfying for the creator because the resulting figure is elegant and complex and may be subjectively beautiful, and the notion that the creator’s data is “complex” fits just fine with the creator’s own interpretation of it. Graphs have a tendency of making a data set look sophisticated and important, without having solved the problem of enlightening the viewer.

I totally disagree. Look at this simple plot:

pareto_convergence_r050a01

Can anyone tell me how one is enlightened simply by looking at this plot, if I don’t explain how it was made and what is interesting to look at? Yet it appears very simple: only one curve, something you have been used to seeing since you discovered this kind of drawing in primary school. And even if I give some insight into how I made it and the context of the work, I am still, as the creator, the only one able to deeply understand the information that can be extracted, because I know the process that built the underlying data. To criticize my conclusions, you would need to learn as much as I did, get the same data and apply the same manipulations. Whether it is curation, reformatting, filtering or any algorithm used to capture, extract and use the data, each action has an impact on the meaning carried by the data. Graph visualization is no exception: it is like any other plot, except that you can’t hide the structural complexity without explicit filtering.

Let’s enumerate all the dimensions used in a graph visualization: x and y coordinates, size of nodes, color of nodes, thickness of edges. Admittedly, it is not easy to read 5 dimensions at once. But is the “simple” plot a better deal? It has x and y coordinates, so only 2 dimensions (we might also have used colors and dot sizes, and reached 4 dimensions). So you might think that you and your readers can interpret it easily and reliably. You would be wrong, because of the hidden dimension: scaling.

Here you see a plot in log-lin scale, which means that the y-axis is on a logarithmic scale while the x-axis is on a linear scale. I found this visual pattern interesting in these data because of my research question, because I understand the process that produced them, and because I found it at this particular scale. Plotted in lin-lin scale, the data reveal less information. Or maybe I should use a cumulative function to plot my data? Maybe an inverse cumulative? And so on. An exploration of both the data and the projection techniques is required.
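
To make the effect of scaling concrete, here is a minimal sketch in Python. It does not use the data behind the plot above: the series is a hypothetical exponential decay, and `fit_line` is an illustrative helper written for this example. The same points are fitted with a straight line twice, once on raw values (lin-lin) and once after taking the logarithm of y (log-lin).

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares line fit; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical data (an assumption for illustration): y = 1000 * exp(-0.5 * x).
xs = list(range(20))
ys = [1000.0 * math.exp(-0.5 * x) for x in xs]

# lin-lin: a straight line describes the raw values poorly.
_, _, r2_linlin = fit_line(xs, ys)

# log-lin: same points, y replaced by log10(y); the curve becomes a line.
log_ys = [math.log10(y) for y in ys]
slope_loglin, _, r2_loglin = fit_line(xs, log_ys)

print(f"lin-lin R^2: {r2_linlin:.3f}")
print(f"log-lin R^2: {r2_loglin:.3f}, slope: {slope_loglin:.4f}")
```

In this toy case the log-lin projection turns the curve into a straight line whose slope encodes the decay rate, while the lin-lin fit is poor. Real data are rarely this clean, which is why several projections usually need to be tried.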

By choosing one projection, I focus on something very particular in the data, and I still need other plots and statistical tests a) to decide whether it supports a hypothesis I have in mind, or b) to see whether I can find something new, something unexpected. The distortion of vision is therefore both an issue and a tool for digging deeper into the data. I could also draw very wrong conclusions, even from analyzing this simple drawing, so why should external readers be any better protected? There is a balance to find between a drawing that looks so simple to read that conclusions appear obvious (even if they are not, and you might be wrong), and the opposite, one that looks so complex to read that few conclusions will be drawn, if any. Hence it is a fallacy to argue that graphs are meaningful only for their creator, because this is the case for any plot taken alone, and entering into somebody else’s work is hard anyway.

So graph visualization is not inherently worse than any other data drawing: we just don’t teach how to read it in primary school. Do you remember the first time you saw a plot? I guess you found it really abstract. Most people don’t really know what to look at in a graph, and produce visualizations that don’t show anything in particular. I personally think that this is a good thing, because, put in context, graph visualization is very young compared to other data drawings, and a language of networks that combines layout algorithms and visual variables is still in the making. Moreover, after meeting and talking with people who publish such visuals, it seems that they already use them in a pragmatic way: by showing their complexity, graphs communicate to the reader that a) the data might contain interesting information (“so please, read until the end!”), and b) the authors did the work and propose some findings, but it was hard and many other things could be done (“hey, try it yourself!”). It is useless to discover the secrets of the universe if nobody listens to you. Before enlightening viewers, one should attract them enough that they will take the time to read, and graphs are useful for that.

But drawing graphs as graphs is not only useful for communication. Their primary use for researchers is exploratory analysis, when the study does not focus solely on the structure of the data but the elements matter in context, because you have prior knowledge of them and your questions relate to another perspective (say, sociology). Take the example of our work at Sciences-Po, where we teach the mapping of controversies to students who will become the future decision makers of companies or public policy. Part of the controversies in the public space are expressed on the Web. The dynamics of the discussions and the hyperlink structure of the Web make this field particularly hard to investigate. We successfully use graph visualization of websites to help the students orient themselves in this space, to assist and justify the classification of websites, and to assert the position of the actors of a controversy. This is just one case among others where there is currently no viable alternative to graph drawing and its synoptic property (seeing the whole without reduction of data).

Finally, the usages of graph drawing are multiplying as it becomes mainstream and more people become acculturated to it. I trust people to innovate and progressively learn how to read graphs and extract information from them. Just practice.