"Everything looks like a graph, but almost nothing should ever be drawn as one."

seb

I am bothered by this statement made by Ben Fry in the book ‘Visualizing Data’ (2008). Although I have great respect for Ben Fry’s work, and his position may have evolved since then, I want to moderate this statement so that data explorers like danbri can form their own opinion.

Ben Fry in ‘Visualizing Data’:

Graphs can be a powerful way to represent relationships between data, but they are also a very abstract concept, which means that they run the danger of meaning something only to the creator of the graph. Often, simply showing the structure of the data says very little about what it actually means, even though it’s a perfectly accurate means of representing the data. Everything looks like a graph, but almost nothing should ever be drawn as one.

There is a tendency when using graphs to become smitten with one’s own data. Even though a graph of a few hundred nodes quickly becomes unreadable, it is often satisfying for the creator because the resulting figure is elegant and complex and may be subjectively beautiful, and the notion that the creator’s data is “complex” fits just fine with the creator’s own interpretation of it. Graphs have a tendency of making a data set look sophisticated and important, without having solved the problem of enlightening the viewer.

I totally disagree. Look at this simple plot:

[Figure: pareto_convergence_r050a01]

Can anyone tell me how this plot alone enlightens a viewer if I don’t explain how it was made and what is interesting to look at? Yet it appears very simple: only one curve, something you have been used to seeing since you discovered this kind of drawing in primary school. And even if I give some insight into how I made it and the context of the work, I am still, as the creator, the only one able to deeply understand the information that can be extracted, because I know the process that built the underlying data. To criticize my conclusions, you will need to learn as much as I did, get the same data, and apply the same manipulations. Curation, reformatting, filtering, or whatever algorithms are used to capture, extract and use the data: each action has an impact on the meaning carried by the data. Graph visualization is no exception; it is like any other plot, except that you can’t hide the structural complexity without explicit filtering.

Let’s enumerate the dimensions used in a graph visualization: x and y coordinates, size of nodes, color of nodes, thickness of edges. Admittedly, it is not easy to read 5 dimensions at once. But is the “simple” plot a better deal? It has x and y coordinates, so only 2 dimensions (we could also have used colors and dot sizes, for a total of 4). So you might think that you and your readers can interpret it easily and reliably. You would be wrong, because of the hidden dimension: scaling.

Here you see a plot on a log-lin scale, which means the y-axis is on a logarithmic scale while the x-axis is on a linear scale. I found this visual pattern interesting in these data because of my research question, because I understand the process that produced them, and because I found it on this particular scale. Plotted on a lin-lin scale, the same data reveal less information. Or should I perhaps use a cumulative function to plot my data? Maybe an inverse cumulative one? And so on. An exploration of both the data and the projection techniques is required.
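To make the effect of scaling concrete, here is a minimal sketch in Python. It does not use the data behind the plot above: the exponential decay series and the helper names (`log_lin`, `slopes`, `inverse_cumulative`) are assumptions chosen only for illustration. It shows that the same points bend on a lin-lin scale but form a straight line on a log-lin scale, and how an inverse cumulative view of a small sample can be computed.

```python
import math

def log_lin(points):
    """Project (x, y) points onto a log-lin scale: x is kept, y becomes log10(y)."""
    return [(x, math.log10(y)) for x, y in points if y > 0]

def slopes(points):
    """Slope between each pair of consecutive points."""
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]

def inverse_cumulative(values):
    """For each distinct value v, the fraction of observations that are >= v."""
    n = len(values)
    return [(v, sum(1 for w in values if w >= v) / n)
            for v in sorted(set(values))]

# Synthetic data (an assumption, not the data of the figure): exponential decay.
points = [(x, 1000.0 * math.exp(-0.5 * x)) for x in range(10)]

lin_slopes = slopes(points)           # slopes change a lot: a bent curve in lin-lin
log_slopes = slopes(log_lin(points))  # slopes are constant: a straight line in log-lin

print("lin-lin slopes:", [round(s, 2) for s in lin_slopes])
print("log-lin slopes:", [round(s, 4) for s in log_slopes])
print("inverse cumulative:", inverse_cumulative([1, 1, 2, 3, 3, 3]))
```

The pattern "this is a straight line" exists only in one of the two projections, although the data are identical. A different data set would need a different projection, which is why the exploration cannot be skipped.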

By choosing one projection, I focus on something very particular in the data, and I still need other plots and statistical tests to decide a) whether it supports a hypothesis I have in mind, or b) whether I can find something new, something unexpected. The distortion of vision is therefore both an issue and a tool for digging deeper into the data. I could also draw very wrong conclusions, even from analyzing this simple drawing, so why should external readers be any better protected? There is a balance to find between a drawing that looks so simple to read that conclusions appear obvious (even if they are not, and you might be wrong), and the opposite, a drawing that looks so complex that few conclusions will be drawn, if any. Hence it is a fallacy to argue that graphs are meaningful only to their creator: this is the case for any plot taken in isolation, and it is hard work to enter into somebody else’s work anyway.

So graph visualization is not inherently worse than any other data drawing: we just don’t teach how to read graphs in primary school. Do you remember the first time you saw a plot? I guess you found it really abstract. Most people don’t really know what to look at in a graph, and produce visualizations that don’t show anything in particular. I personally think that it is a good thing, because, put in context, graph visualization is very young compared to other data drawings, and a language of networks that combines layout algorithms and visual variables is still in the making. Moreover, after meeting and talking with people who publish such visuals, it seems that they already use them in a pragmatic way: by showing their complexity, graphs communicate to the reader that a) the data might contain interesting information (“so please, read until the end!”), and b) the authors did some work and propose some findings, but it was hard and many other things could be done (“hey, try it yourself!”). It is useless to discover the secrets of the universe if nobody listens to you. Before enlightening viewers, one should attract them enough that they take the time to read, and graphs are useful for that.

But drawing graphs as graphs is not only useful for communication. Their primary use for researchers is exploratory analysis, when the study is not focused solely on the structure of the data but elements in context matter, because you have prior knowledge about them and your questions relate to another perspective (say, sociology). Take the example of our work at Sciences-Po, where we teach the mapping of controversies to students who will become the future decision makers of companies or public policies. Part of the controversies in the public space are expressed on the Web. The dynamics of the discussions and the hyperlink structure of the Web make this field particularly hard to investigate. We successfully use graph visualization of websites to help students orient themselves in this space, to assist and justify the classification of websites, and to assert the position of the actors of a controversy. This is just one case among others where there is currently no viable alternative to graph drawing and its synoptic property (seeing the whole without reduction of data).

Finally, the different uses of graph drawing are growing as it becomes mainstream and more people become familiar with it. I trust people to innovate and progressively learn how to read graphs and extract information from them. Just practice.

Slides of the ICWSM Gephi tutorial

Sébastien Heymann and Julian Bilcke gave the official Gephi tutorial yesterday at the ICWSM conference in Barcelona. The International Conference on Weblogs and Social Media is a unique forum that brings together researchers from computer science, linguistics, communication, and the social sciences. The broad goal of ICWSM is to increase understanding of social media in all its incarnations. This is also a special conference for us because we introduced Gephi for the first time 2 years ago, at the 3rd ICWSM conference.

Though the tutorial was not recorded, you’ll find the slides of the tutorial here.

This month about 80 people were trained in Gephi thanks to the funding we receive at our non-profit organization, the Gephi Consortium: 40 people at ICWSM, but also 20 people at UKSNA and 20 people at the French Complex System summer school. We will give our next talk at ECCS, the European Conference on Complex Systems. Looking forward to seeing you there!

Google Summer of Code 2011

It’s great news: Gephi has been accepted again for Google Summer of Code. The program is the best way for students around the world to start contributing to an open-source project. The 2009 and 2010 editions were a great success and dramatically boosted the development of the Gephi project.

What is Gephi?

If you look around you, you may notice that networks are everywhere: social networks (relationships among people) or computer networks (links between computers), for instance. Transportation routes, power grids, email networks and the relations between scientific papers are other examples of networks. The ability to analyze, manipulate and represent a network is key to solving difficult problems and boosting knowledge discovery in many disciplines.

The Gephi project aims to provide the perfect tool for visualizing and manipulating networks. We focus on usability, performance and modularity:

  • Usability: Easy to install, a UI that requires no scripting, and real-time manipulation.
  • Performance: The visualization engine and data structures are built to scale. Supporting ever-larger graphs is an endless challenge!
  • Modularity: Extensible software architecture, built on top of the NetBeans Platform. Add plug-ins with ease.

Learn more about Gephi, watch Introducing Gephi 0.7, then download and try it by following the Quick Start Tutorial.

The Gephi project is recent, and its growing community is a mixture of engineers and research scientists involved in network science, datavis and complex networks.

List of ideas

A list of ideas is available on our wiki. They cover various skills and levels of difficulty:

* Preview_Refactoring: Simplify and modularize the Preview architecture.
* Web-based network visualization with WebGL: Start a new project by developing an efficient network visualization library for the web using WebGL.
* Timeline player and movie creation: Add a ‘Play’ feature to the timeline component and create animated network movies.
* New Visualization Engine: Build the new visualization engine using shaders on the GPU, aiming to release a feature-complete version.
* Indexed Attributes API using Lucene: Add index support to the Gephi attributes system.
* Scripting Gephi: Develop a scripting language and a console plug-in for Gephi.
* Automated build & Maven: Choose and create a deployment server to generate releases automatically.

You can also propose your own ideas: please post them on this forum topic. They will be considered and discussed by the community. Have a look at our long-term Roadmap.

Students, join the network

Students, apply now for Gephi proposals. Come to the GSOC forum section and say Hi! in this topic. Then fill in and follow the questionnaire. Be careful, the deadline is April 8 (timeline)!

Helder Suzuki, a student in 2009, wrote:
At Gephi students will have the opportunity to produce high impact work on a rapidly growing area and be noted for it.

Have a look at the 2009 pages and Helder’s interview.

Follow gephi on Twitter

Gephi 0.7alpha2 released

Gephi 0.7 alpha2 was just released. It increases stability and fixes issues that were reported on our forum and bug tracker. Thanks for your valuable feedback!

Normally, for minor versions like this one, Gephi updates itself by prompting users to update when Gephi starts. This AutoUpdate feature was not yet available in 0.7alpha but is working now. Users therefore need to download this new 0.7alpha2 version to benefit from upcoming updates through the plugin center.

Check the release notes and download the latest version. Please uninstall previous versions before installing it.

One can find many network and graph datasets on this wiki page.