This weekend at the Data in Sight hackathon in San Francisco, one of the winning teams worked with Gephi and the cool Marvel dataset provided by Infochimps.

From Friday evening to Sunday afternoon, Kai Chang, Tom Turner, and Jefferson Braswell tuned their visualizations and had a lot of fun exploring the Spider-Man and Captain America ego networks. They came up with these beautiful snapshots and created a zoomable web version using the Seadragon plugin. They won the “Most aesthetically pleasing visualization” category. Congratulations to Kai, Tom and Jefferson for their amazing work!

The datasets have been added to the wiki Datasets page, so you can play with them and maybe calculate some metrics like centrality on the network. The graph is pretty large, so be sure to raise your Gephi memory setting above 2 GB.
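If you have never computed a centrality metric, here is a minimal sketch of degree centrality in plain Python, outside Gephi. The edge list is made up for illustration and is not taken from the Marvel dataset; the function name `degree_centrality` is our own.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Degree centrality of each node in an undirected graph.

    The score is the number of distinct neighbours divided by (n - 1),
    where n is the number of nodes that appear in the edge list.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}

# Hypothetical co-appearance edges, for illustration only.
edges = [
    ("Spider-Man", "Captain America"),
    ("Spider-Man", "Iron Man"),
    ("Captain America", "Iron Man"),
    ("Spider-Man", "Hulk"),
]
scores = degree_centrality(edges)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.2f}")
```

Gephi's Statistics panel offers richer metrics; this only shows the idea on a toy graph.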

Gephi maps exhibited at the International Design Biennale

The Saint Étienne International Design Biennial, held from 20 November to 5 December, is a unique event in the domain of design, thanks to the exhibitions shown as well as the diversity of its attendees. The Biennial democratizes design and makes it accessible to all kinds of audiences, proving that this creative discipline can take many forms and is often driven by human concerns, including how people use it.

The theme of the 2010 Biennial is teleportation. It intends to explore paths of discovery that, in their most extreme expression, tend towards a possible teleportation: the dematerialization of movement, a notion that appears incredibly revealing of our era.

Sebastien Heymann will exhibit maps of designers’ conceptual world, placed at the center of the “Prédiction” exhibition. Made in collaboration with Benjamin Loyauté, curator of the event, these inscriptions are a proposal to reveal the state of knowledge sharing in Design today.

You may contact Sebastien by email to arrange a meeting during the first weekend.

EDIT: photos are available on the Facebook page of Gephi.

Map geocoded data with Gephi

The mix of network and geographic data has fantastic potential and has not yet revealed its full power. Alexis Jacomy, a student member of the Gephi community, just released a new plugin named GeoLayout, which aims to bridge this gap. Congratulations!

The plugin uses latitude/longitude coordinates to position nodes correctly on the network. Several projections are available, including Mercator, which is used by Google Maps and other online services.
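To give an idea of what such a projection does, here is a minimal sketch of the textbook spherical Mercator formula in Python. It is not the plugin's code; the function name `mercator`, the `scale` parameter and the latitude clamp at 85 degrees are our own assumptions.

```python
import math

def mercator(lat_deg, lon_deg, scale=1.0):
    """Project latitude/longitude (degrees) to planar (x, y), spherical Mercator."""
    # Clamp latitude: the projection diverges at the poles (assumed cut-off).
    lat_deg = max(-85.0, min(85.0, lat_deg))
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = scale * lon
    y = scale * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y

# Longitude maps linearly to x; latitude is stretched towards the poles.
for lat in (0, 30, 60):
    print(lat, mercator(lat, 10.0))
```

The stretching of `y` at high latitudes is why layouts based on this projection look like familiar web maps.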

The plugin is available from the Gephi Plugin Center. The author is looking for feedback, so please visit the plugin page.

I wanted to try it with the classic USA Airline Routes network dataset and describe the experience in detail.

Install Plugin

In Gephi, go to the Tools menu and then Plugins. In the Available Plugins tab, check GeoLayout and click Install. Once the plugin is installed, you are asked to restart Gephi. Click OK.

Open Dataset

Download the airlines-sample.gexf (Save As…) dataset and open it with Gephi.

The network is an undirected graph with 235 nodes and 1297 edges. Each node carries two additional attributes, latitude and longitude, expressed in degrees.
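If you want to check the coordinates outside Gephi, the sketch below reads latitude/longitude node attributes from a GEXF document with Python's standard library. The inline sample is hand-written with made-up coordinates, not an excerpt of airlines-sample.gexf, and the namespace and attribute titles are assumptions: check the root element and the attribute declarations of your own file.

```python
import xml.etree.ElementTree as ET

# Assumed namespace; adjust it to match the root element of your file.
NS = {"g": "http://www.gexf.net/1.1draft"}

# Hand-written sample with made-up coordinates, for illustration only.
SAMPLE = """<gexf xmlns="http://www.gexf.net/1.1draft" version="1.1">
  <graph defaultedgetype="undirected">
    <attributes class="node">
      <attribute id="0" title="latitude" type="double"/>
      <attribute id="1" title="longitude" type="double"/>
    </attributes>
    <nodes>
      <node id="a" label="Airport A">
        <attvalues>
          <attvalue for="0" value="40.64"/>
          <attvalue for="1" value="-73.78"/>
        </attvalues>
      </node>
      <node id="b" label="Airport B">
        <attvalues>
          <attvalue for="0" value="33.94"/>
          <attvalue for="1" value="-118.41"/>
        </attvalues>
      </node>
    </nodes>
    <edges>
      <edge id="0" source="a" target="b"/>
    </edges>
  </graph>
</gexf>"""

def read_coordinates(gexf_text):
    """Return {node id: (latitude, longitude)} for nodes that declare both."""
    root = ET.fromstring(gexf_text)
    # Map attribute ids ("0", "1") to their titles ("latitude", "longitude").
    titles = {
        attr.get("id"): attr.get("title")
        for attr in root.findall(".//g:attributes[@class='node']/g:attribute", NS)
    }
    coords = {}
    for node in root.findall(".//g:node", NS):
        values = {}
        for value in node.findall("g:attvalues/g:attvalue", NS):
            title = titles.get(value.get("for"))
            if title in ("latitude", "longitude"):
                values[title] = float(value.get("value"))
        if len(values) == 2:
            coords[node.get("id")] = (values["latitude"], values["longitude"])
    return coords

coords = read_coordinates(SAMPLE)
print(coords)
```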

You should see the graph opened like this.

Use GeoLayout

Go to the Layout module and choose Geo Layout in the list, then just click the Run button.


You can see the result immediately. You can now move on to analysis and aesthetic refinement. Please visit the Quick Start Tutorial for a step-by-step introduction to Gephi.

Diseasome, explore the human disease network

The Gephi team presents today a science-mapping project: Diseasome. Asked by Magali Roux, Senior Scientist at CNRS, to create a website to accompany the publication of her book, Biology – The digital era, we worked on the “Human Disease Network” dataset and built a network exploration platform.

“In a single place, one can find information about the book, the dataset related to the writings, an online data exploration framework and the file to manipulate these data with Gephi.”

The HDN (Human Disease Network) and the GDN (Gene Disease Network) were extracted from the original dataset and processed with Gephi. From the results, an interactive map was created with the help of RTGI/Linkfluence tools. A poster is also available, with the full network and some useful statistics.

Although this work is experimental, we hope it can help scientists explore and search this complexity. The Diseasome is above all an innovative way to present scientific work. The importance of complex data in science, and of network graphs in particular, brings a lot of challenges. Beyond the computational issues, much can be done with graphic design and interaction.

Explore the Diseasome

CPAN-Explorer, an interactive exploration of the Perl ecosystem

We are proud to announce the first Gephi-based system for exploring a complex network, CPAN-Explorer. This is a visualization project aiming at analyzing the relationships between the developers and the packages of the Perl language, known to be organized as the CPAN community (Comprehensive Perl Archive Network). Produced by RTGI Labs and our team, it was initially discussed in a talk at the FPW’09.

You can download original graph source files from each subproject page.
Available formats are: GEXF (Gephi graph format), GDF (Guess graph format), SVG, and PDF.
For some of the subprojects, an embedded JavaScript visualization is also available. For the community graph, a special Flash webpage is available for online exploration.

Website: http://www.cpan-explorer.org/

Map of the Perl community on the Web

We generated two maps (authors and modules) using the CPANTS data. For the websites, we crawled from a seed generated from the CPAN pages of these authors. Each of these graphs was generated using a force-based algorithm.

All the maps are available as PDF files under a Creative Commons licence. The slides are in French, but we will explain the three maps here.

Flash interface

CPAN’s modules

The first map is about the modules available on the CPAN. We selected the modules that are listed as dependencies by at least 10 other modules, together with the modules that use them. This graph is composed of 7193 nodes (or modules) and 17510 edges. Some clusters are interesting:

  • LWP and URI are really the center of the CPAN
  • a lot of web modules (XML::*, TemplateToolkit, HTML::Parser, …)
  • Tk is isolated from the rest of the CPAN
  • Moose, DBIx::Class and Catalyst form a group

These data are from March; we will try to produce a newer version of this map this summer. That one will be really interesting, as Catalyst has switched to Moose.
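The selection rule used for the modules map (keep the modules that at least 10 others depend on, plus the modules that use them) can be sketched in a few lines of Python. This is an illustration, not the script we actually ran: the function name `select_modules`, the input shape and the toy dependency data are assumptions, and the example lowers the threshold to 2 so that it fits on screen.

```python
from collections import defaultdict

def select_modules(dependencies, min_dependents=10):
    """Keep widely used modules and the modules that depend on them.

    dependencies maps a module name to the set of modules it requires.
    Returns (nodes, edges), each edge being a (user, dependency) pair.
    """
    dependents = defaultdict(set)
    for module, requires in dependencies.items():
        for dep in requires:
            if dep != module:
                dependents[dep].add(module)
    # "Core" modules are required by at least min_dependents other modules.
    core = {m for m, users in dependents.items() if len(users) >= min_dependents}
    edges = sorted((user, dep) for dep in core for user in dependents[dep])
    nodes = core | {user for user, _ in edges}
    return nodes, edges

# Toy dependency data with hypothetical App::* modules.
deps = {
    "App::A": {"LWP", "URI"},
    "App::B": {"LWP"},
    "App::C": {"URI", "Tk"},
    "LWP": {"URI"},
}
nodes, edges = select_modules(deps, min_dependents=2)
print(sorted(nodes))
print(edges)
```

In this toy data Tk is required by a single module, so it falls below the threshold and drops out of the graph.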

The CPAN’s authors

This map is about the authors on the CPAN. It shows about 700 authors and their connections. Each time an author uses a module written by another author, a link is created.

  • Modern Perl, made up of Moose, Catalyst and DBIx::Class. Important authors are Steven, Sartak, perigin, jrockway, mstrout, nothingmuch, marcus ramberg
  • Slaven Rezić and other Tk developers are on the border
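The link rule for the authors map (a link each time an author uses a module written by someone else) can be sketched as follows. The names and data are hypothetical, and counting repeated uses as an edge weight is our assumption for this sketch; the map itself only needs the link to exist.

```python
from collections import Counter

def author_links(module_author, module_uses):
    """Count directed (user author, module owner) links between authors.

    module_author maps a module to its author; module_uses maps a module
    to the set of modules it uses. Uses of one's own modules are ignored.
    """
    links = Counter()
    for module, used in module_uses.items():
        user = module_author.get(module)
        for dep in used:
            owner = module_author.get(dep)
            if user and owner and user != owner:
                links[(user, owner)] += 1
    return links

# Hypothetical modules and authors, for illustration only.
module_author = {"Foo": "alice", "Bar": "bob", "Baz": "alice", "Qux": "carol"}
module_uses = {"Foo": {"Bar", "Baz"}, "Qux": {"Bar", "Foo"}, "Bar": set()}

links = author_links(module_author, module_uses)
for (user, owner), weight in sorted(links.items()):
    print(f"{user} -> {owner} ({weight})")
```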
Web map

We crawled the web from a seed generated from the CPAN authors’ pages.

  • again, the “modern group”, at the top of the map, with the Moose/Catalyst/DBIx::Class developers
  • some companies, like shadowcat and iinteractive in the middle of the “modern Perl” group, Booking in the middle of the YAPC websites (they are a major sponsor of these events), 6apart, …
  • perl.org is the reference for the Perl community (the site is oriented toward their side)
  • cpan.org is the reference for the open source community
  • github is at the center of the Perl community. It is widely adopted by Perl developers and offers all the “social media” features that are missing on the CPAN

We hope you like these visualisations. Have fun analyzing them 🙂

Semantic graphs of French IPR

Main semantic graph of French intellectual property rights
Semantic graph of French contract rights

These work-in-progress maps come from a study produced last spring for the economist Yann Moulier-Boutang, law professor at the UTC engineering school. They represent the linked vocabulary terms used on the Web to talk about intellectual property rights in French. The datasets come from Exalead SA (web/intranet search engine).

Each node is a term, and an edge exists when two terms or expressions are co-cited on a sufficient number of web pages, out of more than 120,000 pages. 1283 expressions and 4984 co-citation links were selected, favoring a representative approach over an exhaustive one. Semantic clusters are represented with node colors in the general map. The contract rights map shows the “imprint” of this cluster (red nodes) inside the overall graph (in grey). The last image is a test to display the imprint of two meta-clusters: the vocabulary of intellectual property rights (in red) versus that of industrial property rights (in blue).
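The co-citation rule can be sketched in Python as follows. This is only an illustration of the idea: the function name `cocitation_edges`, the toy pages and the threshold of 2 are assumptions, and the threshold actually used for the maps is not stated here.

```python
from collections import Counter
from itertools import combinations

def cocitation_edges(pages, min_pages):
    """Return {(term_a, term_b): count} for pairs co-cited on enough pages.

    pages is an iterable of term collections, one per web page. A pair is
    kept when both terms appear together on at least min_pages pages.
    """
    counts = Counter()
    for terms in pages:
        # Sort so that each unordered pair always has the same key.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_pages}

# Toy pages, for illustration only.
pages = [
    {"brevet", "licence"},
    {"brevet", "licence", "marque"},
    {"marque", "licence"},
    {"brevet"},
]
term_edges = cocitation_edges(pages, min_pages=2)
print(term_edges)
```

Raising `min_pages` prunes weak links, which is one way to favor a representative graph over an exhaustive one.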

How did we do this? After an information extraction phase from the Exalead databases and manual filtering, we made a GDF file of the raw data (download it and feel free to make your own viz!), spatialized the graph in Gephi with a force-vector algorithm (in a nutshell, two nodes tied by an edge are attracted to each other, otherwise they repel each other), applied some filters to color the nodes according to the semantic cluster they belong to, and then exported the maps in SVG format. The second part of the work consisted of gently improving the rendering and adding contextual information such as cluster names, main term clouds and a legend in Inkscape, to finally build the PDF files.
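To make the force-vector idea concrete, here is a toy single iteration in Python. It is a simplified Fruchterman-Reingold-style step in which every pair of nodes repels and linked pairs additionally attract; it is not the algorithm implemented in Gephi, and the function name `layout_step` and the constants `k` and `step` are assumptions chosen for the example.

```python
import math

def layout_step(pos, edges, k=1.0, step=0.05):
    """One iteration of a toy force-directed layout.

    Every pair of nodes repels with magnitude k*k/d; nodes joined by an
    edge also attract with magnitude d*d/k, where d is their distance.
    """
    disp = {n: [0.0, 0.0] for n in pos}
    nodes = list(pos)
    # Repulsion between all pairs of nodes.
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            f = k * k / d
            disp[a][0] += dx / d * f
            disp[a][1] += dy / d * f
            disp[b][0] -= dx / d * f
            disp[b][1] -= dy / d * f
    # Attraction along edges.
    for a, b in edges:
        dx = pos[a][0] - pos[b][0]
        dy = pos[a][1] - pos[b][1]
        d = math.hypot(dx, dy) or 1e-9
        f = d * d / k
        disp[a][0] -= dx / d * f
        disp[a][1] -= dy / d * f
        disp[b][0] += dx / d * f
        disp[b][1] += dy / d * f
    return {
        n: (pos[n][0] + step * disp[n][0], pos[n][1] + step * disp[n][1])
        for n in pos
    }

start = {"a": (0.0, 0.0), "b": (3.0, 0.0)}
linked = layout_step(start, [("a", "b")])  # the edge pulls the nodes together
unlinked = layout_step(start, [])          # without an edge they drift apart
print(linked, unlinked)
```

A real layout repeats such steps many times and usually lowers `step` along the way so that the positions settle.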

Download the general map in PDF (50 MB) – Creative Commons by-nc-sa

Download the map of the contract rights in PDF (50 MB) – Creative Commons by-nc-sa

Download the GDF file (graph not spatialized) (a shift in node positions is normal)

Main semantic graph

Zoom on the Free Software cluster

Contract rights map

Zoom on the core of contract rights

Intellectual (red) vs industrial (blue) rights

Euro SiS Map

WebAtlas recently produced a work-in-progress map of European websites in a training workshop during which European scientists got involved in web crawling and information spatialization.

This map uses a force-vector layout to place the websites (nodes), which are linked to one another by their hyperlinks (edges). Colors highlight the clusters by country, e.g. green for Poland, light violet for France, etc.

Download the PDF map of all countries (note that with a PDF reader you can search for URLs in the PDF)

Download the Excel file of all websites (with pre-expanded websites as “NEXT”)