Explore the Marvel Universe Social Graph

This weekend at the Data In Sight hackathon in San Francisco, one of the winning teams worked with Gephi and the cool Marvel dataset provided by Infochimps.

From Friday evening to Sunday afternoon, Kai Chang, Tom Turner, and Jefferson Braswell tuned their visualizations and had a lot of fun exploring the Spiderman and Captain America ego networks. They came up with these beautiful snapshots and created a zoomable web version using the Seadragon plugin. They won the “Most aesthetically pleasing visualization” category. Congratulations to Kai, Tom and Jefferson for their amazing work!

The dataset has been added to the wiki Datasets page, so you can play with it and maybe calculate some metrics, like centrality, on the network. The graph is pretty large, so be sure to increase your Gephi memory settings to more than 2GB.
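
As a rough illustration of the kind of metric mentioned above, here is a minimal Python sketch that computes degree centrality from an edge list. The character pairs are made-up sample data, not taken from the Marvel dataset, and for the full graph Gephi's own statistics tools are the more practical route.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical sample edges, for illustration only
sample_edges = [
    ("Spider-Man", "Captain America"),
    ("Spider-Man", "Iron Man"),
    ("Spider-Man", "Hulk"),
    ("Captain America", "Iron Man"),
]
centrality = degree_centrality(sample_edges)
```

In this toy graph the best-connected character gets the highest score; on the real dataset the same idea points to the hubs of the social graph.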

New GraphViz DOT, CSV and UCINET formats

Gephi now supports the GraphViz DOT file format. This new feature ships with two others: the UCINET DL and CSV formats. A broader set of input file formats reinforces interoperability between tools and lets Gephi be applied to a wider range of problems. Tabular data and other delimited text files are now supported through the CSV (comma-separated values) importer. Two columns that represent relationships between elements can now easily be imported into Gephi.
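
For instance, a two-column relationship table could be written out as a delimited text file along these lines. This is a minimal sketch: the helper names, the sample pairs, and the choice of one source/target pair per row are assumptions for illustration, so check the CSV format documentation for the exact layouts the importer accepts.

```python
import csv


def write_edge_list(path, pairs):
    """Write one "source,target" row per relationship."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(pairs)


def read_edge_list(path):
    """Read the file back as a list of (source, target) tuples."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [tuple(row) for row in csv.reader(handle) if len(row) == 2]


# Hypothetical relationships, for illustration only
sample_pairs = [("alice", "bob"), ("bob", "carol")]
```

Calling `write_edge_list("relations.csv", sample_pairs)` would produce a file (the name is just an example) that can then be opened from Gephi.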

Note that DOT support is still incomplete. Subgraphs, shapes and some attributes are not supported for the moment. Please report any issues you find with these new features on the forum or bug tracker.
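
Given those limits, the safest files to try are ones made of plain edge statements. The following Python sketch builds such a file; the graph name `sample` and the node names are hypothetical, and it deliberately avoids subgraphs and shape attributes.

```python
def _escape(label):
    """Escape double quotes so the label is a valid quoted DOT ID."""
    return str(label).replace('"', '\\"')


def to_dot(edges, name="sample", directed=False):
    """Build a minimal DOT document containing only edge statements."""
    keyword = "digraph" if directed else "graph"
    arrow = "->" if directed else "--"
    lines = [f"{keyword} {name} {{"]
    for source, target in edges:
        lines.append(f'    "{_escape(source)}" {arrow} "{_escape(target)}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical edges, for illustration only
dot_text = to_dot([("a", "b"), ("b", "c")])
```

Writing `dot_text` to a file with a `.dot` extension gives a small test case for the importer.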

To get these features, just update your Gephi application. In Gephi, go to Help > Check for Updates.

Consult the datasets page to find sample networks.

Documentation has been completed for these three new formats:

Diseasome, explore the human disease network

The Gephi team presents today a science-mapping project: Diseasome. Asked by Magali Roux, Senior Scientist at CNRS, to create a website to accompany the publication of her book, Biology – The digital era, we worked on the “Human Disease Network” dataset and built a network exploration platform.

“On a unique place, one can find information about the book, the dataset related to the writings, an online data exploration framework and the file to manipulate these data with Gephi.”

The HDN (Human Disease Network) and the GDN (Gene Disease Network) were extracted from the original dataset and treated with Gephi. From the results, an interactive map has been created with the help of RTGI/Linkfluence tools. A poster is also available, with the full network and some useful statistics.

Although this work is experimental, we hope it can help scientists to explore and search in this complexity. The Diseasome is above all an innovative way to present a scientific work. The importance of complex data in science and particularly network graphs brings a lot of challenges. As well as computational issues, many things can be done with graphic design and interaction.

Explore the Diseasome


CPAN-Explorer, an interactive exploration of the Perl ecosystem

We are proud to announce the first Gephi-based system for exploring a complex network, CPAN-Explorer. This is a visualization project aiming at analyzing the relationships between the developers and the packages of the Perl language, known to be organized as the CPAN community (Comprehensive Perl Archive Network). Produced by RTGI Labs and our team, it was initially discussed in a talk at the FPW’09.

You can download original graph source files from each subproject page.
Available formats are: GEXF (Gephi graph format), GDF (Guess graph format), SVG, and PDF.
For some of the subprojects, an embedded JavaScript visualization is also available. For the community graph, a special Flash webpage is available for online exploration.

Website: http://www.cpan-explorer.org/

Map of the Perl community on the Web

We generated two maps (authors and modules) using the CPANTS data. For the websites, we crawled a seed generated from the CPAN pages of the previous authors. Each of these graphs is generated using a force-based algorithm.
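
To give a feel for what a force-based layout does, here is a simplified Python sketch in the spirit of Fruchterman-Reingold: every pair of nodes repels, linked nodes attract, and a cooling step limits movement. It is not the algorithm or the parameters used to produce these maps; the function name, constants, and sample edges are assumptions for illustration.

```python
import math
import random


def force_layout(nodes, edges, iterations=100, size=1.0, seed=42):
    """Return {node: (x, y)}; every edge endpoint must appear in nodes."""
    nodes = list(nodes)
    if not nodes:
        return {}
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    k = size / math.sqrt(len(nodes))  # ideal distance between nodes
    temperature = size / 10.0  # maximum move per iteration

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsion between every pair of nodes
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force

        # Attraction along edges
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force

        # Move each node, capped by the current temperature
        for n in nodes:
            dx, dy = disp[n]
            length = max(math.hypot(dx, dy), 1e-9)
            step = min(length, temperature)
            pos[n][0] += dx / length * step
            pos[n][1] += dy / length * step

        temperature *= 0.95  # cool down so the layout settles

    return {n: (x, y) for n, (x, y) in pos.items()}


# Hypothetical module names, for illustration only
layout = force_layout(["LWP", "URI", "Tk"], [("LWP", "URI")])
```

The pairwise repulsion loop is quadratic in the number of nodes, so a sketch like this only suits small graphs; maps of several thousand nodes call for an optimized implementation such as the layouts shipped with Gephi.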

All the maps are available as PDF files, under a Creative Commons licence. The slides are in French, but we will explain the three maps here.

Flash interface

CPAN’s modules

The first map is about the modules available on the CPAN. We selected the modules that are listed as dependencies by at least 10 other modules, and the modules that use them. This graph is composed of 7193 nodes (or modules) and 17510 edges. Some clusters are interesting:

  • LWP and URI are really the center of the CPAN
  • a lot of web modules (XML::*, TemplateToolkit, HTML::Parser, …)
  • TK is isolated from the CPAN
  • Moose, DBIx::Class and Catalyst form a group

These data are from March; we will try to produce a newer version of this map this summer. That one will be really interesting, as Catalyst has switched to Moose.
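
The selection rule described above for the module map (keep the modules that at least 10 other modules depend on, plus the modules that use them) can be sketched in Python as follows. The `dependencies` mapping and its shape are assumptions for illustration; the actual CPANTS data format is not described here.

```python
from collections import defaultdict


def select_modules(dependencies, min_dependents=10):
    """dependencies maps a module to the modules it depends on.

    Returns (nodes, edges), where each edge is (user, dependency).
    """
    dependents = defaultdict(set)
    for module, requires in dependencies.items():
        for required in requires:
            if required != module:
                dependents[required].add(module)

    # Modules listed as a dependency by enough other modules
    popular = {m for m, users in dependents.items() if len(users) >= min_dependents}

    nodes = set(popular)
    edges = set()
    for target in popular:
        for user in dependents[target]:
            nodes.add(user)
            edges.add((user, target))
    return nodes, edges


# Hypothetical input: ten made-up modules that all depend on LWP
sample_deps = {f"App::Demo{i}": ["LWP"] for i in range(10)}
sample_deps["App::Demo0"] = ["LWP", "Rarely::Used"]
nodes, edges = select_modules(sample_deps)
```

Here `Rarely::Used` is dropped because only one module depends on it, while `LWP` and its ten users are kept.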

The CPAN’s authors

This map is about the authors on the CPAN. There are about 700 authors and their connections. Each time an author uses a module written by another author, a link is created.

  • Modern Perl, made up of Moose, Catalyst and DBIx::Class. Important authors are Steven, Sartak, perigin, jrockway, mstrout, nothingmuch, marcus ramberg
  • Slaven Rezić and other TK developers are on the border
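
The linking rule for the author map (a link whenever one author uses a module written by another) could look like this in Python. The `module_author` and `dependencies` mappings are hypothetical inputs, and the module names and author IDs in the sample data are placeholders.

```python
def author_links(module_author, dependencies):
    """Return the set of (user_author, module_owner) links.

    module_author maps a module to its author; dependencies maps a
    module to the modules it uses. Both shapes are assumed here.
    """
    links = set()
    for module, requires in dependencies.items():
        user = module_author.get(module)
        for required in requires:
            owner = module_author.get(required)
            # Skip unknown authors and authors using their own modules
            if user and owner and user != owner:
                links.add((user, owner))
    return links


# Placeholder data, for illustration only
sample_authors = {"A::One": "alice", "A::Three": "alice", "B::Two": "bob"}
sample_uses = {"A::One": ["B::Two", "A::Three"]}
links = author_links(sample_authors, sample_uses)
```

Using a set means that repeated use of the same author's modules still yields a single link; counting occurrences instead would give weighted edges.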

Web map

We crawled the web starting from a seed generated from the CPAN authors’ pages.

  • again, the “modern group”, at the top of the map, with the Moose/Catalyst/DBIx::Class developers
  • some companies, like shadowcat and iinteractive in the middle of the “modern Perl” group, Booking in the middle of the YAPC websites (they are a major sponsor of these events), 6apart, …
  • perl.org is the reference for the Perl community (the site is oriented on their sides)
  • cpan.org is the reference for the open source community
  • GitHub is in the center of the Perl community. It’s widely adopted by Perl developers. It offers all the “social media” features that are missing on the CPAN

We hope you like these visualisations, have fun analyzing them 🙂

Thanks Franck for the original post.


Connected: The Power of Six Degrees

A very rich documentary about the new science of networks. It is built around reproducing the “Six degrees of separation” experiment. The main concepts are well explained, and the history of the discoveries is brought out with many examples and interviews with the researchers. I personally think the documentary is well made and can be understood by any audience.

The first part is about the small-world model and explains why six degrees works. The second part introduces the concepts of hubs and power-law degree distributions. At the end we learn more about applications of network theory, in particular to cancer research.

Warning: the video is no longer available, due to the rights owner’s decision to remove it from Vimeo.

EDIT: Video available here

To go further:

Found here.