GSoC 2010 mid-term: Direct Social Networks Import

Yi Du

This summer, six students are working on Gephi through Google Summer of Code. They are developing new features that will be integrated into version 0.8, to be released later this year.

Yi Du is adding the Direct Social Networks Import module this summer. It provides several kinds of importers, such as email, Twitter and Facebook. This article briefly introduces some of the importers and shows several samples.

The ability to import any kind of structured data and build a network from it is essential for users. This step is often missing and requires time and scripting skills, although tools and libraries able to read and parse all types of data already exist. Moreover, it has never been so easy to quickly access meaningful datasets online.

Email importer

Email is a simple and widely used communication tool, yet many people have no knowledge of how it works. To some extent, analyzing emails can help people better understand their relationships with others. In our email importer module, each email address is represented as a node. If two email addresses share the same display name, an option lets the user decide whether to treat them as one node or as two different nodes. Then, if there is an email from A to B, an edge from A to B is built, and another option lets the user decide whether Cc and Bcc recipients also produce edges.
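The node and edge rules above can be sketched in a few lines of Python. This is only an illustration, not the importer's actual code (the module itself is part of Gephi): the function name `build_email_graph`, the input shape (one dict of header strings per message) and the option names are all assumptions made for this example.

```python
from email.utils import getaddresses


def build_email_graph(messages, include_cc=False, include_bcc=False,
                      merge_by_name=False):
    """Return (nodes, edges) built from an iterable of header dicts.

    Each message is a dict with optional "From", "To", "Cc" and "Bcc"
    header strings (a hypothetical input shape chosen for this sketch).
    """
    nodes = set()
    edges = {}  # (sender, recipient) -> number of emails

    def parse(values):
        # Skip missing headers, then drop entries without an address.
        present = [v for v in values if v]
        return [(name, addr) for name, addr in getaddresses(present) if addr]

    def node_id(name, addr):
        # Optionally merge addresses that share a display name.
        if merge_by_name and name:
            return name.strip().lower()
        return addr.lower()

    for msg in messages:
        fields = ["To"]
        if include_cc:
            fields.append("Cc")
        if include_bcc:
            fields.append("Bcc")
        senders = [node_id(n, a) for n, a in parse([msg.get("From")])]
        recipients = [node_id(n, a)
                      for n, a in parse(msg.get(f) for f in fields)]
        nodes.update(senders)
        nodes.update(recipients)
        for sender in senders:
            for recipient in recipients:
                key = (sender, recipient)
                edges[key] = edges.get(key, 0) + 1
    return nodes, edges
```

Counting emails per edge is a design choice of this sketch; it gives a natural edge weight, but the text above only says that an edge is built.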

We provide two ways to import emails. First, emails can be fetched one by one from an email server (POP3 or IMAP). Second, emails can be read from local files or folders. The second approach raises a problem: different email clients may use different file formats. Fortunately, our importer has an easy-to-extend API, as well as a default implementation for EML files. EML is a standard format and can be obtained from Thunderbird, Outlook and Gmail with tools like Gmail Backup.
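To give an idea of what "a default EML implementation behind an extensible API" can look like, here is a minimal Python sketch using the standard `email` package. The `PARSERS` registry and the function names are hypothetical and only stand in for the importer's real extension point, which is not described in this article.

```python
import os
from email import message_from_string
from email.utils import getaddresses


def parse_eml(text):
    """Extract sender and recipient addresses from the text of an EML file."""
    msg = message_from_string(text)

    def addresses(header):
        # get_all returns every occurrence of the header, or [] if absent.
        values = msg.get_all(header, [])
        return [addr.lower() for _, addr in getaddresses(values) if addr]

    return {
        "from": addresses("From"),
        "to": addresses("To"),
        "cc": addresses("Cc"),
        "bcc": addresses("Bcc"),
    }


# Hypothetical registry: support for another mail client format would be
# added by registering one more parser function under its file extension.
PARSERS = {".eml": parse_eml}


def parse_mail_file(filename, text):
    """Dispatch to the parser registered for the file's extension."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in PARSERS:
        raise ValueError("no parser registered for " + repr(extension))
    return PARSERS[extension](text)
```

The dictionary returned for each file can then feed whatever code builds the nodes and edges.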

This sample illustrates the output of the email importer (2000 emails with EGO filter, 700 nodes, 1300 edges).
[Figure 1a: The EGO graph] [Figure 1b: Graph whose in-degrees are bigger than 0]
[Figure 1c: Modularize the graph] [Figure 1d: Subgraph with the max Modularity count]

Twitter importers

Twitter is a very popular social network. People use it to send and receive short messages, usually called tweets. We can follow people we are interested in and topics we like. Twitter networks have been popularized by NodeXL, which has a similar feature. See this nice gallery.

We provide two kinds of networks: “Twitter Search Network” and “Twitter User Network”.

We support the Twitter search network to analyze people who search for or mention similar keywords. Each Twitter user is represented as a node, and we define three kinds of edge construction:

  • Replies-to relationship: If A replies to B in a searched tweet, an edge from A to B will be added.
  • Mentions relationship: If A mentions B in a searched tweet, an edge from A to B will be added.
  • Followers relationship: If A follows B in the constructed graph, an edge from A to B will be added.
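The three edge rules can be illustrated with a small Python sketch. It works on already-fetched data and does not call the Twitter API; the tweet fields (`user`, `text`, `in_reply_to`), the `follows` pairs and the function name are assumptions made for this example, not the importer's data model.

```python
import re

MENTION = re.compile(r"@(\w+)")


def build_search_network(tweets, follows=(), use_replies=True,
                         use_mentions=True, use_followers=False):
    """Return (nodes, edges); each edge is (source, target, kind).

    tweets:  dicts with "user", "text" and optional "in_reply_to"
             (hypothetical field names).
    follows: (follower, followed) pairs, only used for follower edges.
    """
    nodes = set()
    edges = set()
    for tweet in tweets:
        author = tweet["user"].lower()
        nodes.add(author)
        # Replies-to: A replies to B in a searched tweet.
        if use_replies and tweet.get("in_reply_to"):
            target = tweet["in_reply_to"].lower()
            nodes.add(target)
            edges.add((author, target, "reply"))
        # Mentions: A mentions B in a searched tweet.
        if use_mentions:
            for name in MENTION.findall(tweet["text"]):
                target = name.lower()
                nodes.add(target)
                edges.add((author, target, "mention"))
    # Followers: A follows B, restricted to users already in the graph.
    if use_followers:
        for follower, followed in follows:
            a, b = follower.lower(), followed.lower()
            if a in nodes and b in nodes:
                edges.add((a, b, "follows"))
    return nodes, edges
```

Because a reply usually starts with the handle of the person being answered, the same pair of users can get both a "reply" and a "mention" edge in this sketch; keeping the kind on each edge makes that visible.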

The second network we provide is the “Twitter User Network”. It analyzes people who follow each other to show the relationships between Twitter users. By default, we add an edge from A to B if A follows B anywhere in the graph. We provide three options for vertex construction:

  • People followed by the user: If the searched user A follows B, B will be added as a vertex.
  • People following the user: If A follows the searched user B, A will be added as a vertex.
  • Both: Both of the above options.
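The vertex options and the default edge rule can be sketched as follows. As before, this is an illustration on in-memory data: the `(follower, followed)` pairs, the mode names and the function name are assumptions made for this example.

```python
def build_user_network(user, follows, mode="both"):
    """Return (nodes, edges) around one searched user.

    follows: iterable of (follower, followed) pairs (hypothetical input).
    mode:    "followed"  -> add people the searched user follows,
             "following" -> add people who follow the searched user,
             "both"      -> add both groups.
    """
    if mode not in ("followed", "following", "both"):
        raise ValueError("unknown mode: " + repr(mode))
    follows = list(follows)
    nodes = {user}
    for follower, followed in follows:
        if mode in ("followed", "both") and follower == user:
            nodes.add(followed)
        if mode in ("following", "both") and followed == user:
            nodes.add(follower)
    # Default edge rule: A -> B whenever A follows B and both are vertices.
    edges = {(a, b) for a, b in follows if a in nodes and b in nodes}
    return nodes, edges
```

Note that the edge step looks at all selected vertices, so two of the user's contacts who follow each other are also connected.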

The interfaces of the two importers are shown below.
[Figure 2a: User network importer UI] [Figure 2b: Search network importer UI]

New York Times importer

The New York Times is an American daily newspaper founded and continuously published in New York City. It offers a series of APIs for developers covering news and social networks, such as the Article Search API and the Best Seller API.

We provide two kinds of social network importers in Gephi: “Article Network” and “TimesPeople Network”. The article network is used to analyze articles matching specific filters (date, facets, etc.). The user chooses which attribute constructs the edges. For example, if the user chooses the date, an edge is built between any two articles that have the same date attribute. TimesPeople is a social network for Times readers, similar to Facebook; we can analyze the relationships between its users.
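The "same attribute value means an edge" rule for the article network can be sketched like this. The article records are plain dicts with hypothetical keys (`id`, `date`, `facet`); the sketch does not query the NYT APIs and is not the importer's actual code.

```python
from collections import defaultdict
from itertools import combinations


def build_article_network(articles, attribute):
    """Link every pair of articles that share the same attribute value.

    articles:  dicts with an "id" key plus arbitrary attributes
               (hypothetical input shape).
    attribute: name of the attribute chosen by the user, e.g. "date".
    Returns (nodes, edges); each edge is (id_a, id_b, shared_value).
    """
    groups = defaultdict(list)
    for article in articles:
        value = article.get(attribute)
        if value is not None:
            groups[value].append(article["id"])
    edges = set()
    for value, ids in groups.items():
        # Every pair inside a group shares the value, so connect them all.
        for a, b in combinations(sorted(ids), 2):
            edges.add((a, b, value))
    nodes = {article["id"] for article in articles}
    return nodes, edges
```

Grouping first avoids comparing every article with every other one, although a value shared by many articles still produces a dense cluster of edges.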

The interfaces of the NYT article network importer and the TimesPeople network importer are shown below:
[Figure 3a: NYT article network importer UI] [Figure 3b: NYT TimesPeople network importer UI]

Display of the TimesPeople network:
[Three figures: display of TimesPeople network]

Conclusion and future work

In this article, we introduced several importers: Email, Twitter and NYT. With these importers, users can import the data they want and analyze it. By analyzing it, they can find the hottest group, the relationships among their friends, the author most related to a facet, and other important information.
By the end of GSoC, we will have four major importers: Email, Twitter, NYT and Facebook. Among these four importers, Twitter will have “Twitter User Network” and “Twitter Search Network”. NYT will have “NYT article search network” and “NYT TimesPeople Network”. Facebook will have “Facebook Friends Network” and “Facebook Group Network”. Besides adding the Facebook importer, we will also optimize the UI of the importers to make them more user-friendly.

Yi Du


  1. Hey guys, this looks really promising!

    Curious, are there any instructions on getting this plugin up and running before the 0.8 release? I did a quick scan of launchpad, but wasn’t able to find any instructions to do with this plugin. Would love to play with an alpha build.



  2. Sebastien, Seadragon seems to be the export side, whereas the post is talking about the import side. Am I missing something? I want to try importing my Twitter graph into Gephi and play with it there.


  3. Huh, why did I answer on Seadragon…my mistake.

    This plug-in is currently under development and will be released soon; however, an unstable version is available by compiling this branch:

    If you can’t wait, follow these instructions to check out the source code, by using “lp:~duyi001/gephi/DSNI” instead of “lp:gephi”:
    And these ones to compile it:


  4. Hi,

    Do you have any more firm dates on when this plugin will be released? I think it will be awesome! I’m not a programmer so don’t feel comfortable testing it using the instructions above, so I’d really love to know when it becomes available. Unless there’s something available in a developer update center?

    Thanks, D.


  5. Hi Dave, the release of this plugin will be possible once the 0.7beta version is released, which is coming in a couple of days (stay tuned!).

    After that, it will probably take some more time to package this as a plugin and fix the latest major issues. It’s important for us to provide stable plugins, so they require testing. If you wanna support us, please consider donating or joining the Gephi Consortium.


  6. Super. Loving the new beta version and can’t wait for the DSNI plugin to be available – are you close?? PS – will the Facebook importer allow you to import a network from any other pages/groups you belong to?


  7. Really interesting stuff!
    I have been looking to add a third dimension and I would like to ask your advice and experience to find out if this is possible.
    I have created a map with my email contacts. You can clearly see who I have been emailing with most and what networks exist.

    What I would like to add are the topics we have been emailing about.
    This can be derived from the words in the subject (leaving out some common words like a, the, an, etc).

    The way I would like to represent this, would be something like a word cloud that becomes visible once I select a node or an edge.

    Once this is possible, it can also be turned: first create a map of the topics and group them and by selecting a node or an edge, see who has been emailing who about it.

    Is this (or something similar) already possible?
    Thanks a lot and keep up the good work!

