The Egyptian Revolution on Twitter

This is a preliminary result of the network of retweets with the hashtag #jan25 on February 11, 2011, at the time of the announcement of Mubarak’s resignation. If you retweeted someone, or have been retweeted, your username may be one of these tiny points (or maybe a bigger one?).

To collect the network data, I used the Gephi Graph Streaming plugin, connected to a Python web server that I wrote myself. This web server works as a bridge: it connects to the Twitter Streaming API using the statuses/filter service and converts the users and retweets into nodes and edges in a network format that the Gephi Graph Streaming plugin can read. Nodes are Twitter users, and a link appears between nodes A and B when B retweets a message from A containing the hashtag #jan25.
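The conversion step of such a bridge can be sketched roughly as follows. This is a minimal illustration, not the server's actual code: the function name retweet_to_events and its arguments are hypothetical, the "an" (add node) and "ae" (add edge) keys follow the Graph Streaming JSON event format as I understand it and should be checked against the plugin documentation, and the edge direction (from the original author A to the retweeter B) is an assumption.

```python
import json

def retweet_to_events(author, retweeter, seen_nodes):
    """Turn one retweet (retweeter B retweeted author A) into streaming events.

    Returns a list of JSON strings: "an" (add node) events for users not
    seen before, followed by one "ae" (add edge) event.
    Event keys and edge direction are assumptions, see the text above.
    """
    events = []
    for user in (author, retweeter):
        if user not in seen_nodes:
            seen_nodes.add(user)
            events.append(json.dumps({"an": {user: {"label": user}}}))
    edge_id = f"{author}->{retweeter}"  # hypothetical edge id scheme
    events.append(json.dumps(
        {"ae": {edge_id: {"source": author, "target": retweeter, "directed": True}}}
    ))
    return events

seen = set()
for line in retweet_to_events("alice", "bob", seen):
    print(line)
```

Keeping the set of already seen users outside the function lets the same set be reused across the whole stream, so each user is announced as a node only once.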

The static network visualization is just the final result of about one hour of data collection. It is a dynamic network, and it’s possible to get much more information from the collected data. For example, before the announcement there were few nodes and edges, spread sparsely over time. But when the announcement arrived, a boom of retweets appeared on the network. A video with the flow of retweets is available on YouTube. It shows the dynamic network construction during the hour of data collection, compressed into less than four minutes. During the collection, I ran Gephi with the Force Atlas layout, adjusting only a few parameters from their defaults: repulsion strength to 2000, attraction strength to 0.3 and speed to 10.



I was very lucky to get this data. On the afternoon of February 11 I was testing the Python server that works as a bridge and connects to Twitter. I tried some interesting hashtags to see it working, and at that moment #jan25 seemed to be an active hashtag. I let the application run for some time, adjusted some parameters for visualization, and at some point there was a burst in the activity. I didn’t understand what was happening until I checked my Twitter account again and realized that Egypt’s vice-president had just made the resignation announcement. After that, I continued collecting data, and the final result was this network. It was very interesting to see, in real time, the exact moment when Tahrir Square was transformed from a mass protest demonstration into a giant party, and the burst in Twitter activity that came with it. It was like covering a virtual event in real time, a big event that was happening in the Twitter virtual world.

After playing with the data, I found that the data I got through the Twitter Streaming API is only approximately 10% of the total. I’m now working to recover all the data, and hopefully I can soon make the full graph of retweets available.

Dataset available in a GEXF file here. Download it and play with it with Gephi!

André Panisson / www.

—–
This work is part of a research project involving the Computer Science Department of the University of Turin (www.di.unito.it), the Complex Networks and Systems Group of the ISI Foundation (www.isi.it), and the Informatics department of Indiana University (http://cnets.indiana.edu/).
—-


OpenOrd: New layout plugin, the fastest algorithm so far


A new force-directed layout algorithm plugin named OpenOrd has just been released. It is one of the few force-directed layout algorithms that can scale to over 1 million nodes, making it ideal for large graphs.

Features:

  • Very fast, scales to millions of nodes
  • Can run in parallel on multicore processors
  • Aims to highlight clusters

Install it directly from Gephi (Tools > Plugins > Available Plugins) or download it from the Plugin Center. Longer description and source code can be found directly on the plug-in page.

Below is a small demo of how fast this algorithm lays out a network of 10K nodes, using only one processor.

OpenOrd Layout Demo in Gephi from gephi on Vimeo.

The algorithm’s original design and implementation can be found at this address. Kudos to the authors!

GSoC 2010 mid-term: Adding support for Neo4j in Gephi

Martin Škurla

During this summer, six students are working on Gephi with the Google Summer of Code. They contribute to Gephi by developing new features that will be integrated in the 0.8 version, released later this year.

My name is Martin Škurla and this summer I have been working on a GSoC project called “Adding support for Neo4j in Gephi”. In this article we will look at the implemented features, including those under the hood, pictures of dialogs, common use cases and future plans.

 

Gephi project

First, I want to give a quick introduction to the Gephi project. Gephi is an open source visualization platform built on top of the NetBeans platform. It is written in Java, so you can run it on various operating systems including Windows, Linux and Mac OS. It supports many interesting graph analysis capabilities including:

  • Real-time visualization
  • Layout
  • Metrics
  • Dynamic network analysis
  • Cartography clustering and hierarchical graphs
  • Dynamic filtering

The story so far

The main idea of my project is to add support for Neo4j in Gephi. This means the ability to transform a Neo4j graph into a Gephi graph. The two graph models are different, so the first task was to define a mapping between Neo4j graph items and Gephi graph items and vice versa.

There was also a mismatch between the types supported in Neo4j and those supported by Gephi. This mismatch was solved by adding new “List” types to Gephi, so now every type in Neo4j has a corresponding type in Gephi.

There were also some changes under the hood which are not visible to the end user but had to be defined and implemented. The most interesting one is the addition of a “delegating mechanism”. This mechanism is responsible for getting values from the storage engine (Neo4j) as well as for manipulating the data. During the import, a representation of the Neo4j graph is created in Gephi, but the values are not stored in it directly; they are queried through the delegating mechanism.
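The idea of delegation can be illustrated with a small sketch. It is written in Python for brevity and is not the Gephi implementation (which is Java); the class name DelegatingNode is hypothetical, and a plain dict stands in for the Neo4j store.

```python
class DelegatingNode:
    """Graph node that holds only an id and asks the store for values."""

    def __init__(self, node_id, store):
        self.node_id = node_id
        self._store = store  # stand-in for the storage engine (Neo4j in the article)

    def get(self, key):
        # Nothing is cached here: every read goes back to the store.
        return self._store[self.node_id].get(key)

    def set(self, key, value):
        # Writes are delegated too, so the store stays the single source of truth.
        self._store[self.node_id][key] = value


# A plain dict plays the role of the database in this sketch.
store = {1: {"name": "Martin"}}
node = DelegatingNode(1, store)
store[1]["name"] = "Tobias"   # change made directly in the store
print(node.get("name"))       # the node sees the new value
```

The trade-off is the usual one: memory use stays low because values are never copied into the graph structure, at the price of a store lookup on every access.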

Other minor tasks were to customize the open dialogs used for importing a local Neo4j database and for debugging the imported database. The open dialog for importing accepts only valid Neo4j database directories. I defined the structure of a valid Neo4j database directory, and every valid directory is now shown with a Neo4j picture in the open dialog. The user can open only valid Neo4j directories when importing. The open dialog for debugging accepts only Java class files that can be used for debugging. This simply means they have to implement the required interface and have a public no-argument constructor. Every valid class file is shown with a Neo4j picture, and after a valid debug file is selected, the Target and Visualization options are filled in automatically from the data in the selected class file.

 

Open Neo4j directory dialog customization

Open Neo4j debug file dialog customization

 

Neo4j integration

Menu integration

All possible actions start from the menu. As we can see, this is the entry point to import from, export to and debug the Neo4j graph. Both importing and exporting support local as well as remote Neo4j databases.

Importing

Whole graph import dialog

The importing process offers 2 approaches:

  • whole import
  • traversal import

The whole graph import dialog is designed for importing the whole graph. We can customize the rules that decide which nodes are returned by defining filtering expressions. For example, the dialog above can be used to find all people working on the Gephi project who are at most 30 years old. Only people with at least 5 years of experience who hold driver licence types A, B and C will be included.

Let’s have a deeper look at the dialog:

  • Property key is the name of the property we want to filter on
  • Property value is the value which will be compared to the actual node property value using the chosen operator. Values are automatically converted into the appropriate types, and if a value cannot be converted, the node will not be included in the graph. All types supported in Neo4j are supported in this dialog. We can also see the support for array types in the last filter expression.
  • Operator is applied to the final expression, and if the expression evaluates to true, the node will be included
  • Match case makes the comparison of String, char, String[] and char[] values case-sensitive
  • Restrict mode controls how nodes with missing properties are handled. Imagine the database stores people who have only a subset of the property names used in the filtering expressions. If Restrict mode is on, only nodes which have all the property names and for which all filtering expressions evaluate to true will be included. If Restrict mode is off, every node which has any subset of the required property names (even the empty subset) will be included, provided all the filtering expressions applicable to that subset evaluate to true.

All the filtering expressions are combined using AND, and the list of currently supported operators consists of: ==, !=, <, <=, >, >=.
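The filtering rules described above can be sketched as follows. This is an illustration in Python, not the plugin's code: the function name node_matches and the tuple layout of a filter are hypothetical, and the type conversion and Match case options are left out to keep the sketch short.

```python
import operator

# The six operators listed in the text, mapped to Python comparisons.
OPERATORS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

def node_matches(properties, filters, restrict=True):
    """Return True if a node passes all filter expressions (combined with AND).

    properties: dict of the node's property names and values
    filters:    list of (property_key, operator_symbol, expected_value)
    restrict:   when True, a node missing any filtered property is rejected
    """
    for key, symbol, expected in filters:
        if key not in properties:
            if restrict:
                return False   # Restrict mode on: every filtered property must exist
            continue           # Restrict mode off: filter not applicable, skip it
        if not OPERATORS[symbol](properties[key], expected):
            return False
    return True

filters = [("project", "==", "Gephi"), ("age", "<=", 30), ("experience", ">=", 5)]
print(node_matches({"project": "Gephi", "age": 28, "experience": 6}, filters))
print(node_matches({"project": "Gephi"}, filters, restrict=False))
print(node_matches({"project": "Gephi"}, filters, restrict=True))
```

Note that with Restrict mode off, a node with none of the filtered properties passes, which matches the "even the empty subset" case described above.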

In fact, finding out how useful it would be to add new operators, OR, and other import options is the main idea behind the questionnaire which is part of this article.

The traversal graph import dialog is designed for importing any subgraph using the traversal capabilities of Neo4j v1.1. Traversal import adds the following options:

  • Start node can be set in two ways, either by its id or by its indexing key and value pair
  • Order can be set to depth-first or breadth-first
  • Max depth can be set to a specific number or to the end of the graph
  • Relationships can be restricted too. We can set any combination of relationship types and directions which the traversal should include. The list of relationship types is filled dynamically from the database with the existing values.
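How these traversal options interact can be shown with a small self-contained sketch. It does not use the Neo4j traversal API; the function traverse and its parameters are hypothetical, and the graph is a plain list of (source, type, target) triples.

```python
from collections import deque

def traverse(edges, start, order="breadth", max_depth=None, allowed=None):
    """Collect node ids reachable from start, in order of discovery.

    edges:     list of (source, rel_type, target) triples
    order:     "breadth" or "depth"
    max_depth: int, or None for "end of graph"
    allowed:   set of (rel_type, direction) pairs, direction "out" or "in";
               None means every relationship is followed in both directions
    """
    def neighbours(node):
        for source, rel_type, target in edges:
            if source == node and (allowed is None or (rel_type, "out") in allowed):
                yield target
            if target == node and (allowed is None or (rel_type, "in") in allowed):
                yield source

    visited = [start]
    frontier = deque([(start, 0)])
    while frontier:
        # A queue gives breadth-first order, a stack gives depth-first order.
        node, depth = frontier.popleft() if order == "breadth" else frontier.pop()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand nodes that already sit at the depth limit
        for nxt in neighbours(node):
            if nxt not in visited:
                visited.append(nxt)
                frontier.append((nxt, depth + 1))
    return visited

edges = [("a", "KNOWS", "b"), ("b", "KNOWS", "c"), ("a", "WORKS_ON", "p")]
print(traverse(edges, "a", max_depth=1))
print(traverse(edges, "a", allowed={("KNOWS", "out")}))
```

The start node is given directly by id here; looking it up through an index key and value pair would simply be a step before calling the function.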

 

Traversal graph import dialog

This was a quick summary of the Neo4j importing capabilities implemented in the project. We also worked on other features, one of which is support for exporting. We can export any loaded graph into a local or remote Neo4j database. The exporting process can be customized in a similar way to importing.

Exporting

Export dialog

Exporting is the opposite process to importing. The dialog above shows the exporting options as well as validation. We can customize the exporting process by setting:

  • From column selects the Gephi edge column whose values are used as the RelationshipType. When a Neo4j graph is imported, a column named “Neo4j Relationship Type” is created automatically.
  • Default value is used when the processed Gephi edge has no value in the selected From column
  • Export Node columns is the set of Gephi columns in the node table which will be exported
  • Export Edge columns is the set of Gephi columns in the edge table which will be exported
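The effect of these export options on a single edge can be sketched as below. The helper names relationship_type and exported_properties are hypothetical, and an edge row is modelled as a plain dict of column name to value; this is only meant to make the fallback rule concrete.

```python
def relationship_type(edge_row, from_column, default):
    """Pick the Neo4j relationship type for one Gephi edge row (a dict)."""
    value = edge_row.get(from_column)
    # Fall back to the default when the column is missing or empty.
    return value if value not in (None, "") else default

def exported_properties(row, export_columns):
    """Keep only the columns selected for export."""
    return {col: row[col] for col in export_columns if col in row}

edge = {"Neo4j Relationship Type": "KNOWS", "weight": 2.0, "label": "x"}
print(relationship_type(edge, "Neo4j Relationship Type", "RELATED"))
print(relationship_type({"weight": 1.0}, "Neo4j Relationship Type", "RELATED"))
print(exported_properties(edge, ["weight"]))
```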

Remote importing/exporting

The only difference between local and remote importing/exporting is the Remote dialog, where we need to set the following connection information:

  • Remote database URL
  • Login
  • Password

All of them must be filled in to successfully import or export a remote graph.

Remote import/export dialog

Delegation process

Nodes values exploration (click on the image to enlarge)

As we can see from the previous picture, we can very easily explore all the node and edge values. This is exactly where the delegating mechanism is used. The values are not stored directly in memory in some Gephi data structure; instead, the storage engine (Neo4j) is asked for the actual values every time we need them.

Debugging

Debugging in action

We can see debugging in action in the previous picture. The dialog is initialized with data from the chosen debug class file, but we can change all of the options at runtime too. Any change in the options automatically updates the graph visualization. We can change the visibility of nodes and edges as well as the colors of both. The user proceeds to the next step of debugging/traversal by clicking the Next button.

Use cases

That was a quick summary of all the implemented features; now we can summarize the common use cases users may be interested in.

Visualizing Neo4j graphs

One of the main ideas of my project was to make it possible to visualize Neo4j graphs, even big ones. As we saw from the dialog pictures, we have many options for customizing the importing process, including filtering. After the import we can use all the rich graph analysis features Gephi provides.

Analyzing only part of the whole graphs

A quite common use case is to analyze only part of the graph, which is possible in Gephi too. We can take advantage of traversal, where we can set the starting node and other traversal options. After that we can visualize and analyze only that part of the graph.

Export graph stored in text files/databases into Neo4j

Another use case could be exporting graphs stored in graph text files or relational databases into Neo4j. In fact, every graph loaded into Gephi can easily be exported to a Neo4j database. The importable formats depend on Gephi itself; currently the following formats are supported:

  • Text formats: GEXF, GDF, GML, GraphML, Pajek NET, GraphViz DOT, CSV, UCINET DL, Tulip TPL, XGMML
  • Relational databases: MySQL, PostgreSQL, SQL Server

Future plans

There are more things which we want to implement, including:

  • support for the Gephi Toolkit, which is essentially the set of Gephi core libraries that you can use in your own Java projects for graph visualization and manipulation
  • implementing a proof-of-concept web application using both the Gephi Toolkit and Neo4j to manipulate a Neo4j database and show the results (probably using GWT)
  • more features, bug fixing, performance optimizations

Questionnaire

One of the big advantages of Gephi is that it is developed as an open source project. We want to add features according to user requests and opinions. That’s why we created a questionnaire focusing on the usefulness of the proposed additions. We will be very happy if you fill it in, because it is a very valuable source of information that lets us focus on the features Neo4j users find useful. Please fill in the questionnaire.

Conclusion

I am very happy that I can be part of the Gephi developer community and introduce integration with Neo4j. During this summer I learned a lot, and I am proud that I was chosen as a GSoC student. None of these features could have been done without the great help of my mentors, so a big thank you to both of them: Mathieu Bastian & Tobias Ivarsson.

If you are interested and want to test the code, you can download the source code from my branch using bzr branch lp:~bujacik/gephi/support-for-neo4j

All the pictures were made with data stored in a test Neo4j database, which can be created using a Java SE project that you can download using:
bzr branch lp:~bujacik/+junk/testing-new-neo4j-traversal-api

 

Martin Škurla

Download this article in PDF.

Semantic plugin: AlchemyAPI

A new plugin is available for Gephi that utilizes the power of natural language processing (NLP) software to analyze text documents and visualize their contents. The plug-in was created by AlchemyAPI (alchemyapi.com), and utilizes the AlchemyAPI REST service to semantically process a web page or text file and show all the subjects of the text (people, places and things, known collectively as named entities) as nodes in Gephi.

 

Graph of the American Revolution wikipedia entry.

The plug-in is a powerful tool to distill dense and unstructured textual data into easy to understand graphs. Extracted entities possess a relevance attribute which is a measure of how pertinent the subject is to the source text, and also a count attribute that indicates the number of times the subject is named in the source text. Both of these attributes can be used to affect the visualization.

Once installed, the plug-in can be accessed through the File->Generate->Semantic Analysis menu. As an example of the functionality of the plug-in, we’ll examine the Wikipedia entry for the American Revolution. To make a graph of this article, enter the article’s URL into the Semantic Analysis dialog box. The plug-in will extract over 350 people, places, and things from the Wikipedia page. You can use this data to create a word-cloud-style visualization of the article, like the one above.

If subtype analysis is enabled, you can also visualize the types and subtypes of named entities. For example, the nodes in the image below were extracted from a recent news article. They represent Dmitry Medvedev and his ontological classifications. The edges from Medvedev’s node identify him as a Person, Politician, and President (classifications he shares with Mahmoud Ahmadinejad). A complete list of the subtypes AlchemyAPI returns can be found at http://www.alchemyapi.com/api/entity/types.html.

Detail of named entity subtypes

The plug-in can also be used to visualize the connections between multiple text documents. Connections will be drawn between each document node and the entities that the texts share, creating a powerful way of discovering recurring themes within an archive. As an example, see the connections shared between the Wikipedia pages for the American Revolution and the French Revolution in the picture below. Common entities like ‘France’, ‘Britain’, and ‘Thomas Paine’ are linked to both the French Revolution and American Revolution articles.

Graph of connections between American and French Revolution wikipedia entries.

As more documents are added to the graph, a web of entities forms. The relevance and count of connected entities increase with the number of documents that mention them.
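The structure of such a multi-document graph can be sketched as follows. The sketch does not call the AlchemyAPI service; it assumes the entity mentions have already been extracted, the function names are hypothetical, and the sample mentions are toy data rather than real extraction results.

```python
from collections import defaultdict

def build_entity_graph(documents):
    """documents maps a document name to a list of entity mentions.

    Returns (edges, mention_count):
      edges          set of (document, entity) pairs
      mention_count  total mentions of each entity across all documents
    """
    edges = set()
    mention_count = defaultdict(int)
    for doc, mentions in documents.items():
        for entity in mentions:
            edges.add((doc, entity))
            mention_count[entity] += 1
    return edges, dict(mention_count)

def shared_entities(edges):
    """Entities linked to more than one document."""
    docs_per_entity = defaultdict(set)
    for doc, entity in edges:
        docs_per_entity[entity].add(doc)
    return {e for e, docs in docs_per_entity.items() if len(docs) > 1}

# Toy input for illustration only.
docs = {
    "American Revolution": ["France", "Britain", "Thomas Paine", "Britain"],
    "French Revolution": ["France", "Thomas Paine", "Paris"],
}
edges, counts = build_entity_graph(docs)
print(sorted(shared_entities(edges)))
print(counts["Britain"])
```

The accumulated count could then drive node size in the visualization, in the same way the count attribute is used for a single document.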

We hope you use this plug-in to make the data in your text more accessible. If you have any questions or suggestions for the makers of this plug-in, please leave them in the comments section.

Our thanks to the Gephi team for their remarkable visualization program, and all the documentation and help that made this plug-in possible.
Graph of espn.com front page and linked articles.

Shaun Roach

Download the Gephi plugin for AlchemyAPI here, or find it in your Gephi plug-in center.

Graph visualization on the web with Gephi and Seadragon

The project takes another big step forward and brings dynamic graph exploration to the web in one click from Gephi with the Seadragon Web Export plugin.

Mathieu Bastian and Julian Bilcke worked on a Seadragon export plugin. It directly exports large graph pictures so you can put them on the web. Seadragon is pure JavaScript and works on all modern browsers. As it uses image tiles (like Google Maps), there is no graph size limit.

Open Gephi and go to the Plugin Center (Tools > Plugins) to install the plugin. You can also download the plugin archive manually or get the source code.


Sample with Diseasome Network dataset directly exported from Gephi

Communication about (large) graphs is currently limited because it’s not easy to put them on the web. Graph visualization has much the same aims as other types of visualization and needs powerful web support. We have been thinking for a long time about the best way to do this and found that there is no perfect solution. We need efficiency, interactivity and portability at the same time. The simplicity of building and hacking the system is also important, as we want developers to be able to improve it easily.

By comparing technologies we found that Seadragon is the best short-term solution, giving maximum results for minimum effort. However, it still has a serious limitation: interactivity. Searching and clicking on nodes are not possible for the moment. But as it is JavaScript, I don’t see any hurdles to adding these features in the future; help is needed.

The table below shows our conclusions on the technologies we are considering. We are eager to discuss them on the forum. As performance is the most important demand, WebGL is a serious candidate, but its development would require time and resources. We plan to start a WebGL visualization engine prototype next summer, for Google Summer of Code 2011, but we would like to discuss specifications with anyone interested and build this together.

Criteria compared: Portability, Efficiency, Effort, Interactivity
Technologies compared: Flash, Java2D/Processing, Canvas (Processing.js/RaphaelJS), WebGL, Seadragon
Figure: Comparing technologies able to display networks on the web.

How to use the plugin?

Install the plugin from Gephi: go to “Tools > Plugins” and find Seadragon Web Export. After Gephi restarts, the plugin is available in the export menu. Load a sample network and try the plugin. Go to the Preview tab to configure rendering settings such as colors, labels and edges.

Export directly from Gephi Export menu

The settings dialog asks for a valid directory to export the files to and for the size of the canvas. The bigger the canvas, the more you can zoom in, but the longer it takes to generate and to load.
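To get a feel for this trade-off, the number of tiles can be estimated from the canvas size. The sketch below assumes the export follows the usual Deep Zoom pyramid layout used by Seadragon (each level halves the resolution down to a single pixel) and a tile size of 256 pixels; both are assumptions about the plugin, and the function name is hypothetical.

```python
import math

def pyramid_tile_count(width, height, tile_size=256):
    """Count tiles in a Deep Zoom style pyramid for a width x height image.

    The layout (levels halving down to 1 pixel) and the default tile size
    are assumptions, see the text above.
    """
    max_level = math.ceil(math.log2(max(width, height)))
    total = 0
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)
        w = math.ceil(width / scale)    # image width at this level
        h = math.ceil(height / scale)   # image height at this level
        total += math.ceil(w / tile_size) * math.ceil(h / tile_size)
    return total

for size in (1024, 4096, 16384):
    print(size, pyramid_tile_count(size, size))
```

Doubling the canvas side roughly quadruples the number of tiles at the deepest level, which is why generation time grows quickly with canvas size.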

Export settings, configure the size of the image

Note that the result on the local hard drive can’t be viewed with Chrome, due to a bug. Run Chrome with the “--allow-file-access-from-files” option to make it work.

Kudos to Microsoft Live Labs for this great library, released under the Ms-PL open source license. Thank you to Franck Cuny for the CPAN Explorer project that inspired this plugin. Other interesting projects are GEXF Explorer, a Flash-based dynamic widget, and gexf4js, which loads GEXF files into Protovis.

GSoC 2010 mid-term: Direct Social Networks Import

Yi Du

During this summer, six students are working on Gephi with the Google Summer of Code. They contribute to Gephi by developing new features that will be integrated in the 0.8 version, released later this year.

Yi Du is adding the module Direct Social Networks Import during this summer, which provides several kinds of importers like Emails, Twitter or Facebook. The goal of this article is to briefly introduce some of the importers, as well as several samples provided.

The ability to import any kind of structured data and build a network from it is essential for users. This step is often missing and requires time and scripting abilities, although tools and libraries able to read and parse all types of data already exist. Moreover, it has never been so easy to quickly access meaningful datasets online.

Email importer

Email is a simple and widely used communication tool, yet many people know little about how it works. To some extent, our work on analyzing emails can help people better understand their relationships with others. In our email importer module, each email address is represented as a node. If two email addresses have the same display name, an option lets the user decide whether to treat them as one node or as two different nodes. Then, if there is an email from A to B, an edge is built, and another option lets the user decide whether Cc or Bcc recipients also produce an edge.
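The edge construction described above can be sketched with Python's standard email package. This is an illustration only, not the importer's code: the function name email_edges and its flags are hypothetical, and the option to merge addresses sharing a display name is left out.

```python
from email import message_from_string
from email.utils import getaddresses

def email_edges(raw_message, include_cc=False, include_bcc=False):
    """Return directed (sender, recipient) edges for one raw email message."""
    msg = message_from_string(raw_message)
    senders = [addr.lower() for _, addr in getaddresses(msg.get_all("From", []))]
    headers = ["To"]
    if include_cc:
        headers.append("Cc")
    if include_bcc:
        headers.append("Bcc")
    recipients = []
    for header in headers:
        recipients += [addr.lower() for _, addr in getaddresses(msg.get_all(header, []))]
    # One edge per (sender, recipient) pair; empty addresses are skipped.
    return [(s, r) for s in senders for r in recipients if s and r]

RAW = """From: Alice <alice@example.org>
To: Bob <bob@example.org>
Cc: carol@example.org
Subject: hello

Hi Bob
"""
print(email_edges(RAW))
print(email_edges(RAW, include_cc=True))
```

Addresses are lower-cased so that the same mailbox written with different capitalization maps to a single node.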

We provide two ways to import emails. On the one hand, emails can be fetched one by one from the email server (POP3 or IMAP). On the other hand, emails can be read from local files or folders. The second approach raises a problem: different email clients may use different file formats. Fortunately, our importer has an easy-to-extend API, as well as a default implementation (EML files). EML is standard and can be obtained from Thunderbird, Outlook and Gmail with tools like Gmail Backup.

This is a sample to illustrate how email importer outputs the data (2000 emails with EGO filter, 700 nodes, 1300 edges).
Figures: (a) the EGO graph; (b) graph of nodes whose in-degree is bigger than 0; (c) the modularized graph; (d) subgraph with the maximum modularity count; (e) the hottest group.

Twitter importers

Twitter is a very popular social network. People use it to send and receive short messages, usually called tweets. We can follow people we are interested in and topics we like. Twitter networks have been popularized by NodeXL, which has a similar feature. See this nice gallery.

We provide two kinds of networks: “Twitter Search Network” and “Twitter User Network”.

The Twitter search network is used to analyze people who search for or mention similar keywords. Each Twitter user is represented as a node, and we define three kinds of edge construction:

  • Replies-to relationship: If A replies to B in a searched tweet, an edge from A to B will be added.
  • Mentions relationship: If A mentions B in a searched tweet, an edge from A to B will be added.
  • Followers relationship: If A follows B in the constructed graph, an edge from A to B will be added.
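The three edge rules can be sketched as follows. The sketch works on already fetched data and does not call the Twitter API; the tweet dict fields ("user", "text", "in_reply_to"), the function name and the regular expression used to detect mentions are all assumptions made for illustration.

```python
import re

MENTION = re.compile(r"@(\w+)")  # simplified pattern for @mentions

def search_network_edges(tweets, follows=()):
    """Build (source, target, kind) edges from a list of tweet dicts.

    Each tweet is {"user": str, "text": str, "in_reply_to": str or None};
    follows is an iterable of (follower, followed) pairs.
    """
    edges = set()
    users = set()
    for t in tweets:
        users.add(t["user"])
        if t.get("in_reply_to"):
            edges.add((t["user"], t["in_reply_to"], "replies-to"))
            users.add(t["in_reply_to"])
        for name in MENTION.findall(t["text"]):
            if name != t["user"]:
                edges.add((t["user"], name, "mentions"))
                users.add(name)
    # Followers edges are only added between users already in the graph.
    for follower, followed in follows:
        if follower in users and followed in users:
            edges.add((follower, followed, "follows"))
    return edges

tweets = [
    {"user": "ann", "text": "@bob nice graph! cc @cat", "in_reply_to": "bob"},
    {"user": "cat", "text": "trying gephi", "in_reply_to": None},
]
for edge in sorted(search_network_edges(tweets, follows=[("cat", "ann"), ("dan", "ann")])):
    print(edge)
```

In this toy input the pair ("dan", "ann") produces no edge, because dan does not appear in any searched tweet and is therefore not part of the constructed graph.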

The second network we provide is “twitter user network”. We analyze people who follow each other to show the relationships between twitter users. We add an edge from A to B if A follows B in the whole graph by default. We provide three options for vertex construction:

  • Person followed by the user: If searched user A follows B, B will be added as a vertex.
  • Person following the user: If A follows searched user B, A will be added as a vertex.
  • Both: Both the above two options.
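The vertex options for the user network can be sketched in a few lines. As before, this works on a given set of follow relations rather than the Twitter API, and the function name and mode values are hypothetical.

```python
def user_network(user, follows, mode="both"):
    """Build the user network around one searched user.

    follows: set of (follower, followed) pairs
    mode:    "followed"  adds people the user follows,
             "following" adds people who follow the user,
             "both"      adds both groups
    Returns (vertices, edges); an edge (A, B) means A follows B.
    """
    vertices = {user}
    if mode in ("followed", "both"):
        vertices |= {b for a, b in follows if a == user}
    if mode in ("following", "both"):
        vertices |= {a for a, b in follows if b == user}
    # Edges are kept for every follow relation among the selected vertices.
    edges = {(a, b) for a, b in follows if a in vertices and b in vertices}
    return vertices, edges

follows = {("me", "x"), ("y", "me"), ("x", "y"), ("z", "q")}
print(user_network("me", follows, mode="followed"))
print(user_network("me", follows, mode="both"))
```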

The interfaces of the two importers are shown below.
Figures: (a) user network importer UI; (b) search network importer UI.

New York Times importer

The New York Times is an American daily newspaper founded and continuously published in New York City. It offers a series of APIs for developers covering news and social networks, such as the Article Search API and the Best Seller API.

We provide two kinds of social network importers in Gephi: “Article Network” and “TimesPeople Network”. We use the article network to analyze articles with specific filters (date, facets, etc.). The user can choose which attribute constructs the edges. For example, if the user chooses the date, an edge is built between two articles that have the same date attribute. TimesPeople is a social network for Times readers, similar to Facebook; we can analyze the relationships between its members.
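The rule "same attribute value means an edge" can be sketched as below. The sketch assumes the articles have already been retrieved and are represented as dicts with an "id" field; that representation and the function name are hypothetical, and no NYT API call is made.

```python
from collections import defaultdict
from itertools import combinations

def article_edges(articles, attribute):
    """Link every pair of articles that share the same value for attribute."""
    groups = defaultdict(list)
    for article in articles:
        value = article.get(attribute)
        if value is not None:
            groups[value].append(article["id"])
    edges = []
    for ids in groups.values():
        # Every pair inside a group becomes one undirected edge.
        edges.extend(combinations(sorted(ids), 2))
    return sorted(edges)

articles = [
    {"id": "a1", "date": "2010-07-01"},
    {"id": "a2", "date": "2010-07-01"},
    {"id": "a3", "date": "2010-07-02"},
    {"id": "a4", "date": "2010-07-01"},
]
print(article_edges(articles, "date"))
```

A group of n articles with the same value yields n(n-1)/2 edges, so a very common attribute value can make the network dense quickly.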

The interfaces of the NYT article network importer and the TimesPeople network importer are shown below:
Figures: (a) NYT article network importer UI; (b) NYT TimesPeople network importer UI.

Display of TimesPeople network:

Conclusion and future work

In this article, we introduced several importers: Email, Twitter and NYT. With these importers, users can import the data they want and analyze it. They can find the hottest group, the relationships among their friends, the author most related to a facet and other important information.
By the end of GSoC, we will have four major importers: Email, Twitter, NYT and Facebook. Twitter will have “Twitter User Network” and “Twitter Search Network”. NYT will have “NYT article search network” and “NYT TimesPeople Network”. Facebook will have “Facebook Friends Network” and “Facebook Group Network”. Besides adding the Facebook importer, we will also optimize the UI of the importers and make them more user friendly.

Yi Du

Map Geocoded data with Gephi

The combination of network and geographic data has fantastic potential that has not yet been fully revealed. Alexis Jacomy, a student member of the Gephi community, just released a new plugin named GeoLayout, which aims to bridge this gap. Congratulations!

The plugin uses latitude/longitude coordinates to set the correct node positions on the network. Several projections are available, including Mercator, which is used by Google Maps and other online services.
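As an illustration of what such a projection does, here is the standard spherical Mercator formula in a few lines of Python. This is a sketch of the textbook formula, not the plugin's code; the function name and the scale parameter are hypothetical, and the latitude must lie strictly between -90 and 90 degrees because the projection diverges at the poles.

```python
import math

def mercator(latitude, longitude, scale=1.0):
    """Project latitude/longitude in degrees to planar (x, y).

    Spherical Mercator: x = scale * lon, y = scale * ln(tan(pi/4 + lat/2)),
    with angles in radians.
    """
    lat = math.radians(latitude)
    lon = math.radians(longitude)
    x = scale * lon
    y = scale * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y

# Approximate coordinates of two US airports, used only as sample input.
print(mercator(40.64, -73.78))    # near New York (JFK)
print(mercator(33.94, -118.41))   # near Los Angeles (LAX)
```

With the resulting x and y used as node positions, nodes further north get a larger y, so the layout reads like a conventional map as long as the y axis points up.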

The plugin is available from the Gephi Plugin Center. The author is looking for feedback; please visit the plugin page.

I wanted to try with the classical USA Airline Routes network dataset, and detail the experience.

Install Plugin

In Gephi, go to the Tools menu and then Plugins. In the Available Plugins tab, check GeoLayout and click Install. The plugin is installed and you are asked to restart Gephi. Click OK.

Open Dataset

Download the airlines-sample.gexf (Save As…) dataset and open it with Gephi.

The network is an undirected graph with 235 nodes and 1297 edges. Each node has two additional attributes, latitude and longitude, expressed in degrees.

You should see the graph opened like this.

Use GeoLayout

Go to the Layout module, choose Geo Layout in the list, and then click the Run button.

Result

You can see the result immediately. Analysis and aesthetics refinement can be done now. Please visit the Quick Start Tutorial for a step by step introduction to Gephi.