Introducing Data Laboratory

Eduardo Ramos

My name is Eduardo Ramos, and this summer I have been working on a project for Gephi called Data Laboratory, which started out as a GSoC project idea. The first specifications can be found on this wiki page. The new Data Laboratory features will be included in version 0.8 of Gephi, to be released later this year.

Presentation

The general purpose of this project is to improve the basic and common features offered in the “Data Laboratory” section of Gephi.
This section contains two tables (nodes and edges) that show the attributes of every node or edge as table columns. Before this project, the features available for modifying the table structure and values were too limited to properly edit, refine or even create a graph with attributes in a tabular view.

New key features coming:

  • Graph editing in many ways
  • Add/remove columns
  • Search & Replace
  • Import & Export CSV files
  • Charts and statistics reports
  • Sparklines
  • Merge columns
  • And the possibility of extending all of this with plugins!

Data Laboratory needed to provide an API/SPI so that the new options can be used and extended through plugins. A complete API is being designed so that these new general features can be used from any module or plugin. Because this API is independent of the user interface, these actions will also be included in the toolkit.

Types of manipulators (i.e. actions) and how they look in the UI

General actions

These are not tied to a specific element such as a node, edge or column, and they normally provide a UI. They appear as buttons in the toolbar at the top of the data table, and can be grouped in a drop-down button called “Plugins” if necessary.

Some of the new basic features of this type are Add node/edge, Clear graph and Clear edges. The remaining general actions in the picture are more specialized. Search/Replace shows an advanced UI to search and replace values in the table cells. It can run a normal search or a regular-expression-based search, among other useful options. It is implemented in a separate controller that is part of the Data Laboratory API.
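To make the difference between a normal search and a regular expression search concrete, here is a minimal sketch in plain Java. It is not the Data Laboratory controller: the class CellSearchReplace and its method are hypothetical names used only for illustration, and the table is assumed to be a simple list of rows of strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the Data Laboratory API. */
final class CellSearchReplace {

    /**
     * Returns a copy of the table in which every match of the query in every
     * cell has been replaced. With useRegex == false the query and the
     * replacement are both treated as plain text.
     */
    static List<List<String>> replaceAll(List<List<String>> rows, String query,
                                         String replacement, boolean useRegex) {
        // Pattern.LITERAL makes "." match only a dot instead of any character.
        Pattern pattern = useRegex
                ? Pattern.compile(query)
                : Pattern.compile(query, Pattern.LITERAL);
        // In normal mode "$1" or "\" in the replacement must not be interpreted.
        String safeReplacement = useRegex
                ? replacement
                : Matcher.quoteReplacement(replacement);

        List<List<String>> result = new ArrayList<>();
        for (List<String> row : rows) {
            List<String> newRow = new ArrayList<>();
            for (String cell : row) {
                // Null cells (missing values) are kept as they are.
                newRow.add(cell == null
                        ? null
                        : pattern.matcher(cell).replaceAll(safeReplacement));
            }
            result.add(newRow);
        }
        return result;
    }
}
```

With this sketch, a normal search for "." only touches literal dots, while a regular expression such as node-(\d+) with the replacement n$1 rewrites a cell like node-12 using the captured digits.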

Export CSV table lets the user export the data in the current table, choosing which columns to include and which separator and charset to use.
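The three export options (columns, separator and charset) can be sketched in a few lines of plain Java. This is only an illustration under simple assumptions, not the exporter shipped with Gephi: CsvExportSketch and its methods are hypothetical names, the table is a list of rows of strings, and fields are quoted in the common CSV style when they contain the separator, a quote or a line break.

```java
import java.nio.charset.Charset;
import java.util.List;

/** Hypothetical helper for illustration; not the Gephi CSV exporter. */
final class CsvExportSketch {

    /** Quotes a field when it contains the separator, a quote or a line break. */
    static String escape(String value, char separator) {
        if (value == null) {
            return ""; // missing values are written as empty fields
        }
        boolean needsQuotes = value.indexOf(separator) >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        return needsQuotes ? "\"" + value.replace("\"", "\"\"") + "\"" : value;
    }

    /** Writes the header and rows, keeping only the selected column indexes. */
    static String toCsv(List<String> header, List<List<String>> rows,
                        int[] selectedColumns, char separator) {
        StringBuilder out = new StringBuilder();
        appendLine(out, header, selectedColumns, separator);
        for (List<String> row : rows) {
            appendLine(out, row, selectedColumns, separator);
        }
        return out.toString();
    }

    private static void appendLine(StringBuilder out, List<String> row,
                                   int[] selectedColumns, char separator) {
        for (int i = 0; i < selectedColumns.length; i++) {
            if (i > 0) {
                out.append(separator);
            }
            out.append(escape(row.get(selectedColumns[i]), separator));
        }
        out.append('\n');
    }

    /** The chosen charset only matters when the text is turned into bytes. */
    static byte[] encode(String csv, Charset charset) {
        return csv.getBytes(charset);
    }
}
```

Keeping the charset out of toCsv means the same text can be encoded as UTF-8 or ISO-8859-1 at the very last step, which is one simple way to keep that choice independent of the column and separator choices.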

Import CSV does the opposite, showing a wizard to configure the import settings in two steps.

Nodes/Edges actions

These actions are shown in a context menu when you right-click one or more table rows that represent nodes or edges.
The currently implemented manipulators for nodes are:

  • Edit node properties – Shows edit window for the clicked node
  • Select on graph view – Centers graph view on the clicked node
  • Select neighbor nodes on table – Modifies the same table rows selection to highlight neighbor nodes of the clicked node
  • Delete – Deletes the selected node(s)
  • Clear node data… – Shows a UI to choose which columns of the selected node(s) to clear
  • Copy node data to the other selected nodes – Only enabled when more than one node is selected; copies the chosen columns of the clicked node to the other nodes
  • Group – Groups the selected nodes
  • Ungroup – Ungroups the selected groups
  • Ungroup recursively – Ungroups the selected groups, and all their descendant groups
  • Move to group… – Shows a UI to choose an available group to move the selected nodes into
  • Remove from group – Removes the selected nodes from their group, moving them up one level in the hierarchy
  • Settle and Free – Locks/unlocks the position of the selected node(s)
  • Set node size – Sets the selected node(s) to the size given in the UI
  • Link nodes – Is only enabled when more than one node is selected and shows a UI to select a source node which will be linked to all the other selected nodes
  • Copy node – Makes the desired number of copies of the selected node(s) with their attributes and properties

And for edges:

  • Select source node on graph view – Centers graph view on the source node of the clicked edge
  • Select target node on graph view – Centers graph view on the target node of the clicked edge
  • Select source and target on nodes table – Modifies the nodes table row selection to highlight the source and target nodes of the clicked edge
  • Delete – Deletes the selected edge(s)
  • Delete with nodes – Deletes the selected edge(s) and the nodes that the user chooses in the UI (source and/or target)
  • Clear edge data… – Shows a UI to choose which columns of the selected edge(s) to clear
  • Copy edge data to the other selected edges – Only enabled when more than one edge is selected; copies the chosen columns of the clicked edge to the other edges

Attribute columns actions

Like the node and edge manipulators, these are designed to operate on a specific type of data, in this case attribute columns.
They appear in the Data Laboratory UI as independent (or grouped by type) drop-down buttons. Each one represents an action that can be performed on a single attribute column. When the button is clicked, it shows a list of the columns available for that operation (the manipulator specifies the conditions), and the user selects the one to run it on.
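The idea of a manipulator deciding which columns it accepts can be sketched as follows. This is a simplified illustration in plain Java and does not reproduce the actual SPI: the Column class, the ColumnManipulator interface and all method names are assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; the real Data Laboratory SPI may look different. */
final class ColumnManipulatorSketch {

    /** Minimal stand-in for an attribute column: a title and a value type. */
    static final class Column {
        final String title;
        final Class<?> type;

        Column(String title, Class<?> type) {
            this.title = title;
            this.type = type;
        }
    }

    /** Assumed contract: a name for the button and a condition on columns. */
    interface ColumnManipulator {
        String getName();

        boolean canManipulate(Column column);
    }

    /** Example manipulator that only accepts numeric columns. */
    static final class NumericOnlyManipulator implements ColumnManipulator {
        @Override
        public String getName() {
            return "Numeric statistics";
        }

        @Override
        public boolean canManipulate(Column column) {
            return Number.class.isAssignableFrom(column.type);
        }
    }

    /** Builds the list of column titles offered when the button is clicked. */
    static List<String> availableColumns(ColumnManipulator manipulator,
                                         List<Column> columns) {
        List<String> titles = new ArrayList<>();
        for (Column column : columns) {
            if (manipulator.canManipulate(column)) {
                titles.add(column.title);
            }
        }
        return titles;
    }
}
```

In this sketch, a table with the columns Id (String), Weight (Double), Degree (Integer) and Label (String) would only offer Weight and Degree for the numeric manipulator.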

Some of the implemented attribute column manipulators are basic operations: adding columns, deleting columns, clearing and copying column data, filling a column with a value, and duplicating a column into another one with the data type the user needs, converting the data when possible.

Other column manipulators operate on specific types of column data, such as boolean or numeric, or use regular expressions to obtain a new column from an existing one.
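As an illustration of obtaining a new column from an existing one with a regular expression, here is a minimal sketch in plain Java. It is not the code of the actual manipulators: RegexColumnSketch and its two methods are hypothetical, and a column is assumed to be a simple list of strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not an actual Gephi manipulator. */
final class RegexColumnSketch {

    /** New boolean column: true where the source value contains a match. */
    static List<Boolean> matchColumn(List<String> source, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<Boolean> result = new ArrayList<>();
        for (String value : source) {
            result.add(value != null && pattern.matcher(value).find());
        }
        return result;
    }

    /**
     * New string column holding the first capturing group of the first match,
     * or null when the value is null or does not match. The regex is assumed
     * to contain at least one capturing group.
     */
    static List<String> firstGroupColumn(List<String> source, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> result = new ArrayList<>();
        for (String value : source) {
            String extracted = null;
            if (value != null) {
                Matcher matcher = pattern.matcher(value);
                if (matcher.find()) {
                    extracted = matcher.group(1);
                }
            }
            result.add(extracted);
        }
        return result;
    }
}
```

For example, applying firstGroupColumn with the expression @(.+)$ to a column of e-mail addresses would produce a new column containing only the domain of each address, with null for values that are not addresses.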

Columns merge strategies

Finally, another part of the project's SPI makes it possible to define different strategies for merging several columns.
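To give an idea of what a merge strategy could look like, here is a minimal sketch in plain Java. The MergeStrategy interface and the two example strategies are assumptions made for this illustration and do not mirror the real SPI: each strategy simply receives the values of the chosen columns for one row and returns the merged value.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; the real merge strategy SPI may look different. */
final class MergeStrategySketch {

    /** Assumed contract: merge one row's values from the chosen columns. */
    interface MergeStrategy {
        Object merge(List<Object> rowValues);
    }

    /** Joins the non-null values as text, with a separator between them. */
    static final class JoinWithSeparator implements MergeStrategy {
        private final String separator;

        JoinWithSeparator(String separator) {
            this.separator = separator;
        }

        @Override
        public Object merge(List<Object> rowValues) {
            StringBuilder joined = new StringBuilder();
            boolean first = true;
            for (Object value : rowValues) {
                if (value == null) {
                    continue; // skip missing values
                }
                if (!first) {
                    joined.append(separator);
                }
                joined.append(value);
                first = false;
            }
            return joined.toString();
        }
    }

    /** Averages the numeric values; returns null when there are none. */
    static final class Average implements MergeStrategy {
        @Override
        public Object merge(List<Object> rowValues) {
            double sum = 0;
            int count = 0;
            for (Object value : rowValues) {
                if (value instanceof Number) {
                    sum += ((Number) value).doubleValue();
                    count++;
                }
            }
            if (count == 0) {
                return null;
            }
            return sum / count;
        }
    }

    /** Applies the strategy row by row and returns the merged column. */
    static List<Object> mergeColumns(List<List<Object>> rows, int[] columns,
                                     MergeStrategy strategy) {
        List<Object> merged = new ArrayList<>();
        for (List<Object> row : rows) {
            List<Object> picked = new ArrayList<>();
            for (int column : columns) {
                picked.add(row.get(column));
            }
            merged.add(strategy.merge(picked));
        }
        return merged;
    }
}
```

Keeping each strategy behind one small interface is what would let a plugin add a new way of merging without touching the code that walks through the table rows.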

Conclusion

This project has been a great opportunity for me to experience working with an open source community and to learn about many aspects of programming, such as API/SPI design, creating better user interfaces and building modular applications. I will be happy to participate in future projects for Gephi.

More information can be found on this wiki page, which will soon be updated with more documentation and help on how to extend Data Laboratory using the new SPI.

Your opinions and needs are also very important for improving Gephi, so feel free to make suggestions or ask anything about the Data Laboratory project in this forum post.

Eduardo Ramos

10 Comments

  1. Hey Guys,

    I have a problem with importing additional attributes to nodes, can I do that? If not, I could just copy/paste them into the data lab, but the problem is that the nodes are not sorted the way they are in Excel or similar (1, 10, 1111, 2, 22). Is there a hint or a way to just add data in the form node id; data?

    Thanks

  2. Hi, yes you can use the Import CSV feature in the Data Lab to import additional attributes. It has many options.

    For the sorting problem, I think this has been fixed. Try to update your Gephi (Help > Check for updates).

  3. Hi, I’m using Gephi to visualise a social network of BIO researchers. I have a problem adding columns in the Data Lab. I have the names of the researchers and want to add country and field of study as well, but any time I use Import/Spreadsheet it creates new nodes with a new column as well. How can I have all my data not repeated and placed in the appropriate column?

  4. […] The great Gephi is, as is well known, taking part in this year's Google Summer of Code, and the first interim results are slowly coming out. One feature that I too have long been waiting for is proper data management within the application, which is evidently taking shape here: Introducing Data Laboratory […]
