Gephi 0.8 beta released

The latest beta version of Gephi has been released. Download it for Windows, Mac OS and Linux. This release focuses on new features for both users and developers, and the new license unlocks opportunities for business. The Ranking and Preview modules have been completely rewritten in a modular way and can now be extended with plug-ins! Preview can be extended in many ways, for instance with group shapes or edge bundling. Moreover, continuous progress has been made on dynamic network support, and today we release the last big part: statistics over time, available from the Statistics module when the network is dynamic. Thanks to the users who reported bugs; it’s the only way to fix them.

The team will now start developing the 0.9 version of Gephi (please consider joining us!) and integrate the latest Google Summer of Code projects, including a new timeline. We are also willing to help plug-in developers as much as possible to get things done, and to improve the documentation. We want to leverage the new Preview and will help newcomers get started.

Because it’s a major release, changes are not deployed through AutoUpdate; you need to download and install the new version. Plug-ins also need to be checked for compatibility. They will reappear on the Plugin Center in the coming days, as they are verified. Thanks for your patience.

Consult the release notes and the new Javadoc for more information.

Features highlight

Dynamic Ranking

Ranking now works with dynamic networks, and it’s easy! When manipulating a dynamic network with the Timeline, simply enable ‘Auto Apply’ and the color/size is updated in real time. To import a dynamic node size, simply include a regular dynamic column in your GEXF and select it in Ranking as before.

PNG Export

A new powerful PNG export has been added to the existing PDF and SVG export. One can create high-resolution network images with all the customization available in Preview. You can even create transparent background images!

New Preview

A major effort went into completely rewriting the Preview module in a modular way. One can now create plug-ins for Preview! The new Preview includes new opacity options, a text outline, a radius setting (to customize edge start/end points) and a simplified list of properties.

Dynamic Metrics

New dynamic metrics in the Statistics module: Dynamic Degree, Dynamic Node Count, Dynamic Edge Count and Dynamic Clustering Coefficient. Dynamic metrics are executed on a dynamic network and let you analyze how network properties evolve over time.
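To make the idea of metrics over time concrete, here is a minimal Python sketch. It is not Gephi's implementation: the edge format `(source, target, start, end)` and the `window` and `tick` parameters are assumptions made for illustration. It slides a time window over the network and reports node count, edge count and degree for each window.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: (source, target, start, end).
# The edge is assumed to exist on the closed interval [start, end].
TimedEdge = Tuple[str, str, int, int]


def dynamic_metrics(edges: List[TimedEdge], t_min: int, t_max: int,
                    window: int, tick: int) -> List[dict]:
    """Slide a window of length `window` from t_min to t_max in steps of
    `tick` and compute node count, edge count and degrees per window."""
    if tick <= 0:
        raise ValueError("tick must be positive")
    rows = []
    t = t_min
    while t + window <= t_max:
        lo, hi = t, t + window
        # An edge is active in the window if its interval overlaps [lo, hi].
        active = [e for e in edges if e[2] <= hi and e[3] >= lo]
        degree: Dict[str, int] = {}
        for source, target, _, _ in active:
            degree[source] = degree.get(source, 0) + 1
            degree[target] = degree.get(target, 0) + 1
        # Only nodes that have at least one active edge are counted here.
        rows.append({"window": (lo, hi), "nodes": len(degree),
                     "edges": len(active), "degree": degree})
        t += tick
    return rows


edges = [("a", "b", 0, 2), ("b", "c", 1, 3), ("a", "c", 4, 5)]
rows = dynamic_metrics(edges, t_min=0, t_max=6, window=2, tick=2)
for row in rows:
    print(row["window"], row["nodes"], row["edges"])
```

In this sketch a node only counts when one of its edges is active in the window; a real dynamic network can also give nodes their own time intervals.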

New And Noteworthy

* Data Lab node merging
* ForceAtlas2 layout algorithm, with multi-thread option
* Node and Edge transparency in Preview
* Edge labels on curved edges in Preview
* Text outline now in Preview
* Database importer now supports time columns (start & end)
* DL Export (Thanks to Taras Klaskovsky)
* GML Export (Thanks to Taras Klaskovsky)
* NET Export (Thanks to Daniel Bernardes)
* K-core filter
* Inter and Intra partition filter
* Now supports SQLite databases
* Display the number of layout iterations in status bar when ended
* Recent Palette in Ranking
* Weighted degree now also for directed graphs
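As an illustration of what a K-core filter keeps, here is a small Python sketch of the usual k-core definition: repeatedly drop nodes with fewer than k neighbours until none remain. It is a generic sketch on a plain adjacency dictionary, not the code of the Gephi filter.

```python
def k_core(adjacency, k):
    """Return the k-core of an undirected graph given as {node: set(neighbours)}."""
    # Work on a copy so the caller's graph is left untouched.
    adj = {n: set(nbrs) for n, nbrs in adjacency.items()}
    while True:
        weak = [n for n, nbrs in adj.items() if len(nbrs) < k]
        if not weak:
            return adj
        for n in weak:
            # Remove the node and detach it from its remaining neighbours.
            for m in adj.pop(n):
                if m in adj:
                    adj[m].discard(n)


graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(sorted(k_core(graph, 2)))
```

Here the pendant node `d` is dropped for k = 2, while the triangle `a`, `b`, `c` survives.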

New localization (Go to Tools > Languages)

* Portuguese (Brazilian) (Thanks to Célio Faria Jr)
* Japanese (Thanks to Siro Kida and Koji Chono)

Performance

* Memory starvation manager, warns the user before running out of memory
* Less memory usage with attributes

Bug fixes

* Timeline need more precision when dealing with dates (bug 521937)
* Exception on range slider (bug 541808)
* Inconsistent label data from Overview to Preview (bug 660204)
* Statistics: sub-optimal modularity (bug 727701)
* Timeline can’t drag if the two sliders moved to the left (bug 745476)
* Missing Polish characters when exporting to pdf (bug 746740)
* Edge selection color is not correct on OSX (bug 752300)
* Workspace name truncated, hard to read (bug 758578)
* Average degree cannot be switched to directed / undirected (bug 760454)
* Window->Favorites appears in 0.8 alpha (bug 764494)
* Disable ‘directed’ on metric settings if the graph is undirected (bug 771318)
* Timeline does not work (exception) (bug 774455)
* Layout properties can’t be saved in a language and loaded in another language (bug 783637)
* Preview: edge label not shown (bug 783868)
* Possible memory leak on Dynamic Range Filter (bug 784606)
* Edge attributes not saved in .gephi project file (bug 785268)
* Data Lab: Exception when selecting only one column for merging (bug 785269)
* Data Lab Import Spreadsheet should not ignore parallel edges (bug 785635)
* Data Laboratory: wrong edge type created (mutual instead of directed) (bug 787401)
* Data Lab: impossible to edit time intervals in a date format (bug 793163)
* Spelling of Proportionnal (bug 794358)
* Graphics errors when JOGL installed as a JRE/JDK extension (bug 799545)
* NPE if source/target is empty in GEXF import (bug 799574)
* Toolkit can’t open .gephi files (bug 802101)
* Resizing edge sizes changes edge weight values (bug 803763)
* Preview does not use node label settings from overview tab (bug 805763)
* Data Lab ‘Import Spreadsheet’ dialogue should accept other file types than .csv (bug 806798)
* Edge weights not imported from CSV matrix (bug 808078)
* Preview tab: no option to switch off node borders? (bug 808606)
* Gephi runs out of memory without warning the user (bug 811373)
* Counter-intuitive filename in Data export dialog (bug 814178)
* NullPointerException when creating newProjects too quickly (bug 817170)
* Nodes and edges Id attribute dictionary is not properly created when loading a .gephi file (bug 818181)
* Database driver doesn’t persist in Edge List Database import UI (bug 822316)
* NodeEqualNumberFilter does not work (bug 823038)
* Gephi does not build on JDK 7 (bug 823543)
* SVG node, edge export should include relevant node IDs as classes (bug 827706)
* Import Spreadsheet: need to trim column names (bug 829956)
* Layout list not sorted by name (bug 830149)
* In/Out degree metric is computed on the main graph instead of the visible graph (bug 830752)
* Layout is not giving the algorithm’s number of iterations (bug 831782)
* Banner height issues, need a fixed height (bug 834400)
* NPE when running ClusteringCoefficient on a filtered graph (bug 852799)
* Missing node properties from dot file (bug 855410)
* ‘The value column doesn’t exist’ error when opening a gephi file (bug 857595)
* Import fails for NET (Pajek) file with position/color data (bug 860825)
* GEXF export refers to v1.1 schema, should be v1.2 schema (bug 864484)

New Plug-ins documentation

Check out the documentation for the newly created Preview module: HowTo write a Preview Renderer. Also learn how to extend the Data Laboratory features in a new tutorial.

New license

Gephi is now released under a dual license: CDDL + GNU GPLv3. We are abandoning the GNU AGPL to offer new opportunities to reuse and integrate parts of Gephi in a fully open source way. The dual license system means you can choose to apply either the CDDL or the GNU GPLv3 when Gephi source code is integrated into a derivative work. When modified, original Gephi files should always be published so that the community benefits from the improvements. However, the CDDL does not require you to publish the whole work, so you can build commercial applications for free using Gephi source code!
The CDDL is a license created by Sun and approved by the Open Source Initiative. It is business-friendly. Read the Legal FAQs to learn more, and ask questions on the forum.

Contribute

It’s fun to contribute to an open-source project! Contribute whatever time you can give: a few minutes to report a bug, some hours to fix one or to translate the user interface, or more to create a plug-in. If you’re a student looking for cool and challenging semester projects, check out the Gephi Student Program or contact us.

Do Gephi technologies matter for your research or business? You can support us by donating to the Gephi Consortium, or becoming a member to have an impact on our roadmap.

Feel free to reach out to us if you would like to organize events (meetups, workshops, hackathons, etc.); we will support them.

New Tutorial: Layouts in Gephi

A new tutorial about layouts in Gephi is available. It guides you through the basic and advanced layout settings. You will learn how to choose a layout according to the feature you want to emphasize in the topology and the size of the network, how to avoid node overlap and how to apply some geometric transformations.

This tutorial explains when and how to use each layout.

Tutorial: download it as a PDF.

New Gephi Toolkit release, based on 0.8alpha

A new release of the Gephi Toolkit has arrived, based on the 0.8alpha version. Download the latest package, including Javadoc and demos, by clicking on the link below.

It includes all features and bugfixes the 0.8alpha version has, notably:

  • GEXF 1.2 support (partial)
  • Add Neighbour Filter
  • Improve support of meta-edges in Statistics and Filters
  • Edge weight option in PageRank, which can now be used by the algorithm
  • VNA Import (Thanks to Vojtech Bardiovsky)
  • Label Adjust algorithm 3 times faster
  • Saving/Loading projects is faster and uses less memory

Demos available on the Toolkit Portal have been adapted when necessary and tested. If you are interested in using plug-ins from the Toolkit, check out How to use plug-ins with the Toolkit.

This summer, the student Luiz Ribeiro is working on the GSoC Scripting Plugin, a project to bring advanced scripting features to Gephi using Python. This project will work with the Gephi Toolkit and greatly facilitate its usage.

Scientific graphs Generators plugin

Cezary Bartosiak and Rafał Kasprzyk have just released the Complex Generators plugin, introducing many long-awaited scientific generators. These generators are extremely useful for scientists, as they help to simulate various real networks. Researchers can test their models and algorithms on well-studied graph examples. For instance, the Watts-Strogatz generator creates networks as described by Duncan Watts in his book Six Degrees.

The plugin contains the following generators:

  • Balanced Tree
  • Barabasi Albert
  • Barabasi Albert Generalized
  • Barabasi Albert Simplified A
  • Barabasi Albert Simplified B
  • Erdos Renyi Gnm
  • Erdos Renyi Gnp
  • Kleinberg
  • Watts Strogatz Alpha
  • Watts Strogatz Beta
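To give a feel for what such a generator does, here is a short Python sketch of the Watts-Strogatz beta model as it is commonly described: start from a ring lattice, then rewire each edge with probability beta. It is not taken from the plugin's source; the function name and parameters are chosen for illustration only.

```python
import random


def watts_strogatz_beta(n, k, beta, seed=None):
    """Return a set of undirected edges (frozensets of two node indices).

    n: number of nodes, k: even number of lattice neighbours per node,
    beta: probability of rewiring each lattice edge.
    """
    if k % 2 != 0 or k >= n:
        raise ValueError("k must be even and smaller than n")
    rng = random.Random(seed)
    edges = set()
    # Step 1: ring lattice, each node linked to k/2 neighbours on each side.
    for i in range(n):
        for j in range(1, k // 2 + 1):
            edges.add(frozenset((i, (i + j) % n)))
    # Step 2: rewire the far end of each lattice edge with probability beta,
    # avoiding self-loops and duplicate edges.
    for i in range(n):
        for j in range(1, k // 2 + 1):
            old = frozenset((i, (i + j) % n))
            if old in edges and rng.random() < beta:
                candidates = [t for t in range(n)
                              if t != i and frozenset((i, t)) not in edges]
                if candidates:
                    edges.remove(old)
                    edges.add(frozenset((i, rng.choice(candidates))))
    return edges


lattice = watts_strogatz_beta(10, 4, 0.0, seed=1)
small_world = watts_strogatz_beta(10, 4, 0.5, seed=1)
print(len(lattice), len(small_world))
```

With beta = 0 the result is the regular lattice, and larger values of beta move it toward a random graph; the number of edges stays at n·k/2 throughout.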

The plug-in can be installed directly from Gephi 0.8, from the Plugins menu.

The source code is available on Launchpad.

Gephi 0.8alpha released

The latest version of Gephi has been released today. Download it for Windows, Mac OS and Linux. The focus has again been on stability, with more than 80 bug fixes, and on performance improvements. Thanks to the users who reported bugs; that always makes the difference.

The team will now focus on the Google Summer of Code. Have a look at the exciting projects developed this summer. We are also willing to help plug-in developers as much as possible to get things done, and to improve the documentation.

Because it’s a major release, changes are not deployed through AutoUpdate; you need to download and install the new version. Plug-ins also need to be checked for compatibility. They will reappear on the Plugin Center in the coming days, as they are verified.

Consult the release notes and the new Javadoc for more information.

Features highlight

Localization

Localization is coming with this release, with French and Spanish! In Gephi, simply go to Tools -> Languages to switch.

Do you speak German, Russian or Italian? All three? Localization needs your help; show up on the Forum to get started.

Email spigot

You can now import emails from files or servers and look at communication networks. Spigots are a more advanced way to import networks into Gephi; look for ‘Import Spigots’ in the File menu. They use wizards to configure settings, and new spigots such as Twitter or the New York Times API are available as plug-ins.

New And Noteworthy

* GEXF 1.2 support (partial)
* Add Neighbour Filter
* Improve support of meta-edges in Statistics and Filters
* Improve usability of Filters
* Edge weight option in PageRank, which can now be used by the algorithm
* Duplicate workspaces (Edit Menu)
* Graph files now support GZ compression
* Better Filters support in .gephi files
* VNA Import

Performance

* Label Adjust algorithm 3 times faster
* Saving/Loading projects is faster and uses less memory

Bug fixes

* Windows installer should not require admin privileges (bug 663337)
* Cancelled Vector Export Disabled “File” Menu (bug 728871)
* Misformated SQL-Server JDBC url (bug 745414)
* Partition Filter Loses Categories As Subfilter (bug 726107)
* Workspace name does not increment (bug 711185)
* Ranking Color can’t be changed on OSX (bug 737727)
* Filter panel not cleared after query removed (bug 737992)
* Ego Filter “with self” option doesn’t work with depth > 1 (bug 671007)
* Maximum Degree Range Doesn’t Update on Subfilter (bug 725688)
* Cannot save and reload dynamic network as project (bug 709270)
* GDF exports attributes when option is disabled (bug 735927)
* Rename “Edit” menu to “Workspaces”? (bug 735475)
* Data laboratory context menu takes too much time to appear when a lot of nodes are selected (bug 735721)
* export svg/pdf with no edges causes NPE (bug 693789)
* Can’t import the same file twice in Welcome window (bug 598157)
* Graph Window in Overview Tab Fails to Load (bug 659773)
* Timeline appears first wrong when timeformat=”date” (bug 709234)
* Filter query not saved when Filter button is active (bug 671004)
* GEXF option doesn’t work (bug 709235)
* Ego Network Filter Searches for Substring, Does Not Match Value (bug 726114)
* Label text settings not saved in .project (bug 660205)
* saved preset for layouts creates several instances (bug 612848)
* Partition colors in Filters are different from those in Partition (bug 616037)
* import of pajek net has floating pt problem (bug 619893)
* NullPointerException on saving project (bug 622154)
* Name of currently opened file not updated after a “save as” (bug 629374)
* Labels are not hidden on Preview (bug 654006)
* Chaining Dynamic Filter (bug 654018)
* NullPointerException on importing CSV data in Data Laboratory (bug 654030)
* Statistics report not refreshed after a new execution (bug 654036)
* Maximum lock count exceeded error when running Label Adjust (bug 655544)
* GEXF export: missing attvalues element (bug 655975)
* Can’t use ranking label transformers with toolkit (bug 656172)
* GEXF export: attribute definitions exported even if Attributes option is unchecked (bug 656276)
* Preview throws an Exception with negative edges (bug 656955)
* Closeness centrality chart empty with normalized values (bug 658361)
* Statistics fail to work on a hierarchy level different from the leaves (bug 658394)
* Exported Data Table doesn’t use sorted columns (bug 658816)
* Importing a TIME_INTERVAL column in Data Laboratory CSV import doesn’t enable dynamic features (bug 659017)
* Error when using filter export features and filtering off (bug 659229)
* Exception when using flatten filter (bug 659270)
* GDF export generates invalid files (bug 660200)
* Wrong color type exported in GraphML (bug 660356)
* GEXF exporter doesn’t export the label if they are the same as the id (bug 660382)
* Tool Selection tooltip under the graph window (bug 660459)
* Column settings in Data Lab are not saved in .project (bug 660469)
* Data Lab filter not executed when changing the column (bug 660471)
* Data Lab: column used by node filter is automatically reset (bug 660517)
* Personalized color of a specific partition is rolled back (bug 660529)
* DOT importer ignores edge weight and .gv file extension (bug 661257)
* Exceptions when importing mixed graph (bug 662488)
* Error on selecting nodes from filter if graph window not shown at startup (bug 663561)
* Blank preview screen (bug 664300)
* Edge text not visible on preview with other attributes (bug 664444)
* Data does not appear on nodes table (bug 667440)
* saving a project uses too much memory for large graphs (bug 672071)
* Data Lab: search/replace only on a given column (bug 676087)
* Workspace number incremented by opening a new project (bug 681038)
* Filter “out degree range” does not work (bug 681184)
* NullPointerException on exporting dynamic GEXF file with Toolkit (bug 686432)
* Wrong relative betweenness (bug 687267)
* graphml generated syntax is incorrect (bug 688678)
* Import Report freezes when the number of issues or logs is too high (bug 688865)
* Filters fail to work on a hierarchy level different from the leaves (bug 691278)
* Wrong edge count in Context Panel with hierarchies (bug 692225)
* RepaintCell exception on Mac OS X (bug 692379)
* Exceptions when group/ungroup from Partition after delete (bug 692382)
* GML importer don’t process ‘weight’ column as weight (bug 703877)
* Degree doesn’t take edge weight into account (bug 703933)
* Edge attribute values not imported from graphml file (bug 707390)
* PageRank not for weighted networks (bug 715621)
* Data Lab: boolean column edition facility (bug 717869)
* Exception on Delete Column if sorted by this column (bug 719987)
* Data Lab: edge rows not displayed when the hierarchy level /= 0 in Overview (bug 720033)
* Java Null Pointer when using Merge Columns (bug 722287)
* Open Recent files doesn’t work with project files (bug 734105)
* Timeline disappears after saving project.gephi file (bug 695558)
* Exception on Visualization Settings if opened before Overview (bug 734117)
* Database import is not cancelable (bug 734126)
* ‘A task is still executing’ error after cancelling a custom importer, not LongTask (bug 734132)
* Edge weight slider not refreshed between workspaces (bug 731599)
* Exception on New workspace after deleting last workspace (bug 735273)
* Colors not imported in Pajek Net format (bug 530028)

To 0.8 beta and beyond

Following the Gephi Manifesto, we continue on our way to release 1.0, with goals set in the Roadmap. You can speed up our progress in many ways, whatever time you can give: a few minutes to report a bug, some hours to fix one or to translate the user interface, some weeks to create a plug-in… you will always be warmly welcomed!

Do Gephi technologies matter for your research or business? You can support us by donating to the Gephi Consortium, or becoming a member to have an impact on our roadmap.

Proud to be part of the Gephi Community? On our fresh new store you can buy mugs, T-shirts and more to show it!

Go to the Gephi store

The store sells in Europe (EUR). To get a tee-shirt in the US, check our official Gephi tee-shirt.

Introducing Data Laboratory

Eduardo Ramos

My name is Eduardo Ramos and this summer I have been working on a project for Gephi called Data Laboratory, which was initially designed as an idea for a GSoC project. The first specifications can be found at this wiki page. The new Data Laboratory features will be included in the 0.8 version of Gephi, released later this year.

Presentation

The general purpose of this project is to improve the basic and common features offered in the “Data Laboratory” section of Gephi.
This section contains two tables (nodes and edges) that show the attributes of every node or edge as table columns. Before this project started, the available features for modifying the table structure and values were insufficient, so it was hard to manipulate, edit, refine or even create a graph with attributes in a tabular view.

New key features coming:

  • Graph editing in many ways
  • Columns add/remove
  • Search & Replace
  • Import & Export CSV files
  • Charts and statistics reports
  • Sparklines
  • Merge columns
  • And the possibility of extending all of this with plugins!

Data Laboratory needed to provide an API/SPI for using and extending the new options available as plugins. A complete API is being designed to be able to use these new general features from any module or plugin. As this API is independent from the user interface, these actions will be included in the toolkit.

Types of manipulators (i.e. actions) and how they look in the UI

General actions

These are not related to a specific element like nodes, edges or columns, and normally provide a UI. They appear as buttons in the toolbar at the top of the data table, and can be grouped in a drop-down button called “Plugins” if necessary.

Some of the new basic features of this type are Add node/edge, Clear graph and Clear edges. The rest of the general actions in the picture are more specialized. Search/Replace shows an advanced UI to search and replace values in the table cells. It can do a normal search or a regular-expression-based search, among other useful options. It is implemented in a separate controller that is part of the Data Laboratory API.
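The difference between a normal search and a regular-expression search can be sketched in a few lines of Python. This is only an illustration on a list of dictionaries standing in for table rows; the function name and parameters are hypothetical and are not the Data Laboratory API.

```python
import re


def search_replace(rows, column, pattern, replacement,
                   use_regex=False, match_case=True):
    """Replace matches of `pattern` in one column; return the number of replacements."""
    flags = 0 if match_case else re.IGNORECASE
    # In normal mode the pattern is escaped so it is matched literally.
    regex = re.compile(pattern if use_regex else re.escape(pattern), flags)
    # In normal mode the replacement is literal too (no group references).
    repl = replacement if use_regex else (lambda match: replacement)
    total = 0
    for row in rows:
        value = row.get(column)
        if isinstance(value, str):
            new_value, count = regex.subn(repl, value)
            if count:
                row[column] = new_value
                total += count
    return total


rows = [{"Label": "Node A"}, {"Label": "node b"}]
normal_count = search_replace(rows, "Label", "node", "N", match_case=False)
after_normal = [row["Label"] for row in rows]
regex_count = search_replace(rows, "Label", r"^N (\w)$", r"\1", use_regex=True)
after_regex = [row["Label"] for row in rows]
print(after_normal, after_regex)
```

The regular-expression mode can reuse captured groups in the replacement, which the normal mode deliberately does not.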

Export CSV table allows the user to export the data in the current table, selecting the desired columns, separator and charset to use.
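The three export choices (columns, separator and charset) map naturally onto a few parameters. The following Python sketch uses only the standard `csv` module; the `export_csv` function and its row format are assumptions for illustration, not Gephi code.

```python
import csv
import io


def export_csv(rows, columns, separator=",", charset="utf-8"):
    """Serialize the chosen columns of `rows` (a list of dicts) to CSV bytes."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=separator, lineterminator="\n")
    writer.writerow(columns)  # header line
    for row in rows:
        # Missing values are written as empty cells.
        writer.writerow([row.get(column, "") for column in columns])
    return buffer.getvalue().encode(charset)


table = [{"Id": "n1", "Label": "A", "Size": 3},
         {"Id": "n2", "Label": "B", "Size": 5}]
data = export_csv(table, ["Id", "Label"], separator=";")
print(data.decode("utf-8"))
```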

Import CSV does the opposite operation, showing a wizard to configure the import settings in two steps.

Nodes/Edges actions

These actions are shown in a context menu on right click on one or more table rows that represent a node or an edge.
The currently implemented manipulators for nodes are:

  • Edit node properties – Shows edit window for the clicked node
  • Select on graph view – Centers graph view on the clicked node
  • Select neighbor nodes on table – Modifies the same table rows selection to highlight neighbor nodes of the clicked node
  • Delete – Deletes the selected node(s)
  • Clear node data… – Shows a UI to choose what columns to clear of the selected node(s)
  • Copy node data to the other selected nodes – Is only enabled when more than one node is selected, and copies the chosen columns of the clicked node to the other nodes
  • Group – Groups the selected nodes
  • Ungroup – Ungroups the selected groups
  • Ungroup recursively – Ungroups the selected groups, and all their descendant groups
  • Move to group… – Shows a UI to choose an available group to move the selected nodes
  • Remove from group – Removes the selected nodes from their group, putting them in the superior level of hierarchy
  • Settle and Free – Lock/Unlock the node(s) position
  • Set node size – Sets the given size in the UI to the selected node(s)
  • Link nodes – Is only enabled when more than one node is selected and shows a UI to select a source node which will be linked to all the other selected nodes
  • Copy node – Makes the desired number of copies of the selected node(s) with their attributes and properties

And for edges:

  • Select source node on graph view – Centers graph view on the source node of the clicked edge
  • Select target node on graph view – Centers graph view on the target node of the clicked edge
  • Select source and target on nodes table – Modifies the nodes table row selection to highlight the source and target nodes of the clicked edge
  • Delete – Deletes the selected edge(s)
  • Delete with nodes – Deletes the selected edge(s) and the nodes that the user chooses in the UI (source and/or target)
  • Clear edge data… – Shows a UI to choose what columns to clear of the selected edge(s)
  • Copy edge data to the other selected edge – Is only enabled when more than one edge is selected, and copies the chosen columns of the clicked edge to the other edges

Attribute columns actions

Like node or edge manipulators, these are designed to operate on a specific type of data, attribute columns in this case.
They appear in the Data Laboratory UI as independent (or grouped by type) drop-down buttons. Each one represents an action that can be done with a single attribute column. When the button is clicked, it lists the columns available for that operation (according to the conditions specified by the manipulator) so that one can be selected for execution.

Some of the implemented attribute column manipulators are basic operations: adding columns, deleting columns, clearing and copying column data, filling a column with a value, and duplicating a column into another one with the data type the user needs, converting the data when possible.

Other column manipulators operate on specific types of column data, such as boolean or numeric, or use regular expressions to derive a new column from another column.

Columns merge strategies

Finally, another part of the project’s SPI makes it possible to define different strategies for merging several columns.
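One way to picture a merge strategy is as a function that receives the values of the selected columns for one row and returns the merged value. The Python sketch below follows that assumption; the names `join_values`, `average` and `merge_columns` are hypothetical and do not come from the Data Laboratory SPI.

```python
def join_values(separator):
    """Strategy factory: concatenate the non-empty values with a separator."""
    def strategy(values):
        return separator.join(str(v) for v in values if v is not None)
    return strategy


def average(values):
    """Strategy: arithmetic mean of the numeric values that are present."""
    numbers = [v for v in values if v is not None]
    return sum(numbers) / len(numbers) if numbers else None


def merge_columns(rows, columns, new_column, strategy):
    """Apply a strategy row by row and store the result in a new column."""
    for row in rows:
        row[new_column] = strategy([row.get(c) for c in columns])
    return rows


people = [{"first": "Ada", "last": "Lovelace", "x": 1, "y": 3}]
merge_columns(people, ["first", "last"], "name", join_values(" "))
merge_columns(people, ["x", "y"], "mean", average)
print(people[0]["name"], people[0]["mean"])
```

Keeping the strategy separate from the loop over rows is what lets new strategies be added without touching the merge code.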

Conclusion

This project has been a great opportunity for me to experience working with an open source community and to learn about many aspects of programming, such as API/SPI design, creating better user interfaces and building modular applications. I will be happy to participate in future projects for Gephi.

More information can be found at this wiki page, which will be updated soon with more documentation and help on how to extend Data Laboratory using the new SPI.

Your opinions and needs are also very important for improving Gephi, so feel free to make suggestions and ask anything about the Data Laboratory project in this forum post.

Eduardo Ramos

GSoC 2010 mid-term: Adding support for Neo4j in Gephi

Martin Škurla

During this summer, six students are working on Gephi with the Google Summer of Code. They contribute to Gephi by developing new features that will be integrated in the 0.8 version, released later this year.

My name is Martin Škurla and this summer I have been working on a GSoC project called “Adding support for Neo4j in Gephi”. In this article we will look at the implemented features, including those under the hood, pictures of dialogs, common use cases and future plans.

Gephi project

First, a quick introduction to the Gephi project. Gephi is an open source visualization platform built on top of the NetBeans Platform. It is written in Java, so you can run it on various operating systems including Windows, Linux and Mac OS. It supports many interesting graph analysis capabilities, including:

  • Real-time visualization
  • Layout
  • Metrics
  • Dynamic network analysis
  • Cartography clustering and hierarchical graphs
  • Dynamic filtering

The story so far

The main idea of my project is to add support for Neo4j in Gephi. This means the ability to transform a Neo4j graph into a Gephi graph. The two graph models are different, so the first task was to define a mapping between Neo4j graph items and Gephi graph items, and vice versa.

There was also a mismatch between the types supported in Neo4j and those supported by Gephi. This mismatch was solved by adding new “List” types to Gephi, so now every type in Neo4j has an appropriate type in Gephi.

There were also some changes under the hood which are not visible to the end user, but had to be defined and implemented. The most interesting one is the “delegating mechanism”. This mechanism is responsible for getting values from the storage engine (Neo4j) as well as for manipulating the data. During the import process, a representation of the Neo4j graph is created in Gephi, but the values are not stored directly; they are queried through the delegating mechanism.
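The idea of delegation can be sketched in Python as a node object that holds no attribute values of its own and forwards every read and write to the storage engine. The class names below are hypothetical, and `InMemoryStore` is only a stand-in for Neo4j; the real implementation lives in Gephi's Java code and is not shown here.

```python
class InMemoryStore:
    """Stand-in for the storage engine (Neo4j in the article)."""

    def __init__(self, properties):
        self._properties = properties
        self.reads = 0  # counts how often values are requested

    def get_property(self, node_id, key):
        self.reads += 1
        return self._properties[node_id].get(key)

    def set_property(self, node_id, key, value):
        self._properties[node_id][key] = value


class DelegatingNode:
    """Keeps only the node id; every value is fetched from the store on demand."""

    def __init__(self, node_id, store):
        self.node_id = node_id
        self._store = store

    def __getitem__(self, key):
        return self._store.get_property(self.node_id, key)

    def __setitem__(self, key, value):
        self._store.set_property(self.node_id, key, value)


store = InMemoryStore({1: {"name": "Ann"}})
node = DelegatingNode(1, store)
reads_after_import = store.reads   # nothing has been read yet
first_name = node["name"]          # value is requested from the store
node["name"] = "Bob"               # write goes straight to the store
second_name = node["name"]
print(reads_after_import, first_name, second_name, store.reads)
```

The trade-off of this design is lower memory use for large graphs at the cost of a store lookup on every access.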

Other minor tasks were to customize the open dialogs used for importing a local Neo4j database and for debugging the imported database. The open dialog for importing accepts only valid Neo4j database directories. I defined the structure of a valid Neo4j database directory, and every valid directory is now shown with the Neo4j picture in the open dialog. The open dialog for debugging accepts only Java class files that can be used for the debugging process. This simply means they have to implement the required interface and have a public no-argument constructor. Every valid class file is shown with the Neo4j picture, and after a valid debug file is selected, the Target and Visualization options are automatically filled in based on data from the selected class file.

Open Neo4j directory dialog customization

Open Neo4j debug file dialog customization

 

Neo4j integration

Menu integration

All possible actions start in the menu. As we can see, this is the entry point to import from, export to and debug a Neo4j graph. Both importing and exporting support local as well as remote Neo4j databases.

Importing

Whole graph import dialog

The importing process offers two approaches:

  • whole import
  • traversal import

The whole graph import dialog is designed for importing the whole graph. We can customize the rules responsible for returning nodes by defining filtering expressions. For example, the dialog above can be used when we want to find all people working on the Gephi project with a maximum age of 30 years. Only people with at least 5 years of experience and who hold driver licence types A, B and C will be included.

Let’s have a deeper look at the dialog:

  • Property key is the name of the property we want to filter on
  • Property value is the value which will be compared to the actual node property value using the chosen operator. Values are automatically converted into the appropriate types, and if a value cannot be converted, the node will not be included in the graph. All types supported in Neo4j are supported in this dialog. We can also see the support for array types in the last filter expression.
  • Operator is applied to the final expression, and if the expression evaluates to true, the node is included
  • Match case means comparing String, char, String[] and char[] values case-sensitively
  • Restrict mode is used to exclude some nodes. Imagine the database stores people who have only a subset of the property names used in the filtering expressions. If Restrict mode is on, only nodes which have all property names and for which all filtering expressions evaluate to true are included. If Restrict mode is off, every node which has any subset of the required property names (even the empty subset) is included if all the filtering expressions applicable to that subset evaluate to true.

All the filtering expressions are combined using AND, and the list of currently supported operators consists of: ==, !=, <, <=, >, >=.
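The filtering rules described above can be summarized in a short Python sketch. It is a simplified illustration, not the importer's code: the `accepts` function is hypothetical, only int, float and str properties are handled, and array types are left out.

```python
import operator

OPERATORS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
             "<=": operator.le, ">": operator.gt, ">=": operator.ge}

# Simplification: only these property types are supported in this sketch.
CONVERTERS = {int: int, float: float, str: str}


def accepts(properties, filters, restrict_mode=True, match_case=True):
    """Return True if a node with `properties` passes all filters (AND)."""
    for key, op, raw_value in filters:
        if key not in properties:
            if restrict_mode:
                return False  # Restrict mode: every property must be present
            continue          # otherwise filters on missing keys are skipped
        actual = properties[key]
        converter = CONVERTERS.get(type(actual))
        if converter is None:
            return False
        try:
            expected = converter(raw_value)  # convert to the property's type
        except (TypeError, ValueError):
            return False      # value cannot be converted: node is excluded
        if isinstance(actual, str) and not match_case:
            actual, expected = actual.lower(), expected.lower()
        if not OPERATORS[op](actual, expected):
            return False
    return True


filters = [("age", "<=", "30"), ("experience", ">=", "5")]
ann = {"name": "Ann", "age": 28, "experience": 6}
bob = {"name": "Bob", "age": 25}  # has no "experience" property
print(accepts(ann, filters),
      accepts(bob, filters, restrict_mode=True),
      accepts(bob, filters, restrict_mode=False))
```

Bob illustrates Restrict mode: he lacks one of the filtered properties, so he is excluded when the mode is on and included when it is off.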

In fact, the usefulness of adding new operators, as well as OR and other import options, is the main idea behind the Questionnaire which is part of this article.

The traversal graph import dialog is designed for importing any subgraph using the traversal capabilities of Neo4j 1.1. Traversal import adds the following options:

  • Start node can be set in two ways, either by its id or by its indexing key and value pair
  • Order can be set to depth-first or breadth-first
  • Max depth can be set to a concrete number or to the end of the graph
  • Relationships can be restricted too. We can set any combination of relationship types and directions which the traversal should include. The list of relationship types is dynamically filled from the database with existing values.
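How these options interact can be shown with a small Python sketch of a graph traversal. It does not use the Neo4j traversal API: the `traverse` function, the triple format `(source, type, target)` and the `allowed` mapping are assumptions made for this example.

```python
from collections import deque


def traverse(relationships, start, order="breadth", max_depth=None, allowed=None):
    """Return nodes in visiting order.

    relationships: list of (source, rel_type, target) triples.
    allowed: optional {rel_type: "out" | "in" | "both"}; None allows everything.
    max_depth: None means "to the end of the graph".
    """
    def neighbours(node):
        for source, rel_type, target in relationships:
            direction = "both" if allowed is None else allowed.get(rel_type)
            if direction is None:
                continue  # this relationship type is not part of the traversal
            if source == node and direction in ("out", "both"):
                yield target
            if target == node and direction in ("in", "both"):
                yield source

    visited, seen = [], set()
    frontier = deque([(start, 0)])
    while frontier:
        # A queue gives breadth-first order, a stack gives depth-first order.
        node, depth = frontier.popleft() if order == "breadth" else frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        visited.append(node)
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in neighbours(node):
            if neighbour not in seen:
                frontier.append((neighbour, depth + 1))
    return visited


rels = [("a", "KNOWS", "b"), ("b", "KNOWS", "c"),
        ("a", "WORKS_ON", "p"), ("d", "KNOWS", "a")]
print(traverse(rels, "a", max_depth=1))
print(traverse(rels, "a", allowed={"KNOWS": "out"}))
```

The first call stops one step away from the start node; the second follows only outgoing KNOWS relationships and therefore never reaches `p` or `d`.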

 

Traversal graph import dialog

This was a quick summary of the Neo4j importing capabilities implemented in the project. We also focused on other features, and one of them is support for exporting. We can export any loaded graph into a local or remote Neo4j database. The exporting process can be customized in a similar way to importing.

Exporting

Export dialog

Exporting is the opposite process to importing. The dialog above shows the exporting options as well as validation. We can customize the exporting process by setting:

  • From column sets the RelationshipType from the values of any Gephi edge column. When a Neo4j graph is imported, a column named “Neo4j Relationship Type” is created automatically.
  • Default value is used when the processed Gephi edge has no value in the selected From column
  • Export Node columns is the set of Gephi columns in the node table which will be exported
  • Export Edge columns is the set of Gephi columns in the edge table which will be exported

Remote importing/exporting

The only difference between local and remote importing/exporting is the Remote dialog, where we need to set the following connection information:

  • Remote database URL
  • Login
  • Password

All of them must be filled in to successfully import or export a remote graph.

Remote import/export dialog

Delegation process

Node values exploration

As we can see from the previous picture, we can explore all the node and edge values very simply. This is exactly where the delegating mechanism is used. The values are not stored in memory in some kind of Gephi data structure; instead, the storage engine (Neo4j) is asked for the actual values every time we need them.
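The delegation idea can be illustrated with a small Python sketch. It is not the plug-in's implementation: `DelegatingNode`, `FakeStore` and `get_property` are hypothetical names, and the store is an in-memory stand-in for the database.

```python
class DelegatingNode:
    """Node proxy that asks the backing store for a value on every access."""

    def __init__(self, node_id, store):
        self.node_id = node_id
        self._store = store  # hypothetical object exposing get_property()

    def __getitem__(self, key):
        # No local copy is kept: each read is forwarded to the store.
        return self._store.get_property(self.node_id, key)


class FakeStore:
    """In-memory stand-in for the database, counting how often it is queried."""

    def __init__(self, data):
        self._data = data
        self.reads = 0

    def get_property(self, node_id, key):
        self.reads += 1
        return self._data[node_id][key]


store = FakeStore({1: {"name": "Neo"}})
node = DelegatingNode(1, store)
first = node["name"]
store._data[1]["name"] = "Thomas"  # the value changes in the store
second = node["name"]              # the proxy sees the new value
```

The trade-off is the usual one: memory use stays low and values are always current, at the cost of one store lookup per read.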

Debugging

Debugging in action

We can see debugging in action in the previous picture. The dialog is initialized with data from the chosen debug class file, but we can also change all of the options at runtime. Any change in the options automatically updates the graph visualization. We can change the visibility of nodes and edges as well as the colors of both. The user proceeds to the next step of debugging/traversal by clicking the Next button.

Use cases

That was a quick summary of all the implemented features. Now we can summarize the common use cases that users may be interested in.

Visualizing Neo4j graphs

One of the main ideas of my project was to make it possible to visualize Neo4j graphs, even big ones. As we saw in the dialog pictures, there are many options for customizing the importing process, including filtering. After the import we can use all the rich graph analysis features Gephi provides.

Analyzing only part of the whole graph

A quite common use case is to analyze only part of a graph, which is possible in Gephi too. We can take advantage of traversal, where we set the starting node and other traversal options. After that we can visualize and analyze only that part of the graph.

Export graph stored in text files/databases into Neo4j

Another use case is exporting graphs stored in graph text files or relational databases into Neo4j. In fact, every graph loaded into Gephi can easily be exported to a Neo4j database. The available import formats depend on Gephi itself; currently the following formats are supported:

  • Text formats: GEXF, GDF, GML, GraphML, Pajek NET, GraphViz DOT, CSV, UCINET DL, Tulip TPL, XGMML
  • Relational databases: MySQL, PostgreSQL, SQL Server

Future plans

There are more things that we want to implement, including:

  • support for the Gephi Toolkit, which is essentially the set of Gephi core libraries that you can use in your own Java projects for graph visualization and manipulation
  • a proof-of-concept Web application that uses both the Gephi Toolkit and Neo4j to manipulate a Neo4j database and show the results (probably using GWT)
  • more features, bug fixes and performance optimizations

Questionnaire

One of the big advantages of Gephi is that it is developed as an Open Source project. We want to add features according to user requests and opinions. That’s why we created a questionnaire focusing on the usefulness of the proposed additions. We will be very happy if you fill it in, because it is a very valuable source of information that lets us focus on the features Neo4j users find useful. Please fill in the questionnaire.

Conclusion

I am very happy that I can be part of the Gephi developer community and introduce integration with Neo4j. During this summer I learned a lot, and I am proud that I was chosen as a GSoC student. None of these features could have been done without the great help of my mentors, so a big thank you to both of them: Mathieu Bastian & Tobias Ivarsson.

If you are interested and want to test the code, you can download the source code from my branch using bzr branch lp:~bujacik/gephi/support-for-neo4j

All the pictures were made with data stored in a testing Neo4j database, which can be created using a Java SE project that you can download using:
bzr branch lp:~bujacik/+junk/testing-new-neo4j-traversal-api

 

Martin Škurla

Download this article in PDF.

GSoC 2010 mid-term: Dynamic attributes and statistics

Cezary Bartosiak

During this summer, six students are working on Gephi with the Google Summer of Code. They contribute to Gephi by developing new features that will be integrated in the 0.8 version, released later this year.

 

The project carried out by Cezary Bartosiak focuses on the further development of dynamic network analysis (DNA) in Gephi. The aim is to create a framework that makes it possible to build and query a dynamic graph through a proper API. It has a practical purpose, for instance analyzing the evolution of networks (see in particular M. Argollo de Menezes, A.-L. Barabási, Fluctuations in Network Dynamics) or visualizing dynamic networks. This article shows the most important features provided by this GSoC project.

 

In the current 0.7 version we can import dynamic graphs written in GEXF syntax and then filter them using the Timeline component. Unfortunately, it only filters the graph topology, which means hiding nodes and/or edges.

The obvious next step is to make it possible to handle dynamic changes not only of the graph topology but also of the attributes attached to nodes and edges. This can be done by creating a proper API, which could then be used by other modules, such as Statistics, to build dynamic versions of them. Computing metrics like Degree Distribution or Clustering Coefficient for each time interval in a time series is of great interest for analyzing graphs over time.

So, getting down to brass tacks, the most important tasks are:

  • A data structure that hosts dynamic attributes efficiently and makes it possible to present them in the Data Laboratory module.
  • A Dynamic API built around the Dynamic Graph Decorator, which wraps the graph and a time interval and returns static graph copies for given time intervals, as well as arrays of attribute values for given nodes/edges and time intervals.
  • Adapting the Metrics framework to use the Dynamic API in order to offer dynamic versions of existing metrics.

There are also additional features that will be done in the future (they will probably not be included in the next release):

  • Dynamic visualization of attributes.
  • Dynamic version of the Ranking module – dynamic visualization attributes transformation.

I’ll try to briefly describe how these features are done.

Dynamic attributes

This is a very interesting task from a programmer’s point of view, since it requires implementing a complicated data structure such as an Interval Tree (see also Antoine Vigneron – Segment trees and interval trees). Users will find it necessary too. The purpose is to make it possible to read dynamic attributes from GEXF files and host them efficiently, so that we can get the values of attributes for different time intervals. It goes without saying how powerful a feature this is. To show how it works, let’s consider one node (written in GEXF syntax):

<node id="1" label="Some node">
<attvalues>
<attvalue for="0" value="abcdefgh"/>
<attvalue for="2" value="1" end="2009-03-01"/>
<attvalue for="2" value="2" start="2009-03-01" end="2009-03-10"/>
<attvalue for="2" value="1" start="2009-03-10"/>
</attvalues>
</node>

As we can see, we have one dynamic attribute (id = 2) that has three different values in different time intervals. The first interval starts at “negative infinity”: we simply assume that it only ends and never starts. But if there are bounds, for instance if the related graph has start and end times, this attribute would “start” at the same moment as the graph. It is rather intuitive. The second interval lasts from 2009-03-01 to 2009-03-10, and the last one lasts from 2009-03-10 to “positive infinity” or the graph’s bound.

After importing this into Gephi we can get the values for ANY time interval we want, for example [-inf, +inf]. But we need to know how to estimate a final value. In the above example we have three values: 1, 2 and 1. To decide which of them should be returned, we provide a set of estimators: AVERAGE, MEDIAN, MODE, SUM, MIN, MAX, FIRST and LAST. Each of them behaves differently depending on the type of the attribute; for real numbers, for example, they behave as in statistics.
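As an illustration of how estimators reduce several interval values to one, here is a minimal Python sketch using the three intervals of the node above. It is not the Gephi API: it uses a plain linear scan instead of an interval tree, the function name `estimate` is hypothetical, open bounds are represented by `date.min` and `date.max`, and treating the intervals as closed on both ends is an assumption.

```python
from datetime import date
from statistics import mean, median, mode

NEG_INF, POS_INF = date.min, date.max

# (start, end, value) triples for attribute 2 of the node above.
intervals = [
    (NEG_INF, date(2009, 3, 1), 1),
    (date(2009, 3, 1), date(2009, 3, 10), 2),
    (date(2009, 3, 10), POS_INF, 1),
]

# Estimator names from the article, mapped to Python functions.
ESTIMATORS = {
    "AVERAGE": mean,
    "MEDIAN": median,
    "MODE": mode,
    "SUM": sum,
    "MIN": min,
    "MAX": max,
    "FIRST": lambda values: values[0],
    "LAST": lambda values: values[-1],
}


def estimate(intervals, low, high, estimator):
    """Reduce the values of all intervals overlapping [low, high] to one value."""
    values = [value for start, end, value in sorted(intervals)
              if start <= high and end >= low]
    if not values:
        return None
    return ESTIMATORS[estimator](values)


total = estimate(intervals, NEG_INF, POS_INF, "SUM")
most_common = estimate(intervals, NEG_INF, POS_INF, "MODE")
early_march = estimate(intervals, date(2009, 3, 2), date(2009, 3, 5), "MAX")
```

Over the whole timeline the three values 1, 2 and 1 are combined, while a query window inside early March 2009 only sees the second interval.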

So users will be able to get the values for different time intervals on demand, for instance in the Data Laboratory module, or (in the future) see them on screen as part of a rendered graph. Suppose we have an attribute such as priority. A user will be able to choose between several possibilities: nothing (the attribute should not be visualized), color, stroke, thickness, etc. This means, for instance, that if a node has this attribute close to its upper bound, its stroke thickness would be very high. On the other hand, if a node has this attribute close to its lower bound, only its internal color might be visualized.

Metrics framework

For now it is possible to compute a set of important metrics, but all of them consider a “static graph”. The idea of dynamic metrics is to execute the static ones in a loop in which the graph changes according to the time interval. The following screenshot shows that using these additional metrics is similar to using their static counterparts:

Dynamic Metric

In the screenshot we can see only Dynamic Degree Power Law, but of course every dynamic metric will be implemented (at the time of writing, this module was still under development, so the final product may differ from the one presented above). The user enters the required information, such as the time interval, and gets a separate report for every time interval. What are the other results?

  • The result for each node/edge is written to the graph, so one can see it in Data Laboratory.
  • The general result is also written and presented in the report.
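The "static metric in a loop" idea can be sketched in Python as follows. This is only an illustration under simplifying assumptions, not the Metrics framework itself: timestamps are integers, windows are closed on both ends (so an edge alive exactly on a boundary is counted in both windows), and the names `snapshot`, `average_degree` and `dynamic_metric` are hypothetical.

```python
def snapshot(edges, low, high):
    """Static edge list containing the edges alive at some point in [low, high]."""
    return [(s, t) for s, t, start, end in edges if start <= high and end >= low]


def average_degree(static_edges):
    """A simple static metric: mean degree over nodes that appear in an edge."""
    degree = {}
    for s, t in static_edges:
        degree[s] = degree.get(s, 0) + 1
        degree[t] = degree.get(t, 0) + 1
    return sum(degree.values()) / len(degree) if degree else 0.0


def dynamic_metric(edges, start, end, window, metric):
    """Run `metric` on one static snapshot per window of length `window`."""
    report = {}
    low = start
    while low < end:
        high = min(low + window, end)
        report[(low, high)] = metric(snapshot(edges, low, high))
        low = high
    return report


# Hypothetical dynamic graph: (source, target, start, end).
edges = [
    ("a", "b", 0, 4),
    ("b", "c", 2, 9),
    ("a", "c", 6, 9),
    ("c", "d", 7, 8),
]
report = dynamic_metric(edges, 0, 10, 5, average_degree)
```

The result maps each time window to the value of the static metric on that window's snapshot, which corresponds to the "separate report for every time interval" mentioned above.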

Conclusion

Evolution of networks, network dynamics and dynamic network analysis are hot topics nowadays, and there is growing interest in studying them. As a result, there is a bigger and bigger need for dynamic network analysis tools. In my opinion Gephi is heading towards being one of the best…

Cezary Bartosiak

GSoC 2010 mid-term: Graph Streaming API

andre-panisson

During this summer, six students are working on Gephi with the Google Summer of Code. They contribute to Gephi by developing new features that will be integrated in the 0.8 version, released later this year.

The purpose of the Graph Streaming API project, run by André Panisson, is to build a unified framework for streaming graph objects. Gephi’s data structure and visualization engine have been built with the idea that a graph is not static and might change continuously. By connecting Gephi with external data sources, we leverage its power to visualize and monitor complex systems or enterprise data in real time. Moreover, the idea of streaming graph data goes beyond Gephi: a unified and standardized API could bring interoperability with other tools for graph and network analysis, which could then work together in a distributed and cooperative fashion.

 

With the increasing level of connectivity and cooperation between systems, a system that aims to be interoperable must comply with the available standards. Graph objects are abstractions that can represent a wide range of real-world structures, from computer networks to human interactions, and there are many standards for exchanging graph data in different formats, from text-based to XML-based ones. But real-world structures are constantly changing, and the current formats are not suitable for exchanging this type of dynamic data.

Many well-established systems already stream data to their users through a streaming API. Twitter, for example, defined a Streaming API to allow near-real-time access to its data. It uses two formats, XML and JSON, but JSON is strongly encouraged over XML, as it is more compact and much simpler to parse.

We are not the first to implement a Graph Streaming API; another very interesting effort is the GraphStream Java Library. It provides an API for adding edges and nodes to a graph and making them evolve. The graphs are composed of nodes and edges that can appear, disappear or be modified, and these operations are called events. The sequence of operations that occur in a graph is seen as a stream of events.

So, since other people have already had successful experiences with graph streaming, why not base our work on them? That’s what we are doing: besides finding these experiences very useful, we are also trying to be compatible with the existing work. The first Gephi Graph Streaming release will use two formats: JSON for flexibility, and a text-based format based on the GraphStream implementation.

The first version of the Graph Streaming features will be available in the next release of Gephi, but it’s already possible to try some of them. To illustrate how simple it will be to connect to a master, the following video shows Gephi connecting to a master and visualizing the received graph data in real time. The graph in this demo is a part of the Amazon.com library, where the nodes represent books and the edges represent their similarities. For each book, a node is added and its similar books are explored, adding the similar ones as nodes and each similarity as an edge.

 

 

The Graph Streaming specification goes beyond the simple fact that a client can pull data from a master: clients can also interact with the master by pushing data to it, in a REST architecture. The same data format used by the master to send graph events to the clients is used by clients to interact with the master.

In the next example, we will turn Gephi into a master that provides graph information to its clients. From the Streaming tab in the Gephi application, you can access all the graph streaming features. You can connect to a master by clicking the ‘+’ button, but you can also turn your Gephi into a master by right-clicking “Master Server” and selecting “Start” (you are not limited to a single master per host: each Gephi workspace can be made available as a master). By default, the HTTP server listens on port 8080 for plain HTTP and on port 8443 for SSL. The server path depends on your workspace: each workspace uses a different path. You can configure these parameters (and also Basic Authentication) with the “Settings…” button:

 

Graph Streaming Server start

Graph Streaming Settings Panel

 

Now you can connect to it using any simple HTTP client. For example, you can use curl to see the data flowing. First, open a shell window and execute the following command:

curl "http://localhost:8080/workspace0"

With this, you are connected to your workspace in Gephi. If the workspace is empty, you will receive no data, but you will remain connected, so you will receive all events from now on.

Now open another shell prompt. With the following command, you should see a triangle appear in Gephi:

curl "http://localhost:8080/workspace0?operation=updateGraph" -d $'
{"an":{"A":{"size":10,"r":1,"g":0,"b":0,"z":0,"y":500,"x":70}}}\r
{"an":{"B":{"size":10,"r":1,"g":0,"b":0,"z":0,"y":90,"x":250}}}\r
{"ae":{"AB":{"source":"A","target":"B","weight":10,"r":0,"g":0,"b":0,"directed":false}}}\r
{"an":{"C":{"size":10,"r":1,"g":0,"b":0,"z":0,"y":90,"x":-90}}}\r
{"ae":{"BC":{"source":"B","target":"C","weight":10,"r":0,"g":0,"b":0,"directed":false}}}\r
{"ae":{"CA":{"source":"C","target":"A","weight":10,"r":0,"g":0,"b":0,"directed":false}}}'

At the same time, all events will be sent to your connected client, in the other shell window.
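If you prefer to generate such events from a script rather than typing them by hand, the following Python sketch builds the same six events for the triangle. The helper names `add_node` and `add_edge` are hypothetical, and the event shapes are copied from the curl example rather than from a specification. Sending one JSON object per line with a carriage return plus newline as separator is an assumption.

```python
import json


def add_node(node_id, x, y, size=10, rgb=(1, 0, 0)):
    """Build an "an" (add node) event in the shape used by the curl example."""
    r, g, b = rgb
    return {"an": {node_id: {"size": size, "r": r, "g": g, "b": b,
                             "x": x, "y": y, "z": 0}}}


def add_edge(edge_id, source, target, weight=10):
    """Build an "ae" (add edge) event in the shape used by the curl example."""
    return {"ae": {edge_id: {"source": source, "target": target,
                             "weight": weight, "r": 0, "g": 0, "b": 0,
                             "directed": False}}}


events = [
    add_node("A", 70, 500),
    add_node("B", 250, 90),
    add_edge("AB", "A", "B"),
    add_node("C", -90, 90),
    add_edge("BC", "B", "C"),
    add_edge("CA", "C", "A"),
]

# One JSON object per line (separator is an assumption, see above).
body = "\r\n".join(json.dumps(event) for event in events)
```

The resulting `body` string could then be posted to the `updateGraph` URL with any HTTP client.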

With the following commands you can retrieve some of the data:

curl "http://localhost:8080/workspace0?operation=getNode&id=A"
curl "http://localhost:8080/workspace0?operation=getEdge&id=AB"

And you can start manipulating your graph from the command line as you like. There are other event types for changing and removing edges and nodes; for more information about them, see the current status of the JSON Streaming Format, available at this page. Note that this format is subject to change, as the API was built to be very flexible and more requirements are being added to it.

But what about connecting two different Gephi instances together? One instance will be the master and the other the client. Using the Graph Streaming API, a change in a graph in the master’s workspace should cause a change in the client’s workspace, and a change in the client’s workspace will cause it to send requests to the master to update its graph accordingly. Both instances work in a distributed mode. In fact, different people could work in a distributed mode to construct a graph: this is Collaborative Graph Construction.

My personal impressions about it

For me as a researcher, Gephi has the potential to become a de facto standard for manipulating and visualizing large-scale graphs. I believe that the research community is still lacking a high-quality, general-purpose, community-supported framework for exploratory analysis of large-scale dynamical graph data, and I believe that Gephi has the potential to fill this gap. I’m also working in collaboration with the ISI Foundation on the SocioPatterns project, an example of a research use case that currently uses Gephi for exploratory data analysis and visualization. The support for dynamic networks, the readiness of the Gephi data model for dynamical update of graph topology and attributes and, in the near future, the support for graph streaming are exciting features that suit the large-scale real-time data sources we are dealing with very well. The potential for processing live streams from our experiments is a unique feature that we are eager to see working.

André Panisson

GSoC 2010 mid-term: Direct Social Networks Import

Yi Du

During this summer, six students are working on Gephi with the Google Summer of Code. They contribute to Gephi by developing new features that will be integrated in the 0.8 version, released later this year.

Yi Du is adding the Direct Social Networks Import module during this summer, which provides several kinds of importers, such as Emails, Twitter or Facebook. The goal of this article is to briefly introduce some of the importers and show several samples.

The ability to import any kind of structured data and build a network from it is essential for users. This step is often missing and requires time and scripting abilities, although tools and libraries able to read and parse all types of data already exist. Moreover, it has never been so easy to quickly access meaningful datasets online.

Email importer

Email is a simple and widely used communication tool, yet many people have no knowledge of how it works. To some extent, our work on analyzing emails can help people better understand their relationships with others. In our email importer module, each email address is represented as a node. If two email addresses have the same display name, an option lets the user decide whether to treat them as one node or as two different nodes. Then, if there is an email from A to B, an edge is built, and another option lets the user decide whether Cc or Bcc recipients also produce an edge.
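The node and edge rules of the email importer can be sketched with Python's standard `email` package. This is only an illustration of the idea, not the importer's code: the function `email_edges` and its two option flags are hypothetical stand-ins for the options described above.

```python
from email import message_from_string
from email.utils import getaddresses


def email_edges(raw_messages, include_cc=False, merge_by_name=False):
    """Return a dict (sender, recipient) -> number of messages."""
    def key(name, address):
        # Optionally merge addresses that share a display name into one node.
        if merge_by_name and name:
            return name.lower()
        return address.lower()

    edges = {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        senders = getaddresses(msg.get_all("From", []))
        headers = ["To"] + (["Cc", "Bcc"] if include_cc else [])
        recipients = []
        for header in headers:
            recipients += getaddresses(msg.get_all(header, []))
        # One edge per (sender, recipient) pair, weighted by message count.
        for s_name, s_addr in senders:
            for r_name, r_addr in recipients:
                pair = (key(s_name, s_addr), key(r_name, r_addr))
                edges[pair] = edges.get(pair, 0) + 1
    return edges


# Hypothetical sample message in EML-like form.
raw = (
    "From: Alice <alice@example.org>\n"
    "To: Bob <bob@example.org>\n"
    "Cc: Carol <carol@example.org>\n"
    "Subject: hello\n"
    "\n"
    "Hi Bob\n"
)
to_only = email_edges([raw])
with_cc = email_edges([raw], include_cc=True)
```

By default only the To recipient produces an edge; enabling `include_cc` adds an edge to the Cc recipient as well.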

We provide two ways to import emails. On the one hand, emails can be fetched one by one from an email server (POP3 or IMAP). On the other hand, they can be read from local files or folders. This raises a problem: different email clients may use different file formats. Fortunately, our importer has an easy-to-extend API as well as a default implementation (EML files). EML is standard and can be obtained from Thunderbird, Outlook and Gmail with tools like Gmail Backup.

This sample illustrates the data the email importer outputs (2000 emails with EGO filter, 700 nodes, 1300 edges).
Figure 1: (a) The EGO graph; (b) graph whose indegrees are bigger than 0; (c) modularize the graph; (d) subgraph that has the max number of Modularity count; (e) the hottest group

Twitter importers

Twitter is a very popular social network. People use it to send and receive short messages, usually called tweets. We can follow people we are interested in and topics we like. Twitter networks have been popularized by NodeXL, which has a similar feature. See this nice gallery.

We provide two kinds of networks: “Twitter Search Network” and “Twitter User Network”.

We support the Twitter search network to analyze people who search for or mention similar keywords. Each Twitter user is represented as a node, and we define three kinds of edge construction:

  • Replies-to relationship: If A replies to B in a searched tweet, an edge from A to B will be added.
  • Mentions relationship: If A mentions B in a searched tweet, an edge from A to B will be added.
  • Followers relationship: If A follows B in the constructed graph, an edge from A to B will be added.
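The three edge rules above can be sketched in Python as follows. This does not call the Twitter API: tweets are plain dicts with the hypothetical keys `user`, `text` and `reply_to`, the follower pairs are assumed to be known already, and the function name `search_network_edges` is made up for the example.

```python
import re

MENTION = re.compile(r"@(\w+)")


def search_network_edges(tweets, follows=()):
    """Return a set of (source, target, kind) edges for a list of tweets.

    tweets  -- dicts with hypothetical keys "user", "text" and "reply_to"
    follows -- optional (follower, followed) pairs already known
    """
    edges = set()
    for tweet in tweets:
        author = tweet["user"]
        if tweet.get("reply_to"):
            edges.add((author, tweet["reply_to"], "replies-to"))
        for mentioned in MENTION.findall(tweet["text"]):
            edges.add((author, mentioned, "mentions"))
    # Follower edges are only added between users already in the graph.
    nodes = {t["user"] for t in tweets} | {target for _, target, _ in edges}
    for follower, followed in follows:
        if follower in nodes and followed in nodes:
            edges.add((follower, followed, "follows"))
    return edges


# Hypothetical sample data.
tweets = [
    {"user": "ann", "text": "@bob have you tried gephi?", "reply_to": "bob"},
    {"user": "bob", "text": "gephi tip from @carl", "reply_to": None},
]
follows = [("carl", "ann"), ("dave", "ann")]
edges = search_network_edges(tweets, follows)
```

Note that a reply that also names its target counts both as a replies-to and as a mentions edge in this sketch; whether the importer merges the two is not stated in the article.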

The second network we provide is the “Twitter User Network”. We analyze people who follow each other to show the relationships between Twitter users. By default, we add an edge from A to B if A follows B anywhere in the graph. We provide three options for vertex construction:

  • Person followed by the user: If searched user A follows B, B will be added as a vertex.
  • Person following the user: If A follows searched user B, A will be added as a vertex.
  • Both: both of the above options.

The interfaces of the two importers are shown below.
Figure 2: (a) User network importer UI; (b) Search network importer UI

New York Times importer

The New York Times is an American daily newspaper founded and continuously published in New York City. It offers a series of APIs for developers on news and social networks, such as the Article Search API and the Best Seller API.

We provide two kinds of social network importers in Gephi: “Article Network” and “TimesPeople Network”. We use the article network to analyze articles with specific filters (date, facets, etc.). The user can choose which attribute constructs the edges. For example, if the user chooses date, an edge is built between two articles that have the same date attribute. TimesPeople is a social network for Times readers, similar to Facebook; we can analyze the relationships between its members.
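The "same attribute value means an edge" rule for the article network can be sketched in a few lines of Python. The function name `shared_attribute_edges` and the article records are hypothetical; the sketch does not call the NYT APIs.

```python
from itertools import combinations


def shared_attribute_edges(articles, attribute):
    """Link every pair of articles that have the same value for `attribute`."""
    groups = {}
    for article in articles:
        value = article.get(attribute)
        if value is not None:
            groups.setdefault(value, []).append(article["id"])
    edges = []
    for value, ids in groups.items():
        # Every pair inside a group shares the value, so it gets an edge.
        for a, b in combinations(ids, 2):
            edges.append((a, b, value))
    return edges


# Hypothetical article records.
articles = [
    {"id": "a1", "date": "2010-07-01", "facet": "Technology"},
    {"id": "a2", "date": "2010-07-01", "facet": "Science"},
    {"id": "a3", "date": "2010-07-02", "facet": "Technology"},
]
date_edges = shared_attribute_edges(articles, "date")
facet_edges = shared_attribute_edges(articles, "facet")
```

Choosing a different attribute yields a different network over the same articles, which is why the choice is left to the user.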

The interfaces of the NYT article network importer and the TimesPeople network importer are shown below:
Figure 3: (a) NYT article network importer UI; (b) NYT TimesPeople network importer UI

Display of TimesPeople network:

Conclusion and future work

In this article, we introduced several importers: Email, Twitter and NYT. With these importers, users can import the data they want and analyze it. They can find the hottest group, the relationships among their friends, the most related author of a facet and other important information.
By the end of GSoC, we will have four major importers: Email, Twitter, NYT and Facebook. Twitter will have “Twitter User Network” and “Twitter Search Network”. NYT will have “NYT article search network” and “NYT TimesPeople Network”. Facebook will have “Facebook Friends Network” and “Facebook Group Network”. Besides adding the Facebook importer, we will also optimize the UI of the importers and make them more user friendly.

Yi Du