rgexf: An R library to work with GEXF graph files

 

George Vega is an economist working at the Chilean Pension Supervisor and cofounder of nodoschile.org. His research interests are Statistical Computing, HPC, Complex Systems and Public Policy.

The first R library to work with GEXF files, rgexf allows both writing (exporting) and reading (importing) .gexf files.

Features:

  • Writing and reading GEXF files
  • Writing dynamic graphs
  • Writing graphs with attributes (boolean, numeric, char)
  • Writing graphs with VIZ attributes (color, size, shape)
  • Building GEXF graphs from scratch (node/edge by node/edge)

rgexf is designed so that no knowledge of XML is required.

Some examples:

# Installing from CRAN and loading
install.packages("rgexf", dependencies=TRUE)
library(rgexf)

# Reading lesmiserables graph (and summarizing)
lesmiserables <- read.gexf("http://gephi.org/datasets/LesMiserables.gexf")
summary(lesmiserables)

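# Building a GEXF class object (includes data frames of nodes/edges +
# XML representation of it) from two two-column data.frames.
# The two data frames below are illustrative sample data (id/label and source/target).
people <- data.frame(id=1:4, label=c("juan", "pedro", "matthew", "carlos"))
relations <- data.frame(source=c(1, 1, 1, 2, 3, 4), target=c(4, 2, 3, 3, 4, 2))
mygraph <- write.gexf(nodes=people, edges=relations)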

# Exporting to some place
print(mygraph, output="mygraph.gexf", replace=TRUE)

# Creating a GEXF object from scratch (and adding a node)
mynewgraph <- new.gexf.graph()
mynewgraph <- add.gexf.nodes(mynewgraph, id=1, label="George")

The source code plus more examples can be found on the project website.

For suggestions, bug reports or any kind of support, contact me on Twitter @gvegayon or send me an email at george [dot] vega [at] nodoschile [dot] org

George Vega Yon

Gexf4j, a new Java library to create GEXF files


Francesco Ficarola is a Computer Engineer and a Ph.D. student at the Sapienza University of Rome. In addition, he has been working for an Italian company as an R&D Engineer for one year. His main research interests are Wireless Sensor Networks, Social Networking and anything that concerns the “Internet of Things”.

Gephi has supported the 1.2draft version of the GEXF file format since version 0.8. Until now, Java developers had no up-to-date library for working with this version of the format. The only available library for building GEXF graphs was gexf4j-core v.0.2.0-ALPHA by J. Campanini. Unfortunately, that library implemented GEXF 1.1draft only and is no longer maintained, so I decided to update it to build GEXF 1.2draft compliant graphs. This version introduces many improvements; see the changelog.

The latest version of gexf4j (currently 0.3.1) supports new XML attributes and data types to encode dynamic networks:

  • timeformat or spell
  • open intervals (startopen/endopen)
  • double
  • date
  • xsd:dateTime
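To make the items above concrete, here is a minimal sketch of the kind of XML they refer to. It is written in Python with the standard library rather than with gexf4j, so it says nothing about the gexf4j API. The function name, the node and the dates are made up for illustration, and the attribute names (`timeformat`, `spells`, `startopen`) are used as I understand them from the GEXF 1.2draft format.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"


def build_dynamic_gexf():
    """Build a tiny dynamic GEXF document with one node and two spells (illustrative only)."""
    gexf = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    # timeformat tells readers how to interpret start/end values (here: dates).
    graph = ET.SubElement(
        gexf, "graph",
        {"mode": "dynamic", "defaultedgetype": "undirected", "timeformat": "date"},
    )
    nodes = ET.SubElement(graph, "nodes")
    node = ET.SubElement(nodes, "node", {"id": "0", "label": "Example"})
    spells = ET.SubElement(node, "spells")
    # A closed interval: the node exists from 2009-01-01 to 2009-03-01 inclusive.
    ET.SubElement(spells, "spell", {"start": "2009-01-01", "end": "2009-03-01"})
    # An interval open on the left: 2009-03-10 itself is excluded.
    ET.SubElement(spells, "spell", {"startopen": "2009-03-10", "end": "2009-04-01"})
    return ET.tostring(gexf, encoding="unicode")


xml_text = build_dynamic_gexf()
print(xml_text)
```

A library such as gexf4j hides these details behind typed methods; the sketch only shows what ends up in the file.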

In addition, javadoc has been added and all methods now have meaningful parameter names.

Creating a GEXF file with gexf4j is very simple and requires very few lines of code: read the two examples in the project README file, or find them in the “it.uniroma1.dis.wiserver.gexf4j.examples” project package.

If you have been a gexf4j-core v.0.2.0-ALPHA user, you will find it easy to program with the new version. You only need to change your code slightly for the dynamic features, in particular:

  • The “TimeType” class has been renamed “TimeFormat”.
  • “Slice” components have been renamed “Spell”, in line with 1.2draft.
  • The methods “setStartDate(Date d)” and “setEndDate(Date d)” have been changed to “setStartValue(Object o)” and “setEndValue(Object o)”, respectively, in order to support multiple timeformat types: double (default), date and xsd:dateTime.
  • The methods “Date getStartDate()” and “Date getEndDate()” have been changed to “Object getStartValue()” and “Object getEndValue()”, respectively.
  • The methods “setStartIntervalType(IntervalType startIntervalType)” and “setEndIntervalType(IntervalType endIntervalType)” have been added to let the user choose the interval type (open or closed) of the “start/end” attribute.

Finally, if you would like to support this project, please let me know; you are welcome. And if you want to stay informed about the latest news on gexf4j, just follow me on Twitter: @f_ficarola

Checkout code

Run
git clone git://github.com/francesco-ficarola/gexf4j.git

Report issues

Simply go to the Issues tab.

Have a nice “GEXF graph”!

Francesco Ficarola

GSoC 2010 mid-term: Dynamic attributes and statistics

Cezary Bartosiak

During this summer, six students are working on Gephi with the Google Summer of Code. They contribute to Gephi by developing new features that will be integrated in the 0.8 version, released later this year.

 

Cezary Bartosiak’s project focuses on the further development of dynamic network analysis (DNA) in Gephi. The aim is to create a framework for building and querying a dynamic graph through a proper API. It has practical purposes, for instance analyzing the evolution of networks (see in particular M. Argollo de Menezes, A.-L. Barabási, Fluctuations in Network Dynamics) or visualizing dynamic networks. This article presents the most important features provided by this GSoC project.

 

In the current 0.7 version we can import dynamic graphs written in GEXF syntax and then filter them using the Timeline component. Unfortunately, it only filters the graph topology, which means hiding nodes and/or edges.

The obvious next step is to handle dynamic changes not only of the graph topology but also of the attributes attached to nodes and edges. This can be done by creating a proper API, which other modules, like Statistics, could use to provide dynamic versions of themselves. Computing metrics like Degree Distribution or Clustering Coefficient for each time interval in a time series is of great interest for analyzing graphs over time.

So, getting down to brass tacks, the most important tasks are:

  • A data structure that hosts dynamic attributes efficiently, so that they can be presented in the Data Laboratory module.
  • A Dynamic API built around the Dynamic Graph Decorator, which wraps the graph and a time interval and returns static copies of the graph for given time intervals, as well as arrays of attribute values for given nodes/edges and time intervals.
  • Adapting the Metrics framework to use the Dynamic API, in order to offer dynamic versions of existing metrics.

There are also additional features which will be done in the future (they will probably not be included in the next release):

  • Dynamic visualization of attributes.
  • Dynamic version of the Ranking module – dynamic visualization attributes transformation.

I’ll try to briefly describe how these features work.

Dynamic attributes

This is a very interesting task from a programmer’s point of view, since it requires implementing a complex data structure, the interval tree (see also Antoine Vigneron – Segment trees and interval trees). Users will find it necessary too. The purpose is to read dynamic attributes from GEXF files and host them efficiently, so that we can get the values of attributes for different time intervals. It is a powerful feature. To show how it works, let’s consider one node (written in GEXF syntax):

<node id="1" label="Some node">
<attvalues>
<attvalue for="0" value="abcdefgh"/>
<attvalue for="2" value="1" end="2009-03-01"/>
<attvalue for="2" value="2" start="2009-03-01" end="2009-03-10"/>
<attvalue for="2" value="1" start="2009-03-10"/>
</attvalues>
</node>

As we can see, there is one dynamic attribute (id = 2), which takes three different values in different time intervals. The first interval starts at “negative infinity”: we simply assume that it only ends and never starts. But if some bounds exist, for instance because the related graph has its own start and end times, this attribute would “start” at the same moment as the graph, which is rather intuitive. The second interval runs from 2009-03-01 to 2009-03-10, and the last one runs from 2009-03-10 to “positive infinity” or the graph’s bound.
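As an illustration of this reading of the node, the following sketch extracts the intervals of one attribute with Python’s standard library. It is not Gephi’s importer: the function name is hypothetical, and using `None` for a missing bound (negative or positive infinity) is an assumption made for the example.

```python
from datetime import date
import xml.etree.ElementTree as ET

NODE_XML = """
<node id="1" label="Some node">
  <attvalues>
    <attvalue for="0" value="abcdefgh"/>
    <attvalue for="2" value="1" end="2009-03-01"/>
    <attvalue for="2" value="2" start="2009-03-01" end="2009-03-10"/>
    <attvalue for="2" value="1" start="2009-03-10"/>
  </attvalues>
</node>
"""


def read_intervals(node_xml, attribute_id):
    """Return (start, end, value) triples for one attribute; None marks an unbounded side."""
    node = ET.fromstring(node_xml.strip())
    intervals = []
    for attvalue in node.iter("attvalue"):
        if attvalue.get("for") != attribute_id:
            continue
        start = attvalue.get("start")
        end = attvalue.get("end")
        intervals.append((
            date.fromisoformat(start) if start else None,
            date.fromisoformat(end) if end else None,
            attvalue.get("value"),
        ))
    return intervals


for start, end, value in read_intervals(NODE_XML, "2"):
    print(start or "-inf", end or "+inf", value)
```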

After importing this into Gephi, we can simply get the values for ANY time interval we want, for example [-inf, +inf]. But we need to know how to estimate a final value. In the above example there are three values: 1, 2 and 1. To decide which of them should be returned, we provide a set of estimators: AVERAGE, MEDIAN, MODE, SUM, MIN, MAX, FIRST and LAST. The behavior of each depends on the type of the attribute; for real numbers, for example, they behave as they do in statistics.
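The idea of estimators can be sketched in a few lines of Python. This is only an illustration, not Gephi’s implementation: the names `INTERVALS`, `ESTIMATORS` and `estimate` are hypothetical, intervals are treated as closed on both sides, `date.min` and `date.max` stand in for the infinities, and AVERAGE is a plain mean that does not weight values by interval length.

```python
from datetime import date
from statistics import mean, median, mode

# (start, end, value) triples mirroring the node above.
INTERVALS = [
    (date.min, date(2009, 3, 1), 1.0),
    (date(2009, 3, 1), date(2009, 3, 10), 2.0),
    (date(2009, 3, 10), date.max, 1.0),
]

ESTIMATORS = {
    "AVERAGE": mean,
    "MEDIAN": median,
    "MODE": mode,
    "SUM": sum,
    "MIN": min,
    "MAX": max,
    "FIRST": lambda values: values[0],
    "LAST": lambda values: values[-1],
}


def estimate(intervals, low, high, estimator):
    """Reduce the values of all intervals overlapping [low, high] to a single value."""
    ordered = sorted(intervals, key=lambda interval: interval[0])
    values = [value for start, end, value in ordered if start <= high and end >= low]
    if not values:
        return None  # no value is defined in the queried interval
    return ESTIMATORS[estimator](values)


# Query the whole time line with every estimator.
for name in ESTIMATORS:
    print(name, estimate(INTERVALS, date.min, date.max, name))
```

A linear scan is enough for three intervals; an interval tree, as mentioned earlier, is what makes such queries efficient on large graphs.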

So, users will be able to get the values for different time intervals on demand, for instance in the Data Laboratory module, or (in the future) see them on the screen as part of a rendered graph. Take an attribute like priority. A user will be able to choose between several possibilities: nothing (meaning this attribute should not be visualized), color, stroke, thickness, etc. This means, for instance, that if the attribute of some node is close to its upper bound, the node’s stroke would be very thick. On the other hand, if the attribute is close to its lower bound, only the node’s internal color might be drawn.

Metrics framework

For now it is possible to compute a set of important metrics, but all of them consider a “static graph”. The idea of dynamic metrics is then to execute the static ones in a loop, where the graph changes according to the time interval. The following screenshot shows that using these additional metrics is similar to using their static counterparts:

[Screenshot: Dynamic Metric]

The screenshot shows only Dynamic Degree Power Law, but of course every dynamic metric will be implemented (this module was still under development while this article was being written, so the final product could differ from the one presented above). The user enters the relevant information, such as the time interval, and gets a separate report for every time interval. What are the other results?

  • The result for each node/edge is written in the graph, so one can see it in Data Laboratory.
  • The general result is also written and presented in the report.
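The loop described in this section, where a static metric runs once per time window, can be sketched as follows. This is a simplified illustration in Python, not the Gephi Metrics framework: the edge list, the half-open numeric intervals and the function names are all assumptions made for the example, and node degree stands in for any static metric.

```python
from collections import defaultdict

# Edges with validity intervals: (source, target, start, end), alive on [start, end).
EDGES = [
    ("a", "b", 0, 10),
    ("b", "c", 5, 15),
    ("a", "c", 12, 20),
]


def snapshot(edges, low, high):
    """Static edge list containing the edges alive at some point in [low, high)."""
    return [(s, t) for s, t, start, end in edges if start < high and end > low]


def degree(static_edges):
    """A static metric: the degree of every node in a static edge list."""
    result = defaultdict(int)
    for source, target in static_edges:
        result[source] += 1
        result[target] += 1
    return dict(result)


def dynamic_metric(edges, low, high, window, metric):
    """Run a static metric on one snapshot per window; return one result per window."""
    results = {}
    t = low
    while t < high:
        upper = min(t + window, high)
        results[(t, upper)] = metric(snapshot(edges, t, upper))
        t = upper
    return results


report = dynamic_metric(EDGES, 0, 20, 10, degree)
for interval, degrees in report.items():
    print(interval, degrees)
```

Each window yields its own per-node result, which corresponds to the separate report per time interval mentioned above.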

Conclusion

The evolution of networks, network dynamics and dynamic network analysis are hot topics nowadays, and interest in studying them is growing. As a result, there is an increasing need for DNA tools. In my opinion Gephi is heading towards being one of the best…

Cezary Bartosiak

Gephi initiator interview: how “Semiotics matter”

Today I have the honor of interviewing a special member of the Gephi Team: Mathieu Jacomy.

Mathieu is an engineer, a founder of the WebAtlas NGO and a teacher at Sciences Po Paris, and he leads R&D in the TIC Migrations program at the Fondation Maison des Sciences de l’Homme and the Telecom ParisTech school.
He is the main developer of the “Navicrawler” software. He also created the first Gephi prototype.

 

Sebastien Heymann: Hi Mathieu Jacomy, you are the creator of Graphiltre, the first Gephi prototype, which you developed in 2006. What was the purpose of making yet another graph software?
Mathieu Jacomy: Hi! I’m glad to answer your questions, and I hope our readers will be pleased to know more about Gephi.

At this time I was analyzing a lot of graphs and I wasn’t satisfied by the existing free tools. That’s why I started to build my own tools.

I had no money for professional tools, and I needed to understand precisely what the software was doing: free, open source software perfectly fit these constraints.
I was using Guess, the amazing software by Eytan Adar, which he built for his own needs. I was doing much the same thing as he was, and I couldn’t have started to explore graphs without this tool.
But I wasn’t satisfied, because the software didn’t allow much manipulation. I couldn’t look at the substructures as easily as I wanted, and it was difficult to make nice cartographies.
I was dreaming of a “graph-dedicated Photoshop“, a visualization-oriented software rather than a script-oriented tool.

A good way to figure out what I mean is to look at the spatialization process. Well-known software such as Pajek or Guess offers algorithms called “layout”, “force-vectors” or “energy model”. These algorithms give the graph its shape, and this is probably the most critical part of building a clear visualization, because the substructures or “patterns” that one may see in the image strongly depend on the algorithm and the settings chosen. At the same time, most users also want to look quickly at the global shape of the graph, and may not be aware that it’s important to search for the best algorithm depending on the time you have, the quality you want, the size of the graph, its degree distribution, the substructure that you expect to recognize… I was careful with these algorithms, but even though I understood their principles and specificities, I couldn’t figure out how they were transforming the graph, and I couldn’t evaluate their differences.

Why? Because in these programs you can’t:
– Manipulate the graph while the algorithm is running
– Modify the settings while the algorithm is running
– And sometimes, you can’t even see the graph while the algorithm is running
How can you understand what’s happening there? Of course I started to work on a software that allowed this. But the same kind of problem appears again in other parts of the process, like filtering, image exporting… Pajek is clearly built from a mathematical perspective. Guess is more user-friendly, but not enough. I didn’t want a tool for mathematics experts, but a tool for people who actually have to explore and understand graphs. A professional tool for a job that didn’t exist at the time.

This was the starting point of “Graphiltre“. Building a graph exploration system so that you can understand what you are doing by looking at what happens on the screen, and do anything (including filtering) without typing a single script line.


gexf.net, a new website for gexf graph format, libgexf and gexfExplorer

Hi,
It’s been a while since we last gave you fresh news, but hopefully this fall will bring great surprises for the community! The first of them comes today: we have just published a website solely dedicated to the GEXF file format and its applications: http://gexf.net

You will find:

  • clear GEXF specifications, examples and primer.
  • libgexf, the official C++ toolkit for GEXF.
  • gexfExplorer, a brand new open source Flash application created by Alexis Jacomy to visualize networks encoded in GEXF, directly in a web browser.

This website gathers all the useful links for communicating and staying involved in these projects. They are now independent from Gephi itself, and are part of a bigger Gephi Community project.


libgexf 0.1.1 is out!

We started the libgexf project a few months ago to help people create, read and write GEXF files efficiently. Today we announce the second alpha release of this dynamic library, which brings new file validation (RelaxNG and XML-schema based) and data integrity checking (see examples)! These features provide a quick way to find and correct mistakes or missing elements in your files. GEXF files created with the library are also now compatible with Gephi 0.6 when using the Legacy writer. See the complete changelog here for more details.

Finally, new bindings are available for Perl5 and Java6, bringing the number of language bindings to three, together with the existing Python one. Don’t hesitate to give us feedback and request features.

Go to libgexf page

libgexf, a C++ toolkit library for GEXF file format

We released the first version of libgexf, the official toolkit library for creating and manipulating GEXF files, distributed under the MIT licence. The GEXF (Graph Exchange XML Format) format now has its own dedicated project, independent from Gephi. It makes it possible to use GEXF for everyday exchange of rich network data. It covers both basic use cases like network topology and advanced features like dynamic and hierarchical networks.

The community has long awaited this tool, which makes the daily work of reading and producing GEXF files as easy and efficient as we can. Forget the boring activity of reading the format specification, interpreting it correctly and implementing a solution to export your data to Gephi. Libgexf does this for you, and is up to date with the GEXF specifications.

Though the library is written in C++, a variety of language bindings make it available in other environments. Libgexf currently only works on Linux systems (tested on Ubuntu 8.10 and 9.04), but portability will be increased on demand. A Python binding facility is also provided, and Perl will be added soon.

You are welcome to try it and help us improve this toolkit for your benefit! A dedicated forum section has just opened.

Go to libgexf page

Import graph files

This article discusses graph file formats and introduces a new support page about graph import, which explains a bit about each format and gives tips about the current importers’ specificities.

Gephi supports the major graph file formats as well as GEXF, which is our own creation. These file formats come from different editors, and therefore none is really a standard. Needless to say, Gephi supports them so that previous work can be imported, but this is not always easy because of the differences between them. Some standardization efforts exist, but they are difficult to apply anyway, since graph editors have very different features, for instance editors that mix diagrams and graphs.

Our approach in building the GEXF format is slightly different. Instead of being specific to our software, the focus is on what may be common to all network editors. In addition, it will be the first standard format that is dynamic yet easy to use. You can follow the specification process and see some samples on this page. It recently reached version 1.1.

Gephi’s domain of application is networks only, which helps clarify which data should be imported. Although these (older) formats can sometimes have complicated functionalities, only a few are essential to import a graph structure into Gephi: in sum, topology and attributes.

See below for the current and future status of file import. Note that exporting to other graph file formats is so far not a priority, as long as we have GEXF export, but we may consider it more in the future. PDF is not a graph format, and its export feature is set at high priority for 0.7.

Import Implementation status (Gephi 0.6 beta2)

* GEXF: Implements the GEXF 1.0 specification.
* GDF: Implemented, but some rare bugs remain.
* GraphML: Supports basic nodes, edges and attributes. Hierarchy is not currently supported, but is planned for 0.7.
* Pajek: Implemented, works fine.
* XGMML: As for GraphML, hierarchy is not yet supported.

More details and a comparison are available on the Supported Graph Formats page.

Future

* GML: Needs to be done, planned for the 0.7 version.
* Excel/CSV: We are thinking about how to do this.
* Database: In the 2009 roadmap.