Thursday, October 14, 2010

A standard for GraphDB

Until recently, GraphDBs served a niche market. Lately things have changed, and they are becoming more popular than ever. This is due to:
  • The ever-growing need for strongly interconnected data. What better than a GraphDB for that?
  • Applications are much more "social" than before, and social means relationships among subjects, which map naturally onto the Graph model.
  • Performance on large datasets. You can live without a pure GraphDB, and keep using a Relational DBMS, until you need really fast access to your data.
  • The NoSQL movement has done a lot to open developers' minds to DBMSs that differ (sometimes greatly) from the Relational model.

A GraphDB rests on a few strong concepts. Graphs are composed of:
  • Vertex or Node, the linked entity. Vertices can have properties.
  • Edge or Arc, the link between Vertices. Edges can have properties and can be unidirectional or bidirectional.
  • Property, a value assigned to a Vertex or an Edge. A property has a name and a value.
Every GraphDB product is based on the concepts above, but features can differ widely among them. Furthermore, there is no standard for them, so an application can't migrate from one product to another at zero or low cost.
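
To make the three concepts concrete, here is a minimal sketch in Java. The class names (PropertyGraphSketch, Element, Vertex, Edge), the field names and the sample data are illustrative assumptions, not taken from any GraphDB product.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative names only; not the API of any real GraphDB.
class PropertyGraphSketch {

    // Both vertices and edges carry properties (name -> value).
    static class Element {
        final Map<String, Object> properties = new HashMap<>();
    }

    // A vertex keeps track of the edges leaving it and entering it.
    static class Vertex extends Element {
        final List<Edge> outEdges = new ArrayList<>();
        final List<Edge> inEdges = new ArrayList<>();
    }

    // An edge links two vertices and registers itself on both ends,
    // so it can be followed in either direction.
    static class Edge extends Element {
        final Vertex from;
        final Vertex to;
        final String label;

        Edge(Vertex from, Vertex to, String label) {
            this.from = from;
            this.to = to;
            this.label = label;
            from.outEdges.add(this);
            to.inEdges.add(this);
        }
    }

    public static void main(String[] args) {
        Vertex alice = new Vertex();
        alice.properties.put("name", "Alice");
        Vertex bob = new Vertex();
        bob.properties.put("name", "Bob");

        Edge knows = new Edge(alice, bob, "knows");
        knows.properties.put("since", 2010);

        // Follow the first outgoing edge of Alice and print the target's name.
        System.out.println(alice.outEdges.get(0).to.properties.get("name"));
    }
}
```

Keeping both an outgoing and an incoming list on each vertex is one simple way to let the same edge be traversed in both directions; a real product may store this very differently.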

This is the reason why I want to introduce Tinkerpop Blueprints. This Open Source project aims to provide a shared set of basic interfaces that abstract the concepts of Graph, Vertex, Edge and Property. In this way, applications can easily migrate to any other available implementation of the specification.
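
The idea of coding against shared interfaces can be sketched as follows. Note that the interfaces and method signatures below (Graph, Vertex, addVertex, addEdge, neighbors) and the InMemoryGraph backend are hypothetical and written only for this example; they are not the actual Blueprints signatures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GraphAbstractionSketch {

    // Hypothetical interfaces for illustration; NOT the real Blueprints API.
    interface Vertex {
        Object getProperty(String name);
        void setProperty(String name, Object value);
    }

    interface Graph {
        Vertex addVertex();
        void addEdge(Vertex from, Vertex to, String label);
        List<Vertex> neighbors(Vertex from, String label);
    }

    // One possible backend: a toy in-memory implementation.
    // It only accepts vertices it created itself.
    static class InMemoryGraph implements Graph {
        private static class MemVertex implements Vertex {
            final Map<String, Object> props = new HashMap<>();
            final Map<String, List<Vertex>> out = new HashMap<>();
            public Object getProperty(String name) { return props.get(name); }
            public void setProperty(String name, Object value) { props.put(name, value); }
        }

        public Vertex addVertex() { return new MemVertex(); }

        public void addEdge(Vertex from, Vertex to, String label) {
            ((MemVertex) from).out.computeIfAbsent(label, k -> new ArrayList<>()).add(to);
        }

        public List<Vertex> neighbors(Vertex from, String label) {
            return ((MemVertex) from).out.getOrDefault(label, new ArrayList<>());
        }
    }

    // Application code depends only on the interfaces, so another Graph
    // implementation could be passed in without changing this method.
    static List<Object> friendNames(Graph graph, Vertex person) {
        List<Object> names = new ArrayList<>();
        for (Vertex friend : graph.neighbors(person, "knows")) {
            names.add(friend.getProperty("name"));
        }
        return names;
    }

    public static void main(String[] args) {
        Graph graph = new InMemoryGraph();
        Vertex alice = graph.addVertex();
        alice.setProperty("name", "Alice");
        Vertex bob = graph.addVertex();
        bob.setProperty("name", "Bob");
        graph.addEdge(alice, bob, "knows");
        System.out.println(friendNames(graph, alice));
    }
}
```

Only the line that constructs InMemoryGraph names a concrete product; everything else would stay the same if a different implementation of the interfaces were plugged in, which is the migration benefit described above.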

Furthermore, Tinkerpop has created a complete stack of tools and frameworks for working with Graphs. The main ones are:
  • Gremlin is a Turing-complete, graph-based programming language designed for key/value-pair multi-relational graphs. Gremlin makes use of an XPath-like syntax to support complex graph traversals. This language has application in the areas of graph query, analysis, and manipulation.
  • Rexster is a RESTful graph shell that exposes any Blueprints graph as a standalone server. Extensions support standard traversal goals such as search, score, rank, and, in concert, recommendation. Rexster makes extensive use of Blueprints, Pipes, and Gremlin. In this way it's possible to run Rexster over various graph systems.
  • Pipes is a graph-based data flow framework for Java 1.6+. A process graph is composed of a set of process vertices connected to one another by a set of communication edges. Pipes supports the splitting, merging, and transformation of data from input to output.
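
The data flow idea behind Pipes, where small processing steps are chained so that the output of one feeds the input of the next, can be sketched like this. This is not the Pipes API: the Stage interface and the map, filter, then and run helpers are hypothetical names invented for this example, and it covers only transformation and filtering, not splitting or merging.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;
import java.util.function.Predicate;

class PipelineSketch {

    // Hypothetical stage: lazily turns a stream of S into a stream of E.
    interface Stage<S, E> {
        Iterator<E> apply(Iterator<S> input);

        // Connect this stage's output to the next stage's input.
        default <T> Stage<S, T> then(Stage<E, T> next) {
            return input -> next.apply(this.apply(input));
        }
    }

    // Transformation stage: applies fn to every element.
    static <S, E> Stage<S, E> map(Function<S, E> fn) {
        return input -> new Iterator<E>() {
            public boolean hasNext() { return input.hasNext(); }
            public E next() { return fn.apply(input.next()); }
        };
    }

    // Filtering stage: lets through only elements accepted by keep.
    static <S> Stage<S, S> filter(Predicate<S> keep) {
        return input -> new Iterator<S>() {
            private S pending;
            private boolean ready;

            public boolean hasNext() {
                while (!ready && input.hasNext()) {
                    S candidate = input.next();
                    if (keep.test(candidate)) {
                        pending = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            public S next() {
                if (!hasNext()) throw new NoSuchElementException();
                ready = false;
                return pending;
            }
        };
    }

    // Drain a pipeline into a list.
    static <S, E> List<E> run(Stage<S, E> stage, List<S> input) {
        List<E> out = new ArrayList<>();
        Iterator<E> it = stage.apply(input.iterator());
        while (it.hasNext()) out.add(it.next());
        return out;
    }

    public static void main(String[] args) {
        Stage<String, String> longWords = filter(w -> w.length() > 3);
        Stage<String, Integer> lengths = map(String::length);
        Stage<String, Integer> pipeline = longWords.then(lengths);

        // Keeps "graph" and "vertex", then maps them to their lengths.
        System.out.println(run(pipeline, Arrays.asList("graph", "db", "vertex", "arc")));
    }
}
```

Because every stage works on iterators, elements are pulled through the chain one at a time instead of being collected in full between steps.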

(1) What do you think about this effort to standardize GraphDBs?

(2) What is missing today from the Tinkerpop Blueprints API?

(3) Does a .NET port make sense?


rowillis said...

The only .NET graph database out there is sones, but it is not well documented and seems to be in a fairly immature state at the moment, although it does have some good ideas.

OrientDB seems to be more mature, but is currently closed to the .NET community. I would be more than happy to help with a .NET port of OrientDB if you are in need of people.

It seems that the opportunities for GraphDBs are immense, and tapping into the .NET community will only help increase exposure at this critical juncture.

Luca Garulli said...

While a .NET port is interesting, I'd first like to provide a .NET driver for OrientDB. Ideally it would implement the binary protocol to achieve the best performance, but the HTTP/RESTful one is easier to implement.

WDYT? Would you like to start it?