Thursday, September 09, 2010

OrientDB the fastest GraphDB available today?

Two days ago I finished integrating the last part of Blueprints: the Index. OrientDB can now be used, like Neo4J, with the entire TinkerPop stack. This includes the Gremlin language as well.

First tests show that OrientDB outperforms Neo4J, the GraphDB market leader, in all the tests but iteration (and therefore counting). This is because the Blueprints implementation has to create a new wrapper object around each of OrientDB's OGraphVertex and OGraphEdge objects. I have some ideas for improving this, but I need more time: maybe in the next few weeks, or earlier if some users need it. With the native OrientDB Graph APIs, however, this overhead is removed.
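The wrapper overhead can be illustrated with a minimal sketch. It is written in Python for brevity and contains no OrientDB or Blueprints code: `NativeVertex`, `VertexWrapper` and the two iterator functions are hypothetical names made up for this illustration. The only point is that the wrapped path allocates one extra object for every element visited, while the native path does not.

```python
class NativeVertex:
    """Hypothetical stand-in for a database's native vertex object."""

    def __init__(self, vertex_id):
        self.vertex_id = vertex_id


class VertexWrapper:
    """Hypothetical adapter exposing a native vertex through a generic API."""

    created = 0  # counts how many wrappers have been allocated

    def __init__(self, native):
        VertexWrapper.created += 1
        self.native = native

    def get_id(self):
        return self.native.vertex_id


def iterate_native(store):
    # Native path: yields the stored objects directly, no extra allocation.
    for vertex in store:
        yield vertex


def iterate_wrapped(store):
    # Adapter path: allocates a new wrapper for every element visited.
    for vertex in store:
        yield VertexWrapper(vertex)


store = [NativeVertex(i) for i in range(5000)]

native_count = sum(1 for _ in iterate_native(store))
wrapped_count = sum(1 for _ in iterate_wrapped(store))

# Both paths count the same vertices; only the wrapped one allocated objects.
print(native_count, wrapped_count, VertexWrapper.created)
```

Counting through the adapter therefore costs one allocation per vertex, which is consistent with iteration and counting being the tests where the extra layer shows up most.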

These are the results:

Time in ms (less is better). OrientDB vs Neo4J: + faster, - slower.

| Test name | Description | OrientDB 0.9.22 (ms) | Neo4J (ms) | OrientDB vs Neo4J |
|---|---|---|---|---|
| testVertexEquality | 1 vertex added and retrieved | 4.74 | 2.33 | -203.4% |
| testRemoveVertexNullId | 1000 vertices added | 216.73 | 2,070.74 | +955.4% |
| | 1000 vertices deleted | 1,093.71 | 1,910.02 | +174.6% |
| testVertexIterator | 5000 vertices added | 476.62 | 8,314.04 | +1,744.4% |
| | 5000 vertices counted | 86.94 | 1.60 | -5,433.8% * |
| testAddManyVertexProperties | 750 vertex properties added (with vertices being added too) | 72.34 | 43,437.29 | +60,046.0% |
| testAddEdges | 6 elements added and checked | 0.79 | 30.70 | +3,886.1% |
| testAddManyEdges | 3000 elements added | 2,314.44 | 8,031.12 | +347.0% |
| | 1000 edges counted | 8.45 | 12.54 | +148.4% |
| | 2000 vertices counted | 31.62 | 0.51 | -6,200.0% * |
| | 2000 vertices checked | 98.27 | 14.05 | -699.4% * |
| testGetEdges | 3 edges retrieved | 0.45 | 0.12 | -375.0% |
| testRemoveManyEdges | 200 vertices counted | 167.28 | 0.14 | -119,485.7% * |
| | 100 edges counted | 34.92 | 0.79 | -4,420.3% * |
| | 100 edges removed and graph checked | 20,555.44 | 332.54 | -6,181.3% * |
| testStringRepresentation | 1 graph string representation generated | 0.01 | 0.01 | 100.0% |
| testClear | 75 elements added | 45.89 | 152.87 | +333.1% |
| | 75 elements deleted | 30.76 | 422.05 | +1,372.1% |
| testRemovingEdges | 500 vertices added | 133.62 | 974.37 | +729.2% |
| | 1000 edges added | 1,130.64 | 4,521.90 | +399.9% |
| | 1000 edges deleted (with size check on each delete) | 36,773.66 | 4,411.69 | -833.6% * |
| testRemovingVertices | 500 vertices added | 10.48 | 1,110.03 | +10,591.9% |
| | 250 edges added | 132.32 | 1,089.45 | +823.3% |
| | 500 vertices deleted (with size check on each delete) | 70,675.74 | 2,140.07 | -3,302.5% * |
| testTreeConnectivity | 1464 vertices added in a tree structure | 1,506.82 | 5,832.02 | +387.0% |
| | 1464 vertices iterated | 427.67 | 0.75 | -57,022.7% * |
| | 1463 edges iterated | 9.89 | 5.39 | -183.5% * |
| testTinkerGraphEdges | graph-example-1 loaded | 43.75 | 541.15 | +1,236.9% |
| testTinkerGraphVertices | graph-example-1 loaded | 6.81 | 520.29 | +7,640.1% |
| testTinkerGraphSoftwareVertices | graph-example-1 loaded | 5.31 | 543.02 | +10,226.4% |
| testTinkerGraphVertexAndEdges | graph-example-1 loaded | 5.01 | 544.17 | +10,861.7% |

* marks the tests that involve iteration
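The post does not say how the "OrientDB vs Neo4J" column was computed. Judging from the published times, it appears to be the slower time expressed as a percentage of the faster one, with the sign showing which side OrientDB is on. The sketch below makes that assumption explicit; the function name `orientdb_vs_neo4j` is made up for this illustration.

```python
def orientdb_vs_neo4j(orientdb_ms, neo4j_ms):
    """Return the slower time as a percentage of the faster one.

    Assumed reading of the table: a positive value means OrientDB was
    faster, a negative value means it was slower.
    """
    if orientdb_ms <= neo4j_ms:
        return round(neo4j_ms / orientdb_ms * 100, 1)
    return -round(orientdb_ms / neo4j_ms * 100, 1)


# Published times from the table, in milliseconds.
added_1000 = orientdb_vs_neo4j(216.73, 2070.74)    # 1000 vertices added
vertex_equality = orientdb_vs_neo4j(4.74, 2.33)    # testVertexEquality

print(added_1000, vertex_equality)
```

Read this way, the figure for adding 1000 vertices means Neo4J took roughly 9.5 times as long as OrientDB, not that OrientDB was 955 times faster. Because the published times are rounded to two decimals, recomputed values can differ from the table in the last digit.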


TinkerPop said...

I think that these tests are misleading because Neo4j is in AUTOMATIC transaction mode and thus, for every update to the graph, a new transaction is created. Blueprints is not a benchmark suite, but an operational semantics suite. You are comparing apples and oranges.
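To make the transaction-mode objection concrete, here is a minimal sketch in Python. `ToyStore` is a hypothetical in-memory class that only counts commits; it is not the Neo4j, OrientDB or Blueprints API. It shows why loading the same data under automatic mode pays the commit cost once per update, while a manually managed transaction pays it once per batch.

```python
class ToyStore:
    """Hypothetical in-memory store that counts commits (not a real database API)."""

    def __init__(self):
        self.vertices = []
        self.commits = 0

    def add_vertex(self, name):
        self.vertices.append(name)

    def commit(self):
        self.commits += 1


def load_automatic(store, names):
    # Automatic mode: every single update is committed on its own.
    for name in names:
        store.add_vertex(name)
        store.commit()


def load_manual(store, names):
    # Manual mode: all updates share one commit at the end.
    for name in names:
        store.add_vertex(name)
    store.commit()


names = [f"v{i}" for i in range(1000)]
auto_store, manual_store = ToyStore(), ToyStore()
load_automatic(auto_store, names)
load_manual(manual_store, names)

# Same data in both stores, very different number of commits.
print(auto_store.commits, manual_store.commits)
```

In a real database each commit typically involves durable writes, so the difference in commit counts can dominate the measured insert times.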

Luca Garulli said...

Hi Marko,
this is a micro-benchmark and, like all micro-benchmarks, it won't measure the absolute performance of a product. A micro-benchmark is meant to measure specific use cases.

Since we don't yet have a benchmark suite, the closest thing I know of for comparing two GraphDBs is the Blueprints unit tests, which work on thousands of vertices, edges, properties, indexes, etc.

Even if we had a benchmark suite, you would always find someone who tells you that the comparison is not fair for N reasons... This is the magic of benchmarks.

chubbard said...

From the looks of the test it's hard to say outright that OrientDB is faster than Neo4J. For one, there are a lot of swings in the data, with Neo4J being 100,000x or 50,000x faster. However, the greatest swing in favor of OrientDB is 60,000x faster. The fact that there are such wild swings makes me think there might be a problem in the way the test was conducted. Assuming all is fine, I think overall performance would be about the same, if not a wash, because the gains you get in one would be offset by the magnitude of the loss during the reads.

Luca Garulli said...

If you look at the test types, Neo4J is faster than OrientDB only on counting and iteration. For the others (add, update and delete) OrientDB is far faster.

Iteration doesn't mean loading, just browsing all the vertices or edges in the database (by the way, OrientDB has its own methods for counting, but they are part of a custom API outside the Blueprints spec).

Browsing the whole database is not a very common use case. This is why I'm not interested in improving it, but rather in improving, every day, all the others that are much more common: read, add, update and delete of vertices and edges.

pShah2k said...

Thanks for a great graph database. I am currently trying to read through all the information on your site and wiki.
I am primarily interested in search timings. Do you have any plans of doing any comparison regarding how long it takes to search for a node/property in the database?


Luca Garulli said...

These results are quite old; OrientDB now performs much better.

Search time? I'd like to see 3rd-party benchmarks, to avoid flames on my blog ;-)