Thursday, April 28, 2011

GraphDB benchmark part II

It's been a few months since my last post about OrientDB (sorry, but I prefer micro-blogging to blogging...), and I'm back with some news about the OrientDB engine.

The hard work has gone into optimization at many levels:
  • minimizing the space wasted by set/delete operations (HOLES)
  • minimizing marshalling/unmarshalling operations, especially on the LinkSet type, which is responsible for the relationships between vertices and edges
  • fine-tuning transactions
The main difficulty has been reducing disk I/O when a record is updated. In fact, when you execute an update, the serialized content rarely has the same size as the original, and in that case you get a new HOLE, namely free space marked to be reused.
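To make the HOLE mechanism concrete, here is a toy model in Java. It is a hypothetical sketch, not OrientDB's storage code: the class `ToyRecordFile` and its policy (a record that shrinks stays in place, a record that grows is appended at the end) are assumptions chosen only to show where holes come from.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of a record file (hypothetical, not OrientDB's storage code).
 * It tracks only offsets and sizes, to show how updates leave HOLES behind.
 */
final class ToyRecordFile {
    /** A free region [offset, offset + size) that could be reused later. */
    static final class Hole {
        final long offset;
        final long size;
        Hole(long offset, long size) { this.offset = offset; this.size = size; }
    }

    final List<Hole> holes = new ArrayList<Hole>();
    private long end = 0; // next append position, i.e. the current file size

    /** Appends a record of the given serialized size and returns its offset. */
    long create(long size) {
        long offset = end;
        end += size;
        return offset;
    }

    /** Rewrites a record whose serialized size changed; returns its (possibly new) offset. */
    long update(long offset, long oldSize, long newSize) {
        if (newSize == oldSize) {
            return offset; // same size: overwrite in place, no hole
        }
        if (newSize < oldSize) {
            // Smaller: keep it in place; the unused tail becomes a hole.
            holes.add(new Hole(offset + newSize, oldSize - newSize));
            return offset;
        }
        // Bigger: it no longer fits, so append it and free the old slot.
        holes.add(new Hole(offset, oldSize));
        return create(newSize);
    }

    long fileSize() { return end; }

    public static void main(String[] args) {
        ToyRecordFile file = new ToyRecordFile();
        long a = file.create(100);     // offset 0
        long b = file.create(100);     // offset 100
        a = file.update(a, 100, 90);   // stays at 0, leaves a 10-byte hole at 90
        b = file.update(b, 100, 130);  // moves to 200, leaves a 100-byte hole at 100
        System.out.println("holes=" + file.holes.size() + ", fileSize=" + file.fileSize());
        // prints: holes=2, fileSize=330
    }
}
```

After just two updates the toy file is 330 bytes long while holding only 220 bytes of live data, which is how a database can grow well beyond its live size on disk if holes are never reused.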

The problem I found with HOLES was that small spaces weren't reused at all and fragmentation was heavy. This caused general slowness and growth of the database on disk (in some cases to many times its original size). After 2 weeks of work I've published the new version of the OrientDB storage to SVN and Maven, with:
  • In-line defrag: similar to what some file systems already do, it joins small holes together. In-line defrag works while the database is online and in use
  • Improved management of small changes to records
  • 2 configurable strategies for finding the best hole to join during the defrag process
  • configurable hole distance, to decide when to join multiple holes together
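A rough sketch of these defrag ideas in Java. Everything here is an assumption for illustration: `HoleDefragSketch`, `find` and `join` are hypothetical names; the post doesn't name its two search strategies, so first-fit and best-fit are used as stand-ins; and "hole distance" is interpreted as the number of live bytes between two holes that may be shifted so the two holes merge.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of hole search and in-line joining; not OrientDB's real classes. */
final class HoleDefragSketch {

    /** A free region [offset, offset + size) in the data file. Holes are assumed not to overlap. */
    static final class Hole {
        final long offset;
        final long size;
        Hole(long offset, long size) { this.offset = offset; this.size = size; }
        long end() { return offset + size; }
    }

    /** Assumed search strategies; the post does not say which two OrientDB uses. */
    enum Strategy { FIRST_FIT, BEST_FIT }

    /** Picks a hole that can hold `size` bytes, or returns null if none fits. */
    static Hole find(List<Hole> holes, long size, Strategy strategy) {
        Hole chosen = null;
        for (Hole h : holes) {
            if (h.size < size) {
                continue; // too small
            }
            if (strategy == Strategy.FIRST_FIT) {
                return h; // first hole (in list order) that fits
            }
            if (chosen == null || h.size < chosen.size) {
                chosen = h; // BEST_FIT: smallest hole that fits
            }
        }
        return chosen;
    }

    /**
     * Joins holes separated by at most maxDistance bytes of live data. The live
     * bytes between two holes are assumed to be shifted down into the first hole,
     * so both become one contiguous hole ending where the second one ended.
     */
    static List<Hole> join(List<Hole> holes, long maxDistance) {
        List<Hole> sorted = new ArrayList<Hole>(holes);
        sorted.sort((x, y) -> Long.compare(x.offset, y.offset));
        List<Hole> result = new ArrayList<Hole>();
        Hole current = null;
        for (Hole h : sorted) {
            if (current != null && h.offset - current.end() <= maxDistance) {
                long gap = h.offset - current.end(); // live bytes moved down by current.size
                current = new Hole(current.offset + gap, current.size + h.size);
            } else {
                if (current != null) {
                    result.add(current);
                }
                current = h;
            }
        }
        if (current != null) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Hole> holes = new ArrayList<Hole>();
        holes.add(new Hole(15, 5));
        holes.add(new Hole(0, 10));
        holes.add(new Hole(100, 4));
        // With maxDistance = 8, the 5 live bytes at [10, 15) move to [0, 5),
        // so [0, 10) and [15, 20) merge into [5, 20); the hole at 100 is too far away.
        for (Hole h : join(holes, 8)) {
            System.out.println("hole at " + h.offset + ", size " + h.size);
        }
        // prints "hole at 5, size 15" and then "hole at 100, size 4"
    }
}
```

In this sketch a larger distance reclaims more contiguous space per join but moves more live data, so the threshold trades defrag I/O against fragmentation.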
Many users have noticed the overall speed gain. Since there are no benchmarks for GraphDBs yet, I've re-run the TinkerPop Blueprints test cases (see my previous post about this).

This test suite is part of the TinkerPop Blueprints project and is NOT a BENCHMARK: it's a set of tests run against GraphDB implementations to check how compliant they are. So please don't flame about this. It's not an official benchmark, just a way to see how OrientDB performs compared with other GraphDBs and with previous releases of OrientDB itself, and how it performs on different platforms. The comparison here is only with Neo4J (the market leader?), since DEX supports only a few of the test cases and the comparison wouldn't be fair. Note that both GraphDBs run with default settings.

These are the results on a cheap Linux server (Linux CentOS, Intel Atom 330 dual core 1.6GHz, 1GB DDR2 RAM, 7200rpm U-ATA HD):

OrientDB 1.0rc1 snapshot vs Neo4J 1.3. Times in ms (less is better); last column = Neo4J time / OrientDB time, so above 100% means OrientDB was faster.

Test name                     OrientDB       Neo4J  Neo4J/OrientDB
VertexTestSuite              11,190.80   48,354.85          432.1%
EdgeTestSuite                 6,421.55   24,361.78          379.4%
GraphTestSuite               12,642.21   36,932.11          292.1%
IndexableGraphTestSuite       1,173.55    2,618.00          223.1%
IndexTestSuite                  462.70    1,052.32          227.4%
AutomaticIndexTestSuite       1,719.91    4,912.81          285.6%
TransactionGraphTestSuite     1,603.59    4,237.17          264.2%
GraphMLReaderTestSuite        1,291.93    2,365.85          183.1%
Total                        36,506.24  124,834.89          342.0%

OrientDB completes all the tests in less than a third of Neo4J's time: Neo4J needed 342% of OrientDB's time, so OrientDB is about 3.4 times faster!
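The percentage column in these tables is simply Neo4J's time divided by OrientDB's time. A small sketch that recomputes it from the Linux timings above (the class name `SpeedRatio` is just for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Recomputes the Neo4J/OrientDB column of the Linux table from the raw timings. */
final class SpeedRatio {
    /** Neo4J time as a percentage of OrientDB time; above 100% means OrientDB was faster. */
    static double ratioPercent(double orientMs, double neo4jMs) {
        return neo4jMs / orientMs * 100.0;
    }

    public static void main(String[] args) {
        // Linux timings in ms: {OrientDB 1.0rc1 snapshot, Neo4J 1.3}
        Map<String, double[]> linux = new LinkedHashMap<String, double[]>();
        linux.put("VertexTestSuite", new double[] {11190.80, 48354.85});
        linux.put("EdgeTestSuite", new double[] {6421.55, 24361.78});
        linux.put("GraphTestSuite", new double[] {12642.21, 36932.11});
        linux.put("IndexableGraphTestSuite", new double[] {1173.55, 2618.00});
        linux.put("IndexTestSuite", new double[] {462.70, 1052.32});
        linux.put("AutomaticIndexTestSuite", new double[] {1719.91, 4912.81});
        linux.put("TransactionGraphTestSuite", new double[] {1603.59, 4237.17});
        linux.put("GraphMLReaderTestSuite", new double[] {1291.93, 2365.85});

        double orientTotal = 0, neo4jTotal = 0;
        for (Map.Entry<String, double[]> e : linux.entrySet()) {
            double[] t = e.getValue();
            orientTotal += t[0];
            neo4jTotal += t[1];
            // Locale.ROOT keeps '.' as the decimal separator whatever the JVM's default locale is
            System.out.printf(Locale.ROOT, "%-26s %6.1f%%%n", e.getKey(), ratioPercent(t[0], t[1]));
        }
        double total = ratioPercent(orientTotal, neo4jTotal);
        System.out.printf(Locale.ROOT, "%-26s %6.1f%% (%.2fx)%n", "Total", total, total / 100.0);
    }
}
```

`Locale.ROOT` is passed explicitly because with the default locale on many European systems `%.1f` prints a comma as the decimal separator.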

The difference is smaller when running the same tests on a MacBook Pro (OS X 10.6.7 64-bit, 4GB RAM, Intel Core 2 Duo 2.4GHz CPU, 5400rpm HD):

OrientDB 1.0rc1 snapshot vs Neo4J 1.3. Times in ms (less is better); last column = Neo4J time / OrientDB time, so above 100% means OrientDB was faster.

Test name                     OrientDB       Neo4J  Neo4J/OrientDB
VertexTestSuite              13,380.67   23,647.54          176.7%
EdgeTestSuite                 7,508.78   14,139.71          188.3%
GraphTestSuite                9,514.58   18,664.99          196.2%
IndexableGraphTestSuite         523.92      969.06          185.0%
IndexTestSuite                  263.58      640.80          243.1%
AutomaticIndexTestSuite       1,672.92    2,943.86          176.0%
TransactionGraphTestSuite     1,378.59    3,047.77          221.1%
GraphMLReaderTestSuite        1,418.21    1,805.27          127.3%
Total                        35,661.25   65,859.00          184.7%

In this case OrientDB completes all the tests in a little over half of Neo4J's time: Neo4J needed 184.7% of OrientDB's time, so OrientDB is about 1.8 times faster. I was not able to complete all the tests on a Windows 7 machine (Intel i7 720q, 4GB RAM, 7200rpm HD) because Neo4J raised errors that broke the test suite:

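OrientDB 1.0rc1 snapshot vs Neo4J 1.3. Times in ms (less is better); last column = Neo4J time / OrientDB time, so above 100% means OrientDB was faster. Neo4J failed 5 of the 8 suites, so totals are given both for all suites (OrientDB only) and for the 3 suites both databases completed.

Test name                     OrientDB       Neo4J  Neo4J/OrientDB
VertexTestSuite               7,762.61   51,474.76          663.1%
EdgeTestSuite                 6,737.59   71,479.28        1,060.9%
GraphTestSuite                7,280.43   75,695.77        1,039.7%
IndexableGraphTestSuite       2,766.07       Error            n.a.
IndexTestSuite                1,064.49       Error            n.a.
AutomaticIndexTestSuite       2,796.08       Error            n.a.
TransactionGraphTestSuite     3,471.80       Error            n.a.
GraphMLReaderTestSuite        3,067.81       Error            n.a.
Total (all 8 suites)         34,946.88        n.a.            n.a.
Total (3 common suites)      21,780.63  198,649.81          912.0%
Mean of the 3 ratios                                        921.2%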

However, looking only at the 3 suites that completed, OrientDB outperforms Neo4J on Windows as well: on those suites Neo4J needed on average 921.2% of OrientDB's time, roughly 9 times as long!
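On Windows the 921.2% figure is the average of the three per-suite ratios; dividing the summed Neo4J time by the summed OrientDB time over the same three suites gives a slightly different number. The sketch below (class and method names are illustrative) computes both from the Windows timings:

```java
import java.util.Locale;

/** Two ways to summarize the three Windows suites that both databases completed. */
final class WindowsSummary {
    /** Arithmetic mean of the per-suite Neo4J/OrientDB ratios, as a percentage. */
    static double meanOfRatiosPercent(double[] orientMs, double[] neo4jMs) {
        double sum = 0;
        for (int i = 0; i < orientMs.length; i++) {
            sum += neo4jMs[i] / orientMs[i];
        }
        return sum / orientMs.length * 100.0;
    }

    /** Summed Neo4J time divided by summed OrientDB time, as a percentage. */
    static double ratioOfTotalsPercent(double[] orientMs, double[] neo4jMs) {
        double orientTotal = 0, neo4jTotal = 0;
        for (int i = 0; i < orientMs.length; i++) {
            orientTotal += orientMs[i];
            neo4jTotal += neo4jMs[i];
        }
        return neo4jTotal / orientTotal * 100.0;
    }

    public static void main(String[] args) {
        // VertexTestSuite, EdgeTestSuite, GraphTestSuite on Windows 7, in ms
        double[] orient = {7762.61, 6737.59, 7280.43};
        double[] neo4j = {51474.76, 71479.28, 75695.77};
        System.out.printf(Locale.ROOT, "mean of ratios:  %.1f%%%n", meanOfRatiosPercent(orient, neo4j));
        System.out.printf(Locale.ROOT, "ratio of totals: %.1f%%%n", ratioOfTotalsPercent(orient, neo4j));
    }
}
```

The mean of ratios weights each suite equally, while the ratio of totals weights suites by how long they run; here they come out at about 921.2% and 912.0%, so either way Neo4J needed roughly 9 times OrientDB's time on these suites.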

Another interesting point concerns the platforms: Linux CentOS performs very well even though the underlying hardware is cheaper and older than the other 2 machines.

I'm pretty satisfied with these results compared with the previous ones, where Neo4J performed better in some cases. Thanks to all the OrientDB contributors and users who made this possible!

It's not time to rest, though: the work isn't finished yet, and all efforts are now focused on the 1.0 release.

How can I run these tests on my PC?

To run the same tests on your machine, just install Java 6, Git and Apache Maven. Then execute these commands in a shell (or in a command prompt if you're on MS Windows):

> git clone https://github.com/tinkerpop/blueprints.git
> cd blueprints
> mvn install

Now wait until all the software is compiled and all the tests have run: those are your results. Please share them on the OrientDB Group, along with your hardware/software configuration!