Well, performance seemed good, especially in comparison with Neo4J, but insertion time was non-linear, and after a few million records the engine slowed down a lot. Furthermore, a nasty bug in the management of multiple segments broke the import job after a few minutes on his machine, while on my PC everything ran correctly.
This is why I stopped development of the Cluster feature in favor of a thorough review of the RB+Tree algorithm. Indexing has always been fast, but Mark Harwood's case stressed the engine as no one had until now. The reason lies in the import process itself, which executes random reads against the index, while the RB+Tree seemed optimized for sequential insertion and retrieval.
I'm still fixing the pending issues, but the first benchmarks show an indexing speed improvement between 100% and 800%! Furthermore, times are now really linear: inserting the first million vertices takes the same time as inserting the last million. This is a big result for the already super-fast OrientDB engine.
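A rough way to check a "linear" claim like this is to time the insertions batch by batch, with random reads mixed in as the import job does, and compare the last batch with the first. The sketch below is not OrientDB code: it uses a plain Python dict as a stand-in for the index, and the names `time_batches` and `slowdown` are made up for illustration.

```python
import random
import time


def time_batches(index, total, batch_size, reads_per_insert=1, seed=42):
    """Insert `total` keys into `index` and return the seconds spent per batch.

    `index` is any dict-like object (a stand-in, not the OrientDB RB+Tree).
    Each insert is preceded by random lookups to mimic an import job that
    reads the index at random while it writes.
    """
    rng = random.Random(seed)
    times = []
    for start in range(0, total, batch_size):
        t0 = time.perf_counter()
        for key in range(start, min(start + batch_size, total)):
            for _ in range(reads_per_insert):
                # Random read over the key range inserted so far.
                index.get(rng.randrange(key + 1))
            index[key] = key
        times.append(time.perf_counter() - t0)
    return times


def slowdown(times):
    """Ratio of the last batch time to the first; close to 1.0 means flat per-batch cost."""
    return times[-1] / times[0]


if __name__ == "__main__":
    times = time_batches({}, total=1_000_000, batch_size=100_000)
    print([round(t, 3) for t in times], "slowdown:", round(slowdown(times), 2))
```

If the per-batch times stay roughly flat, total insertion time grows linearly with the number of records; a ratio that keeps climbing is the kind of slowdown described above.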
Once the test phase is finished (with new unit tests, of course), I'll commit all the changes to SVN to be reviewed by the community before the next release.
4 comments:
Hello Luca,
good job!
It would be useful to the community, especially to produce fair benchmarks against other platforms, if you could share the full IMDB dataset, or provide a link to download it (or how to retrieve it from the source).
Thanks,
valerio
IMDB snapshots are available directly from the official site:
http://www.imdb.com/interfaces
I'd like to compare OrientDB's performance with other engines on the same use case.
Hi, is it possible, or do you have plans, to run some benchmarks on OrientDB as a triple store? I mean, in order to have a comparison based on the number of triples instead of the number of edges?
Alfredo
Nothing official. Why don't you start some benchmarks on your own? What RDF triple store are you using now?
Lvc@