Thursday, October 01, 2009

Terracotta as distributed DBMS? Bad idea!

Some months ago I had a fucking genius idea for a new application. To start working on it I need to distribute tons of objects across several nodes around the Internet. After researching the best solutions available today, I decided to try Terracotta.

Basically I need a real distributed object repository, so why not create a simple object DBMS that lives virtually all in RAM and leave most of the hard work to Terracotta?
  1. Handle the cache on nodes
  2. Manage the object/page fault in a transparent way for the application
  3. Share the load among nodes
  4. And obviously: read/write objects in a persistent way
So I developed a very simple library that uses the JDK 1.6+ TreeMap to collect data and query it. I was surprised to discover how good a job Terracotta does at hiding the dirty work and complexity of data distribution.
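
To make the TreeMap idea concrete, here is a minimal sketch of a sorted in-memory store. It is not the library described in this post: the class name TreeMapStore, its methods and the "animal:0001" key scheme are all assumptions made for illustration, and it leaves Terracotta out entirely.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch, not the library from the post: a sorted in-memory store.
class TreeMapStore {
    // Keys are kept sorted, so prefix and range scans need no extra index.
    private final TreeMap<String, Map<String, Object>> records =
            new TreeMap<String, Map<String, Object>>();

    // Insert or overwrite the record stored under the given key.
    void put(String key, Map<String, Object> fields) {
        records.put(key, fields);
    }

    // Point lookup; returns null when the key is absent.
    Map<String, Object> get(String key) {
        return records.get(key);
    }

    // "Query": a live view of every record whose key starts with the prefix.
    SortedMap<String, Map<String, Object>> byPrefix(String prefix) {
        return records.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        TreeMapStore store = new TreeMapStore();
        Map<String, Object> ferret = new HashMap<String, Object>();
        ferret.put("name", "Gaudi");
        store.put("animal:0001", ferret);
        store.put("owner:0001", new HashMap<String, Object>());
        System.out.println(store.byPrefix("animal:").keySet());
    }
}
```

A sorted map gives ordered scans for free, which is the main reason to prefer it over a hash map for this kind of store.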

But before thinking about using this piece of code in a production system, I need to be sure that the objects really become persistent in an ACID way. For this reason I developed some test cases as microbenchmarks to see the real throughput and whether data is written to disk synchronously. This was my test using the library I wrote:

UserDatabase db = ServerNode.getInstance().connect("petshop", null, null);
clazz = db.getClazz("Animal");

for( int i = 0; i < 100000; ++i ){
  UserObjectTransient object = clazz.createObject();

  object.setValue("name", "Gaudi");
  object.setValue("description", "My crazy ferret!");
  object.setValue("from", "Barcelona, Spain");
  // ...
}

Pretty nice, isn't it? The Terracotta server instance was on a remote server with 1GB of heap and enough bandwidth available, and the client ran on my laptop. Yes, I know, this is a microbenchmark and can't tell me the performance in all scenarios. But to go on with my experiments I needed to know whether I was totally crazy or whether there was a way to build a robust & scalable solution to satisfy my needs.

Well, in this test my library + Terracotta (after some tuning based on the documentation and the forum) were able to write around 190 objects per second. Not really bad for a first release and in comparison to an RDBMS solution. But I stored just a dynabean with two properties... I remember that in the past (about 8 years ago!) when I tried a real ODBMS (Orient ODBMS) it was able to store 100x as much as this solution, on what is now 8-year-old hardware!
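
To put the 190 objects per second in perspective: at that rate the 100,000 writes of the test take about 100000 / 190 ≈ 526 seconds, close to nine minutes. The sketch below shows one way such a figure could be measured. ThroughputProbe and its methods are hypothetical names, and the HashMap is only a stand-in for the real store. Note that it times the moment each call returns to the client, which is not necessarily the moment the data is on disk.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntConsumer;

// Hypothetical timing harness; the write target is a plain HashMap stand-in.
class ThroughputProbe {
    // Runs write for i = 0..count-1 and returns the measured writes per second.
    static double measure(int count, IntConsumer write) {
        long start = System.nanoTime();
        for (int i = 0; i < count; ++i) {
            write.accept(i);
        }
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        return count / (elapsedNanos / 1e9);
    }

    // Seconds needed to write count objects at a given rate.
    static double secondsFor(int count, double objectsPerSecond) {
        return count / objectsPerSecond;
    }

    public static void main(String[] args) {
        final Map<Integer, String> store = new HashMap<Integer, String>();
        double rate = measure(100000, i -> store.put(i, "Gaudi"));
        System.out.println("objects/sec (in-memory stand-in): " + rate);
        System.out.println("seconds at 190 objects/sec: " + secondsFor(100000, 190.0));
    }
}
```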

But the really bad news is something else: when the 100,000-iteration for loop had finished and the application seemed to have ended successfully, I noticed that the console had not returned control. The test was still running! After 10 seconds I killed the java process and counted the objects created (using the Terracotta console): 99,345!

Where are the 655 missing objects? OK, maybe I misunderstood the ACID features. But no, the Terracotta documentation says ACID! How can it be ACID if the client sends the objects to the server asynchronously?
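
The effect can be reproduced with a toy model of write-behind buffering. This is not Terracotta's actual protocol, only an assumption about the general mechanism; WriteBehindModel and all of its methods are made up for illustration. An asynchronous write returns as soon as the object sits in a client-side buffer, so whatever is still buffered when the process dies never reaches the server.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model (an assumption, not Terracotta's real protocol) of write-behind.
class WriteBehindModel {
    private final Queue<String> pending = new ArrayDeque<String>(); // client buffer
    private final List<String> server = new ArrayList<String>();    // stored on server

    // Asynchronous write: returns once the object is buffered locally.
    void writeAsync(String obj) {
        pending.add(obj);
    }

    // Synchronous write: returns only after the server holds the object.
    void writeSync(String obj) {
        server.add(obj);
    }

    // Background flush: ships up to max buffered objects, returns how many moved.
    int flush(int max) {
        int moved = 0;
        while (moved < max && !pending.isEmpty()) {
            server.add(pending.poll());
            moved++;
        }
        return moved;
    }

    // Killing the client discards the buffer; returns the number of lost objects.
    int kill() {
        int lost = pending.size();
        pending.clear();
        return lost;
    }

    int stored() {
        return server.size();
    }

    public static void main(String[] args) {
        WriteBehindModel client = new WriteBehindModel();
        for (int i = 0; i < 100000; ++i) {
            client.writeAsync("animal-" + i);
        }
        client.flush(99345);       // the flusher had only got this far
        int lost = client.kill();  // process killed with objects still buffered
        // prints "stored=99345 lost=655"
        System.out.println("stored=" + client.stored() + " lost=" + lost);
    }
}
```

With writeSync in place of writeAsync nothing can be lost this way, but every call then pays for a full round trip to the server.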

I'm pretty confident that a real synchronous mode exists somewhere in the configuration, but even if it does, what about the performance? If this ASYNCHRONOUS solution managed a throughput of 190 objects/sec, the synchronous one must be much slower!

These are the reasons why I abandoned this path. Terracotta surely fits well in traditional replication contexts, and the product is really good in terms of documentation, presentation, APIs, etc. But for my needs I have to keep searching...