2 July 2009 - 16:47

Key-value stores and relational databases

Relational databases are going through a rough patch right now, with some pundits going as far as writing them off. Their main problem is that they are constrained to run on a single physical machine (*), which makes scaling out pretty hard. Once your dataset reaches a certain size, you will probably need to look beyond the typical relational database for other ways to store your data.

One type of alternative datastore that is getting a lot of attention is the distributed key-value store, which maps keys to values and assigns the keys to multiple storage nodes according to a hash function (**). In such a setting you retrieve an object through its key, work with it and then save it back to the datastore. The fact that this configuration scales out easily (***) makes this kind of datastore very appealing, and if you are working with large datasets you will probably have no choice but to use something like it.
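To make the "hash a key, pick a node, get/modify/put" cycle concrete, here is a minimal in-process sketch. It is a toy, not any real product's API: the class `HashPartitionedStore` and the node names are hypothetical, each "node" is just a Python dict, and keys are assigned with a simple hash-modulo rule.

```python
import copy
import hashlib


class HashPartitionedStore:
    """Toy distributed key-value store: each node is an in-memory dict."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        self.nodes = {name: {} for name in self.node_names}

    def node_for(self, key):
        # Stable hash of the key (Python's built-in hash() of str is randomized per process)
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.node_names)
        return self.node_names[index]

    def get(self, key, default=None):
        # Return a copy, as a networked store would hand back a deserialized value
        value = self.nodes[self.node_for(key)].get(key, default)
        return copy.deepcopy(value)

    def put(self, key, value):
        # Store a copy so later changes to the caller's object are not visible until put
        self.nodes[self.node_for(key)][key] = copy.deepcopy(value)


store = HashPartitionedStore(["node-a", "node-b", "node-c"])
store.put("user:42", {"name": "Alice", "visits": 0})

# Get the object through its key, work with it, save it back
user = store.get("user:42")
user["visits"] += 1
store.put("user:42", user)
print(store.get("user:42"))  # {'name': 'Alice', 'visits': 1}
```

The modulo rule is the simplest possible choice and remaps most keys whenever the number of nodes changes; production systems typically use something sturdier (Dynamo, for instance, uses consistent hashing).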

The transition from relational databases to key-value stores will probably involve taking relations out of your application, re-modelling it so that related data is grouped together, and streaming data to a reporting database. Relational databases gave users both data storage and relations between entities, which proved both a blessing and a curse: while relations made reporting easier, they also gave unchecked access to data, which allowed all sorts of corners to be cut. Key-value stores take relations out. Data access is more local: you typically have access only to what is immediate to a particular entity, and you cannot run queries spanning your entire data model. This could actually turn out to be a good thing, because such restricted, local access will push an application towards a cleaner design; you will no longer be able to rely on monster SQL statements to compensate for design deficiencies.
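As an illustration of "grouping data together", the sketch below takes an order that a relational schema would spread across three tables and folds it into one self-contained value stored under a single key. The table layout, field names and the `order:100` key format are all hypothetical, and a plain dict stands in for the key-value store.

```python
import copy

# Relational style: data spread over tables, reassembled with joins at query time
customers = {1: {"name": "Alice"}}
orders = {100: {"customer_id": 1, "status": "shipped"}}
order_lines = [
    {"order_id": 100, "sku": "BOOK-1", "qty": 2, "price": 12.5},
    {"order_id": 100, "sku": "PEN-7", "qty": 1, "price": 3.0},
]


def to_aggregate(order_id):
    """Group an order with everything it needs into one self-contained value."""
    order = orders[order_id]
    return {
        "order_id": order_id,
        "status": order["status"],
        # Customer data is copied in (denormalized) instead of referenced by id
        "customer": copy.deepcopy(customers[order["customer_id"]]),
        "lines": [
            {k: v for k, v in line.items() if k != "order_id"}
            for line in order_lines
            if line["order_id"] == order_id
        ],
    }


kv = {}  # stand-in for the key-value store
kv["order:100"] = to_aggregate(100)

# Everything about the order comes from one key lookup; no joins are needed
order = kv["order:100"]
total = sum(line["qty"] * line["price"] for line in order["lines"])
print(total)  # 28.0
```

The trade-off is the one described above: reading an order is now local and cheap, but a question such as "total sales per product across all orders" can no longer be answered from this store alone.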

In an application that needs to handle massive amounts of data, relations between data will be delegated to the activities that require them (reporting is a pretty good example), and data will be streamed from the key-value stores to the relational databases where these activities take place.
All in all, a pretty good division of tasks: applications with a (hopefully) cleaner model rely on key-value stores, which stream data to relational databases for reporting.
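The reporting side of this split can be sketched with Python's built-in `sqlite3` module standing in for the reporting database. The aggregates, the `order_lines` table and the `stream_to_reporting` helper are hypothetical, and the "streaming" here is a one-off batch copy; a real pipeline would push changes incrementally.

```python
import sqlite3

# Hypothetical aggregates as they might be read out of a key-value store
kv = {
    "order:100": {"customer": "Alice",
                  "lines": [{"sku": "BOOK-1", "qty": 2, "price": 12.5}]},
    "order:101": {"customer": "Bob",
                  "lines": [{"sku": "BOOK-1", "qty": 1, "price": 12.5},
                            {"sku": "PEN-7", "qty": 3, "price": 3.0}]},
}


def stream_to_reporting(kv_items, conn):
    """Flatten each aggregate into relational rows so reports can use GROUP BY and joins."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS order_lines "
        "(order_key TEXT, customer TEXT, sku TEXT, qty INTEGER, price REAL)"
    )
    rows = (
        (key, agg["customer"], line["sku"], line["qty"], line["price"])
        for key, agg in kv_items
        for line in agg["lines"]
    )
    conn.executemany("INSERT INTO order_lines VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
stream_to_reporting(kv.items(), conn)

# A query spanning all orders, which the key-value store itself cannot answer
report = conn.execute(
    "SELECT sku, SUM(qty * price) FROM order_lines GROUP BY sku ORDER BY sku"
).fetchall()
print(report)  # [('BOOK-1', 37.5), ('PEN-7', 9.0)]
```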

None of the above means that relational databases will disappear: the vast majority of applications do not need to process the massive amounts of data that make a typical relational database impractical. Relational databases will be around for quite a while.

* Obviously, there are database clusters (such as Oracle RACs), but they are pretty costly.

** Examples of key-value stores are Google’s Bigtable, Amazon’s Dynamo and IBM’s WebSphere eXtreme Scale.

*** Actually, this comes with constraints. To get the best performance, you need to model your application so that each data access stays within a single node, avoiding distributed transactions that span multiple nodes. Please read this article by Billy Newport from IBM for a better understanding.
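One common way to keep related data on the same node is to hash only part of the key. The sketch below assumes a hypothetical key convention, `customer:<id>/<rest>`, where everything before the first `/` is the partition key; it is an illustration of the idea, not the approach described in the article above.

```python
import hashlib

NUM_NODES = 4


def node_index(partition_key, num_nodes=NUM_NODES):
    # Stable hash of the partition key, mapped onto one of num_nodes nodes
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def route(key):
    """Route keys by their partition key only (hypothetical 'customer:<id>/...' convention)."""
    partition_key = key.split("/", 1)[0]
    return node_index(partition_key)


keys = ["customer:7/profile", "customer:7/order:100", "customer:7/order:101"]
nodes = {route(k) for k in keys}
print(len(nodes))  # 1: all of customer 7's data lives on the same node
```

Because all of a customer's entries share a partition key, an update touching the customer's profile and orders can complete on one node instead of requiring a distributed transaction.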
