
30 Nov 2011, 16:12
Brian Tarbox (41 posts)

I’m a bit puzzled that you included Postgres in this book. Is the idea to use it as a “normal” db to compare the others against?

Also, I’m curious about the decision to focus on Riak rather than Cassandra.

Anyway, I’m looking forward to reading this very timely book.

05 Dec 2011, 18:01
Aston J (18 posts)

I too am looking forward to reading this book, and am also wondering why these particular databases are included(?)

Would it be possible to include all of the main (or most popular) databases in a separate chapter perhaps? (If they’re not already) I’m not sure if others will be reading this book for the same reason as me, but I’m hoping it will leave me able to choose the best DB for my (Rails) projects - and maybe that won’t be so easy if some of the DBs are not covered at all.

I liked how Eric showed a list of DBs/situations in his holy grail of databases slides:

I would also like to see which DB would also make a great fit for Rails - but I understand this is not a Rails book and that might be asking a bit too much.

11 Dec 2011, 17:15
Eric Redmond (21 posts)

The general idea was to cover a database of each of the major varieties: Relational, Column Family, Key/Value, Document, Graph. Then it was a matter of choosing which ones to cover. The thought process went like this:

PostgreSQL as relational DB because it’s mature, and MySQL is already well represented in books. Relational wasn’t chosen as a “normal” comparison - it’s a valid style of database. Note this book is not “Seven NoSQL DBs in Seven Weeks”. I would be remiss to ignore the most popular style that has been going strong for 40 years!

HBase as a columnar db because Jim knows it well and it’s growing in popularity. Cassandra also falls in this style but having two column databases seemed overkill, and besides, we cover the Dynamo style in Riak anyway, so there just wasn’t a place for it in this book.

Redis is an excellent example of a Key/Value store, being fast and full featured. Riak is technically another KV store, but it’s distributed, using Dynamo-style consistent hashing, with all sorts of add-ons like map/reduce, secondary indexes, etc. It’s different enough to show how variable KV implementations can be.
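For anyone unfamiliar with consistent hashing, here is a rough Python sketch of the general idea. It is not Riak's code; the `HashRing` class, the `vnodes` parameter, and the node names are all made up for illustration. Each node is placed at several points on a hash ring, and a key belongs to the first node point found clockwise from the key's own hash.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent hash ring (illustrative only, not Riak's implementation)."""

    def __init__(self, nodes, vnodes=64):
        # Place each node on the ring at `vnodes` pseudo-random positions.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _hash(value):
        # SHA-1 is used here only as a well-distributed hash, not for security.
        return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # First ring position after the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._positions, self._hash(key))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")  # one of the three hypothetical nodes
```

The useful property is that adding a node only moves the keys that now land on the new node's positions; every other key stays where it was, which is what makes this style attractive for a distributed KV store.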

Mongo was the first chapter I wrote, since it was my favorite Document datastore at the time. Couch was added later because we were asked what the difference was between Mongo and Couch so often we figured it made sense to add it. It turns out they are much more distinct than the grapevine would have you believe.

Finally, I picked Neo4j as a Graph datastore. I had been following this project for years, and liked the direction it was going. It supports ACID transactions (which really turns many people’s expectations on their heads), it’s fast, and I really liked Gremlin.

I hope this helps, Eric

12 Dec 2011, 17:15
Jim R. Wilson (104 posts)

Chiming in with my thoughts on our choices. As Eric explained, when debating about what to include we identified five database styles that we wanted to cover: relational, key/value, columnar, document and graph. Within that, we had some choices.


  • PostgreSQL - We opted against MySQL for the reasons Eric mentioned, and because Postgres is the closest to the SQL specification of popular OSS databases. Presented first since “database” still means RDBMS to many people and as such we thought it’d be a good foundation for later chapters.


  • Riak - Excellent example of a key/value store which heavily uses REST and introduces mapreduce. Implements Amazon’s Dynamo paper closely.
  • Redis - Another key/value store, but this time focused on blazing speed and basic aggregation types (sets, lists, hashes, etc). Makes for a good contrast to Riak.


  • HBase - Archetypical column-oriented database modeled after Google’s BigTable. Opted against Cassandra because I was much more experienced with HBase, and because we wanted a strong BigTable representative.


  • MongoDB and CouchDB - Both popular document-oriented databases with ties to JavaScript, JSON and mapreduce. Initially, we thought that one would be enough, but when talking to people about the project, we were often asked about the differences between these two. Per Eric’s comment, they’re actually quite different under the surface, and we try to illuminate those differences.


  • Neo4j - Far and away the most popular and polished graph database is Neo4j, so it was a natural choice. In our opinion, graph databases are just starting their ascent into the collective consciousness of professional developers, and so we’re happy to give it a push in that direction. Certain classes of problems which are quite problematic for the other styles are handled easily by the graph model.

There are plenty of other databases that we didn’t include, as you might imagine. Early on, I had championed Lucene as a database to include. People tend to think of Lucene as a search engine, but it’s actually a document datastore that happens to have extensive indexing and query capabilities. Many databases and applications defer their full-text indexing and search tasks to Lucene, in fact.

Another notable mention is LDAP. LDAP is arguably one of the most successful NoSQL technologies out there. Almost any company that has a global workforce uses an LDAP server (e.g., Microsoft’s Active Directory, OpenLDAP, or Oracle’s OID server) to manage its employee data, at least with respect to authentication and authorization. It has a robust Lisp-like query syntax and features eventual consistency across unreliable networks.
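To give a feel for that Lisp-like syntax, here is a small Python sketch that assembles LDAP search filter strings in their prefix notation. The helper names (`eq`, `all_of`, `any_of`, `negate`) and the `department` attribute are made up for illustration; only the filter string format itself is LDAP's.

```python
def escape(value):
    """Escape characters that are special inside an LDAP filter value."""
    # Backslash must be handled first so later escapes are not re-escaped.
    for char, code in (("\\", r"\5c"), ("*", r"\2a"), ("(", r"\28"),
                       (")", r"\29"), ("\0", r"\00")):
        value = value.replace(char, code)
    return value


def eq(attr, value):
    """Equality match: (attr=value)."""
    return f"({attr}={escape(value)})"


def all_of(*filters):
    """Logical AND in prefix form: (&(f1)(f2)...)."""
    return "(&" + "".join(filters) + ")"


def any_of(*filters):
    """Logical OR in prefix form: (|(f1)(f2)...)."""
    return "(|" + "".join(filters) + ")"


def negate(single_filter):
    """Logical NOT in prefix form: (!(f))."""
    return f"(!{single_filter})"


# People who are in either of two (hypothetical) departments.
query = all_of(
    eq("objectClass", "person"),
    any_of(eq("department", "Engineering"), eq("department", "Research")),
)
print(query)
```

The operator comes first and the operands follow inside nested parentheses, which is where the Lisp comparison comes from.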

Having said all that, I’m extremely pleased with the list we finally settled on for the book, and I hope you will be too. Cheers!
