Wednesday, June 26, 2013

NoSQL is not schemaless

NoSQL datastores give us a lot of flexibility when it comes to putting data in. They don't need a fixed schema to accept data, and they're pretty good about handling data with mixed attributes. Suppose I insert the following data into a MongoDB store (I'll use Mongo to illustrate my point, but this could apply to most NoSQL stores):
[ { "first_name" : "Jeff", "last_name" : "Storey" } , { "color" : "red", "a_number" : 50 } ]
Sure, I can do this, but imagine trying to query this data. Mongo will let me write a query that selects all documents whose "first_name" is "Jeff" or whose "color" is "red," but this doesn't make much sense at an application level. In order for the data to be easily processed, it needs to conform to some schema.
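To make that concrete, here is a minimal sketch that uses plain Python dicts as a stand-in for the collection, so it runs without a MongoDB server. The helper name matches_either is made up for illustration; it approximates what a Mongo filter such as {"$or": [{"first_name": "Jeff"}, {"color": "red"}]} would select.

```python
# Plain-Python stand-in for the two documents inserted above.
documents = [
    {"first_name": "Jeff", "last_name": "Storey"},
    {"color": "red", "a_number": 50},
]


def matches_either(doc):
    # Hypothetical helper: true if first_name is "Jeff" or color is "red".
    # dict.get returns None for a missing field, so absent fields never match.
    return doc.get("first_name") == "Jeff" or doc.get("color") == "red"


results = [doc for doc in documents if matches_either(doc)]

# Both documents come back, but they share no field names at all,
# so the application has to inspect each result to learn what it is.
shared_fields = set(results[0]) & set(results[1])
print(len(results), shared_fields)
```

The query itself is perfectly legal. The trouble only shows up afterwards, when the application has to decide what to do with two results that have nothing in common.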

This flexibility is really useful with a schema that changes over time - and what schema doesn't? Rather than making you alter the database to add new columns, new constraints, etc., Mongo will happily accept documents that carry new fields. Consider an original schema that looks like:
first_name, last_name, age
Then I realize I want to start collecting income data, and I change it to
first_name, last_name, age, income
For records that have income, just include the field when you insert them. No database changes necessary!
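The flip side is that the application now has to cope with both shapes. Here is a rough sketch, again using plain Python dicts rather than a real Mongo collection. The records and the average_income function are invented sample data and a hypothetical helper, not anything from a real dataset or library.

```python
# Invented sample records: one written before the income field existed,
# one written after.
people = [
    {"first_name": "Alice", "last_name": "Example", "age": 30},
    {"first_name": "Bob", "last_name": "Sample", "age": 41, "income": 52000},
]


def average_income(records):
    # Hypothetical helper: average over only the records that carry income.
    # Older records simply lack the field, so they are skipped, not errors.
    incomes = [r["income"] for r in records if "income" in r]
    if not incomes:
        return None  # nothing to average yet
    return sum(incomes) / len(incomes)


print(average_income(people))
```

Nothing in the store had to change, but the code still encodes an implicit schema: income is optional, and it is a number whenever it is present.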

Technically, yes, NoSQL stores can be schemaless, but that doesn't mean you shouldn't have a schema.