Three Things to Watch with NoSQL
Last week I painted a rosy picture of NoSQL databases. Before you deploy one, you need to be aware of the potential pitfalls.
1) Architecture
NoSQL databases have different architectural concerns than traditional RDBMSes. If you’re working with a BigTable or Dynamo database, your network backbone will need to be able to handle the increased traffic from replication and from materializing MapReduce data. Dynamo databases are very chatty: a read or write is only considered successful once a configured number of servers have responded to it. MapReduce queries run across many nodes in a cluster. One stage of processing a MapReduce query is temporarily materializing intermediate results on each node. The intermediate results may then be shipped to another server for additional processing or aggregation, or simply be combined into the final result set.
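To make the MapReduce flow described above concrete, here is a minimal single-process sketch. It is not how any particular database implements MapReduce; the `NODES` data and the function names are made up for illustration, and each inner list simply stands in for the records held by one node.

```python
from collections import Counter
from functools import reduce

# Hypothetical data: each inner list stands for the records stored on one node.
NODES = [
    ["error", "ok", "ok"],
    ["ok", "error", "timeout"],
    ["ok", "ok"],
]


def map_on_node(records):
    """Map stage: runs locally on one node and materializes an intermediate result."""
    return Counter(records)


def combine(left, right):
    """Reduce stage: merges two intermediate results after they reach one server."""
    return left + right


def run_query(nodes):
    # One intermediate result per node. In a real cluster each of these
    # would travel over the network before being combined.
    intermediates = [map_on_node(records) for records in nodes]
    return reduce(combine, intermediates, Counter())


result = run_query(NODES)
print(dict(result))
```

In a real cluster every intermediate result crosses the network, which is why the backbone matters: the cost grows with both the number of nodes and the size of what each node materializes.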
2) Querying Strategy
RDBMSes are able to perform a mix of batch and ad hoc querying. They produce reasonably fast results for online aggregation and analysis of data. It’s just as easy to write a query that returns a single row from a relational database as it is to write a query that aggregates data from several thousand rows. Not all NoSQL databases are created equal. Hadoop and Cassandra are built to perform large-scale analytical queries. Their implementations of MapReduce are designed to process analytical results in batches. Frequently these results are going to be pre-aggregated rather than delivered in real time. Riak and MongoDB, on the other hand, are designed to handle interactive querying.
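The relational side of that comparison can be sketched with Python’s built-in sqlite3 module. The `orders` table and its contents are invented for this example; the point is only that the single-row lookup and the aggregate over thousands of rows are written in the same language against the same table.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
rows = [(i, f"customer{i % 10}", float(i)) for i in range(1, 5001)]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# Ad hoc lookup: fetch a single row by primary key.
single = conn.execute(
    "SELECT id, customer, total FROM orders WHERE id = ?", (42,)
).fetchone()

# Aggregation: summarize all 5,000 rows, grouped by customer.
summary = conn.execute(
    "SELECT customer, COUNT(*), SUM(total) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()

print(single)
print(summary[0])
```

With a batch-oriented store, the second query would typically be run ahead of time and its results saved, rather than computed when the user asks for them.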
3) Data Modeling
Data modeling is a tricky field of study in the world of relational databases. A huge number of books have been written on the subject and they deal with both the theoretical and practical aspects of modeling data. In short, while data modeling is tricky, it is also a well-understood field of study. Data modeling in the non-relational world is an entirely new field. Many problems that have been solved in relational data modeling are open again and require a different way of thinking. BigTable databases, while their structure may initially appear familiar, require different modeling techniques to get the most out of the data store. MongoDB’s document-oriented approach makes a great deal of sense to developers, but it still poses a number of data modeling questions about how to link related objects and how deeply nested object graphs should be in the database. Key/value stores, like Riak, have very little structure and give the application developer a huge amount of freedom to model data.
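The question of linking versus nesting can be illustrated with plain Python dictionaries standing in for documents. This does not use any database driver; the blog post and comment documents, the `comment_ids` field, and the `load_post_with_comments` helper are all hypothetical names chosen for the sketch.

```python
# Option 1: embed the related objects inside the parent document.
post_embedded = {
    "_id": "post-1",
    "title": "Three Things to Watch with NoSQL",
    "comments": [
        {"_id": "c1", "author": "ann", "text": "Great post"},
        {"_id": "c2", "author": "bob", "text": "What about joins?"},
    ],
}

# Option 2: store comments separately and link to them by key.
posts = {
    "post-1": {
        "_id": "post-1",
        "title": "Three Things to Watch with NoSQL",
        "comment_ids": ["c1", "c2"],
    },
}
comments = {
    "c1": {"_id": "c1", "author": "ann", "text": "Great post"},
    "c2": {"_id": "c2", "author": "bob", "text": "What about joins?"},
}


def load_post_with_comments(post_id, posts, comments):
    """Resolve the links by hand: one lookup for the post, one per comment."""
    post = dict(posts[post_id])  # copy so the stored document is not modified
    post["comments"] = [comments[cid] for cid in post.pop("comment_ids")]
    return post


post_linked = load_post_with_comments("post-1", posts, comments)
print(post_linked == post_embedded)
```

Both options end up with the same object in the application. The embedded version needs one read but grows with every comment, while the linked version keeps documents small at the cost of extra lookups that the application has to perform itself.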
Summation & Further Reading
NoSQL databases hold a lot of promise to help businesses rapidly respond to data growth. Whenever you’re planning to make a major infrastructure decision, you need to weigh the pros and cons before determining which solution is right for you. While NoSQL databases provide answers to a lot of questions, they also pose a different set of questions that require a different set of skills to answer.
NoSQL Glossary
MongoDB – Sharding with the Fishes
Hadoop for Batch Processing (from the Yahoo! Developer Network)
Schema Design in Riak
Schema Design for Riak (presentation slides)