You’re not using a NoSQL database right now? I’m not all that surprised. Many IT shops are still evaluating a move from SQL Server 2000 to SQL Server 2005, let alone looking at these non-relational databases. A lot of people don’t even know what they would use a NoSQL database for. Does it replace the RDBMS? Work alongside it? Do something else?
One reason to consider adding a NoSQL database to your corporate infrastructure is that many NoSQL databases are well suited to performing analytical queries. Developers can use the same query languages for analytical queries that they’re already using for atomic queries. Typically this will be some variation of a MapReduce query, but it’s also possible to query data using Pig or Hive. Don’t worry too much about these weird language terms: MapReduce is a fancy way of saying “SELECT and then GROUP BY”, done in a way that is entirely confusing to people who are used to SQL.
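To make the “SELECT and then GROUP BY” comparison concrete, here is a minimal sketch of the MapReduce idea in plain Python. It isn’t the API of any particular database: the `orders` data and the `map_order`, `reduce_amounts`, and `map_reduce` names are made up for illustration, and a real system would run the map and reduce steps across many nodes rather than in a single loop.

```python
from collections import defaultdict

# Hypothetical sample data: one record per order.
orders = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
]

def map_order(order):
    # Roughly "SELECT region, amount": emit one (key, value) pair per record.
    yield order["region"], order["amount"]

def reduce_amounts(key, values):
    # Roughly "SUM(amount) ... GROUP BY region": collapse all values for a key.
    return key, sum(values)

def map_reduce(records, mapper, reducer):
    # Shuffle step: collect every emitted value under its key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce step: one reducer call per distinct key.
    return dict(reducer(key, values) for key, values in groups.items())

totals = map_reduce(orders, map_order, reduce_amounts)
print(totals)
```

The SQL equivalent would be something like `SELECT region, SUM(amount) FROM orders GROUP BY region`.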
Many NoSQL systems boast phenomenal write performance. When you combine high write performance with batch processing it is easy to pre-aggregate data, summarize results, and still guarantee ad hoc query performance.
NoSQL databases are designed to scale; it’s one of the primary reasons that people choose a NoSQL database. Typically, with a relational database like SQL Server or Oracle, you scale by purchasing larger and faster servers and storage or by employing specialists to provide additional tuning. Unlike relational databases, NoSQL databases are designed to easily scale out as they grow. Data is partitioned and balanced across multiple nodes in a cluster, and aggregate queries are distributed by default. Scaling is as easy as racking a new server and executing a few commands to add the new server to the cluster (yeah, it really is that easy). Data will start flowing and you’ll be back in business in no time.
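How can a new server join without reshuffling everything? One common approach is consistent hashing. The sketch below is a toy version under stated assumptions: the `HashRing` class, the node names, and the choice of 64 virtual nodes are all illustrative, and real databases layer replication and rebalancing on top of the basic idea.

```python
import bisect
import hashlib

def _hash(value):
    # Map any string to a large integer position on the ring.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Toy consistent-hash ring (illustrative, not a real database's code)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node owns several points on the ring to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # A key belongs to the first node point clockwise from its hash.
        # Rebuilding the position list on every lookup keeps the sketch short.
        positions = [position for position, _ in self._ring]
        index = bisect.bisect(positions, _hash(key)) % len(self._ring)
        return self._ring[index][1]

ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {key: ring.node_for(key) for key in keys}

ring.add_node("node-d")  # "rack a new server and add it to the cluster"
after = {key: ring.node_for(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved to the new node")
```

Only the keys that now fall to `node-d` change owners; everything else stays where it was, which is why adding capacity doesn’t require moving the whole data set.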
In addition to rapid scalability, NoSQL databases are also designed with redundancy in mind. These databases were designed and built at massive scales where the rarest hardware problems go from being freak events to eventualities. Hardware will fail. Rather than treat hardware failure as an exceptional event, NoSQL databases are designed to handle it. While hardware failure is still a serious concern, this concern is addressed at the architectural level of the database, rather than requiring developers, DBAs, and operations staff to build their own redundant solutions. Cassandra uses a number of heuristics to determine the likelihood of node failure. Riak takes a different approach and can survive network partitioning (when one or more nodes in a cluster become isolated) and repair itself.
What’s the use of a database if it’s not flexible? While the data modeling issues are completely different in NoSQL, there is a large amount of flexibility in how data is stored for performance.
Databases modeled like Bigtable and Cassandra provide flexibility around how data is stored on disk. It’s possible to create derived column families. In plain English: you can design your database to duplicate frequently accessed data for rapid query response. This is, of course, based on the assumption that writes and storage space are cheap.
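Here is a rough sketch of that idea, with plain Python dictionaries standing in for two column families. The `users_by_id` and `users_by_city` names and the `save_user` helper are hypothetical; the point is only that every write goes to both the base data and a derived copy shaped for one specific query.

```python
from collections import defaultdict

# Stand-ins for two column families (illustrative, not a real client API).
users_by_id = {}                    # base data, keyed by user id
users_by_city = defaultdict(dict)   # derived copy, keyed by city

def save_user(user_id, name, city):
    # Write the canonical row.
    users_by_id[user_id] = {"name": name, "city": city}
    # Duplicate the frequently read fields under the key the query needs.
    users_by_city[city][user_id] = name

save_user("u1", "Ada", "London")
save_user("u2", "Grace", "New York")
save_user("u3", "Alan", "London")

# "Who lives in London?" is now a single key lookup, with no scan or join.
londoners = users_by_city["London"]
print(londoners)
```

The trade-off is the one named above: each user is written twice and stored twice in exchange for a cheap read.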
Databases based on the Bigtable model also have another benefit – outside of key structure it’s possible to store a variety of disparate data in the same table. Structure is largely irrelevant. Relational databases have adopted features to solve similar problems (such as sparse columns in SQL Server), but they carry overhead. Storing wildly different columns in multiple rows of the same column family is so cheap as to be invisible in a NoSQL database.
Lastly, key-value stores provide an incredible level of flexibility. Data is arbitrarily stored as a value. Key-value databases make it possible to store images, Word documents, strings, integers, and serialized objects within the same database. This requires more responsibility and creative thinking on the part of application developers and architects, but it also lets the people designing the system build a completely custom solution that fills their needs.
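A small sketch of what that responsibility looks like in practice, assuming a store that only understands opaque bytes. The `store` dictionary and the `put` and `get` helpers are stand-ins invented for this example, not any real database client.

```python
import json

store = {}  # stand-in for a key-value database: opaque bytes under a key

def put(key, value):
    # The store neither knows nor cares what the bytes mean.
    if not isinstance(value, bytes):
        raise TypeError("values must be bytes; serialize them first")
    store[key] = value

def get(key):
    return store[key]

# The application decides how each kind of value is encoded.
put("counter:visits", str(42).encode("ascii"))
put("user:1", json.dumps({"name": "Ada", "roles": ["admin"]}).encode("utf-8"))
put("image:logo", bytes([0x89, 0x50, 0x4E, 0x47]))  # first bytes of a PNG file

# ...and it has to remember how to decode each one on the way out.
visits = int(get("counter:visits").decode("ascii"))
user = json.loads(get("user:1").decode("utf-8"))
print(visits, user["name"], get("image:logo"))
```

Nothing in the store stops you from decoding an image as JSON, so the encoding rules have to live in the application.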
5) Rapid Development
Let’s face facts: everyone wants their application to be faster and have more features, and they want it yesterday. NoSQL databases make it easy to change how data is stored or change the queries you’re running. Massive changes to data can be accomplished with simple refactoring and batch processing rather than complex migration scripts and outages. It’s even easier to take nodes in a cluster offline for changes and add them back into the cluster as the new master server; replication features will take care of syncing up data and propagating the new data design out to the other servers in the cluster.