Five Reasons to Use NoSQL

You’re not using a NoSQL database right now? I’m not all that surprised. Many IT shops are still evaluating moving from SQL Server 2000 to SQL Server 2005, much less these non-relational databases. A lot of people don’t even know what they would use a NoSQL database for. Does it replace the RDBMS? Work alongside it? Do something else?

1) Analytics

One reason to consider adding a NoSQL database to your corporate infrastructure is that many NoSQL databases are well suited to performing analytical queries. Developers can use the same querying languages to perform analytical queries that they’re using to perform atomic queries. Typically this will be some variation of a MapReduce query, but it’s also possible to query data using Pig or Hive. Don’t worry too much about these weird language terms: MapReduce is a fancy way of saying “SELECT and then GROUP BY” and doing it in a way that is entirely confusing to people who are used to SQL.
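To make the “SELECT and then GROUP BY” comparison concrete, here is a minimal single-process sketch of the MapReduce pattern in plain Python. It is not the API of any particular NoSQL product; the function names and the sample `sales` records are made up for illustration. It is roughly equivalent to `SELECT product, SUM(amount) FROM sales GROUP BY product`.

```python
from collections import defaultdict

def map_phase(records):
    # "SELECT": emit one (key, value) pair per input record.
    for record in records:
        yield record["product"], record["amount"]

def shuffle(pairs):
    # Group emitted values by key; a real cluster does this across nodes.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # "GROUP BY ... SUM": collapse each group to a single value.
    return {key: sum(values) for key, values in groups.items()}

def total_by_product(records):
    return reduce_phase(shuffle(map_phase(records)))

# Hypothetical sample data.
sales = [
    {"product": "widget", "amount": 10},
    {"product": "gadget", "amount": 5},
    {"product": "widget", "amount": 7},
]
print(total_by_product(sales))  # {'widget': 17, 'gadget': 5}
```

In a real system the map and reduce steps run in parallel on the nodes that hold the data, which is where the analytical horsepower comes from.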

Many NoSQL systems boast phenomenal write performance. When you combine high write performance with batch processing it is easy to pre-aggregate data, summarize results, and still guarantee ad hoc query performance.

2) Scale

NoSQL databases are designed to scale; it’s one of the primary reasons that people choose a NoSQL database. Typically, with a relational database like SQL Server or Oracle, you scale by purchasing larger and faster servers and storage or by employing specialists to provide additional tuning. Unlike relational databases, NoSQL databases are designed to easily scale out as they grow. Data is partitioned and balanced across multiple nodes in a cluster, and aggregate queries are distributed by default. Scaling is as easy as racking a new server and executing a few commands to add the new server to the cluster (yeah, it really is that easy). Data will start flowing and you’ll be back in business in no time.
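One common way to partition data across nodes is consistent hashing. The sketch below is a toy version of that idea, not the implementation used by any specific database; the `HashRing` class, the node names, and the `vnodes` parameter are all assumptions made for the example. It shows why adding a server only moves a fraction of the keys, and why every key that moves lands on the new server.

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring; assumes at least one node before lookups."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes      # virtual points per node, smooths the balance
        self._ring = []           # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # Walk clockwise to the first point at or after the key's hash.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):
            idx = 0               # wrap around the ring
        return self._ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")           # "rack a new server"
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
print("all moved keys went to node-d:", all(after[k] == "node-d" for k in moved))
```

Compare that with a naive `hash(key) % number_of_nodes` scheme, where changing the node count reshuffles almost every key.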

3) Redundancy

In addition to rapid scalability, NoSQL databases are also designed with redundancy in mind. These databases were designed and built at massive scales where the rarest hardware problems go from being freak events to eventualities. Hardware will fail. Rather than treat hardware failure as an exceptional event, NoSQL databases are designed to handle it. While hardware failure is still a serious concern, this concern is addressed at the architectural level of the database, rather than requiring developers, DBAs, and operations staff to build their own redundant solutions. Cassandra uses a number of heuristics to determine the likelihood of node failure. Riak takes a different approach and can survive network partitioning (when one or more nodes in a cluster become isolated) and repair itself.
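The basic idea behind that redundancy can be sketched in a few lines: write every value to several nodes, and read from whichever replica is still up. This is a deliberately simplified illustration, not how Cassandra or Riak actually work internally (both do far more, such as repairing stale replicas). The `ReplicatedStore` class and its methods are hypothetical.

```python
import zlib

class ReplicatedStore:
    """Toy store that keeps each key on `replicas` consecutive nodes."""

    def __init__(self, node_names, replicas=3):
        self._names = sorted(node_names)
        self.nodes = {name: {} for name in self._names}
        self.replicas = min(replicas, len(self._names))
        self.down = set()         # names of nodes currently unreachable

    def replica_nodes(self, key):
        start = zlib.crc32(key.encode("utf-8")) % len(self._names)
        return [self._names[(start + i) % len(self._names)]
                for i in range(self.replicas)]

    def put(self, key, value):
        # Write to every reachable replica; return how many accepted it.
        written = 0
        for name in self.replica_nodes(key):
            if name not in self.down:
                self.nodes[name][key] = value
                written += 1
        return written

    def get(self, key):
        # Any one live replica that has the key is enough to answer.
        for name in self.replica_nodes(key):
            if name not in self.down and key in self.nodes[name]:
                return self.nodes[name][key]
        raise KeyError(key)

store = ReplicatedStore(["a", "b", "c", "d", "e"], replicas=3)
store.put("user:42", "Ada")

# Simulate two of the three replicas failing.
store.down.update(store.replica_nodes("user:42")[:2])
print(store.get("user:42"))  # Ada
```

The application code never changed when the nodes went down, which is the point: failure handling lives in the database layer rather than in every application.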

4) Flexibility

What’s the use of a database if it’s not flexible? While the data modeling issues are completely different in NoSQL, there is a large amount of flexibility in how data is stored for performance.

Databases modeled like Bigtable and Cassandra provide flexibility around how data is stored on disk. It’s possible to create derived column families. In plain English: you can design your database to duplicate frequently accessed data for rapid query response. This is, of course, based on the assumption that writes and storage space are cheap.
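As a rough illustration of a derived column family, the sketch below uses plain Python dictionaries as stand-ins for two column families. The names `orders_by_id`, `orders_by_customer`, and `save_order` are invented for the example. Each write goes to both structures, so “all orders for this customer” becomes a single key lookup instead of a scan or a join.

```python
from collections import defaultdict

# Stand-ins for two column families holding the same data, keyed differently.
orders_by_id = {}
orders_by_customer = defaultdict(dict)

def save_order(order_id, customer_id, total):
    row = {"order_id": order_id, "customer_id": customer_id, "total": total}
    orders_by_id[order_id] = row
    # Duplicate the row under the customer key: extra writes and storage,
    # paid once, in exchange for cheap reads later.
    orders_by_customer[customer_id][order_id] = dict(row)

save_order("o-1", "c-1", 25.0)
save_order("o-2", "c-2", 10.0)
save_order("o-3", "c-1", 40.0)

# One lookup answers "what has customer c-1 ordered?"
print(sorted(orders_by_customer["c-1"]))  # ['o-1', 'o-3']
```

The trade-off is that the application now owns keeping the copies in sync, which is why this only makes sense when writes and storage really are cheap.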

Databases based on the Bigtable model also have another benefit – outside of key structure it’s possible to store a variety of disparate data in the same table. Structure is largely irrelevant. Relational databases have adopted features to solve similar problems (such as sparse columns in SQL Server), but they carry overhead. Storing wildly different columns in multiple rows of the same column family is so cheap as to be invisible in a NoSQL database.
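A loose way to picture this is a dictionary of dictionaries: each row carries only the columns it actually has. The sketch below is just that mental model in Python, with made-up row keys and column names; it says nothing about how a real column-family store lays data out on disk.

```python
# Each row stores only the columns it uses; absent columns cost nothing.
column_family = {
    "user:1": {"name": "Ada", "email": "ada@example.com"},
    "user:2": {"name": "Lin", "twitter": "@lin", "last_login": "2012-08-01"},
    "device:9": {"model": "X100", "firmware": "1.4"},
}

def get_column(cf, row_key, column, default=None):
    # A missing column is simply not there; no NULL placeholder is stored.
    return cf.get(row_key, {}).get(column, default)

def columns_used(cf):
    return sorted({column for row in cf.values() for column in row})

print(get_column(column_family, "user:1", "twitter"))  # None
print(columns_used(column_family))
```

In a relational table, the same data would need either seven mostly-empty columns or several separate tables.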

Lastly, key-value stores provide an incredible level of flexibility. Data is arbitrarily stored as a value. Key-value databases make it possible to store images, Word documents, strings, integers, and serialized objects within the same database. This requires more responsibility and creative thinking on the part of application developers and architects, but it also lets the people designing the system build a completely custom solution that fills their needs.
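Here is a small sketch of what that extra responsibility looks like in practice. The store only sees opaque bytes, so the application decides how to serialize values and how to remember what each key holds. The in-memory `store` dictionary, the helper functions, and the key naming scheme are all assumptions for illustration, not any product's client API.

```python
import json

store = {}  # stand-in for a key-value database: str key -> bytes value

def put_blob(key, data):
    store[key] = bytes(data)

def get_blob(key):
    return store[key]

def put_json(key, obj):
    # The database does not know this is JSON; only the application does.
    store[key] = json.dumps(obj).encode("utf-8")

def get_json(key):
    return json.loads(store[key].decode("utf-8"))

# Very different kinds of data living side by side.
put_blob("image:logo", b"\x89PNG\r\n\x1a\n")            # raw binary header bytes
put_json("user:42", {"name": "Ada", "logins": 3})       # serialized object
put_json("counter:visits", 1024)                        # a plain integer

print(get_json("user:42")["name"])   # Ada
print(get_json("counter:visits"))    # 1024
```

Using key prefixes such as `image:` and `user:` is one simple convention for tracking what a value contains, since the store itself will not tell you.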

5) Rapid Development

Let’s face facts: everyone wants their application to be faster, have more features, and they want it yesterday. NoSQL databases make it easy to change how data is stored or change the queries you’re running. Massive changes to data can be accomplished with simple refactoring and batch processing rather than complex migration scripts and outages. It’s even easier to take nodes in a cluster offline for changes and add them back into the cluster as the new master server; replication features will take care of syncing up data and propagating the new data design out to the other servers in the cluster.
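One way the “refactoring and batch processing” approach can look is sketched below: documents carry a version marker, the application upgrades old documents when it sees them, and a batch job sweeps through the rest in the background. The `schema_version` field, the `upgrade` function, and the name-splitting change are hypothetical examples, not a feature of any specific database.

```python
def upgrade(doc):
    """Return the document in the version 2 shape (split 'name' in two)."""
    if doc.get("schema_version", 1) >= 2:
        return doc
    first, _, last = doc.get("name", "").partition(" ")
    rest = {k: v for k, v in doc.items() if k != "name"}
    return {**rest, "first_name": first, "last_name": last, "schema_version": 2}

def batch_upgrade(collection):
    """Rewrite old documents in place; return how many were changed."""
    changed = 0
    for key, doc in list(collection.items()):
        new_doc = upgrade(doc)
        if new_doc is not doc:
            collection[key] = new_doc
            changed += 1
    return changed

# Stand-in for a document collection holding a mix of old and new shapes.
users = {
    "u1": {"name": "Ada Lovelace"},
    "u2": {"first_name": "Lin", "last_name": "Chen", "schema_version": 2},
}
print(batch_upgrade(users))          # 1
print(users["u1"]["last_name"])      # Lovelace
```

Because old and new documents can coexist, the application keeps running while the batch job works, so there is no migration outage.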

Comments

9 Comments so far. Comments are closed.
  1. I have to take issue with several statements.

    First, “NoSQL” is not a product, and is much fuzzier than even “RDBMS”.
    So while individual statements in your post may apply to specific technologies, NONE of them apply to all NoSQL technologies.

    Secondly, writing this way is misleading. By your post, if I’ve got to scale, I *must* go NoSQL.

    Number 1, there are tests showing MySQL scaling better than MongoDB.
    Number 2, one of the world’s largest databases is in fact an RDBMS (LHC producing 2 Gb of data a SECOND runs Oracle RAC)
    Number 3, “Rapid Development”. Errr… I would *definitely* say that getting SQL Server installed and creating my first DB would be much faster than with Cassandra or Hadoop.
    Detailed customisation, yes, definitely easier. But not for garden / vanilla instances.

    IMHO NoSQL is for edge cases, places where the traditional tools break down.
    Those might be
    A) where the hard structure doesn’t work (document storage perhaps, text analytics almost certainly)
    B) The volume of the data is excessive, and consistency across nodes is not required to be instantaneous. These 2 points can’t be unlinked!!! Banking can NOT use NoSQL, and scientific experiments can’t (and won’t; XLDB is most likely where this will go, custom LHC implementations aside). Facebook can, because it doesn’t matter if your friend across the world sees a 5 minute old status that you’ve updated since.
    C) Other

    To answer your questions

    Does it replace the RDBMS? Work alongside it? Do something else?
    No, maybe, and Yes

    • I agree with you about the NoSQL term being remarkably vague. You’re right, my reasons to use some kind of NoSQL database were chosen based on the pluses of a variety of systems. I firmly believe that the idea of ‘polyglot persistence’ is going to be important in the future.

      There are tests that show MySQL on a single node scaling better than MongoDB. There are also tests showing other NoSQL databases scaling better than MySQL when you involve multiple nodes. The MongoDB test was acknowledged to be problematic because the MongoDB driver has not seen nearly the level of optimization of the MySQL driver. When adjusted for more CPUs, MongoDB outperformed MySQL.

      The LHC also uses MongoDB for data collection.

      You’re right, Hadoop and Cassandra for rapid development wouldn’t be good. But MongoDB and Riak answer the rapid development question very easily – just change the model in code and keep on going.

      Like you said, these products all serve different purposes. I wouldn’t want to replace an RDBMS with Cassandra (I might replace an analytical database with it, though). If my primary need is atomic commits/reads, I might replace an RDBMS with MongoDB.

      I think that RDBMSes are a great tool for the bulk of all use cases. I wouldn’t suggest completely abandoning the RDBMS (not yet, at least). But I also think the RDBMS is overkill for a lot of scenarios and poorly suited for others. It’s all about choosing your tools well.

  2. SDC,

    ‘Structure is largely irrelevant. ‘

    Ay chihuahua. This is a selling point?

    I am pro new, shiny things, and I definitely believe NoSQL (and we really need to come up w/ a name that doesn’t define this as something it’s not – it kind of minimizes its value) has its place (Google, Amazon, LinkedIn, etc can’t all be wrong).

    Still, this article seems like it could really use some grains of salt. As Ted Dziuba once said (he’s got a potty mouth, but the dude does speak from hard-earned experience, including working at Google) ‘You are not Google’.

    Actually he had a pretty funny alternative viewpoint on this, “I Can’t Wait for NoSQL to Die”. I also did a much less critical, somewhat more technical review of these things here. Dang. That’s old. I need to update that thing…

  3. Mark Shay,

    You make a strong case for NoSQL but I think most are still a bit scared to go down that road.

    • Yeah, I completely agree with Mark Shay: most are still a bit scared to go down that road. And one more question: is it really worthwhile for them to go off the beaten track?

  4. EarlyDoors,

    In your FLEXIBILITY paragraph you state:
    This requires more responsibility and creative thinking on the part of application developers and architects …..

    As an application developer from the RDBMS mainstream can you point me in the direction of sources discussing the kind of approach I need to be thinking about when coding applications for NoSQL cloud databases ?

  5. I just noticed your article, on five Reasons to Use NOSQL. Your first reason is analytics. Could you explain to me how one could achieve the following two queries in NOSQL:

    1. Give me a breakdown of Sales Revenue Amount and Sales Volume by customer by product by week.

    2. Give me Total Premiums Paid Amount and Total Profitability Amount for Households by Policy Type, and order the result by the Org Unit that sold the Product and the Org Unit that owns the Product, and roll it up from Product Type to Product Subgroup to Product Group.

    Each of these can be done with SQL, although #2 is a data design and query challenge. Yet, I am having a challenge to understand how these queries can be done in any form of NOSQL, especially key/value or column/family data stores.

    Thank you.

    • These are huge questions and are the subject of a lot of separate blog posts. Other people have done this topic much more justice than I am willing and able to in a comment on a blog. However, you can look at videos from the Cassandra Summit 2012 to get an idea of how people are solving similar problems today.

      The underlying answer, though, is that your ability to solve these problems in a non-relational database is largely driven by your ability to design the appropriate data structures. Once you think about optimizing your data model for read performance instead of write performance you’ll find that many of these problems become pretty easy to solve.
