A Technical Plan for 2011

The Last Ten Years

My career has been particularly interesting. I’ve been very fortunate to work with a variety of different languages, platforms, databases, frameworks, and people. I started off working with Perl on HP-UX. As I started automating more of my job, I added ASP.NET to the mix. Eventually I learned about databases – first Oracle, then SQL Server, then PostgreSQL, and finally back to SQL Server. Along the way I’ve held a variety of different job titles – system administrator, system engineer, developer, consultant, architect, and database administrator. I’ve worked with a lot of different systems, architectures, and design philosophies. The one thing that’s stuck with me is that there is no one-size-fits-all answer. That extends beyond languages and design patterns – it goes right down to the way we store data. One of the most interesting things going on right now is that it’s easier than ever to pick the right tools for the job.

The Next Twelve Months

Over the next twelve months, I’m going to be digging into hybrid database solutions. Some people call it polyglot persistence. You can call it what you like, but the fact remains that it is no longer necessary to store all of our data in a relational database. Frankly, I’m encouraging people to look into ways to store their data outside of a relational database – not because RDBMSes are bad or wrong, but because there is a lot of data that doesn’t need to be in one: session data, cached data, and flexible data.

Why Focus on Hybrid Data?

The idea behind hybrid data is that we use multiple databases instead of a single database. Let’s say that we have an online store where we sell musical equipment. We want to store customer data in a relational database, but where should we store the rest of our information? Conventional thinking says that we should keep storing our data in a relational database. Sessions might be stored in memory somewhere and shopping carts might get stored in the database, but they’ll end up on faster, more expensive solid state drives.

There are other ways to store data. Let’s think about this for a minute. Why do we force our data into existing paradigms? Why aren’t we thinking about new and interesting ways to store it? Sessions are a great place to start. Sure, we could use something like memcached, but why not examine Redis or AppFabric Cache? Both of these databases support structured data types, both allow data to be persisted to disk if needed, and both allow data to be expired over time. This is perfect for working with any kind of cached data – it stays in the format our applications need, and we can expire it or save it as necessary. The flexibility to store our data the way that applications use it is important. Session data should be rapidly accessible. Other applications don’t need to read it, and it doesn’t need to be reportable. It merely needs to be fast.

Shopping carts are different. Amazon’s own use cases and needs drove the development of Dynamo to be a durable, eventually consistent, distributed key-value store. Shopping carts are write-heavy: over their short lives they are almost entirely written to and only read a few times. It’s rare that users need to view everything in a shopping cart, but when they do, any delay or slowdown means there’s a chance the user will simply abandon the cart. Dynamo fills these requirements quite well. Since Dynamo is only available inside Amazon, how are we supposed to work with it ourselves? Riak is an open source database modeled on Dynamo, and it meets our needs for a shopping cart: it’s a key/value database, it’s fault tolerant, and it’s fast.

Why not store a shopping cart in a relational database? It is, after all, a pretty simple collection of a user identifier, an item number, an item description, a price, and a quantity. But shopping carts are highly transient. Once an order has been placed, the shopping cart is cleared out and the data in the cart moves into the ordering system. Most shopping carts will be active for a very short period of time – a matter of minutes at most. Instead of building complex sharding mechanisms to spread load out across a number of database servers, why not use a database designed to distribute that load across a number of servers?
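
To make the session idea concrete, here’s a minimal sketch of storing an expiring session in Redis using the redis-py client. The host, key name, and thirty minute timeout are assumptions for illustration, not recommendations.

    import json

    import redis  # assumes the redis-py client is installed

    # Connect to a local Redis instance; host and port are illustrative defaults
    r = redis.Redis(host="localhost", port=6379)

    session_id = "session:42"  # hypothetical session key
    session = {"user_id": 1001, "last_page": "/guitars/stratocaster"}

    # Store the session as JSON and let Redis expire it after 30 minutes
    r.set(session_id, json.dumps(session), ex=1800)

    # Later, read it back; a None result means the session has expired
    raw = r.get(session_id)
    current_session = json.loads(raw) if raw else None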

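The shopping cart idea looks similar in Riak. This sketch assumes Basho’s Python client and a local Riak node; the bucket name, key, and item fields are made up for the example. The point is that the entire cart lives under one key, so every add-to-cart is a single write and checkout is a single read.

    import riak  # assumes Basho's Python client for Riak is installed

    # Connect to a local Riak node; connection details are illustrative defaults
    client = riak.RiakClient()
    carts = client.bucket("carts")

    cart = {
        "user_id": 1001,
        "items": [
            {"sku": "GTR-0042", "description": "Sunburst Stratocaster", "price": 1199.00, "quantity": 1},
            {"sku": "AMP-0007", "description": "15 watt practice amp", "price": 149.00, "quantity": 1},
        ],
    }

    # Every add-to-cart is a single write keyed by the user
    carts.new("cart:1001", data=cart).store()

    # The rare read: fetch the whole cart by key at checkout time
    fetched = carts.get("cart:1001")
    print(fetched.data)
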
Where Does This Fit Into the Enterprise?

Enterprises should be adopting these technologies as fast as they can – not because they replace the relational database, but because they free the relational database from the things it’s bad at and leave it to do the things it excels at. Relational databases are great for core business logic – they have a lot of baked-in functionality like data integrity and validation. As we’ve already discussed, relational databases are not well suited to storing highly volatile data. By moving volatile data into better-suited databases, enterprises can increase the capacity of their database systems, provide redundancy, and increase scalability using off-the-shelf solutions. The trick, of course, lies in integrating them. And that is what I’m going to be playing around with this year.