Category syndication

What You’re Missing About Stateless Computing

Once, long ago, men carved their knowledge into the walls of caves so that it would be available for all time. Unfortunately, their knowledge was tied to one place. Eventually, a forward-thinking cave dweller thought about carving his knowledge into a clay tablet. He was savagely beaten to death and his family burned as witches. Eventually the other cave dwellers realized that it was probably a good idea to have a more portable way to store their knowledge and they too adopted this portable clay-based knowledge transfer system.

Fast forward to the tail end of 2010 and people are saying that Microsoft has got it wrong with the Azure VM role. People are already lambasting it as a laughable concept that’s needlessly complex to patch. I can’t argue with that, it is complex to patch, but there’s a reason for that complexity and it’s called stateless computing.

It’s like the switch from procedural/object-oriented programming to functional programming. When you first switch, you get pissed off that you can’t reassign variables and that functions can’t have side effects. You get used to that pretty quickly and start doing crazy things with tail recursion and other functional paradigms that ultimately save you memory, remove debugging headaches, and give you an incredible amount of computing stability.

With the last paragraph in mind, let’s look at Azure VMs again – we can’t patch the VM directly. It’s stateless. What does a stateless VM buy you?

  • It’s easy to spin up additional, identical VMs. There’s no worrying if some master image is the same: it is.
  • It’s easy to back out incompatible patches – just remove the differencing VHD.
  • There are no side effects because of errant garbage living on the C: drive.
  • Security – if a virus infects your VM, just reboot.
  • Complex, time consuming patches can be applied once and quickly moved into place.
  • There’s a load balancer in front of every Azure instance. Operations must be idempotent, even when executed against different instances.

Managing state isn’t a component of your operating system in Azure, it’s a component of the storage tier. New paradigms require new ways of thinking. Sometimes a new way of thinking seems broken, wrong, or foolish.
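To make the idempotency point concrete, here is a minimal sketch of two stateless instances sharing one storage tier. Everything in it is hypothetical: `StorageTier`, `StatelessInstance`, and the request ids are illustration only, not Azure APIs. The idea is simply that a retried request can land on a different instance behind the load balancer and still be applied exactly once, because the record of what has been applied lives in storage rather than on the VM.

```python
# Hypothetical sketch: none of these names are Azure APIs.
class StorageTier:
    """Stand-in for durable storage shared by every instance."""

    def __init__(self):
        self.balances = {}
        self.applied_requests = set()


class StatelessInstance:
    """Holds no data of its own; everything lives in the storage tier."""

    def __init__(self, storage):
        self.storage = storage

    def deposit(self, request_id, account, amount):
        # A retried request may arrive at a different instance, so the
        # "already applied" check has to live in shared storage too.
        if request_id in self.storage.applied_requests:
            return self.storage.balances[account]
        new_balance = self.storage.balances.get(account, 0) + amount
        self.storage.balances[account] = new_balance
        self.storage.applied_requests.add(request_id)
        return new_balance


storage = StorageTier()
instance_a = StatelessInstance(storage)
instance_b = StatelessInstance(storage)

first = instance_a.deposit("req-1", "alice", 50)
retry = instance_b.deposit("req-1", "alice", 50)  # same request, other instance
print(first, retry)
```

Either instance could be rebooted or replaced between the two calls and the result would be the same, which is the whole appeal.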

If you’re looking to customize your Azure deployment stack without sacrificing the flexibility of using Azure, then Azure VM roles are for you.

If you’re looking for a replacement for your current VMware installation, Microsoft’s Azure VM roles aren’t for you. But while you’re fiddling around with VM settings, I’m going to be playing Scrabble.

Default Values, Triggers, and You

A friend of mine sent me an email the other day asking about default values in SQL Server. I realized that I’ve had to think about this a few times over the years and I’ve been asked about it more than once, too.

Setup

We need a table first, right? We’ll also want a few sample rows in there.

CREATE TABLE Employees (
  emp_id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
  emp_name varchar(50) NULL);
GO

INSERT INTO Employees (emp_name) VALUES ('a');
INSERT INTO Employees (emp_name) VALUES ('b');
GO

SELECT * FROM Employees;

/*
emp_id      emp_name
----------- --------------------------------------------------
1           a
2           b
*/

How Defaults Work

So far we just have two rows in our two column table. It’s pretty boring. Let’s add a default value:

ALTER TABLE Employees
ADD last_modified DATETIME NULL CONSTRAINT Employees_last_modified
DEFAULT CURRENT_TIMESTAMP;
GO

We might as well add some new rows while we’re having fun with our employees, right?

INSERT INTO Employees (emp_name) VALUES ('c');
INSERT INTO Employees (emp_name) VALUES ('d');

What’s it look like now?

SELECT * FROM Employees;

/*
emp_id      emp_name                                           last_modified
----------- -------------------------------------------------- -----------------------
1           a                                                  NULL
2           b                                                  NULL
3           c                                                  2010-12-06 18:21:37.787
4           d                                                  2010-12-06 18:21:37.787
*/

Hold up. Employees 1 and 2 don’t have a last_modified value. Why not? Well, that’s because we’ve told SQL Server that our last_modified column can allow NULLs. They’re allowable in our table. If we wanted to automatically provide a default value when we added the constraint, we could do so by specifying the datatype as DATETIME NOT NULL. A best practice would be to add the column as a NULLable data type, add a value for all NULL rows, and set the column to NOT NULL.

If we do want to update a NULLable column and set it to the default value, we just issue an update using the DEFAULT keyword for the value. If that makes no sense, perhaps this example will help:

UPDATE Employees
SET last_modified = DEFAULT
WHERE last_modified IS NULL ;

What About Updates?

UPDATE Employees
SET emp_name = 'zzz'
WHERE emp_id = 3;

SELECT *
FROM Employees
WHERE emp_id = 3;

/*
emp_id      emp_name                                           last_modified
----------- -------------------------------------------------- -----------------------
3           zzz                                                2010-12-06 18:21:37.787
*/

As you can see, when we update employee 3, it doesn’t change the value of last_modified. That’s because the default value is only set on insert. We could specify DEFAULT in our UPDATE statement, but then we’d need to specify that every time we update the table. What can we do?

The Answer is Triggers

That’s right: triggers. If we want to track the modification timestamp of an object in the database, we need to use a trigger to keep things updated:

CREATE TRIGGER TR_Employees$AfterUpdate ON dbo.Employees
AFTER UPDATE
AS
BEGIN
  UPDATE  e
  SET     e.last_modified = CURRENT_TIMESTAMP
  FROM    dbo.Employees e
          JOIN inserted i ON e.emp_id = i.emp_id;
END
GO

UPDATE Employees
SET emp_name = 'asdf'
WHERE emp_id = 1;

SELECT * FROM Employees;

/*
emp_id      emp_name                                           last_modified
----------- -------------------------------------------------- -----------------------
1           asdf                                               2010-12-06 18:34:04.340
2           b                                                  NULL
3           zzz                                                2010-12-06 18:21:37.787
4           d                                                  2010-12-06 18:21:37.787
*/

And that, my friends, is how we keep a modification timestamp up to date.
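If you don’t have a SQL Server instance handy, the same behavior can be reproduced in a rough sketch using Python’s built-in sqlite3 module. This is an assumption-laden stand-in, not a port: SQLite’s trigger syntax differs from T-SQL (it uses NEW instead of the inserted table), and the table and trigger names here are made up for the example. It shows the two points from this post: the default only fires on insert, and a trigger is what keeps the timestamp current on update.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; trigger syntax differs from T-SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    emp_id        INTEGER PRIMARY KEY,
    emp_name      TEXT,
    last_modified TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Supply an old timestamp explicitly so any later change is easy to spot.
INSERT INTO employees (emp_name, last_modified) VALUES ('a', '2000-01-01 00:00:00');
-- Omit the column so the default fires.
INSERT INTO employees (emp_name) VALUES ('b');
""")


def last_modified(emp_id):
    """Return the last_modified value for one employee."""
    row = conn.execute(
        "SELECT last_modified FROM employees WHERE emp_id = ?", (emp_id,)
    ).fetchone()
    return row[0]


default_on_insert = last_modified(2)

# Without a trigger, an UPDATE leaves the timestamp alone.
conn.execute("UPDATE employees SET emp_name = 'zzz' WHERE emp_id = 1")
before_trigger = last_modified(1)

# Fire only on emp_name changes so the trigger's own UPDATE cannot re-fire it.
conn.executescript("""
CREATE TRIGGER employees_after_update AFTER UPDATE OF emp_name ON employees
BEGIN
    UPDATE employees SET last_modified = CURRENT_TIMESTAMP
    WHERE emp_id = NEW.emp_id;
END;
""")

conn.execute("UPDATE employees SET emp_name = 'asdf' WHERE emp_id = 1")
after_trigger = last_modified(1)

print(default_on_insert is not None, before_trigger, after_trigger != before_trigger)
```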

New Uses for NoSQL

We all know that you can use NoSQL databases to store data. And that’s cool, right? After all, NoSQL databases can be massively distributed, are redundant, and really, really fast. But some of the things that make NoSQL databases really interesting aren’t just the redundancy, performance, or their ability to use all of those old servers in the closet. Under the covers, NoSQL databases are supported by complex code that makes these features possible, things like distributed file systems.

What’s a Brackup?

Brackup is a backup tool. There are a lot of backup tools on the market, so what makes this one special?

First, it’s free.

Second, it’s open source, which means it’s always going to be free.

Third, it can chunk your files – files will be crammed into chunks for faster access and distributed across your backup servers. Did you know that opening a filehandle is one of the single most expensive things you can ever do in programming?

Fourth, it supports different backends.

It Can Backup to Riak

I’ve mentioned Riak a few times around here. Quick summary: Riak is a distributed key-value database.

So?

So, this means that when you take a backup, Brackup is going to split your data into different chunks. These chunks are going to be sent to the backup location. In this case, the backup location is going to be your Riak cluster. As Brackup goes along and does its work, it sends the chunks off to Riak.

Unlike sending your data to an FTP server or Amazon S3, it’s going to get magically replicated in the background by Riak. If you lose a backup server, it’s not a big deal because Riak will have replicated that data across multiple servers in the cluster. Backing up your backups just got a lot easier.
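The chunk-and-store idea can be sketched in a few lines. This is not Brackup’s code and not a Riak client: a plain dict stands in for the key-value cluster, and keying each chunk by its SHA-256 digest is my assumption for the example, as are the `backup` and `restore` names and the deliberately tiny chunk size.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose; a real tool would use a far larger size


def backup(data, store, chunk_size=CHUNK_SIZE):
    """Split data into chunks, store each under its digest, return the manifest."""
    manifest = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        key = hashlib.sha256(chunk).hexdigest()
        store[key] = chunk  # identical chunks land on the same key
        manifest.append(key)
    return manifest


def restore(manifest, store):
    """Reassemble the original bytes by fetching chunks in manifest order."""
    return b"".join(store[key] for key in manifest)


store = {}  # a dict standing in for the key-value cluster
original = b"abcdabcdabcdxyz"
manifest = backup(original, store)
restored = restore(manifest, store)
print(len(manifest), len(store), restored == original)
```

The manifest lists four chunks but the store only holds two, because the repeated chunk is stored once. With a replicated key-value store behind it, each of those keys would also be copied to several servers without the backup tool doing anything extra.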

Why Is the NoSQL Part Important?

NoSQL can be used for different things. It’s not just a potential replacement for an RDBMS (and the beginning of another nerd holy war). Depending on the data store and your purpose, you can use a NoSQL database for a lot of different things, most notably as a distributed file system. This saves time and money since you don’t have to buy a special purpose product; you can use what’s already there.

Comparing MongoDB and SQL Server Replication

MongoDB has replication built in. So do SQL Server, Oracle, DB2, PostgreSQL, and MySQL. What’s the difference? What makes MongoDB a unique and special snowflake?

I recently read a three-part series on MongoDB replication (Replication Internals, Getting to Know Your Oplog, Bending the Oplog to Your Will) in an effort to better understand MongoDB’s replication compared to SQL Server’s replication.

Logging Sidebar

Before we get started, it’s important to distinguish between the oplog and MongoDB’s regular log. By default, MongoDB pipes its log to STDOUT… unless you supply the --logpath command line flag. Logging to STDOUT is fine for development, but you’ll want to make sure you log to a file for production use. The MongoDB log file is not like SQL Server’s log. It isn’t used for recovery playback. It’s an activity log. Sort of like the logs for your web server.

What’s The Same?

Both MongoDB and SQL Server store replicated data in a central repository. SQL Server stores transactions to be replicated in the distribution database. MongoDB stores replicated writes in the oplog collection. The most immediate difference between the two mechanisms is that SQL Server uses the transaction as the demarcation point while MongoDB uses the individual command as the demarcation point.

All of our transactions (MongoDB has transactions… they’re just only applied to a single command) are logged. That log is used to ship commands over to a subscriber. Both SQL Server and MongoDB support having multiple subscribers to a single database. In MongoDB, this is referred to as a replica set: every member of the set will receive all of the commands from the master. MongoDB also adds a feature of its own: any member of a replica set may be promoted to the master server if the original master server dies. This can be configured to happen automatically.

The Ouroboros

The Ouroboros is a mythical creature that devours its own tail. Like the Ouroboros, the MongoDB oplog devours its own tail. In ideal circumstances, this isn’t a problem. The oplog will happily write away. The replica servers will happily read away and, in general, keep up with the writing to the oplog.

The oplog file is a fixed size so, like the write ahead log in most RDBMSes, it will begin to eat itself again. This is fine… most of the time.

Unfortunately, if the replicas fall far enough behind, the oplog will overwrite the transactions that the replicas are reading. Yes, you read that correctly – your database will overwrite undistributed transactions. DBAs will most likely recoil in horror. Why is this bad? Well, under extreme circumstances you may have no integrity.

Let’s repeat that, just in case you missed it the first time:

There is no guarantee of replica integrity.
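The tail-eating behavior is easier to see in a toy model. The sketch below is not MongoDB code: `CappedOplog`, its sequence numbers, and `read_after` are all hypothetical names for a fixed-size log built on Python’s `collections.deque`. A replica that asks for everything after the last entry it applied either gets the missing entries or finds out that some of them have already been overwritten.

```python
from collections import deque


class CappedOplog:
    """Toy fixed-size log; not MongoDB's implementation."""

    def __init__(self, capacity):
        self.entries = deque(maxlen=capacity)  # oldest entries fall off the front
        self.next_seq = 1

    def append(self, op):
        self.entries.append((self.next_seq, op))
        self.next_seq += 1

    def read_after(self, last_applied):
        """Return ops newer than last_applied, or None if some were overwritten."""
        if self.entries and self.entries[0][0] > last_applied + 1:
            return None  # the replica fell off the end of the log
        return [op for seq, op in self.entries if seq > last_applied]


oplog = CappedOplog(capacity=3)
for i in range(1, 4):
    oplog.append(f"write-{i}")
caught_up = oplog.read_after(1)  # replica has applied entry 1 and is close behind

for i in range(4, 8):
    oplog.append(f"write-{i}")  # the master keeps writing while the replica stalls
too_far_behind = oplog.read_after(1)
print(caught_up, too_far_behind)
```

The first read returns the two writes the replica is missing. The second returns None: writes 2 through 4 are gone, and the only way forward for that replica is a full resync.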

Now, before you put on your angry pants and look at SQL Server Books Online to prove me wrong, this is also entirely possible with transactional replication in SQL Server. It’s a little bit different, but the principle still applies. When you set up transactional replication in SQL Server, you also need to set up a retention period. If your replication is down for longer than X hours, SQL Server is going to tell you to cram it up your backside and rebuild your replication from scratch.

Falling Behind

Falling behind is easy to do when a server is under heavy load. But, since MongoDB avoids writing to disk to increase performance, that’s not a problem, right?

Theoretically yes. In reality that’s not always the case.

When servers are under a heavy load, a lot of weird things can happen. Under heavy network traffic, TCP/IP processing that the network card would normally offload can end up back on the CPU. When you’re using commodity hardware with commodity storage, you might be using software RAID instead of hardware RAID to simulate one giant drive for data. Software RAID can be computationally expensive, especially if you encounter a situation where you start swapping to disk. Before you know it, you have a perfect storm of one-off factors that have brought your shiny new server to its knees.

In the process, your oplog is happily writing away. The replica is falling further behind because you’re reading from your replica and writing to the master (that’s what we’re supposed to do, after all). Soon enough, your replicas are out of sync and you’ve lost data.

Falling Off a Cliff

Unfortunately, in this scenario, you might have problems recovering because the full resync also uses a circular oplog to determine where to start up replication again. The only way you could resolve this nightmare storm would be to shut down your forward facing application, kill incoming requests, and bring the database back online slowly and carefully.

Stopping I/O from incoming writes will make it easy for the replicas to catch up to the master and perform any shard reallocation that you need to split the load up more effectively.

Climbing Gear, Please

I’ve bitched a lot in this article about MongoDB’s replication. As a former DBA, I find it a scary model. But I’ve bitched a lot in the past about SQL Server’s transactional replication too (logs can grow out of control if a subscriber falls behind or dies), and that happens with good reason. The SQL Server dev team made the assumption that a replica should be consistent with the master. In order to keep a replica consistent, all of the undistributed commands need to be kept somewhere (in a log file) until all of the subscribers/replicas can be brought up to speed. This does result in a massive hit to your disk usage, but it also keeps your replicated databases in sync with the master.

Just like with MongoDB, there are times when a SQL Server subscriber may fall so far behind that you need to rebuild the replication. This is never an easy choice, no matter which platform you’re using, and it’s a decision that should not be taken lightly. MongoDB makes this choice a bit easier because MongoDB might very well eat its own oplog. Once that happens, you have no choice but to rebuild replication.

Replication is hard to administer and hard to get right. Be careful and proceed with caution, no matter what your platform.

At Least There is a Ladder

You can climb out of this hole and, realistically, it’s not that bad of a hole. In specific circumstances you may end up in a situation where you will have to take the front end application offline in order to resync your replicas. It’s not the best option, but at least there is a solution.

Every feature has a trade-off. Relational databases trade performance and disk space for integrity (in this case), whereas MongoDB buys immediate performance at the cost of potential maintenance and recovery problems.

What I’m Reading 2010-11-05

A Bit of Troubleshooting

A client recently asked me for help with their SQL Server environment. It seems that replication was running slowly and was getting further and further behind – replication had been turned off during heavy data modification and was turned on after several days.

Protip: This is why it’s important to have a full checklist for everything that you do on a server.

Check Everyone’s Health

When you have a complicated system you want to take a look at everything, not just the symptoms of the problem. This happens in medicine, economics, and manufacturing. Why shouldn’t we do it in the datacenter?

The very first thing I did was take a look at the health of the publication server. That server was running well within normal parameters: there were no readily apparent disk I/O, memory, or CPU problems. Since the distributor lives on the publication server, that was covered as well.

On a lark, I checked all of the other subscribers. They were also functioning normally. I did this to make sure that we weren’t seeing glaring performance problems on one subscriber that were really a symptom of a problem with the replication setup.

Everything was healthy… except one subscription.

The Problem Child

Having ruled out an unknown problem on the other servers, I took a look at the rest of the issues on the problem server. I found a few underlying issues and was quickly able to figure out that the poorly performing replication was only a symptom of the problem.

Oh I/O

When I started digging deeper and looked at the wait stats and I/O activity, I was in for a huge shock: there were queries that had been running for close to a day!

Digging deeper, there were two queries that were causing major performance problems. The first was a daily bulk data load. It read from the replicated tables, so if there was going to be heavy contention on those tables, this might be part of the problem. Luckily, the bulk load had been re-written long ago to use small batches so that the transaction log didn’t grow out of control. Rampant transaction log growth had been a huge problem when the server had tiny log drives – the longer running jobs were re-written using a WHILE loop to read blocks of data and produce smaller, explicit transactions. This design also makes it possible to stop and restart the job whenever you want.
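The batching pattern itself is simple enough to sketch. The client’s job was a T-SQL WHILE loop that I can’t reproduce here, so this is a stand-in written against Python’s built-in sqlite3 module with made-up `source` and `target` tables and a hypothetical `copy_in_batches` function. Each pass copies a small block of rows in its own short transaction and works out where to resume from what has already been committed, which is what makes the job safe to kill and restart.

```python
import sqlite3


def copy_in_batches(conn, batch_size, max_batches=None):
    """Copy source rows to target in small transactions; return batches run."""
    batches = 0
    while max_batches is None or batches < max_batches:
        # Resume from the highest id already copied, so a restart loses nothing.
        last_id = conn.execute(
            "SELECT COALESCE(MAX(id), 0) FROM target"
        ).fetchone()[0]
        with conn:  # one short transaction per batch, committed on exit
            cur = conn.execute(
                "INSERT INTO target (id, payload) "
                "SELECT id, payload FROM source WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, batch_size),
            )
        if cur.rowcount == 0:
            break  # nothing left to copy
        batches += 1
    return batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO source (id, payload) VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 11)],
)
conn.commit()

copy_in_batches(conn, batch_size=3, max_batches=2)  # "killed" after two batches
partial = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
copy_in_batches(conn, batch_size=3)  # restart picks up where it left off
total = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(partial, total)
```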

I immediately killed the bulk load job and looked into the second query. This was the nightly index maintenance script. It had been happily chugging away for over 24 hours and was chewing through more disk than I thought was possible (probably because I was never awake at two in the morning to watch the job run). Figuring that bad indexes were a better option than thrashing disks, I killed the index defragment query and moved on to the next problem.

My Memory’s Not What It Used To Be

Turns out that the server was running low on memory. This server has two purposes – it’s both an ad hoc reporting server and runs regular reports. As a result, SQL Server Reporting Services was installed and the SQL Server had been configured with a max memory setting of 4GB out of the 8GB available. I dug deeper into the memory and I discovered that over half of SQL Server’s memory structure was being used to manage locking. The rest was going to plan cache and a few other internal structures, but at no point was memory being used as a cache for data. The server’s page life expectancy was effectively 0 – every read was going to disk.

My immediate recommendation was to double the RAM in the server and increase SQL Server’s max memory setting from 4GB to 12GB. As a longer term recommendation, I cautioned my client that they should invest in a new server since this reporting server was 4 years old and well past its expected lifespan.

Back to the I/O Again

As I was wrapping up, the other production servers started having I/O problems. This was right around the same time that business normally picks up for this client. On a lark I said, “Wouldn’t it be great if this was a hardware problem?”

Five minutes later we had great news: it was a hardware problem. One of the power supplies in the SAN had died. Although the SAN had four power supplies, losing a single one caused the SAN to power down the battery backed cache and perform all reads and writes straight from disk. This more than explained the strange I/O we had been seeing on the reporting server. A new power supply was immediately ordered from EMC and the problem was eventually solved.

Wrapping Up

Have a set of canned scripts ready to help you figure out what kind of performance problems you might have on your systems. I started with Glenn Berry’s diagnostic scripts and customized them over time to give me the information that I want to see. If I weren’t so lazy, I would probably make this into something that I could throw into Management Studio’s canned reports with pretty colors to tell me when there was a problem. I’ve also gotten used to scanning over the output and looking for potential problems. Learn which problems are really just symptoms of a bigger issue. It doesn’t do you any good to troubleshoot slow queries only to find out that the SAN is experiencing horrible performance issues.

T-SQL Tuesday – Why Are DBA Skills Necessary?

They aren’t. 99% of what you do could be replicated by a fairly stupid shell script.

When I started as a DBA, I didn’t have practical experience as a DBA. I had Books Online and Google.

What’s necessary as a DBA has nothing to do with your knowledge of T-SQL or SQL Server’s internal fiddly bits. That’s icing on the cake.

The skills necessary to become a DBA are things that we learn over time. These are the skills and traits that make us successful professionals, students, friends, and lovers. You need patience, inquisitiveness, and a healthy dose of skepticism. You should also be able to follow a checklist. Making the checklist is for the advanced class.

The technical skills of a DBA are the same as those of a plumber – they’re both skilled trades. There are varying degrees of success and skill. You can distinguish between the skilled and unskilled very quickly by their approach to life and learning. People who are good at their job possess the skill of learning: their practical job skills themselves are secondary to their ability to learn.

CloudDBPedia is Changing Its Name to NoSQLPedia

That’s right, we’re changing the name from CloudDBPedia to NoSQLPedia.

When we originally started the site, the focus in the technology world was on cloud computing and cloud databases. Over time, the industry has changed and people have started focusing on NoSQL as the terminology of choice for not-so-relational databases. And, let’s face it, NotSoRelationalDBPedia.com is a bit lengthy for quick and easy typing.

The name change doesn’t reflect any change in focus for the site. We’re still going to be bringing you the best in blogging and community driven technical content about emerging database technology. As time goes on, we’ll be adding more and more into the mix and I think that you’ll be happy about what we’ve got in the pipeline.

Upcoming Talks – Next Week

Next week I’ll be in the San Francisco Bay area. More specifically, I’ll be giving three lightning talks at three separate Cloud Camps. It’s the same talk each time: a general intro to NoSQL and cloud databases.

Silicon Valley Cloud Camp in Santa Clara, CA.
Cloud Camp Santa Clara also in Santa Clara, CA.
Cloud Camp SF @ QCon in San Francisco, CA.

All of these events start at 6:30PM, the Lightning Talks start at 6:45PM, and I have no idea when I’m going to take the stage, but it promises to be good. If you’re in the area and would like to hang out, hit me up in the comments and we can arrange a time to talk.

Thoughts on Free Amazon Web Services

Last week, Amazon announced that we could all get some free AWS if we signed up for a new account. Just what do you get for signing up? Take a look.

What you get with the AWS Free Usage Bundle

The First Catch

First off, this is only free for the first 12 months. After that you’re going to have to pay as you go.

In a way, this is like Microsoft’s BizSpark, but in the clouds. You get to ride along for free, for a while. After that while, you’re going to have to pony up some money to keep riding. Nobody said it was free forever.

The Second Catch

Let’s say you start out with your free AWS account and you build an application to help manage your yarn collection. You write some code and you’re happily trucking along. One day you decide to share the link with a few friends who are also really into knitting. The next thing you know, everyone is using your yarn tracking website. This is great, right? It is great… until the bill shows up.

The free AWS only lasts for 12 months. Or until you exceed your service levels. Whichever comes first. Unless you have definite plans to monetize your service, you will need to carefully monitor your application load.

What Do You Really Get?

So what are you really getting for free? We all know that nothing is free, so what makes Amazon’s deal noteworthy?

The Server

You get a virtualized server with 613MB of memory. That barely sounds like enough to power your microwave, but in the world of Linux servers, that may be more than enough for your website. Even if it isn’t enough horsepower for your production application, it’s enough to develop in a realistic environment and deploy to your first round of beta users.

The Load Balancer

Once you’ve got more than a few users, you might need to move beyond that tiny 613MB server. Or you might shard your application out. Or you could use it to help you manage your fault tolerance. There are a lot of reasons to use a load balancer.

Amazon has been thinking of you and you get 750 hours of their load balancing service free. Per month. 750 hours is 31.25 days. You can’t use all the free that you get. It’s just that free.
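If you want to check the arithmetic, here is a quick sketch using Python’s calendar module. The 750 figure is the one quoted above; the year is arbitrary and `hours_in_month` is just a helper name for the example.

```python
import calendar

FREE_HOURS_PER_MONTH = 750  # figure quoted in the post


def hours_in_month(year, month):
    """Number of hours in the given calendar month."""
    days = calendar.monthrange(year, month)[1]
    return days * 24


longest = max(hours_in_month(2010, m) for m in range(1, 13))
print(FREE_HOURS_PER_MONTH / 24, longest, FREE_HOURS_PER_MONTH >= longest)
```

The longest month has 744 hours, so the allowance covers one load balancer running around the clock with a few hours to spare. A second one running all month would not fit.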

There is a limit on the amount of free traffic that the load balancer will handle, but that should encourage you to keep your apps lean and mean, right?

Storage

Free storage! 10GB of Amazon Elastic Block Store sounds like 10GB of free something or another that I don’t know about. Reading the documentation doesn’t clear this up. Basically, you get free magical storage that you can format however you want and attach wherever you want in your Amazon virtual server farm. You can even take point in time snapshots of these storage devices so you can revert your storage to a known good state. That’s pretty cool, eh? Try getting your SAN administrator to let you do that.

More free storage! Five gigs of Amazon S3 is a lot of S3 storage. I use S3 to host large files that I don’t want to upload through WordPress – video, PDFs, and presentations. It’s a great way to add a lot of hosting to your free or cheap hosting account. Likewise, it’s a great way to add storage to your free AWS instances and remove some load from your virtual server. On the down side, it does look like you’re going to pay for the traffic that your readers consume. But, hey, you already counted on that, right?

Even more free storage! 1GB of SimpleDB storage! SimpleDB is a distributed database with a few limitations. Despite the limitations, it’s a solid platform for developing web-based/cloud applications. You can access the database from just about anywhere and it should be available as long as Amazon’s servers are up and running. And you had better believe that Amazon is going to stay up and running.

Bandwidth

Surprise! You get 30GB of free traffic a month. Well, 15GB up and 15GB down. But it’s sort of the same. I think. Maybe? The point is that you get some free bandwidth. Bandwidth can get pricey which makes any amount of free bandwidth a good thing.

Implications For Your Design

With a hard cap of 15GB on your traffic, you’ll want to make sure that you’re using images that are as compressed as possible, minified CSS and JavaScript, clean HTML source code, and lean protocols. You’ll want to be careful in development and make sure that you’re only pulling back the data that you need and nothing more, and that you make as few round trips to the server as possible. Of course, you were doing all of this before, right?
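To get a feel for what the cap means, here is a back-of-the-envelope sketch. The 15GB comes from the numbers above (read here as binary gigabytes, which is an assumption); the two page weights are hypothetical, picked only to contrast a heavy page with a lean one.

```python
OUTBOUND_CAP_BYTES = 15 * 1024**3  # the 15GB figure from the post, read as GiB


def page_views_within_cap(page_bytes, cap_bytes=OUTBOUND_CAP_BYTES):
    """Whole page views that fit under the cap at a given page weight."""
    return cap_bytes // page_bytes


heavy = page_views_within_cap(2 * 1024**2)  # hypothetical 2 MiB page
lean = page_views_within_cap(256 * 1024)    # hypothetical 256 KiB page
print(heavy, lean)
```

At those made-up weights the heavy page runs out after 7,680 views a month and the lean one lasts for 61,440, an eightfold difference from trimming the page alone.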

Putting it All Together

The free tier of AWS is a good introduction to working with AWS. It provides more than just a simple virtual machine; it provides an entire infrastructure to get you started. You can rapidly build out from a single server to a large number of servers and still use most of these free services. If you don’t need something, you’re not going to pay for it. This is a lot cheaper than buying your own hardware or trying to work within the confines of a hosting provider. This is your own virtual hardware to develop with as you please.

One more thing – Amazon may stop accepting new people into the magical free AWS tier at any moment. Act now, supplies are limited.
