Everything is horrible!
Wait, that’s not the message I want to send at all.
Planning for Failure Should Be Comprehensive
Think about the last time you thought about high availability and disaster recovery…
You’re lying, nobody ever thinks about HA and DR. Not until something is already on fire, at least.
Now, pretending you did think about HA and DR at some point in the distant past, how far down the rabbit hole did you go? Were there two servers? Did each server have redundant NICs? Power supplies? Were you using RAID? Did you think about the UPS?
Every component in the system needs to be considered when you’re looking into HA and DR. Using an AlwaysOn Availability Group, clustering, or database mirroring isn’t enough – there’s more to it.
Failure Has Consequences
Let’s use a specific example instead of talking in the abstract.
We’ll assume that you’ve decided those super fast consumer grade SSDs are the way to go. You’ve planned the rest of your deployment. You’ve got an AlwaysOn Availability Group. You’re ready to go. Right?
There’s still one more thing to talk about – power. See, most of those consumer grade SSDs don’t have any kind of power-loss protection in them: no battery, no capacitors. And, as you might know, disks lie. A drive can report a write as complete while the data is still sitting in its volatile cache. So we can’t really be sure if our writes are actually permanently stored somewhere unless we safely shut down the computer. Which always happens when the power goes out, right?
In this particular case, we need to keep worrying about power – what happens if the power fails? Is this server connected to a UPS? What happens when the UPS kicks in? Is there a backup generator? Will the server stay on? Can the server be automatically shut down? What does that look like compared to simply losing power?
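To make the "disks lie" point concrete, here is a minimal sketch of what an application can do to ask for a durable write. The function name `durable_write` is made up for illustration. Note the limits: `os.fsync` only asks the operating system to push data to the device. If the drive acknowledges the flush while the data is still in a volatile cache, and the power drops, this code can't save you. That's exactly why the power questions matter.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data and ask the OS to flush it to the storage device.

    Hypothetical helper for illustration only. It cannot detect a drive
    that acknowledges a flush it has not really completed.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # Python's buffer -> OS page cache
        os.fsync(f.fileno())  # OS page cache -> device (as far as the device admits)

    # On POSIX systems, also flush the directory entry so the file itself
    # is not lost if the machine dies right after creation.
    if os.name == "posix":
        dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Database engines do their own version of this for log writes, which is why a drive that misreports flushes is a problem no matter how careful the software is.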
Ask Awful Questions
Being prepared has everything to do with asking yourself terrible questions. Work through the entire stack and come up with as many ways for things to fail as you can. Explore how you’d prevent these scenarios. You can’t provide a mitigation for everything that you come up with, but it’s good to think of these things.
Once you’ve got your List of Awfulness, work the feasible things into your HA and DR plans. Make sure that you’re covered as best as you can. Sometimes it makes sense to sweat the small stuff.
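If it helps to see it written down, here is one rough way to keep a List of Awfulness as data. Everything in it is an assumption for illustration: the `FailureScenario` class, the `split_by_feasibility` helper, and the sample entries (loosely based on the components mentioned earlier) are not from any real tool. The idea is simply to separate the scenarios you can plan for from the ones you knowingly accept.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FailureScenario:
    component: str                    # part of the stack that could fail
    question: str                     # the awful question to ask about it
    mitigation: Optional[str] = None  # None = nothing feasible identified yet


def split_by_feasibility(
    scenarios: List[FailureScenario],
) -> Tuple[List[FailureScenario], List[FailureScenario]]:
    """Return (planned, accepted): scenarios with and without a mitigation."""
    planned = [s for s in scenarios if s.mitigation is not None]
    accepted = [s for s in scenarios if s.mitigation is None]
    return planned, accepted


# Sample entries are illustrative, not a recommendation.
list_of_awfulness = [
    FailureScenario("NIC", "What if a network card dies?", "Redundant NICs"),
    FailureScenario("Disk", "What if a drive fails?", "RAID"),
    FailureScenario("Power", "What if the power goes out?", "UPS with automatic shutdown"),
    FailureScenario("Generator", "What if the backup generator won't start?"),
]

planned, accepted = split_by_feasibility(list_of_awfulness)
for s in planned:
    print(f"PLAN   {s.component}: {s.mitigation}")
for s in accepted:
    print(f"ACCEPT {s.component}: {s.question}")
```

The "accepted" half is the useful part: it's the list of risks you've decided to live with on purpose rather than by accident.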
This is Jeremiah
I live in Portland, OR. I have two dogs.
I recently received a Master of Science in Computer Science from Portland State University.
I was a Microsoft MVP from 2009 to 2018 with a pile of certifications. Somewhere along the way, I wrote a database client for Riak and then handed it off to the community.