I’ve been working on implementing some infrastructure code for a client. We’re building robust partition swapping to make it easy to load data without disrupting user queries. We’re doing everything else the right way, and partition swapping makes it really easy to correct a bad load of past data.
The upside is that this code is really easy to write. There are enough examples and samples out there that a lot of the basics can be easily implemented. Even the complex parts of implementing the partition swapping are fairly trivial. The trick is making the code robust enough to handle almost any failure scenario.
Table partitioning is useful in a lot of different ETL scenarios, but we never want it to fail. If it does fail, we want to make sure that we’re in a recoverable state. Likewise, this code needs to be automated and able to recover from any potential failures.
It turns out that the actual functionality is just a few lines of code. The robust error handling, logging, and recovery code is about 30 times longer than the functionality. It can be difficult to go through the code and update all of the error handling and logic in response to minor changes to business requirements, but the end product is a stable piece of functionality.
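To give a feel for that ratio, here is a minimal sketch in Python. It is not the client code. The names `run_swap_with_recovery`, `swap`, `verify`, and `undo` are hypothetical, and the three callables stand in for whatever statements actually perform the swap, check the result, and put things back. Notice that the functionality is the single `swap()` call and everything around it is error handling, logging, and recovery.

```python
import logging
import time

logger = logging.getLogger("partition_swap")


def run_swap_with_recovery(swap, verify, undo, max_attempts=3, delay_seconds=0.0):
    """Run `swap`, confirm it with `verify`, and call `undo` after any failure.

    `swap`, `verify`, and `undo` are hypothetical caller-supplied callables.
    Returns the attempt number that succeeded. Raises RuntimeError if every
    attempt fails, and re-raises immediately if `undo` itself fails.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            swap()  # the actual functionality: one line
            if not verify():
                raise RuntimeError("verification failed after swap")
            logger.info("swap succeeded on attempt %d", attempt)
            return attempt
        except Exception as exc:
            last_error = exc
            logger.warning("attempt %d failed: %s", attempt, exc)
            try:
                # Return to a known, recoverable state before retrying.
                undo()
            except Exception:
                # If we cannot get back to a known state, stop retrying.
                logger.exception("undo failed; manual intervention needed")
                raise
            time.sleep(delay_seconds)
    raise RuntimeError(f"swap failed after {max_attempts} attempts") from last_error
```

A caller would pass in its own functions, for example `run_swap_with_recovery(do_swap, row_counts_match, swap_back)`, where those three names are again just placeholders. Even this toy version already has far more recovery code than functionality, and it does not yet cover things like partial loads or concurrent runs.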