June 2010
Mon Tue Wed Thu Fri Sat Sun
« May   Jul »
 123456
78910111213
14151617181920
21222324252627
282930  

Day June 11, 2010

This Is How We Handle Problems

I had a production issue tonight. Still am, actually. I’ve admitted to it and here’s the email I’m sending to my management.

At 9:00PM I took a backup of database_a and database_b prior to running the database migration scripts. Once the backups were finished, I began the migration process at approximately 9:20.

I stopped the migration process at 10:15 after multiple failures and restarts. There are too many unknown cross-dependencies to go on with the roll forward. At this time I called SUPPORT PERSON and explained the situation. I also called MANAGEMENT and left a voice mail. I then began the process of restoring the production databases on SERVER_A.

No changes were made to SERVER_FIGHTING_MONGOOSE or SERVER_C.

Once I had restored database_b and database_a on SERVER_A, I began seeing multiple failures from replication and the rest of SQL Server indicating severe problems with the physical disk structure. I immediately stopped all replication involving database_a and database_b on SERVER_A and I have begun a physical drive integrity check using SQL Server’s built-in integrity check tool: DBCC CHECKDB. The CHECKDB for database_b finished at 11:15 with a clean bill of health. database_a is still running as of 11:31PM.

Once the CHECKDB process for database_a is complete, I will begin re-initialize the subscription for the database_a database on SERVER_A. Following the successful completion of the database_a re-initialization, I will begin the process of re-initializing the subscription to database_b.

If you have any questions, feel free to contact me at 867-5309.

See what I did there?

  • I stated how we got into this mess – I dropped a running chainsaw into the SAN.
  • I outlined my decision making process and took ownership of rolling back our production migration.
  • I described the situation after the migration was rolled back and provided an assessment based on what I had observed.
  • I outlined a course of action to mitigate our problems and restore our production database to an operational state as soon as possible.

Am I proud? Not really. I like it when things work. Am I tired and cranky? Yes.

Will I get this fixed before I go to bed? Hell yeah.

Is this something that I, in a sick way, live for? Only because it reminds me to keep studying and to stay on my toes.

This site is protected with Urban Giraffe's plugin 'HTML Purified' and Edward Z. Yang's Powered by HTML Purifier. 230 items have been purified.