Production Problems, Community, and You

How many of you out there have had production issues? You, the liar in the back, put your hand up. Well, we experienced a major performance issue in production last week.

For those that don’t know, I’m not a production DBA. I’m what they call a “development DBA.” I write stored procedures, tune the SQL, write indexes, and work with developers to ensure speedy data access. As someone who comes from the software development world, it’s a logical place to be. But, history aside, I don’t possess the knowledge of the SQL Server internals like many of my production DBA brethren.

Back to the problem at hand. On Wednesday, the users started noticing exceptional slow downs in the application. These weren’t the usual slow downs where things took a little bit longer than expected, queries that normally took 2 seconds were taking minutes to come back. Since our ASP.NET application is, largely, a passthrough that calls stored procedures and loads the results into a datagrid, we quickly ruled out application performance. Plus, I tested some stored procedures on the production database and they ran slowly. (I say our application is largely a passthrough, but I don’t mean to do a disservice to the developers. It is quite a sophisticated piece of software, it just doesn’t do a lot of heavy lifting.)

Luckily, I was able to leverage a few resources to resolve this issue.

1) BLOGS

I subscribe to a lot of blogs. A LOT of blogs. From reading Brent Ozar’s blog, I remembered that he has a section at the top of his website on using perfmon and another on performance tuning. It also helps that he recently presented at the CBusPASS user group on using perfmon. I fired up my browser, went to Brent’s blogs, and started working with the monitoring set up that he outlined. This got me going pretty quickly and gave me a great deal of information.

In addition to checking up on Brent’s step by step instructions, I remembered that another SQL Server blogger had posted a SQL script to find the top 10 slowest running queries and their execution plans. I couldn’t (and still can’t) remember who wrote it. Which leads to:

2) SOCIAL NETWORKS

In this case, Twitter. I opened up TweetDeck and sent out a tweet with a question asking if anyone had seen that blog post. Someone remembered what I was talking about and sent me a link. Since I’m not much of a production DBA, yet, I kept raising questions on twitter which were rapidly answered by many different people. I was able to work with them to figure out the best solution to my specific questions and move forward quickly.

Within 15 minutes, I was armed with a methodology to track down root cause of the problem as well as a query that would help me pinpoint any specific query problems. Without the collective knowledge of the SQL Server DBA community, it would have taken me a lot longer to understand and solve the problem.

It turns out that the problem was a missing column in an index that caused a full table scan on one of the largest tables in the database which, in turn, caused locking issues.

Menu