Category SQL

When is a Lookup not a Lookup?

Execution plans are great things. They give us an insight into how SQL Server is putting together queries and why they run slowly.

One immediate thing I look for in an execution plan is a Key Lookup. In a Key Lookup operation, SQL Server has to reference the clustered index on the table because a value it’s looking for is not present in the index that was used to find the row.

Normally, when you look at the Key Lookup, you will see a list of output columns. This is a great way to help you modify your indexes and make sure that you can avoid these extra disk hits by sacrificing a tiny bit of storage space.

A Normal Output List from a Query Plan

Today, while troubleshooting a query that is never finishing, I ran across this gem.

Empty. Bereft of Meaning. Nothing.

There’s no output list in the Key Lookup. I, wrongly, thought that Key Lookups would always include an output list. What’s going on here?

Well, here’s the scoop. I have the following, sample, table:

CREATE TABLE dbo.ServiceList (
  ServiceListId BIGINT NOT NULL PRIMARY KEY,
  BillId BIGINT NOT NULL,
  ServiceMonth DATETIME
);

In this query, ServiceList is referenced through a CROSS APPLY’d inline function. The inline function references the BillId and ServiceListId. There’s an index on ServiceList that contains both of these columns:

CREATE INDEX IX_ServiceList_BillId ON dbo.ServiceList
(
  BillId
)
INCLUDE (ServiceListId, ServiceMonth);

This index is correctly being used by the query optimizer but in the past it’s only been used to perform lookups and push data to screen (or somewhere). So, what we’re seeing here makes sense. The CROSS APPLY needs to reference the BillId so the compiler checks the situation out and decides to use IX_ServiceList_BillId in order get the BillId. Since the index only includes ServiceListId, it has to perform a Key Lookup.

Protip: included data is only included in the index; it is not indexed.

To avoid this kind of behavior, I need to change this index to actually index on the combination of BillId and ServiceListId like so:

CREATE INDEX IX_ServiceList_BillId ON dbo.ServiceList
(
  BillId,
  ServiceListId
)
INCLUDE (ServiceMonth);

Links for the Week of 2009.12.04

SQL Server

  • Kendal Van Dyke: Delegation: What It Is And How To Set It Up – The title says it all, folks.
  • wmarow’s disk & disk array calculator – Storage calculator. This wouldn't be terribly fascinating, except I know the performance characteristics of our I/O subsystem and when I plug in the variables, this is pretty close to real life performance. (Not directly SQL Server related, but it’s my blog so it goes where I wants it.)
  • A Loan At Last! – Brad Schulz thoroughly describes how to create a loan payment schedule stored procedure using nothing but T-SQL. The end result is beautiful to behold.

Development

Stuff & Things

Links for the Week 2009.11.20

Bonus link dump week! Since I haven’t done one of these in several weeks, some of these links go all the way back to October 29th, enjoy.

SQL Server

Development

Stuff & Things

GUIDs: Not Necessarily Unique

So, you though that GUIDs were supposed to be unique, eh? You aren’t alone. A lot of people assume that GUIDs are unique. Hell, unique is in the name!

On Friday, I had a bug report filed about a UNIQUE constraint violation. Seeing as how I had bigger fish to fry, and this functionality is only rarely used, I put it aside until this morning when I took a closer look at the bug report and the code in question.

The table in question looks something like this:

CREATE TABLE dbo.Widget
(
  WidgetId BIGINT IDENTITY(1,1) NOT NULL,
  -- other columns
  LegacyIdentifier VARCHAR(50) NOT NULL
);

Before I get yelled at for a VARCHAR(50) in my database let me assure that this is actually a 50 character random garbage string from our legacy COBOl application. Nobody got lazy and just made a VARCHAR(50)… this time.

LegacyIdentifier has a unique constraint on it because all of the Widgets need to be unique between the new system and the old system. In order to keep this uniqueness going I thought to myself, “I know, the U in GUID stands for Unique, I’ll use that since I’m not clustering on it.” I wrote a little piece of code that puts “New_” at the beginning of a GUID and then uses that string as the LegacyIdentifier. That was about 11 months ago. On Friday, we received an exception warning that the application had attempted to insert a duplicate value into the LegacyIdentifier column.

What now? Well, I’m not sure. I’m going to implement functionality to force the database to keep generating GUIDs and attempting inserts until it actually generates a unique GUID.

This solution doesn’t make me happy, though, because GUIDs are not unique. A GUID is, in theory, unique. There are, after all, 2^128 possible permutations, so the probability of the same GUID being generated twice are incredibly slim. However, it can happen. (SQL Server uses a V4 GUID algorithm, which is a pseudo-random number and, as such, is not truly random. Newer GUID algorithms use an SHA hash for randomness, but how random is random?)

What are we to do when we need a truly unique identifier? Frankly, I’m not sure. The thought crossed my mind of generating a random number and casting it to VARCHAR and appending it to the GUID. The statistical likelihood of generating two random values that are the same as a prior value is so astronomically low that I should probably buy lottery tickets if there is ever a collision. As it stands right now, I’ve opted to retry the insert until it actually succeeds (or for 5 times, after which the user is doomed).

So, there you have it, GUIDs are not guaranteed to be unique and you should plan accordingly.

Links for the Week of 2009.10.30

These are a day late. Sorry about that. It’s been one helluva week.

Also, November 1st marks the beginning of National Novel Writing Month. So, if you’ve had a story inside you waiting to get out, wait no further!

SQL Server

Development

Stuff & Things

Links for the Week of 2009.10.23

I came across a lot of great links in the last week. So many that I had to cut about 40% of them to make this list.

As always, you can check out the firehose at my delicious bookmarks page.

SQL Server

Development

Stuff & Things

SQL University – Snow Day

I was supposed to have a second blog post up this week for SQL University, however I haven’t been feeling well and have been focusing my energy on my day job. Unfortunately, I’ve had to drop the ball on a few other things. My second SQL University post for the internals week is what got dropped (as well as a few other things that don’t directly involve anyone else).

However, I did come across this little gem that I’d like to share. Logical Query Processing Poster (PDF alert)

This is the logical query processing diagram that is in Itzik Ben-Gan’s fantastic book Inside Microsoft® SQL Server® 2008: T-SQL Querying. This explains how SQL Server builds a query from the ground up and includes all phases of query processing.

An Introduction to SQL Server System Databases

SQL Server 2008 (and 2005) contain five special databases: master, model, tempdb, msdb, and mssqlsystemresource (aka Resource). These databases are used by SQL Server for its own maintenance and management.

master

The master database contains all of the system level information for SQL Server – all of the logins, linked servers, endpoints, and other system-wide configuration settings. The master database is also where SQL Server stores information about the other databases on this instance and the location of their files. If the master database is not present, SQL Server cannot start.

Always take regular backups of the master database.

Since the master database hold all of the information related to logins, endpoints, linked servers, and user databases, it’s important that you take a backup of the master database after configuring any of these server level changes. Otherwise, if your SQL Server suffers a catastrophic failure, those changes will be lost to the sands of time.

Books Online contains a great deal of helpful information about the master database that you can find in the appropriately named master Database article.

Speaking from personal experience, one of the most important points from that article is this: “Do not create user objects in master. Otherwise, master must be backed up more frequently.” I cannot tell you the number of times I have connected to a development server, opened up a file, and run the DDL contents of the file only to realize that I have created new tables on the master database because I did not have a USE AdventureWorks at the top of my DDL script.

model

The model database is used as a template whenever a new user database is created. You can change most database properties, create users, stored procedures, tables, views, etc – whatever you do will be applied to any new databases.

The nice thing is that you can create a guaranteed set of users, stored procedures, and options (including collation options) by making changes to the model database. Once those changes are in place, they are applied to every new database.

Outside of its role as a template, model doesn’t do anything else. You can find more information in the model Database article of Books Online.

msdb

msdb is used by the SQL Server Agent, database mail, Service Broker, and other services. If you aren’t actively working with things like jobs, alerts, log shipping, etc you can pretty safely ignore msdb… sort of.

One important item is that msdb holds backup history. Using the msdb tables (you can start by taking a look at msdb.dbo.backupset), it’s possible to determine when each database and filegroup was last backed up. This is very useful, especially when you’ve just started working at a new company or taken over the maintenance of new servers.

A word of warning: you need to make sure that you are pruning old backup history from msdb. Leaving old backup data can slow down the performance of backup and restore operations. Brent Ozar documented this during his Backup Tricks Week article Checking MSDB Cleanups. In this article, the sp_delete_backuphistory system stored procedure is mentioned. This stored procedure will delete information older than the @oldest_date parameter. It is important that you are incredibly careful when using this store procedure and don’t attempt to delete all of the backup history data in your msdb at once. Attempting to clear out a large number of database backup history records in one fell swoop can have an adverse effect on performance. I suggest removing the data one to two days at a time. It’s tedious, I know, but your users will thank you.

Resource database

The resource database is a hidden system database. This is where system objects are stored. It isn’t possible to see the resource database by normal means. However you can see the data file by navigating to C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Binn. The exact size and modification data of this file will be different from version to version, but the modified date should be the same date that you see when you run SELECT @@version.

It is best to think of the resource database as if it were another system DLL. The resource database is designed to make it easy for quick database upgrades. If new system objects are being put in place, it is only necessary to swap out the resource database MDF file.

Typically, the only way to view the contents of the resource database is using the OBJECT_DEFINITION system function.

SELECT OBJECT_DEFINITION(OBJECT_ID('sys.objects'))

-- on my test system this yields the following:
CREATE VIEW sys.objects AS
    SELECT name, object_id, principal_id, schema_id, parent_object_id,
        type, type_desc, create_date, modify_date,
        is_ms_shipped, is_published, is_schema_published
    FROM sys.objects$

tempdb

We come, at last, to tempdb. Tempdb is the workhorse of the system databases. It is the workspace that SQL Server uses to store the intermediate results of query processing and sorting. You know how you see those spools in your execution plans? When you see one of those, SQL Server is probably spooling the data to a temporary table in the tempdb. Outside of storing temporary results, tempdb is also used during snapshot isolation and for user created temporary tables (this includes table variables).

One thing that is interesting to note about tempdb is that it is re-created every time the SQL Server service is started. Any objects that you have created in tempdb will be gone once the server restarts. If you want specific tables or stored procedures to always be available in tempdb, you will need to add them to the model database or else use a stored procedure to create them in tempdb when the SQL Server is started.

A properly sized and configured tempdb is vital for effective SQL Server performance. By default tempdb starts at 8MB in size and will continue to grow by ten percent until the drive is full. If the tempdb is sized too small, system resources will be used growing the tempdb file. This overhead can be avoided by increasing the default size of tempdb.

SQL Server System Databases
System Databases
Working with tempdb in SQL Server 2005
Microsoft SQL Server 2008 Internals pp 126 – 128

SQL Server Error Logs

What is the Error Log?

SQL Server maintains its own error logs that contain messages describing informational and error events.

These are simple text files stored on disk, but it’s best to view them using the facilities provided by SQL Server to prevent any SQL operations from being blocked by opening one of these files in a text editor. Also, the error log files can become quite large – SQL Server will stream the files into the Log File Viewer whereas notepad will open the file into a single buffer and consume a great deal more memory.

A new error log file will be created when one of two things happens:

  1. The SQL Server service is started
  2. sp_cycle_errorlog is called

Once this happens, any pending writes to the current log file will complete and a new log file will be created. The actual error log files can be found at Program Files\Microsoft SQL Server\MSSQL.n\MSSQL\LOG\ERRORLOG and ERRORLOG.n.

Viewing the Error Log

The Basics

Locating the error log

Locating the error log

Viewing the Error Logs

Viewing the Error Logs

Viewing the error log is a simple operation:

  1. Connect to the SQL Server using SQL Server Management Studio
  2. Expand the Server > Management > SQL Server Logs
  3. Double click on a log file

This will open up the log file. One thing to keep in mind is that by default this log file will only cycle when the SQL Server service has started. If this is a production server and you haven’t made any configuration changes, then this file could be quite large and take a long time to open.

Refining Your View

The Error Log Filter Box

The Error Log Filter Box

Filtered Error Log Results

Filtered Error Log Results

By default, you will end up with a view of the entire contents of one or more log files. This probably isn’t very useful to you unless you want to scan the contents for something very recent. On a busy server, though, there could be hundreds of thousands of lines in a single error log file.

Clicking the filter button opens up the Filter Settings window. This gives you the opportunity to filter by User, Computer, Start and End Date, the message text, and the message source. Given that there are so many different sources for the messages in the error log, the ability to filter the error log is a huge boon. Once you’ve entered your filter criteria, check ‘Apply filter’ and then click OK. Just make sure you check ‘Apply filter’ otherwise SQL Server Management Studio will happily discard all of your filter criteria and not filter any of the data.

Care & Feeding of the Log File

Automated Log File Rotation

To prevent the log files from growing too large it’s necessary to rotate them on a regular basis. Restarting the SQL Server service every week at 3:34 AM on a Sunday isn’t advisable, so there must be some other option. Thankfully, there is. The log file can be automatically rotated using the sp_cycle_errorlog system stored procedure.

Depending on who you talk to, different people will tell you to cycle the error log on a daily, weekly, or monthly schedule. The frequency really depends on your particular system and how much data accumulates in the logs during the course of regular operations. The more writes to the error log, the more often you should cycle the log. To cycle the logs, create a new Agent Job with a single T-SQL task (or combine it with regular daily or weekly maintenance).

Number of Log Files to Maintain

Depending on the amount of storage space you have available and the amount of activity on your SQL Server, you will want to change how often you rotate your log files and also the number of log files that you want to retain. Why would you want to do this? To keep as much historical information available as you possibly can.

Configuring the Error Log

Configuring the Error Log

Changing the Number of Error Log Files

Changing the Number of Error Log Files

More importantly, how do you do this?

  1. Right click on the SQL Server Logs folder in the Object Explorer
  2. Select ‘Configure’
  3. Check the box to ‘Limit the number of error log files before they are recycled’. This seems silly, but by default SQL Server will only keep 6 error log files before trashing them.
  4. Put in a new value and click ‘OK’.

I typically keep 99 error logs before they are recycled. Why? Because I can.

Wrap Up

I hope that this has helped you understand more about the SQL Server Error Log and how you can can automate the successful maintenance of your servers’ log file information.

Links for the Week of 2009.10.09

SQL Server

Why I prefer surrogate keys instead of natural keys in database design Mladen Prajdic put together a great article about using surrogate keys in database design. The best part is that a lot of bright people sounded off in the comments.

Donald Farmer Discusses the Benefits of Managed Self-Service BI I spoke with Donald Farmer about this topic when I was at the Richmond Code Camp last weekend. This is an incredible topic that has a lot of promise for the future of Business Intelligence.

Development

Mastering CSS Coding: Getting Started CSS is difficult to master. Very difficult to master. However, once you understand how CSS works it is a very powerful tool.

Multiple View Engines with MVC Turbine One of the coolest things about ASP.NET MVC is the flexibility it brings to ASP.NET web development. Javier Lozano demonstrates how it is possible to use multiple view engines within the same page.

Stuff & Things

9 Ways to Make Your Writing More Compelling 9 great tips for dramatically improving your writing.

Freelance Contracts: Dos and Don’ts If you’re looking to freelance, it’s important to make sure that your contracts are in order. This can cause no end of pain, trouble, and difficulty if your contracts are ambiguous. Smashing Magazine has some great information to make sure your contracts are in good working order.

Cyanide & Happiness

This site is protected with Urban Giraffe's plugin 'HTML Purified' and Edward Z. Yang's Powered by HTML Purifier. 401 items have been purified.