December 2008
Mon Tue Wed Thu Fri Sat Sun
« Nov   Jan »
1234567
891011121314
15161718192021
22232425262728
293031  

Month December 2008

SQL Server Security – Database Roles

Database roles (also referred to as database-level roles) are security roles that exist on a database level, as opposed to the server level. If you are familiar with any aspect of system administration, database roles are similar to groups in the world of Windows system administration. Just like a Windows group, when a user is added to a role they inherit all the rights and permissions of the role.

There are two kinds of database roles in SQL Server: fixed roles and flexible roles.

Fixed Roles

Before getting into the fixed roles that are available, it’s important to know what fixed roles actually are. Fixed roles are roles that automatically exist in each database. Adding a user to one of these roles will not change that user’s permissions in any other database.

Any user or role can be added to a database role. Once a user has been added to a role, they can add other users or roles to that role.

A Word of Warning: Be exceptionally carefully when adding flexible roles to a fixed role. You could very easily elevate privileges for a large number of users in one simple step.

With that warning out of the way, let’s take a look at the fixed roles available in SQL Server.

  • db_owner
  • db_securityadmin
  • db_accessadmin
  • db_backupoperator
  • db_ddladmin
  • db_datawriter
  • db_datareader
  • db_denydatawriter
  • db_denydatareader
  • Public

db_owner

Users in the db_owner role have it all, within a single database. They can grant and revoke access, create tables, stored procedures, views, run backups, schedule jobs. Heck, a user who is db_owner can even drop the database.

However, just because you have the keys to the kingdom doesn’t mean that you can do everything. Users who have been granted db_owner will still need specific permissions to run traces and view many of the dynamic management views. Why is that? Those are managed at the server level and will require that server level permissions or roles are granted to your login.

Why use db_owner? You would want to add a user to the db_owner role if you have a user who needs to make extensive modifications to all aspects of a database – creating users, adding them to roles, creating tables/views, creating stored procedures, and adding security settings for the tables, views, and stored procedures. One example of this would be a developer who is creating a database for a new application (or creating extensive modifications to an existing application) but you don’t want to add that user to the sysadmin group.

  -- Create our sample database and switch to it
  CREATE DATABASE TestFixedRoles;
  GO
  USE TestFixedRoles;
  GO

  -- Set up our test users
  CREATE USER user_dbo WITHOUT LOGIN;
  CREATE USER user_security WITHOUT LOGIN;
  CREATE USER user_reader WITHOUT LOGIN;
  CREATE USER user_writer WITHOUT LOGIN;
  CREATE USER user_backup WITHOUT LOGIN;
  GO

  -- Grant db_owner to user_dbo
  EXEC sp_addrolemember N'db_owner', N'user_dbo';
  EXEC sp_addrolemember N'db_securityadmin', N'user_security';

  -- Create an orders table as dbo
  CREATE TABLE orders (
    id            INT       NOT NULL   IDENTITY(1,1),
    orderdate     DATETIME  NOT NULL,
    employee_id   INT       NOT NULL,
    customer_id   INT       NOT NULL,
    quantity      INT       NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY (id)
  );
  GO

  -- temporarily switch to the context of user_dbo
  EXECUTE AS USER = 'user_dbo';
  SELECT USER_NAME(); -- This should return user_dbo

  -- This will return 1 row for our orders table
  SELECT t.[name]
    FROM sys.tables AS t
   WHERE t.[name] NOT LIKE 'sys%'

  DROP TABLE orders;

  -- revert back to our regular user
  REVERT;
  GO

db_securityadmin

Users in the db_securityadmin role can modify role permissions and manage permissions. Users in this role have, in theory, almost as much power members of db_owner. The only thing that a member of db_securityadmin can’t do is add users to the db_owner role. Members of db_securityadmin also cannot add users to fixed database roles (this requires membership in the db_owner role).

Why use db_securityadmin? You might want to use db_securityadmin when you need to grant privileges to a trusted user and allow them to manage privileges across an application. Take care when granting db_securityadmin to make sure that you trust the user to not give themselves additional permissions. This risk can be alleviated by adding auditing to the database to log when privileges are granted or revoked.

  EXECUTE AS USER = 'user_security';

  -- this will generate three errors since user_security isn't a member of db_owner.
  EXEC sp_addrolemember N'db_datareader', N'user_reader';
  EXEC sp_addrolemember N'db_datawriter', N'user_writer';
  EXEC sp_addrolemember N'db_backupoperator', N'user_backup';

  REVERT;
  GO

  -- this will now succeed since you are a member of db_owner, after all
  EXEC sp_addrolemember N'db_datareader', N'user_reader';
  EXEC sp_addrolemember N'db_datawriter', N'user_writer';
  EXEC sp_addrolemember N'db_backupoperator', N'user_backup';

db_accessadmin

Members of the db_accessadmin role have the ability to change database access. They can grant and revoke access to Windows logins, Windows groups, and SQL Server logins. The users that they grant access to will be members of the Public role and will have all the privileges associated with that role.

Why use db_accessadmin? This role would be used when you have a user who is responsible for maintaining access to a specific database. When combined with db_securityadmin you have a user who is capable of granting and revoking general access to a database as well as controlling the security permissions for almost any user. This combination is quite powerful and should be granted carefully. With auditing in place, you can mitigate any risk of granting both roles to a single user.

db_backupoperator

Members of this role can create database backups. It’s important to note that they cannot, by default, restore the backups that they create. The only users that can restore a backup are members of the sysadmin and dbcreator server roles and the owner of the database (dbo).

Why use db_backupoperator? If you have an automated process that connects to the database and creates a backup it would be a good idea to have all backup operations connect to the database using a user/login that only has db_backupoperator access to prevent any unauthorized data access due to a user being compromised.

db_datareader/db_denydatareader

Members of the db_datareader role are able to read all data from all user tables. That’s right, everything. Even the super secret table UserPayHistoryAndSocialSecurityNumbers table can be read by members of db_datareader.

Conversely, members of db_denydatareader are explicitly denied the ability to write to any user created tables. They live in the dark about the contents of the database.

Why use db_datareader? Let’s say the accounting department has a separate database. Everyone in accounting is able to create and run ad hoc reports directly against this database, but they shouldn’t be able to do anything else apart from see the contents of the database. Clearly granting db_owner access is out of the question. Your junior DBA is out sick today, so you can’t make him grant SELECT permissions to every table in the database while you go out to Chipotle for lunch. Instead, you can simply grant db_datareader access to the accounting department’s Windows group.

Why use db_denydatareader In addition to the accounting department’s requirement to be able to create ad hoc reports as needed, HR needs to be able to run the canned reports available to them through your carefully crafted stored procedures but they should not be able to run ad hoc reports. However, it turns out that they are also able to create and run ad hoc reports (thankfully your junior DBA emailed you about this yesterday before he got sick). By granting db_denydatareader permissions to the HR Windows group you can prevent members of the HR group from running the ad hoc reports (assuming Windows Forms authentication is being used) and get in touch with the application developers to have them hide ad hoc reports from the HR group in the application front end.

db_datawriter/db_denydatawriter

Much like db_datareader and db_denydatareader, the name of this role is largely self-explanatory: members of the db_datawriter role can INSERT, UPDATE, and DELETE data from any user created table. Likewise, members of the db_denydatawriter role are explicitly denied the ability to perform INSERT, UPDATE, and DELETE operations on any user created tables.

Why use db_datawriter? db_datawriter would be a good choice for a user or login that runs an automated ETL process on a regular basis. This ensures that access to all tables is maintained even when new tables are added and reduces maintenance overhead.

Why use db_denydatawriter If you want to limit write access for a login or user it is easy to add them to the db_denydatawriter role and allow specific access to a subset of tables. This might be desirable when users may need to make adjustments to certain tables – such as a bill, order, or account balance – but they do not need the ability to modify all data in the database.

db_ddladmin

Members of the db_ddladmin role are able to execute DDL commands (CREATE, ALTER, DROP) within the current database. It’s pretty self-explanatory – a member of db_ddladmin can run any CREATE, ALTER, or DROP command within the current database. They cannot create new databases, nor can they alter or drop the current database.

Why use db_ddladmin? Your company has brought on several consultants to help develop a new application. Their work will require that they are able to create, or change, new tables, views, indexes, and stored procedures. However, they don’t need the ability to create users or manage security. In this case, your best solution would be to add the consultants to the db_ddladmin role.

Public

The Public role is a bit different from all of the other roles. Every database user is a member of the Public role. If a user does not have any explicit permissions on a database object, they will inherit the permissions of the Public role.

Why use the public role? Going back to our example of the Accounting and HR departments, let’s say that there are now two databases: Accounting and HR. For the purposes of this example, only Accounting personnel should have access to the Accounting database and only HR personnel should have access to the HR database. What’s the best way to accomplish this?

The first step is to create a user for each department in their respective database. This user needs to be mapped to the appropriate Windows group. Once you have created the user and mapped it to the appropriate group, you can then add the user to the Public role.

Using this method it’s easy to add additional users and groups to the Public role without having to manage separate security settings for each one individually.

  /*****************************************************************************
   * PUBLIC ROLE DEMONSTRATION
   ****************************************************************************/
  CREATE DATABASE Accounting;
  GO
  USE Accounting;
  GO
  -- This is going to fail unless you have a LOGIN called group_Accounting
  CREATE USER user_Accounting FOR LOGIN group_Accounting;
  GO
  EXEC sp_addrolemember N'Public', N'user_Accounting';
  GO

  CREATE DATABASE HumanResources;
  GO
  USE HumanResources;
  GO
  -- This is going to fail unless you have a LOGIN called group_HumanResources
  CREATE USER user_HumanResources FOR LOGIN group_HumanResources;
  GO
  EXEC sp_addrolemember N'Public', N'user_HumanResources';
  GO

Links for the week of 2008-12-26

SQL Server

Best practices for taking on the DBA role as a developer – Armando Prato covers the basics of what it takes to go from developer to DBA. This is a good article and one that’s very near and dear to my heart since I’m pulling myself away from the development world and into the DBA world (I’m almost there!).

Auto generated SQL Server keys – uniqueidentifier or IDENTITY – Armando Prato (who gets a double dose of link love) wrote a pretty well-balanced article about a subject that is often a source of contention between DBAs and developers – primary key types. I have a very strong bias towards using IDENTITY columns, but this article presents both sides of the argument.

Graphical Exeuction Plans for Simple SQL QueriesGrant Fritchey has had an excerpt from his book, Dissecting SQL Server Execution Plans published on simple-talk.com. It’s a great read, especially if you, like me, have often wondered how to read an execution plan. I can’t claim to be an expert after reading this article, but I can claim to have a much better understanding.

Development

Composite Architecture – The Open Close Principle as applied to system architecture – Ayende Rahien weighs in with his opinions on unit testing and talks about how it’s possible to architect and build a system that doesn’t really need to be unit tested, well not nearly as much as a standard system.

Early Christmas Present from PowerShell Team: Community Technology Preview-3 (CTP3) of Windows PowerShell V2 It’s just what it says: PowerShell 2 CTP3 is out. Get it while it’s hot!

General

Your idea sucks, now go do it anyway – Jason Cohen talks about starting a company. Many companies start out with a completely different idea than the finished product. Oddly enough, he mentions a game that I used to play online – Game Neverending – which eventually became flickr.

Nuclear Urbanism – Geoff Manaugh highlights a cool Google Maps mash up that lets you see what an atomic weapon would do to your home city. This really brings the scary power of these things to light. Just some food for thought.

Web Persona and Your Career – Tom LaRock makes some great points about how to present yourself on the web. Ultimately it comes down to being yourself. As I’ve started writing more for this blog I’m finding more of my voice as an author and I’m getting comfortable enough to start trying different things. Time will tell, but the important part is to just be yourself.

A brief note: Due to the holidays and work there will be no real blog post this week. I’m working on several post ideas and should be fleshing them out over the 5 day weekend and returning with more content in the new year.

Links for the Week 2008-12-19

SQL Server

Cloud Computing and me – TJay Belt delivers the best look at cloud computing that I’ve read yet.

Some interesting affects of Table Partitioning – Kent Tegels discusses some interest side effects of partitioning tables and indexes.

The N pillars of a well built database? – Louis Davidson is in the process of re-working his book and is thinking out loud about what makes for good database design.

Development

Iron Python 2.0 has been released – Denis Gobo sums up the newest release of the Iron Python language. This is a big thing – dynamic languages have finally arrived on the scene in the .NET Framework.

Disable WiFi with PowerShell – Rob Farley demonstrates PowerShell’s amazing abilities by giving a quick demo of how to disable WiFi through PowerShell.

General

Fractal – Jason Santa Maria is a Graphic Designer with Happy Cog Studios and is an avid photographer. He showcases his work on his blog in the daily photo feature. He always has a fresh take on what is sometimes a very mundane subject.

Breaking the Rules – Jason Cohen gives several examples of people and companies who have broken all of the rules in grand ways and succeeded. His post can be summed up with this gem of a sentence: ‘There’s no rule that can’t be broken, so long as it’s broken with purpose.’

My Firsts

I stole this from TJay Belt. It seemed like a nice change of pace from the SQL shenanigans to

1. Who was your FIRST prom date?
Didn’t go to prom (they don’t have them in Scotland), but I went to my high school dance with a girl named Ashley.

2. Do you still talk to your FIRST love?
Sometimes, but it’s only via the internet.

3. What was your FIRST alcoholic drink?
Something horrendous, Cactus Juice, I think.

4. What was your FIRST job?
I mowed lawns when I was a kid.

5. What was your FIRST car?
1980 Ford Tempo All Wheel Drive

6. Who was the FIRST person to text you today?
Andy.

7. Who is the FIRST person you thought of this morning?
Apart from me, I think I wondered if my friend Dave was feeling better, he was sick yesterday.

8. Who was your FIRST grade teacher?
I can’t remember.

9. Where did you go on your FIRST ride on an airplane ride?
It was probably to Texas, but I don’t remember that either. I’ve been on a lot of airplanes.

10. Who was your FIRST best friend and are you still friends with him/her?
Matt Bartmes is the earliest best friend I can remember. Sadly we’ve lost touch.

11. What will be your FIRST thought when you wake up tomorrow morning?
Is it Friday? Seriously? That’s amazing.

12. Where was your FIRST sleep over?
I think it was at my house.

13. Who was the FIRST person you talked to today?
My co-worker Dave, not the Dave that was sick, but a different Dave.

14. Whose wedding were you in the FIRST time?
That would be my wedding.

15. What was the FIRST thing you did this morning?
At 3AM I stared at the ceiling for about 20 minutes. A few hours later I hit snooze a lot.

16. What was the FIRST concert you ever went to?
I can remember going to a Dire Straits show when I was pretty young.

17. FIRST tattoo or piercing?
I have a tattoo of a koi fish on my right leg.

18. Who was your FIRST kiss from?
Wow, umm… I don’t remember, that was a REALLY long time ago.

19. Who was your FIRST boss?
My first actual boss? Her name was Priscilla. That was when I worked at Subway.

20. When was your FIRST detention?
Probably in the fourth grade.

21. What was the FIRST state you lived in?
Illinois

22. Who was the FIRST person to really break your heart?
I can’t remember (I’m sensing a lack of a good memory as a theme). The earliest one I can remember is Lauren Hayes.

23. Who was your FIRST roommate?
Matt Fron at Ohio Wesleyan University

PASS Chapter SIG

A few weeks after the PASS conference finished up, Blythe Morrow (the community coordinator for PASS) contacted me and asked if I would like to help form a chapter special interest group (SIG). I didn’t even need to think about it for a minute, I immediately agreed and am itching to get started.

The focus of the chapter SIG is mainly going to be on building support systems for the chapters to use in addition to some administrative help. The administrative help will involve reviewing the chapter agreement, choosing a chapter of the month, getting mailings out, and general day to day tasks that help chapters run smoothly.

I’m not as certain what support systems we will be working on, but I could make a guess at it and say that we’ll be looking into knowledge sharing, putting together resource kits, helping chapter leaders start new chapters and revive chapters that are starting to slow down. But, then again I could be completely wrong about this.

What I do know is that I’m very excited to be working to support the growing SQL Server community, both locally and on a global level. We’ll be having a meeting coming up and I’ll post more after the meeting.

Official Chapter SIG announcement

Links for the Week

SQL Server

Amazon Launches Public Data Sets To Ease Research – Michael Arrington (via TechCrunch) outlines the new data sets that Amazon is bringing to the public. This is monumental for a couple of reasons. This data is going to be freely available to anyone and everyone. This alone will add considerable value to many existing applications. In addition, this gives people interested in BI a sufficiently large corpus of data to start learning about building dimensions outside of entry level tutorials.

SQL Server Rounding Methods Rounding isn’t always the clearest thing on earth. This article attempts to explain it and does a great job. Kudos all around!

Development

How To Guarantee That Your Software Will Suck – Justin Etheredge put together a well-formed list of ways to help out programmers cheaply while still spending money. I’m surprised at how often I’ll hear about people using old hardware to develop software. There are some really powerful, really cheap machines out there on the market. Tools are cheap. Time is expensive.

The Frustrating Magical Aspect – Why the Lucky Stiff takes a look at an alternative graphics programming library for Ruby: XRVG. Honestly, this looks pretty cool and could be a good way to start working with the generative art ideas I’ve been toying around with. Now I just need a semi-chaotic stream of data, oh yes, the Amazon sample data sets.

General

Robert Cain has begun posting a series on how to become a more marketable software developer. While this may be old hat to some developers out there, it’s always good to get a refresher in what keeps us all sharp. Step 1 – Become an Expert and Step 2 – Learn Iteratively are online now. I place a huge amount of importance on continuous learning. If it weren’t for my love of learning, I would never have gotten past many stages in my career (*NIX admin, Perl programmer, ASP.NET developer, Ruby/Rails developer, and now SQL Server development DBA).

Why You Should Mix Records on Crap Speakers – Matt Linderman of 37 Signals highlights why it’s important to test with sub-optimal hardware – whether you’re mixing audio or writing a piece of software it’s vital that you test how the final product will work everywhere. As developers, we’re often lucky to have high end hardware. When you switch from a pair of quad core Xeons with 4 gigs of RAM to a 2 GHz Core 2 Duo, the performance might be noticeably slower and it’s vital to experience the software the same way that your users will experience it.

A New Look at New Technology – Tim Benninghoff blogs about re-purposing technology for use in the IT community. Twitter just seems like texting on steroids for tweens, and texting just seems annoying… at first.

SQL Quiz Part 2

Chris Shaw called us out again. This time, it was Tom LaRock who tagged me in this round. This time around, the question is ‘What are the largest challenges that you have faced in your career and how did you overcome those?’

For me, the largest problem has always been myself. It’s difficult to admit, believe me, but over the years I’ve allowed a few bad habits to get in the way of getting my job done effectively.

One of the biggest has been my desire to get everything done and learn everything at the same time. Early in my career, I didn’t know that it was acceptable to say no to people. I took on every project that was thrown my way, regardless of how critical it was, and I tried to multitask to get everything done quickly. This led to very long days, long nights, and long weekends. Combine that with a recent college graduate and an active social life and you have a recipe for burning out.

Over the years, I realized that I didn’t have to know everything about everything and that I didn’t need to be a master of every new technology the instant it came out. I learned that it’s important to be aware of new technologies, to know the pluses and minuses, but I don’t need to have practical experience working with them just so long as I’m aware of the pluses and minuses of any approach and how it fits in as a piece of the entire puzzle.

The thing that has been harder for me is learning how to say ‘No.’ And this, ultimately, comes from my soft skills. I’ll admit freely that part of what drew me to IT was the stereotypical image that people in IT don’t need to have soft skills. Boy, nothing could have been further from the truth.

Due to my lack of soft skills, I have been unable to correctly and effectively say ‘no’ to people. In the past, I’ve over done things, taken on too much, and let myself become exceptionally rude with other people because I was struggling under the load I created for myself.

My soft skills are something that I’m still working on – it’s a daily struggle for me, but it’s a learning process that I appreciate and enjoy. Because I’ve been focused on developing my soft skills, I’ve gotten myself involved organizations like PASS, jumped on board with twitter, and have begun building my network – both amongst IT folks and outside of our immediate industry.

These days, I have a better idea of how to say ‘no’, I’m more comfortable with not knowing everything and I use my desire to know everything to direct myself in constructive directions, and my soft skills have improved greatly over the years. Protip: The best way to say ‘no’ is often to ask ‘why?’

So, in summary, my two biggest problems have been saying no and learning to let go and realize that I can only know so much.

I’m going to tag Louis Davidson aka Dr SQL and Grant Fritchey.

A Quick Introduction to Common Table Expressions

Before diving into how to go about using a common table expression, let’s take a look at what a common table expression is and why you would want to use one.

The What and Why of Common Table Expressions

Essentially, a common table expression (CTE) is a temporary result set. In this regard, a CTE is similar to a view, temporary table, or derived table. There are some important ways that a CTE is different from a view, temporary table, or a dervied table:

CTEs are not persisted, views are. If you only need a particular result set in a single query/stored procedure, creating a view will only cloud up the metadata that is being stored in the database. Using a CTE encapsulates this logic and stores it with the relevant query. If you don’t have the ability to create views, a CTE is also a great way around the lack of permissions.

A CTE can be referenced multiple times in the same statement. How is this better than referring to the same view/result set multiple times in a single query? For starters, every time you want to refer to the same result set from a view or query it’s necessary to repeat the query. This isn’t so bad if your query is as simple as ‘SELECT x, y, z FROM small_view’ but when you are creating a complicated result set that contains multiple aggregates, joins, and groupings your query will become cluttered very quickly. Not to mention the maintenance nightmare that this will cause when you have to change your query due to changes in the underlying table structure.

Every time you make use of a derived table, that query is going to be executed. When using a CTE, that result set is pulled back once and only once within a single query. Here’s a recent example I wrote: data is pulled back for multiple bills within a set of accounts. Within each account the total cost is aggregated by bill. In addition, four separate aggregations are made based on a subset of the result set for each bill. If I had done this with derived tables, that would come out to 5 hits to the underlying database. By using a CTE, only one main hit to disk is made, the result set is held in memory, the query is processed and returned to the user, and then the result set is dropped.

CTEs have limited scope. A CTE is only resident within a single query. Unlike temporary tables which will persist until the user disconnects from the server, a CTE disappears once the query has completed. This makes memory and table management a lot easier on both developers and DBAs.

CTEs can be recursive. A recursive CTE is a very powerful piece of functionality and can be used to retrieve complex hierarchies from a single table that might otherwise require multiple queries to retrieve.

CTEs offer better aggregation possibilities. Within a single query, it normally isn’t possible to produce aggregations on the output of non-deterministic functions. When the output of a non-deterministic function is included in a CTE, you can group by the output. Of course, you could also do this by including the function in a derived table, however the T-SQL to create the CTE ends up being much cleaner to read than using derived tables.

How to use a Common Table Expression

The basic structure of a CTE is very simple:

  WITH cte_name (col_a, col_b, ..., col_z)
  AS
  (
    -- query definition
  )
  SELECT col_a, col_b, ..., col_z
  FROM cte_name

First, it’s important to say that a CTE can be used with any type of query: INSERT, SELECT, UPDATE, DELETE. They are not limited to only SELECT statements.

Second, and very important, the query that immediately precedes a CTE definition has to be terminated with a semi-colon. For this reason you will often see CTEs written as:

;WITH cte_name

Third, the query that uses the CTE has to immediately follow the CTE definition. You can’t write a CTE at the top of a query batch and save it for later.

A quick example

This example requires that you have the AdventureWorks database installed. If you don’t, you can download it from codeplex. Make sure you’re using AdventureWorks and not AdventureWorks2008.

  /* non-CTE query */
  SELECT
  	AVG(OrdersPlaced)
  FROM (
  	SELECT
  		v.VendorID,
  		v.[Name] AS VendorName,
  		COUNT(*) AS OrdersPlaced
  	FROM Purchasing.PurchaseOrderHeader AS poh
  	INNER JOIN Purchasing.Vendor AS v ON poh.VendorID = v.VendorID
  	GROUP BY v.VendorID, v.[Name]
  ) AS x

  /* CTE query */
  WITH cte (VendorId, VendorName, OrdersPlaced)
  AS (
  	SELECT
  		v.VendorID,
  		v.[Name] AS VendorName,
  		COUNT(*) AS OrdersPlaced
  	FROM Purchasing.PurchaseOrderHeader AS poh
  	INNER JOIN Purchasing.Vendor AS v ON poh.VendorID = v.VendorID
  	GROUP BY v.VendorID, v.[Name]
  )
  SELECT
  	AVG(OrdersPlaced) AS AvgOrdersPlaced
  FROM cte;

The CTE definition retrieves the vendor id, name, and count of all orders grouped by the vendor id and name. That is to say it selects the number of orders for each vendor. Once this is collected and aggregated in the CTE definition, the average number of orders is computed.

If you were to take a look at the execution plans of both of the previous queries, there would be no difference. That’s because in this simple case there is no difference. This is a fairly contrived example to demonstrate the difference in syntax before jumping in. Let’s take a look at a less contrived example.

  /* non-CTE query */
  SELECT
  	(SELECT AVG(OrdersPlaced)
  	FROM (SELECT v.VendorID, v.[Name] AS VendorName, COUNT(*) AS OrdersPlaced
  			FROM Purchasing.PurchaseOrderHeader AS poh
  			INNER JOIN Purchasing.Vendor AS v ON poh.VendorID = v.VendorID
  			GROUP BY v.VendorID, v.[Name]) AS x) AS AvgOrdersPlaced,
  	(SELECT MAX(OrdersPlaced)
  	FROM (SELECT v.VendorID, v.[Name] AS VendorName, COUNT(*) AS OrdersPlaced
  			FROM Purchasing.PurchaseOrderHeader AS poh
  			INNER JOIN Purchasing.Vendor AS v ON poh.VendorID = v.VendorID
  			GROUP BY v.VendorID, v.[Name]) AS x) AS MaxOrdersPlaced,
  	(SELECT MIN(OrdersPlaced)
  	FROM (SELECT v.VendorID, v.[Name] AS VendorName, COUNT(*) AS OrdersPlaced
  			FROM Purchasing.PurchaseOrderHeader AS poh
  			INNER JOIN Purchasing.Vendor AS v ON poh.VendorID = v.VendorID
  			GROUP BY v.VendorID, v.[Name]) AS x) AS MinOrdersPlaced

  /* CTE query */
  WITH cte (VendorId, VendorName, OrdersPlaced)
  AS (
  	SELECT
  		v.VendorID,
  		v.[Name] AS VendorName,
  		COUNT(*) AS OrdersPlaced
  	FROM Purchasing.PurchaseOrderHeader AS poh
  	INNER JOIN Purchasing.Vendor AS v ON poh.VendorID = v.VendorID
  	GROUP BY v.VendorID, v.[Name]
  )
  SELECT
  	AVG(OrdersPlaced) AS AvgOrdersPlaced,
  	MAX(OrdersPlaced) AS MaxOrdersPlaced,
  	MIN(OrdersPlaced) AS MinOrdersPlaced
  FROM cte;
Multiple sub-queries means multiple steps to aggregate

Multiple sub-queries means multiple steps to aggregate

Already the non-CTE based query is starting to get ugly and complicated. There are three nested sub-selects. Each one is used to compute a separate aggregation of the data in the original query, which must be repeated in each query. Unfortunately this means that the query optimizer is going to be busy building multiple result sets and merging them together. This can rapidly become very complicated and produce poorly performing queries, not to mention become a maintenance nightmare.

A CTE uses only one step to perform multiple aggregates

A CTE uses only one step to perform multiple aggregates

The CTE query is still just as simple as the first query. The two new aggregations have been added and it only took two additional lines of T-SQL. If you take a look at the execution plan, this happens in the same step during query processing, vs the separate steps that are executed in the non-CTE query. While it would be nice for the query engine to notice what we’re doing in the first query, it doesn’t make any sense to force the optimizer to make up for poor coding standards.

Recursive Common Table Expressions

There’s one other great feature of CTEs: recursion. By referencing the CTE within the definition it’s possible to create a recursive query. A recursive CTE has two parts: the anchor member and the recursive member(s). The anchor must come before the recursive members.

Execution of a recursive CTE happens in three stages:

  1. The anchor member is evaluated
  2. The recursive member(s) are evaluated
  3. Rinse and repeat until a termination condition is met

There are, of course, a few more rules than this:

  • The anchor and recursive members have to be joined by UNION, UNION ALL, EXCEPT, or INTERSECT operators.
  • The FROM of each recursive member can only refer to the CTE expression.
  • You can’t use any of the following in a recursive CTE:
    • SELECT DISTINCT
    • GROUP BY
    • HAVING
    • TOP
    • LEFT, RIGHT, or OUTER JOIN (you can use an INNER JOIN)
    • A subquery
    • Scalar aggregation (AVG, MIN, MAX, COUNT, COUNT_BIG)
    • A query hint applied to a recursive reference to the CTE
    • Data types and columns must match

With all of those rules out of the way, let’s take a look at a recursive CTE. This one is straight out of Books Online:

  WITH DirectReports(ManagerID, EmployeeID, EmployeeLevel) AS
  (
    /* anchor */
    SELECT ManagerID, EmployeeID, 0 AS EmployeeLevel
    FROM HumanResources.Employee
    WHERE ManagerID IS NULL

    UNION ALL

    /* recursion */
    SELECT e.ManagerID, e.EmployeeID, EmployeeLevel + 1
    FROM HumanResources.Employee e
        INNER JOIN DirectReports d
        ON e.ManagerID = d.EmployeeID
  )
  SELECT ManagerID, EmployeeID, EmployeeLevel
  FROM DirectReports;

This query will retrieve a recursive list of all employees of AdventureWorks, starting at the CEO and going all the way down to the bottom of the org chart. It’s a relatively simple example, but it shows how to reference the anchor member within the recursive member.

It’s important to remember that CTEs can contain CTEs and that a recursive CTE can contain multiple references to the anchor member.

For more information on CTEs, please refer to Books Online:

Get a Table of Months Between Two Dates

Sometimes you need a list of months between two dates. Admittedly, this is another case where a calendar table would come in VERY handy. Unfortunately, I have not been able to build one yet.

In case you’re in a similar situation, here’s how you could go about doing this:

CREATE FUNCTION dbo.GetMonthList (
  @StartDate DATETIME,
  @EndDate   DATETIME
)
RETURNS @months TABLE (
  [month] DATETIME
)
WITH EXECUTE AS OWNER AS
BEGIN
/******************************************************************************
 * Author:      Jeremiah Peschka
 * Create Date: 2008-11-20
 * Description: Create a table of dates between @StartDate and @EndDate
 *****************************************************************************/
  DECLARE @MonthDiff         INT;
  DECLARE @counter           INT;
  DECLARE @tbl               TABLE ([month] DATETIME);

  SET @StartDate = '2008-01-01';
  SET @EndDate   = '2008-12-01';

  SET @counter      = 0;

  SELECT @MonthDiff = DATEDIFF(mm, @StartDate, @EndDate);

  WHILE @counter <= @MonthDiff
  BEGIN
      INSERT @months
      SELECT (DATEADD(mm, @counter, @StartDate));

    SET @counter = @counter + 1;
  END

  RETURN;
END
GO

Links for the week

SQL Server

Anatomy of a Deadlock – Jonathan Kehayias outlines how deadlocks can occur, even in scenarios where they seemingly shouldn’t occur. I answered a question about this several weeks ago, but I couldn’t determine what the underlying cause of the deadlock was, I did not realize at the time that a SELECT would issue a Shared with Intent Exclusive lock on the table. (More on lock modes)

Showplan error – Tom LaRock delves into an execution plan oddity: Are you a member of sysadmin? Can’t get an execution plan for some reason? EXECUTE AS might be the culprit.

Development

Ext Framework, jQuery and ASP.NET – Daniel Penrod outlines how to approach using either the Ext or jQuery framework within your ASP.NET applications. JavaScript isn’t scary. jQuery makes JavaScript exceptionally easy and, dare I say it, fun.

Using System.Web.Abstractions in Your WebForms Apps – Justin Etheredge covers how to make use of the new ASP.NET MVC classes in a traditional web forms application. Why might you want to do this? Testing and mocking, of course!

General

Big Three: Clean up your dealerships – Jason Fried (of 37signals) recently purchased a new car and writes about the appearance of the individual dealerships he visited. Appearances matter, especially on something like a car purchase.

Keeping Your Head In The Cloud – Jeff Blankenburg creates a fictional t-shirt company, becomes wildly successful, begins selling to China, and solves his concurrency and uptime issues with Windows Azure!

This site is protected with Urban Giraffe's plugin 'HTML Purified' and Edward Z. Yang's Powered by HTML Purifier. 219 items have been purified.