A Quick Introduction to Common Table Expressions

Before diving into how to go about using a common table expression, let’s take a look at what a common table expression is and why you would want to use one.

The What and Why of Common Table Expressions

Essentially, a common table expression (CTE) is a temporary result set. In this regard, a CTE is similar to a view, temporary table, or derived table. There are some important ways that a CTE is different from a view, temporary table, or a dervied table:

CTEs are not persisted, views are. If you only need a particular result set in a single query/stored procedure, creating a view will only cloud up the metadata that is being stored in the database. Using a CTE encapsulates this logic and stores it with the relevant query. If you don’t have the ability to create views, a CTE is also a great way around the lack of permissions.

A CTE can be referenced multiple times in the same statement. How is this better than referring to the same view/result set multiple times in a single query? For starters, every time you want to refer to the same result set from a view or query it’s necessary to repeat the query. This isn’t so bad if your query is as simple as ‘SELECT x, y, z FROM small_view’ but when you are creating a complicated result set that contains multiple aggregates, joins, and groupings your query will become cluttered very quickly. Not to mention the maintenance nightmare that this will cause when you have to change your query due to changes in the underlying table structure.

Every time you make use of a derived table, that query is going to be executed. When using a CTE, that result set is pulled back once and only once within a single query. Here’s a recent example I wrote: data is pulled back for multiple bills within a set of accounts. Within each account the total cost is aggregated by bill. In addition, four separate aggregations are made based on a subset of the result set for each bill. If I had done this with derived tables, that would come out to 5 hits to the underlying database. By using a CTE, only one main hit to disk is made, the result set is held in memory, the query is processed and returned to the user, and then the result set is dropped.

CTEs have limited scope. A CTE is only resident within a single query. Unlike temporary tables which will persist until the user disconnects from the server, a CTE disappears once the query has completed. This makes memory and table management a lot easier on both developers and DBAs.

CTEs can be recursive. A recursive CTE is a very powerful piece of functionality and can be used to retrieve complex hierarchies from a single table that might otherwise require multiple queries to retrieve.

CTEs offer better aggregation possibilities. Within a single query, it normally isn’t possible to produce aggregations on the output of non-deterministic functions. When the output of a non-deterministic function is included in a CTE, you can group by the output. Of course, you could also do this by including the function in a derived table, however the T-SQL to create the CTE ends up being much cleaner to read than using derived tables.

How to use a Common Table Expression

The basic structure of a CTE is very simple:

  WITH cte_name (col_a, col_b, ..., col_z)
  AS
  (
    -- query definition
  )
  SELECT col_a, col_b, ..., col_z
  FROM cte_name

First, it’s important to say that a CTE can be used with any type of query: INSERT, SELECT, UPDATE, DELETE. They are not limited to only SELECT statements.

Second, and very important, the query that immediately precedes a CTE definition has to be terminated with a semi-colon. For this reason you will often see CTEs written as:

;WITH cte_name

Third, the query that uses the CTE has to immediately follow the CTE definition. You can’t write a CTE at the top of a query batch and save it for later.

A quick example

This example requires that you have the AdventureWorks database installed. If you don’t, you can download it from codeplex. Make sure you’re using AdventureWorks and not AdventureWorks2008.

  /* non-CTE query */
  SELECT 
  	AVG(OrdersPlaced)
  FROM (
  	SELECT
  		v.VendorID,
  		v.[Name] AS VendorName,
  		COUNT(*) AS OrdersPlaced
  	FROM Purchasing.PurchaseOrderHeader AS poh
  	INNER JOIN Purchasing.Vendor AS v ON poh.VendorID = v.VendorID
  	GROUP BY v.VendorID, v.[Name]
  ) AS x
  
  /* CTE query */
  WITH cte (VendorId, VendorName, OrdersPlaced)
  AS (
  	SELECT
  		v.VendorID,
  		v.[Name] AS VendorName,
  		COUNT(*) AS OrdersPlaced
  	FROM Purchasing.PurchaseOrderHeader AS poh
  	INNER JOIN Purchasing.Vendor AS v ON poh.VendorID = v.VendorID
  	GROUP BY v.VendorID, v.[Name]
  )
  SELECT
  	AVG(OrdersPlaced) AS AvgOrdersPlaced
  FROM cte;

The CTE definition retrieves the vendor id, name, and count of all orders grouped by the vendor id and name. That is to say it selects the number of orders for each vendor. Once this is collected and aggregated in the CTE definition, the average number of orders is computed.

If you were to take a look at the execution plans of both of the previous queries, there would be no difference. That’s because in this simple case there is no difference. This is a fairly contrived example to demonstrate the difference in syntax before jumping in. Let’s take a look at a less contrived example.

  /* non-CTE query */
  SELECT
  	(SELECT AVG(OrdersPlaced)
  	FROM (SELECT v.VendorID, v.[Name] AS VendorName, COUNT(*) AS OrdersPlaced
  			FROM Purchasing.PurchaseOrderHeader AS poh
  			INNER JOIN Purchasing.Vendor AS v ON poh.VendorID = v.VendorID
  			GROUP BY v.VendorID, v.[Name]) AS x) AS AvgOrdersPlaced,
  	(SELECT MAX(OrdersPlaced)
  	FROM (SELECT v.VendorID, v.[Name] AS VendorName, COUNT(*) AS OrdersPlaced
  			FROM Purchasing.PurchaseOrderHeader AS poh
  			INNER JOIN Purchasing.Vendor AS v ON poh.VendorID = v.VendorID
  			GROUP BY v.VendorID, v.[Name]) AS x) AS MaxOrdersPlaced,
  	(SELECT MIN(OrdersPlaced)
  	FROM (SELECT v.VendorID, v.[Name] AS VendorName, COUNT(*) AS OrdersPlaced
  			FROM Purchasing.PurchaseOrderHeader AS poh
  			INNER JOIN Purchasing.Vendor AS v ON poh.VendorID = v.VendorID
  			GROUP BY v.VendorID, v.[Name]) AS x) AS MinOrdersPlaced
  
  /* CTE query */
  WITH cte (VendorId, VendorName, OrdersPlaced)
  AS (
  	SELECT
  		v.VendorID,
  		v.[Name] AS VendorName,
  		COUNT(*) AS OrdersPlaced
  	FROM Purchasing.PurchaseOrderHeader AS poh
  	INNER JOIN Purchasing.Vendor AS v ON poh.VendorID = v.VendorID
  	GROUP BY v.VendorID, v.[Name]
  )
  SELECT
  	AVG(OrdersPlaced) AS AvgOrdersPlaced,
  	MAX(OrdersPlaced) AS MaxOrdersPlaced,
  	MIN(OrdersPlaced) AS MinOrdersPlaced
  FROM cte;

Multiple sub-queries means multiple steps to aggregate

Already the non-CTE based query is starting to get ugly and complicated. There are three nested sub-selects. Each one is used to compute a separate aggregation of the data in the original query, which must be repeated in each query. Unfortunately this means that the query optimizer is going to be busy building multiple result sets and merging them together. This can rapidly become very complicated and produce poorly performing queries, not to mention become a maintenance nightmare.

A CTE uses only one step to perform multiple aggregates

A CTE using one step to perform multiple aggregates

The CTE query is still just as simple as the first query. The two new aggregations have been added and it only took two additional lines of T-SQL. If you take a look at the execution plan, this happens in the same step during query processing, vs the separate steps that are executed in the non-CTE query. While it would be nice for the query engine to notice what we’re doing in the first query, it doesn’t make any sense to force the optimizer to make up for poor coding standards.

Recursive Common Table Expressions

There’s one other great feature of CTEs: recursion. By referencing the CTE within the definition it’s possible to create a recursive query. A recursive CTE has two parts: the anchor member and the recursive member(s). The anchormust come before the recursive members.

Execution of a recursive CTE happens in three stages:

  1. The anchor member is evaluated
  2. The recursive member(s) are evaluated
  3. Rinse and repeat until a termination condition is met

There are, of course, a few more rules than this:

  • The anchor and recursive members have to be joined by UNION, UNION ALL, EXCEPT, or INTERSECT operators.
  • The FROM of each recursive member can only refer to the CTE expression.
  • You can’t use any of the following in a recursive CTE:
    • SELECT DISTINCT
    • GROUP BY
    • HAVING
    • TOP
    • LEFT, RIGHT, or OUTER JOIN (you can use an INNER JOIN)
    • A subquery
    • Scalar aggregation (AVG, MIN, MAX, COUNT, COUNT_BIG)
    • A query hint applied to a recursive reference to the CTE
    • Data types and columns must match

With all of those rules out of the way, let’s take a look at a recursive CTE. This one is straight out of Books Online:

  WITH DirectReports(ManagerID, EmployeeID, EmployeeLevel) AS 
  (
    /* anchor */
    SELECT ManagerID, EmployeeID, 0 AS EmployeeLevel
    FROM HumanResources.Employee
    WHERE ManagerID IS NULL
    
    UNION ALL
    
    /* recursion */
    SELECT e.ManagerID, e.EmployeeID, EmployeeLevel + 1
    FROM HumanResources.Employee e
        INNER JOIN DirectReports d
        ON e.ManagerID = d.EmployeeID 
  )
  SELECT ManagerID, EmployeeID, EmployeeLevel 
  FROM DirectReports;

This query will retrieve a recursive list of all employees of AdventureWorks, starting at the CEO and going all the way down to the bottom of the org chart. It’s a relatively simple example, but it shows how to reference the anchor member within the recursive member.

It’s important to remember that CTEs can contain CTEs and that a recursive CTE can contain multiple references to the anchor member.

For more information on CTEs, please refer to Books Online:

Menu