Counting Children with CTEs

Have you ever wanted to get a running total of all of the descendants of each tree node? This sort of thing is useful, especially if you don’t want to pull back an entire object graph just to compute the count of a child collection.

IF OBJECT_ID('tempdb..#categories') IS NOT NULL
  DROP TABLE #categories;
  
IF OBJECT_ID('tempdb..#product_categories') IS NOT NULL
  DROP TABLE #product_categories;

CREATE TABLE #categories (CategoryId INT, ParentCategoryId INT);
CREATE TABLE #product_categories (CategoryId INT, ProductId INT);

INSERT INTO #categories VALUES (1, NULL);
INSERT INTO #categories VALUES (2, 1);
INSERT INTO #categories VALUES (3, 1);
INSERT INTO #categories VALUES (4, 2);
INSERT INTO #categories VALUES (5, 3);
INSERT INTO #categories VALUES (6, 2);

INSERT INTO #product_categories VALUES (1, 1);
INSERT INTO #product_categories VALUES (2, 15);
INSERT INTO #product_categories VALUES (2, 18);
INSERT INTO #product_categories VALUES (3, 12);
INSERT INTO #product_categories VALUES (6, 34);
INSERT INTO #product_categories VALUES (4, 35);
INSERT INTO #product_categories VALUES (5, 99);
INSERT INTO #product_categories VALUES (3, 43);
INSERT INTO #product_categories VALUES (6, 54);
INSERT INTO #product_categories VALUES (3, 92);
INSERT INTO #product_categories VALUES (2, 77);
INSERT INTO #product_categories VALUES (5, 62);
INSERT INTO #product_categories VALUES (4, 42);
INSERT INTO #product_categories VALUES (1, 11);

;
WITH  cte(CategoryId, ParentCategoryId, c)
        AS (SELECT  c1.CategoryId,
                    c1.ParentCategoryId,
                    (SELECT COUNT(*)
                     FROM   (SELECT DISTINCT
                                    ProductId
                             FROM   #product_categories AS pc
                             WHERE  pc.CategoryId = c1.CategoryId
                            ) AS t1
                    ) AS c
            FROM    #categories AS c1
            UNION ALL
            SELECT  c2.CategoryId,
                    c2.ParentCategoryID,
                    d.c
            FROM    #categories c2
                    INNER JOIN cte d ON c2.CategoryId = d.ParentCategoryId
           )
  SELECT  cte.CategoryId,
          cte.ParentCategoryId,
          SUM(c) AS ProductCount
  FROM    cte
  GROUP BY cte.CategoryId,
          cte.ParentCategoryId;

So, how does mess work? Carefully.

The innermost select, t1, builds a virtual table that is nothing more than the product count under a single Category. There is a DISTINCT on here just to make sure that there are no duplicates from the #product_categories table. Theoretically this should be a group by, but this was a super fast hotfix that needed to go into production ASAP. Mistakes happen.

Moving outwards we encounter a SELECT COUNT on t1. This is where we get the actual count of products under the current category. Pretty standard.

What’s interesting, to me at least, is that in the recursive portion of the CTE, we’re working our way UP the hierarchy, from the bottom. This makes sure that when we do a sum of c, grouped by category and parent category, we’ll have the proper count in our CTE results.

What’s the advantage of this approach? You don’t have nasty deeply nested sub-selects. You can use this to populate a ChildCount column in your table as a result of a trigger. This is also, usually, fast enough to run live or else to use as a materialized view.

Menu