Have you ever wanted to get a running total of all of the descendants of each tree node? This sort of thing is useful, especially if you don’t want to pull back an entire object graph just to compute the count of a child collection.
IF OBJECT_ID('tempdb..#categories') IS NOT NULL
DROP TABLE #categories;
IF OBJECT_ID('tempdb..#product_categories') IS NOT NULL
DROP TABLE #product_categories;
CREATE TABLE #categories (CategoryId INT, ParentCategoryId INT);
CREATE TABLE #product_categories (CategoryId INT, ProductId INT);
INSERT INTO #categories VALUES (1, NULL);
INSERT INTO #categories VALUES (2, 1);
INSERT INTO #categories VALUES (3, 1);
INSERT INTO #categories VALUES (4, 2);
INSERT INTO #categories VALUES (5, 3);
INSERT INTO #categories VALUES (6, 2);
INSERT INTO #product_categories VALUES (1, 1);
INSERT INTO #product_categories VALUES (2, 15);
INSERT INTO #product_categories VALUES (2, 18);
INSERT INTO #product_categories VALUES (3, 12);
INSERT INTO #product_categories VALUES (6, 34);
INSERT INTO #product_categories VALUES (4, 35);
INSERT INTO #product_categories VALUES (5, 99);
INSERT INTO #product_categories VALUES (3, 43);
INSERT INTO #product_categories VALUES (6, 54);
INSERT INTO #product_categories VALUES (3, 92);
INSERT INTO #product_categories VALUES (2, 77);
INSERT INTO #product_categories VALUES (5, 62);
INSERT INTO #product_categories VALUES (4, 42);
INSERT INTO #product_categories VALUES (1, 11);
;
WITH cte(CategoryId, ParentCategoryId, c)
AS (SELECT c1.CategoryId,
c1.ParentCategoryId,
(SELECT COUNT(*)
FROM (SELECT DISTINCT
ProductId
FROM #product_categories AS pc
WHERE pc.CategoryId = c1.CategoryId
) AS t1
) AS c
FROM #categories AS c1
UNION ALL
SELECT c2.CategoryId,
c2.ParentCategoryID,
d.c
FROM #categories c2
INNER JOIN cte d ON c2.CategoryId = d.ParentCategoryId
)
SELECT cte.CategoryId,
cte.ParentCategoryId,
SUM(c) AS ProductCount
FROM cte
GROUP BY cte.CategoryId,
cte.ParentCategoryId;
So, how does mess work? Carefully.
The innermost select, t1, builds a virtual table that is nothing more than the product count under a single Category. There is a DISTINCT on here just to make sure that there are no duplicates from the #product_categories table. Theoretically this should be a group by, but this was a super fast hotfix that needed to go into production ASAP. Mistakes happen.
Moving outwards we encounter a SELECT COUNT on t1. This is where we get the actual count of products under the current category. Pretty standard.
What’s interesting, to me at least, is that in the recursive portion of the CTE, we’re working our way UP the hierarchy, from the bottom. This makes sure that when we do a sum of c, grouped by category and parent category, we’ll have the proper count in our CTE results.
What’s the advantage of this approach? You don’t have nasty deeply nested sub-selects. You can use this to populate a ChildCount column in your table as a result of a trigger. This is also, usually, fast enough to run live or else to use as a materialized view.