Solving Business Problems with SQL

Businesses have all kind of interesting rules for working with data. Sometimes these rules are incredibly easy to implement in the database. Sometimes these rules are incredibly to implement in the application layer. Sometimes, it’s difficult to construct these rules no matter where you are in the entire application stack.

I recently came across a situation that required a bit of head scratching before I got things working correctly. What made this incredibly interesting to me was the apparent simplicity of the business rules.

The stored procedure must return a set of users, given an administrator’s user id, that the administrator is able to edit. An administrator is able to edit a user if the following criteria are met:

  1. There is at least one client in common between the administrator and the user.
  2. A user’s set of clients must all be members of the admin’s set of clients.
  3. If conditions 1 & 2 are not met, the user is excluded from the set and is effectively invisible to the administrator.

Before getting started working on real data, I created a sample set of data that I can share here:

IF OBJECT_ID('tempdb..#users') IS NOT NULL
  DROP TABLE #users;

  UserId INT,
  ClientId INT

-- Admin
INSERT INTO #users VALUES (1, 1);
INSERT INTO #users VALUES (1, 2);
INSERT INTO #users VALUES (1, 3);

-- User with two clients
INSERT INTO #users VALUES (10, 1);
INSERT INTO #users VALUES (10, 2);

-- Another user with two clients
INSERT INTO #users VALUES (11, 2);
INSERT INTO #users VALUES (11, 3);

-- User with same clients as admin
INSERT INTO #users VALUES (12, 1);
INSERT INTO #users VALUES (12, 2);
INSERT INTO #users VALUES (12, 3);

-- User with overlapping set of clients
INSERT INTO #users VALUES (20, 2);
INSERT INTO #users VALUES (20, 3);
INSERT INTO #users VALUES (20, 4);

-- User with no matching clients
INSERT INTO #users VALUES (21, 4);
INSERT INTO #users VALUES (21, 5);
INSERT INTO #users VALUES (21, 6);

So, let’s start with an initial query:

  FROM #users AS a
       INNER JOIN #users AS b
          ON a.ClientId = b.ClientId

This really doesn’t tell us much, but it does give us a list of all users and any users who share the same client.

By adding a predicate to make sure that the user from the ‘a’ table is our administrator, like so:

  FROM #users AS a
       INNER JOIN #users AS b
          ON a.ClientId = b.ClientId
 WHERE a.UserId = 1

we’ve effectively fulfilled the first requirement:

There is at least one client in common between the administrator and the user.

Unfortunately, that’s the easy part. Determining if the user’s set of clients are a subset of the administrator’s set of clients is a little bit trickier. This is, actually, where a full outer join becomes incredibly helpful.

We’re going to change the query around considerably in order to get the desired results:

;WITH cte AS (
       a.UserId AS a_UserId,
       a.ClientId AS a_ClientId,
       b.UserId AS b_UserId,
       b.ClientId AS b_ClientId,
       COUNT(*) OVER (PARTITION BY a.UserId, b.UserId) AS admin_count,
       COUNT(*) OVER (PARTITION BY b.UserId) AS user_count
          FROM #users
         WHERE UserId = 1) AS a
                          FROM #users) AS b ON a.ClientId = b.ClientId
  FROM cte AS c
 WHERE admin_count = user_count 
   AND a_UserId  b_UserId     
 ORDER BY b_UserId

So, what did we change? Well, rather than directly joining from #users to #users, we’re using two sub-selects and have changed from an inner join to a full outer join. Changing to a full outer join will, as everyone knows, give us all rows from both queries: even if the user has access to a client but the administrator does not, that user’s information will still be returned.

Second, we’ve changed the core query to only return the administrator, rather than joining on all rows where the Client Ids are the same. Why? Well, we don’t care if other users have access to the same set or subset of clients, we only care if the users and the administrator share a common set of clients.

What’s with all of the COUNT(*) OVER (…) nonsense? Well, by using COUNT(*) and PARTITION BY (as I talked about last week in Using Partitioning Functions to Find Duplicate Rows ) you can detect duplicate rows. What you can also do is determine the count of specific criteria by using the PARTITION BY clause to do some implicit grouping in your query.

The first partition by on both a.UserId and b.UserId provides the row count for the number of times the unique combination of administrator and user occur in the overall result set.

The second partition by function provides the row count of the number rows that contain the user id in the total result set.

If the administrator row count does not equal the user row count we know that there are more user rows than there are administrator rows.

Why did I include the ROW_NUMBER() in the query? When I’m writing these kinds of queries I will typically include the ROW_NUMBER() windowing function so I can apply different criteria in the OVER clause to help keep track of the query. This is especially important when you’re dealing with a query that contains a large number of surrogate keys.

Finally, I wrapped everything in a CTE. This makes it possible to actually apply the count comparisons that I mentioned above. It also makes it easier to perform any additional filtering on the column names that I might need in the future for additional business rules.