Cursing at cursors

Cursors. Many database developers have to suppress a shudder of horror when they open up a stored procedure only to find a cursor lurking inside. They certainly have their place in the database developer’s arsenal of tools, but they can usually be rewritten as an easily understood set operation.

I’ve spent the first three days at my new client learning the database schema and migrating four load routines from the old to the new database schema. The new schema is, admittedly, easier to understand simply because of the virtue of using design up front principles. But I digress.

These four load processes make use of cursors and, in one instance, nested cursors.

Without divulging too much of the client’s inner workings, the first cursor I encountered looked something like this:


DECLARE @start_date DATETIME;
DECLARE @end_date DATETIME;
DECLARE @end_date2 DATETIME;
DECLARE @account_no INT;

DECLARE my_cursor CURSOR
FOR
  SELECT account_no, start_date, end_date
  FROM Bill;
-- obviously the real query was more complicated than this

OPEN my_cursor;
FETCH NEXT FROM my_cursor
INTO @account_no, @start_date, @end_date;

SET @end_date2 = @start_date

WHILE @@fetch_status = 0
BEGIN
  IF DATEDIFF(dd, @start_date1, @end_date2) > 7
    INSERT INTO some_table (some values);
  END
END;

SET @end_date2 = @end_date;

FETCH NEXT FROM my_cursor
INTO @account_no, @start_date, @end_date;

Essentially, this will insert any records that are at least 7 days older than the previous record. The downside is that, even in my overly simplified example, this isn’t easy to read. When the full query logic is involved this becomes very difficult to read.

I re-wrote the original query using a CTE:


WITH my_cte (id, account_no, start_date, end_date)
AS (
  SELECT
    ROW_NUMBER() OVER (account_no, start_date, end_date, some_values),
    account_no,
    start_date,
    end_date,
    some_values
  FROM Bills
  GROUP BY account_no, start_date, end_date
)
INSERT INTO some_table (some_columns)
SELECT DISTINCT
  account_no,
  start_date,
  end_date,
  some_values
FROM my_cte AS o
INNER JOIN my_cte AS o2 ON o2.account_no = o.account_no
WHERE o2.start_date < o.start_date
  AND DATEDIFF(dd, o2.start_date, o.start_date) > 7

Instead of having to process each record one at a time, the new query is able to process several million records and insert approximately 60,000 records in 11 seconds and is much more readable and reusable as a result.

This CTE pattern has become the basis for several data load procedures that I’ve re-written from cursors, including a pair of nested cursors.

Special thanks, once again, to my brother for helping me rubber duck my way through this and for providing the initial idea that resulted in using a CTE.

Menu