Cursors. Many database developers have to suppress a shudder of horror when they open up a stored procedure only to find a cursor lurking inside. They certainly have their place in the database developer’s arsenal of tools, but they can usually be rewritten as an easily understood set operation.
I’ve spent the first three days at my new client learning the database schema and migrating four load routines from the old to the new database schema. The new schema is, admittedly, easier to understand simply because of the virtue of using design up front principles. But I digress.
These four load processes make use of cursors and, in one instance, nested cursors.
Without divulging too much of the client’s inner workings, the first cursor I encountered looked something like this:
DECLARE @start_date DATETIME; DECLARE @end_date DATETIME; DECLARE @end_date2 DATETIME; DECLARE @account_no INT; DECLARE my_cursor CURSOR FOR SELECT account_no, start_date, end_date FROM Bill; -- obviously the real query was more complicated than this OPEN my_cursor; FETCH NEXT FROM my_cursor INTO @account_no, @start_date, @end_date; SET @end_date2 = @start_date WHILE @@fetch_status = 0 BEGIN IF DATEDIFF(dd, @start_date1, @end_date2) > 7 INSERT INTO some_table (some values); END END; SET @end_date2 = @end_date; FETCH NEXT FROM my_cursor INTO @account_no, @start_date, @end_date;
Essentially, this will insert any records that are at least 7 days older than the previous record. The downside is that, even in my overly simplified example, this isn’t easy to read. When the full query logic is involved this becomes very difficult to read.
I re-wrote the original query using a CTE:
WITH my_cte (id, account_no, start_date, end_date) AS ( SELECT ROW_NUMBER() OVER (account_no, start_date, end_date, some_values), account_no, start_date, end_date, some_values FROM Bills GROUP BY account_no, start_date, end_date ) INSERT INTO some_table (some_columns) SELECT DISTINCT account_no, start_date, end_date, some_values FROM my_cte AS o INNER JOIN my_cte AS o2 ON o2.account_no = o.account_no WHERE o2.start_date < o.start_date AND DATEDIFF(dd, o2.start_date, o.start_date) > 7
Instead of having to process each record one at a time, the new query is able to process several million records and insert approximately 60,000 records in 11 seconds and is much more readable and reusable as a result.
This CTE pattern has become the basis for several data load procedures that I’ve re-written from cursors, including a pair of nested cursors.
Special thanks, once again, to my brother for helping me rubber duck my way through this and for providing the initial idea that resulted in using a CTE.