Refactoring SQL code can be pretty easy. Just like refactoring any other programming language, sometimes you have to look around to find the culprit.
We have a slow running query that frequently times out when run by the users. A quick look at the query told me what was wrong. I found this clause in the middle of an otherwise simple query:
LEFT JOIN (
SELECT DISTINCT
key,
column_a
FROM ImportantData
WHERE FieldName = 'VALUES'
UNION ALL
SELECT DISTINCT
key,
column_a
FROM ImportantData_History
WHERE FieldName = 'VALUES'
) AS t ON other.key = t.key
LEFT JOIN (
SELECT DISTINCT
key,
column_b
FROM ImportantData
WHERE FieldName = 'VALUESXY'
UNION ALL
SELECT DISTINCT
key,
column_b
FROM ImportantData_History
WHERE FieldName = 'VALUESXY'
) AS t2 ON other.key = t2.key
The problem with this query is that we’re reading each table twice for the same data. I checked in the execution plan and, sure enough, SQL Server was performing two scans on the underlying data. After some careful thought I realized that I could accomplish the same thing with a single query:
LEFT JOIN (
SELECT key,
VALUES AS column_a,
VALUESXY AS column_b
FROM (
SELECT DISTINCT
key,
FieldName,
CASE WHEN FieldName = 'VALUES'
THEN column_a
WHEN FieldName = 'VALUESXY'
THEN column_b
END AS content
FROM ImportantData
WHERE FieldName = 'VALUES'
OR FieldName = 'VALUESXY'
UNION ALL
SELECT DISTINCT
key,
FieldName,
CASE WHEN FieldName = 'VALUES'
THEN column_a
WHEN FieldName = 'VALUESXY'
THEN column_b
END AS content
FROM ImportantData_History
WHERE FieldName = 'VALUES'
OR FieldName = 'VALUESXY'
) AS source PIVOT( MAX(CONTENT) FOR FieldName IN ([VALUES],
[VALUESXY]) ) AS pvt
) AS t ON other.key= t.key
The upside to this unreadable pile of junk is that it scans each table once and only once. This reduces the load on disk, the amount of data that needs to be read into memory, and it most likely reduces the amount of data stored in memory. And, while this only reduced the estimated query cost by 40 (580 down to 540), we all know that the query cost is only one factor in what actually affects the overall run time of a query.
Many readers are probably mortified that there’s a SELECT DISTINCT in this query. I am too. That being said, I plucked the low-hanging fruit and managed to get a huge performance boost as a result – query execution time went from over 1 minute to 14 seconds. With an improvement like that, removing that distinct isn’t on the radar any more.
Trackbacks
One Trackback