Jul 05 2008

Evaluating IS NULL vs <> with T-SQL DATETIMEs

Tag: sqljeremiah @ 9:20 pm

Many OR/Ms use an UpdatedAt column and some even use a DeletedAt column for soft deletes. This practice also carries over when using date ranges for controlling notification display (StartDate and EndDate).

I’ve often heard many DBAs make the claim that doing a date comparison is faster than checking if the value in a date column IS NULL, so I decided to run some primitive benchmarks to see what I would discover. Admittedly, I’ve also held this belief myself, but I’ve never witnessed any numbers to either prove or disprove the theory.

First, I created two tables with two columns each - CreatedAt and DeletedAt. One table allows NULLs in the DeletedAt column, the other does not.

Next I filled each table with 1,000,000 random DATETIME values. Each table recieved the exact same DATETIME values in each row and a random number seed was used to determine whether or not the NULL table would have a NULL value or a random DeletedAt value. In my test case, 500843 records had a NULL value. I then ran three separate queries to compare the performance of IS NULL against . Before and after each query, the current time was stored in a variable which was manipulated using a millisecond DATEDIFF to determine the total execution time.

  1. -- turn off printing row counts to speed up processing
  2. SET NOCOUNT ON;
  3.  
  4. -- Create test tables
  5. IF OBJECT_ID('Benchmarking.dbo.TestNullDatetime', 'U') IS NULL
  6. BEGIN
  7. CREATE TABLE TestNullDatetime
  8. (
  9. CreatedAt DATETIME NOT NULL,
  10. DeletedAt DATETIME
  11. );
  12. END;
  13.  
  14. IF OBJECT_ID('Benchmarking.dbo.TestNotNulLDatetime', 'U') IS NULL
  15. BEGIN
  16. CREATE TABLE TestNotNullDateTime
  17. (
  18. CreatedAt DATETIME NOT NULL,
  19. DeletedAt DATETIME NOT NULL
  20. );
  21. END;
  22.  
  23. DECLARE @counter INT;
  24. DECLARE @random FLOAT;
  25. DECLARE @threshold FLOAT;
  26. DECLARE @current DATETIME;
  27. DECLARE @deleted DATETIME;
  28.  
  29. SET @threshold = 0.5;
  30. SET @counter = 0;
  31.  
  32. WHILE @counter < 1000000
  33. BEGIN
  34. SET @random = RAND();
  35. SET @current = GETDATE();
  36. SET @deleted = @current + (@random * 9.5);
  37.  
  38.  
  39. IF @random > @threshold
  40. BEGIN
  41. INSERT INTO TestNullDatetime VALUES (@current, NULL);
  42. INSERT INTO TestNotNullDatetime VALUES (@current, @current);
  43. END
  44. ELSE
  45. BEGIN
  46. INSERT INTO TestNullDatetime VALUES (@current, @deleted);
  47. INSERT INTO TestNotNullDatetime VALUES (@current, @deleted);
  48. END
  49.  
  50. SET @counter = @counter + 1;
  51. END
  52.  
  53.  
  54.  
  55. DECLARE @start DATETIME;
  56. DECLARE @end DATETIME
  57.  
  58. DECLARE @null_isnullcompare INTEGER;
  59. DECLARE @null_notequalcompare INTEGER;
  60. DECLARE @notnull_notequalcompare INTEGER;
  61.  
  62. SET @start = GETDATE();
  63. SELECT * FROM TestNullDatetime WHERE DeletedAt IS NOT NULL;
  64. SET @end = GETDATE();
  65.  
  66. SET @null_isnullcompare = DATEDIFF(ms, @start, @end);
  67.  
  68. SET @start = GETDATE();
  69. SELECT * FROM TestNullDatetime WHERE DeletedAt <> CreatedAt;
  70. SET @end = GETDATE();
  71.  
  72. SET @null_notequalcompare = DATEDIFF(ms, @start, @end);
  73.  
  74. SET @start = GETDATE();
  75. SELECT * FROM TestNotNullDatetime WHERE DeletedAt <> CreatedAt;
  76. SET @end = GETDATE();
  77.  
  78. SET @notnull_notequalcompare = DATEDIFF(ms, @start, @end);
  79.  
  80. SELECT @null_isnullcompare AS [NULL - IS NULL comparison],
  81. @null_notequalcompare AS [NULL - NOT EQUAL comparison],
  82. @notnull_notequalcompare AS [NOT NULL - NOT EQUAL comparison];
  83.  
  84.  
  85. -- restore original state
  86. SET NOCOUNT OFF;

The results are as follows:

Run NULL table, IS NULL NULL table, <> NOT NULL table, <>
1 5470 5436 5373
2 5563 5516 5546
3 5936 5750 5466
4 5500 5453 5390
5 5576 5640 5440
AVG 5609 5559 5443

Overall, the <> comparison is faster than an IS NULL comparison, and it is approximately 100 milliseconds faster to perform on a table with no NULLs than on a table with roughly 50% NULL values.

Of course, it’s important to take note these are all hot cache hits. I didn’t take the time to reboot my machine and determine the difference running this against a cold cache, however I suspect that the results would still be the same.

Testing was performed on a 2.0 GHz Core 2 Duo laptop running Windows XP SP2, SQL Server 2005 Developer Edition (patch level 9.0.3054), with 2 GB of RAM and 7200 RPM drives. It’s an HP Pavilion dv9000t, for those who are interested.


Jul 03 2008

In Re: A default architecture – without stored Procedures

Tag: sqljeremiah @ 2:42 pm

I wrote this today in response to Patrik Löwendahl’s post: A default architecture – without stored Procedures . I figured that I might as well post it up here, too, so that people can comment on my rantings.

You’d be hard pressed to get me to concede that a data access scheme based on stored procedures is a bad idea. In the absence of that, buy a good ORM tool.

I’m going to take a stab and responding to each of your points because I think it’s good to get a dialog going on these kinds of things.

Pain Point the First: Version Management
I’m not sure what you mean by this. It’s fairly trivial to version control SQL files, however I’m guessing that you aren’t having any difficulty with adding plain text files to version control. So, I’m going to make a guess and say that this has something to do with quickly determining which version of development code is in the database. This can be somewhat tricky. It becomes necessary to use some kind of build/migration system similar to the one used in Ruby on Rails where you have a schema_info table with a version column that is incremented or decremented by an automated tool. At this point, a high level of DBA discipline is required to prevent changes from being made in production that are not present in source control and vice versa.

Pain Point the Second: Two Code Bases
Again, I’m not sure what is meant by this. You’re absolutely right, maintaining stored procedures does require two skill sets. In an ideal world there would be a distinct separation of skills between database and application developers. While basic to intermediate SQL can be handled by most developers, venturing into high performance/advanced SQL requires a distinct skill set that requires years of experience and conscious practice.

Pain Point the Third: Business Logic
I’ve said this a large number of times over my career as both an application developer and database developer: business logic does not belong in the database. By default T-SQL will compile and store the first execution path it encounters in a stored proc. Let’s say you have a procedure with two branches, one performs a simple insert and the other performs a complex operation that can benefit from a compiled execution plan. If the simple insert statement branch is the first branch to be executed, it will be compiled and the other, complex, path with be performed as an ad hoc query unless the stored proc is ALTERed and the second execution path is executed. If you need complex logic and you have SQL 2005, you can use CLR stored procedures.

Pain Point the Fourth: Testing Stored Procedures
While I’m definitely not an expert on the subject of testing stored procedures, I have read numerous resources on the subject of testing stored procedures. Adam Machanic devotes some time to it in Expert SQL Server Programming. I’ll leave this topic to the experts.

Pain Point the Fifth: Writing Trivial SQL
When you’re operating, or attempting to operate, a high performance, high access database, there are no trivial data access calls. Granted, for the majority of data access scenarios (SELECT a, b, c FROM xyz WHERE param = @param), a stored procedure could be described as development overhead. This is a point that we agree on. However, I see the point that many DBAs have - if they want to change the underlying schema for performance/storage considerations, they should be able to do so as long as they provide developers with the same set of outputs as before, given the same inputs of course.

Pain Point the Sixth: Stored Procedures aren’t Dynamic
Nor should they ever be. The point of a stored procedure is to access data as efficiently as possible. In the event that you have multiple potential join combinations, you will need multiple stored procedures. Not to sound insulting, but the demand for multiple potential join combinations sounds like, to me, a poorly planned data access scenario. There are ways to dynamically generate WHERE and ORDER BY clauses using T-SQL and parameterized procedures/queries.

Ultimately, of course, the purpose of a stored procedure is to increase performance, granularity of data access, and provide an additional layer of security on top of the data. A data access solution based around stored procedures isn’t for every one, but it’s always good to know the reasons why you should be using it.


Jul 03 2008

Dark Visual Studio

Tag: Visual Studiojeremiah @ 6:33 am

In my attempts to customize Visual Studio 2008, after 6 months of Rails development in e, I came across a couple of links.

John Lam on Vibrant Ink
and
Dark Visual Studio
almost forgot
Scott Hanselman’s slightly more manual instructions

Now, either one of these will attempt to set your layout to the author’s favorite settings which, frankly, are not my favorite settings. Make sure to uncheck at least the following items, if you want your Visual Studio layout to stay the same:

  • Work Item Tracking - View Settings
  • Visual Studio Team Foundation Server
  • General Settings -> Window Layouts
  • General Settings -> Toolbox
  • General Settings -> Menu and Command Bar Customizations

Any other suggestions for keeping as much of your preferred layout around are greatly appreciated.


Feb 20 2008

Dear PostgreSQL

Tag: nonsense, sqljeremiah @ 8:22 am

Dear PostgreSQL,

Your documentation is awesome. You are awesome.

Thanks,

me

But, seriously, PostgreSQL has the best documentation of any FOSS database I’ve used (I’m looking at you MySQL). Combined with its ability to handle just about anything I throw at it in a timely manner, PostgreSQL is impressing me more and more. I just need to figure out how to get plpgsql working in my normal queries instead of in functions… or else I need to give up and start writing all of my ETL scripts as functions (which seems somewhat expensive).


Jan 29 2008

Behold, the power of science!

Tag: nonsensejeremiah @ 8:26 am

Once upon a time, back in the 60s, some MIT nerds took it upon themselves to “waste” valuable research computer time to compute the ideal route to ride the entire New York subway system. This is a pretty interesting look at how to coordinate a project to achieve a monumental feat purely for the point of achieving it. Sort of like Alex Roy’s 31 hour drive across the US.

Anyways, The Rise and Fall of the Amateur New York Subway Riding Committee.

I found this think through anarchaia, so I can’t take any credit for finding it. But I can take credit for pronouncing it to be awesome


« Previous PageNext Page »