GUIDs: Not Necessarily Unique

So, you though that GUIDs were supposed to be unique, eh? You aren’t alone. A lot of people assume that GUIDs are unique. Hell, unique is in the name!

On Friday, I had a bug report filed about a UNIQUE constraint violation. Seeing as how I had bigger fish to fry, and this functionality is only rarely used, I put it aside until this morning when I took a closer look at the bug report and the code in question.

The table in question looks something like this:

CREATE TABLE dbo.Widget 
  -- other columns
  LegacyIdentifier VARCHAR(50) NOT NULL

Before I get yelled at for a VARCHAR(50) in my database let me assure that this is actually a 50 character random garbage string from our legacy COBOl application. Nobody got lazy and just made a VARCHAR(50)… this time.

LegacyIdentifier has a unique constraint on it because all of the Widgets need to be unique between the new system and the old system. In order to keep this uniqueness going I thought to myself, “I know, the U in GUID stands for Unique, I’ll use that since I’m not clustering on it.” I wrote a little piece of code that puts “New_” at the beginning of a GUID and then uses that string as the LegacyIdentifier. That was about 11 months ago. On Friday, we received an exception warning that the application had attempted to insert a duplicate value into the LegacyIdentifier column.

What now? Well, I’m not sure. I’m going to implement functionality to force the database to keep generating GUIDs and attempting inserts until it actually generates a unique GUID.

This solution doesn’t make me happy, though, because GUIDs are not unique. A GUID is, in theory, unique. There are, after all, 2^128 possible permutations, so the probability of the same GUID being generated twice are incredibly slim. However, it can happen. (SQL Server uses a V4 GUID algorithm, which is a pseudo-random number and, as such, is not truly random. Newer GUID algorithms use an SHA hash for randomness, but how random is random?)

What are we to do when we need a truly unique identifier? Frankly, I’m not sure. The thought crossed my mind of generating a random number and casting it to VARCHAR and appending it to the GUID. The statistical likelihood of generating two random values that are the same as a prior value is so astronomically low that I should probably buy lottery tickets if there is ever a collision. As it stands right now, I’ve opted to retry the insert until it actually succeeds (or for 5 times, after which the user is doomed).

So, there you have it, GUIDs are not guaranteed to be unique and you should plan accordingly.