So, you though that GUIDs were supposed to be unique, eh? You aren’t alone. A lot of people assume that GUIDs are unique. Hell, unique is in the name!
On Friday, I had a bug report filed about a UNIQUE constraint violation. Seeing as how I had bigger fish to fry, and this functionality is only rarely used, I put it aside until this morning when I took a closer look at the bug report and the code in question.
The table in question looks something like this:
CREATE TABLE dbo.Widget ( WidgetId BIGINT IDENTITY(1,1) NOT NULL, -- other columns LegacyIdentifier VARCHAR(50) NOT NULL );
Before I get yelled at for a VARCHAR(50) in my database let me assure that this is actually a 50 character random garbage string from our legacy COBOl application. Nobody got lazy and just made a VARCHAR(50)… this time.
LegacyIdentifier has a unique constraint on it because all of the Widgets need to be unique between the new system and the old system. In order to keep this uniqueness going I thought to myself, “I know, the U in GUID stands for Unique, I’ll use that since I’m not clustering on it.” I wrote a little piece of code that puts “New_” at the beginning of a GUID and then uses that string as the LegacyIdentifier. That was about 11 months ago. On Friday, we received an exception warning that the application had attempted to insert a duplicate value into the LegacyIdentifier column.
What now? Well, I’m not sure. I’m going to implement functionality to force the database to keep generating GUIDs and attempting inserts until it actually generates a unique GUID.
This solution doesn’t make me happy, though, because GUIDs are not unique. A GUID is, in theory, unique. There are, after all, 2^128 possible permutations, so the probability of the same GUID being generated twice are incredibly slim. However, it can happen. (SQL Server uses a V4 GUID algorithm, which is a pseudo-random number and, as such, is not truly random. Newer GUID algorithms use an SHA hash for randomness, but how random is random?)
What are we to do when we need a truly unique identifier? Frankly, I’m not sure. The thought crossed my mind of generating a random number and casting it to VARCHAR and appending it to the GUID. The statistical likelihood of generating two random values that are the same as a prior value is so astronomically low that I should probably buy lottery tickets if there is ever a collision. As it stands right now, I’ve opted to retry the insert until it actually succeeds (or for 5 times, after which the user is doomed).
So, there you have it, GUIDs are not guaranteed to be unique and you should plan accordingly.


Yeah, when the name was invented they tried PUID (prol’ly unique identifier) and NAUID (nearly always unique identifier) but they just never caught on…
I can’t help myself – if the 5th attempt fails, are you going to return an error that reads “You are doomed, because you are not unique”?
In honor of the 10th anniversary of Fight Club, it will say “You are doomed because you are not a unique and special snowflake.”
I thought the GUID generation algorithm used the MAC address and system time as part of the algorithm. Sounds like a bug in the GUID generation code.
V1 GUIDs did this, but there was a big stink about it because you can trace data back to the original machine that generated the GUID. For windows computers, this was changed back with Windows 2000, I do believe.