SQL sucks at string manipulation. It’s not just SQL Server, it’s the SQL language in general. I’ve mentioned this before on Stack Overflow in answer to the question “What are five things you hate about your favorite language?”It’s almost easier to return the raw rows to application code than it is to work with them inside of a database. There’s one place where SQL Server has made it easier to work with string data – string building.
I don’t mean string building in the
name = 'Jeremiah' + ' ' + 'Peschka' sense. I mean bulk string building.
So, what started this line of inquiry? Why am I filling your RSS reader with gibberish about string manipulation instead of my usual Memorial Day Ascot Review?
A friend hit me up on twitter for some advice and help with a particular database problem. To paraphrase: Human readable messages are being stored in the database in 2000 character chunks. This is part of some kind of legacy database design and, as such, cannot be changed. Stupid legacy apps. How could this friend get his data out of the database and combine it into a single row that made sense to the users?
STUFF is the answer – specifically the
STUFF function. In fact,
STUFF used to have the best Books Online page title:
The purpose of
STUFF is to cram one string into another string and remove some characters in the process. How does this happen? I have an example. An annotated example.
Basically, this works because we’re using a trick with
FORXML to let us produce a string of text that we then continue to push into itself. The
STUFF function is used to remove the leading space from the final version of the string.
IF OBJECT_ID(N'tempdb..#t1') IS NOT NULL DROP TABLE #t1 ; CREATE TABLE #t1 ( id INT IDENTITY(1, 1) , message_text NVARCHAR(35) , message_number INT ) ; INSERT INTO #t1 ( message_text ) SELECT [name] FROM master..spt_values WHERE type = 'A' ; -- set up a fake table with two "messages": 1 and 0 UPDATE #t1 SET message_number = 1 WHERE id % 2 = 1 ; UPDATE #t1 SET message_number = 2 WHERE id % 2 = 0 ; -- this grabs the message, as distinguished by the message_number -- and the text of the message -- the STUFF function mucks together a whole bunch of data and, -- in this case, starts at the 1st character and removes 1 -- character, replacing it with '' -- -- the FOR XML PATH('') is a dirty hack to do rapid string -- building in T-SQL SELECT x.message_number , STUFF((SELECT ' ' + message_text AS [text()] FROM #t1 AS t WHERE t.message_number = x.message_number FOR XML PATH('') ), 1, 1, '') AS the_message FROM #t1 AS x GROUP BY x.message_number ;
The best part: this solution scales incredibly well.