How many of you have had to load a massive quantity of data from other tables in the same database? How many of you have manage to fill up TEMPDB in the process?
I encountered both of these scenarios yesterday.
31 million rows is not a tiny amount of data, by any stretch of the imagination. When I attempted to run my query (which worked a few weeks ago with a partial set of data), everything worked for about 17 minutes and then errored out with a message that TEMPDB was unable to allocate any additional space. This led me to two conclusions: either TEMPDB is on C: and C: is out of space, or TEMPDB is not set to autogrow. Either way, I’m not in the sysadmin role, so I couldn’t look into this any further other than making up theories and emailing them to the sysadmin.
Unfortunately, in the process of trying to load data it was necessary to TRUNCATE the existing table to avoid any logging. This meant that the report developers couldn’t test their reports. Yesterday afternoon I scrambled to find a solution without resorting to using SSIS (all existing load processes are in T-SQL as it is).
In order to work around this issue, I looked into working with a loop to batch up the INSERT process. Normally, I wouldn’t blog something like this, but it took me some time to track this down, so I decided to put this together.
The original query looked like this:
INSERT INTO [UtilityWeb].[dbo].[SummarySiteConsumptionCharge]
(
ClientId,
SiteId,
PrimaryVendorId,
ProcessDate,
[Description],
UOM,
AmountBilled,
Usage
)
SELECT si.ClientId,
si.SiteId,
b.PrimaryVendorId,
b.ProcessDate,
st.Description,
bsc.UOM AS UOM,
SUM(bsc.Amount) AS AmountBilled,
SUM(bsc.Quantity) AS Usage
FROM [UtilityMartSub1].[dbo].[Account] AS a
INNER JOIN [UtilityMartSub1].[dbo].[Site] AS si
ON a.SiteId = si.SiteId
CROSS JOIN [UtilityMartSub1].[dbo].[BillService] AS bs
CROSS JOIN [UtilityMartSub1].[dbo].[BillStatus] AS bst
INNER JOIN [UtilityMartSub1].[dbo].[BillServiceCharge] AS bsc
ON bs.BillServiceId = COALESCE (bsc.BillServiceId , bsc.BillServiceId)
INNER JOIN [UtilityMartSub1].[dbo].[Service] AS s
ON a.AccountId = s.AccountId
INNER JOIN [UtilityMartSub1].[dbo].[ServiceType] AS st
ON s.ServiceTypeId = st.ServiceTypeId
INNER JOIN [UtilityMartSub1].[dbo].[Bill] AS b
ON bst.BillStatusId = b.BillStatusId
AND bs.BillId = b.BillId
AND a.AccountId = b.AccountId
WHERE bst.Reportable = 1
GROUP BY si.ClientId,
si.SiteId,
b.PrimaryVendorId,
b.ProcessDate,
st.Description,
bsc.UOM
After tinkering around and doing some research, I came up with the following batched INSERT script:
DECLARE @i AS INT;
DECLARE @batch AS INT;
SET @i = 0;
SET @batch = 1000;
WHILE @i <= (SELECT MAX(AccountId) FROM UtilityMartSub1.dbo.Account)
BEGIN
PRINT @i;
INSERT INTO [UtilityWeb].[dbo].[SummarySiteConsumptionCharge]
(
ClientId,
SiteId,
PrimaryVendorId,
ProcessDate,
[Description],
UOM,
AmountBilled,
Usage
)
SELECT si.ClientId,
si.SiteId,
b.PrimaryVendorId,
b.ProcessDate,
st.Description,
bsc.UOM AS UOM,
SUM(bsc.Amount) AS AmountBilled,
SUM(bsc.Quantity) AS Usage
FROM [UtilityMartSub1].[dbo].[Account] AS a
INNER JOIN [UtilityMartSub1].[dbo].[Bill] AS b
ON a.AccountId = b.AccountId
INNER JOIN [UtilityMartSub1].[dbo].[BillService] AS bs
ON b.BillId = bs.BillId
INNER JOIN [UtilityMartSub1].[dbo].[BillStatus] AS bst
ON b.BillStatusId = bst.BillStatusId
INNER JOIN [UtilityMartSub1].[dbo].[BillServiceCharge] AS bsc
ON bs.BillServiceId = bsc.BillServiceId
INNER JOIN [UtilityMartSub1].[dbo].[Service] AS s
ON s.AccountId = COALESCE (a.AccountId , a.AccountId)
AND b.AccountId = s.AccountId
INNER JOIN [UtilityMartSub1].[dbo].[ServiceType] AS st
ON s.ServiceTypeId = st.ServiceTypeId
INNER JOIN [UtilityMartSub1].[dbo].[Site] AS si
ON a.SiteId = si.SiteId
WHERE bst.Reportable = 1
AND a.AccountId >= @i
AND a.AccountId < (@i + @batch)
GROUP BY si.ClientId,
si.SiteId,
b.PrimaryVendorId,
b.ProcessDate,
st.Description,
bsc.UOM;
SET @i = @i + @batch;
END
Once I switched to using this batched process to INSERT data into the table, the load process ran quickly, much faster than a single, massive, insert.
The current trick is to tweak the @batch size to determine which values will run fast enough but not blow up TEMPDB.