Loading Data… Oops, I Broke the Database

How many of you have had to load a massive quantity of data from other tables in the same database? How many of you have manage to fill up TEMPDB in the process?

I encountered both of these scenarios yesterday.

31 million rows is not a tiny amount of data, by any stretch of the imagination. When I attempted to run my query (which worked a few weeks ago with a partial set of data), everything worked for about 17 minutes and then errored out with a message that TEMPDB was unable to allocate any additional space. This led me to two conclusions: either TEMPDB is on C: and C: is out of space, or TEMPDB is not set to autogrow. Either way, I’m not in the sysadmin role, so I couldn’t look into this any further other than making up theories and emailing them to the sysadmin.

Unfortunately, in the process of trying to load data it was necessary to TRUNCATE the existing table to avoid any logging. This meant that the report developers couldn’t test their reports. Yesterday afternoon I scrambled to find a solution without resorting to using SSIS (all existing load processes are in T-SQL as it is).

In order to work around this issue, I looked into working with a loop to batch up the INSERT process. Normally, I wouldn’t blog something like this, but it took me some time to track this down, so I decided to put this together.

The original query looked like this:


INSERT INTO [UtilityWeb].[dbo].[SummarySiteConsumptionCharge]
(
  ClientId,
  SiteId,
  PrimaryVendorId,
  ProcessDate,
  [Description],
  UOM,
  AmountBilled,
  Usage
)
SELECT si.ClientId, 
       si.SiteId, 
       b.PrimaryVendorId, 
       b.ProcessDate, 
       st.Description, 
       bsc.UOM AS UOM, 
       SUM(bsc.Amount) AS AmountBilled, 
       SUM(bsc.Quantity) AS Usage 
  FROM [UtilityMartSub1].[dbo].[Account] AS a 
       INNER JOIN [UtilityMartSub1].[dbo].[Site] AS si 
          ON a.SiteId = si.SiteId 
       CROSS JOIN [UtilityMartSub1].[dbo].[BillService] AS bs 
       CROSS JOIN [UtilityMartSub1].[dbo].[BillStatus] AS bst 
       INNER JOIN [UtilityMartSub1].[dbo].[BillServiceCharge] AS bsc 
          ON bs.BillServiceId = COALESCE (bsc.BillServiceId , bsc.BillServiceId) 
       INNER JOIN [UtilityMartSub1].[dbo].[Service] AS s 
          ON a.AccountId = s.AccountId 
       INNER JOIN [UtilityMartSub1].[dbo].[ServiceType] AS st 
          ON s.ServiceTypeId = st.ServiceTypeId 
       INNER JOIN [UtilityMartSub1].[dbo].[Bill] AS b 
          ON bst.BillStatusId = b.BillStatusId 
             AND bs.BillId = b.BillId 
             AND a.AccountId = b.AccountId 
 WHERE bst.Reportable = 1 
 GROUP BY si.ClientId, 
          si.SiteId, 
          b.PrimaryVendorId, 
          b.ProcessDate, 
          st.Description, 
          bsc.UOM

After tinkering around and doing some research, I came up with the following batched INSERT script:

DECLARE @i AS INT;
DECLARE @batch AS INT;

SET @i = 0;
SET @batch = 1000;

WHILE @i <= (SELECT MAX(AccountId) FROM UtilityMartSub1.dbo.Account)
BEGIN

  PRINT @i;

  INSERT INTO [UtilityWeb].[dbo].[SummarySiteConsumptionCharge]
  (
    ClientId,
    SiteId,
    PrimaryVendorId,
    ProcessDate,
    [Description],
    UOM,
    AmountBilled,
    Usage
  )
  SELECT si.ClientId, 
        si.SiteId, 
        b.PrimaryVendorId, 
        b.ProcessDate, 
        st.Description, 
        bsc.UOM AS UOM, 
        SUM(bsc.Amount) AS AmountBilled, 
        SUM(bsc.Quantity) AS Usage 
    FROM [UtilityMartSub1].[dbo].[Account] AS a 
        INNER JOIN [UtilityMartSub1].[dbo].[Bill] AS b 
            ON a.AccountId = b.AccountId 
        INNER JOIN [UtilityMartSub1].[dbo].[BillService] AS bs 
            ON b.BillId = bs.BillId 
        INNER JOIN [UtilityMartSub1].[dbo].[BillStatus] AS bst 
            ON b.BillStatusId = bst.BillStatusId 
        INNER JOIN [UtilityMartSub1].[dbo].[BillServiceCharge] AS bsc 
            ON bs.BillServiceId = bsc.BillServiceId 
        INNER JOIN [UtilityMartSub1].[dbo].[Service] AS s 
            ON s.AccountId = COALESCE (a.AccountId , a.AccountId) 
           AND b.AccountId = s.AccountId 
        INNER JOIN [UtilityMartSub1].[dbo].[ServiceType] AS st 
            ON s.ServiceTypeId = st.ServiceTypeId 
        INNER JOIN [UtilityMartSub1].[dbo].[Site] AS si 
            ON a.SiteId = si.SiteId 
  WHERE bst.Reportable = 1 
    AND a.AccountId >= @i 
    AND a.AccountId < (@i + @batch)
  GROUP BY si.ClientId, 
           si.SiteId, 
           b.PrimaryVendorId, 
           b.ProcessDate, 
           st.Description, 
           bsc.UOM;
            
  SET @i = @i + @batch;
END

Once I switched to using this batched process to INSERT data into the table, the load process ran quickly, much faster than a single, massive, insert.

The current trick is to tweak the @batch size to determine which values will run fast enough but not blow up TEMPDB.

Menu