December 2008
Mon Tue Wed Thu Fri Sat Sun
« Nov   Jan »
1234567
891011121314
15161718192021
22232425262728
293031  

Day December 4, 2008

Loading Data… Oops, I Broke the Database

How many of you have had to load a massive quantity of data from other tables in the same database? How many of you have manage to fill up TEMPDB in the process?

I encountered both of these scenarios yesterday.

31 million rows is not a tiny amount of data, by any stretch of the imagination. When I attempted to run my query (which worked a few weeks ago with a partial set of data), everything worked for about 17 minutes and then errored out with a message that TEMPDB was unable to allocate any additional space. This led me to two conclusions: either TEMPDB is on C: and C: is out of space, or TEMPDB is not set to autogrow. Either way, I’m not in the sysadmin role, so I couldn’t look into this any further other than making up theories and emailing them to the sysadmin.

Unfortunately, in the process of trying to load data it was necessary to TRUNCATE the existing table to avoid any logging. This meant that the report developers couldn’t test their reports. Yesterday afternoon I scrambled to find a solution without resorting to using SSIS (all existing load processes are in T-SQL as it is).

In order to work around this issue, I looked into working with a loop to batch up the INSERT process. Normally, I wouldn’t blog something like this, but it took me some time to track this down, so I decided to put this together.

The original query looked like this:


INSERT INTO [UtilityWeb].[dbo].[SummarySiteConsumptionCharge]
(
  ClientId,
  SiteId,
  PrimaryVendorId,
  ProcessDate,
  [Description],
  UOM,
  AmountBilled,
  Usage
)
SELECT si.ClientId,
       si.SiteId,
       b.PrimaryVendorId,
       b.ProcessDate,
       st.Description,
       bsc.UOM AS UOM,
       SUM(bsc.Amount) AS AmountBilled,
       SUM(bsc.Quantity) AS Usage
  FROM [UtilityMartSub1].[dbo].[Account] AS a
       INNER JOIN [UtilityMartSub1].[dbo].[Site] AS si
          ON a.SiteId = si.SiteId
       CROSS JOIN [UtilityMartSub1].[dbo].[BillService] AS bs
       CROSS JOIN [UtilityMartSub1].[dbo].[BillStatus] AS bst
       INNER JOIN [UtilityMartSub1].[dbo].[BillServiceCharge] AS bsc
          ON bs.BillServiceId = COALESCE (bsc.BillServiceId , bsc.BillServiceId)
       INNER JOIN [UtilityMartSub1].[dbo].[Service] AS s
          ON a.AccountId = s.AccountId
       INNER JOIN [UtilityMartSub1].[dbo].[ServiceType] AS st
          ON s.ServiceTypeId = st.ServiceTypeId
       INNER JOIN [UtilityMartSub1].[dbo].[Bill] AS b
          ON bst.BillStatusId = b.BillStatusId
             AND bs.BillId = b.BillId
             AND a.AccountId = b.AccountId
 WHERE bst.Reportable = 1
 GROUP BY si.ClientId,
          si.SiteId,
          b.PrimaryVendorId,
          b.ProcessDate,
          st.Description,
          bsc.UOM

After tinkering around and doing some research, I came up with the following batched INSERT script:

DECLARE @i AS INT;
DECLARE @batch AS INT;

SET @i = 0;
SET @batch = 1000;

WHILE @i <= (SELECT MAX(AccountId) FROM UtilityMartSub1.dbo.Account)
BEGIN

  PRINT @i;

  INSERT INTO [UtilityWeb].[dbo].[SummarySiteConsumptionCharge]
  (
    ClientId,
    SiteId,
    PrimaryVendorId,
    ProcessDate,
    [Description],
    UOM,
    AmountBilled,
    Usage
  )
  SELECT si.ClientId,
        si.SiteId,
        b.PrimaryVendorId,
        b.ProcessDate,
        st.Description,
        bsc.UOM AS UOM,
        SUM(bsc.Amount) AS AmountBilled,
        SUM(bsc.Quantity) AS Usage
    FROM [UtilityMartSub1].[dbo].[Account] AS a
        INNER JOIN [UtilityMartSub1].[dbo].[Bill] AS b
            ON a.AccountId = b.AccountId
        INNER JOIN [UtilityMartSub1].[dbo].[BillService] AS bs
            ON b.BillId = bs.BillId
        INNER JOIN [UtilityMartSub1].[dbo].[BillStatus] AS bst
            ON b.BillStatusId = bst.BillStatusId
        INNER JOIN [UtilityMartSub1].[dbo].[BillServiceCharge] AS bsc
            ON bs.BillServiceId = bsc.BillServiceId
        INNER JOIN [UtilityMartSub1].[dbo].[Service] AS s
            ON s.AccountId = COALESCE (a.AccountId , a.AccountId)
           AND b.AccountId = s.AccountId
        INNER JOIN [UtilityMartSub1].[dbo].[ServiceType] AS st
            ON s.ServiceTypeId = st.ServiceTypeId
        INNER JOIN [UtilityMartSub1].[dbo].[Site] AS si
            ON a.SiteId = si.SiteId
  WHERE bst.Reportable = 1
    AND a.AccountId >= @i
    AND a.AccountId < (@i + @batch)
  GROUP BY si.ClientId,
           si.SiteId,
           b.PrimaryVendorId,
           b.ProcessDate,
           st.Description,
           bsc.UOM;

  SET @i = @i + @batch;
END

Once I switched to using this batched process to INSERT data into the table, the load process ran quickly, much faster than a single, massive, insert.

The current trick is to tweak the @batch size to determine which values will run fast enough but not blow up TEMPDB.

This site is protected with Urban Giraffe's plugin 'HTML Purified' and Edward Z. Yang's Powered by HTML Purifier. 226 items have been purified.