Tag SQL Server

Default Values, Triggers, and You

A friend of mine sent me an email the other day asking about default values in SQL Server. I realized that I’ve had to think about this a few times over the years and I’ve been asked about it more than once, too.

Setup

We need a table first, right? We’ll also want a few sample rows in there.

CREATE TABLE Employees (
  emp_id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
  emp_name varchar(50) NULL);
GO

INSERT INTO Employees (emp_name) VALUES ('a');
INSERT INTO Employees (emp_name) VALUES ('b');
GO

SELECT * FROM Employees;

/*
emp_id      emp_name
----------- --------------------------------------------------
1           a
2           b
*/

How Defaults Work

So far we just have two rows in our two column table. It’s pretty boring. Let’s add a default value:

ALTER TABLE Employees
ADD last_modified DATETIME NULL CONSTRAINT Employees_last_modified
DEFAULT CURRENT_TIMESTAMP;
GO

We might as well add some new rows while we’re having fun with our employees, right?

INSERT INTO Employees (emp_name) VALUES ('c');
INSERT INTO Employees (emp_name) VALUES ('d');

What’s it look like now?

SELECT * FROM Employees;

/*
emp_id      emp_name                                           last_modified
----------- -------------------------------------------------- -----------------------
1           a                                                  NULL
2           b                                                  NULL
3           c                                                  2010-12-06 18:21:37.787
4           d                                                  2010-12-06 18:21:37.787
*/

Hold up. Employees 1 and 2 don’t have a last_modified value. Why not? Well, that’s because we’ve told SQL Server that our last_modified column can allow NULLs. They’re allowable in our table. If we wanted to automatically provide a default value when we added the constraint, we could do so by specifying the datatype as DATETIME NOT NULL. A best practice would be to add the column as a NULLable data type, add a value for all NULL rows, and set the column to NOT NULL.

If we do want to update a NULLable column and set it to the default value, we just issue an update using the DEFAULT keyword for the value. If that makes no sense, perhaps this example will help:

UPDATE Employees
SET last_modified = DEFAULT
WHERE last_modified IS NULL ;

What About Updates?

UPDATE Employees
SET emp_name = 'zzz'
WHERE emp_id = 3;

SELECT *
FROM Employees
WHERE emp_id = 3;

/*
emp_id      emp_name                                           last_modified
----------- -------------------------------------------------- -----------------------
3           zzz                                                2010-12-06 18:21:37.787
*/

As you can see, when we update employee 3, it doesn’t change the value of last_modified. That’s because the default value is only set on insert. We could specify DEFAULT in our UPDATE statement, but then we’d need to specify that every time we update the table. What can we do?

The Answer is Triggers

That’s right: triggers. If we want to track the modification timestamp of an object in the database, we need to use a trigger to keep things updated:

CREATE TRIGGER TR_Employees$AfterUpdate ON dbo.Employees
AFTER UPDATE
AS
BEGIN
  UPDATE  e
  SET     e.last_modified = CURRENT_TIMESTAMP
  FROM    dbo.Employees e
          JOIN inserted i ON e.emp_id = i.emp_id;
END
GO

UPDATE Employees
SET emp_name = 'asdf'
WHERE emp_id = 1;

SELECT * FROM Employees;

/*
emp_id      emp_name                                           last_modified
----------- -------------------------------------------------- -----------------------
1           asdf                                               2010-12-06 18:34:04.340
2           b                                                  NULL
3           zzz                                                2010-12-06 18:21:37.787
4           d                                                  2010-12-06 18:21:37.787
*/

And that, my friends, is how we keep a modification timestamp up to date.

Comparing MongoDB and SQL Server Replication

MongoDB has replication built in. So does SQL Server, Oracle, DB2, PostgreSQL, and MySQL. What’s the difference? What makes each MongoDB a unique and special snowflake?

I recently read a three part series on MongoDB repication (Replication Internals, Getting to Know Your Oplog, Bending the Oplog to Your Will) in an effort to better understand MongoDB’s replication compared to SQL Server’s replication.

Logging Sidebar

Before we get started, it’s important to distinguish between the oplog and MongoDB’s regular log. By default, MongoDB pipes its log to STDOUT… unless you supply the --logpath command line flag. Logging to STDOUT is fine for development, but you’ll want to make sure you log to a file for production use. The MongoDB log file is not like SQL Server’s log. It isn’t used for recovery playback. It’s an activity log. Sort of like the logs for your web server.

What’s The Same?

Both MongoDB and SQL Server store replicated data in a central repository. SQL Server stores transactions to be replicated in the distribution database. MongoDB stores replicated writes in the oplog collection. The most immediate difference between the two mechanisms is that SQL Server uses the transaction as the demarcation point while MongoDB uses the individual command as the demarcation point.

All of our transactions (MongoDB has transactions… they’re just only applied to a single command) are logged. That log is used to ship commands over to a subscriber. Both SQL Server and MongoDB support having multiple subscribers to a single database. In MongoDB, this is referred to as a replica set – every member of the set will receive all of commands from the master. MongoDB adds some additional features: any member of a replica set may be promoted to the master server if the original master server dies. This can be configured to happen automatically.

The Ouroboros

The Ouroboros is a mythical creature than devours its own tail. Like the Ouroboros, the MongoDB oplog devours its own tail. In ideal circumstances, this isn’t a problem. The oplog will happily write away. The replica servers will happily read away and, in general, keep up with the writing to the oplog.

The oplog file is a fixed size so, like the write ahead log in most RDBMSes, it will begin to eat itself again. This is fine… most of the time.

Unfortunately, if the replicas fall far enough behind, the oplog will overwrite the transactions that the replicas are reading. Yes, you read that correctly – your database will overwrite undistributed transactions. DBAs will most likely recoil in horror. Why is this bad? Well, under extreme circumstances you may have no integrity.

Let’s repeat that, just in case you missed it the first time:

There is no guarantee of replica integrity.

Now, before you put on your angry pants and look at SQL Server Books Online to prove me wrong, this is also entirely possible with transactional replication in SQL Server. It’s a little bit different, but the principle still applies. When you set up transactional replication in SQL Server, you also need to set up a retention period. If your replication is down for longer than X hours, SQL Server is going to tell you to cram it up your backside and rebuild your replication from scratch.

Falling Behind

Falling behind is easy to do when a server is under heavy load. But, since MongoDB avoids writing to disk to increase performance, that’s not a problem, right?

Theoretically yes. In reality that’s not always the case.

When servers are under a heavy load, a lot of weird things can happen. Heavy network traffic can result in TCP/IP offloading – the network card can offload work to the CPU. When you’re using commodity hardware with commodity storage, you might be using software RAID instead of hardware RAID to simulate one giant drive for data. Software RAID can be computationally expensive, especially if you encounter a situation where you start swapping to disk. Before you know it, you have a perfect storm of one off factors that have brought your shiny new server to its knees.

In the process, your oplog is happily writing away. The replica is falling further behind because you’re reading from your replica and writing to the master (that’s what we’re supposed to do, after all). Soon enough, your replicas are out of sync and you’ve lost data.

Falling Off a Cliff

Unfortunately, in this scenario, you might have problems recovering because the full resync also uses a circular oplog to determine where to start up replication again. The only way you could resolve this nightmare storm would be to shut down your forward facing application, kill incoming requests, and bring the database back online slowly and carefully.

Stopping I/O from incoming writes will make it easy for the replicas to catch up to the master and perform any shard reallocation that you need to split the load up more effectively.

Climbing Gear, Please

I’ve bitched a lot in this article about MongoDB’s replication. As a former DBA, it’s a scary model. But I’ve bitched a lot in the past about SQL Server’s transactional replication – logs can grow out of control if a subscriber falls behind or dies – but it happens with good reason. The SQL Sever dev team made the assumption that a replica should be consistent with the master. In order to keep a replica consistent, all of the undistributed commands need to be kept somewhere (in a log file) until all of the subscribers/replicas can be brought up to speed. This does result in a massive hit to your disk usage, but it also keeps your replicated databases in sync with the master.

Just like with MongoDB, there are times when a SQL Server subscriber may fall so far behind that you need to rebuild the replication. This is never an easy choice, no matter which platform you’re using, and it’s a decision that should not be taken lightly. MongoDB makes this choice a bit easier because MongoDB might very well eat its own oplog. Once that happens, you have no choice but to rebuild replication.

Replication is hard to administer and hard to get right. Be careful and proceed with caution, no matter what your platform.

At Least There is a Ladder

You can climb out of this hole and, realistically, it’s not that bad of a hole. In specific circumstances you may end up in a situation where you will have to take the front end application offline in order to resync your replicas. It’s not the best option, but at least there is a solution.

Every feature has a trade off. Relational databases trade integrity for performance (in this case) whereas MongoDB trades immediate performance for potential maintenance and recovery problems.

Further Reading

MongoDB

SQL Server

A Bit of Troubleshooting

A client recently asked me for help with their SQL Server environment. It seems that replication was running slowly and was getting further and further behind – replication had been turned off during heavy data modification and was turned on after several days.

Protip: This is why it’s important to have a full checklist for everything that you do on a server.

Check Everyone’s Health

When you have a complicated system you want to take a look at everything, not just the symptoms of the problem. This happens in medicine, economics, and manufacturing. Why shouldn’t we do it in the datacenter?

The very first thing I did was take a look at the health of the publication server. That server was running well within normal parameters – there were no readily apparent disk I/O, memory, or CPU problems. Since the distributor lives on the publication, that was covered as well.

On a lark, I checked all of the other subscribers. They were also functioning normally. I did this to make sure that were weren’t seeing glaring performance problems on one subscriber that were really a symptom of a problem with the replication set up.

Everything was healthy… except one subscription.

The Problem Child

Having ruled out an unknown problem on the other servers, I took a look at the rest of the issues on the problem server. I found a few underlying issues and was quickly able to figure out that the poorly performing replication was only a symptom of the problem.

Oh I/O

When I started digging deeper and looked at the wait stats and I/O activity, I was in for a huge shock- there were queries that had been running for close to a day!

Digging deeper, there were two queries that were causing major performance problems. The first was a daily bulk data load. It read from the replicated tables, so if there was going to be heavy contention on those tables, this might be part of the problem. Luckily, the bulk load had been re-written long ago to use small batches so that the transaction log didn’t grow out of control. Rampant transaction log growth had been a huge problem when the server had tiny log drives – the longer running jobs were re-written using a WHILE loop to read blocks of data and produce smaller, explicit transactions. This design also makes it possible to stop and restart the job whenever you want.

I immediately killed the bulk load job and looked into the second query. This was the nightly index maintenance script. It had been happily chugging away for over 24 hours and was chewing through more disk that I thought was possible (probably because I was never awake at two in the morning to watch the job run). Figuring that bad indexes were a better option than thrashing disks, I killed the index defragment query and moved on to the next problem.

My Memory’s Not What It Used To Be

Turns out that the server was running low on memory. This server has two purposes – it’s both an ad hoc reporting server and runs regular reports. As a result, SQL Server Reporting Services was installed and the SQL Server had been configured with a max memory setting of 4GB out of the 8GB available. I dug deeper into the memory and I discovered that over half of SQL Server’s memory structure was being used to manage locking. The rest was going to plan cache and a few other internal structures, but at no point was memory being used as a cache for data. The server’s page life expectancy was effectively 0 – every read was going to disk.

My immediate recommendation was to double the RAM in the server and increase SQL Server’s max memory setting from 4GB to 12GB. As a longer term recommendation, I cautioned my client that they should invest in a new server since this reporting server was 4 years old and well past its expected lifespan.

Back to the I/O Again

As I was wrapping up, the other production servers started having I/O problems. This was right around the same time that business normally picks up for this client. On a lark I said, “Wouldn’t it be great if this was a hardware problem?”

Five minutes later we had great news: it was a hardware problem. One of the power supplies in the SAN had died. Although the SAN had four power supplies, losing a single one caused the SAN to power down the battery backed cache and perform all reads and writes straight from disk. This more than explained the strange I/O we had been seeing on the reporting server. A new power supply was immediately ordered from EMC and the problem was eventually solved.

Wrapping Up

Have a set of canned scripts ready to help you figure out what kind of performance problems you might have on your systems. I started with Glenn Berry’s diagnostic scripts and customized them over time to give me the information that I want to see. If I weren’t so lazy, I would probably make this into something that I could throw into Management Studio’s canned reports with pretty colors to tell me when there was a problem. I’ve also gotten used to scanning over the output and looking for potential problems. Learn which problems are really just symptoms of a bigger issue. It doesn’t do you any good to troubleshoot slow queries only to find out that the SAN is experiencing horrible performance issues.

Testing for Performance

You know that you should be testing your code. You even know that you should be testing your SQL. But why? We need to make sure that changes to our code are safe, prevent regressions, and that we catch edge cases.

But are you testing your code for performance?

Are you testing for performance? You can bet these people are.

Are you testing for performance? You can bet these people are.

Changes to code can make your code faster or slower, depending on indexing as well as user defined functions and built-in functions. Different computations can result in different in different execution plans. If changes to your code can cause drastic changes to your application performance, why aren’t you monitoring the performance of your code?

Test frameworks, like T-SQL Unit, make it possible to wrap the execution of your stored procedures in other processes. By taking advantage of these hooks it’s possible to time the execution of each procedure and record the results in a table (possibly even correlating each run to the appropriate version from source control). You can see how query performance changes over time.

Testing your code is important – you can prevent changes from causing both logical and performance problems.

Upcoming Presentations!

So, it’s only the one presentation, but it is still a presentation.

On October 26th, I’ll be presenting for the Application Development Virtual Chapter. If you weren’t able to attend the Columbus Code Camp, October 26th will be your lucky day! I’ll be revisiting my presentation, Refactoring SQL. Live Meeting will be the only way you can get a hold of this gem, so make sure you’re ready to rock and roll. The party starts at 12PM Eastern, so make sure you’re there on time.

Also, if you’re going to be in the San Francisco Bay area for the first week in November, look me up. I’ll be speaking at three cloud camps out there. Good times.

T-SQL Tuesday – Indexes

arbitrary logo because a world without rules is a world with fun

When Michael J Swart asked me to take part in T-SQL Tuesday #10 – Indexes, I was incredibly flattered. Nobody’s ever asked me to do anything since a cop asked me to stop doing that one thing (speeding). I had to say yes. Here’s my contribution to T-SQL Tuesday #10.

An index is a horological lever-like something or other.

WTF is an index?

My 1967 Children’s World Book Encyclopedia doesn’t provide a definition for an index. That’s probably because it’s an encyclopedia and not a dictionary Thanks to the local gypsy garage sale going on in my neighborhood, I was able to acquire a dictionary from 1934 which defines an index as “a leverlike regulator for a hairspring.” Apparently this has something to do with clocks. If you try to verify this, you’ll notice that it’s a horological term. Apparently that has something to do with clocks and not adult entertainment.

Obviously neither my Children’s World Book Encyclopedia nor my Book of Learnin’ and Such was going to help me define an index.

In the spirit of true science, I realized that I would have to make something up. Or do my homework. One of the two.

SQL Server and a Drought of Options

SQL Server doesn’t have a wealth of indexing options available. In fact, there are fundamentally only two options – indexed and STFU. Sure, there are multiple indexing techniques, but this is my story.

At the core of SQL Server’s indexing strategy is the self-balancing binary search tree or, for those who don’t like to sound like pompous assfaces, the b-tree. B-trees are full of special magic. Well, not really. A b-tree is based on an algorithmic model that strives to keep the height of the search tree (the number of nodes between the root and the leaf nodes) as small as possible. This means that finding any single piece of information should be equally as cheap/expensive as finding any other piece of information.

That sounds great, right? Well, one of the fundamental indexing problems is that the data in the index has to be kept up to date as the data in the table changes. You can’t have data flying around willy nilly.

This is where we get to start sub-typing our b-trees.

Oh, So There ARE Options

Yes, I lied to you. Sort of. SQL Server has multiple types of indexes, just not multiple… types of index. There’s no way to control the indexing mechanism that SQL Server uses, you’re stuck with the pompous assfacery of self-balancing binary search trees (it’s not really a bad thing).

So, getting back to the topic at hand, in SQL Server we do have options. Two of them: clustered and non-clustered indexes.

Cluster This!

A clustered index is an ordered collection of nuts stored for winter and categorized appropriately… or data. One of the two. When we create a clustered index, all of the nuts/data is sorted and written to disk in the appropriate logical sort order. From this point forward, no physical order is guaranteed. Once the clustered index is created and the data is placed in physical order all bets are off. The data will be stored in logical order, but there is a chance that the data may not be physically in order due to the vagaries of disk access, deleting rows and/or tables and other assorted whatnot. (I think the horologists might have something to do with it.)

What goes in a clustered index? Everything. I told you the data was copied in. Didn’t I? Well, it is. Our clustered index is all of the data in the table and it’s in order based on the clustering key value.

The choice of the b-tree algorithm actually has some implications for how we choose the index columns for the clustered index. (I’m going to call these columns the clustering key. That’s what they are. Deal with it.) Because we’re using a B+ tree, instead of another data structure, to store and sort indexed data we have to take a few things into consideration. B+ trees do really well when every indexed value is unique. This is by design – it’s easier to look up any given row if you know that there is only one row that corresponds to that key. If there’s more than one, things get tricky and I’m not going to talk about that.

B+ trees have other characteristics that make them optimal for use in databases – it’s really easy to find any single element in the tree (even compared to other binary tree mechanisms) and they’re wonderfully optimized for read access when data is being read in sequence across ranges of data.

Searching Is Good

Sometimes you won’t know the key to reference a particular piece of data. Sometimes you’ll want to pull back records for everyone who lives in Tuscaloosa, AL. If you haven’t ordered your data by city and state, this is going to be really difficult to find without scanning the entire table. How do we get that data back?

More indexes!

In addition to clustered indexes, which define how the data should be ordered on disk, we also have non-clustered indexes. We can use these guys to help speed up our search. These are really just supplementary b-trees that point back to the clustered b-tree. You do have a clustered index, right?

That’s it?

No. Of course it can get more complicated than all of this, you can have composite indexes with multiple columns, non-unique indexes, indexes with computed columns, and even full-text indexes. Unfortunately the hookers and their lever actuated hairsprings are not going to let me keep talking.

In case you’re wondering what would happen to your database if you didn’t have any indexes, take a look at this:

i can't find any indexes!

Turtles All The Way Down

This has nothing to do with turtles. Just in case you didn’t figure it out, I wanted to make that clear. However, I am really excited to announce that I’m joining Quest Software as a Database Expert. My official job title is still up in the air, but let’s talk a little bit about what I’m going to be doing.

I’m going to stay involved in the community. That’s a big one right there. I’m still going to be on the Board of Directors for PASS. I’m still going to run my local user group (until we have elections and I’m voted off the island). I’m still going to speak at SQL Saturdays and the like. I’m still going to blog about all the crazy messed up things that I do with data. (Have I mentioned that I really like data?)

I’m going to keep solving problems. One of the things that I enjoyed the most about the last two years at Cass Information Systems is that I wasn’t just a DBA. I worked with a great team of developers to solve a variety of problems. Some days I sat in on meetings with them to just give advice based on prior experience. Some days I would be tuning T-SQL. Some days I would be working to help design the optimal solution to a problem using a combination of C# and T-SQL. Solving problems is something that I love. I never want to stop doing it. In fact, I made things change because of the problems that I would be able to solve; these are problems that have been running through my head for a long time.

I love code. I really mean it – I love writing code. There’s nothing like opening up an editor and whipping up some code to solve a problem. I wanted to download a bunch of PDFs, I figured out how. This new position means that I’ll have the opportunity to combine the things that I love – community, problem solving, code, and data – to do some really cool things.

I should stop bolding the first sentence of every paragraph, shouldn’t I?

Let’s make this a bit freaky for you: I’m not going to be working exclusively with SQL Server. In fact, I’m not even going to be working exclusively with RDBMSes.

Did you guess it yet? That’s right: I’m going to be working with cloud databases and NoSQL.

Since you’re reading this right now, you’ve probably noticed that I’ve been writing a lot about MongoDB, other NoSQL databases, as well as PostgreSQL lately. I’m exploring the world around me and writing about what I’m finding out. This is an amazing time to be looking at different ways to store data. I’m incredibly excited by all of this new technology. People notice that.

You too can have fabulous prizes. A lot of people have said it before. Hell, I’m going to be saying it this Saturday in Nashville: show your passion. People notice it. Even if that passion leads you away from the fold, follow it. When you write and speak with passion, people will take notice. When you share what you’ve learned with passion, people will take notice.

I didn’t respond to an ad on craigslist. I’d like to think that this opportunity came about because of the time I spend with you, the community, the time I spend blogging, and the time I spend presenting. These are all things that I love doing and now I’m going to be paid to do them. I’m not sure it gets any better than this….

Finding SQL Agent Jobs Owned By The Wrong Person

Ever have someone leave the company only to find out that they own critical database processes… because that process failed? It hasn’t happened to me, yet, but a recent Active Directory outage this weekend got me thinking about it.

Revisiting Lady MacBeth and Her Torturous Lies

A while back, I wrote up a genius piece of code that would automatically shrink my log files whenever they grew.

Kendra Little (blog | twitter) completely called me out for my horrible, sneaky, developer ways. Ostensibly, I had found a solution for my rampant log growth problem. Unfortunately, I had cured the symptom and not the underlying issue. After growing tired of her savage abuse and criticism via gtalk, I looked for the source of the problem. No, not me. The other source of the problem.

I set up monitoring on the server in question, waited for the appropriate log death window, and then read my report. Before you think I’m using fancy tools that nobody can afford, I set up profiler and perfmon and then merged the results together.

The reports from the single server showed me… nothing, really. There was a lot of I/O and a backup job overlapped with a re-index by about 2 minutes. The logs also didn’t fill up. To be on the safe side, I adjusted the jobs and then sat around making frowny faces for a few minutes. Then I remembered that all of the servers are connected to the same SAN, so set up monitoring on the remaining production servers. An I/O issue on one server could start causing problems on all of the other servers.

This time around, the logs filled up, I received a ton of emails, and I also found out something important: all of my backups and re-indexing operations were running at the same time. My SAN was saturated on I/O throughput which was causing a the backup and re-indexing jobs to run slowly.

To solve the problem I looked at the average job run times and arranged the jobs so that they had much more downtime between them (to account for other issues that could slow down the jobs). This took a bit more effort than I thought just because of SLAs within the company. I also re-wrote the jobs so that the backups and re-indexes could never run at the same time and would, instead, occur in series. Once I had this change in place I waited and watched.

Sure enough, the incredible ever growing log file problem stopped happening (unless I do something dumb like move 30,000,000 rows of data). Moral of the story: make sure that you’re addressing the cause of the problem and not the symptoms.

Using Indexed Views to Replace Triggers

Let’s get this out of the way: I think triggers are cool. I also think triggers have their place, they just need to be used with care. One reason for this is that triggers are synchronous. That is – trigger actions will block a command from returning to the client until the the trigger’s actions have completed.

What’s the solution? Do we run summary jobs on a regular basis? Maybe.

Sometimes summary jobs aren’t enough. Sometimes we have to update summarized data in real time. Let’s break out AdventureWorks and look at an example.

Let’s say we have a report that shows us the total line item sales per customer broken out by year and month.

That’s pretty simple, right?

SELECT  p.BusinessEntityID ,
        p.FirstName ,
        p.LastName ,
        DATEPART(yyyy, soh.OrderDate) AS [Year] ,
        DATEPART(mm, soh.OrderDate) AS [Month] ,
        SUM(sod.LineTotal) AS LineItemTotal
FROM    Person.Person AS p
        INNER JOIN Sales.Customer AS c ON c.PersonID = p.BusinessEntityID
        INNER JOIN Sales.SalesOrderHeader AS soh ON c.CustomerID = soh.CustomerID
        INNER JOIN Sales.SalesOrderDetail AS sod ON soh.SalesOrderID = sod.SalesOrderID
GROUP BY p.BusinessEntityID ,
        p.FirstName ,
        p.LastName ,
        DATEPART(yyyy, soh.OrderDate) ,
        DATEPART(mm, soh.OrderDate)
ORDER BY p.BusinessEntityID ,
        [Year] ,
        [Month] ;

The problem is that this query may use a lot of CPU or disk resources. How can we avoid that? Earlier I mentioned using summary jobs. The problem is that the summary jobs will only run on a schedule and require that we maintain some kind of maintenance history of each row – even if that maintenance history is just a modification timestamp. Wouldn’t it be easier if our summary report just updated itself automagically?

It can! Enter the indexed view.

An indexed view is a regular view that has a clustered index on it. By adding the clustered index the view is persisted to disk. The upside of persisting the indexed view to disk is that there’s an automatic mechanism in place that will update the indexed view. An indexed view must have a unique clustered index so that SQL Server can identify the materialized row that must be updated in response to changes in the underlying data.

CREATE VIEW dbo.my_indexed_view
WITH SCHEMABINDING
AS
  SELECT  p.BusinessEntityID,
          p.FirstName,
          p.LastName,
          DATEPART(yyyy, soh.OrderDate) AS [Year],
          DATEPART(mm, soh.OrderDate) AS [Month],
          SUM(sod.LineTotal) AS LineItemTotal,
          COUNT_BIG(*) AS NumberOfRecords
  FROM    Person.Person AS p
          INNER JOIN Sales.Customer AS c ON c.PersonID = p.BusinessEntityID
          INNER JOIN Sales.SalesOrderHeader AS soh ON c.CustomerID = soh.CustomerID
          INNER JOIN Sales.SalesOrderDetail AS sod ON soh.SalesOrderID = sod.SalesOrderID
  GROUP BY p.BusinessEntityID,
          p.FirstName,
          p.LastName,
          DATEPART(yyyy, soh.OrderDate),
          DATEPART(mm, soh.OrderDate) ;
GO

CREATE UNIQUE CLUSTERED INDEX CX_my_indexed_view ON dbo.my_indexed_view (BusinessEntityID, [Year], [Month]) ;

We could have accomplished this using a summary table and triggers. Triggers are synchronous and updating the summary table will block our data changes from completing. In addition, if we ever want to bypass the trigger from updating (say during a huge data load) we need to disable and then enable the trigger again and then remember to update the summary table. With an indexed view we can just drop the clustered index and then recreate it when we’re done with our load processes.

Note: I can’t take full credit for this question and explanation. A SQL Saturday #42 attendee sent me an email with a question about using indexed views instead of triggers.

This site is protected with Urban Giraffe's plugin 'HTML Purified' and Edward Z. Yang's Powered by HTML Purifier. 226 items have been purified.