Tag AppDev

Object-Oriented Programming Concepts For DBAs

This all came about when Aaron Nelson (blog | twitter) asked me a few questions about object-oriented concepts. I did the best that I could to answer his questions, but I decided that this needed a real answer that will, I’m hoping, help to bridge the language gap between DBAs and developers.

Note to the reader: Please keep in mind that this is based on the .NET framework and specifically my experience with C#. If you want to highlight differences for your own language of choice in the comments, feel free.

Class

A class is a definition that we’ll be using later when we create objects. It’s a generic blueprint for building something else. It describes all of the properties that an object will have once we create one. In addition, classes also describe all of the behaviors (called methods) that an object will have.

DBAs, you can think of a class as similar to DDL. Just as DDL defines a the rows of a table, a class defines the objects that we’ll create later.

Object

“An object can be said to be an instance of a class.” WTF does that mean?

Well, an object is an instance of a class. That means that any time we create an object, we base if off of some kind of template. The object, whatever it may be, is just an implementation based off of some kind of blueprint. In the .NET world, an object can only be defined by one class.

Once we create an object (using something called a constructor), it’s different from the class. Like a zombie, the object takes on a life of its own. Let’s take a look at an example.

Cars have common properties – they all have four wheels, an engine, and a number of doors that is greater than 0. Individual cars have a lot of differences: different paint and upholstery colors, different trim levels, stereos, engines, transmissions, etc. When you go down to the car dealership (the constructor), you don’t just ask for a generic car. You have to be more specific and tell the dealer that you want a bright red 1978 Lincoln Continental with white wall tires. The dealer will get you the car (the constructor instantiates the object) and you can pimp in style.

Depending on how things are set up, we can change the properties of our object after we’ve created it. Properties can be modified with new values, or we can call methods on the object to make complicated changes to the object’s state.

In a round about way, an object is similar to a row. It’s a single instance of data. This is a bit of a crappy metaphor because a single object can actually contain deep structure that would be difficult to represent in a single row in the database.

Fields

A field is a basic variable – a string or number. It’s the simplest way to store data in an object. You can think of it as roughly akin to a column. You may also hear people refer to fields as member variables, instance variables, or any of a number of terms.

Properties

In the .NET world properties are wrappers around fields. They aren’t quite methods, but they aren’t quite fields. One of the interesting things about properties is that they let us enforce rules or build complex ways to describe an object from a set of simple data points.

public class Person {
  // other nonsense goes here...

  // some fields
  private string _firstName;
  private string _lastName;

  public string FirstName {
    get { return _firstName; }
    set { _firstName = value; }
  }

  public string LastName {
    get { return _lastName; }
    set { _lastName = value; }
  }

  public readonly string FullName {
    get { return _firstName + " " + _lastName; }
  }
}

In the real world, we’d do things far more complex than concatenating first name and last name to make a full name. We’d probably use a comma and put things in the order of last name first. Or maybe we’d calculate order price based on county sales tax and order line items and shipping fees.

An astute reader might notice that properties bear an uncanny similarity to constraints in a database. Let’s look at a different example. A class for a clock might look like this:


public class Clock {
  private int _hours;
  public int Hours {
    get { return _hours; }
    set {
      if (value >= 0 && value <= 23) {
        _hours = value;
      } else {
        throw NotOnPlanetEarthException();
      }
    }
  }
}

And a similar table would look like this:


CREATE TABLE clock (
  [hours] INT CHECK (h BETWEEN 0 AND 23)
);

It’s possible to create deeply nested structures of fields and properties. You can, for example, model a store thusly:


public class Store {
  private List _employees;
  private List _products;

  // I know this isn't how I should really model it. Shut up.
}

And this is all well and good in code. And there is a way to model this in the database that should seem relatively obvious using join tables (employees can work in more than one store and products can be sold in more than one store).


CREATE TABLE stores (
  StoreId INT IDENTITY(1,1)
  -- real implementation left out because I don't want to write it
);

CREATE TABLE Employees (
  EmployeeId INT IDENTITY(1,1)
);

CREATE TABLE Products (
  ProductId INT IDENTITY(1,1)
);

CREATE TABLE StoreEmployees (
  StoreId INT,
  EmployeeId INT
);

CREATE TABLE StoreProducts (
  StoreId INT,
  ProductId INT
);

The problem is that these two models don’t work well together. We call this object-relational impedance mismatch. I wrote about a lot of the problems in an older blog post, O/R-Ms: Panacea or Polio Braces?, so I won’t get into it here. Needless to say, there are some major problems that can arise.

Methods

Methods are the workhorses of object-oriented programming. These are the verbs of programming. A method is a bunch of statements that are executed in order and achieve some result. Methods are roughly analogous to stored procedures or functions (depending on your database).

Inheritance

The programming world would be incredibly painful if we had to keep repeating code every time we want to do the same thing. Inheritance helps us solve this problem, in some cases.

All cars have similar properties – they have engines, doors, wheels, seats, and various other car things. Some cars have different traits – a 1967 Plymouth Fury Station Wagon drives a lot differently than an 2010 Aston Martin DB9.

public class Car {
  // details go here
}

public class StationWagon : Car {
  // more details
}

public class SportsCar : Car {
  // very fast details
}

Inheritance is a way to work with the same root data and behaviors but provide additional specialization. Our SportsCar class might provide a SpinTires or BurnOut method whereas the StationWagon is not capable of performing a burn out.

Another way to think about it is to say that inheritance (and also interfaces) is a way of describing “is a” relationships. A SportsCar is a Car. Anywhere that we can use a Car, we can also use anything else that is a Car. So, we can use a StationWagon in any method that expects a car. But, since a StationWagon isn’t a SportsCar, it won’t work there. See how that works?

If this seems like a huge set of abstractions, you’re absolutely right. A lot of the concepts of object-oriented programming can be implemented using purely procedural code. It’s considerably easier to express some concepts using object-oriented code. There is some performance overhead, but many developers consider that the performance penalty is made up by the ease of development.

Interfaces

The duckroll fully implements IWheeledVehicle

Interfaces are where things get a little bit tricky for some people. An interface is a contract. When we say that a class implements an interface, we’re really saying that the class has all of the methods defined in the interface. An interface is an abstract type that can’t be directly created. Interestingly enough, though, we can create classes that implement an interface and then use the class anywhere that the interface is accepted.

If we create an IWheeledVehicle interface (interfaces in the .NET world are typically named with an I), we can define methods that accept an IWheeledVehicle parameter. We can use any object that provides an implementation of IWheeledVehicle with that method. Even if that object is a duckroll.

In Summation

Object-oriented programming is a strange concept when you first encounter it, but it provides useful abstractions that make it easier for developers to work with data. Understanding how developers might be consuming data is going to make it easier for you to work with them and create amazing applications.

If you’re going to be doing a lot of work with developers for any length of time, it would potentially be a good idea to take a look at a book or two on the language they’re using. You don’t need to become proficient, but you should understand the concepts they’re working with.

Special thanks to Matt Nowack (blog | twitter) and Mike Peschka (blog | twitter) for reviewing this and providing feedback. Also, thanks to Aaron Nelson (blog | twitter) for the original questions that sparked this blog post.

Shrink, Damn’d Log! Shrink, I Say!

Ever have a database log file grow to epic proportions on the weekend? I have. Ever forget to set up job monitoring on that server? I have. Ever get so pissed off that you decided to write a job to automatically shrink the log files whenever they grow? I have.

T-SQL Tuesday – “Did he just say that?” edition

As in, I didn’t participate in the most recent T-SQL Tuesday about my favorite feature in SQL 2008 R2.

Want to know my favorite 2008R2 features? PostgreSQL 9.0 and MongoDB.

PostgreSQL and MongoDB are rapidly advancing features that solve my daily problems. I’m an OLTP guy. Honestly, I don’t care about the latest reporting ding dongs and doo dads. I already know PowerShell. I manage 6 production servers and we’re unlikely to grow, so these MDM and Utility Control Points don’t make me giddy with excitement.

I solve problems using SQL.

You know what makes me happy? Support for window functions or better yet, improved support for window functions. What about exclusion constraints? Or column level triggers so I don’t have to use branching logic in my triggers? Yes, I use triggers. Or any other of these features.

What about MongoDB? I’ve just started playing with it, but it solves a lot of the problems I face at work. Not just in the day job, but in side projects as well. I’ve bitched about O/R-Ms before, but one of the biggest problems that users of O/R-Ms (developers) face is that the ideal way to model data for object-oriented programming bears no resemblance to the ideal way to store relational data. A recent article about scaling Farmville hints at this – the developers of Farmville managed scale by storing everything in a key-value store (memcached) before persisting to a relational data store later. Digg does something similar with Cassandra. It’s not like these guys are idiots, the blog posts from Digg show that they know their stuff.

MongoDB lets me solve these problems. I can control the frequency of syncs to disk (just as I can in PostgreSQL) to improve raw write performance to memory. I only have to worry about storing data the way my application expects to see the data – arrays and hashes – without worrying about building many-to-many join tables.

What about DMVs and data integrity and a write-ahead log and indexes? MongoDB has instrumentation and indexes. Yeah, you sacrifice some durability but many applications don’t need that. Hell, Amazon has even designed their systems to account for the potential of failure.

When I start my next application, I’m going to look long and hard at the platform I’m building on. There’s a good chance it’s not going to be a relational database and a really good chance it’s not going to be using SQL Server. It’s not because I have a problem with SQL Server or even RDBMSes, but because there are other choices that give me the flexibility I need to solve the problems I’m facing.

This is nothing against SQL Server 2008 R2, it’s a great step forward in the direction that SQL Server seems to be going. Sometimes I wonder if SQL Server and I are on the same road.

Push Ups and String Concatenation

SQL sucks at string manipulation. It’s not just SQL Server, it’s the SQL language in general. I’ve mentioned this before on Stack Overflow in answer to the question “What are five things you hate about your favorite language?” It’s almost easier to return the raw rows to application code than it is to work with them inside of a database. There’s one place where SQL Server has made it easier to work with string data – string building.

My MacGyver Moment

David Stein started this current blog meme. He passed the buck on to Brent Ozar, who shared a horrifying tale of his time in the trenches as a developer. Brent then thoughtfully pointed at me and demanded that I carry on the blog meme torch.

Blog memes are great. They give me an opportunity to pretend to be inventive and creative, take someone else’s great idea and spin it in a centrifuge to extract the good stuff and then pass the detritus along to someone else to deal with. As a career developer turned developer berating apparatus, I have seen my fair share of cobbled together dung heaps. In fact, I would be proud to have someone call some of my solutions a cobbled together dung heap.

Bad Process? Just Add Software

Many years ago I was asked to put together a simple proof of concept application showing code movement using VSS automation. The phrase “simple proof of concept” should have sent me running. Of course, I’m assuming that the me of 5 years ago had any sense in his head.

I’ll back up a minute and let you see the whole situation. This organization had a complex development lifecycle. Developers were frequently developing several releases into the future in a combination of major and minor releases. Hot fixes would go out in production and when the next major release went out the icon be cornflower blue but the hot fix would be nowhere to be found. The hot fix would be hot fixed into production and the icon would be back to alice blue. The situation was, in short, a nightmare. What makes the nightmare worse was that it was a process nightmare: development practices in this organization dictated that multiple releases be developed simultaneously by separate teams.

Think about complex, line of business, software. Important software. Software that is core to your business. Imagine that three separate teams of developers are actively developing new features and fixes at the same time in three separate source trees. This is exactly what was going on at this organization. My job was to automate the problem away with software.

Bad Software? Just Add More Software

This organization was heavily invested in using Visual Source Safe. For those of you young enough to have worked with modern version control systems (such as a team of scribes working around the clock), let me tell you about VSS. VSS exists entirely on the client computer. VSS’s source control mojo operates through a network share using a combination of black magic and pure luck. Like all things that work with luck, it fails. It is safer to print your source code, shred it, and burn the paper shreds that to keep your code in VSS.

Despite the pain of VSS, VSS was the platform. How do you automate a platform when there is no server-side component? You install a VSS client on a server and you begin automating it. Thankfully VSS supplied an API – a poorly documented COM+ API.

I created a proof of concept application that showed code motion through various stages of development in VSS. The client was happy. The client signed a statement of work. We took their money and strode purposefully towards what appeared to be a set of rotating blades.

As the project unfolded it became very apparent that no amount of software could solve this problem. The problem was process, not software. Valiantly, foolishly, 26ishly, I soldiered forward. I worked 14 hour days. I build a cobbled together GUI that attempted to model a software development process that was beyond complicated. The GUI was cumbersome to use and brittle to code. A small change in one line of code meant that 17 other bugs showed up in various parts of the program. There were easily 100,000 lines of code in what should have been a simple management application. Edge cases lead to other edge cases. After one marathon 24 hour programming session (after four 18 hour days) the GUI tool finally worked. The users could configure a process. A process that potentially didn’t work.

Throwing Worse Code After Bad

The actual automation was a supposed to be easy. The users could apply VSS labels and the automated software would, in theory, pick up the changes within 15 minutes. The automated software would pick up the changes on a regular basis but it frequently exploded in a fiery error when confronted with real world problems. When two teams are actively developing against the same code base, code merges don’t go very well.

The process was intrinsically human but the client wanted automation. Being young and stupid, I didn’t know enough to say no. I cobbled together a working knowledge of text parsing and attempted out outfox the people who wrote the VSS Merge Tool. Armed with a fistful of academic papers, articles, and APIs I attempted to build a better merge tool using existing libraries.

At the end of the day, the tool built a log of errors using the unified diff format (which no human can read), collected the log and emailed a copy of it to the developers who committed the conflicting files. Or, it would have had anyone used the software.

Didn’t You Read the Manual

I actually wrote a software manual. It was actually a good manual. Want to know how I know? I gave the manual to the project manager and had him create instructional videos. They worked.

The project sponsor was promoted before the project was completed. The primary users transferred to another department. The new users were too busy to read the manual and expected the software to work easily and obviously. Who wouldn’t, right?

The GUI tool looked like the directory structure in VSS, but it didn’t reflect the changes in VSS, it simply let the users configure how the tool should behave. Sound confusing? It was.

At the end of the day, I had written a complicated manual for a complex piece of software that actually worked. It was a cobbled together solution built out of several hundred thousand lines of buggy .NET 1.1 code sitting on top of a buggy COM+ API intended to solve a buggy process. It’s a miracle that this software abomination wasn’t fueled by the shattered dreams of orphans.

Silent Rage Would Be Better Than This

I said that was the end of the day. It wasn’t. I got one support call about the product about 3 months after it had been delivered, signed off, and paid for. I went back to the client’s location and sat down with the current team to learn their problems. None of them knew how to use the software. No original team members were around who had been involved in the original project. All of the original documentation had been lost in some kind of bit rot tar pit.

My beautiful, cobbled together, duct tape and baling wire solution would never see the light of day. It may have been ugly, but it worked. There’s nothing like feeling the triumph of having saved the day and then having that day get flushed right down the toilet.

Calling Other MacGyvers

Since Brent Ozar stole my default victim, Matt Nowack, I’m going to have to find more victims. I’m going to point the finger of shame at TJay Belt, Grant Fritchey, and Alex Moore.

Rounding to the Nearest X Minutes, the Lazy Way

A lot of people use calendar tables. I’ve blogged about it before. They’re incredibly helpful. Now, have you ever needed a table of minutes?

You’re probably asking, “Jeremiah, why the heck would I ever need a table of minutes?” Well, dear reader, I’m going to tell you that. Please stop interrupting.

Let’s say you have a report. This report shows the sum of sales per 5 minute increment. You could do a lot of trickery with math to make this report happen, doing things like this to get the 5 minute interval:

SELECT
    (DATEPART(mi, CAST('2009-01-01 00:01:00' as datetime))+4) / 5 * 5 AS [05]

It works, but it’s ugly as sin. I said to myself, “Self, there has to be a better way.” Turns out that there is a better way: table valued functions!

I created a table valued function to return 60 rows, one for each minute. It also rounds to the nearest 5, 10, 15, and 30 minute intervals. This makes it possible to change the reporting interval very easily by using a join. Check it out:

SELECT DATEPART(hh, s.SaleTime) AS TheHour,
  mt.Five AS NearestFive,
  SUM(s.SalesAmount) AS AmountSold
FROM dbo.Sales AS s
INNER JOIN dbo.MinutesTable() AS mt ON DATEPART(mi, s.SaleTime) = mt.[Minute]
WHERE -- something is true
GROUP BY DATEPART(hh, s.SaleTime), mt.Five

Why do I call this the lazy way to do this? Because now that I’ve written it once I never have to do it again. I can add new columns to dbo.MinutesTable() without having to worry about breaking anything or copying code incorrectly from one query to another.

Here’s the code to create the function:

CREATE FUNCTION dbo.MinutesTable ()
RETURNS @minutes TABLE (
  [Minute] TINYINT,
  Five TINYINT,
  Ten TINYINT,
  Fifteen TINYINT,
  Thirty TINYINT
)
AS BEGIN

INSERT INTO @minutes VALUES (0,60,60,60,60)
                            (1,5,10,15,30),
                            (2,5,10,15,30),
                            /* you get the gist of it */
                            (59,60,60,60,60);

RETURN ;

END

Update: Thanks to Brad Schulz this has gone for a single post about my laziness to an example of a refactoring you can make. If you ever see a table being generated like this, you can take a look at it and determine how you can change it into an inline select. The query optimizer is going to do something completely different for each plan. With the inline select, SQL Server is able to determine that there are 60 rows in our table and can build a much more efficient execution plan.

CREATE FUNCTION [dbo].[MinutesTable] ()
RETURNS TABLE
AS 

RETURN
  SELECT  0 AS [Minute], 60 AS Five, 50 AS Ten, 60 AS Fifteen, 60 AS Thirty
  UNION ALL
  SELECT  1, 5, 10, 15, 30
  UNION ALL
  /* ... this still makes sense, right? ... */
  UNION ALL
  SELECT  59, 60, 60, 60, 60 ;
GO

A Simple Refactoring – Avoiding Table Scans

Refactoring SQL code can be pretty easy. Just like refactoring any other programming language, sometimes you have to look around to find the culprit.

We have a slow running query that frequently times out when run by the users. A quick look at the query told me what was wrong. I found this clause in the middle of an otherwise simple query:

        LEFT JOIN (
                   SELECT DISTINCT
                          key,
                          column_a
                   FROM   ImportantData
                   WHERE  FieldName = 'VALUES'
                   UNION ALL
                   SELECT DISTINCT
                          key,
                          column_a
                   FROM   ImportantData_History
                   WHERE  FieldName = 'VALUES'
                  ) AS t ON other.key = t.key
        LEFT JOIN (
                   SELECT DISTINCT
                          key,
                          column_b
                   FROM   ImportantData
                   WHERE  FieldName = 'VALUESXY'
                   UNION ALL
                   SELECT DISTINCT
                          key,
                          column_b
                   FROM   ImportantData_History
                   WHERE  FieldName = 'VALUESXY'
                  ) AS t2 ON other.key = t2.key

The problem with this query is that we’re reading each table twice for the same data. I checked in the execution plan and, sure enough, SQL Server was performing two scans on the underlying data. After some careful thought I realized that I could accomplish the same thing with a single query:

        LEFT JOIN (
                   SELECT key,
                          VALUES AS column_a,
                          VALUESXY AS column_b
                   FROM   (
                           SELECT DISTINCT
                                  key,
                                  FieldName,
                                  CASE WHEN FieldName = 'VALUES'
                                       THEN column_a
                                       WHEN FieldName = 'VALUESXY'
                                       THEN column_b
                                  END AS content
                           FROM   ImportantData
                           WHERE  FieldName = 'VALUES'
                                  OR FieldName = 'VALUESXY'
                           UNION ALL
                           SELECT DISTINCT
                                  key,
                                  FieldName,
                                  CASE WHEN FieldName = 'VALUES'
                                       THEN column_a
                                       WHEN FieldName = 'VALUESXY'
                                       THEN column_b
                                  END AS content
                           FROM   ImportantData_History
                           WHERE  FieldName = 'VALUES'
                                  OR FieldName = 'VALUESXY'
                          ) AS source PIVOT( MAX(CONTENT) FOR FieldName IN ([VALUES],
                                                              [VALUESXY]) ) AS pvt
                  ) AS t ON other.key= t.key

The upside to this unreadable pile of junk is that it scans each table once and only once. This reduces the load on disk, the amount of data that needs to be read into memory, and it most likely reduces the amount of data stored in memory. And, while this only reduced the estimated query cost by 40 (580 down to 540), we all know that the query cost is only one factor in what actually affects the overall run time of a query.

Many readers are probably mortified that there’s a SELECT DISTINCT in this query. I am too. That being said, I plucked the low-hanging fruit and managed to get a huge performance boost as a result – query execution time went from over 1 minute to 14 seconds. With an improvement like that, removing that distinct isn’t on the radar any more.

When is a Lookup not a Lookup?

Execution plans are great things. They give us an insight into how SQL Server is putting together queries and why they run slowly.

One immediate thing I look for in an execution plan is a Key Lookup. In a Key Lookup operation, SQL Server has to reference the clustered index on the table because a value it’s looking for is not present in the index that was used to find the row.

Normally, when you look at the Key Lookup, you will see a list of output columns. This is a great way to help you modify your indexes and make sure that you can avoid these extra disk hits by sacrificing a tiny bit of storage space.

A Normal Output List from a Query Plan

Today, while troubleshooting a query that is never finishing, I ran across this gem.

Empty. Bereft of Meaning. Nothing.

There’s no output list in the Key Lookup. I, wrongly, thought that Key Lookups would always include an output list. What’s going on here?

Well, here’s the scoop. I have the following, sample, table:

CREATE TABLE dbo.ServiceList (
  ServiceListId BIGINT NOT NULL PRIMARY KEY,
  BillId BIGINT NOT NULL,
  ServiceMonth DATETIME
);

In this query, ServiceList is referenced through a CROSS APPLY’d inline function. The inline function references the BillId and ServiceListId. There’s an index on ServiceList that contains both of these columns:

CREATE INDEX IX_ServiceList_BillId ON dbo.ServiceList
(
  BillId
)
INCLUDE (ServiceListId, ServiceMonth);

This index is correctly being used by the query optimizer but in the past it’s only been used to perform lookups and push data to screen (or somewhere). So, what we’re seeing here makes sense. The CROSS APPLY needs to reference the BillId so the compiler checks the situation out and decides to use IX_ServiceList_BillId in order get the BillId. Since the index only includes ServiceListId, it has to perform a Key Lookup.

Protip: included data is only included in the index; it is not indexed.

To avoid this kind of behavior, I need to change this index to actually index on the combination of BillId and ServiceListId like so:

CREATE INDEX IX_ServiceList_BillId ON dbo.ServiceList
(
  BillId,
  ServiceListId
)
INCLUDE (ServiceMonth);

O/R-Ms: Panacea or Polio Braces?

In case you haven’t heard, there is a huge difference between the way that software developers and database professionals operate – there are accepted use paradigms and development methodologies for both worlds. Unfortunately for everyone, they don’t match up. Object-oriented programming languages operate on single entities: objects. Databases work with sets. Objects sometimes have relationships that are difficult to effectively map in a relational database. Database entities frequently do not even correspond to application entities. How are these two wildly different paradigms supposed to interact?

Object-Relational Mappers, O/R-Ms, are often hailed as the solution to the differences between object-oriented code and the relational storage layer. By simply using some magical tool, library, or bong we can code, map, or smoke our way to software that accounts for the differences between databases and object-oriented software. The thing is: this doesn’t work.

Magical Snake Oil

A lot of developers see the database as nothing more than a hole for objects. You tie a string around them in the application layer, put some pretty ribbons on the string so you can find your object later, and you then chuck it in the hole. When you need it later, you find the pretty ribbon and pull the object back out, right?

Not quite, it appears. First you have to create some kind of mapping so you know which tables are associated with which objects. With some O/R-Ms you have to create mappings in XML. These mappings require maintenance whenever your application-layer or database changes. They are also error prone when you are setting them up. Don’t believe me? Search the web for developers bitching about working with Hibernate and NHibernate mappings. Solving non-trivial mapping problems isn’t easy with these tools.

Isn’t one of the reasons for using O/R-Ms to cut down on the amount of difficult, “specialized” code that you have to write? Instead of writing SQL – a language purpose built to the task of querying relational data from a database – we resort to creating magical XML mappings, write more code using fluent style code with lambda expressions and other advanced language features, or rely on magical automapping to wire up classes directly with tables. Think about it: in order to avoid writing special purpose code, we’re writing special purpose code in languages that aren’t even well suited to the task at hand. If these tools are solving problems, why do so many developers bitch about them and write code that generates their entire O/R-M layer, which is really just code to generate mappings to and from SQL.

Shackled to the Solution

If everything is as horrible as I say, why are people using O/R-Ms? A lot of O/R-M software works. Technically, an AMC Gremlin also works. It has wheels, doors, and an engine. It is suitable for driving you around town. It is not ideal. Likewise, O/R-Ms are suitable to the task of getting at data, but they are not ideal. I hear the arguments all the time, and I’m going to discuss some of the more prominent ones. I feel it’s important to put my biases out there, so keep the following in mind:

  1. I am a database developer and a production DBA.
  2. I still work with OO languages, just not as part of my day job.
  3. I don’t hate O/R-Ms; they have a place. They just aren’t a magical cure-all.

But my database is just an object store

Use a flat file. You don’t need anything more complicated than that. Clearly, since you think of your database as an object store you’re not overly concerned about data integrity, reportability, concurrency, consistency, or any of the other reasons to use a relational database. Hell, use an object-oriented database for all I care. Just stop using RDBMSes and complaining that they don’t meet your needs. Or take a refresher course in everything that an RDBMS provides. I’ll wait.

Putting my vitriol aside, a database is more than an object store. By lowering the database to the level of nothing more than an object store, you’re admitting that you haven’t taken the time to fully understand your business problem or the underlying data. Business problems are, fundamentally, problems related to ever changing data and ways to store and retrieve that data programmatically for human consumption. Objects and classes are ideal for working with and describing systems with different behaviors in a set of related ideas. The OO paradigm works with relatively constant data. I have a Phone base class (I’m not making the obvious IPhone interface joke) with a call(string number) method. The PayPhone class will implement a different call method than the CellPhone class. This works wonderfully for modeling the behavior of a telephone. In the database, we’re likely to have a table with columns PhoneId, PhoneType, and Number. There’s no actual differentiation between how we’re storing any of these phones. Having an OO paradigm actually complicates the issue.

But I don’t have time to write CRUD stored procedures

You know what? Neither do I. I generate them. SQL Server Management Studio will generate them for me. SSMS Tools Pack adds a context menu option that reads “Generate CRUD.” I could, in a very short amount of time, write T-SQL to generate dynamic CRUD for me that will even only update the values supplied.

People, I hate to break it to you, but if you took the time to understand SQL you could probably write your complex mappings using views and stored procedures faster than you could write your O/R-M XML/fluent/automapping magic code. It’s just as easy to use stored procedures and views as it is to use tables. Trust me, I know this.

You’re just being lazy or scared of SQL.

But my O/R-M uses prepared statements and those are just like stored procedures

I understand that you believe that prepared statements are just like stored procedures because some expert told you so. I’m very happy that you have good friends who have use their free time to selectively quote documentation and educate you about things they don’t understand.

Thing is: prepared statements are not just like stored procedures.

I will admit that with a prepared statement you are buying yourself the ability to have cached queries. But what if your line of business app is one of a hundred apps on a server that is near capacity. Odds are that your prepared statement execution plans aren’t going to stay in memory. My DBA voodoo talk means that your prepared statement (the one that’s “just as good as a stored procedure”) is actually worthless. With stored procedures us DBA voodoo priests can do all kinds of things that you didn’t even know about – we can tell the database which indexes to use. We can force the database to use execution plans that we saved off ages ago. Hell, we can even re-write your table structure into something that suits the relational model a lot better and hide it that re-write through the stored procedures.

But I digress

No, seriously, I was digressing. The point is, developers start making bitchy noises whenever it’s suggested that they do something in SQL. These are the same developers who suggest some new tool/library because it’s faster/better/stronger/more Daft Punk than ever.

There are a lot of problems that O/R-Ms seek to solve. In an ideal world, it would work. In reality, it doesn’t. The instant you start relying on an automated solution to fix everything, you’re losing. I strongly advocate automation. Hell, I’ve automated a ton of my job away already so that I have time to do the things I enjoy: helping our development staff understand what they’re working with and finding the best way to solve the problem. Sometimes that solution has nothing to do with the database. That’s fine by me. But we put that solution where it belongs. We aren’t shoving everything into an OO box because that’s what we know.

The Cure Is Making Us Sick

O/R-Ms make assumptions about data access. Many O/R-Ms assume that you allow access directly to the underlying tables. A lot of them also assume that your tables reside in the default database schema. The thing is, there are a lot of assumptions going on. If you work with the O/R-M library, you’re missing a lot of the built-in features that come with your database out of the box.

Security

Let’s look at an example of a corporate intranet. We have a rule that the HR department data is not visible to the sales department and vice versa. If we’re working purely with an O/R-M, we’re going to have to identify the classes and tables that map to each department and write programming logic to make sure that HR users can’t see sales data and that the sales people can’t see HR data. But, if we go ahead and connect directly to the database with Access or Excel, anyone can go ahead and see these tables.

Working with database features like schemas and security, we can lock down the tables, views, stored procedures, and functions on a schema-by-schema basis. It’s possible to give the sales department access to specific HR information while denying them carte blanche to all HR data.

Data Access

Something else to take into account is universal data access. If we’re writing all of our data access logic in an O/R-M, we’re going to need to create a robust service layer that everyone can communicate with. Depending on the size of the company, we’ll have to create multiple service layers to make it possible to access our data. Why? All of our logic, all of our rules, all knowledge about the data is embedded in the O/R-M.

Let’s say that we are working in a large corporation. We have some legacy Java applications, some front line apps that were written in PHP, and new development is going on in .NET. Our team has started development and they’re using WCF and an O/R-M to get to the data in the database. We’ve got logic embedded in our data access layer. At the inception of the project we were under the assumption that no other applications would need to interface with our data. Unfortunately, we’ve done such a good job that the other development teams would like to get to our data as well. Java and PHP can’t talk to our WCF service layer. So, we have to write some SOAP services to talk to the Java and PHP code. Any time that we change or add new functionality to our WCF layer we have to add to or change those SOAP services as well. If we had originally gone with data access logic embedded in the database through stored procedures and views, it would have been simple to give the other teams access to our database objects but not the underlying tables. In short, we would have a low-level data access layer in the database that sits just above our data and below our code.

The database provides a universal way to get at the data in the database, agnostic of any platform or programming language.

Data Life, Code Life

Do you remember what programming language you were using five or ten years ago? There’s a pretty good chance that it’s not the same language that you’re using today. Next, think about how old your data is. There’s an even better chance that you’re working with data that is a lot older than five or ten years. Sure, the underlying platform may have changed from one RDBMS vendor to another but the structure of the data has stayed the same and our data access logic, if written in SQL, has probably changed very little from RDMBS to RDBMS.

As time goes on, the data structure will change at a very slow pace. I can’t predict what language we’ll be using in the application layer in five or ten years. However, I can assure you that developers don’t want to maintain a legacy data access layer because too much logic is embedded in the layer to move away from it. Instead, what will happen is that there is going to be an expensive and lengthy conversion project to move from the existing programming language to the new programming language. These conversion projects are frequently hidden in project time frames, but they exist and they add time to your projects – time that could be used to provide features to the users or work on far more interesting projects.

There is no Magic Cure

Data lives longer than code. If you don’t believe me, ask around. The data I’m working with goes back seven years; we’ve been using .NET for 15 months. At the recent PASS Summit I spoke with people who have data that is 30 years old and I’ve heard stories of people working with data older than that. If this ancient data is an RDBMS, there’s a great chance that the application accessing it are written in a variety of languages, some old and some new. They aren’t constant. Developers have come and gone. The data is permanent. There is a universal language for accessing it.

Databases provide features and functionality that we frequently re-implement in application code. It’s incredibly important to consider where we’re placing logic lest we incur technical debt that will take orders of magnitude longer to fix than to develop correctly. Take the time and figure out when you want to pay for that feature. Is moving your release date up by several weeks more important than saving six months of time down the road?

Always Learning

As knowledge workers, we should always be acquiring more knowledge. Knowledge comes through deepening our current understanding of a subject as welling as adding breadth to our experience and expertise. That’s right, we’re supposed to keep learning new programming languages and increasing our knowledge of the ones we already know. Heck, The Pragmatic Programmer suggests learning a new programming language every year to stay on top of your game.

While you are busy learning the hot new languages, or revisiting existing languages, you’re missing something: SQL is a language. By refusing to learn SQL apart from simple SELECT statements, you are refusing to leverage the most powerful piece of software at your disposal. I understand that set-based programming is different from the type of programming that you’re used to, but it is a perfect fit for the problem domain: working with data.

A Working Example

MySpace is arguably the biggest pile of insanity to ever be unleashed upon the internet, barring twitter. And yet, somehow it keeps running. Why? Because they know when to leverage the power of the database. They’ve learned that there is some functionality that can be handled in the data tier and they use a native solution to solve the problem.

When MySpace were facing scalability problems as the site grew larger, they made use of SQL Server’s Service broker to handle communication between multiple scale-out servers. Using SQL Server Service Broker helped them ensure that transactions were atomic across multiple servers. They built a custom tool – Service Dispatcher – that build on the technology already provided by SQL Server Service Broker

They took the time to understand their tools.

It’s Not All Bad

Lest you think that I despise O/R-Ms and I’m some kind of horrible data zealot, let me assure you of a few things:

  1. I am a data zealot.
  2. I like some O/R-Ms

A good O/R-M makes it very apparent what it does and where its limitations are. Just like a good language. T-SQL is a good language. I know where the limitations are. Anyone who uses it for a while knows what the limitations are. That’s okay; we know what it’s good at.

Likewise, when I’m writing Ruby on Rails code, I know where the limitations of ActiveRecord are. It makes them very apparent to me and it gives me ways to drop down into native SQL and it behaves just as well. When I did .NET development for a living, we used LLBLGen Pro. It, like ActiveRecord, does some very intelligent things. I always knew the limits of LLBLGen.

A poor O/R-M solution tries to be everything to everyone. It supplies pseudo-languages to query the database that are, really, bizarre subsets of SQL with slightly non-SQL like syntax that make it just confusing enough.

But What About…

LINQ

LINQ is a really cool idea. I haven’t worked with it enough to make a judgment call on it. Aesthetically speaking, I think it uglies up code with a giant chunk o’ nasty that looks like it doesn’t belong. But, that being said, it also offers a way to query data into an internal memory representation of what appears to be a set and then join that to existing collections of objects. Pretty cool, eh?

Active Record

If we’re talking about the Ruby O/R-M framework, I think it’s pretty good. The limitations are clearly plotted out and yet it’s flexible enough (thanks to Ruby metaprogramming magic) to handle on the fly data projections and custom querying.

ActiveRecord works well when you’re working within the context it knows: table per entity data models. Once you move from that paradigm, ActiveRecord stops being quite as helpful. On a previous project, in a different life as a consultant, we were building what amounted to a master data management system with Ruby on Rails. As the data model grew in complexity to account for the vagaries of legacy applications, we had to abandon ActiveRecord on multiple occasions and work with straight SQL to retrieve the objects the way we needed. The limitations of the OO approach became readily apparent we moved outside of the OO paradigm and started working with relational data.

Are we talking the Active Record design pattern? In that case I can’t really comment. It’s a design pattern. If you implement it right, it works wonders. If you implement it poorly… Well, if you implement it poorly I’m likely to tell you that your code smells bad and then explain why.

NHibernate

Nothing against NHibernate, I’m sure some people find it useful. I don’t.

There is a great deal of complexity in NHibernate. So much so, in fact, that I will do everything I can do avoid writing a speck of code for it.

Sure, you can tell me that I don’t have to write the XML mapping files and I can use Fluent NHibernate. I’ve done that. I’m still writing code to map from classes to tables. I can accomplish the same thing with boilerplate code and T4 to generate it.

But what about AutoMapper? AutoMapper solves simple problems. When you start creating complex mappings it starts getting trickier and you have to resort to writing the mappings yourself.

But what about Visual NHibernate? It’s a hot new tool that lets developers graphically design their table to object mappings in a GUI and then create the mapping files at the push of a button. Let me get this straight – you’re telling me that I should learn a tool to make my O/R-M tool easy to use because writing SQL is too hard/slow/requires sobriety? Let’s think about that for a minute:

  1. SQL is hard. I’ll grant you that it’s not easy. Neither is programming. Look at IoC containers and dependency injection.
  2. Objects aren’t sets. I’ll grant you this one, too. A collection of objects bears no resemblance to a set.
  3. We have a mismatch going on.
  4. An O/R-M solves this problem (NHibernate, for the sake of example).
  5. NHibernate mapping is painful.
  6. Let’s use a tool because mapping is difficult and error prone.
  7. The tool isn’t always right so we have to go back and fix our mappings.

In order to ease the pain of mapping SQL statements to objects I have to learn two new technologies to obfuscate away the data persistence layer instead of learning how to correctly access data in the first place.

I trust you can sense my incredulity about the merits of this scenario and I’m going to move on.

Other O/R-M Goes Here

I don’t know, unless I’ve listed it in this article I’ve probably never used it. But I see statements like this one: “I use (your O/R-M goes here) and it solves more problems than it creates.”

When you’re using a tool and you’re saying “Meh, it hasn’t killed my cat, yet,” I think it’s time that you rethink your strategy.

Other Language Goes Here

I’m going to use Ruby as an example, because I know it and I love it. Ruby is a class based language. Just check out the code you write for a basic Ruby on Rails project:

class Post < ActiveRecord::Base
end

ZOMG! It’s a class! But, thanks to RubyMagic, Ruby is able to change the Post class at run time to look more like this:

class Post < ActiveRecord::Base
  attr_accessor :id, :title, :body, :created_at, :author
end

Shazam! It’s like I wrote it myself, but I didn’t and I didn’t have to map anything. Time for Fruity Pebbles.

When you get down to brass tacks, it’s only because of Ruby’s magical metaprogramming that a lot of the things I love about ActiveRecord are possible. Thankfully, the DLR should make a lot of this pseudo-magic readily available in the .NET Framework with the upcoming release of .NET 4.0. I don’t know about any other programming languages and how they intend (or if they intend) to make any kind of magical dynamic whizbangery available. The point is, dynamic whizbangery is neat. Dynamic whizbangery, when used properly, goes from being neat to being a productive tool that saves everyone a ton of time and effort.

Object-Oriented Databases

I’ve never used something like db4o, so I can’t comment on them directly. But what I can tell you is this: There is no way to easily query data from an OODB. It requires complicated logic that is embedded in your data access libraries and, probably, is not available outside of your OO programming language. This effectively prevents the business users from hooking up Access or Excel to your OODB to run ad hoc reports. Instead you’ll have to write a complicated mapping process that pushes data on a regular basis from the OODB to an RDBMS for reporting purposes. You may end up doing this if you allow live ad hoc reporting against your production RDBMS. If you’re storing your data in a relational database, you can take advantage of SQL (which you’re probably already using) to move the data from the live production system to the reporting system.

OODBs do offer a seamless mapping between your code and your storage tier. There’s nothing to change because you’re storing native objects on disk. There is no O/R-M to hassle with. As I mentioned above, you’re going to need to create some kind of mechanism to export the data. db4o does contain functionality to do this. Or it claimed to the last time I looked into it. But, there are inherent difficulties in mapping OO code to relational storage mechanisms. Hell, that’s all we’ve been talking about for the last two thousand words.

The Philosopher’s Stone

There is no single solution. No alchemical, magickal, panacea or philosopher’s stone.

I really wish that I had a great solution to this problem, but the fact is that you need to take the long view about your project. Are you willing to use an O/R-M and trade off your initial development speed for problems down the road? You’ll notice that I’m not naming any specific problems. Why? Because the problems you’re going to run into are, without a doubt, very different from the problems I’ve run into. They depend on the application you’re developing, the library you’re working with, and your experience with everything involved (the other developers, the codebase, the O/R-M, and your data model).

On the other hand, you could work with SQL and a thin abstraction layer and trade off initial development speed for a consistent long-term development pace. What does this require? Your developers will need to understand your OO language of choice, the data storage model, and SQL. That seems like a lot to ask but that’s how we do things at work. The developers who work with me on a daily basis understand the underlying data model. They understand .NET development. They know how to write SQL. Not just simple SQL, but they’re capable of writing fairly complex SQL. They didn’t show up on day one with these skills – we all had to learn. I know OO development well enough, I know SQL very well, and I learned the data model.

Commitment to doing your job well requires that you understand all facets of your work – the business, the data, and the applications that access it.

Note: Any time I mention a specific technology or library, I’m not trying to knock it, I’m just pointing out an issue I’ve encountered.

References

We all owe a lot of thanks to Ted Neward for his great essay The Vietnam of Computer Science and his follow up Thoughts on Vietnam commentary

Martin Fowler’s Patterns of Enterprise Application Architecture

Phil Haack’s Why Store Your Data in a Relational Databse

Jeff Atwood’s Object-Relational Mapping is the Vietnam of Computer Science

Object-Relational Mapping Wikipedia FTW!

How Do You Use SQL Server

Grant Fritchey took it upon himself to tag me in his most recent blog post about how we use SQL Server. The best part about this is I was just hired Senior DBA, so I can write with the unbridled enthusiasm of someone who is showing off a new car.

Here’s a little bit of background – my employer provides freight, utility, and telecom expense management services. In short, we pay your bills, we hand key your bills, we figure out how to EDI your bills – we do everything we can to process as many bills as possible every second of every day of the year. We were also a bank – expenses can be justified but you better have a damn good reason for wanting to spend 27k per CPU for Enterprise Edition vs 7k per CPU for Standard Edition. My production servers all run Standard Edition.

In a way, though, I’m very lucky. I have a homogenous environment. From development to production, everything is running on SQL Server 2008 sitting on top of Windows Server 2008. Every server gets patched on the same day.

On to the reason why you came here: how we actually use SQL Server.

Most people have an application tier that sits on top of some kind of data access layer that eventually talks to an RDBMS through some kind of query generation engine. The application tier makes decisions and does most of the work and talks to the database to get a result set and then pukes that into a grid on a screen somewhere. We don’t really have that.

Long before I arrived a decision was made to treat SQL Server as an application platform. SQL Server is the core of our business. Before someone gets all up in arms, let me explain.

When I said that we process a large quantity of bills I want you to think massive piles of paper bills being hand keyed by an army of people. Bills are being typed in, scanned in, sent in via EDI, pigeon, and Star Trek teleporter at all hours of the day. We are doing everything we can to process as much data as we can. We do batch processing. We do single line item processing. We do all kinds of crazy transformations on the data.

Guess where that all happens. That’s right: in the database. Sure, we have an application layer that sits on top of the database, but it accesses the database through stored procedures and views. Yes, there is logic in the stored procedures. I am very strongly of the opinion that universal data logic belongs in the database. We make heavy use of stored procedures, constraints, views, indexed views, summary jobs… all the traditional aspects of RDMBS development. We’re leveraging SQLCLR to keep some heavy lifting in SQL Server but move it into the CLR where it belongs and also to give us the ability to create high performance ad hoc reporting capabilities.

Things are very much in their infancy right now, I’m the first full time DBA at this company. Are things smooth right now? Not always. We’ve had runaway processes bring a production machine to its knees for hours. Our SAN isn’t configured optimally and I/O waits account for 1/3 of the time waits on all of our production machines.

Data is still coming into the database from a legacy mainframe application written in COBOL. The monitoring is still largely a collection of hand written scripts that were thrown together by myself and the previous DBA – my manager. On the bright side, both my boss and his boss were DBAs at one point. They understand my pain points and they’re willing to give me the time to get things up and running the way they should be.

In a few years, we’re replacing it with a brand new .NET application. We’ll still be using stored procedures for data access and business logic, but we’ll be expanding to use even more SQLCLR. We’ll also be picking up SQL 2008 R2 or maybe even SQL11 by that point and leveraging new technology like StreamInsight and Master Data Management. Hopefully I’ll be implementing things that were new in 2005/2008 but haven’t had the opportunity to use like Service Broker, Change Data Capture, and Performance Data Collector.

To me, this is heaven. SQL Server is in the place it should be – at the core of the business.

Victims, victims… I need victims… I’m going to tag Tim Benninghoff and Mladen Prajdić.

This site is protected with Urban Giraffe's plugin 'HTML Purified' and Edward Z. Yang's Powered by HTML Purifier. 219 items have been purified.