In case you haven’t heard, there is a huge difference between the way that software developers and database professionals operate – there are accepted use paradigms and development methodologies for both worlds. Unfortunately for everyone, they don’t match up. Object-oriented programming languages operate on single entities: objects. Databases work with sets. Objects sometimes have relationships that are difficult to effectively map in a relational database. Database entities frequently do not even correspond to application entities. How are these two wildly different paradigms supposed to interact?
Object-Relational Mappers, O/R-Ms, are often hailed as the solution to the differences between object-oriented code and the relational storage layer. By simply using some magical tool, library, or bong we can code, map, or smoke our way to software that accounts for the differences between databases and object-oriented software. The thing is: this doesn’t work.
Magical Snake Oil
A lot of developers see the database as nothing more than a hole for objects. You tie a string around them in the application layer, put some pretty ribbons on the string so you can find your object later, and you then chuck it in the hole. When you need it later, you find the pretty ribbon and pull the object back out, right?
Not quite, it appears. First you have to create some kind of mapping so you know which tables are associated with which objects. With some O/R-Ms you have to create mappings in XML. These mappings require maintenance whenever your application-layer or database changes. They are also error prone when you are setting them up. Don’t believe me? Search the web for developers bitching about working with Hibernate and NHibernate mappings. Solving non-trivial mapping problems isn’t easy with these tools.
Isn’t one of the reasons for using O/R-Ms to cut down on the amount of difficult, “specialized” code that you have to write? Instead of writing SQL – a language purpose built to the task of querying relational data from a database – we resort to creating magical XML mappings, write more code using fluent style code with lambda expressions and other advanced language features, or rely on magical automapping to wire up classes directly with tables. Think about it: in order to avoid writing special purpose code, we’re writing special purpose code in languages that aren’t even well suited to the task at hand. If these tools are solving problems, why do so many developers bitch about them and write code that generates their entire O/R-M layer, which is really just code to generate mappings to and from SQL.
Shackled to the Solution
If everything is as horrible as I say, why are people using O/R-Ms? A lot of O/R-M software works. Technically, an AMC Gremlin also works. It has wheels, doors, and an engine. It is suitable for driving you around town. It is not ideal. Likewise, O/R-Ms are suitable to the task of getting at data, but they are not ideal. I hear the arguments all the time, and I’m going to discuss some of the more prominent ones. I feel it’s important to put my biases out there, so keep the following in mind:
- I am a database developer and a production DBA.
- I still work with OO languages, just not as part of my day job.
- I don’t hate O/R-Ms; they have a place. They just aren’t a magical cure-all.
But my database is just an object store
Use a flat file. You don’t need anything more complicated than that. Clearly, since you think of your database as an object store you’re not overly concerned about data integrity, reportability, concurrency, consistency, or any of the other reasons to use a relational database. Hell, use an object-oriented database for all I care. Just stop using RDBMSes and complaining that they don’t meet your needs. Or take a refresher course in everything that an RDBMS provides. I’ll wait.
Putting my vitriol aside, a database is more than an object store. By lowering the database to the level of nothing more than an object store, you’re admitting that you haven’t taken the time to fully understand your business problem or the underlying data. Business problems are, fundamentally, problems related to ever changing data and ways to store and retrieve that data programmatically for human consumption. Objects and classes are ideal for working with and describing systems with different behaviors in a set of related ideas. The OO paradigm works with relatively constant data. I have a Phone base class (I’m not making the obvious IPhone interface joke) with a
call(string number) method. The PayPhone class will implement a different call method than the CellPhone class. This works wonderfully for modeling the behavior of a telephone. In the database, we’re likely to have a table with columns PhoneId, PhoneType, and Number. There’s no actual differentiation between how we’re storing any of these phones. Having an OO paradigm actually complicates the issue.
But I don’t have time to write CRUD stored procedures
You know what? Neither do I. I generate them. SQL Server Management Studio will generate them for me. SSMS Tools Pack adds a context menu option that reads “Generate CRUD.” I could, in a very short amount of time, write T-SQL to generate dynamic CRUD for me that will even only update the values supplied.
People, I hate to break it to you, but if you took the time to understand SQL you could probably write your complex mappings using views and stored procedures faster than you could write your O/R-M XML/fluent/automapping magic code. It’s just as easy to use stored procedures and views as it is to use tables. Trust me, I know this.
You’re just being lazy or scared of SQL.
But my O/R-M uses prepared statements and those are just like stored procedures
I understand that you believe that prepared statements are just like stored procedures because some expert told you so. I’m very happy that you have good friends who have use their free time to selectively quote documentation and educate you about things they don’t understand.
Thing is: prepared statements are not just like stored procedures.
I will admit that with a prepared statement you are buying yourself the ability to have cached queries. But what if your line of business app is one of a hundred apps on a server that is near capacity. Odds are that your prepared statement execution plans aren’t going to stay in memory. My DBA voodoo talk means that your prepared statement (the one that’s “just as good as a stored procedure”) is actually worthless. With stored procedures us DBA voodoo priests can do all kinds of things that you didn’t even know about – we can tell the database which indexes to use. We can force the database to use execution plans that we saved off ages ago. Hell, we can even re-write your table structure into something that suits the relational model a lot better and hide it that re-write through the stored procedures.
But I digress
No, seriously, I was digressing. The point is, developers start making bitchy noises whenever it’s suggested that they do something in SQL. These are the same developers who suggest some new tool/library because it’s faster/better/stronger/more Daft Punk than ever.
There are a lot of problems that O/R-Ms seek to solve. In an ideal world, it would work. In reality, it doesn’t. The instant you start relying on an automated solution to fix everything, you’re losing. I strongly advocate automation. Hell, I’ve automated a ton of my job away already so that I have time to do the things I enjoy: helping our development staff understand what they’re working with and finding the best way to solve the problem. Sometimes that solution has nothing to do with the database. That’s fine by me. But we put that solution where it belongs. We aren’t shoving everything into an OO box because that’s what we know.
The Cure Is Making Us Sick
O/R-Ms make assumptions about data access. Many O/R-Ms assume that you allow access directly to the underlying tables. A lot of them also assume that your tables reside in the default database schema. The thing is, there are a lot of assumptions going on. If you work with the O/R-M library, you’re missing a lot of the built-in features that come with your database out of the box.
Let’s look at an example of a corporate intranet. We have a rule that the HR department data is not visible to the sales department and vice versa. If we’re working purely with an O/R-M, we’re going to have to identify the classes and tables that map to each department and write programming logic to make sure that HR users can’t see sales data and that the sales people can’t see HR data. But, if we go ahead and connect directly to the database with Access or Excel, anyone can go ahead and see these tables.
Working with database features like schemas and security, we can lock down the tables, views, stored procedures, and functions on a schema-by-schema basis. It’s possible to give the sales department access to specific HR information while denying them carte blanche to all HR data.
Something else to take into account is universal data access. If we’re writing all of our data access logic in an O/R-M, we’re going to need to create a robust service layer that everyone can communicate with. Depending on the size of the company, we’ll have to create multiple service layers to make it possible to access our data. Why? All of our logic, all of our rules, all knowledge about the data is embedded in the O/R-M.
Let’s say that we are working in a large corporation. We have some legacy Java applications, some front line apps that were written in PHP, and new development is going on in .NET. Our team has started development and they’re using WCF and an O/R-M to get to the data in the database. We’ve got logic embedded in our data access layer. At the inception of the project we were under the assumption that no other applications would need to interface with our data. Unfortunately, we’ve done such a good job that the other development teams would like to get to our data as well. Java and PHP can’t talk to our WCF service layer. So, we have to write some SOAP services to talk to the Java and PHP code. Any time that we change or add new functionality to our WCF layer we have to add to or change those SOAP services as well. If we had originally gone with data access logic embedded in the database through stored procedures and views, it would have been simple to give the other teams access to our database objects but not the underlying tables. In short, we would have a low-level data access layer in the database that sits just above our data and below our code.
The database provides a universal way to get at the data in the database, agnostic of any platform or programming language.
Data Life, Code Life
Do you remember what programming language you were using five or ten years ago? There’s a pretty good chance that it’s not the same language that you’re using today. Next, think about how old your data is. There’s an even better chance that you’re working with data that is a lot older than five or ten years. Sure, the underlying platform may have changed from one RDBMS vendor to another but the structure of the data has stayed the same and our data access logic, if written in SQL, has probably changed very little from RDMBS to RDBMS.
As time goes on, the data structure will change at a very slow pace. I can’t predict what language we’ll be using in the application layer in five or ten years. However, I can assure you that developers don’t want to maintain a legacy data access layer because too much logic is embedded in the layer to move away from it. Instead, what will happen is that there is going to be an expensive and lengthy conversion project to move from the existing programming language to the new programming language. These conversion projects are frequently hidden in project time frames, but they exist and they add time to your projects – time that could be used to provide features to the users or work on far more interesting projects.
There is no Magic Cure
Data lives longer than code. If you don’t believe me, ask around. The data I’m working with goes back seven years; we’ve been using .NET for 15 months. At the recent PASS Summit I spoke with people who have data that is 30 years old and I’ve heard stories of people working with data older than that. If this ancient data is an RDBMS, there’s a great chance that the application accessing it are written in a variety of languages, some old and some new. They aren’t constant. Developers have come and gone. The data is permanent. There is a universal language for accessing it.
Databases provide features and functionality that we frequently re-implement in application code. It’s incredibly important to consider where we’re placing logic lest we incur technical debt that will take orders of magnitude longer to fix than to develop correctly. Take the time and figure out when you want to pay for that feature. Is moving your release date up by several weeks more important than saving six months of time down the road?
As knowledge workers, we should always be acquiring more knowledge. Knowledge comes through deepening our current understanding of a subject as welling as adding breadth to our experience and expertise. That’s right, we’re supposed to keep learning new programming languages and increasing our knowledge of the ones we already know. Heck, The Pragmatic Programmer suggests learning a new programming language every year to stay on top of your game.
While you are busy learning the hot new languages, or revisiting existing languages, you’re missing something: SQL is a language. By refusing to learn SQL apart from simple SELECT statements, you are refusing to leverage the most powerful piece of software at your disposal. I understand that set-based programming is different from the type of programming that you’re used to, but it is a perfect fit for the problem domain: working with data.
A Working Example
MySpace is arguably the biggest pile of insanity to ever be unleashed upon the internet, barring twitter. And yet, somehow it keeps running. Why? Because they know when to leverage the power of the database. They’ve learned that there is some functionality that can be handled in the data tier and they use a native solution to solve the problem.
When MySpace were facing scalability problems as the site grew larger, they made use of SQL Server’s Service broker to handle communication between multiple scale-out servers. Using SQL Server Service Broker helped them ensure that transactions were atomic across multiple servers. They built a custom tool – Service Dispatcher – that build on the technology already provided by SQL Server Service Broker
They took the time to understand their tools.
It’s Not All Bad
Lest you think that I despise O/R-Ms and I’m some kind of horrible data zealot, let me assure you of a few things:
- I am a data zealot.
- I like some O/R-Ms
A good O/R-M makes it very apparent what it does and where its limitations are. Just like a good language. T-SQL is a good language. I know where the limitations are. Anyone who uses it for a while knows what the limitations are. That’s okay; we know what it’s good at.
Likewise, when I’m writing Ruby on Rails code, I know where the limitations of ActiveRecord are. It makes them very apparent to me and it gives me ways to drop down into native SQL and it behaves just as well. When I did .NET development for a living, we used LLBLGen Pro. It, like ActiveRecord, does some very intelligent things. I always knew the limits of LLBLGen.
A poor O/R-M solution tries to be everything to everyone. It supplies pseudo-languages to query the database that are, really, bizarre subsets of SQL with slightly non-SQL like syntax that make it just confusing enough.
But What About…
LINQ is a really cool idea. I haven’t worked with it enough to make a judgment call on it. Aesthetically speaking, I think it uglies up code with a giant chunk o’ nasty that looks like it doesn’t belong. But, that being said, it also offers a way to query data into an internal memory representation of what appears to be a set and then join that to existing collections of objects. Pretty cool, eh?
If we’re talking about the Ruby O/R-M framework, I think it’s pretty good. The limitations are clearly plotted out and yet it’s flexible enough (thanks to Ruby metaprogramming magic) to handle on the fly data projections and custom querying.
ActiveRecord works well when you’re working within the context it knows: table per entity data models. Once you move from that paradigm, ActiveRecord stops being quite as helpful. On a previous project, in a different life as a consultant, we were building what amounted to a master data management system with Ruby on Rails. As the data model grew in complexity to account for the vagaries of legacy applications, we had to abandon ActiveRecord on multiple occasions and work with straight SQL to retrieve the objects the way we needed. The limitations of the OO approach became readily apparent we moved outside of the OO paradigm and started working with relational data.
Are we talking the Active Record design pattern? In that case I can’t really comment. It’s a design pattern. If you implement it right, it works wonders. If you implement it poorly… Well, if you implement it poorly I’m likely to tell you that your code smells bad and then explain why.
Nothing against NHibernate, I’m sure some people find it useful. I don’t.
There is a great deal of complexity in NHibernate. So much so, in fact, that I will do everything I can do avoid writing a speck of code for it.
Sure, you can tell me that I don’t have to write the XML mapping files and I can use Fluent NHibernate. I’ve done that. I’m still writing code to map from classes to tables. I can accomplish the same thing with boilerplate code and T4 to generate it.
But what about AutoMapper? AutoMapper solves simple problems. When you start creating complex mappings it starts getting trickier and you have to resort to writing the mappings yourself.
But what about Visual NHibernate? It’s a hot new tool that lets developers graphically design their table to object mappings in a GUI and then create the mapping files at the push of a button. Let me get this straight – you’re telling me that I should learn a tool to make my O/R-M tool easy to use because writing SQL is too hard/slow/requires sobriety? Let’s think about that for a minute:
- SQL is hard. I’ll grant you that it’s not easy. Neither is programming. Look at IoC containers and dependency injection.
- Objects aren’t sets. I’ll grant you this one, too. A collection of objects bears no resemblance to a set.
- We have a mismatch going on.
- An O/R-M solves this problem (NHibernate, for the sake of example).
- NHibernate mapping is painful.
- Let’s use a tool because mapping is difficult and error prone.
- The tool isn’t always right so we have to go back and fix our mappings.
In order to ease the pain of mapping SQL statements to objects I have to learn two new technologies to obfuscate away the data persistence layer instead of learning how to correctly access data in the first place.
I trust you can sense my incredulity about the merits of this scenario and I’m going to move on.
Other O/R-M Goes Here
I don’t know, unless I’ve listed it in this article I’ve probably never used it. But I see statements like this one: “I use (your O/R-M goes here) and it solves more problems than it creates.”
When you’re using a tool and you’re saying “Meh, it hasn’t killed my cat, yet,” I think it’s time that you rethink your strategy.
Other Language Goes Here
I’m going to use Ruby as an example, because I know it and I love it. Ruby is a class based language. Just check out the code you write for a basic Ruby on Rails project:
class Post < ActiveRecord::Base end
ZOMG! It’s a class! But, thanks to RubyMagic, Ruby is able to change the Post class at run time to look more like this:
class Post < ActiveRecord::Base attr_accessor :id, :title, :body, :created_at, :author end
Shazam! It’s like I wrote it myself, but I didn’t and I didn’t have to map anything. Time for Fruity Pebbles.
When you get down to brass tacks, it’s only because of Ruby’s magical metaprogramming that a lot of the things I love about ActiveRecord are possible. Thankfully, the DLR should make a lot of this pseudo-magic readily available in the .NET Framework with the upcoming release of .NET 4.0. I don’t know about any other programming languages and how they intend (or if they intend) to make any kind of magical dynamic whizbangery available. The point is, dynamic whizbangery is neat. Dynamic whizbangery, when used properly, goes from being neat to being a productive tool that saves everyone a ton of time and effort.
I’ve never used something like db4o, so I can’t comment on them directly. But what I can tell you is this: There is no way to easily query data from an OODB. It requires complicated logic that is embedded in your data access libraries and, probably, is not available outside of your OO programming language. This effectively prevents the business users from hooking up Access or Excel to your OODB to run ad hoc reports. Instead you’ll have to write a complicated mapping process that pushes data on a regular basis from the OODB to an RDBMS for reporting purposes. You may end up doing this if you allow live ad hoc reporting against your production RDBMS. If you’re storing your data in a relational database, you can take advantage of SQL (which you’re probably already using) to move the data from the live production system to the reporting system.
OODBs do offer a seamless mapping between your code and your storage tier. There’s nothing to change because you’re storing native objects on disk. There is no O/R-M to hassle with. As I mentioned above, you’re going to need to create some kind of mechanism to export the data. db4o does contain functionality to do this. Or it claimed to the last time I looked into it. But, there are inherent difficulties in mapping OO code to relational storage mechanisms. Hell, that’s all we’ve been talking about for the last two thousand words.
The Philosopher’s Stone
I really wish that I had a great solution to this problem, but the fact is that you need to take the long view about your project. Are you willing to use an O/R-M and trade off your initial development speed for problems down the road? You’ll notice that I’m not naming any specific problems. Why? Because the problems you’re going to run into are, without a doubt, very different from the problems I’ve run into. They depend on the application you’re developing, the library you’re working with, and your experience with everything involved (the other developers, the codebase, the O/R-M, and your data model).
On the other hand, you could work with SQL and a thin abstraction layer and trade off initial development speed for a consistent long-term development pace. What does this require? Your developers will need to understand your OO language of choice, the data storage model, and SQL. That seems like a lot to ask but that’s how we do things at work. The developers who work with me on a daily basis understand the underlying data model. They understand .NET development. They know how to write SQL. Not just simple SQL, but they’re capable of writing fairly complex SQL. They didn’t show up on day one with these skills – we all had to learn. I know OO development well enough, I know SQL very well, and I learned the data model.
Commitment to doing your job well requires that you understand all facets of your work – the business, the data, and the applications that access it.
Note: Any time I mention a specific technology or library, I’m not trying to knock it, I’m just pointing out an issue I’ve encountered.
Martin Fowler’s Patterns of Enterprise Application Architecture
Phil Haack’s Why Store Your Data in a Relational Databse
Object-Relational Mapping Wikipedia FTW!