August 2010
Mon Tue Wed Thu Fri Sat Sun
« Jul   Sep »
 1
2345678
9101112131415
16171819202122
23242526272829
3031  

Month August 2010

They See Things Differently

Every once in a while, I’ll be talking with a friend about presentations and they mention that they really like the visual flow or the artwork or something about the presentation. After I get over being flattered (trust me, it’s easy to do), we’ll start talking about how I came up with the material in the presentation. I’d love to say that the photographs are mine, but they aren’t. There are a few techniques I use to help me make presentations that work well.

Tell Me a Story

When I’m giving a presentation, I try to be acutely aware about how the material flows. I don’t want to take the audience on a wild ride across a bunch of topics. It’s difficult enough to sit still for 60+ minutes and listen to a single topic. But sitting still for over 60 minutes and listening to a variety of topics? That’s nearly impossible.

Instead of trying to cover a bunch of topics, I cover one topic. I break the topic down and I find a logical beginning, middle, and end. Sometimes this part is easy, sometimes it isn’t. The point of the exercise, though, it to find a good way to teach a topic. Kevin Kline (blog | twitter) has a great presentation about the SQL Server internals where he visualizes the presentation from the perspective of a query traveling through SQL Server.

Telling a story makes the presentation more than a series of facts. Telling a story takes a series of facts and gives them a personal connection. Rather than list a bunch of facts, tell your audience how you got somewhere. In a presentation I give on dynamic SQL, I share with the audience how I learned to write good dynamic SQL by showing them examples of bad dynamic SQL. It’s a technique that works well because we’re sharing our embarrassment at the bad code we’ve written and then we learn how to get better. Buck Woody’s (blog | twitter) presentations are so popular because he peppers them with anecdotes. (It probably helps that Buck knows what he’s doing, but let me stick to only one point, okay?) The anecdotes do the same thing – they break make the material relatable.

Let’s Play Word Association

I do a lot of word association when I’m working on presentations. This isn’t some kind of goofy improv theater troupe exercise; it’s how I find great images for my presentations. I’ve been known to spend a lot of time agonizing over a single image to get the message just right.

Symbols are the instruments which convert raw intelligence into culture. Without them, explained Lewis Mumfor, ‘man’s life would be one of immediate appetites, immediate sensations, limited to a past shorter than his own lifetime, at the mercy of a future he could never anticipate, never prepare for. In such a world, out of hearing would be out of reach and out of sight would be out of mind.’

Alan Fletcher – The Art of Looking Sideways

Finding the right symbol to trigger a memory is difficult. There are many different ways to convey an idea, but only one will bring the idea to life. How do you find the image that brings an idea to life? Searching.

In my presentation on SQL Server internals, I use this image to help describe row and index operations. The linear form of the building kind of looks like a table with rows and columns. It doesn’t look exactly like a table might look (you could call that Excel), but it is an image that we’re all familiar with.

Finding Inspiration

The obvious question is “How do you come up with this stuff?”

The unfortunate answer is “I don’t know.”

I spend a decent amount of time looking at art – be it photos, drawings, paintings, whatever. I really enjoy visual communication in all of its forms, so I try to partake every chance I get. I think that has influence the way I find images to use in presentations.

When I’m looking for images, I go through a process of searching through flickr. I don’t necessarily search for funny pictures or pictures with any specific word association. Sometimes I just watch the flickr panda and hope that something interesting and creative commons licensed pops up. This can be a good way to find interesting pictures that set a mood or a tone. Or maybe the image will just continue on whatever theme I’m using through the slide deck. It’s not always important that I pick the right image for any single slide, but that the image fits the presentation as a whole.

Sometimes I’ll even watch the panda when I’m not creating a presentation. I’ll have it up in the background and I’ll save off interesting pictures that I’ve found. The key when I’m doing that is to make sure I have some way to capture the photo’s metadata so I can give credit later. Sometimes it’s easiest to save a text file with the same name as the photo so you know exactly what metadata belongs to each photo. It doesn’t matter how you do it, just make sure you can give credit where credit is due.

Another way I find inspiration is to read a lot of magazines. I don’t necessarily subscribe to them, but I look through them. Advertising is something that we normally bypass when we’re searching for an article we want to read. Advertisers, on the other hand, are trying to get your attention and convey a message with a single image and as few words as possible. Take a look at how advertisers are trying to get your attention. There are a lot of techniques that you can pick up from successful ads without even knowing what you’re doing – composition, layout, the amount of text to use.

There’s inspiration everywhere. Find it and use it.

Turtles All The Way Down

This has nothing to do with turtles. Just in case you didn’t figure it out, I wanted to make that clear. However, I am really excited to announce that I’m joining Quest Software as a Database Expert. My official job title is still up in the air, but let’s talk a little bit about what I’m going to be doing.

I’m going to stay involved in the community. That’s a big one right there. I’m still going to be on the Board of Directors for PASS. I’m still going to run my local user group (until we have elections and I’m voted off the island). I’m still going to speak at SQL Saturdays and the like. I’m still going to blog about all the crazy messed up things that I do with data. (Have I mentioned that I really like data?)

I’m going to keep solving problems. One of the things that I enjoyed the most about the last two years at Cass Information Systems is that I wasn’t just a DBA. I worked with a great team of developers to solve a variety of problems. Some days I sat in on meetings with them to just give advice based on prior experience. Some days I would be tuning T-SQL. Some days I would be working to help design the optimal solution to a problem using a combination of C# and T-SQL. Solving problems is something that I love. I never want to stop doing it. In fact, I made things change because of the problems that I would be able to solve; these are problems that have been running through my head for a long time.

I love code. I really mean it – I love writing code. There’s nothing like opening up an editor and whipping up some code to solve a problem. I wanted to download a bunch of PDFs, I figured out how. This new position means that I’ll have the opportunity to combine the things that I love – community, problem solving, code, and data – to do some really cool things.

I should stop bolding the first sentence of every paragraph, shouldn’t I?

Let’s make this a bit freaky for you: I’m not going to be working exclusively with SQL Server. In fact, I’m not even going to be working exclusively with RDBMSes.

Did you guess it yet? That’s right: I’m going to be working with cloud databases and NoSQL.

Since you’re reading this right now, you’ve probably noticed that I’ve been writing a lot about MongoDB, other NoSQL databases, as well as PostgreSQL lately. I’m exploring the world around me and writing about what I’m finding out. This is an amazing time to be looking at different ways to store data. I’m incredibly excited by all of this new technology. People notice that.

You too can have fabulous prizes. A lot of people have said it before. Hell, I’m going to be saying it this Saturday in Nashville: show your passion. People notice it. Even if that passion leads you away from the fold, follow it. When you write and speak with passion, people will take notice. When you share what you’ve learned with passion, people will take notice.

I didn’t respond to an ad on craigslist. I’d like to think that this opportunity came about because of the time I spend with you, the community, the time I spend blogging, and the time I spend presenting. These are all things that I love doing and now I’m going to be paid to do them. I’m not sure it gets any better than this….

Speaking at SQL Saturday Nashville

- or -

It’s Going to be a Busy Week

This week is going to be an incredibly busy one. On Saturday I’m very honored to be presenting twice at SQL Saturday Nashville.

Fundamentals of SQL Server Internals: Want to know what makes SQL Server tick? Ever wonder what SQL Server is doing when you run a query? Ever wonder which parts of SQL Server are responsible for specific functionality? Want to know what a HOBT is? I can’t promise answers to every question, but I can set you on the path to knowledge about the inner workings of SQL Server.

Taking Control of Your Career: Raises, promotions, and job offers don’t happen by accident; you need a plan. Through careful planning you can create and reach impressive goals. But what’s the point of reaching your goals if nobody notices? “If you build it, they will come” doesn’t apply when you’re building your career.

“But Jeremiah,” you say, “you are clearly a slacker. Getting ready for two presentations can’t be that difficult for someone with as much ego as you!” And you would be correct; getting ready for two presentations isn’t difficult (mainly because I’m amazing). This week I am also travelling to beautiful Nashville for the PASS Board of Directors meeting.

The BoD meetings are always a good time, but they can be a little tiring (as do most meetings). This meeting promises to be a good one. As best as I recall, we’re going to be talking about PASS at a much higher, strategic, level. There will be less “down in the details” discussion and more “head in the clouds” type of talk. It’s good to put aside worrying about nitty gritty details and focus on the future of an organization. Too often we don’t do that.

Finding SQL Agent Jobs Owned By The Wrong Person

Ever have someone leave the company only to find out that they own critical database processes… because that process failed? It hasn’t happened to me, yet, but a recent Active Directory outage this weekend got me thinking about it.

PostgreSQL Tutorial – Inserting Data

In the last two articles, we talked about creating your first PostgreSQL database, the syntax of createdb and psql, and creating tables in the database. Next up, we’re going to look at filling those tables with data.

createdb -U jeremiah pagila
psql -U jeremiah -f pagila-schema.sql pagila
psql -U jeremiah -f pagila-insert-data.sql pagila

Loading Data From A File

We’re going to be looking at the pagila-insert-data.sql file. There is a separate file, pagila-data.sql, but we will not be using it. Both files accomplish the same purpose, but pagila-insert-data.sql uses standard SQL syntax whereas pagila-data.sql uses the COPY command do the bulk of the work. Since our goal is to familiarize ourselves with basic SQL, we’re going to use pagila-insert-data.sql.

Preparation is Key

The first thing you will notice is that there are close to 100 lines of options and comments (comments are start with -- and end at the end of the current line) before we get to the actual meat of the file. This is because we need to set up common settings before we start working. Let’s take a look at these options line by line.

SET client_encoding = 'UTF8';

Strings will be treated as Unicode strings. This makes sure that characters from non-Latin alphabets can easily be inserted into the database.

SET standard_conforming_strings = off;

The standard_conforming_strings setting ensures some backwards compatibility with PostgreSQL behavior – we can use ‘' as an escape character in strings. When this is set to off, PostgreSQL will use the ‘' as an escape character. The goal is to, some day, default this setting to on.

SET check_function_bodies = false;

We are, in effect, recovering from a database dump. By setting check_function_bodies to false we tell PostgreSQL that we want to put off a lot of checks until later because things might not be completely set up at the moment and we’ll accept the risks of doing this, thanks. I would never recommend using this setting in a production environment unless you’re recovering from a data dump. If that’s the case, knock yourself out.

SET client_min_messages = warning;

This controls the amount of messages that we are going to receive from PostgreSQL. Leaving this alone is a good thing but it’s important to know that you can mute all messages from the server if you so desire.

SET escape_string_warning = off;

The escape_string_warning setting actually goes along with the standard_conforming_strings setting and has to do with how PostgreSQL will behave when it encounters a ‘' character rampaging about in our code.

SET search_path = public, pg_catalog;

There’s a history in the PostgreSQL world of omitting the schema qualification from table names in queries. So, what we’re doing here, is telling PostgreSQL that we would like to look for table first in the public database schema and then, second, in the pg_catalog schema. Don’t worry if you don’t know what schemas are all about, I’ll explain them later. Right now you can think of schemas as if they were folders for holding tables, functions, and the like. The search_path variable works just like the PATH variable on your operating system – it’s where PostgreSQL is going to look for functionality.

The Anatomy of an Insert

I wish that I could tell you that there is something complex going on with the pagila inserts. But, unfortunately, they’re about as boring as an insert could possibly be:

INSERT INTO  -- PostgreSQL needs to listen up, we have
             -- data coming in.
actor        -- We're adding data to the actor table,
             -- specifically into the following columns
(actor_id, first_name, last_name, last_update)
VALUES       -- and here is a list of the values that
             -- we are going to insert.
(1, 'PENELOPE', 'GUINESS', '2006-02-15 09:34:33');

That’s all there is to a basic insert statement – we tell PostgreSQL that we want to add a new row to the database in a specific table and then supply a list of columns and values that we would like to add. Similar code happens about several hundred more times in that file.

This is made slightly more interesting by ALTER TABLE actor DISABLE TRIGGER ALL;. This code ensures that any and all triggers on the table actor will not fire while we’re inserting our rows. Why would we want to bypass all of the triggers? Well, we don’t want any extra validation to run, nor do we want a trigger to run that will populate the primary key column of our tables with the value from a sequence – we already have those values and our insert scripts would be much more complicated if they were filled with lookup queries too.

Adding a Brand New Record

Let’s go ahead and add a new movie. Pagila is a movie rental store and we’re not going to be able to keep up with the competition without adding a new movie. Keeping with Pagila’s standard of using nothing but gibberish movie tiles, we’re going to be adding the movie “Poseidon Sundance.”

Of course, creating the movie is not as easy as simply adding the title and walking away. People need to be able to make sure that the movie is acceptable for their tastes, has their favorite actor, or maybe just find out if the film is kid friendly.

INSERT INTO film
-- Here we specify the table where we're going to add the data,
--  and then list the columns where we're going to actually
--  add the data. You don't need to list out all of the
--  columns.
-- P.S. Lines that start with two lines like this are comments.
--  The PostgreSQL command parser will ignore them.
(
  title,
  description,
  release_year,
  language_id,
  original_language_id,
  rental_duration,
  rental_rate,
  length,
  replacement_cost,
  rating
)
VALUES
(
  'POSEIDON SUNDANCE',
  'A thrilling story of a dog and a moose who have to journey across Finland to be reunited with their owners',
  '2006',
  1, -- language_id, this is English
  6, -- original_language_id, this is German
  3, -- rental_duration
  4.99, -- rental_rate
  114, -- length in minutes
  24.99,
  'G'::mpaa_rating
) ;

SELECT * FROM film WHERE title = 'POSEIDON SUNDANCE' ;

-- Ee need to put the film into categories so people can find it.
INSERT INTO film_category
(
  film_id,
  category_id
)
SELECT  film_id ,
        9 -- It's a  foreign film
FROM    film
WHERE   title = 'POSEIDON SUNDANCE' ;
/* This is also a comment. It's a multi-line comment, just like in
   C++, C#, or Java.

   We're combining an INSERT with a SELECT because we need to find out
   the film_id of the movie we just added. There are occasions where
   we could take a different route, but we'll cover those later.
 */

INSERT INTO film_category
(
  film_id,
  category_id
)
SELECT  film_id ,
        8 -- films about animals are almost always family movies
FROM    film
WHERE   title = 'POSEIDON SUNDANCE' ;

-- Add the main actors
INSERT INTO film_actor
(
  actor_id,
  film_id
)
SELECT  71 ,
        film_id
FROM    film
WHERE   title = 'POSEIDON SUNDANCE' ;

INSERT INTO film_actor
(
  actor_id,
  film_id
)
SELECT  123 ,
        film_id
FROM    film
WHERE   title = 'POSEIDON SUNDANCE' ;

INSERT INTO film_actor
(
  actor_id,
  film_id
)
SELECT  97 ,
        film_id
FROM    film
WHERE   title = 'POSEIDON SUNDANCE' ;

INSERT INTO film_actor
(
  actor_id,
  film_id
)
SELECT  49 ,
        film_id
FROM    film
WHERE   title = 'POSEIDON SUNDANCE' ;

INSERT INTO film_actor
(
  actor_id,
  film_id
)
SELECT  3 ,
        film_id
FROM    film
WHERE   title = 'POSEIDON SUNDANCE' ;

Once we’ve created our movie, let’s try to find it in the database. We can do this by writing a simple query. Don’t worry if you don’t understand everything that’s going on, everything is going to be explained in good time.

SELECT  *
FROM    film f
        JOIN film_actor fa ON f.film_id = fa.film_id
WHERE   f.title = 'POSEIDON SUNDANCE' ;

If you’ve done everything right, this should return five rows – one for each actor in the movie.

SELECT  *
FROM    film f
        JOIN film_category fc ON f.film_id = fc.film_id
WHERE   f.title = 'POSEIDON SUNDANCE' ;

And this should only return two rows – one for every category the movie is in.

A Flaw in the Plan

When I was originally writing this tutorial, I didn’t pay any attention to some of the views in the database that use the film table (views are magical virtual tables that we’ll talk about later). I was getting ready to test things and I discovered a big flaw in and I thought I would highlight it here.

SELECT  *
FROM    film_list fl
WHERE   fl.title = 'POSEIDON SUNDANCE' ;

This will return two rows for you. Why?

Well, it turns out whoever wrote the film_list view assumed that each movie will only belong to one category even though it can feasibly belong to as many as we want. There are a few places that we could fix this error. Before reading on see if you can think of how we could fix this.

Fixing the Flaw

There are multiple places we could fix this. The first place would be in the definition of the view. We could change the view to display all of the categories in a comma separated list, just like we’re doing for all of the actors in the film.

The next place that we can fix this is in the data itself. Apart from deleting the extra row, we can do one of two things. We can create a uniqueness requirement (called a unique constraint) on the film_category table so that only one film can be in the table at a time. This is a limiting feature of the application. My preference would be to create a unique constraint on both the film_id and category_id columns. That ensures that we will never have a duplicated row for a film and category combination, but we can still add multiple categories. Combining this with changes to the view fixes this flaw in our database design.

We’ll cover this issue later.

Summary

So, there you have it – inserts are very simple statements. We can insert entirely new data that comes from somewhere outside of the database, we can insert data that’s a mash up of other information in the database, or we can do something in between.

What’s important to take away from this is that it’s very easy to insert information into PostgreSQL. We tell PostgreSQL that we’re going to insert data into a table, list the columns that we’re going to fill with data, and then define the data that we’re going to insert.

MongoDB – Now With Sharding

Version 1.6 of MongoDB was released last week while I was at devLink. What makes this an important point release? Two things: sharding and replica sets.

Sharding is a way of partitioning data. Data will be sharded by a key – this could be based on zip code, customer number, SKU, or any other aspect of the data. Sharding data in MongoDB occurs on a collection by collection basis. If you want related data to be on the same server you’ll need to pick a sharding key that is present in every collection. However, this design decision make sense because some collections may grow faster than others. The point of sharding is, in part, to spread load across multiple servers.

What makes this important? Previously, sharding has been difficult to set up and administer and required custom application code to be written and maintained. More important, though, sharding gives MongoDB the ability to scale across multiple servers with minimal effort.

What would happen if a single node in a set of sharded servers got very busy? Well, MongoDB would detect that one of the nodes is growing faster than the others and it would start balancing the load across the other servers. This might seem like it would violate my earlier statement about how we set MongoDB up to using a sharding key that we define. Here’s the catch: MongoDB only uses that sharding key when we set things up and when there are no problems. If things start getting busy, it will make changes to the sharding key. Those changes get reported throughout the entire cluster of servers and everyone knows where their data is, although nobody outside of the cluster really needs to care.

Replica Sets

Replica sets are a new and improved way to perform data replication in MongoDB. Basically, we set up replication in a cluster of servers. If any single server fails, another server in the replica set will pick up the load. Once we’re able to get the dead server back up and running, the replica set will automatically start up a recovery process and our users will never know that there was an outage.

There can be only one master server at any given time, so this protects us from master server failures. Through the magic of network heartbeats, we can be aware of all of the servers in the replica set. Interestingly, the master server is determined by a priority setting that is assigned to each server. This way, we could use older hardware to serve as a backup (or read-only server) to the master and use faster machines in the replica set to take over from the master in the event of any kind of hardware failure.

How It Works

MongoDB Sharding Diagram

MongoDB Sharding Diagram

Basically, here’s what happens (if you want more details, please see the Sharding Introduction):

  1. The mongos server is a router that makes our complicated MongoDB set up look like a single server to the application.
  2. The mongod config servers maintain the shards. They know where data is stored and will attempt to balance the shards if any single node gets out of whack.
  3. Replica sets provide localized redundancy for each shard key.

Gotchas

There are a few things to be aware when you’re considering sharding with MongoDB:

  1. If a configuration server goes down, you can no longer reallocate data if any shards become write hot spots. This meta-data must be writeable for data to be repartitioned. You can still read and write data, but load will not be distributed.
  2. Choose sharding keys wisely. An overly broad sharding key will do you no good: all data can end up on one node and you will be unable to split the data onto multiple nodes.
  3. Some queries will use multiple shards – make sure you understand data distribution, querying patterns, and potential sharding keys.

Photo Credits

glass litter by psyberartist – Creative Commons Licensed
I thought I saw a puddy cat… by Keven Law – Creative Commons Licensed

Kendra Little at CBusPASS

Hey, good news! Kendra Little (blog | twitter) will be presenting for us, remotely, on August 12th at 6:30 PM. The meeting is at the usual place, Battelle for Kids – 1160 Dublin Rd Suite 500, Columbus, OH 43215.

What is Kendra going to be talking about?

Stay Agile, Stay Sane

Agile software development emphasizes continuous depolyment and its methods do not directly include long term planning. DBAs must ensure data integrity and have a long term view for application scale, so Agile methods present challenges. Come learn about successful real-world practices iteratively developed in a high transaction internet service environment over the last five years. We have create a flourishing Agile shop while meeting high requirements for uptime, customer response, and data consistency. In this session we’ll cover key habits for success, practices to avoid, how and when to get started, and why Agile development can be a great thing for DBAs. Topics will also include how “Testing in Production” can be a huge benefit.

About Kendra

Kendra Little is a Senior DBA in the online advertising industry who has spent ten years nerding out on SQL Server. Kendra works closely with an Agile development team to deploy frequent incremental changes to scale and improve a busy production environment. She likes tuning production servers, developing tools to automate tasks, building SQL Reports for trending, and secretly enjoys writing troubleshooting guides and documentation. Kendra has a Masters degree in Philosophy and a salt shaker full of certifications, but all the best stuff she’s ever learned has come from her smarty-pants colleagues and the SQL Server community. Read her blog at http://littlekendra.com.

Object-Oriented Programming Concepts For DBAs

This all came about when Aaron Nelson (blog | twitter) asked me a few questions about object-oriented concepts. I did the best that I could to answer his questions, but I decided that this needed a real answer that will, I’m hoping, help to bridge the language gap between DBAs and developers.

Note to the reader: Please keep in mind that this is based on the .NET framework and specifically my experience with C#. If you want to highlight differences for your own language of choice in the comments, feel free.

Class

A class is a definition that we’ll be using later when we create objects. It’s a generic blueprint for building something else. It describes all of the properties that an object will have once we create one. In addition, classes also describe all of the behaviors (called methods) that an object will have.

DBAs, you can think of a class as similar to DDL. Just as DDL defines a the rows of a table, a class defines the objects that we’ll create later.

Object

“An object can be said to be an instance of a class.” WTF does that mean?

Well, an object is an instance of a class. That means that any time we create an object, we base if off of some kind of template. The object, whatever it may be, is just an implementation based off of some kind of blueprint. In the .NET world, an object can only be defined by one class.

Once we create an object (using something called a constructor), it’s different from the class. Like a zombie, the object takes on a life of its own. Let’s take a look at an example.

Cars have common properties – they all have four wheels, an engine, and a number of doors that is greater than 0. Individual cars have a lot of differences: different paint and upholstery colors, different trim levels, stereos, engines, transmissions, etc. When you go down to the car dealership (the constructor), you don’t just ask for a generic car. You have to be more specific and tell the dealer that you want a bright red 1978 Lincoln Continental with white wall tires. The dealer will get you the car (the constructor instantiates the object) and you can pimp in style.

Depending on how things are set up, we can change the properties of our object after we’ve created it. Properties can be modified with new values, or we can call methods on the object to make complicated changes to the object’s state.

In a round about way, an object is similar to a row. It’s a single instance of data. This is a bit of a crappy metaphor because a single object can actually contain deep structure that would be difficult to represent in a single row in the database.

Fields

A field is a basic variable – a string or number. It’s the simplest way to store data in an object. You can think of it as roughly akin to a column. You may also hear people refer to fields as member variables, instance variables, or any of a number of terms.

Properties

In the .NET world properties are wrappers around fields. They aren’t quite methods, but they aren’t quite fields. One of the interesting things about properties is that they let us enforce rules or build complex ways to describe an object from a set of simple data points.

public class Person {
  // other nonsense goes here...

  // some fields
  private string _firstName;
  private string _lastName;

  public string FirstName {
    get { return _firstName; }
    set { _firstName = value; }
  }

  public string LastName {
    get { return _lastName; }
    set { _lastName = value; }
  }

  public readonly string FullName {
    get { return _firstName + " " + _lastName; }
  }
}

In the real world, we’d do things far more complex than concatenating first name and last name to make a full name. We’d probably use a comma and put things in the order of last name first. Or maybe we’d calculate order price based on county sales tax and order line items and shipping fees.

An astute reader might notice that properties bear an uncanny similarity to constraints in a database. Let’s look at a different example. A class for a clock might look like this:


public class Clock {
  private int _hours;
  public int Hours {
    get { return _hours; }
    set {
      if (value >= 0 && value <= 23) {
        _hours = value;
      } else {
        throw NotOnPlanetEarthException();
      }
    }
  }
}

And a similar table would look like this:


CREATE TABLE clock (
  [hours] INT CHECK (h BETWEEN 0 AND 23)
);

It’s possible to create deeply nested structures of fields and properties. You can, for example, model a store thusly:


public class Store {
  private List _employees;
  private List _products;

  // I know this isn't how I should really model it. Shut up.
}

And this is all well and good in code. And there is a way to model this in the database that should seem relatively obvious using join tables (employees can work in more than one store and products can be sold in more than one store).


CREATE TABLE stores (
  StoreId INT IDENTITY(1,1)
  -- real implementation left out because I don't want to write it
);

CREATE TABLE Employees (
  EmployeeId INT IDENTITY(1,1)
);

CREATE TABLE Products (
  ProductId INT IDENTITY(1,1)
);

CREATE TABLE StoreEmployees (
  StoreId INT,
  EmployeeId INT
);

CREATE TABLE StoreProducts (
  StoreId INT,
  ProductId INT
);

The problem is that these two models don’t work well together. We call this object-relational impedance mismatch. I wrote about a lot of the problems in an older blog post, O/R-Ms: Panacea or Polio Braces?, so I won’t get into it here. Needless to say, there are some major problems that can arise.

Methods

Methods are the workhorses of object-oriented programming. These are the verbs of programming. A method is a bunch of statements that are executed in order and achieve some result. Methods are roughly analogous to stored procedures or functions (depending on your database).

Inheritance

The programming world would be incredibly painful if we had to keep repeating code every time we want to do the same thing. Inheritance helps us solve this problem, in some cases.

All cars have similar properties – they have engines, doors, wheels, seats, and various other car things. Some cars have different traits – a 1967 Plymouth Fury Station Wagon drives a lot differently than an 2010 Aston Martin DB9.

public class Car {
  // details go here
}

public class StationWagon : Car {
  // more details
}

public class SportsCar : Car {
  // very fast details
}

Inheritance is a way to work with the same root data and behaviors but provide additional specialization. Our SportsCar class might provide a SpinTires or BurnOut method whereas the StationWagon is not capable of performing a burn out.

Another way to think about it is to say that inheritance (and also interfaces) is a way of describing “is a” relationships. A SportsCar is a Car. Anywhere that we can use a Car, we can also use anything else that is a Car. So, we can use a StationWagon in any method that expects a car. But, since a StationWagon isn’t a SportsCar, it won’t work there. See how that works?

If this seems like a huge set of abstractions, you’re absolutely right. A lot of the concepts of object-oriented programming can be implemented using purely procedural code. It’s considerably easier to express some concepts using object-oriented code. There is some performance overhead, but many developers consider that the performance penalty is made up by the ease of development.

Interfaces

The duckroll fully implements IWheeledVehicle

Interfaces are where things get a little bit tricky for some people. An interface is a contract. When we say that a class implements an interface, we’re really saying that the class has all of the methods defined in the interface. An interface is an abstract type that can’t be directly created. Interestingly enough, though, we can create classes that implement an interface and then use the class anywhere that the interface is accepted.

If we create an IWheeledVehicle interface (interfaces in the .NET world are typically named with an I), we can define methods that accept an IWheeledVehicle parameter. We can use any object that provides an implementation of IWheeledVehicle with that method. Even if that object is a duckroll.

In Summation

Object-oriented programming is a strange concept when you first encounter it, but it provides useful abstractions that make it easier for developers to work with data. Understanding how developers might be consuming data is going to make it easier for you to work with them and create amazing applications.

If you’re going to be doing a lot of work with developers for any length of time, it would potentially be a good idea to take a look at a book or two on the language they’re using. You don’t need to become proficient, but you should understand the concepts they’re working with.

Special thanks to Matt Nowack (blog | twitter) and Mike Peschka (blog | twitter) for reviewing this and providing feedback. Also, thanks to Aaron Nelson (blog | twitter) for the original questions that sparked this blog post.

PostgreSQL Tutorial – Referring to Other Tables

Referring to Other Tables

In the previous article you learned how to create a single table. One table does not make for a very interesting database. Since we looked at how the actor table was created, let’s take a look at the most logical follow up: the film table.

The film table

Without going into all of the details of the film table, we do know that the table looks something like this:

CREATE TABLE public.film  (
    film_id                 int4 NOT NULL DEFAULT nextval('film_film_id_seq'::regclass),
    title                   varchar(255) NOT NULL,
    description             text NULL,
    release_year            year NULL,
    language_id             int2 NOT NULL,
    original_language_id    int2 NULL,
    rental_duration         int2 NOT NULL DEFAULT 3,
    rental_rate             numeric(4,2) NOT NULL DEFAULT 4.99,
    length                  int2 NULL,
    replacement_cost        numeric(5,2) NOT NULL DEFAULT 19.99,
    rating                  mpaa_rating NULL DEFAULT 'G'::mpaa_rating,
    last_update             timestamp NOT NULL DEFAULT now(),
    special_features        text NULL,
    fulltext                tsvector NOT NULL,
    PRIMARY KEY(film_id)
)

There are a few features that will be covered later – like enumerations and full text searching – but this is a pretty standard table. The one thing that’s missing, though, is a way to reference the actors in each film. We could do this by adding a number of columns (actor1_id, actor2_id, etc.) but that would become cumbersome and is not an effective way to model data.

The alternative is to create what is commonly called a cross reference table.

Cross Reference Tables

A cross reference table is a simple way to create a cross reference between two objects that have a many-to-many relationship. This is typically done using a two column table – one column for the primary key of each table:

CREATE TABLE public.film_actor  (
    actor_id    int4 NOT NULL,
    film_id     int4 NOT NULL,
    last_update timestamp NOT NULL DEFAULT now(),
    PRIMARY KEY(actor_id,film_id)
)

Here’s the problem: as it stands, we could insert any values into this table. The whole point of the film_actor table is to map valid actors with valid films. Without any restrictions in place, we can’t make sure that any actor or film id in our table isn’t changed to an invalid value.

Enter Referential Integrity

Referential integrity is a way of saying that every value of a column in one table must exist as a value in a column in another table. In other words: every value in the referenced column or columns must exist in the referenced table. PostgreSQL makes it easy for us to enforce referential integrity through the use of foreign keys.

In order to make sure that we only have valid actors in the film_actor table, we’ll create a foreign key:

ALTER TABLE public.film_actor
ADD CONSTRAINT film_actor_actor_id_fkey
FOREIGN KEY (actor_id)
REFERENCES actor(actor_id)
ON UPDATE CASCADE
ON DELETE RESTRICT

That’s a very shorthand way of saying

The table "public.film_actor"
has a foreign key constraint named "film_actor_actor_id"
on the column "actor_id"
which references the column "actor_id" in the table "actor"
  so that every value of "actor_id" in "public.film_actor"
  must also be present in the "actor_id" column
  of the table "public.actor"
if the value of "actor_id" is changed in the "public.actor" table
  we should update all values in "public.fim_actor" that are the same
and if the value of "actor_id" is deleted from the "public.actor" table
  we should prevent that delete from happening if there are rows
  in the "public.film_actor" table with the same value

We’ll create a similar foreign key to enforce referential integrity to the film table:

ALTER TABLE public.film_actor
ADD CONSTRAINT film_actor_film_id_fkey
FOREIGN KEY (film_id)
REFERENCES film(film_id)
ON UPDATE CASCADE
ON DELETE RESTRICT

This foreign key is almost identical to the first one. The only difference between the two are the names of the table being referenced. If you’ve taken a look at the other tables in the database, or if you’ve looked back at the previous article about creating the schema, you may have noticed something else: all of our foreign keys reference the primary key of another table.

Foreign keys must refer to a unique value, this prevents us from creating a foreign key that points to any column. However, since a primary key requires every value in that column be unique, it uniquely identifies the row, we can safely create a foreign key that points to the primary key column.

What if we want to create a foreign key that points to a different column? We can do that too. PostgreSQL has a feature called a unique constraint. This is an internal database feature that makes it possible to ensure that all values in a column, or set of columns, are unique. A unique constraint is actually implemented as a unique index in PostgreSQL, just as it is in many databases.

CREATE UNIQUE INDEX idx_unq_manager_staff_id
ON public.store(manager_staff_id)

This index wouldn’t be a good candidate for a foreign key, but it does illustrate how you can create a unique index that you can reference with a foreign key.

References

Table Design Patterns: Cross-Reference Validation

Would the Fastest Cloud Please Step Forward?

Not so fast, there, buddy.

CloudSleuth has released a new tool that allows users and customers to see how the different cloud providers stack up. As of 6:45 PM EST on August 1, Microsoft’s Azure services come out ahead by about 0.08 seconds.

0.08 seconds per what?

Turns out that it was the time to request and receive the same files from each service provider. Is this a meaningful metric for gauging the speed of cloud service providers? Arguably, yes. There was considerable variance across the different cloud providers, but that is also to be expected since there are different internet usage patterns in different parts of the world.

More importantly, using speed as a metric points out something much more important: As we develop more and more applications in the cloud, we will need new benchmarks to determine the effectiveness of an application. We can no longer just look at raw query performance and raw application response time. We need to consider things like global availability and global response times. In the past, we could get away with storing data on a remote storage server and caching commonly used files in memory on our web servers or in a separate reverse proxy server. Moving to the cloud makes it more difficult to accomplish some of this advanced performance tuning. Arguably, our cloud service providers should be handling these things for us, but that may not be the case.

With the proliferation of cheaper and cheaper cloud based hosting (did you know you can host basic apps for free with heroku?) the problems that used to plague only large customers can now bother smaller customers. As soon as you begin distributing data and applications across data centers, you can run into all kinds of problems. If you are sharding data based on geographic location, you may have problems when a data center goes down. Worse than that, because you’re in the cloud you may not notice it.

This isn’t meant to be fear mongering, I’ve come to the conclusion that the cloud is a great thing and it opens up a lot of exciting options that weren’t available before. But moving applications into the cloud caries some risks and requires new ways of looking at application performance. Latency is no longer as simple as measuring the time to render a page or the time to return a database query. We need to be aware of the differences between different cloud providers’ offerings and why performance on one provider may not directly equate to performance with a different provider.

This site is protected with Urban Giraffe's plugin 'HTML Purified' and Edward Z. Yang's Powered by HTML Purifier. 230 items have been purified.