MongoDB – Basic Querying

I put together a little video tutorial showing you how to accomplish some basic querying with MongoDB.

Get the Flash Player to see the wordTube Media Player.

Download the sample code. A QuickTime/H.264/iPhone formatted video is also available – download it now.

PostgreSQL Tutorial – Creating the Pagila Schema

In the previous article, we talked about creating your first PostgreSQL database and I went over the syntax of createdb and psql in a bit of detail.

Next up, we’re going to look at creating the structure of the pagila database.

createdb -U jeremiah pagila
psql -U jeremiah -f pagila-schema.sql pagila
psql -U jeremiah -f pagila-insert-data.sql pagila

Working With SQL

Even though SQL is the name of the language that we use to work with data, there are two subsets of SQL that you will hear people refer to: DML and DDL.

DDL

DDL is the common abbreviation for Data Definition Language. This is the name for the part of SQL that is used for creating and modifying database structures – tables, indexes, views, etc. The pagila-schema.sql file is chock full of DDL and is a great introduction to how to create objects in PostgreSQL.

DML

DML stands for Data Manipulation Language. This is the other subset of SQL. This is how we read, update, insert, and delete data. We’ll be talking about DML in the next article.

Tables

Tables are the heart and soul of a relational database – they’re where data is stored. Let’s take a look at creating the actors table:

CREATE TABLE public.actor  (
    actor_id    int4 NOT NULL DEFAULT nextval('actor_actor_id_seq'::regclass),
    first_name  varchar(45) NOT NULL,
    last_name   varchar(45) NOT NULL,
    last_update timestamp NOT NULL DEFAULT now(),
    PRIMARY KEY(actor_id)
)
GO
CREATE INDEX idx_actor_last_name
    ON public.actor(last_name)
GO
CREATE TRIGGER last_updated
     BEFORE UPDATE ON actor FOR EACH ROW
     EXECUTE PROCEDURE last_updated()
GO

CREATE FUNCTION last_updated() RETURNS trigger
    AS $$
BEGIN
    NEW.last_update = CURRENT_TIMESTAMP;
    RETURN NEW;
END $$
    LANGUAGE plpgsql;

What’s going on here? Well, we’re creating a table named actor, for starters (I’ll come back to the sequence in a minute). The actor table contains four columns: actor_id, first_name, last_name, and last_update. The first_name and last_name columns are the simplest, so I’m going to start with them.

Both the first_name and last_name columns store character data – strings of text. These columns store variable length character data that can hold up to 45 characters. Both of these columns have been created as NOT NULL. This means that the columns require a value, even if that value is an empty string. Ideally you will have additional logic in place to prevent empty strings from being stored in the database as meaningful values.

The last_update column holds a date and time. The first time a row is inserted into the actor table, the last_update column will default to the current date and time. There is a trigger on this table, too, that will update last_update with the time of the update. While last_update does not contain any time zone specific information, it is possible for PostgreSQL to store time zone information by using the timestamp with time zone data type.

Finally we come to the actor_id column. This column is used as the primary key – it uniquely identifies any row in the actor table and requires the the values in the row not be null. I’ve omitted the code to create the actor_actor_id_seq sequence, but I’ll summarize. A sequence is similar an IDENTITY column in SQL Server or an AUTOINCREMENT column in MySQL, it just requires manual intervention. Whenever we want to increment the value of a sequence, we need to use the nextval() function to get the next value from the sequence. In the actor table, our primary key is a 4-byte number that will automatically be incremented by 1 every time a new row is added.

Once we have the four main columns created, we create an index idx_actor_last_name. Indexes make it possible for PostgreSQL to rapidly find data in a large table. It’s possible to create an index on more than one column. In this case, though, we’ll be creating our index on just the one column. This makes it easier to find actors by their last name.

The last thing we do is create a trigger that runs after an update. Triggers let us run pieces of code after an event (such as an insert, update, or delete) happens on a table. The last_updated trigger runs a function named last_updated before any row is updated in the database. The last_updated function modifies the NEW database row (a virtual row used in inserts and updates on row-level triggers). This new row will be inserted into the table with a new value in the last_update column.

You’re probably wondering what’s with the ::regclass and the $$ when we created the function. Be patient, everything will be explained in time. Right now it’s more important to focus on the higher level concepts and slowly move into the details as time goes on.

Summary

Creating a basic table is a simple operation, once you know what you want to store in it. You name the table and then supply a list of columns with a data type for each one. This helps PostgreSQL figure out the optimum way to store data on disk.

Finding Cross-Database Dependencies

Ever want to know how many queries are referring to other databases on a server? How about a different server?

Worry no more! I have a query that will help you answer these pesky questions:

SELECT  DB_NAME() AS current_db_name,
        OBJECT_NAME(referencing_id) AS o_name ,
        UPPER(COALESCE(sed.referenced_server_name, '')) AS referenced_server_name,
        sed.referenced_database_name,
        sed.referenced_schema_name,
        sed.referenced_entity_name
FROM    sys.sql_expression_dependencies AS sed
WHERE   referenced_database_name <> DB_NAME()
        AND referenced_database_name <> 'msdb'
ORDER BY UPPER(referenced_server_name) ;

Want to run it across every database on your server? We got that, too.

DECLARE @command AS NVARCHAR(MAX);

SET @command = 'USE ?;

SELECT  DB_NAME() AS current_db_name,
        OBJECT_NAME(referencing_id) AS o_name ,
        UPPER(COALESCE(sed.referenced_server_name, '''')) AS referenced_server_name,
        sed.referenced_database_name,
        sed.referenced_schema_name,
        sed.referenced_entity_name
FROM    sys.sql_expression_dependencies AS sed
WHERE   referenced_database_name <> DB_NAME()
        AND referenced_database_name <> ''msdb''
ORDER BY UPPER(referenced_server_name) ;'

EXEC sys.sp_MSforeachdb @command1 = @command

Update: This only applies to SQL Server 2008. Nothing else. Thanks to an astute reader I have been corrected. Now we’re all smarter. Thanks, John. :)

PostgreSQL Tutorial – Creating Your First Database

Once you’ve installed PostgreSQL (OS X Instructions) (other platforms), you’ll probably want to get started with a new database. Unfortunately PostgreSQL does not include a sample database by default.

You can download the pagila sample database from the pgFoundry (direct download link). Pagila is a simple movie rental database, sort of like what you would get if you ran a video rental store. It is also a port of a sample MySQL database with changes made to take advantage of PostgreSQL’s features – you can read about the differences in the README file. Installing pagila is easy:

createdb -U jeremiah pagila
psql -U jeremiah -f pagila-schema.sql pagila
# There will be a great deal of text here about creating tables and other database
#  objects. You can read through the create file if you'd like, but we're going to be
#  taking a look around so you can hold off for a minute
psql -U jeremiah -f pagila-insert-data.sql pagila
# You should see a lot of output from psql again talking about inserting rows
#  and altering seqvals. Once again, we'll be looking at this data over

Before you blindly copy and paste the above and run it, let’s go over everything that we’re looking at.

createdb

createdb is a command line tool that is installed with your PostgreSQL installation. This is one way to create a new database using PostgreSQL. If you supply the -e command line option (createdb -U jeremiah -e pagila), you can see the commands that are actually being run against PostgreSQL. In this case it’s as simple as CREATE DATABASE pagila;. We need to tell PostgreSQL which user we’re using to connect because only certain users have the appropriate rights to create new databases. This isn’t something that just anyone can do, after all.

psql

psql is a command line tool that is used to connect to your PostgreSQL databases. psql has a number of options, but I’m only going to cover the options that we’re directly using to set things up for the pagila database.

-U

The -U option is used to specify the user name to use to connect to the database instead of the default user. Unless you’ve changed something during installation, the default user is the postgres admin account.

-f [filename]

psql can operate in two modes: an interactive mode and a file input mode. Passing the -f option to psql will use the commands in the file filename. You could use the input redirection operator (<), but using -f has the advantage of processing the file as a single batch and subsequently providing helpful line numbers in the event of an error.

The database

The last parameter we’re passing is the name of the database. If you want to connect to a database on a different host, you would use the -h or --hostname command line flag to change the database host.

createdb syntax
CREATE DATABASE
psql

Some Minutiae About PostgreSQL Databases

PostgreSQL, like many other databases, has a vast set of features. Two of these features, collations and tablespaces, aren’t immediately important for understanding what we’re doing here, but it’s a good thing to know about them for when you get into more advanced scenarios.

Collations and Locales

When you create a new database you can also specify a collation for the database. Collations are used to determine how text is stored and sorted within PostgreSQL. You can set the collation when PostgreSQL is first configured, during database creation, or when connecting to the database. There are a few gotchas that aren’t important here, but you can read about them in the PostgreSQL documentation on Local Support.

Tablespaces

You can also specify the default tablespace when you create a new database. A tablespace is the place where database objects are stored on disk. If you’re familiar with SQL Server, you will know this as a filegroup. This is a more advanced feature and can be used to achieve incredible performance benefits by spreading physical database objects across multiple physical disks. Like collation, we’re not going to worry about tablespaces for now, but it is important to know that the feature is available.

Interviewing (A Gently Stolen Idea)

I totally ripped this idea off. I think that’s okay, I’m giving credit.

The premise is responding to a set of interview questions written thusly:

I like _____________________ .

I love _____________________ .

I want _____________________ .

I was _____________________ .

I am _____________________ .

I will _____________________ .

I think _____________________ .

I know _____________________ .

Answers:

I like making up the plan as I go.

I love summer thunderstorms. Especially late at night.

I want to be remembered for the small things and celebrated for the big ones.

I was born on a pirate ship.

I am never going to be as grown up as they want me to be.

I will not compromise, but I will listen and change.

I think a lot of things are overrated.

I know many things. I don’t know which ones are important.