PostgreSQL Tutorial – Creating the Pagila Schema

In the previous article, we talked about creating your first PostgreSQL database and I went over the syntax of createdb and psql in a bit of detail.

Next up, we’re going to look at creating the structure of the pagila database.

createdb -U jeremiah pagila
psql -U jeremiah -f pagila-schema.sql pagila 
psql -U jeremiah -f pagila-insert-data.sql pagila

Working With SQL

Even though SQL is the name of the language that we use to work with data, there are two subsets of SQL that you will hear people refer to: DML and DDL.

DDL

DDL is the common abbreviation for Data Definition Language. This is the name for the part of SQL that is used for creating and modifying database structures – tables, indexes, views, etc. The pagila-schema.sql file is chock full of DDL and is a great introduction to how to create objects in PostgreSQL.

DML

DML stands for Data Manipulation Language. This is the other subset of SQL. This is how we read, update, insert, and delete data. We’ll be talking about DML in the next article.

Tables

Tables are the heart and soul of a relational database – they’re where data is stored. Let’s take a look at creating the actors table:

CREATE TABLE public.actor  ( 
    actor_id    int4 NOT NULL DEFAULT nextval('actor_actor_id_seq'::regclass),
    first_name  varchar(45) NOT NULL,
    last_name   varchar(45) NOT NULL,
    last_update timestamp NOT NULL DEFAULT now(),
    PRIMARY KEY(actor_id)
)
GO
CREATE INDEX idx_actor_last_name
    ON public.actor(last_name)
GO
CREATE TRIGGER last_updated
     BEFORE UPDATE ON actor FOR EACH ROW
     EXECUTE PROCEDURE last_updated()
GO

CREATE FUNCTION last_updated() RETURNS trigger
    AS $$
BEGIN
    NEW.last_update = CURRENT_TIMESTAMP;
    RETURN NEW;
END $$
    LANGUAGE plpgsql;

What’s going on here? Well, we’re creating a table named actor, for starters (I’ll come back to the sequence in a minute). The actor table contains four columns:actor_id, first_name, last_name, and last_update. Thefirst_name and last_name columns are the simplest, so I’m going to start with them.

Both the first_name and last_name columns store character data – strings of text. These columns store variable length character data that can hold up to 45 characters. Both of these columns have been created as NOT NULL. This means that the columns require a value, even if that value is an empty string. Ideally you will have additional logic in place to prevent empty strings from being stored in the database as meaningful values.

The last_update column holds a date and time. The first time a row is inserted into the actor table, the last_update column will default to the current date and time. There is a trigger on this table, too, that will updatelast_update with the time of the update. While last_update does not contain any time zone specific information, it is possible for PostgreSQL to store time zone information by using the timestamp with time zone data type.

Finally we come to the actor_id column. This column is used as the primary key – it uniquely identifies any row in the actor table and requires the the values in the row not be null. I’ve omitted the code to create theactor_actor_id_seq sequence, but I’ll summarize. A sequence is similar an IDENTITY column in SQL Server or an AUTOINCREMENT column in MySQL, it just requires manual intervention. Whenever we want to increment the value of a sequence, we need to use the nextval() function to get the next value from the sequence. In the actor table, our primary key is a 4-byte number that will automatically be incremented by 1 every time a new row is added.

Once we have the four main columns created, we create an indexidx_actor_last_name. Indexes make it possible for PostgreSQL to rapidly find data in a large table. It’s possible to create an index on more than one column. In this case, though, we’ll be creating our index on just the one column. This makes it easier to find actors by their last name.

The last thing we do is create a trigger that runs after an update. Triggers let us run pieces of code after an event (such as an insert, update, or delete) happens on a table. The last_updated trigger runs a function namedlast_updated before any row is updated in the database. Thelast_updated function modifies the NEW database row (a virtual row used in inserts and updates on row-level triggers). This new row will be inserted into the table with a new value in the last_update column.

You’re probably wondering what’s with the ::regclass and the $$ when we created the function. Be patient, everything will be explained in time. Right now it’s more important to focus on the higher level concepts and slowly move into the details as time goes on.

Summary

Creating a basic table is a simple operation, once you know what you want to store in it. You name the table and then supply a list of columns with a data type for each one. This helps PostgreSQL figure out the optimum way to store data on disk.

Menu