PostgreSQL Tutorial – Creating the Pagila Schema
In the previous article, we talked about creating your first PostgreSQL database and I went over the syntax of createdb and psql in a bit of detail.
Next up, we’re going to look at creating the structure of the pagila database.
createdb -U jeremiah pagila psql -U jeremiah -f pagila-schema.sql pagila psql -U jeremiah -f pagila-insert-data.sql pagila
Working With SQL
Even though SQL is the name of the language that we use to work with data, there are two subsets of SQL that you will hear people refer to: DML and DDL.
DDL is the common abbreviation for Data Definition Language. This is the name for the part of SQL that is used for creating and modifying database structures – tables, indexes, views, etc. The
pagila-schema.sql file is chock full of DDL and is a great introduction to how to create objects in PostgreSQL.
DML stands for Data Manipulation Language. This is the other subset of SQL. This is how we read, update, insert, and delete data. We’ll be talking about DML in the next article.
Tables are the heart and soul of a relational database – they’re where data is stored. Let’s take a look at creating the actors table:
CREATE TABLE public.actor ( actor_id int4 NOT NULL DEFAULT nextval('actor_actor_id_seq'::regclass), first_name varchar(45) NOT NULL, last_name varchar(45) NOT NULL, last_update timestamp NOT NULL DEFAULT now(), PRIMARY KEY(actor_id) ) GO CREATE INDEX idx_actor_last_name ON public.actor(last_name) GO CREATE TRIGGER last_updated BEFORE UPDATE ON actor FOR EACH ROW EXECUTE PROCEDURE last_updated() GO CREATE FUNCTION last_updated() RETURNS trigger AS $$ BEGIN NEW.last_update = CURRENT_TIMESTAMP; RETURN NEW; END $$ LANGUAGE plpgsql;
What’s going on here? Well, we’re creating a table named actor, for starters (I’ll come back to the sequence in a minute). The actor table contains four columns:
last_name columns are the simplest, so I’m going to start with them.
last_name columns store character data – strings of text. These columns store variable length character data that can hold up to 45 characters. Both of these columns have been created as
NOT NULL. This means that the columns require a value, even if that value is an empty string. Ideally you will have additional logic in place to prevent empty strings from being stored in the database as meaningful values.
last_update column holds a date and time. The first time a row is inserted into the actor table, the
last_update column will default to the current date and time. There is a trigger on this table, too, that will update
last_update with the time of the update. While
last_update does not contain any time zone specific information, it is possible for PostgreSQL to store time zone information by using the
timestamp with time zone data type.
Finally we come to the
actor_id column. This column is used as the primary key – it uniquely identifies any row in the actor table and requires the the values in the row not be null. I’ve omitted the code to create the
actor_actor_id_seq sequence, but I’ll summarize. A sequence is similar an
IDENTITY column in SQL Server or an
AUTOINCREMENT column in MySQL, it just requires manual intervention. Whenever we want to increment the value of a sequence, we need to use the
nextval() function to get the next value from the sequence. In the
actor table, our primary key is a 4-byte number that will automatically be incremented by 1 every time a new row is added.
Once we have the four main columns created, we create an index
idx_actor_last_name. Indexes make it possible for PostgreSQL to rapidly find data in a large table. It’s possible to create an index on more than one column. In this case, though, we’ll be creating our index on just the one column. This makes it easier to find actors by their last name.
The last thing we do is create a trigger that runs after an update. Triggers let us run pieces of code after an event (such as an insert, update, or delete) happens on a table. The
last_updated trigger runs a function named
last_updated before any row is updated in the database. The
last_updated function modifies the
NEW database row (a virtual row used in inserts and updates on row-level triggers). This new row will be inserted into the table with a new value in the
You’re probably wondering what’s with the
::regclass and the
$$ when we created the function. Be patient, everything will be explained in time. Right now it’s more important to focus on the higher level concepts and slowly move into the details as time goes on.
Creating a basic table is a simple operation, once you know what you want to store in it. You name the table and then supply a list of columns with a data type for each one. This helps PostgreSQL figure out the optimum way to store data on disk.
This is Jeremiah
I live in Portland, OR. I have two dogs.
I recently received a Master's of Science in Computer Science from Portland State University.
I'm was Microsoft MVP from 2009 - 2018 with a pile of certifications. Somewhere along the way, I wrote a database client for Riak and then handed it off to the community.