In the last two articles, we talked about creating your first PostgreSQL database, the syntax of createdb and psql, and creating tables in the database. Next up, we’re going to look at filling those tables with data.
createdb -U jeremiah pagila psql -U jeremiah -f pagila-schema.sql pagila psql -U jeremiah -f pagila-insert-data.sql pagila
Loading Data From A File
We’re going to be looking at the
pagila-insert-data.sql file. There is a separate file,
pagila-data.sql, but we will not be using it. Both files accomplish the same purpose, but
pagila-insert-data.sql uses standard SQL syntax whereas
pagila-data.sql uses the
COPY command do the bulk of the work. Since our goal is to familiarize ourselves with basic SQL, we’re going to use
Preparation is Key
The first thing you will notice is that there are close to 100 lines of options and comments (comments are start with
-- and end at the end of the current line) before we get to the actual meat of the file. This is because we need to set up common settings before we start working. Let’s take a look at these options line by line.
SET client_encoding = 'UTF8';
Strings will be treated as Unicode strings. This makes sure that characters from non-Latin alphabets can easily be inserted into the database.
SET standard_conforming_strings = off;
standard_conforming_strings setting ensures some backwards compatibility with PostgreSQL behavior – we can use ‘’ as an escape character in strings. When this is set to
off, PostgreSQL will use the ‘’ as an escape character. The goal is to, some day, default this setting to
SET check_function_bodies = false;
We are, in effect, recovering from a database dump. By setting
false we tell PostgreSQL that we want to put off a lot of checks until later because things might not be completely set up at the moment and we’ll accept the risks of doing this, thanks. I would never recommend using this setting in a production environment unless you’re recovering from a data dump. If that’s the case, knock yourself out.
SET client_min_messages = warning;
This controls the amount of messages that we are going to receive from PostgreSQL. Leaving this alone is a good thing but it’s important to know that you can mute all messages from the server if you so desire.
SET escape_string_warning = off;
escape_string_warning setting actually goes along with the
standard_conforming_strings setting and has to do with how PostgreSQL will behave when it encounters a ‘’ character rampaging about in our code.
SET search_path = public, pg_catalog;
There’s a history in the PostgreSQL world of omitting the schema qualification from table names in queries. So, what we’re doing here, is telling PostgreSQL that we would like to look for table first in the
public database schema and then, second, in the
pg_catalog schema. Don’t worry if you don’t know what schemas are all about, I’ll explain them later. Right now you can think of schemas as if they were folders for holding tables, functions, and the like. The
search_path variable works just like the
PATH variable on your operating system – it’s where PostgreSQL is going to look for functionality.
The Anatomy of an Insert
I wish that I could tell you that there is something complex going on with the pagila inserts. But, unfortunately, they’re about as boring as an insert could possibly be:
INSERT INTO -- PostgreSQL needs to listen up, we have -- data coming in. actor -- We're adding data to the actor table, -- specifically into the following columns (actor_id, first_name, last_name, last_update) VALUES -- and here is a list of the values that -- we are going to insert. (1, 'PENELOPE', 'GUINESS', '2006-02-15 09:34:33');
That’s all there is to a basic insert statement – we tell PostgreSQL that we want to add a new row to the database in a specific table and then supply a list of columns and values that we would like to add. Similar code happens about several hundred more times in that file.
This is made slightly more interesting by
ALTER TABLE actor DISABLE TRIGGER ALL;. This code ensures that any and all triggers on the table
actor will not fire while we’re inserting our rows. Why would we want to bypass all of the triggers? Well, we don’t want any extra validation to run, nor do we want a trigger to run that will populate the primary key column of our tables with the value from a sequence – we already have those values and our insert scripts would be much more complicated if they were filled with lookup queries too.
Adding a Brand New Record
Let’s go ahead and add a new movie. Pagila is a movie rental store and we’re not going to be able to keep up with the competition without adding a new movie. Keeping with Pagila’s standard of using nothing but gibberish movie tiles, we’re going to be adding the movie “Poseidon Sundance.”
Of course, creating the movie is not as easy as simply adding the title and walking away. People need to be able to make sure that the movie is acceptable for their tastes, has their favorite actor, or maybe just find out if the film is kid friendly.
INSERT INTO film -- Here we specify the table where we're going to add the data, -- and then list the columns where we're going to actually -- add the data. You don't need to list out all of the -- columns. -- P.S. Lines that start with two lines like this are comments. -- The PostgreSQL command parser will ignore them. ( title, description, release_year, language_id, original_language_id, rental_duration, rental_rate, length, replacement_cost, rating ) VALUES ( 'POSEIDON SUNDANCE', 'A thrilling story of a dog and a moose who have to journey across Finland to be reunited with their owners', '2006', 1, -- language_id, this is English 6, -- original_language_id, this is German 3, -- rental_duration 4.99, -- rental_rate 114, -- length in minutes 24.99, 'G'::mpaa_rating ) ; SELECT * FROM film WHERE title = 'POSEIDON SUNDANCE' ; -- Ee need to put the film into categories so people can find it. INSERT INTO film_category ( film_id, category_id ) SELECT film_id , 9 -- It's a foreign film FROM film WHERE title = 'POSEIDON SUNDANCE' ; /* This is also a comment. It's a multi-line comment, just like in C++, C#, or Java. We're combining an INSERT with a SELECT because we need to find out the film_id of the movie we just added. There are occasions where we could take a different route, but we'll cover those later. */ INSERT INTO film_category ( film_id, category_id ) SELECT film_id , 8 -- films about animals are almost always family movies FROM film WHERE title = 'POSEIDON SUNDANCE' ; -- Add the main actors INSERT INTO film_actor ( actor_id, film_id ) SELECT 71 , film_id FROM film WHERE title = 'POSEIDON SUNDANCE' ; INSERT INTO film_actor ( actor_id, film_id ) SELECT 123 , film_id FROM film WHERE title = 'POSEIDON SUNDANCE' ; INSERT INTO film_actor ( actor_id, film_id ) SELECT 97 , film_id FROM film WHERE title = 'POSEIDON SUNDANCE' ; INSERT INTO film_actor ( actor_id, film_id ) SELECT 49 , film_id FROM film WHERE title = 'POSEIDON SUNDANCE' ; INSERT INTO film_actor ( actor_id, film_id ) SELECT 3 , film_id FROM film WHERE title = 'POSEIDON SUNDANCE' ;
Once we’ve created our movie, let’s try to find it in the database. We can do this by writing a simple query. Don’t worry if you don’t understand everything that’s going on, everything is going to be explained in good time.
SELECT * FROM film f JOIN film_actor fa ON f.film_id = fa.film_id WHERE f.title = 'POSEIDON SUNDANCE' ;
If you’ve done everything right, this should return five rows – one for each actor in the movie.
SELECT * FROM film f JOIN film_category fc ON f.film_id = fc.film_id WHERE f.title = 'POSEIDON SUNDANCE' ;
And this should only return two rows – one for every category the movie is in.
A Flaw in the Plan
When I was originally writing this tutorial, I didn’t pay any attention to some of the views in the database that use the film table (views are magical virtual tables that we’ll talk about later). I was getting ready to test things and I discovered a big flaw in and I thought I would highlight it here.
SELECT * FROM film_list fl WHERE fl.title = 'POSEIDON SUNDANCE' ;
This will return two rows for you. Why?
Well, it turns out whoever wrote the
film_list view assumed that each movie will only belong to one category even though it can feasibly belong to as many as we want. There are a few places that we could fix this error. Before reading on see if you can think of how we could fix this.
Fixing the Flaw
There are multiple places we could fix this. The first place would be in the definition of the view. We could change the view to display all of the categories in a comma separated list, just like we’re doing for all of the actors in the film.
The next place that we can fix this is in the data itself. Apart from deleting the extra row, we can do one of two things. We can create a uniqueness requirement (called a unique constraint) on the
film_category table so that only one film can be in the table at a time. This is a limiting feature of the application. My preference would be to create a unique constraint on both the
category_id columns. That ensures that we will never have a duplicated row for a film and category combination, but we can still add multiple categories. Combining this with changes to the view fixes this flaw in our database design.
We’ll cover this issue later.
So, there you have it – inserts are very simple statements. We can insert entirely new data that comes from somewhere outside of the database, we can insert data that’s a mash up of other information in the database, or we can do something in between.
What’s important to take away from this is that it’s very easy to insert information into PostgreSQL. We tell PostgreSQL that we’re going to insert data into a table, list the columns that we’re going to fill with data, and then define the data that we’re going to insert.