For a recent class on databases, we had to put together a database as a final graduate project. Rather than let my work go to waste, I figured that it would be fun to share it with the world.
Getting the Sample Database
First off, this will only work on PostgreSQL. As best as I can tell, I didn’t use any fancy new features of Postgres, so this should work on anything that you’ve installed within recent memory. In my case, I’m using PostgreSQL 10.1 installed via homebrew on a Mac.
You can get the database by cloning the horror-movies-database reposi- tory and then executing psql -f sql/data.sql in the root of the project. You could just as easily open data.sql in DataGrip or pgAdmin and execute it from there.
What Will You Find?
The sample database contains horror movies, crawled from IMDB, from prompt cloud’s “Spooky Dataset for Halloween”. You’ll find a variety of movies in the horror genre from all over the planet. The schema is broken up into the following tables:
movies: Core data about movies.
movie_genres: A many-to-many relationship. Movies can have more than one genre, after all. “Horror” is still included in here, take care when doing genre-based queries.
movie_locations: A many-to-many relationship for filming locations. Unfortunately, the data only seems to have one shooting location for movies, but it’s more fun to plan ahead, right?
movie_cast: The cast of the movie!
countries: The country of origin.
ratings: The movie rating (R, PG-13, etc).
You’ll also find that the data isn’t very clean. A number of movies are missing information, but there’s no real rhyme or reason to what’s missing. This does make querying more fun, though.
Just to show a bit of interaction, here are two sample queries to get you started. The first query shows the highest budget movie by year and country of origin:
WITH expensive_movies AS ( SELECT release_year, country_id, MAX(budget) AS max_budget FROM movies GROUP BY release_year, country_id ) SELECT c.country_name, m.release_year, m.title, m.budget FROM movies m JOIN expensive_movies em ON m.release_year = em.release_year AND m.country_id = em.country_id AND m.budget = em.max_budget JOIN countries c ON m.country_id = c.id ORDER BY m.country_id, m.release_year DESC;
This next query shows the average IMDB rating by country of origin and release year:
SELECT c.country_name, m.release_year, AVG(review) AS average_review_score FROM movies m JOIN countries c ON m.country_id = c.id WHERE release_year IS NOT NULL AND m.review IS NOT NULL GROUP BY c.country_name, m.release_year ORDER BY country_name, release_year;
Remember to have fun! There’s a lot of interesting data out there that you can play around with. If you find something interesting, share it with the world.
If you find a bug, please submit a GitHub issue.