We store data in relational databases so we can organize and quickly find information we need. That's where JOINs come in. In relational databases we organize data in sets, that are called tables in SQL.
Joins are the mechanism to find information in one set (table) that matches information in a different set.
For these examples we'll look at a hypothetical database for a human resources application. This application keeps track of departments, employees and their managers and salaries.
INNER JOIN is the most common, and matches a column in one table to another.
In our HR database we would like to find (query)
employees and their
departments. Let's take a look at the structure for those tables:
For each department we record its id, name, and a
location_id that is a
reference to the
locations table, so we can know which office this
department is a part of.
As for the
employees table we have:
Each employee record/row has an
employee_id, things we need to know about
the employee such as first and last name, a
manager_id that refers to another
employee in the same table, and a
department_id which tells our system
which department each employee is a part of.
So let's say we want to query the employees and the departments they are a part of:
SELECT e.employee_id, e.first_name, e.last_name, d.department_name FROM employees e INNER JOIN departments d ON (e.department_id = d.department_id);
Note how we request
department_name from the table
We match these two sets of information in the
FROM part of the query.
INNER JOIN and
ON keywords. In this case we're asking for
the intersection of the table
employees and the table
department_id matches on both tables. That gives us this output:
We told the database server what information we wanted, and the database took care of finding and responding with that data, without the need for us to write a specific program to parse and retrieve the information.
In the real world things don't always match perfectly.
OUTER JOINs deal
with those mismatched sets of data, by returning
NULL values for the
table that does not have match (as a side note,
NULL means "undefined" in SQL).
There are different kinds of
LEFT OUTER JOIN all rows from the left side table will be returned,
NULL will be returned for the non-matching right side table rows.
In our HR database, our
employees can have
Let's say in addition to the departments, we also need the first and last names of each employee's dependents:
SELECT e.employee_id, e.first_name, e.last_name, d.department_name, dep.first_name || ' ' || dep.last_name AS dependent FROM employees e INNER JOIN departments d ON (e.department_id = d.department_id) LEFT OUTER JOIN dependents dep ON (e.employee_id = dep.employee_id);
LEFT OUTER JOIN is joining
employees on the left and
on the right, which means we will get the records from employees (left) but
may get some
NULL values for
dependents on the right.
employee_id | first_name | last_name | department_name | dependent -------------+-------------+------------+-----------------+------------------- 1 | John | Smith | Management | Spoiled Smith 1 | John | Smith | Management | DoNothin Smith 2 | Pointy | HairedBoss | IT | Rafael HairedBoss 2 | Pointy | HairedBoss | IT | Beta HairedBoss 3 | Dilbertzoni | DeScrewed | IT | Wanda DeScrewed 5 | Howard | TheDuck | Sales | 8 | Lea | Palpatine | IT | 6 | Luke | Fett | Sales | 4 | Vanda | Socialist | Human Resources | 7 | John | SkipWork | Human Resources |
But why? Well, because some employees may not have any dependents, and we need a list of all employees, even if they have no dependents. We see from what our PostgreSQL database returned that John Smith has two dependents: Spoiled and DoNothing Smith, and the employee Howard TheDuck has none.
Had we used an
INNER JOIN then only employees with dependents would have
RIGHT OUTER JOIN all rows from the right side table will be returned,
NULL will be returned for the non-matching left side table rows.
We already saw how our HR database keeps track of
departments and each department
location_id referring to the
locations table, which records where
each office is located. Let’s look at the structure for the
Now let's query our database for our departments and their locations:
SELECT d.department_name, l.city, l.state_province, l.country_id FROM departments d RIGHT OUTER JOIN locations l ON (d.location_id = l.location_id);
Note that we joined the
departments table on the left to the
on the right, on the
location_id column, which gives us this output:
department_name | city | state_province | country_id -----------------+-------------+----------------+------------ Human Resources | Munich | Bavaria | DE Management | London | London | UK Sales | Munich | Bavaria | DE IT | Lehi | Utah | US | Effingham | Indiana | US
This tells us, among the list of offices and locations, that the Effingham, Indiana office has no department assigned to it, and therefore is closed.
A self join is not a different type of join in a relational database, but it's important to mention them because they often happen in real systems.
A self join is one that joins a table to itself. It is often useful when the table
contains hierarchical data. In our HR database we've already seen the
table. Notice that each employee has a
manager_id, him/herself being
We can therefore ask our database to give us a list of employees and their managers:
SELECT e.first_name, e.last_name, e.job_title, format('%s %s', m.first_name, m.last_name) AS manager FROM employees e LEFT OUTER JOIN employees m ON (e.manager_id = m.employee_id);
In this case we are joining the
employees table (aliased as
e) to itself, aliased as
(for managers). We joined them with a
LEFT OUTER JOIN because at least one employee doesn't have a manager: the president of the company, John Smith.
first_name | last_name | job_title | manager -------------+------------+-------------------------------+------------------- John | Smith | President | Pointy | HairedBoss | CTO | John Smith Dilbertzoni | DeScrewed | Software Engineer | Pointy HairedBoss Vanda | Socialist | VP Equity Diversity Inclusion | John Smith Howard | TheDuck | VP Sales | John Smith Luke | Fett | Sales Guy | Howard TheDuck John | SkipWork | HR Henchman | Vanda Socialist Lea | Palpatine | UI Designer | Pointy HairedBoss
We covered the most common types of relational database joins. You're now
able to explore querying data from a database. There are many more ways to query data in a relational database. You can use other operators in your joins, besides the
= operator, for example.
The relationships between different sets of data, along with the key for each table in your database, are crucial for a well-designed database and performant queries.