Tutorial Instructions

Joins in Postgres

We store data in relational databases so we can organize and quickly find information we need. That's where JOINs come in. In relational databases we organize data in sets, that are called tables in SQL.

Joins are the mechanism to find information in one set (table) that matches information in a different set.

For these examples we'll look at a hypothetical database for a human resources application. This application keeps track of departments, employees and their managers and salaries.

INNER JOIN

An INNER JOIN is the most common, and matches a column in one table to another.

In our HR database we would like to find (query) employees and their departments. Let's take a look at the structure for those tables:

Departments

columnrefers to
department_id
department_name
location_idlocations (location_id)

For each department we record its id, name, and a location_id that is a reference to the locations table, so we can know which office this department is a part of.

As for the employees table we have:

Employees

columnrefers to
employee_id
first_name
last_name
start_date
job_title
salary
manager_idemployees (employee_id)
department_iddepartments (department_id)

Each employee record/row has an employee_id, things we need to know about the employee such as first and last name, a manager_id that refers to another employee in the same table, and a department_id which tells our system which department each employee is a part of.

So let's say we want to query the employees and the departments they are a part of:

SELECT e.employee_id, e.first_name, e.last_name, d.department_name FROM employees e INNER JOIN departments d ON (e.department_id = d.department_id);

Note how we request employee_id, first_name and last_name from the table employees and department_name from the table departments.

We match these two sets of information in the FROM part of the query. Note the INNER JOIN and ON keywords. In this case we're asking for the intersection of the table employees and the table departments where the department_id matches on both tables. That gives us this output:

We told the database server what information we wanted, and the database took care of finding and responding with that data, without the need for us to write a specific program to parse and retrieve the information.

OUTER JOIN

In the real world things don't always match perfectly. OUTER JOINs deal with those mismatched sets of data, by returning NULL values for the table that does not have match (as a side note, NULL means "undefined" in SQL).

There are different kinds of OUTER JOINs.

LEFT OUTER JOIN

In a LEFT OUTER JOIN all rows from the left side table will be returned, and NULL will be returned for the non-matching right side table rows.

In our HR database, our employees can have dependents:

Dependents

Columnrefers to
dependent_id
first_name
last_name
employee_idemployees (employee_id)

Let's say in addition to the departments, we also need the first and last names of each employee's dependents:

SELECT e.employee_id, e.first_name, e.last_name, d.department_name, dep.first_name || ' ' || dep.last_name AS dependent FROM employees e INNER JOIN departments d ON (e.department_id = d.department_id) LEFT OUTER JOIN dependents dep ON (e.employee_id = dep.employee_id);

The LEFT OUTER JOIN is joining employees on the left and dependents on the right, which means we will get the records from employees (left) but may get some NULL values for dependents on the right.


 employee_id | first_name  | last_name  | department_name |     dependent
-------------+-------------+------------+-----------------+-------------------
           1 | John        | Smith      | Management      | Spoiled Smith
           1 | John        | Smith      | Management      | DoNothin Smith
           2 | Pointy      | HairedBoss | IT              | Rafael HairedBoss
           2 | Pointy      | HairedBoss | IT              | Beta HairedBoss
           3 | Dilbertzoni | DeScrewed  | IT              | Wanda DeScrewed
           5 | Howard      | TheDuck    | Sales           |
           8 | Lea         | Palpatine  | IT              |
           6 | Luke        | Fett       | Sales           |
           4 | Vanda       | Socialist  | Human Resources |
           7 | John        | SkipWork   | Human Resources |

But why? Well, because some employees may not have any dependents, and we need a list of all employees, even if they have no dependents. We see from what our PostgreSQL database returned that John Smith has two dependents: Spoiled and DoNothing Smith, and the employee Howard TheDuck has none.

Had we used an INNER JOIN then only employees with dependents would have been returned.

RIGHT OUTER JOIN

In a RIGHT OUTER JOIN all rows from the right side table will be returned, and NULL will be returned for the non-matching left side table rows.

We already saw how our HR database keeps track of departments and each department has a location_id referring to the locations table, which records where each office is located. Let’s look at the structure for the locations table:

Column
location_id
street_address
postal_code
city
state_province
country_id

Now let's query our database for our departments and their locations:

SELECT d.department_name, l.city, l.state_province, l.country_id FROM departments d RIGHT OUTER JOIN locations l ON (d.location_id = l.location_id);

Note that we joined the departments table on the left to the locations table on the right, on the location_id column, which gives us this output:

 department_name |    city     | state_province | country_id
-----------------+-------------+----------------+------------
 Human Resources | Munich      | Bavaria        | DE
 Management      | London      | London         | UK
 Sales           | Munich      | Bavaria        | DE
 IT              | Lehi        | Utah           | US
                 | Effingham   | Indiana        | US

This tells us, among the list of offices and locations, that the Effingham, Indiana office has no department assigned to it, and therefore is closed.

Self Join

A self join is not a different type of join in a relational database, but it's important to mention them because they often happen in real systems.

A self join is one that joins a table to itself. It is often useful when the table contains hierarchical data. In our HR database we've already seen the employees table. Notice that each employee has a manager_id, him/herself being employee.

We can therefore ask our database to give us a list of employees and their managers:

SELECT e.first_name, e.last_name, e.job_title, format('%s %s', m.first_name, m.last_name) AS manager FROM employees e LEFT OUTER JOIN employees m ON (e.manager_id = m.employee_id);

In this case we are joining the employees table (aliased as e) to itself, aliased as m (for managers). We joined them with a LEFT OUTER JOIN because at least one employee doesn't have a manager: the president of the company, John Smith.

 first_name  | last_name  |           job_title           |      manager
-------------+------------+-------------------------------+-------------------
 John        | Smith      | President                     |
 Pointy      | HairedBoss | CTO                           | John Smith
 Dilbertzoni | DeScrewed  | Software Engineer             | Pointy HairedBoss
 Vanda       | Socialist  | VP Equity Diversity Inclusion | John Smith
 Howard      | TheDuck    | VP Sales                      | John Smith
 Luke        | Fett       | Sales Guy                     | Howard TheDuck
 John        | SkipWork   | HR Henchman                   | Vanda Socialist
 Lea         | Palpatine  | UI Designer                   | Pointy HairedBoss

Conclusion

We covered the most common types of relational database joins. You're now able to explore querying data from a database. There are many more ways to query data in a relational database. You can use other operators in your joins, besides the = operator, for example.

The relationships between different sets of data, along with the key for each table in your database, are crucial for a well-designed database and performant queries.

Loading terminal...