Database Joins
March 31, 2018
Database Joins
If you are new to the world of relational databases, I will encourage that you spend time truly understanding how the various joins work. I believe that this is the key foundation skill that will enable you to explore and make sense of the data stored in the database.
There are four types of joins:
- inner join
- left (outer) join
- right (outer) join
- cross join (cartesian product)
While you will often see database joins being depicted as venn diagrams, do note that this is not how they work under the hood. I will readily admit though, that venn diagrams do serve as a good approximation to the underlying process, and it is likely far easier for a beginner to understand. When you feel relatively confident about your understanding of joins, I strongly suggest that you read this blog post by JOOQ.
The Employees Table
SELECT
employeeNumber,
firstName,
lastName,
jobTitle,
reportsTo
FROM
employees
WHERE
employeeNumber IN
(
1002,
1056,
1076,
1088,
1102
)
;
employeeNumber | firstName | lastName | jobTitle | reportsTo |
---|---|---|---|---|
1002 | Diane | Murphy | President | NULL |
1056 | Mary | Patterson | VP Sales | 1002 |
1076 | Jeff | Firrelli | VP Marketing | 1002 |
1088 | William | Patterson | Sales Manager (APAC) | 1056 |
1102 | Gerard | Bondur | Sale Manager (EMEA) | 1056 |
The employees
table contains information about each employee’s unique id, their first and last name, job title and also if they report to another employee. Here, we can see that Diane Murphy does not report to any employee.
For brevity, we will be working with a subset of this data for this article.
Inner Join
Inner joins are performed by identifying matches between the values in a table’s column with the values of another table’s column. The query returns results only when there is a match.
We will demonstrate inner joins by querying the employees table to display the first and last name of an employee, their job title and the first and last name of the employee that they report to.
SELECT
e.firstName,
e.lastName,
e.jobTitle,
rep.firstName AS 'reportingFirstName',
rep.lastName AS 'reportingLastName'
FROM
employees e
INNER JOIN
employees rep
ON e.reportsTo = rep.employeeNumber
AND e.employeeNumber IN
(
1002,
1056,
1076,
1088,
1102
)
;
firstName | lastName | jobTitle | reportingFirstName | reportingLastName |
---|---|---|---|---|
Mary | Patterson | VP Sales | Diane | Murphy |
Jeff | Firrelli | VP Marketing | Diane | Murphy |
William | Patterson | Sales Manager (APAC) | Mary | Patterson |
Gerard | Bondur | Sale Manager (EMEA) | Mary | Patterson |
You will notice that Diane does not show up in the results. That is because she reports to NULL (nobody, in human speak), and therefore the inner join will neither find a match nor will she show up in the final result set.
Left Join
A left join returns all rows in the first table listed in the query, regardless of whether it finds a match in table that it is being joined with.
SELECT
e.firstName,
e.lastName,
e.jobTitle,
rep.firstName AS 'reportingFirstName',
rep.lastName AS 'reportingLastName'
FROM
employees e
LEFT JOIN
employees rep
ON e.reportsTo = rep.employeeNumber
WHERE
e.employeeNumber IN
(
1002,
1056,
1076,
1088,
1102
)
;
firstName | lastName | jobTitle | reportingFirstName | reportingLastName |
---|---|---|---|---|
Diane | Murphy | President | NULL | NULL |
Mary | Patterson | VP Sales | Diane | Murphy |
Jeff | Firrelli | VP Marketing | Diane | Murphy |
William | Patterson | Sales Manager (APAC) | Mary | Patterson |
Gerard | Bondur | Sale Manager (EMEA) | Mary | Patterson |
Contrast this result set to the inner join one, we will notice that Diane does show up in this result set. That is because a left join retains all the rows in the first table of the query.
Right Join
To demonstrate right joins, we will be removing the WHERE
clause and we will display a truncated result set.
SELECT
e.firstName,
e.lastName,
e.jobTitle,
rep.firstName AS 'reportingFirstName',
rep.lastName AS 'reportingLastName'
FROM
employees e
RIGHT JOIN
employees rep
ON e.reportsTo = rep.employeeNumber;
firstName | lastName | jobTitle | reportingFirstName | reportingLastName |
---|---|---|---|---|
Mary | Patterson | VP Sales | Diane | Murphy |
Jeff | Firrelli | VP Marketing | Diane | Murphy |
William | Patterson | Sales Manager (APAC) | Mary | Patterson |
Gerard | Bondur | Sale Manager (EMEA) | Mary | Patterson |
Anthony | Bow | Sales Manager (NA) | Mary | Patterson |
NULL | NULL | NULL | Jeff | Firrelli |
I will like to bring your attention to Jeff Firrelli. You will notice that for his entry, there are no values in the firstName, lastName and jobTitle columns. This implies that nobody within the firm reports to him.
SELECT
COUNT(*)
FROM
employees
WHERE
reportsTo IN
(
SELECT
employeeNumber
FROM
employees
WHERE
firstName = 'Jeff'
AND lastName = 'Firrelli'
)
;
COUNT(*) |
---|
0 |
Cross Join (Cartesian Product)
A cross join (also known as a cartesian product) matches each row of a table to all rows of another table. If a table has 3 rows, and is joined to another table with 3 rows, we expect a result set of 9 rows.
SELECT
COUNT(*)
FROM
employees;
COUNT(*) |
---|
23 |
There are 23 records in the employees table.
SELECT
COUNT(*)
FROM
employees e
CROSS JOIN
employees em;
COUNT(*) |
---|
529 |
Performing a cross join of the employee table against itself creates a result set with 529 records (23 * 23).
SELECT
e.employeeNumber,
e.firstName,
e.lastName,
e.jobTitle,
em.employeeNumber,
em.firstName,
em.lastName,
em.jobTitle
FROM
employees e
CROSS JOIN
employees em;
employeeNumber | firstName | lastName | jobTitle | employeeNumber | firstName | lastName | jobTitle |
---|---|---|---|---|---|---|---|
1002 | Diane | Murphy | President | 1002 | Diane | Murphy | President |
1056 | Mary | Patterson | VP Sales | 1002 | Diane | Murphy | President |
1076 | Jeff | Firrelli | VP Marketing | 1002 | Diane | Murphy | President |
1088 | William | Patterson | Sales Manager (APAC) | 1002 | Diane | Murphy | President |
1102 | Gerard | Bondur | Sale Manager (EMEA) | 1002 | Diane | Murphy | President |
1143 | Anthony | Bow | Sales Manager (NA) | 1002 | Diane | Murphy | President |
1165 | Leslie | Jennings | Sales Rep | 1002 | Diane | Murphy | President |
1166 | Leslie | Thompson | Sales Rep | 1002 | Diane | Murphy | President |
1188 | Julie | Firrelli | Sales Rep | 1002 | Diane | Murphy | President |
1216 | Steve | Patterson | Sales Rep | 1002 | Diane | Murphy | President |
Here, you can see that Diane’s record is repeated for each row in the first table.
Self Join
A self join is used to describe instances where a table is joined with itself. A common use case is demonstrated in the examples above, where a column in the table holds a reference to the id of an object within the same table.
The key point to note for self joins is that we need to provide aliases for the tables. This removes any form of ambiguity for the SQL interpreter.