Database Joins

March 31, 2018

Database Joins

If you are new to the world of relational databases, I will encourage that you spend time truly understanding how the various joins work. I believe that this is the key foundation skill that will enable you to explore and make sense of the data stored in the database.

There are four types of joins:

While you will often see database joins being depicted as venn diagrams, do note that this is not how they work under the hood. I will readily admit though, that venn diagrams do serve as a good approximation to the underlying process, and it is likely far easier for a beginner to understand. When you feel relatively confident about your understanding of joins, I strongly suggest that you read this blog post by JOOQ.


The Employees Table


SELECT
   employeeNumber,
   firstName,
   lastName,
   jobTitle,
   reportsTo 
FROM
   employees 
WHERE
   employeeNumber IN 
   (
      1002,
      1056,
      1076,
      1088,
      1102
   )
;
employeeNumber firstName lastName jobTitle reportsTo
1002 Diane Murphy President NULL
1056 Mary Patterson VP Sales 1002
1076 Jeff Firrelli VP Marketing 1002
1088 William Patterson Sales Manager (APAC) 1056
1102 Gerard Bondur Sale Manager (EMEA) 1056

The employees table contains information about each employee’s unique id, their first and last name, job title and also if they report to another employee. Here, we can see that Diane Murphy does not report to any employee.

For brevity, we will be working with a subset of this data for this article.


Inner Join

Inner joins are performed by identifying matches between the values in a table’s column with the values of another table’s column. The query returns results only when there is a match.

We will demonstrate inner joins by querying the employees table to display the first and last name of an employee, their job title and the first and last name of the employee that they report to.


SELECT
   e.firstName,
   e.lastName,
   e.jobTitle,
   rep.firstName AS 'reportingFirstName',
   rep.lastName AS 'reportingLastName' 
FROM
   employees e 
   INNER JOIN
      employees rep 
      ON e.reportsTo = rep.employeeNumber 
      AND e.employeeNumber IN 
      (
         1002,
         1056,
         1076,
         1088,
         1102
      )
;
firstName lastName jobTitle reportingFirstName reportingLastName
Mary Patterson VP Sales Diane Murphy
Jeff Firrelli VP Marketing Diane Murphy
William Patterson Sales Manager (APAC) Mary Patterson
Gerard Bondur Sale Manager (EMEA) Mary Patterson

You will notice that Diane does not show up in the results. That is because she reports to NULL (nobody, in human speak), and therefore the inner join will neither find a match nor will she show up in the final result set.


Left Join

A left join returns all rows in the first table listed in the query, regardless of whether it finds a match in table that it is being joined with.


SELECT
   e.firstName,
   e.lastName,
   e.jobTitle,
   rep.firstName AS 'reportingFirstName',
   rep.lastName AS 'reportingLastName' 
FROM
   employees e 
   LEFT JOIN
      employees rep 
      ON e.reportsTo = rep.employeeNumber 
WHERE
   e.employeeNumber IN 
   (
      1002,
      1056,
      1076,
      1088,
      1102
   )
;
firstName lastName jobTitle reportingFirstName reportingLastName
Diane Murphy President NULL NULL
Mary Patterson VP Sales Diane Murphy
Jeff Firrelli VP Marketing Diane Murphy
William Patterson Sales Manager (APAC) Mary Patterson
Gerard Bondur Sale Manager (EMEA) Mary Patterson

Contrast this result set to the inner join one, we will notice that Diane does show up in this result set. That is because a left join retains all the rows in the first table of the query.


Right Join

To demonstrate right joins, we will be removing the WHERE clause and we will display a truncated result set.


SELECT
   e.firstName,
   e.lastName,
   e.jobTitle,
   rep.firstName AS 'reportingFirstName',
   rep.lastName AS 'reportingLastName' 
FROM
   employees e 
   RIGHT JOIN
      employees rep 
      ON e.reportsTo = rep.employeeNumber;
firstName lastName jobTitle reportingFirstName reportingLastName
Mary Patterson VP Sales Diane Murphy
Jeff Firrelli VP Marketing Diane Murphy
William Patterson Sales Manager (APAC) Mary Patterson
Gerard Bondur Sale Manager (EMEA) Mary Patterson
Anthony Bow Sales Manager (NA) Mary Patterson
NULL NULL NULL Jeff Firrelli

I will like to bring your attention to Jeff Firrelli. You will notice that for his entry, there are no values in the firstName, lastName and jobTitle columns. This implies that nobody within the firm reports to him.


SELECT
   COUNT(*) 
FROM
   employees 
WHERE
   reportsTo IN 
   (
      SELECT
         employeeNumber 
      FROM
         employees 
      WHERE
         firstName = 'Jeff' 
         AND lastName = 'Firrelli'
   )
;
COUNT(*)
0


Cross Join (Cartesian Product)

A cross join (also known as a cartesian product) matches each row of a table to all rows of another table. If a table has 3 rows, and is joined to another table with 3 rows, we expect a result set of 9 rows.


SELECT 
    COUNT(*)
FROM
    employees;
COUNT(*)
23

There are 23 records in the employees table.


SELECT 
    COUNT(*) 
FROM
    employees e
        CROSS JOIN
    employees em;
COUNT(*)
529

Performing a cross join of the employee table against itself creates a result set with 529 records (23 * 23).


SELECT
   e.employeeNumber,
   e.firstName,
   e.lastName,
   e.jobTitle,
   em.employeeNumber,
   em.firstName,
   em.lastName,
   em.jobTitle 
FROM
   employees e 
   CROSS JOIN
      employees em;
employeeNumber firstName lastName jobTitle employeeNumber firstName lastName jobTitle
1002 Diane Murphy President 1002 Diane Murphy President
1056 Mary Patterson VP Sales 1002 Diane Murphy President
1076 Jeff Firrelli VP Marketing 1002 Diane Murphy President
1088 William Patterson Sales Manager (APAC) 1002 Diane Murphy President
1102 Gerard Bondur Sale Manager (EMEA) 1002 Diane Murphy President
1143 Anthony Bow Sales Manager (NA) 1002 Diane Murphy President
1165 Leslie Jennings Sales Rep 1002 Diane Murphy President
1166 Leslie Thompson Sales Rep 1002 Diane Murphy President
1188 Julie Firrelli Sales Rep 1002 Diane Murphy President
1216 Steve Patterson Sales Rep 1002 Diane Murphy President

Here, you can see that Diane’s record is repeated for each row in the first table.


Self Join

A self join is used to describe instances where a table is joined with itself. A common use case is demonstrated in the examples above, where a column in the table holds a reference to the id of an object within the same table.

The key point to note for self joins is that we need to provide aliases for the tables. This removes any form of ambiguity for the SQL interpreter.

comments powered by Disqus