Joining DataFrames Using Merge
February 7, 2018
Joining DataFrames Using Merge
If you are familiar with SQL have just started using Pandas, I am sure that you are hoping for a function that mimics the JOIN functions. Rest assured - you will not be disappointed.
Import Libraries
import pandas as pd
Load Data
We will demonstrate the different types of joins using simulated data about people and their ice cream preferences.
people_id = [1, 2, 3]
people = ['Amy', 'Barry', 'Cory']
people_df = pd.DataFrame(data=people, index=people_id)
people_df.columns = ['Name']
people_df
| Name | |
|---|---|
| 1 | Amy |
| 2 | Barry |
| 3 | Cory |
flavor_id = [1, 2, 3]
flavors = ['Chocolate', 'Strawberry', 'Vanilla']
flavors_df = pd.DataFrame(data=flavors, index=flavor_id)
flavors_df.columns = ['Flavor']
flavors_df
| Flavor | |
|---|---|
| 1 | Chocolate |
| 2 | Strawberry |
| 3 | Vanilla |
To link the data from the people and flavors dataframes, we will create a DataFrame that matches people_id and flavor_id.
people_flavor = {'people_id': [1, 1, 1, 2, 2, 3],
'flavor_id': [1, 2, 3, 1, 2, 1]}
people_flavor_df = pd.DataFrame(people_flavor)
people_flavor_df
| flavor_id | people_id | |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 2 | 1 |
| 2 | 3 | 1 |
| 3 | 1 | 2 |
| 4 | 2 | 2 |
| 5 | 1 | 3 |
Our end goal is to have a DataFrame that contains the person’s name as well as their preferred flavor of ice cream. The first step is to combine the joining DataFrame with people_df to retrieve the names.
We are able to specifiy a join between a specific column of a DataFrame and the index of another DataFrame.
people_flavor_df = people_flavor_df.merge(people_df, how='inner', left_on='people_id', right_index=True)
people_flavor_df
| flavor_id | people_id | Name | |
|---|---|---|---|
| 0 | 1 | 1 | Amy |
| 1 | 2 | 1 | Amy |
| 2 | 3 | 1 | Amy |
| 3 | 1 | 2 | Barry |
| 4 | 2 | 2 | Barry |
| 5 | 1 | 3 | Cory |
We complete the process by matching the indexes in column flavor_id with that in the flavors_df.
people_flavor_df = people_flavor_df.merge(flavors_df, how='inner',
left_on='flavor_id', right_index=True)
people_flavor_df
| flavor_id | people_id | Name | Flavor | |
|---|---|---|---|---|
| 0 | 1 | 1 | Amy | Chocolate |
| 3 | 1 | 2 | Barry | Chocolate |
| 5 | 1 | 3 | Cory | Chocolate |
| 1 | 2 | 1 | Amy | Strawberry |
| 4 | 2 | 2 | Barry | Strawberry |
| 2 | 3 | 1 | Amy | Vanilla |
While our examples have been limited to inner joins, merge does allow us to specify the type of join using the how parameter.
leftuses only keys in the left DataFrame, similar to an SQL left outer joinrightuses only keys in the right DataFrame, similar to an SQL right outer joinouteruses the union of keys from both DataFrames, similar to an SQL full outer join