Joining DataFrames Using Merge
February 7, 2018
Joining DataFrames Using Merge
If you are familiar with SQL have just started using Pandas, I am sure that you are hoping for a function that mimics the JOIN
functions. Rest assured - you will not be disappointed.
Import Libraries
import pandas as pd
Load Data
We will demonstrate the different types of joins using simulated data about people and their ice cream preferences.
people_id = [1, 2, 3]
people = ['Amy', 'Barry', 'Cory']
people_df = pd.DataFrame(data=people, index=people_id)
people_df.columns = ['Name']
people_df
Name | |
---|---|
1 | Amy |
2 | Barry |
3 | Cory |
flavor_id = [1, 2, 3]
flavors = ['Chocolate', 'Strawberry', 'Vanilla']
flavors_df = pd.DataFrame(data=flavors, index=flavor_id)
flavors_df.columns = ['Flavor']
flavors_df
Flavor | |
---|---|
1 | Chocolate |
2 | Strawberry |
3 | Vanilla |
To link the data from the people and flavors dataframes, we will create a DataFrame that matches people_id
and flavor_id
.
people_flavor = {'people_id': [1, 1, 1, 2, 2, 3],
'flavor_id': [1, 2, 3, 1, 2, 1]}
people_flavor_df = pd.DataFrame(people_flavor)
people_flavor_df
flavor_id | people_id | |
---|---|---|
0 | 1 | 1 |
1 | 2 | 1 |
2 | 3 | 1 |
3 | 1 | 2 |
4 | 2 | 2 |
5 | 1 | 3 |
Our end goal is to have a DataFrame that contains the person’s name as well as their preferred flavor of ice cream. The first step is to combine the joining DataFrame with people_df
to retrieve the names.
We are able to specifiy a join between a specific column of a DataFrame and the index of another DataFrame.
people_flavor_df = people_flavor_df.merge(people_df, how='inner', left_on='people_id', right_index=True)
people_flavor_df
flavor_id | people_id | Name | |
---|---|---|---|
0 | 1 | 1 | Amy |
1 | 2 | 1 | Amy |
2 | 3 | 1 | Amy |
3 | 1 | 2 | Barry |
4 | 2 | 2 | Barry |
5 | 1 | 3 | Cory |
We complete the process by matching the indexes in column flavor_id
with that in the flavors_df
.
people_flavor_df = people_flavor_df.merge(flavors_df, how='inner',
left_on='flavor_id', right_index=True)
people_flavor_df
flavor_id | people_id | Name | Flavor | |
---|---|---|---|---|
0 | 1 | 1 | Amy | Chocolate |
3 | 1 | 2 | Barry | Chocolate |
5 | 1 | 3 | Cory | Chocolate |
1 | 2 | 1 | Amy | Strawberry |
4 | 2 | 2 | Barry | Strawberry |
2 | 3 | 1 | Amy | Vanilla |
While our examples have been limited to inner joins, merge
does allow us to specify the type of join using the how
parameter.
left
uses only keys in the left DataFrame, similar to an SQL left outer joinright
uses only keys in the right DataFrame, similar to an SQL right outer joinouter
uses the union of keys from both DataFrames, similar to an SQL full outer join