Joining DataFrames Using Merge

February 7, 2018

Joining DataFrames Using Merge

If you are familiar with SQL have just started using Pandas, I am sure that you are hoping for a function that mimics the JOIN functions. Rest assured - you will not be disappointed.


Import Libraries

import pandas as pd


Load Data

We will demonstrate the different types of joins using simulated data about people and their ice cream preferences.

people_id = [1, 2, 3]
people = ['Amy', 'Barry', 'Cory']

people_df = pd.DataFrame(data=people, index=people_id)
people_df.columns = ['Name']

people_df
Name
1 Amy
2 Barry
3 Cory
flavor_id = [1, 2, 3]
flavors = ['Chocolate', 'Strawberry', 'Vanilla']

flavors_df = pd.DataFrame(data=flavors, index=flavor_id)
flavors_df.columns = ['Flavor']

flavors_df
Flavor
1 Chocolate
2 Strawberry
3 Vanilla


To link the data from the people and flavors dataframes, we will create a DataFrame that matches people_id and flavor_id.

people_flavor = {'people_id': [1, 1, 1, 2, 2, 3], 
                 'flavor_id': [1, 2, 3, 1, 2, 1]}

people_flavor_df = pd.DataFrame(people_flavor)
people_flavor_df
flavor_id people_id
0 1 1
1 2 1
2 3 1
3 1 2
4 2 2
5 1 3


Our end goal is to have a DataFrame that contains the person’s name as well as their preferred flavor of ice cream. The first step is to combine the joining DataFrame with people_df to retrieve the names.

We are able to specifiy a join between a specific column of a DataFrame and the index of another DataFrame.

people_flavor_df = people_flavor_df.merge(people_df, how='inner', left_on='people_id', right_index=True)
people_flavor_df
flavor_id people_id Name
0 1 1 Amy
1 2 1 Amy
2 3 1 Amy
3 1 2 Barry
4 2 2 Barry
5 1 3 Cory


We complete the process by matching the indexes in column flavor_id with that in the flavors_df.

people_flavor_df = people_flavor_df.merge(flavors_df, how='inner', 
                                          left_on='flavor_id', right_index=True)
people_flavor_df
flavor_id people_id Name Flavor
0 1 1 Amy Chocolate
3 1 2 Barry Chocolate
5 1 3 Cory Chocolate
1 2 1 Amy Strawberry
4 2 2 Barry Strawberry
2 3 1 Amy Vanilla


While our examples have been limited to inner joins, merge does allow us to specify the type of join using the how parameter.

comments powered by Disqus