Appending Rows to DataFrame

February 1, 2018

Appending Rows to DataFrame

Suppose you have multiple files that contain the same columns (for instance, a machine that records and generates a file on a daily basis), and you want to combine them to perform trend analysis. What is the easiest way to do this?

Enter Pandas’ very aptly named append function.


Import Libraries

import pandas as pd


Load Data

We simuate data generated by a teacher taking daily attendance of her students, and load them to a DataFrame.

mon = {'Day': ['Mon', 'Mon', 'Mon'], 
       'Name': ['Amy', 'Barry', 'Cory'], 
       'Status': ['Absent', 'Present', 'Present']}
tue = {'Day': ['Tue', 'Tue', 'Tue'], 
       'Name': ['Amy', 'Barry', 'Cory'], 
       'Status': ['Present', 'Present', 'Present']}

mon_df = pd.DataFrame(mon)
tue_df = pd.DataFrame(tue)


To combine data for both days, we simply call the append function using the first DataFrame, and supply the second DataFrame as our input to the function.

combined = mon_df.append(tue_df)
combined.head(6)
Day Name Status
0 Mon Amy Absent
1 Mon Barry Present
2 Mon Cory Present
0 Tue Amy Present
1 Tue Barry Present
2 Tue Cory Present


What happens when the columns of the DataFrames do not match? We will simulate this by creating a third DataFrame with an additional column (Punctual), while at the same time removing the Status column.

wed = {'Day': ['Wed', 'Wed', 'Wed'], 
       'Name': ['Amy', 'Barry', 'Cory'], 
       'Punctual': ['Yes', 'Yes', 'No']}

wed_df = pd.DataFrame(wed)


For columns that are not common between the appended DataFrames, Pandas will autofill rows of the non-common columns with np.nan.

combined = combined.append(wed_df)
combined
Day Name Punctual Status
0 Mon Amy NaN Absent
1 Mon Barry NaN Present
2 Mon Cory NaN Present
0 Tue Amy NaN Present
1 Tue Barry NaN Present
2 Tue Cory NaN Present
0 Wed Amy Yes NaN
1 Wed Barry Yes NaN
2 Wed Cory No NaN
comments powered by Disqus