Applying Custom Functions to GroupBy Object
February 23, 2018
If you are interested in learning about the in-built, commonly used groupby operations, visit - link to commonly used groupby functions -
If what you intend to do cannot be accomplished with the standard groupby operations, Pandas also gives us the option to apply a custom function to each group within a DataFrameGroupBy object.
Import Libraries
import pandas as pd
import seaborn as sns # to retrieve the tips dataset
Load Data
The tips dataset contains information about how much people tipped, as well as their gender and whether they are smokers.
tips = sns.load_dataset('tips')
tips.head()
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
Recap - Applying In-built Functions to GroupBy Objects
Suppose we want to calculate the average tip amount, broken down by gender and by whether the patron was a smoker. This should yield 4 values.
- Male, smoker
- Male, non-smoker
- Female, smoker
- Female, non-smoker
gb_sex_smoker_tips = tips.groupby(['sex', 'smoker'])['tip'].mean().reset_index()
gb_sex_smoker_tips = gb_sex_smoker_tips.rename(columns={'tip': 'avg_tip'})
gb_sex_smoker_tips
| | sex | smoker | avg_tip |
|---|---|---|---|
| 0 | Male | Yes | 3.051167 |
| 1 | Male | No | 3.113402 |
| 2 | Female | Yes | 2.931515 |
| 3 | Female | No | 2.773519 |
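As a side note, newer pandas versions (0.25 and later) support named aggregation, which can produce the renamed column in one step. The sketch below is only an illustration: `sample` is a hand-typed copy of the five rows shown by tips.head(), not the full dataset, so it yields two groups (all five patrons are non-smokers) rather than four.

```python
import pandas as pd

# Hand-typed copy of the five rows shown by tips.head() (assumption:
# used here only so the sketch runs without downloading the dataset).
sample = pd.DataFrame({
    'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59],
    'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
    'sex': ['Female', 'Male', 'Male', 'Male', 'Female'],
    'smoker': ['No', 'No', 'No', 'No', 'No'],
})

# Named aggregation: output column name = (input column, function).
# as_index=False keeps sex and smoker as ordinary columns, so no
# reset_index or rename is needed afterwards.
avg = (
    sample.groupby(['sex', 'smoker'], as_index=False)
          .agg(avg_tip=('tip', 'mean'))
)
print(avg)
```

The same call should work on the full tips DataFrame in place of `sample`.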
Applying Custom Functions
We will recreate the results above to demonstrate how we can apply custom functions to a DataFrameGroupBy object. This also allows us to gain a better understanding of what happens under the hood of the in-built functions.
- We begin by creating a DataFrameGroupBy object from our DataFrame by passing our desired columns (sex and smoker) to the groupby function.
- The DataFrameGroupBy object calls apply, and we pass our function, group_average_tip, as the input.
- The group_average_tip function does not receive the entire DataFrameGroupBy object at once. Instead, it receives each group within the object, one at a time, as a DataFrame.
- The group_average_tip function calculates the average tip for the group and returns it as a single value.
- Finally, the per-group results are combined into one object, indexed by the group keys.
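def group_average_tip(g):
    # g is the sub-DataFrame containing the rows of one (sex, smoker) group
    total_tips = g['tip'].sum()
    count = len(g)
    return total_tips / count

# apply calls group_average_tip once per group and returns a Series
group_average = tips.groupby(['sex', 'smoker']).apply(group_average_tip)
# the unnamed Series becomes column 0 after reset_index, so rename it
group_average = group_average.reset_index().rename(columns={0: 'avg_tip'})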
group_average
| | sex | smoker | avg_tip |
|---|---|---|---|
| 0 | Male | Yes | 3.051167 |
| 1 | Male | No | 3.113402 |
| 2 | Female | Yes | 2.931515 |
| 3 | Female | No | 2.773519 |
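To make the split, apply and combine steps more concrete, here is a rough sketch that performs them by hand instead of calling apply. It is a mental model rather than a description of pandas internals. `sample` is a hand-typed copy of the five rows shown by tips.head(), used so the sketch runs without downloading the dataset, and `manual` and `builtin` are names made up for this illustration.

```python
import pandas as pd

# Hand-typed copy of the five rows shown by tips.head() (assumption:
# a stand-in for the full tips dataset).
sample = pd.DataFrame({
    'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59],
    'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
    'sex': ['Female', 'Male', 'Male', 'Male', 'Female'],
    'smoker': ['No', 'No', 'No', 'No', 'No'],
})

def group_average_tip(g):
    # g is the sub-DataFrame for a single group
    return g['tip'].sum() / len(g)

# Split: iterating over the GroupBy yields (key, sub-DataFrame) pairs.
# Apply: the function is called on each sub-DataFrame in turn.
results = {key: group_average_tip(g)
           for key, g in sample.groupby(['sex', 'smoker'])}

# Combine: the per-group scalars are assembled into one Series whose
# index is built from the group keys.
manual = pd.Series(results).rename_axis(['sex', 'smoker'])

# The in-built mean should give the same numbers.
builtin = sample.groupby(['sex', 'smoker'])['tip'].mean()
print(manual)
print(builtin)
```

Because the sample contains only non-smokers, there are two groups here. On the full dataset the same loop would visit all four.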