Bivariate Analysis

March 5, 2018

Bivariate Analysis

In general, there are three types of quantitative analysis.

This article will touch on bivariate analysis.

Exhaustively, there are three possible combinations for the variables. Each combination will be addressed in its specific section below.

  1. Both variables are categorical.
  2. One variable is numeric, the other categorical.
  3. Both variables are numeric.


Import Libraries

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

%matplotlib inline


Set Seaborn Visualisation Options

sns.set_style('whitegrid')
sns.set_palette('pastel')


Load Dataset

We will be using the tips dataset, which contains information collected by a waiter about his tips as well as the characteristics of the diner.

tips = sns.load_dataset('tips')
tips.head()
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4


Both Variables Are Categorical

This is best handled using a frequency table, post performing a groupby on both variables.

gb_sex_smoker = tips.groupby(['sex', 'smoker'])
gb_sex_smoker = gb_sex_smoker.size()
gb_sex_smoker = gb_sex_smoker.reset_index().rename(columns={0: 'count'})
gb_sex_smoker
sex smoker count
0 Male Yes 60
1 Male No 97
2 Female Yes 33
3 Female No 54


One Variable is Numeric, the Other Categorical

For this combination, we want to understand the central tendency and dispersion of the numerical variable, split by the categorical variable.

fig, ax = plt.subplots(figsize=(2.5, 6.5))

sex_tip_box = sns.boxplot(x=tips['sex'], y=tips['tip'])
sex_tip_box.set(xlabel='Gender', ylabel='Tip')
sex_tip_box.set_ylabel(sex_tip_box.get_ylabel(), rotation=0, labelpad=15)
sns.despine()

png


Both Variables Are Numeric

If both values are numeric, we will be interested in how an increase or decrease in one variable affects the other. A scatterplot is the best way to demonstrate this graphically.

bill_tip_scatter = sns.regplot(x=tips['total_bill'], y=tips['tip'])
bill_tip_scatter.set(xlabel='Total Bill', ylabel='Tip')
bill_tip_scatter.set_ylabel(bill_tip_scatter.get_ylabel(), rotation=0, labelpad=15)
sns.despine()

png

comments powered by Disqus