Univariate Analysis

March 3, 2018

In general, there are three types of quantitative analysis: univariate, bivariate, and multivariate.

We will focus on univariate analysis in this article.


Import Libraries

import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
import scipy.stats  # load the stats submodule explicitly so that sp.stats is available
import seaborn as sns

%matplotlib inline


Set Seaborn Visualisation Options

sns.set_style('whitegrid')
sns.set_palette('pastel')


Load Dataset

We will be using the tips dataset, which contains information collected by a waiter about his tips as well as the characteristics of the diner.

tips = sns.load_dataset('tips')
tips.head()
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4


Nominal Variables

For nominal variables, we can summarise the data using either a frequency table or a bar chart.

# name both columns explicitly so the result does not depend on the pandas version
sex_freq_table = tips['sex'].value_counts().rename_axis('sex').reset_index(name='count')
sex_freq_table
      sex  count
0    Male    157
1  Female     87
# to control the aspect ratio of the plot
fig, ax = plt.subplots(figsize=(2.5, 5))

# draw the bars on the axes created above
sex_bar = sns.barplot(x='sex', y='count', data=sex_freq_table, ax=ax)
ax.set_xlabel('Gender')
ax.set_ylabel('Count', rotation=0, labelpad=25)

sns.despine()

[Figure: bar chart of diner counts by gender]
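
A frequency table often reads better with proportions next to the raw counts. The following is a minimal sketch using only the standard library, on a hypothetical five-value sample copied from the sex column of the tips.head() output above. The names sexes and freq_table are illustrative and not part of the original notebook. With pandas, tips['sex'].value_counts(normalize=True) should give the same kind of proportions for the full dataset.

```python
from collections import Counter

# Hypothetical sample: the `sex` values of the five rows shown by tips.head()
sexes = ['Female', 'Male', 'Male', 'Male', 'Female']

counts = Counter(sexes)
total = sum(counts.values())

# One (category, count, proportion) row per category, most frequent first
freq_table = [(category, n, n / total) for category, n in counts.most_common()]

for category, n, share in freq_table:
    print(f'{category:<8}{n:>3}{share:>8.1%}')
```

The proportions always sum to one, which makes them easier to compare across datasets of different sizes than raw counts.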


Interval (Numerical) Variables

For numerical variables, we are interested in measures of central tendency and measures of dispersion.


Measures of Central Tendency

NumPy can calculate the mean and the median, while we will utilise SciPy to derive the mode.

print(np.mean(tips['total_bill']))
19.785942622950824
print(np.median(tips['total_bill']))
17.795
print(sp.stats.mode(tips['total_bill']))
ModeResult(mode=array([13.42]), count=array([3]))
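
The mode deserves some care: for a continuous variable such as total_bill, most values occur only once or a handful of times, so the mode says little, and a discrete variable can have several modes. The sketch below illustrates this with the standard library's statistics module rather than NumPy or SciPy. It assumes Python 3.8 or later (for multimode) and uses only the five rows shown by tips.head(), so the numbers are not those of the full dataset. The name party_size is an illustrative stand-in for the size column.

```python
import statistics

# Values copied from the five rows shown by tips.head()
total_bill = [16.99, 10.34, 21.01, 23.68, 24.59]
party_size = [2, 3, 3, 2, 4]

mean_bill = statistics.mean(total_bill)
median_bill = statistics.median(total_bill)

# multimode returns every value tied for the highest count
size_modes = statistics.multimode(party_size)
bill_modes = statistics.multimode(total_bill)

print(mean_bill, median_bill)
print(size_modes)       # two party sizes are tied
print(len(bill_modes))  # every bill is unique, so every value is a "mode"
```

When every observation is unique, as with these five bills, reporting a single mode is misleading; the mean and median are the more useful summaries.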


Measures of Dispersion

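NumPy also provides functions for the standard deviation and the variance. The same functions serve both populations and samples through the ddof (delta degrees of freedom) argument: ddof=0 divides by N, which suits a population, while ddof=1 divides by N - 1, which suits a sample. Here we treat the data as a population.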

print(np.std(tips['total_bill'], ddof=0))
8.88415057777113
print(np.var(tips['total_bill'], ddof=0))
78.92813148851113
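
To make the effect of the degrees of freedom concrete, here is a small sketch that computes both versions by hand and with the standard library's statistics module. It uses only the five total_bill values shown by tips.head(), so the results differ from the full-dataset figures above. In NumPy terms, the population formulas correspond to ddof=0 and the sample formulas to ddof=1.

```python
import math
import statistics

# Values copied from the five rows shown by tips.head()
total_bill = [16.99, 10.34, 21.01, 23.68, 24.59]

n = len(total_bill)
mean_bill = statistics.mean(total_bill)
squared_deviations = sum((x - mean_bill) ** 2 for x in total_bill)

# Population: divide by n (ddof=0). Sample: divide by n - 1 (ddof=1).
population_variance = squared_deviations / n
sample_variance = squared_deviations / (n - 1)

population_std = math.sqrt(population_variance)
sample_std = math.sqrt(sample_variance)

print(population_variance, statistics.pvariance(total_bill))
print(sample_variance, statistics.variance(total_bill))
print(population_std, sample_std)
```

The sample estimate is always the larger of the two, and the gap shrinks as the number of observations grows.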