Generate Additional Features from Datetime

February 5, 2018

Generate Additional Features from Datetime

Parsed your date/time fields as a Python datetime object? Great! Let’s go one step further - depending on the problem, it can be extremely valuable to generate additional features of varying granularity from the datetime object.

Load Libraries

import pandas as pd

Load Data

The data from this exercise is adapted from Bay Area Bike Share’s (now known as Ford GoBike) open data project to analyse and visualise ride information.

data = [{'Trip ID': 913465, 'Start Date': '9/1/2015  00:10:00',  
         'Start Station': 'San Francisco Caltrain 2 (330 Townsend)'},
        {'Trip ID': 913468, 'Start Date': '9/1/2015  01:29:00', 
         'Start Station': 'Clay at Battery'}, 
        {'Trip ID': 913471, 'Start Date': '9/1/2015  02:15:00', 
         'Start Station': 'Embarcadero at Bryant'}]

df = pd.DataFrame(data)
df['Start Date'] = pd.to_datetime(df['Start Date'], format='%m/%d/%Y %H:%M:%S')

Vectorised Datetime Function

pandas.Series.dt allows us to apply a function to the entire column.

Extract Hour

df['Start Hour'] = df['Start Date'].dt.hour
df[['Start Date', 'Start Hour']].head()

	Start Date	Start Hour
0	2015-09-01 00:10:00	0
1	2015-09-01 01:29:00	1
2	2015-09-01 02:15:00	2

Extract Minute

df['Start Minute'] = df['Start Date'].dt.minute
df[['Start Date', 'Start Minute']].head()

	Start Date	Start Minute
0	2015-09-01 00:10:00	10
1	2015-09-01 01:29:00	29
2	2015-09-01 02:15:00	15

Extract Year

df['Start Year'] = df['Start Date'].dt.year
df[['Start Date', 'Start Year']].head()

	Start Date	Start Year
0	2015-09-01 00:10:00	2015
1	2015-09-01 01:29:00	2015
2	2015-09-01 02:15:00	2015

Extract Month

df['Start Month'] = df['Start Date'].dt.month
df[['Start Date', 'Start Month']].head()

	Start Date	Start Month
0	2015-09-01 00:10:00	9
1	2015-09-01 01:29:00	9
2	2015-09-01 02:15:00	9

Convert Numerical Month to Alphabetical Representation

We can use the calendar module’s month_name() function to perform the conversion. It works by mapping 1 to ‘January’, 2 to ‘February’ and so on (note that it is not zero-indexed so as to make it more user-friendly).

import calendar

df['Start Month Name'] = df['Start Month'].apply(lambda x: calendar.month_name[x])
df[['Start Date', 'Start Month Name']].head()

	Start Date	Start Month Name
0	2015-09-01 00:10:00	September
1	2015-09-01 01:29:00	September
2	2015-09-01 02:15:00	September

If you’re looking for the abbreviated version, month_abbr() has you covered.

df['Start Month Abbr'] = df['Start Month'].apply(lambda x: calendar.month_abbr[x])
df[['Start Date', 'Start Month Abbr']].head()

	Start Date	Start Month Abbr
0	2015-09-01 00:10:00	Sep
1	2015-09-01 01:29:00	Sep
2	2015-09-01 02:15:00	Sep

Extract Day of Week

The dayofweek function returns a zero-indexed representation of the day of the week, where 0 is Monday and 6 is Sunday.

df['Start Day'] = df['Start Date'].dt.dayofweek
df[['Start Date', 'Start Day']].head()

	Start Date	Start Day
0	2015-09-01 00:10:00	1
1	2015-09-01 01:29:00	1
2	2015-09-01 02:15:00	1

Convert Numerical Day to Alphabetical Representation

To get an alphabetical representation of the day of week, the calendar module provides two handy functions - day_name and day_abbr.

df['Start Day Name'] = df['Start Day'].apply(lambda x: calendar.day_name[x])
df['Start Day Abbr'] = df['Start Day'].apply(lambda x: calendar.day_abbr[x])
df[['Start Day Name', 'Start Day Abbr']].head()

	Start Day Name	Start Day Abbr
0	Tuesday	Tue
1	Tuesday	Tue
2	Tuesday	Tue

Derive Week Ordinal of the Year

There are instances where we are interested to create a variable that encodes for the week ordinal of the year. One common use case is to group users onboarded in the same week and track their engagement over time.

df['Week Ordinal'] = df['Start Date'].dt.week
df[['Start Date', 'Week Ordinal']].head()

	Start Date	Week Ordinal
0	2015-09-01 00:10:00	36
1	2015-09-01 01:29:00	36
2	2015-09-01 02:15:00	36

Generate Additional Features from Datetime

February 5, 2018