Generate Additional Features from Datetime

February 5, 2018

Generate Additional Features from Datetime

Parsed your date/time fields as a Python datetime object? Great! Let’s go one step further - depending on the problem, it can be extremely valuable to generate additional features of varying granularity from the datetime object.


Load Libraries

import pandas as pd


Load Data

The data from this exercise is adapted from Bay Area Bike Share’s (now known as Ford GoBike) open data project to analyse and visualise ride information.

data = [{'Trip ID': 913465, 'Start Date': '9/1/2015  00:10:00',  
         'Start Station': 'San Francisco Caltrain 2 (330 Townsend)'},
        {'Trip ID': 913468, 'Start Date': '9/1/2015  01:29:00', 
         'Start Station': 'Clay at Battery'}, 
        {'Trip ID': 913471, 'Start Date': '9/1/2015  02:15:00', 
         'Start Station': 'Embarcadero at Bryant'}]

df = pd.DataFrame(data)
df['Start Date'] = pd.to_datetime(df['Start Date'], format='%m/%d/%Y %H:%M:%S')


Vectorised Datetime Function

pandas.Series.dt allows us to apply a function to the entire column.


Extract Hour

df['Start Hour'] = df['Start Date'].dt.hour
df[['Start Date', 'Start Hour']].head()
Start Date Start Hour
0 2015-09-01 00:10:00 0
1 2015-09-01 01:29:00 1
2 2015-09-01 02:15:00 2


Extract Minute

df['Start Minute'] = df['Start Date'].dt.minute
df[['Start Date', 'Start Minute']].head()
Start Date Start Minute
0 2015-09-01 00:10:00 10
1 2015-09-01 01:29:00 29
2 2015-09-01 02:15:00 15


Extract Year

df['Start Year'] = df['Start Date'].dt.year
df[['Start Date', 'Start Year']].head()
Start Date Start Year
0 2015-09-01 00:10:00 2015
1 2015-09-01 01:29:00 2015
2 2015-09-01 02:15:00 2015


Extract Month

df['Start Month'] = df['Start Date'].dt.month
df[['Start Date', 'Start Month']].head()
Start Date Start Month
0 2015-09-01 00:10:00 9
1 2015-09-01 01:29:00 9
2 2015-09-01 02:15:00 9


Convert Numerical Month to Alphabetical Representation

We can use the calendar module’s month_name() function to perform the conversion. It works by mapping 1 to ‘January’, 2 to ‘February’ and so on (note that it is not zero-indexed so as to make it more user-friendly).

import calendar

df['Start Month Name'] = df['Start Month'].apply(lambda x: calendar.month_name[x])
df[['Start Date', 'Start Month Name']].head()
Start Date Start Month Name
0 2015-09-01 00:10:00 September
1 2015-09-01 01:29:00 September
2 2015-09-01 02:15:00 September


If you’re looking for the abbreviated version, month_abbr() has you covered.

df['Start Month Abbr'] = df['Start Month'].apply(lambda x: calendar.month_abbr[x])
df[['Start Date', 'Start Month Abbr']].head()
Start Date Start Month Abbr
0 2015-09-01 00:10:00 Sep
1 2015-09-01 01:29:00 Sep
2 2015-09-01 02:15:00 Sep


Extract Day of Week

The dayofweek function returns a zero-indexed representation of the day of the week, where 0 is Monday and 6 is Sunday.

df['Start Day'] = df['Start Date'].dt.dayofweek
df[['Start Date', 'Start Day']].head()
Start Date Start Day
0 2015-09-01 00:10:00 1
1 2015-09-01 01:29:00 1
2 2015-09-01 02:15:00 1


Convert Numerical Day to Alphabetical Representation

To get an alphabetical representation of the day of week, the calendar module provides two handy functions - day_name and day_abbr.

df['Start Day Name'] = df['Start Day'].apply(lambda x: calendar.day_name[x])
df['Start Day Abbr'] = df['Start Day'].apply(lambda x: calendar.day_abbr[x])
df[['Start Day Name', 'Start Day Abbr']].head()
Start Day Name Start Day Abbr
0 Tuesday Tue
1 Tuesday Tue
2 Tuesday Tue


Derive Week Ordinal of the Year

There are instances where we are interested to create a variable that encodes for the week ordinal of the year. One common use case is to group users onboarded in the same week and track their engagement over time.

df['Week Ordinal'] = df['Start Date'].dt.week
df[['Start Date', 'Week Ordinal']].head()
Start Date Week Ordinal
0 2015-09-01 00:10:00 36
1 2015-09-01 01:29:00 36
2 2015-09-01 02:15:00 36
comments powered by Disqus