Generate Additional Features from Datetime
Parsed your date/time fields as a Python datetime
object? Great! Let’s go one step further - depending on the problem, it can be extremely valuable to generate additional features of varying granularity from the datetime
object.
Load Libraries
import pandas as pd
Load Data
The data from this exercise is adapted from Bay Area Bike Share’s (now known as Ford GoBike) open data project to analyse and visualise ride information.
data = [{'Trip ID': 913465, 'Start Date': '9/1/2015 00:10:00',
'Start Station': 'San Francisco Caltrain 2 (330 Townsend)'},
{'Trip ID': 913468, 'Start Date': '9/1/2015 01:29:00',
'Start Station': 'Clay at Battery'},
{'Trip ID': 913471, 'Start Date': '9/1/2015 02:15:00',
'Start Station': 'Embarcadero at Bryant'}]
df = pd.DataFrame(data)
df['Start Date'] = pd.to_datetime(df['Start Date'], format='%m/%d/%Y %H:%M:%S')
Vectorised Datetime Function
pandas.Series.dt
allows us to apply a function to the entire column.
df['Start Hour'] = df['Start Date'].dt.hour
df[['Start Date', 'Start Hour']].head()
|
Start Date |
Start Hour |
0 |
2015-09-01 00:10:00 |
0 |
1 |
2015-09-01 01:29:00 |
1 |
2 |
2015-09-01 02:15:00 |
2 |
df['Start Minute'] = df['Start Date'].dt.minute
df[['Start Date', 'Start Minute']].head()
|
Start Date |
Start Minute |
0 |
2015-09-01 00:10:00 |
10 |
1 |
2015-09-01 01:29:00 |
29 |
2 |
2015-09-01 02:15:00 |
15 |
df['Start Year'] = df['Start Date'].dt.year
df[['Start Date', 'Start Year']].head()
|
Start Date |
Start Year |
0 |
2015-09-01 00:10:00 |
2015 |
1 |
2015-09-01 01:29:00 |
2015 |
2 |
2015-09-01 02:15:00 |
2015 |
df['Start Month'] = df['Start Date'].dt.month
df[['Start Date', 'Start Month']].head()
|
Start Date |
Start Month |
0 |
2015-09-01 00:10:00 |
9 |
1 |
2015-09-01 01:29:00 |
9 |
2 |
2015-09-01 02:15:00 |
9 |
Convert Numerical Month to Alphabetical Representation
We can use the calendar
module’s month_name()
function to perform the conversion. It works by mapping 1 to ‘January’, 2 to ‘February’ and so on (note that it is not zero-indexed so as to make it more user-friendly).
import calendar
df['Start Month Name'] = df['Start Month'].apply(lambda x: calendar.month_name[x])
df[['Start Date', 'Start Month Name']].head()
|
Start Date |
Start Month Name |
0 |
2015-09-01 00:10:00 |
September |
1 |
2015-09-01 01:29:00 |
September |
2 |
2015-09-01 02:15:00 |
September |
If you’re looking for the abbreviated version, month_abbr()
has you covered.
df['Start Month Abbr'] = df['Start Month'].apply(lambda x: calendar.month_abbr[x])
df[['Start Date', 'Start Month Abbr']].head()
|
Start Date |
Start Month Abbr |
0 |
2015-09-01 00:10:00 |
Sep |
1 |
2015-09-01 01:29:00 |
Sep |
2 |
2015-09-01 02:15:00 |
Sep |
The dayofweek
function returns a zero-indexed representation of the day of the week, where 0 is Monday and 6 is Sunday.
df['Start Day'] = df['Start Date'].dt.dayofweek
df[['Start Date', 'Start Day']].head()
|
Start Date |
Start Day |
0 |
2015-09-01 00:10:00 |
1 |
1 |
2015-09-01 01:29:00 |
1 |
2 |
2015-09-01 02:15:00 |
1 |
Convert Numerical Day to Alphabetical Representation
To get an alphabetical representation of the day of week, the calendar
module provides two handy functions - day_name
and day_abbr
.
df['Start Day Name'] = df['Start Day'].apply(lambda x: calendar.day_name[x])
df['Start Day Abbr'] = df['Start Day'].apply(lambda x: calendar.day_abbr[x])
df[['Start Day Name', 'Start Day Abbr']].head()
|
Start Day Name |
Start Day Abbr |
0 |
Tuesday |
Tue |
1 |
Tuesday |
Tue |
2 |
Tuesday |
Tue |
Derive Week Ordinal of the Year
There are instances where we are interested to create a variable that encodes for the week ordinal of the year. One common use case is to group users onboarded in the same week and track their engagement over time.
df['Week Ordinal'] = df['Start Date'].dt.week
df[['Start Date', 'Week Ordinal']].head()
|
Start Date |
Week Ordinal |
0 |
2015-09-01 00:10:00 |
36 |
1 |
2015-09-01 01:29:00 |
36 |
2 |
2015-09-01 02:15:00 |
36 |