Creating a Range of Dates
February 19, 2018
Creating a Range of Dates
Suppose you are given a dataset of observations over a period of time, and you are only interested in a subset of them (for instance, all business days). Pandas provides a useful function, date_range
, that can generate a list of desired dates which we can then use as a filter on our original dataset.
Import Libraries
import pandas as pd
Load Data
visits = {'date': ['3/10/2018', '3/11/2018', '3/12/2018', '3/13/2018', '3/14/2018', '3/15/2018'],
'visits': [30, 43, 50, 54, 45, 48]}
visits_df = pd.DataFrame(visits)
# transform 'date' column to Python datetime object
visits_df['date'] = pd.to_datetime(visits_df['date'], format='%m/%d/%Y')
visits_df
date | visits | |
---|---|---|
0 | 2018-03-10 | 30 |
1 | 2018-03-11 | 43 |
2 | 2018-03-12 | 50 |
3 | 2018-03-13 | 54 |
4 | 2018-03-14 | 45 |
5 | 2018-03-15 | 48 |
The date_range
function allows us to create a range of dates, as well as specify offsets for the range.
business_dates = pd.date_range(start='3/10/2018', end='3/15/2018', freq='B')
# convert to empty DataFrame where business_dates is the index
business_dates_df = pd.DataFrame(index=business_dates)
An inner join between the dataset of visits and our range of business dates will produce the subset that we desire.
business_day_visits = visits_df.merge(business_dates_df, how='inner',
left_on='date', right_index=True)
business_day_visits
date | visits | |
---|---|---|
2 | 2018-03-12 | 50 |
3 | 2018-03-13 | 54 |
4 | 2018-03-14 | 45 |
5 | 2018-03-15 | 48 |
Offset Aliases
The date_range
function supports a wide range of offsets that I am certain will cover your use-case, from quarter-end to year-end to hourly to secondly.
The full list of offset aliases can be found on Pandas’ documentation of Time Series/Date functionality.