Creating a Range of Dates

February 19, 2018

Creating a Range of Dates

Suppose you are given a dataset of observations over a period of time, and you are only interested in a subset of them (for instance, all business days). Pandas provides a useful function, date_range, that can generate a list of desired dates which we can then use as a filter on our original dataset.


Import Libraries

import pandas as pd


Load Data

visits = {'date': ['3/10/2018', '3/11/2018', '3/12/2018', '3/13/2018', '3/14/2018', '3/15/2018'], 
          'visits': [30, 43, 50, 54, 45, 48]}
visits_df = pd.DataFrame(visits)

# transform 'date' column to Python datetime object
visits_df['date'] = pd.to_datetime(visits_df['date'], format='%m/%d/%Y')
visits_df
date visits
0 2018-03-10 30
1 2018-03-11 43
2 2018-03-12 50
3 2018-03-13 54
4 2018-03-14 45
5 2018-03-15 48


The date_range function allows us to create a range of dates, as well as specify offsets for the range.

business_dates = pd.date_range(start='3/10/2018', end='3/15/2018', freq='B')

# convert to empty DataFrame where business_dates is the index
business_dates_df = pd.DataFrame(index=business_dates)


An inner join between the dataset of visits and our range of business dates will produce the subset that we desire.

business_day_visits = visits_df.merge(business_dates_df, how='inner', 
                                      left_on='date', right_index=True)
business_day_visits
date visits
2 2018-03-12 50
3 2018-03-13 54
4 2018-03-14 45
5 2018-03-15 48


Offset Aliases

The date_range function supports a wide range of offsets that I am certain will cover your use-case, from quarter-end to year-end to hourly to secondly.

The full list of offset aliases can be found on Pandas’ documentation of Time Series/Date functionality.

comments powered by Disqus