Creating a Range of Dates
February 19, 2018
Creating a Range of Dates
Suppose you are given a dataset of observations over a period of time, and you are only interested in a subset of them (for instance, all business days). Pandas provides a useful function, date_range, that can generate a list of desired dates which we can then use as a filter on our original dataset.
Import Libraries
import pandas as pd
Load Data
visits = {'date': ['3/10/2018', '3/11/2018', '3/12/2018', '3/13/2018', '3/14/2018', '3/15/2018'],
'visits': [30, 43, 50, 54, 45, 48]}
visits_df = pd.DataFrame(visits)
# transform 'date' column to Python datetime object
visits_df['date'] = pd.to_datetime(visits_df['date'], format='%m/%d/%Y')
visits_df
| date | visits | |
|---|---|---|
| 0 | 2018-03-10 | 30 |
| 1 | 2018-03-11 | 43 |
| 2 | 2018-03-12 | 50 |
| 3 | 2018-03-13 | 54 |
| 4 | 2018-03-14 | 45 |
| 5 | 2018-03-15 | 48 |
The date_range function allows us to create a range of dates, as well as specify offsets for the range.
business_dates = pd.date_range(start='3/10/2018', end='3/15/2018', freq='B')
# convert to empty DataFrame where business_dates is the index
business_dates_df = pd.DataFrame(index=business_dates)
An inner join between the dataset of visits and our range of business dates will produce the subset that we desire.
business_day_visits = visits_df.merge(business_dates_df, how='inner',
left_on='date', right_index=True)
business_day_visits
| date | visits | |
|---|---|---|
| 2 | 2018-03-12 | 50 |
| 3 | 2018-03-13 | 54 |
| 4 | 2018-03-14 | 45 |
| 5 | 2018-03-15 | 48 |
Offset Aliases
The date_range function supports a wide range of offsets that I am certain will cover your use-case, from quarter-end to year-end to hourly to secondly.
The full list of offset aliases can be found on Pandas’ documentation of Time Series/Date functionality.