Removing Rows or Columns

January 26, 2018

While more data is generally better than less, datasets often contain fields or rows that fall outside the scope of an analysis.

Pandas provides us with a handy drop method. As the name suggests, it allows us to remove rows or columns from a DataFrame. It returns the result as a new DataFrame and leaves the original unchanged.

Load Libraries

import pandas as pd

Load Data

screening_data = {'name': ['Amy', 'Barry', 'Cory'], 
                  'age': [20, 25, 30], 
                  'weight_kg': [45, 50, 55]}

screening_df = pd.DataFrame(screening_data)
screening_df

	age	name	weight_kg
0	20	Amy	45
1	25	Barry	50
2	30	Cory	55

By default, drop removes a row from the DataFrame, identified by its index label (here the default labels 0, 1, 2).

screening_df.drop(0)

	age	name	weight_kg
1	25	Barry	50
2	30	Cory	55
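
Because drop matches index labels rather than positions, the distinction matters once the index is no longer the default 0, 1, 2. The following is a minimal sketch, assuming a hypothetical variant of the screening data indexed by name (the names by_name_df, without_amy and without_first are illustrative and not part of the original example):

```python
import pandas as pd

# Hypothetical variant of the screening data, indexed by name instead of 0, 1, 2
by_name_df = pd.DataFrame(
    {'age': [20, 25, 30], 'weight_kg': [45, 50, 55]},
    index=['Amy', 'Barry', 'Cory'],
)

# drop looks up the label 'Amy', not the first position
without_amy = by_name_df.drop('Amy')

# To drop by position, translate the position into a label first
without_first = by_name_df.drop(by_name_df.index[0])
```

On this DataFrame, by_name_df.drop(0) would raise a KeyError, since no row carries the label 0.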

It is also possible to drop multiple rows by passing a list of index labels.

screening_df.drop([0, 1])

	age	name	weight_kg
2	30	Cory	55
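
None of the calls so far modify screening_df itself. A short sketch of how one might keep the result, assuming the same data as above (remaining_df is an illustrative name):

```python
import pandas as pd

screening_df = pd.DataFrame({'name': ['Amy', 'Barry', 'Cory'],
                             'age': [20, 25, 30],
                             'weight_kg': [45, 50, 55]})

# drop returns a new DataFrame; assign it to a name to keep the result
remaining_df = screening_df.drop([0, 1])

print(len(screening_df))   # 3, the original still has every row
print(len(remaining_df))   # 1, only the row labelled 2 is left
```

Assigning the result back to the same name (screening_df = screening_df.drop([0, 1])) is a common way to make the change stick.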

To drop a column, pass the column label together with axis=1, which tells drop to look for the label among the columns instead of the rows.

screening_df.drop('weight_kg', axis=1)

	age	name
0	20	Amy
1	25	Barry
2	30	Cory

Just as with rows, passing a list of column labels drops multiple columns at once.

screening_df.drop(['age', 'weight_kg'], axis=1)

	name
0	Amy
1	Barry
2	Cory
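
In pandas 0.21 and later, drop also accepts a columns keyword, which some readers may find clearer than axis=1. The sketch below assumes such a version is installed; height_cm is a hypothetical column name used only to show the errors parameter.

```python
import pandas as pd

screening_df = pd.DataFrame({'name': ['Amy', 'Barry', 'Cory'],
                             'age': [20, 25, 30],
                             'weight_kg': [45, 50, 55]})

# Equivalent to screening_df.drop(['age', 'weight_kg'], axis=1)
names_only = screening_df.drop(columns=['age', 'weight_kg'])

# errors='ignore' skips labels that do not exist instead of raising KeyError
# ('height_cm' is a hypothetical column that is not in the data)
no_weight = screening_df.drop(columns=['weight_kg', 'height_cm'],
                              errors='ignore')
```

A matching index keyword exists for rows, so screening_df.drop(index=[0, 1]) behaves like screening_df.drop([0, 1]).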
