Removing Rows or Columns

January 26, 2018


While more data is generally better than less, datasets often include fields or rows that fall outside the scope of an analysis.

Pandas provides a handy drop method which, as the name suggests, removes rows or columns from a DataFrame.


Load Libraries

import pandas as pd


Load Data

screening_data = {'name': ['Amy', 'Barry', 'Cory'], 
                  'age': [20, 25, 30], 
                  'weight_kg': [45, 50, 55]}

screening_df = pd.DataFrame(screening_data)
screening_df
   age   name  weight_kg
0   20    Amy         45
1   25  Barry         50
2   30   Cory         55


By default, drop removes rows (axis=0), identified by their index label.

screening_df.drop(0)
   age   name  weight_kg
1   25  Barry         50
2   30   Cory         55
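Note that drop matches index labels, not positions: the row labelled 0 is the first row above only because screening_df uses the default 0, 1, 2 index. A minimal sketch with hypothetical string labels ('a', 'b', 'c', chosen purely for illustration) shows the difference, and also that drop returns a new DataFrame rather than modifying the original:

```python
import pandas as pd

# Same screening data, but with hypothetical string labels instead of 0, 1, 2
labelled_df = pd.DataFrame(
    {'name': ['Amy', 'Barry', 'Cory'],
     'age': [20, 25, 30],
     'weight_kg': [45, 50, 55]},
    index=['a', 'b', 'c'],
)

# drop looks up the label 'b', not the position 1
without_barry = labelled_df.drop('b')
print(list(without_barry.index))  # ['a', 'c']

# The original DataFrame is unchanged; drop returned a new object
print(len(labelled_df))  # 3

# 0 is a position here, not a label, so drop raises KeyError
try:
    labelled_df.drop(0)
except KeyError as err:
    print('KeyError:', err)
```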


It is also possible to drop multiple rows by passing a list of index labels.

screening_df.drop([0, 1])
   age  name  weight_kg
2   30  Cory         55
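Since pandas 0.21, the rows to drop can also be named with the index keyword, which makes the axis explicit. The list of labels can also come from a filter, a common way to drop rows that meet a condition. A sketch, where the age threshold of 25 is an arbitrary example:

```python
import pandas as pd

screening_df = pd.DataFrame({'name': ['Amy', 'Barry', 'Cory'],
                             'age': [20, 25, 30],
                             'weight_kg': [45, 50, 55]})

# Equivalent to screening_df.drop([0, 1]), but explicit about the axis
by_keyword = screening_df.drop(index=[0, 1])

# Drop every row whose age is 25 or over: select those rows, then drop their labels
older = screening_df[screening_df['age'] >= 25].index
younger_only = screening_df.drop(older)
print(younger_only['name'].tolist())  # ['Amy']
```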


To drop a column instead, pass the column label along with axis=1.

screening_df.drop('weight_kg', axis=1)
   age   name
0   20    Amy
1   25  Barry
2   30   Cory
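axis=1 has two equivalent spellings: axis='columns', and the columns keyword (available since pandas 0.21). The sketch below checks that all three forms produce the same result:

```python
import pandas as pd

screening_df = pd.DataFrame({'name': ['Amy', 'Barry', 'Cory'],
                             'age': [20, 25, 30],
                             'weight_kg': [45, 50, 55]})

by_number = screening_df.drop('weight_kg', axis=1)
by_name = screening_df.drop('weight_kg', axis='columns')
by_keyword = screening_df.drop(columns='weight_kg')

# All three calls remove the same column and leave the rows untouched
print(by_number.equals(by_name) and by_name.equals(by_keyword))  # True
print('weight_kg' in by_keyword.columns)  # False
```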


Just as with rows, passing a list of column labels drops multiple columns at once.

screening_df.drop(['age', 'weight_kg'], axis=1)
    name
0    Amy
1  Barry
2   Cory
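By default, drop raises a KeyError if any listed label is missing. Passing errors='ignore' skips missing labels instead, which can help when the same cleanup code runs on datasets whose columns vary. Because drop returns a new DataFrame, the result has to be assigned (here back to the same name) to keep it. The 'height_cm' column below is a hypothetical label that does not exist in the data:

```python
import pandas as pd

screening_df = pd.DataFrame({'name': ['Amy', 'Barry', 'Cory'],
                             'age': [20, 25, 30],
                             'weight_kg': [45, 50, 55]})

# 'height_cm' is not a column; errors='ignore' skips it instead of raising KeyError
screening_df = screening_df.drop(['age', 'weight_kg', 'height_cm'],
                                 axis=1, errors='ignore')
print(screening_df.columns.tolist())  # ['name']
```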