Removing Rows or Columns
January 26, 2018
While in general more data is better than less, it is common to have datasets where certain fields or rows are not of interest for the scope of analysis.
Pandas provides a handy drop method. As the name suggests, it removes rows or columns from a DataFrame and returns the result as a new DataFrame.
Load Libraries
import pandas as pd
Load Data
screening_data = {'name': ['Amy', 'Barry', 'Cory'],
                  'age': [20, 25, 30],
                  'weight_kg': [45, 50, 55]}
screening_df = pd.DataFrame(screening_data)
screening_df
| | age | name | weight_kg |
|---|---|---|---|
| 0 | 20 | Amy | 45 |
| 1 | 25 | Barry | 50 |
| 2 | 30 | Cory | 55 |
By default, drop removes a row from the DataFrame, identified by its index label.
screening_df.drop(0)
| | age | name | weight_kg |
|---|---|---|---|
| 1 | 25 | Barry | 50 |
| 2 | 30 | Cory | 55 |
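One point worth illustrating is that drop returns a new DataFrame rather than modifying the original. The following is a minimal sketch of that behavior; the variable name without_first is only illustrative and does not come from the original post.

```python
import pandas as pd

screening_df = pd.DataFrame({'name': ['Amy', 'Barry', 'Cory'],
                             'age': [20, 25, 30],
                             'weight_kg': [45, 50, 55]})

# drop returns a new DataFrame; screening_df itself is not changed
without_first = screening_df.drop(0)  # hypothetical variable name

print(len(screening_df))   # the original still has 3 rows
print(len(without_first))  # the returned frame has 2 rows

# To keep the result under the same name, reassign it
screening_df = screening_df.drop(0)
```

Reassigning the result keeps each step explicit, which tends to be easier to follow than changing a DataFrame in place.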
It is also possible to drop multiple rows by passing a list of index labels.
screening_df.drop([0, 1])
| | age | name | weight_kg |
|---|---|---|---|
| 2 | 30 | Cory | 55 |
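Because drop matches index labels rather than positions, the same call works when the index is not made of integers. As a small sketch, assume we index the same data by name (the variables by_name and remaining are illustrative, not from the original post):

```python
import pandas as pd

screening_df = pd.DataFrame({'name': ['Amy', 'Barry', 'Cory'],
                             'age': [20, 25, 30],
                             'weight_kg': [45, 50, 55]})

# Use the 'name' column as the index, so row labels are strings
by_name = screening_df.set_index('name')

# Rows are now dropped by their string labels
remaining = by_name.drop(['Amy', 'Barry'])
print(remaining)
```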
To drop a column, pass the column label together with the parameter axis=1.
screening_df.drop('weight_kg', axis=1)
| | age | name |
|---|---|---|
| 0 | 20 | Amy |
| 1 | 25 | Barry |
| 2 | 30 | Cory |
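As an alternative to axis=1, drop also accepts a columns keyword (available in pandas 0.21 and later), which some readers find easier to read. A short sketch comparing the two spellings, with illustrative variable names:

```python
import pandas as pd

screening_df = pd.DataFrame({'name': ['Amy', 'Barry', 'Cory'],
                             'age': [20, 25, 30],
                             'weight_kg': [45, 50, 55]})

# Two equivalent ways to drop the same column
dropped_with_axis = screening_df.drop('weight_kg', axis=1)
dropped_with_keyword = screening_df.drop(columns='weight_kg')

# Both calls produce identical DataFrames
print(dropped_with_axis.equals(dropped_with_keyword))
```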
Just as with rows, passing a list of column labels drops multiple columns at once.
screening_df.drop(['age', 'weight_kg'], axis=1)
| | name |
|---|---|
| 0 | Amy |
| 1 | Barry |
| 2 | Cory |
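When a label in the list might not exist, drop raises a KeyError by default. The sketch below assumes a made-up column name, height_cm, which is not part of the dataset, and uses the errors='ignore' option to skip it.

```python
import pandas as pd

screening_df = pd.DataFrame({'name': ['Amy', 'Barry', 'Cory'],
                             'age': [20, 25, 30],
                             'weight_kg': [45, 50, 55]})

# 'height_cm' is a hypothetical label that is absent from the DataFrame
labels = ['age', 'height_cm']

# By default, a missing label raises a KeyError
try:
    screening_df.drop(labels, axis=1)
    raised = False
except KeyError:
    raised = True

# errors='ignore' drops the labels that exist and skips the rest
trimmed = screening_df.drop(labels, axis=1, errors='ignore')
print(list(trimmed.columns))
```

Skipping missing labels is convenient in reusable cleaning code, but it can also hide typos in column names, so the default is usually the safer choice during exploration.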