In this post we will see examples of how to drop rows of a dataframe based on values of one or more columns in Pandas. Pandas drop function makes it really easy to drop rows of a dataframe using index number or index names. We can use Pandas drop function to drop rows and columns easily.
Sometimes you might want to drop rows, not by their index names, but based on values of another column. We can drop rows using column values in multiple ways. Here we will see three examples of dropping rows by condition(s) on column values.
Let us load Pandas and gapminder data for these examples.
import pandas as pd data_url = 'http://bit.ly/2cLzoxH' # read data from url as pandas dataframe gapminder = pd.read_csv(data_url) print(gapminder.head(3)) country year pop continent lifeExp gdpPercap 0 Afghanistan 1952 8425333.0 Asia 28.801 779.445314 1 Afghanistan 1957 9240934.0 Asia 30.332 820.853030 2 Afghanistan 1962 10267083.0 Asia 31.997 853.100710
Let us say we want to drop rows of this gapminder dataframe based on the values in continent column. Remember selecting and dropping operations are kind of complementary. We can achieve one by negating the other.
One way to drop rows is using subsetting operations. We first find the rows that satisfies our condition and then select only those rows. For example, if we want to drop rows if the column value of continent is not equal to “Africa”, we will first find rows whose continent is equal to “Africa”.
We can do that by checking for equality
gapminder.continent == 'Africa'
And then we can actually select the rows by subsetting.
gapminder[gapminder.continent == 'Africa'] country year pop continent lifeExp gdpPercap 24 Algeria 1952 9279525.0 Africa 43.077 2449.008185 25 Algeria 1957 10270856.0 Africa 45.685 3013.976023 26 Algeria 1962 11000948.0 Africa 48.303 2550.816880
Now we have dropped rows based on a condition using subsetting. We can check that the resulting dataframe is much smaller.
gapminder[gapminder.continent == 'Africa'].shape (624, 6)
We can also use Pandas query function to select rows and therefore drop rows based on column value.
gapminder.query('continent =="Africa"') country year pop continent lifeExp gdpPercap 24 Algeria 1952 9279525.0 Africa 43.077 2449.008185 25 Algeria 1957 10270856.0 Africa 45.685 3013.976023 26 Algeria 1962 11000948.0 Africa 48.303 2550.816880
This seems much simpler than before. We have dropped rows whose column value is not Africa with a simple statement.
We can also see that the resulting dataframe is smaller as we expect
gapminder.query('continent =="Africa"').shape (624, 6)
Let us also verify if the resulting dataframes from two ways to drop rows are identical. We can use Pandas’ equals() function to test for equality.
df1 = gapminder[gapminder.continent == 'Africa'] df2 = gapminder.query('continent =="Africa"') df1.equals(df2) True
Third way to drop rows using a condition on column values is to use drop() function. This is a round about way and one first need to get the index numbers or index names. And then we can use drop function.
df = gapminder[gapminder.continent == 'Africa'] print(df.index) df.drop(df.index)