How to Drop Rows Based on a Column Value in Pandas Dataframe?

In this post we will see examples of how to drop rows of a dataframe based on values of one or more columns in Pandas. Pandas drop function makes it really easy to drop rows of a dataframe using index number or index names. We can use Pandas drop function to drop rows and columns easily.

Sometimes you might want to drop rows, not by their index names, but based on values of another column. We can drop rows using column values in multiple ways. Here we will see three examples of dropping rows by condition(s) on column values.

Let us load Pandas and gapminder data for these examples.

import pandas as pd
data_url = 'http://bit.ly/2cLzoxH'
# read data from url as pandas dataframe
gapminder = pd.read_csv(data_url)
print(gapminder.head(3))

       country  year         pop continent  lifeExp   gdpPercap
0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314
1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030
2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710

Let us say we want to drop rows of this gapminder dataframe based on the values in continent column. Remember selecting and dropping operations are kind of complementary. We can achieve one by negating the other.

One way to drop rows is using subsetting operations. We first find the rows that satisfies our condition and then select only those rows. For example, if we want to drop rows if the column value of continent is not equal to “Africa”, we will first find rows whose continent is equal to “Africa”.

We can do that by checking for equality

gapminder.continent == 'Africa'

And then we can actually select the rows by subsetting.

gapminder[gapminder.continent == 'Africa']

    country  year         pop continent  lifeExp    gdpPercap
24  Algeria  1952   9279525.0    Africa   43.077  2449.008185
25  Algeria  1957  10270856.0    Africa   45.685  3013.976023
26  Algeria  1962  11000948.0    Africa   48.303  2550.816880

Now we have dropped rows based on a condition using subsetting. We can check that the resulting dataframe is much smaller.

gapminder[gapminder.continent == 'Africa'].shape
(624, 6)

We can also use Pandas query function to select rows and therefore drop rows based on column value.

gapminder.query('continent =="Africa"')
    country  year         pop continent  lifeExp    gdpPercap
24  Algeria  1952   9279525.0    Africa   43.077  2449.008185
25  Algeria  1957  10270856.0    Africa   45.685  3013.976023
26  Algeria  1962  11000948.0    Africa   48.303  2550.816880

This seems much simpler than before. We have dropped rows whose column value is not Africa with a simple statement.

We can also see that the resulting dataframe is smaller as we expect

gapminder.query('continent =="Africa"').shape
(624, 6)

Let us also verify if the resulting dataframes from two ways to drop rows are identical. We can use Pandas’ equals() function to test for equality.

df1 = gapminder[gapminder.continent == 'Africa']
df2 = gapminder.query('continent =="Africa"')
df1.equals(df2)
True

Third way to drop rows using a condition on column values is to use drop() function. This is a round about way and one first need to get the index numbers or index names. And then we can use drop function.

df = gapminder[gapminder.continent == 'Africa']
print(df.index)
df.drop(df.index)

Share this:

Related posts: