How To Create a Column Using Condition on Another Column in Pandas?

Often while cleaning data, one might want to create a new variable or column based on the values of another column using conditions.

In this post we will see two different ways to create a column based on values of another column using conditional statements.

First we will use NumPy’s lesser-known where function to create a column in Pandas based on a condition on another column’s values. Next we will use Pandas’ apply function to do the same.

Let us first load Pandas and NumPy.

import pandas as pd
import numpy as np

Let us use the gapminder dataset from the Carpentries for these examples.

data_url = 'http://bit.ly/2cLzoxH'
gapminder = pd.read_csv(data_url)
print(gapminder.head(n=3))
       country  year         pop continent  lifeExp   gdpPercap
0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314
1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030
2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710

How to Create a Column Using A Condition in Pandas using NumPy?

Let us use the lifeExp column to create another column that is True if lifeExp >= 50 and False otherwise.

We will use NumPy’s where function on the lifeExp column to create the new Boolean column.

# Create a new column called lifeExp_ind based on the value of another column
# np.where assigns True where gapminder.lifeExp >= 50 and False elsewhere
gapminder['lifeExp_ind'] = np.where(gapminder.lifeExp >= 50, True, False)
gapminder.head(n=3)

We can see that we have a new column “lifeExp_ind” with True or False values.


       country  year         pop continent  lifeExp   gdpPercap  lifeExp_ind
0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314        False
1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030        False
2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710        False
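
As a minimal sketch of what np.where is doing here, the snippet below uses a small hand-made frame in place of gapminder (the frame df, its values, and the lifeExp_label column are illustrative assumptions, not part of the dataset). It shows that the comparison on its own already yields a Boolean column, and that np.where becomes more useful when the two outcomes are values other than True and False.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame standing in for gapminder (values are made up)
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "lifeExp": [28.8, 50.0, 71.3],
})

# The comparison alone produces a Boolean Series, one value per row
df["lifeExp_ind"] = df["lifeExp"] >= 50

# np.where(condition, a, b) takes a where the condition holds and b elsewhere
df["lifeExp_label"] = np.where(df["lifeExp"] >= 50, "high", "low")

print(df)
```

For a plain True/False column the direct comparison is enough; np.where is the better fit when each branch should map to a different value.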

How to Create a Column Using A Condition in Pandas using apply and Lambda functions

Actually we don’t have to rely on NumPy to create a new column using a condition on another column. Instead we can use Pandas’ apply function with a lambda function.

gapminder['gdpPercap_ind'] = gapminder.gdpPercap.apply(lambda x: 1 if x >= 1000 else 0)
gapminder.head(n=3)
       country  year         pop continent  lifeExp   gdpPercap  lifeExp_ind  gdpPercap_ind
0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314        False              0
1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030        False              0
2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710        False              0
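
The lambda passed to apply runs once per value in the column. As a rough sketch on a hand-made frame (df and its values are illustrative assumptions), the same 1/0 indicator can also be built by comparing the whole column at once and casting the Boolean result to integers, which is one common alternative and not the only way to do it.

```python
import pandas as pd

# Hypothetical frame standing in for gapminder (values are made up)
df = pd.DataFrame({"gdpPercap": [779.4, 1000.0, 9867.1]})

# Row-by-row version: the lambda is called once for each value
df["ind_apply"] = df["gdpPercap"].apply(lambda x: 1 if x >= 1000 else 0)

# Column-wise version: compare the whole column, then cast True/False to 1/0
df["ind_vec"] = (df["gdpPercap"] >= 1000).astype(int)

# Both columns hold the same values
print((df["ind_apply"] == df["ind_vec"]).all())
```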

Similarly, we can create more complex conditionals. In this example, we check whether the value is in a list and assign 1 if it is present and 0 otherwise.

# The gapminder continent values are Africa, Americas, Asia, Europe and Oceania
gapminder['continent_group'] = gapminder.continent.apply(lambda x: 1 if x in ['Europe', 'Americas', 'Oceania'] else 0)
gapminder.head(n=3)


       country  year         pop continent  lifeExp   gdpPercap  lifeExp_ind  gdpPercap_ind  continent_group
0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314        False              0                0
1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030        False              0                0
2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710        False              0                0
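
The first three gapminder rows are all from Asia, so the output above shows only zeros. The sketch below uses a small hand-made frame with one row per continent so that both outcomes appear (df, groups, and the column names are illustrative assumptions). It also shows Pandas’ isin method as one possible alternative to the lambda for membership checks.

```python
import pandas as pd

# Hypothetical frame with one row per continent label
df = pd.DataFrame({
    "continent": ["Asia", "Europe", "Americas", "Oceania", "Africa"],
})
groups = ["Europe", "Americas", "Oceania"]

# apply with a lambda: 1 if the value is in the list, 0 otherwise
df["group_apply"] = df["continent"].apply(lambda x: 1 if x in groups else 0)

# isin tests membership for the whole column; astype(int) turns True/False into 1/0
df["group_isin"] = df["continent"].isin(groups).astype(int)

print(df)
```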