Often while cleaning data, one might want to create a new variable or column based on the values of another column using conditions.
In this post we will see two different ways to create a column based on values of another column using conditional statements.
First we will use NumPy’s little-known where function to create a column in Pandas using an if condition on another column’s values. Next we will use Pandas’ apply function to do the same.
Let us first load Pandas and NumPy.
import pandas as pd
import numpy as np
Let us use the gapminder dataset from Carpentries for these examples.
data_url = 'http://bit.ly/2cLzoxH'
gapminder = pd.read_csv(data_url)
print(gapminder.head(n=3))
       country  year         pop continent  lifeExp   gdpPercap
0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314
1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030
2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710
How to Create a Column Using A Condition in Pandas using NumPy?
Let us use the lifeExp column to create another column that is True if lifeExp >= 50 and False otherwise.
We will use NumPy’s where function on the lifeExp column to create the new Boolean column.
# Create a new column, lifeExp_ind, based on the values of another column.
# np.where assigns True where gapminder.lifeExp >= 50 and False otherwise.
gapminder['lifeExp_ind'] = np.where(gapminder.lifeExp >= 50, True, False)
gapminder.head(n=3)
We can see that we have a new column, “lifeExp_ind”, with True or False values.
       country  year         pop continent  lifeExp   gdpPercap  lifeExp_ind
0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314        False
1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030        False
2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710        False
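Since np.where takes a condition, a value to use where it holds, and a value to use where it does not, it is most useful when the two outcomes are something other than True and False. The sketch below illustrates this on a small made-up DataFrame (the name toy and its values are assumptions for illustration, not part of the gapminder data). It also shows that, for a plain Boolean column, the comparison on its own already gives the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the gapminder data; values are made up
toy = pd.DataFrame({
    "country": ["A", "B", "C"],
    "lifeExp": [28.8, 50.0, 71.2],
})

# np.where(condition, value_if_true, value_if_false) works element-wise
toy["lifeExp_label"] = np.where(toy["lifeExp"] >= 50, "high", "low")

# For a Boolean column, the comparison alone produces True/False
toy["lifeExp_ind"] = toy["lifeExp"] >= 50

print(toy)
```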
How to Create a Column Using A Condition in Pandas using apply and Lambda functions
Actually we don’t have to rely on NumPy to create a new column using a condition on another column. Instead we can use Pandas’ apply function with a lambda function.
# Assign 1 if gdpPercap >= 1000, and 0 otherwise
gapminder['gdpPercap_ind'] = gapminder.gdpPercap.apply(lambda x: 1 if x >= 1000 else 0)
gapminder.head(n=3)
       country  year         pop continent  lifeExp   gdpPercap  lifeExp_ind  gdpPercap_ind
0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314        False              0
1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030        False              0
2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710        False              0
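For a simple threshold like this one, apply calls the lambda once per row, while a vectorized comparison handles the whole column at once. As a rough sketch, again on a made-up DataFrame (toy and its values are assumptions for illustration), the two approaches can be compared like this:

```python
import pandas as pd

# Hypothetical data; values are made up for illustration
toy = pd.DataFrame({"gdpPercap": [779.4, 1000.0, 5200.0]})

# Row-by-row version, as in the post
via_apply = toy["gdpPercap"].apply(lambda x: 1 if x >= 1000 else 0)

# Vectorized version: compare the whole column, then cast True/False to 1/0
via_vector = (toy["gdpPercap"] >= 1000).astype(int)

# Check that both approaches agree element by element
print((via_apply == via_vector).all())
```

On large columns the vectorized form is usually faster, though apply remains handy when the logic does not reduce to a simple comparison.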
Similarly, we can write more complex conditionals. In this example, we check whether the value is in a list and assign 1 if it is, 0 otherwise.
# The gapminder data labels the continent 'Americas', not 'America'
gapminder['continent_group'] = gapminder.continent.apply(lambda x: 1 if x in ['Europe', 'Americas', 'Oceania'] else 0)
gapminder.head(n=3)
       country  year         pop continent  lifeExp   gdpPercap  lifeExp_ind  gdpPercap_ind  continent_group
0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314        False              0                0
1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030        False              0                0
2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710        False              0                0
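Membership checks of this kind can also be written without a lambda, using the isin method that Pandas provides on a Series. The following is a minimal sketch on a made-up DataFrame; the names toy and group are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data; one row per continent label
toy = pd.DataFrame(
    {"continent": ["Asia", "Europe", "Americas", "Oceania", "Africa"]}
)

# Continents that should be flagged with 1
group = ["Europe", "Americas", "Oceania"]

# isin returns a Boolean Series; astype(int) turns it into 1/0
toy["continent_group"] = toy["continent"].isin(group).astype(int)

print(toy)
```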