In this quick tutorial, we will learn how to create a new column in a Pandas dataframe by applying an if-else condition to an existing column.
To add a new column using a conditional on an existing column, we will use NumPy's where() function. So, let us load both NumPy and Pandas to get started.
import pandas as pd
import numpy as np
We will use one of the built-in datasets from the Seaborn package. Let us load Seaborn and the health expenditure dataset.
import seaborn as sns
healthexp = sns.load_dataset("healthexp")
Our health expenditure dataset looks like this.
healthexp.head()
   Year        Country  Spending_USD  Life_Expectancy
0  1970        Germany       252.311             70.6
1  1970         France       192.143             72.2
2  1970  Great Britain       123.993             71.9
3  1970          Japan       150.437             72.0
4  1970            USA       326.961             70.9
We will create a new column from the existing Country column using an if-else condition. The new column holds "Yes" if the Country value for that row is USA and "No" otherwise. NumPy's where() function takes three arguments: the condition to check, the value to use where the condition is true, and the value to use where it is false.
healthexp['Is_USA'] = np.where(healthexp["Country"] == 'USA', "Yes", "No")
healthexp.head()
   Year        Country  Spending_USD  Life_Expectancy Is_USA
0  1970        Germany       252.311             70.6     No
1  1970         France       192.143             72.2     No
2  1970  Great Britain       123.993             71.9     No
3  1970          Japan       150.437             72.0     No
4  1970            USA       326.961             70.9    Yes
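To try the same step without downloading the Seaborn dataset, here is a minimal sketch that rebuilds the five rows shown above by hand. The names df and is_usa_mask are only illustrative and are not part of the original tutorial.

```python
import numpy as np
import pandas as pd

# Small stand-in frame built from the five rows shown above,
# so the example runs without loading the Seaborn dataset.
df = pd.DataFrame({
    "Year": [1970] * 5,
    "Country": ["Germany", "France", "Great Britain", "Japan", "USA"],
    "Spending_USD": [252.311, 192.143, 123.993, 150.437, 326.961],
    "Life_Expectancy": [70.6, 72.2, 71.9, 72.0, 70.9],
})

# The comparison yields a boolean Series with one value per row.
is_usa_mask = df["Country"] == "USA"

# np.where(condition, value_if_true, value_if_false) picks a value per row.
df["Is_USA"] = np.where(is_usa_mask, "Yes", "No")

print(df[["Country", "Is_USA"]])
```

Splitting the condition into its own variable is optional, but it makes it easier to inspect which rows matched before assigning the new column.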
In the second example, we create a boolean column based on the value of the Country column. If the Country value is USA, the new column is True; otherwise it is False.
healthexp['Is_USA'] = np.where(healthexp["Country"] == 'USA', True, False)
healthexp.head()
   Year        Country  Spending_USD  Life_Expectancy  Is_USA
0  1970        Germany       252.311             70.6   False
1  1970         France       192.143             72.2   False
2  1970  Great Britain       123.993             71.9   False
3  1970          Japan       150.437             72.0   False
4  1970            USA       326.961             70.9    True
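As a side note, when the two output values are simply True and False, the comparison itself already produces the boolean column, so np.where() is optional in this case. The sketch below checks that both approaches agree on a small hand-built frame; the names df, via_where, direct and same are illustrative only.

```python
import numpy as np
import pandas as pd

# Stand-in frame with just the Country values from the rows shown above.
df = pd.DataFrame(
    {"Country": ["Germany", "France", "Great Britain", "Japan", "USA"]}
)

# Approach from the tutorial: np.where with True/False as the two outcomes.
via_where = np.where(df["Country"] == "USA", True, False)

# Alternative: the comparison alone is already a boolean Series.
direct = df["Country"] == "USA"

df["Is_USA"] = direct

# Check that both approaches give the same values row by row.
same = bool((via_where == direct.to_numpy()).all())
print(same)
```

np.where() remains the natural choice when the two outcomes are something other than True and False, such as the "Yes" and "No" labels in the first example.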