Pandas 2.2.0 introduced case_when(), one of the most useful new methods available on a Pandas Series object. Often you might want to create a new variable from an existing variable using multiple conditions. For a simple binary condition we can use Pandas’ where() function. With the new case_when() function we can apply more complex, multi-branch conditions to create a new variable. In this post, we will see multiple examples of how to use the Pandas case_when() function.
Let us load Pandas and NumPy to create some toy data.
import pandas as pd
import numpy as np
Until now, the pandas library had no direct equivalent of SQL's widely used CASE WHEN expression. Pandas 2.2.0 adds one in the form of the case_when() function on a Series. Let us check the version of the installed Pandas.
pd.__version__
'2.2.0'
If the installed Pandas version is older than 2.2.0, you can install version 2.2.0 with pip by pinning the version.
pip install pandas==2.2.0
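As a small sketch, we can also check for the feature itself instead of comparing version strings. The variable name has_case_when is ours and used only for illustration.

```python
import pandas as pd

# True when the installed Pandas provides Series.case_when (added in 2.2.0)
has_case_when = hasattr(pd.Series, "case_when")
print(pd.__version__, has_case_when)
```

If this prints False, the examples in this post will raise an AttributeError until Pandas is upgraded.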
To get started with the Pandas case_when() function, let us create a simple Pandas Series with 5 elements.
scores = pd.Series(np.random.randint(10,100,5))
scores
0    29
1    45
2    90
3    69
4    68
dtype: int64
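Since np.random.randint() draws new values on every run, your scores will most likely differ from the ones shown here. To follow along with the same numbers, one option is to build the Series directly from the values printed above.

```python
import pandas as pd

# The same five values as in the printed output, hard-coded for reproducibility
scores = pd.Series([29, 45, 90, 69, 68])
print(scores)
```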
Pandas case_when() syntax
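Pandas case_when() function takes one argument, “caselist”. caselist expects a list of tuples, each pairing a condition with its replacement, of the form [(condition0, replacement0), (condition1, replacement1), ...]. Each condition should be a one-dimensional boolean array-like (for example a boolean Series) or a callable that returns one. Each replacement can be a scalar, a one-dimensional array-like, or a callable. The conditions are evaluated in order and the first matching condition decides the replacement; where no condition matches, the original value is kept.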
Pandas case_when() simple example
Let us start with a simple example using one condition. Here we check whether the score is greater than or equal to 35 and provide the replacement "pass". The resulting Series holds the replacement wherever the condition is satisfied and keeps the original value where the condition is not met. Because the result now mixes integers and strings, its dtype becomes object.
scores.case_when(caselist=[(scores >= 35, "pass")])
0      29
1    pass
2    pass
3    pass
4    pass
dtype: object
Pandas case_when() with two conditions
In the example below we specify two conditions and their replacements as the argument to the Pandas case_when() function.
scores.case_when(caselist=[(scores >= 35, "pass"),
                           (scores < 35, "fail")])
0    fail
1    pass
2    pass
3    pass
4    pass
dtype: object
Create a new column based on an existing column using Pandas case_when() in a dataframe
Pandas case_when() is extremely useful when you want to create a new column in a dataframe based on the values of an existing column using multiple conditions.
First, let us create a data frame with one column.
df = pd.DataFrame({"scores": scores})
df
   scores
0      29
1      45
2      90
3      69
4      68
We can create a new column that assigns binary grades based on the values of the scores column.
df['grade'] = df.scores.case_when(caselist=[(scores >= 35, "pass"),
                                            (scores < 35, "fail")])
df
   scores grade
0      29  fail
1      45  pass
2      90  pass
3      69  pass
4      68  pass
In the example below we combine several conditions to create a new column with multiple grade levels, defined by the values of the scores column.
df['grade'] = df.scores.case_when(caselist=[(((scores >= 35) & (scores < 50)), "C"),
                                            (((scores >= 50) & (scores < 80)), "B"),
                                            (scores >= 80, "A"),
                                            (scores < 35, "D")])
df
   scores grade
0      29     D
1      45     C
2      90     A
3      69     B
4      68     B