Pandas mask() and where() are two related functions that replace elements of a Pandas dataframe depending on whether they satisfy a condition. Both preserve the shape of the dataframe. In this post, we will first see simple examples of using Pandas where() and mask(), and then we will learn the key difference between the two functions.
Let us first load Pandas and Numpy.
import pandas as pd
import numpy as np
Pandas where() function example
Pandas where() function takes a condition as input and replaces values where the condition is False. Pandas where() function syntax is as follows.
DataFrame.where(cond, other=nan, inplace=False)
To illustrate how Pandas where() and mask() functions work, let us create a toy dataframe with two columns.
df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
df
   A  B
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9
If we simply provide a condition to test, Pandas where() replaces values with NAs wherever the condition fails. In our example, the values in the first three rows are less than or equal to 5 and the rest are greater than 5. If we use the condition df > 5, Pandas where() function replaces the values in the first three rows with NAs as they fail the condition.
df.where(df > 5)
     A    B
0  NaN  NaN
1  NaN  NaN
2  NaN  NaN
3  6.0  7.0
4  8.0  9.0
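The output above shows 6.0 and 7.0 rather than 6 and 7. A minimal sketch of why, assuming the same toy dataframe: NaN is a floating point value, so filling with it upcasts the integer columns to float, while an integer replacement passed through "other" leaves them as integers. The variable names filled_nan and filled_int are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])

# The original columns hold integers.
print(df.dtypes)

# NaN is a float, so the default replacement upcasts both columns to float64.
filled_nan = df.where(df > 5)
print(filled_nan.dtypes)

# An integer replacement value keeps the columns as integers.
filled_int = df.where(df > 5, other=999)
print(filled_int.dtypes)
```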
By using the “other” argument, we can replace the values that fail the condition with the value provided by “other”.
df.where(df > 5, other=999)
     A    B
0  999  999
1  999  999
2  999  999
3    6    7
4    8    9
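The “other” argument is not limited to a single number. As a small sketch, assuming the same toy dataframe, we can pass another dataframe of the same shape, and where() takes the replacement from the matching position. Here the replacement is simply -df, chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])

# Keep values greater than 5; everywhere else, take the value
# found at the same row and column in -df.
result = df.where(df > 5, other=-df)
print(result)
```

In this sketch the first three rows come from -df and the last two rows are kept from df.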
Pandas mask() function example
Pandas mask() function is the boolean inverse of the where() function. Like where(), mask() takes a condition as input and replaces values in the data. However, mask() replaces values where the condition is True, whereas where() replaces them where the condition is False.
The toy example figure shows the difference between Pandas where() function and mask() function.
In the simple use case, Pandas mask() takes arguments very similar to where() and looks like this.
DataFrame.mask(cond, other=nan, inplace=False)
Pandas mask() function takes a condition and, through the “other” argument, a replacement value. By default, values where the condition is True are replaced with a missing value. In our example, the values in the last two rows are greater than 5, so they are replaced with NAs.
df.mask(df > 5)
     A    B
0  0.0  1.0
1  2.0  3.0
2  4.0  5.0
3  NaN  NaN
4  NaN  NaN
With the “other” argument, we can specify the replacement value. In the example below we replace the values with 999.
df.mask(df > 5, other=999)
     A    B
0    0    1
1    2    3
2    4    5
3  999  999
4  999  999
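The key difference between the two functions can be checked directly. A minimal sketch, assuming the same toy dataframe: calling mask() with a condition should give the same result as calling where() with the negated condition. The names cond, same_default and same_other are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
cond = df > 5

# mask() replaces where cond is True; where() replaces where its
# condition is False, so negating cond makes the two agree.
same_default = df.mask(cond).equals(df.where(~cond))
print(same_default)

# The same holds when a replacement value is given through "other".
same_other = df.mask(cond, other=999).equals(df.where(~cond, other=999))
print(same_other)
```

Both comparisons print True, so the choice between mask() and where() comes down to which way of stating the condition reads more naturally.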