In this post, we will learn how to use Pandas query() function. Using query() function is a great way to to filter rows of Pandas dataframe based on values of another column in the dataframe. Basically
Let us load Pandas and Numpy first.
# load numpy import numpy as np # load pandas import pandas as pd pd.__version__ 1.0.0
We use Numpy to generate data using its random module and create a Pandas dataframe with index names and column names from scratch.
df =pd.DataFrame(np.random.randint(20, size=(5,3)), index=list('ijklm'), columns=list('ABC'))
Let us check the top rows of the dataframe using head() function.
df.head() A B C i 6 19 14 j 10 7 6 k 18 10 10 l 3 7 2 m 1 11 5
The way query() works is we need to provide boolean condition to filter the rows of the dataframe as a literal sting with quotes or back-ticks. We provide the string as argument to query() function.
Under the hood, the query function evaluates the string and passes that to Pandas loc() function for filtering.
Let us say we want to fiulter the dataframe such that we keep all rows whose A column values are less than 7, the we would write that as expression “A<7” and provide that to query() function.
# filter rows with query() df.query("A<7")
And we would get a row filtered dataframe with A column values less than 7.
A B C i 6 19 14 l 3 7 2 m 1 11 5
We can write complex expression and pass that to query() function as argument and filter the rows of a data frame.
For example, if we want to check for two conditions i.e “A<7” and “B>5”, we can combine these conditions with & symbol inside the string and provide as argument to query() function.
df.query("A<7 & B>5")
Now we have filtered the dataframe using two conditions and the resulting dataframe contains a single row that satisfies our conditions.
A B C i 6 19 14
Sometimes you column names might have empty space in them. We can use query() function with column names with empty space using backticks.
new_df.query(A <7 & `B B`>5")
To summarize, Pandas offer multiple ways to filter rows of dataframe. Pandas’ query() function is one of the newer and easy ways to filter rows of a dataframe.
This post is part of the series on Pandas 101, a tutorial covering tips and tricks on using Pandas for data munging and analysis.