• Skip to secondary menu
  • Skip to main content
  • Skip to primary sidebar

Python and R Tips

Learn Data Science with Python and R

  • Home
  • Python
  • Pandas
    • Pandas 101
  • tidyverse
    • tidyverse 101
  • R
  • Linux
  • Conferences
  • Python Books
  • About
    • Privacy Policy
You are here: Home / Python / Pandas DataFrame / Filter Rows with Query / Pandas query(): How to Filter Rows of Pandas Dataframe?

Pandas query(): How to Filter Rows of Pandas Dataframe?

July 31, 2019 by cmdlinetips

Pandas offer many ways to select rows from a dataframe. One of the commonly used approach to filter rows of a dataframe is to use the indexing in multiple ways. For example, one can use label based indexing with loc function.

Introducing pandas query() function, Jake VanderPlas nicely explains,

While these abstractions are efficient and effective for many common use cases, they often rely on the creation of temporary intermediate objects, which can cause undue overhead in computational time and memory use.

Not just that, often this involve slightly messier code with a lot of repetition. A simpler alternative in Pandas to select or filter rows dataframe with specified condition is to use query function Pandas.

In this post, we will see multiple examples of using query function in Pandas to select or filter rows of Pandas data frame based values of columns.

Let us first load Pandas.

# import pandas
import pandas as pd

Let us load gapminder dataset to work through examples of using query() to filter rows.

data_url = 'http://bit.ly/2cLzoxH'
# read data from url as pandas dataframe
gapminder = pd.read_csv(data_url)
print(gapminder.head(3))

       country  year         pop continent  lifeExp   gdpPercap
0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314
1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030
2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710

Filtering Rows of Pandas Dataframe – the usual way

Let us say we want to subset the gapminder dataframe such that we want all rows whose country value is United States. We can use Pandas indexing to subset the gapminder dataframe for United States as follows. Here we first create a boolean series and use it to filter the dataframe.

gapminder[gapminder.country=="United States"].head()

And we would get

	country	year	pop	continent	lifeExp	gdpPercap
1608	United States	1952	157553000.0	Americas	68.44	13990.48208
1609	United States	1957	171984000.0	Americas	69.49	14847.12712
1610	United States	1962	186538000.0	Americas	70.21	16173.14586
1611	United States	1967	198712000.0	Americas	70.76	19530.36557
1612	United States	1972	209896000.0	Americas	71.34	21806.03594

Filtering Rows of Pandas Dataframe by variable using query() function

In the above example, we can see that we have to create an intermediate boolean variable and also have to repeat “gapminder” two times.

Filtering Rows with Pandas query(): Example 1

A cleaner approach to filter Pandas dataframe is to use Pandas query() function and select rows. The way to query() function to filter rows is to specify the condition within quotes inside query().

# filter rows with Pandas query
gapminder.query('country=="United States"').head()

And we would get the same answer as above.

Filtering Rows with Pandas query(): Example 2

In the above query() example we used string to select rows of a dataframe. We can also use it to select based on numerical values. For example, to select rows for year 1952, we can write

gapminder.query('year==1952').head()

And we would get a new dataframe for the year 1952.

	country	year	pop	continent	lifeExp	gdpPercap
0	Afghanistan	1952	8425333.0	Asia	28.801	779.445314
12	Albania	1952	1282697.0	Europe	55.230	1601.056136
24	Algeria	1952	9279525.0	Africa	43.077	2449.008185
36	Angola	1952	4232095.0	Africa	30.015	3520.610273
48	Argentina	1952	17876956.0	Americas	62.485	5911.315053

Filtering Rows with Pandas query() multiple conditions: Example 3

Similarly, we use boolean operators to combine multiple conditions. For example, if want to select rows corresponding to US for the year greater than 1996,

gapminder.query('country=="United States" & year > 1996')

And we would get

	country	year	pop	continent	lifeExp	gdpPercap
1617	United States	1997	272911760.0	Americas	76.810	35767.43303
1618	United States	2002	287675526.0	Americas	77.310	39097.09955
1619	United States	2007	301139947.0	Americas	78.242	42951.65309

Filtering Rows with Pandas query() Multiple Conditions: Example 4

We can also use query() to check for matches with a list of values corresponding to a column. Here we use in operator to check for equality.

gapminder.query('country in ["United States", "United Kingdom"] & year > 2000')

And we would get

	country	year	pop	continent	lifeExp	gdpPercap
1606	United Kingdom	2002	59912431.0	Europe	78.471	29478.99919
1607	United Kingdom	2007	60776238.0	Europe	79.425	33203.26128
1618	United States	2002	287675526.0	Americas	77.310	39097.09955
1619	United States	2007	301139947.0	Americas	78.242	42951.65309

Filtering Rows with Pandas query(): Example 5

Starting with Pandas 1.0.0. query() function has expanded the functionalities of using backtick quoting for more than only spaces. In the simplest use case backticks quoted variable is useful for column names with spaces in it. For example, if we have data frame with column ‘C C’ with space

df = pd.DataFrame({'A': range(1, 6),
                   'B': range(10, 0, -2),
                   'C C': range(10, 5, -1)})

We can use query function with backticks quoting as shown in Pandas documentation.

df.query('B == `C C`')

Share this:

  • Click to share on Facebook (Opens in new window) Facebook
  • Click to share on X (Opens in new window) X

Related posts:

Default ThumbnailHow to Filter Rows Based on Column Values with query function in Pandas? Pandas Filter/Select Rows Based on Column ValuesHow To Filter Pandas Dataframe By Values of Column? Default ThumbnailHow to Drop Rows Based on a Column Value in Pandas Dataframe? Default ThumbnailHow to Select Top N Rows with the Largest Values in a Column(s) in Pandas?

Filed Under: Filter Rows with Query, Pandas 101, Pandas Query, Python Tips Tagged With: Pandas Query, Select Rows with Query

Primary Sidebar

Subscribe to Python and R Tips and Learn Data Science

Learn Pandas in Python and Tidyverse in R

Tags

Altair Basic NumPy Book Review Data Science Data Science Books Data Science Resources Data Science Roundup Data Visualization Dimensionality Reduction Dropbox Dropbox Free Space Dropbox Tips Emacs Emacs Tips ggplot2 Linux Commands Linux Tips Mac Os X Tips Maximum Likelihood Estimation in R MLE in R NumPy Pandas Pandas 101 Pandas Dataframe Pandas Data Frame pandas groupby() Pandas select columns Pandas select_dtypes Python Python 3 Python Boxplot Python Tips R rstats R Tips Seaborn Seaborn Boxplot Seaborn Catplot Shell Scripting Sparse Matrix in Python tidy evaluation tidyverse tidyverse 101 Vim Vim Tips

RSS RSS

  • How to convert row names to a column in Pandas
  • How to resize an image with PyTorch
  • Fashion-MNIST data from PyTorch
  • Pandas case_when() with multiple examples
  • An Introduction to Statistical Learning: with Applications in Python Is Here
  • 10 Tips to customize ggplot2 title text
  • 8 Plot types with Matplotlib in Python
  • PCA on S&P 500 Stock Return Data
  • Linear Regression with Matrix Decomposition Methods
  • Numpy’s random choice() function

Copyright © 2025 · Lifestyle Pro on Genesis Framework · WordPress · Log in

Go to mobile version