• Skip to secondary menu
  • Skip to main content
  • Skip to primary sidebar

Python and R Tips

Learn Data Science with Python and R

  • Home
  • Python
  • Pandas
    • Pandas 101
  • tidyverse
    • tidyverse 101
  • R
  • Linux
  • Conferences
  • Python Books
  • About
    • Privacy Policy
You are here: Home / Pandas 101 / Pandas filter(): Select Columns and Rows by Labels in a Dataframe

Pandas filter(): Select Columns and Rows by Labels in a Dataframe

March 17, 2021 by cmdlinetips

In this post, we will learn how to use Pandas filter() function to subset a dataframe based on its column names and row indexes. Pandas has a number of ways to subset a dataframe, but Pandas filter() function differ from others in a key way.

Pandas filter() function does not filter a dataframe on its content. It primarily use labels of dataframe to subset a dataframe.

Here we will see examples of how to is Pandas filter() function to select one or more columns using the column names and select one or more rows using row indices.

Let us first get started with loading the modules necessary.

 
import seaborn as sns
import pandas as pd

We will use a subset of Penguins dataset available from Seaborn’s built-in dataset. Here we load the dataset and use head() function to get a few rows.

 
df = (sns.load_dataset("penguins").
      head())

And our toy dataset look like this.

 
df
	species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex
0	Adelie	Torgersen	39.1	18.7	181.0	3750.0	Male
1	Adelie	Torgersen	39.5	17.4	186.0	3800.0	Female
2	Adelie	Torgersen	40.3	18.0	195.0	3250.0	Female
3	Adelie	Torgersen	NaN	NaN	NaN	NaN	NaN
4	Adelie	Torgersen	36.7	19.3	193.0	3450.0	Female

Our toy dataframe’ row index is integer. For the sake of easiness, let us change the row index to be string instead of numbers in order. Let us assign new row index names using “index” method in Pandas.

 
# assign index names to dataframe
df.index=["one","two","three","four","five"]

After changing the row indices our dataframe looks like this

 
df

species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex
one	Adelie	Torgersen	39.1	18.7	181.0	3750.0	Male
two	Adelie	Torgersen	39.5	17.4	186.0	3800.0	Female
three	Adelie	Torgersen	40.3	18.0	195.0	3250.0	Female
four	Adelie	Torgersen	NaN	NaN	NaN	NaN	NaN
five	Adelie	Torgersen	36.7	19.3	193.0	3450.0	Female

How to Select a Single Column using Pandas filter() function?

We can select a column from a dataframe by using the column name we want to select as a list to filter() function. In this example, we select species column from the dataframe.

 
df.filter(["species"])


       species
one	Adelie
two	Adelie
three	Adelie
four	Adelie
five	Adelie

By default, filter() function selects a column when we provide the column label as a list. We can also specify axis=1 to specify we are filtering columns.

 
df.filter(["species"], axis=1)


       species
one	Adelie
two	Adelie
three	Adelie
four	Adelie
five	Adelie

Another way to specify we are selecting columns is to use axis=”columns” and this argument makes more sense as it clearly specifies what we are selecting.

 
df.filter(["species"], axis="columns")


       species
one	Adelie
two	Adelie
three	Adelie
four	Adelie
five	Adelie

How to Select a Multiple Columns using Pandas filter() function?

To select multiple columns by their column names, we should provide the list of column names as list to Pandas filter() function.

 
df.filter(["species", "bill_length_mm"])
 

species	bill_length_mm
one	Adelie	39.1
two	Adelie	39.5
three	Adelie	40.3
four	Adelie	NaN
five	Adelie	36.7

How to Use Regular Expression to Select a Multiple Columns using Pandas filter() function?

We can also use regular expression to match the patterns of interest on column names and select multiple columns using Pandas filter() function.

In this example below, we select columns that ends with “mm” in the dataframe using “regex=’mm$'” as argument. We see that we have three columns that ends with “mm”.

 
df.filter(regex='mm$', axis="columns")

	bill_length_mm	bill_depth_mm	flipper_length_mm
one	39.1	18.7	181.0
two	39.5	17.4	186.0
three	40.3	18.0	195.0
four	NaN	NaN	NaN
five	36.7	19.3	193.0

How to Use Regular Expression to Select a Multiple Rows using Pandas filter() function?

We can also use regular expression to select rows from a dataframe based on row names. In this example below, we select rows that ends with “e” using regx=’e$’ argument. Here, we also specify axis=”index” to specify we are filtering rows not columns.

 
df.filter(regex='e$', axis="index")

Now we have filtered rows whose row names end with letter “e”; one, three and five.

 
	species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex
one	Adelie	Torgersen	39.1	18.7	181.0	3750.0	Male
three	Adelie	Torgersen	40.3	18.0	195.0	3250.0	Female
five	Adelie	Torgersen	36.7	19.3	193.0	3450.0	Female

Sometimes you might want to select some columns and rows. We can use Pandas chain operation with multiple filter() functions to select rows and columns or columns and rows of interest.

 
(df.
 filter(regex='^f', axis="index").
 filter(["species","bill_length_mm"]))

In this example, we first select some rows using regular expression and the select two columns using filter() function.

 
species	bill_length_mm
four	Adelie	NaN
five	Adelie	36.7

Share this:

  • Click to share on Facebook (Opens in new window) Facebook
  • Click to share on X (Opens in new window) X

Related posts:

Default ThumbnailHow To Delete Rows in Pandas Dataframe dplyr select(): How to Select Columns?dplyr select(): Select one or more variables from a dataframe 4 ways to select columns with dplyr select()4 ways to select columns from a dataframe with dplyr’s select() Default Thumbnaildplyr filter(): Filter/Select Rows based on conditions

Filed Under: Pandas 101, Pandas filter, Python Tagged With: Pandas 101, Python

Primary Sidebar

Subscribe to Python and R Tips and Learn Data Science

Learn Pandas in Python and Tidyverse in R

Tags

Altair Basic NumPy Book Review Data Science Data Science Books Data Science Resources Data Science Roundup Data Visualization Dimensionality Reduction Dropbox Dropbox Free Space Dropbox Tips Emacs Emacs Tips ggplot2 Linux Commands Linux Tips Mac Os X Tips Maximum Likelihood Estimation in R MLE in R NumPy Pandas Pandas 101 Pandas Dataframe Pandas Data Frame pandas groupby() Pandas select columns Pandas select_dtypes Python Python 3 Python Boxplot Python Tips R rstats R Tips Seaborn Seaborn Boxplot Seaborn Catplot Shell Scripting Sparse Matrix in Python tidy evaluation tidyverse tidyverse 101 Vim Vim Tips

RSS RSS

  • How to convert row names to a column in Pandas
  • How to resize an image with PyTorch
  • Fashion-MNIST data from PyTorch
  • Pandas case_when() with multiple examples
  • An Introduction to Statistical Learning: with Applications in Python Is Here
  • 10 Tips to customize ggplot2 title text
  • 8 Plot types with Matplotlib in Python
  • PCA on S&P 500 Stock Return Data
  • Linear Regression with Matrix Decomposition Methods
  • Numpy’s random choice() function

Copyright © 2025 · Lifestyle Pro on Genesis Framework · WordPress · Log in

Go to mobile version