Pandas filter(): Select Columns and Rows by Labels in a Dataframe

In this post, we will learn how to use Pandas filter() function to subset a dataframe based on its column names and row indexes. Pandas has a number of ways to subset a dataframe, but Pandas filter() function differ from others in a key way.

Pandas filter() function does not filter a dataframe on its content. It primarily use labels of dataframe to subset a dataframe.

Here we will see examples of how to is Pandas filter() function to select one or more columns using the column names and select one or more rows using row indices.

Let us first get started with loading the modules necessary.

 
import seaborn as sns
import pandas as pd

We will use a subset of Penguins dataset available from Seaborn’s built-in dataset. Here we load the dataset and use head() function to get a few rows.

 
df = (sns.load_dataset("penguins").
      head())

And our toy dataset look like this.

 
df
	species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex
0	Adelie	Torgersen	39.1	18.7	181.0	3750.0	Male
1	Adelie	Torgersen	39.5	17.4	186.0	3800.0	Female
2	Adelie	Torgersen	40.3	18.0	195.0	3250.0	Female
3	Adelie	Torgersen	NaN	NaN	NaN	NaN	NaN
4	Adelie	Torgersen	36.7	19.3	193.0	3450.0	Female

Our toy dataframe’ row index is integer. For the sake of easiness, let us change the row index to be string instead of numbers in order. Let us assign new row index names using “index” method in Pandas.

 
# assign index names to dataframe
df.index=["one","two","three","four","five"]

After changing the row indices our dataframe looks like this

 
df

species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex
one	Adelie	Torgersen	39.1	18.7	181.0	3750.0	Male
two	Adelie	Torgersen	39.5	17.4	186.0	3800.0	Female
three	Adelie	Torgersen	40.3	18.0	195.0	3250.0	Female
four	Adelie	Torgersen	NaN	NaN	NaN	NaN	NaN
five	Adelie	Torgersen	36.7	19.3	193.0	3450.0	Female

How to Select a Single Column using Pandas filter() function?

We can select a column from a dataframe by using the column name we want to select as a list to filter() function. In this example, we select species column from the dataframe.

 
df.filter(["species"])


       species
one	Adelie
two	Adelie
three	Adelie
four	Adelie
five	Adelie

By default, filter() function selects a column when we provide the column label as a list. We can also specify axis=1 to specify we are filtering columns.

 
df.filter(["species"], axis=1)


       species
one	Adelie
two	Adelie
three	Adelie
four	Adelie
five	Adelie

Another way to specify we are selecting columns is to use axis=”columns” and this argument makes more sense as it clearly specifies what we are selecting.

 
df.filter(["species"], axis="columns")


       species
one	Adelie
two	Adelie
three	Adelie
four	Adelie
five	Adelie

How to Select a Multiple Columns using Pandas filter() function?

To select multiple columns by their column names, we should provide the list of column names as list to Pandas filter() function.

 
df.filter(["species", "bill_length_mm"])
 

species	bill_length_mm
one	Adelie	39.1
two	Adelie	39.5
three	Adelie	40.3
four	Adelie	NaN
five	Adelie	36.7

How to Use Regular Expression to Select a Multiple Columns using Pandas filter() function?

We can also use regular expression to match the patterns of interest on column names and select multiple columns using Pandas filter() function.

In this example below, we select columns that ends with “mm” in the dataframe using “regex=’mm$'” as argument. We see that we have three columns that ends with “mm”.

 
df.filter(regex='mm$', axis="columns")

	bill_length_mm	bill_depth_mm	flipper_length_mm
one	39.1	18.7	181.0
two	39.5	17.4	186.0
three	40.3	18.0	195.0
four	NaN	NaN	NaN
five	36.7	19.3	193.0

How to Use Regular Expression to Select a Multiple Rows using Pandas filter() function?

We can also use regular expression to select rows from a dataframe based on row names. In this example below, we select rows that ends with “e” using regx=’e$’ argument. Here, we also specify axis=”index” to specify we are filtering rows not columns.

 
df.filter(regex='e$', axis="index")

Now we have filtered rows whose row names end with letter “e”; one, three and five.

 
	species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex
one	Adelie	Torgersen	39.1	18.7	181.0	3750.0	Male
three	Adelie	Torgersen	40.3	18.0	195.0	3250.0	Female
five	Adelie	Torgersen	36.7	19.3	193.0	3450.0	Female

Sometimes you might want to select some columns and rows. We can use Pandas chain operation with multiple filter() functions to select rows and columns or columns and rows of interest.

 
(df.
 filter(regex='^f', axis="index").
 filter(["species","bill_length_mm"]))

In this example, we first select some rows using regular expression and the select two columns using filter() function.

 
species	bill_length_mm
four	Adelie	NaN
five	Adelie	36.7