In this post, we will learn how to select columns or rows of a Pandas dataframe based on a substring match. We will use the Pandas filter() function with the argument “like” to select columns or rows whose names partially match a string of interest.
Let us load the necessary modules. In addition to Pandas, we import seaborn to use its built-in datasets to illustrate column/row selection by substring match.
import seaborn as sns
import pandas as pd
We use the Palmer penguins dataset and load it as a dataframe. For this toy example, we also subset the dataframe to a few rows using the Pandas sample() function.
# load penguins data from Seaborn's built in datasets
penguins = sns.load_dataset("penguins")
# random sample of 6 rows using Pandas sample() function
df = penguins.sample(6)
Our toy dataframe has 7 columns and a row index, and looks like this. Since sample() picks rows at random, the rows you get will differ from the ones shown here.
df
       species  island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex
283     Gentoo  Biscoe            54.3           15.7              231.0       5650.0    Male
198  Chinstrap   Dream            50.1           17.9              190.0       3400.0  Female
25      Adelie  Biscoe            35.3           18.9              187.0       3800.0  Female
329     Gentoo  Biscoe            48.1           15.1              209.0       5500.0    Male
338     Gentoo  Biscoe            47.2           13.7              214.0       4925.0  Female
208  Chinstrap   Dream            45.2           16.6              191.0       3250.0  Female
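If you want the same rows on every run, one option is to pass a seed to sample(). The sketch below is only an illustration: it uses a small hand-built dataframe (the name toy and its values are made up for this example, not taken from the penguins data) so that it runs without downloading anything, and the seed value 42 is arbitrary.

```python
import pandas as pd

# Hypothetical stand-in data, used only so the example runs offline.
toy = pd.DataFrame(
    {
        "bill_length_mm": [39.1, 46.5, 50.0, 35.3, 48.1, 45.2, 47.2, 54.3],
        "island": ["Torgersen", "Biscoe", "Dream", "Biscoe",
                   "Biscoe", "Dream", "Biscoe", "Biscoe"],
    }
)

# random_state fixes the seed, so both calls draw the same 6 rows.
first = toy.sample(6, random_state=42)
second = toy.sample(6, random_state=42)

print(first.equals(second))  # True
```

The same argument should work on the penguins dataframe, for example penguins.sample(6, random_state=42).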
To select columns whose names contain a substring, like “len” in the example below, we use the Pandas filter() function with the argument “like”. We specify the substring that we want to match as the value for “like” and set axis=1 to filter on column names. In the example below, two columns contain the substring “len”.
df.filter(like="len", axis=1)
     bill_length_mm  flipper_length_mm
283            54.3              231.0
198            50.1              190.0
25             35.3              187.0
329            48.1              209.0
338            47.2              214.0
208            45.2              191.0
Here is another example, where only one column name contains the matching substring.
df.filter(like="lan", axis=1)
     island
283  Biscoe
198   Dream
25   Biscoe
329  Biscoe
338  Biscoe
208   Dream
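To make the column selection a bit more concrete, here is a minimal sketch that compares filter(like=...) with two alternatives: a boolean mask built from df.columns.str.contains(), and filter(regex=...) for patterns that a plain substring cannot express. The two-row dataframe is rebuilt by hand from the rows shown earlier, so the snippet runs without seaborn. The variable names by_like, by_mask and by_regex are just for this illustration.

```python
import pandas as pd

# Two rows copied from the toy dataframe above, rebuilt by hand.
cols = ["species", "island", "bill_length_mm", "bill_depth_mm",
        "flipper_length_mm", "body_mass_g", "sex"]
df = pd.DataFrame(
    [
        ["Gentoo", "Biscoe", 54.3, 15.7, 231.0, 5650.0, "Male"],
        ["Chinstrap", "Dream", 50.1, 17.9, 190.0, 3400.0, "Female"],
    ],
    columns=cols,
    index=[283, 198],
)

# Substring match on column names with filter().
by_like = df.filter(like="len", axis=1)

# Same selection with an explicit boolean mask over the column names.
by_mask = df.loc[:, df.columns.str.contains("len", regex=False)]

# filter() also accepts a regular expression, here: names ending in "_mm".
by_regex = df.filter(regex="_mm$", axis=1)

print(list(by_like.columns))
print(list(by_regex.columns))
```

The mask version is more verbose, but it can be handy when you want to combine several conditions on the column names. Note that like and regex cannot be passed to filter() at the same time.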
We can also use the filter() function with the like argument to match substrings in the row index. In the example below, we use axis=0 to specify that we are filtering rows, not columns, based on a substring match against the row names. Here the row labels are integers, and filter() keeps the rows whose label contains “3”.
df.filter(like="3", axis=0)
     species  island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex
283   Gentoo  Biscoe            54.3           15.7              231.0       5650.0    Male
329   Gentoo  Biscoe            48.1           15.1              209.0       5500.0    Male
338   Gentoo  Biscoe            47.2           13.7              214.0       4925.0  Female
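As the output above suggests, the match is made against the text form of each row label, even though the index holds integers. A rough equivalent, sketched below, converts the index to strings and builds a boolean mask by hand. The dataframe here keeps only the species column and reuses the six row labels from the toy example; the names mask, by_like and by_mask are illustrative.

```python
import pandas as pd

# Same six row labels as the toy dataframe above, one column for brevity.
df = pd.DataFrame(
    {"species": ["Gentoo", "Chinstrap", "Adelie",
                 "Gentoo", "Gentoo", "Chinstrap"]},
    index=[283, 198, 25, 329, 338, 208],
)

# Substring match on row labels with filter().
by_like = df.filter(like="3", axis=0)

# Manual version: turn the integer labels into strings, then test each one.
mask = df.index.astype(str).str.contains("3", regex=False)
by_mask = df[mask]

print(list(by_like.index))  # [283, 329, 338]
```

Keep in mind that filter() looks only at labels. To select rows by the contents of a column, such as species, you would build the mask from that column instead, for example with df["species"].str.contains(...).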