• Skip to secondary menu
  • Skip to main content
  • Skip to primary sidebar

Python and R Tips

Learn Data Science with Python and R

  • Home
  • Python
  • Pandas
    • Pandas 101
  • tidyverse
    • tidyverse 101
  • R
  • Linux
  • Conferences
  • Python Books
  • About
    • Privacy Policy
You are here: Home / R / tidyverse 101 / dplyr matches(): select columns using regular expression

dplyr matches(): select columns using regular expression

June 20, 2022 by cmdlinetips

This quick post has an example using a neat dplyr function matches() to select columns using regular expressions.

dplyr has a number of helper functions, contains(), starts_with() and others, for selecting columns based on certain condition. For example if you interested selecting columns based on how its starts with we can use start_with() function. However, these helpe function cannot handle regular expression. And I just came across matches() function from dplyr, the only function that can handle regular expression to select matching columns.

Here is a quick example showing how it works. First, load the packages needed.

library(tidyverse)
library(palmerpenguins)
packageVersion("dplyr")
## [1] '1.0.9'

We will use penguins dataset from palmer penguins R package. And to illustrate matches using regular expression, let us make a small change to one of the column names.

 penguins = penguins %>% 
  rename(cill_length_mm=bill_length_mm)

Penguins data looks like this. Notice the spelling mistake “bill” to “cill” for bill_length_mm column that we created.

penguins %>% head()

## # A tibble: 5 × 8
##   species island cill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male 
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## 4 Adelie  Torge…           NA            NA                 NA          NA <NA> 
## 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
## # … with 1 more variable: year <int>

dplyr matches(): How to Select columns whose names matches a regular expression?

We can use dplyr’s matches() function to select columns whose names matches with a string or a pattern using regular expression. The difference between matches() and contains() is that matches can take regular expression and match the pattern, while contains cannot do it.

We can use matches to select columns that contains matching string. The matching string does not have to be in the beginning or at the end like starts_with() or ends_with() functions. In the example below, we select a column based on string that is in the middle of the name.

penguins %>%
  select(matches("bill")) 

In this example, we have not used any regular expression. Therefore we could have used contains() function to get the same results.

## # A tibble: 344 × 1
##    bill_depth_mm
##            <dbl>
##  1          18.7
##  2          17.4
##  3          18  
##  4          NA  
##  5          19.3
##  6          20.6
##  7          17.8
##  8          19.6
##  9          18.1
## 10          20.2
## # … with 334 more rows

Using simple regular expression like “*” we can use matches() function to select two columns that matches our pattern.

penguins %>% 
  select(matches("*ill"))

## # A tibble: 344 × 2
##    cill_length_mm bill_depth_mm
##             <dbl>         <dbl>
##  1           39.1          18.7
##  2           39.5          17.4
##  3           40.3          18  
##  4           NA            NA  
##  5           36.7          19.3
##  6           39.3          20.6
##  7           38.9          17.8
##  8           39.2          19.6
##  9           34.1          18.1
## 10           42            20.2
## # … with 334 more rows

We can also add a bit more specificities if needed. For example, we select columns matching “cill” or “bill” using regular expression “[cb]ill” to select the two columns.

penguins %>% 
  select(matches("[cb]ill"))

## # A tibble: 344 × 2
##    cill_length_mm bill_depth_mm
##             <dbl>         <dbl>
##  1           39.1          18.7
##  2           39.5          17.4
##  3           40.3          18  
##  4           NA            NA  
##  5           36.7          19.3
##  6           39.3          20.6
##  7           38.9          17.8
##  8           39.2          19.6
##  9           34.1          18.1
## 10           42            20.2
## # … with 334 more rows

Share this:

  • Click to share on Facebook (Opens in new window) Facebook
  • Click to share on X (Opens in new window) X

Related posts:

4 ways to select columns with dplyr select()4 ways to select columns from a dataframe with dplyr’s select() dplyr select(): How to Select Columns?dplyr select(): Select one or more variables from a dataframe Default ThumbnailHow To Move A Column to the Front with dplyr Default Thumbnaildplyr filter(): Filter/Select Rows based on conditions

Filed Under: R Tips, tidyverse 101 Tagged With: dplyr matches, dplyr matches() using regular expression

Primary Sidebar

Subscribe to Python and R Tips and Learn Data Science

Learn Pandas in Python and Tidyverse in R

Tags

Altair Basic NumPy Book Review Data Science Data Science Books Data Science Resources Data Science Roundup Data Visualization Dimensionality Reduction Dropbox Dropbox Free Space Dropbox Tips Emacs Emacs Tips ggplot2 Linux Commands Linux Tips Mac Os X Tips Maximum Likelihood Estimation in R MLE in R NumPy Pandas Pandas 101 Pandas Dataframe Pandas Data Frame pandas groupby() Pandas select columns Pandas select_dtypes Python Python 3 Python Boxplot Python Tips R rstats R Tips Seaborn Seaborn Boxplot Seaborn Catplot Shell Scripting Sparse Matrix in Python tidy evaluation tidyverse tidyverse 101 Vim Vim Tips

RSS RSS

  • How to convert row names to a column in Pandas
  • How to resize an image with PyTorch
  • Fashion-MNIST data from PyTorch
  • Pandas case_when() with multiple examples
  • An Introduction to Statistical Learning: with Applications in Python Is Here
  • 10 Tips to customize ggplot2 title text
  • 8 Plot types with Matplotlib in Python
  • PCA on S&P 500 Stock Return Data
  • Linear Regression with Matrix Decomposition Methods
  • Numpy’s random choice() function

Copyright © 2025 · Lifestyle Pro on Genesis Framework · WordPress · Log in

Go to mobile version