4 ways to select columns from a dataframe with dplyr’s select()

October 12, 2020 by cmdlinetips

dplyr’s select() function is one of the core functions of dplyr; it lets you select one or more columns from a dataframe.

With version 1.0.0, dplyr’s select() function gained new functionality that makes it easy to select columns in multiple ways. One of the most common ways to select columns is to use their names. With dplyr version 1.0.0, we can also select columns by their location.

In this post, we will see examples of four ways to select columns from a dataframe. We will start with selecting columns by name, and then see examples of selecting columns by position, by type, and by using functions that look for patterns in column names.

Let us load tidyverse and make sure dplyr’s version is 1.0.0+.

library(tidyverse)
packageVersion("dplyr")
[1] ‘1.0.0’

We will use the penguins dataset to select columns in four different ways using the select() function.

path2data <- "https://raw.githubusercontent.com/cmdlinetips/data/master/palmer_penguins.csv"
penguins <- readr::read_csv(path2data)

The column specification printed by read_csv() shows that we have columns of different types.

## Parsed with column specification:
## cols(
##   species = col_character(),
##   island = col_character(),
##   bill_length_mm = col_double(),
##   bill_depth_mm = col_double(),
##   flipper_length_mm = col_double(),
##   body_mass_g = col_double(),
##   sex = col_character()
## )

dplyr select(): How To Select Columns By Names?

Let us start by selecting columns of a dataframe by name, which is the most common way to select columns.

# select columns by name; head() keeps the first six rows shown below
penguins %>%
  dplyr::select(species, island, flipper_length_mm) %>%
  head()
## # A tibble: 6 x 3
##   species island    flipper_length_mm
##   <chr>   <chr>                 <dbl>
## 1 Adelie  Torgersen               181
## 2 Adelie  Torgersen               186
## 3 Adelie  Torgersen               195
## 4 Adelie  Torgersen                NA
## 5 Adelie  Torgersen               193
## 6 Adelie  Torgersen               190
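
Column names are not always typed in by hand; sometimes they arrive as strings in a character vector. The sketch below is one way to handle that case, using all_of() inside select(). The small `toy` tibble and the `wanted` vector are made up for illustration and only stand in for the penguins data.

```r
library(dplyr)

# Hypothetical toy data standing in for the penguins tibble
toy <- tibble(
  species = c("Adelie", "Gentoo"),
  island = c("Torgersen", "Biscoe"),
  flipper_length_mm = c(181, 217),
  body_mass_g = c(3750, 5000)
)

# Bare column names, as in the example above
by_name <- toy %>% select(species, island, flipper_length_mm)

# The same names stored as strings: wrap the vector in all_of()
wanted <- c("species", "island", "flipper_length_mm")
by_vector <- toy %>% select(all_of(wanted))

names(by_name)
names(by_vector)
```

Both calls should return the same three columns; all_of() raises an error if any name in the vector is missing from the dataframe.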

dplyr select(): How To Select Columns By Their Positions?

Let us select the same columns as in the previous example, but this time use their position in the dataframe. For example, the column species is the first column in the dataframe and island is the second column in the dataframe.

We can simply specify the column positions as arguments to the select() function.

# dplyr select columns by position; head() keeps the first six rows
penguins %>%
  select(1, 2, 5) %>%
  head()

And we get the same results as above.

## # A tibble: 6 x 3
##   species island    flipper_length_mm
##   <chr>   <chr>                 <dbl>
## 1 Adelie  Torgersen               181
## 2 Adelie  Torgersen               186
## 3 Adelie  Torgersen               195
## 4 Adelie  Torgersen                NA
## 5 Adelie  Torgersen               193
## 6 Adelie  Torgersen               190
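
Positions do not have to be listed one by one. As a rough sketch, the example below shows a few other positional forms that select() accepts: a range, a negative position that drops a column, and the last_col() helper. The `toy` tibble is hypothetical and has the same first five column names as the penguins data.

```r
library(dplyr)

# Hypothetical one-row tibble with penguins-like columns
toy <- tibble(
  species = "Adelie",
  island = "Torgersen",
  bill_length_mm = 39.1,
  bill_depth_mm = 18.7,
  flipper_length_mm = 181
)

# A range of positions
first_two <- toy %>% select(1:2)

# A negative position drops that column and keeps the rest
without_first <- toy %>% select(-1)

# last_col() refers to the last column without counting columns
last_one <- toy %>% select(last_col())

names(first_two)
names(without_first)
names(last_one)
```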

One of the nice things (or bad?) about selecting columns by position is that select() silently ignores position zero, much like base R indexing, and returns the columns at the remaining valid positions. This does not apply to every invalid position: a position larger than the number of columns raises an error.

For example, here we specify column position zero, which does not exist. The select() function does not fail; it returns the columns at the remaining valid positions.

# dplyr select by position ignores position 0; head() keeps the first six rows
penguins %>%
  select(0, 2, 5) %>%
  head()

The resulting tibble skips the position 0 that we requested.

## # A tibble: 6 x 2
##   island    flipper_length_mm
##   <chr>                 <dbl>
## 1 Torgersen               181
## 2 Torgersen               186
## 3 Torgersen               195
## 4 Torgersen                NA
## 5 Torgersen               193
## 6 Torgersen               190
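
To see how far this leniency goes, here is a small sketch that compares position 0 with a position past the last column. It assumes a recent dplyr/tidyselect, where an out-of-range position is an error; the `toy` tibble and the "error" marker string are made up for illustration.

```r
library(dplyr)

# Hypothetical three-column tibble
toy <- tibble(a = 1:2, b = 3:4, c = 5:6)

# Position 0 is dropped silently, so only column 2 comes back
zero_pos <- toy %>% select(0, 2)
names(zero_pos)

# Position 10 does not exist; catch the error instead of stopping the script
past_end <- tryCatch(
  toy %>% select(2, 10),
  error = function(e) "error"
)
past_end
```

In other words, only zero is ignored, so a typo such as 10 instead of 1 should still be caught.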

dplyr select(): How To Select Columns By Their Types?

Often it is useful to select columns by their types. For example, you might want to select all columns that are numeric for further analysis.

To get all columns that are numeric, we can use where(is.numeric) as an argument to the select() function.

# dplyr select all columns that are numeric; head() keeps the first six rows
penguins %>%
  select(where(is.numeric)) %>%
  head()

And we get all columns that are numeric.

## # A tibble: 6 x 4
##   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##            <dbl>         <dbl>             <dbl>       <dbl>
## 1           39.1          18.7               181        3750
## 2           39.5          17.4               186        3800
## 3           40.3          18                 195        3250
## 4           NA            NA                  NA          NA
## 5           36.7          19.3               193        3450
## 6           39.3          20.6               190        3650

Similarly, we can select all factor columns using “where(is.factor)” and all character columns using “where(is.character)”.

Note that using the where() function to select columns is new in dplyr 1.0.0.
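
The function passed to where() does not have to be one of the is.*() checks. As a minimal sketch, assuming a made-up `toy` tibble with one character, one factor, and two numeric columns, the example below uses the two type checks mentioned above plus a custom function that keeps only numeric columns with no missing values.

```r
library(dplyr)

# Hypothetical data with mixed column types
toy <- tibble(
  species = c("Adelie", "Gentoo", "Chinstrap"),
  sex = factor(c("male", "female", NA)),
  bill_length_mm = c(39.1, NA, 46.5),
  body_mass_g = c(3750, 5000, 3700)
)

chr_cols <- toy %>% select(where(is.character))
fct_cols <- toy %>% select(where(is.factor))

# Custom predicate: must return a single TRUE or FALSE per column
complete_num <- toy %>%
  select(where(function(x) is.numeric(x) && !anyNA(x)))

names(chr_cols)
names(fct_cols)
names(complete_num)
```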

dplyr select(): How To Select Columns By Function of Names?

The fourth way to select columns from a dataframe is to look for a string or a pattern in column names. For example, we often want to select columns whose names start with or end with a given string.

dplyr has special functions for that: starts_with() selects columns whose names start with a string, and ends_with() selects columns whose names end with a string.

dplyr starts_with(): How To Select Columns whose names start with a string?

Here is an example, where we select columns whose names start with the string “bill”.

# dplyr select columns whose names start with a string; head() keeps the first six rows
penguins %>%
  select(starts_with("bill")) %>%
  head()
## # A tibble: 6 x 2
##   bill_length_mm bill_depth_mm
##            <dbl>         <dbl>
## 1           39.1          18.7
## 2           39.5          17.4
## 3           40.3          18  
## 4           NA            NA  
## 5           36.7          19.3
## 6           39.3          20.6

dplyr ends_with(): How To Select Columns whose names end with a string?

Here is an example, where we select columns whose names end with the string “mm”.

# dplyr select columns whose names end with a string; head() keeps the first six rows
penguins %>%
  select(ends_with("mm")) %>%
  head()

Now we have all columns whose names end with “mm”.

## # A tibble: 6 x 3
##   bill_length_mm bill_depth_mm flipper_length_mm
##            <dbl>         <dbl>             <dbl>
## 1           39.1          18.7               181
## 2           39.5          17.4               186
## 3           40.3          18                 195
## 4           NA            NA                  NA
## 5           36.7          19.3               193
## 6           39.3          20.6               190
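
Two details of starts_with() and ends_with() are easy to miss: they accept more than one string, and they ignore case by default. The sketch below illustrates both on a hypothetical one-row `toy` tibble with penguins-like column names.

```r
library(dplyr)

# Hypothetical one-row tibble with penguins-like columns
toy <- tibble(
  species = "Adelie",
  bill_length_mm = 39.1,
  bill_depth_mm = 18.7,
  flipper_length_mm = 181,
  body_mass_g = 3750
)

# Several prefixes at once: columns matching any of them are kept
bill_or_body <- toy %>% select(starts_with(c("bill", "body")))

# Matching ignores case unless ignore.case = FALSE is set
upper_mm <- toy %>% select(ends_with("MM"))
strict_mm <- toy %>% select(ends_with("MM", ignore.case = FALSE))

names(bill_or_body)
names(upper_mm)
ncol(strict_mm)
```

When nothing matches, as in the case-sensitive call here, select() returns a tibble with zero columns rather than an error.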

And that is not all. As dplyr’s documentation suggests, we can also combine any of the above approaches with Boolean operators to select columns.

  • df %>% select(!where(is.factor)): selects all non-factor variables.
  • df %>% select(where(is.numeric) & starts_with("x")): selects all numeric variables whose names start with “x”.
  • df %>% select(starts_with("a") | ends_with("z")): selects all variables whose names start with “a” or end with “z”.
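
The three patterns above can be tried out on a small dataframe. This is only a sketch: the `toy` tibble is hypothetical, and the strings passed to starts_with() and ends_with() were changed to fit its column names.

```r
library(dplyr)

# Hypothetical one-row tibble with a factor, a character and three numeric columns
toy <- tibble(
  species = factor("Adelie"),
  island = "Torgersen",
  bill_length_mm = 39.1,
  flipper_length_mm = 181,
  body_mass_g = 3750
)

# Negation: everything that is not a factor
not_factor <- toy %>% select(!where(is.factor))

# Intersection: numeric columns whose names start with "bill"
numeric_bill <- toy %>% select(where(is.numeric) & starts_with("bill"))

# Union: names starting with "s" or ending with "_g"
either <- toy %>% select(starts_with("s") | ends_with("_g"))

names(not_factor)
names(numeric_bill)
names(either)
```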