How To Compute Column Means in R with tidyverse

June 3, 2021 by cmdlinetips

In this bite-sized post, we will see how to compute column means in R using tidyverse. We will cover two scenarios: first a dataframe with no missing values, and then a dataframe with missing values.

We will use two R functions to compute column means. First, we will see how to use the across() function in dplyr 1.0.0+ to compute column means, and then use base R’s colMeans() function to do the same.

Compute Column Means in R with across() and colMeans()

To get started, let us load tidyverse and the data set we need to compute the mean of each numerical column in a data frame.

library(tidyverse)
library(palmerpenguins)

Let us create two dataframes, one without any missing data.

data_without_na <- penguins %>%
  select(-year) %>%
  drop_na()

And the next dataframe keeps the rows with missing values.

data_with_na <- penguins %>%
    select(-year)
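
Before computing any means, it can help to check where the missing values actually are. The sketch below counts NAs per column on a small made-up tibble; toy_with_na and its values are hypothetical stand-ins for data_with_na, used only so the example runs on its own.

```r
library(dplyr)

# Hypothetical toy data standing in for data_with_na (values are made up)
toy_with_na <- tibble(
  species        = c("Adelie", "Gentoo", "Chinstrap", "Adelie"),
  bill_length_mm = c(39.1, NA, 46.5, 40.3),
  body_mass_g    = c(3750, 5000, NA, NA)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy_with_na))
na_counts

# Number of rows with no missing value in any column
n_complete <- sum(complete.cases(toy_with_na))
n_complete
```

The same two calls should work unchanged on data_with_na and data_without_na.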

Computing Column Means on Data without Missing Values Using dplyr’s across() Function

Our dataframe contains both numerical and non-numerical variables. To compute means of all numerical columns, we first use the select() function to select the numerical columns, and then apply the across() function to all of them to compute mean values. Note that we use across() inside the summarise() function here.

data_without_na %>%
  select(where(is.numeric)) %>%
  summarise(across(everything(), mean))

Since our data does not contain any missing values, we get a tibble with a single row containing the column means.

## # A tibble: 1 x 4
##   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##            <dbl>         <dbl>             <dbl>       <dbl>
## 1           44.0          17.2              201.       4207.

We can skip the selection of numerical variables using select() function. Here, we select all numerical columns inside across() function and compute mean values.

data_without_na %>%
  summarise(across(where(is.numeric), mean))

As expected we get the same results.

## # A tibble: 1 x 4
##   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##            <dbl>         <dbl>             <dbl>       <dbl>
## 1           44.0          17.2              201.       4207.

How to Compute Column Means on Data with Missing Values Using dplyr’s across() Function

When our data frame contains missing values, we have to tell mean() to remove them before it computes the column means.

Let us first try to compute column means without removing the missing values.

data_with_na %>%
  select(where(is.numeric)) %>%
  summarise(across(everything(), 
                   mean))

Then we get the following result, where every column’s mean is NA.

## # A tibble: 1 x 4
##   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##            <dbl>         <dbl>             <dbl>       <dbl>
## 1             NA            NA                NA          NA
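
The NAs appear because mean() returns NA whenever its input contains a missing value, unless it is told to drop them. A minimal base R sketch on a made-up vector (x is hypothetical and not taken from the penguins data):

```r
# Hypothetical vector with one missing value
x <- c(39.1, NA, 46.5)

# By default the missing value propagates to the result
mean_default <- mean(x)
mean_default

# With na.rm = TRUE the NA is dropped before averaging
mean_no_na <- mean(x, na.rm = TRUE)
mean_no_na
```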

To remove missing values, we pass the na.rm = TRUE argument through across(), which forwards it to mean().

data_with_na %>%
  select(where(is.numeric)) %>%
  summarise(across(everything(), 
                   mean,
                   na.rm = TRUE))

And we get column means as expected.

## # A tibble: 1 x 4
##   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##            <dbl>         <dbl>             <dbl>       <dbl>
## 1           43.9          17.2              201.       4202.

As before, we can skip the separate select() statement and compute the numerical columns’ mean values directly with the across() function.

data_with_na %>%
  summarise(across(where(is.numeric),
                   mean,
                   na.rm = TRUE))
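
One caveat for newer setups: as of dplyr 1.1.0, passing extra arguments such as na.rm through across() is deprecated and triggers a warning. The suggested replacement is an anonymous function that calls mean() with na.rm = TRUE itself. The sketch below shows that form on a hypothetical toy tibble (toy_with_na and its values are made up); the `\(x)` shorthand assumes R 4.1 or later.

```r
library(dplyr)

# Hypothetical toy data standing in for data_with_na (values are made up)
toy_with_na <- tibble(
  species        = c("Adelie", "Gentoo", "Chinstrap", "Adelie"),
  bill_length_mm = c(39.1, NA, 46.5, 40.3),
  body_mass_g    = c(3750, 5000, NA, NA)
)

# Anonymous function instead of passing na.rm through across()
toy_means <- toy_with_na %>%
  summarise(across(where(is.numeric), \(x) mean(x, na.rm = TRUE)))

toy_means
```

On R versions older than 4.1, writing `function(x) mean(x, na.rm = TRUE)` in place of the `\(x)` shorthand should behave the same way.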

How to Compute Column Means with colMeans() function

Another easy approach to compute column means is to use base R’s colMeans() function. Here we select numerical columns first and use colMeans() with na.rm argument to compute mean values by removing any missing data.

data_with_na %>%
  select(where(is.numeric)) %>%
  colMeans(na.rm = TRUE)

##    bill_length_mm     bill_depth_mm flipper_length_mm       body_mass_g 
##          43.92193          17.15117         200.91520        4201.75439
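
Note that colMeans() returns a named numeric vector, while summarise() with across() returns a one-row tibble. The sketch below compares the two on a hypothetical toy tibble (toy_with_na and its values are made up) to check that the numbers agree.

```r
library(dplyr)

# Hypothetical toy data standing in for data_with_na (values are made up)
toy_with_na <- tibble(
  species        = c("Adelie", "Gentoo", "Chinstrap", "Adelie"),
  bill_length_mm = c(39.1, NA, 46.5, 40.3),
  body_mass_g    = c(3750, 5000, NA, NA)
)

# across(): one-row tibble of column means
means_tbl <- toy_with_na %>%
  summarise(across(where(is.numeric), \(x) mean(x, na.rm = TRUE)))

# colMeans(): named numeric vector of column means
means_vec <- toy_with_na %>%
  select(where(is.numeric)) %>%
  colMeans(na.rm = TRUE)

# Flatten the tibble to a named vector and compare the two results
same_means <- isTRUE(all.equal(unlist(means_tbl), means_vec))
same_means
```

Which form is more convenient depends on the next step: the tibble fits into further dplyr pipelines, while the named vector is handy for quick inspection or base R arithmetic.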

In summary, we saw examples of using two functions in R, across() and colMeans(), to compute column means on numerical columns with and without missing data.
