This tutorial shows how to perform row-wise operations in R using tidyverse. We will use three key functions, rowwise(), c_across() and rowMeans(), to perform row-wise operations on a dataframe.
rowwise() and c_across() come from dplyr. Both are available in dplyr 1.0.0 and later: that release introduced c_across() and revamped rowwise() for row-wise operations, like computing row means or other summary statistics on each row.
c_across() combines values from multiple columns of a dataframe. It is designed to work with rowwise() and makes row-wise operations easy. It is similar to the base R combine function c(), but with c_across() we can select multiple columns of a dataframe using the “tidy select” framework.
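To make the comparison with c() concrete, here is a minimal sketch on a small hypothetical data frame, df, with numeric columns x, y and z. Both versions compute the same row sums; with c() every column has to be typed out, while c_across() accepts a tidy-select expression such as the range x:z.

```r
library(dplyr)

# Hypothetical toy data frame with three numeric columns
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = c(7, 8, 9))

# Base R c(): every column is listed by name
with_c <- df %>%
  rowwise() %>%
  mutate(total = sum(c(x, y, z))) %>%
  ungroup()

# c_across(): the columns are chosen with a tidy-select range
with_c_across <- df %>%
  rowwise() %>%
  mutate(total = sum(c_across(x:z))) %>%
  ungroup()

with_c_across$total
## [1] 12 15 18
```

With many columns, the c_across() form scales better because the selection does not grow with the number of columns.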
rowMeans() is a base R function that computes row means, usually much faster than the rowwise() approach.
To get started, let us load tidyverse and the dataset we will use to compute the mean value of each row across the numerical columns.
Let us start by loading the tidyverse packages.
library(tidyverse)
We will use a subset of the gapminder data in wide form and load it directly from cmdlinetips.com’s GitHub page.
data_url <- "https://raw.githubusercontent.com/cmdlinetips/data/master/gapminder/gapminder_lifeExp_wide.tsv"
# keep only the first six rows as a small toy dataset
gapminder_wide <- read_tsv(data_url) %>% head()
Our toy dataset looks like this:
gapminder_wide
## # A tibble: 6 x 14
##   country continent lifeExp_1952 lifeExp_1957 lifeExp_1962 lifeExp_1967
##   <chr>   <chr>            <dbl>        <dbl>        <dbl>        <dbl>
## 1 Algeria Africa            43.1         45.7         48.3         51.4
## 2 Angola  Africa            30.0         32.0         34           36.0
## 3 Benin   Africa            38.2         40.4         42.6         44.9
## 4 Botswa… Africa            47.6         49.6         51.5         53.3
## 5 Burkin… Africa            32.0         34.9         37.8         40.7
## 6 Burundi Africa            39.0         40.5         42.0         43.5
## # … with 8 more variables: lifeExp_1972 <dbl>, lifeExp_1977 <dbl>,
## #   lifeExp_1982 <dbl>, lifeExp_1987 <dbl>, lifeExp_1992 <dbl>,
## #   lifeExp_1997 <dbl>, lifeExp_2002 <dbl>, lifeExp_2007 <dbl>
First, we will see how to compute row means on a dataframe with numerical columns using the rowwise() and c_across() functions in dplyr.
Next, we will learn how to compute multiple summary statistics for each row. More specifically, we will compute the row-wise mean and variance.
Finally, we will see an example of using the base R function rowMeans() in the tidyverse framework to compute the mean value of each row.
Row-wise operation on a dataframe with all numerical columns: compute row means with c_across()
Let us compute row means from a dataframe with numerical columns. In this example, we remove the non-numeric columns using the select() function and then apply the rowwise() function to perform row-wise operations. To compute row means, we use the c_across() function to combine each row’s values from all the columns; everything() inside c_across() selects all the columns.
gapminder_wide %>%
  select(-country, -continent) %>%
  rowwise() %>%
  summarise(row_mean = mean(c_across(everything())))
We used the summarise() function to compute the row means, so the result is a tibble with one mean value per row.
## # A tibble: 6 x 1
##   row_mean
##      <dbl>
## 1     59.0
## 2     37.9
## 3     48.8
## 4     54.6
## 5     44.7
## 6     44.8
With rowwise() and c_across() we can compute multiple summaries for each row. In this example, we use summarise() to compute the mean and the variance of each row of a dataframe with numerical columns.
gapminder_wide %>%
  select(-country, -continent) %>%
  rowwise() %>%
  summarise(row_mean = mean(c_across(everything())),
            row_variance = var(c_across(everything())))
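Now we have two row-wise summaries as columns in our results. Note that the second c_across(everything()) also picks up the row_mean column created by the first expression, so each row_variance below is computed over the 12 lifeExp values plus their mean. Adding the mean leaves the sum of squared deviations unchanged but adds one to the count, so the reported variance is 11/12 of the variance of the 12 values. To use only the original columns, select them explicitly, e.g. var(c_across(starts_with("lifeExp"))).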
## # A tibble: 6 x 2
##   row_mean row_variance
##      <dbl>        <dbl>
## 1     59.0        98.0
## 2     37.9        14.7
## 3     48.8        34.4
## 4     54.6        32.2
## 5     44.7        43.0
## 6     44.8         9.24
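A minimal sketch of the mean-and-variance computation on a small hypothetical data frame, toy, whose columns and values are made up to mimic the lifeExp_* layout. It selects the columns with starts_with("lifeExp") instead of everything(), so the second c_across() does not also pick up the row_mean column created by the first expression.

```r
library(dplyr)

# Hypothetical toy data standing in for the lifeExp_* columns
toy <- data.frame(
  lifeExp_1952 = c(40, 30),
  lifeExp_1957 = c(44, 32),
  lifeExp_1962 = c(48, 34)
)

row_stats <- toy %>%
  rowwise() %>%
  summarise(
    row_mean     = mean(c_across(starts_with("lifeExp"))),
    # starts_with("lifeExp") matches only the original columns,
    # so the row_mean column created above is not included here
    row_variance = var(c_across(starts_with("lifeExp")))
  )

row_stats$row_mean
## [1] 44 32
row_stats$row_variance
## [1] 16  4
```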
So far, our results contained only the row-wise summaries, because summarise() keeps only the summary columns. By using mutate() instead, we can keep the original data alongside the row-wise summaries. In this example, we use mutate() with rowwise() and compute two row-wise summaries using c_across().
We also move the row-wise summaries to the first two columns using select().
gapminder_wide %>%
  select(-country, -continent) %>%
  rowwise() %>%
  mutate(row_mean = mean(c_across(everything())),
         row_variance = var(c_across(everything()))) %>%
  select(row_mean, row_variance, everything())
## # A tibble: 6 x 14
## # Rowwise:
##   row_mean row_variance lifeExp_1952 lifeExp_1957 lifeExp_1962 lifeExp_1967
##      <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
## 1     59.0        98.0          43.1         45.7         48.3         51.4
## 2     37.9        14.7          30.0         32.0         34           36.0
## 3     48.8        34.4          38.2         40.4         42.6         44.9
## 4     54.6        32.2          47.6         49.6         51.5         53.3
## 5     44.7        43.0          32.0         34.9         37.8         40.7
## 6     44.8         9.24         39.0         40.5         42.0         43.5
## # … with 8 more variables: lifeExp_1972 <dbl>, lifeExp_1977 <dbl>,
## #   lifeExp_1982 <dbl>, lifeExp_1987 <dbl>, lifeExp_1992 <dbl>,
## #   lifeExp_1997 <dbl>, lifeExp_2002 <dbl>, lifeExp_2007 <dbl>
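The “# Rowwise:” line in the output shows that mutate() keeps the rowwise grouping, so later verbs would still work one row at a time. A minimal sketch, on a hypothetical toy data frame, of calling ungroup() to return to ordinary column-wise behaviour; without it, the final summarise() would return one value per row instead of a single overall mean.

```r
library(dplyr)

# Hypothetical toy data with an identifier column
toy <- data.frame(
  country      = c("A", "B"),
  lifeExp_1952 = c(40, 30),
  lifeExp_1957 = c(44, 32)
)

with_means <- toy %>%
  rowwise() %>%
  mutate(row_mean = mean(c_across(starts_with("lifeExp")))) %>%
  ungroup()  # drop the rowwise grouping once the row-wise step is done

# Back to ordinary column-wise behaviour: one summary over all rows
overall <- with_means %>%
  summarise(overall_mean = mean(row_mean))

overall$overall_mean
## [1] 36.5
```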
Row-wise operation on a dataframe with character and numerical columns: compute row means with c_across()
We can also apply rowwise() to compute row-wise summaries, such as row means, on a dataframe with mixed data types, i.e. a dataframe that contains both numerical and character/factor variables.
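Inside c_across() we use where(is.numeric) to select only the numerical columns for the row-wise summaries. In this example we use summarise() to compute the mean and variance of the numerical values in each row. Note that row_mean is itself numeric, so where(is.numeric) in the row_variance expression also selects it, and the variance includes the row mean as an extra value.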
gapminder_wide %>%
  rowwise() %>%
  summarise(row_mean = mean(c_across(where(is.numeric))),
            row_variance = var(c_across(where(is.numeric))))
## # A tibble: 6 x 2
##   row_mean row_variance
##      <dbl>        <dbl>
## 1     59.0        98.0
## 2     37.9        14.7
## 3     48.8        34.4
## 4     54.6        32.2
## 5     44.7        43.0
## 6     44.8         9.24
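In the output above the country and continent columns are gone, because summarise() keeps only the summaries. rowwise() also accepts identifier columns, which summarise() then carries into its result. A minimal sketch on a hypothetical toy data frame (the countries, continent and values are made up for illustration):

```r
library(dplyr)

# Hypothetical toy data mixing character and numeric columns
toy <- data.frame(
  country      = c("A", "B"),
  continent    = c("X", "X"),
  lifeExp_1952 = c(40, 30),
  lifeExp_1957 = c(44, 32),
  lifeExp_1962 = c(48, 34)
)

row_summary <- toy %>%
  rowwise(country) %>%  # country is carried through to the summarise() output
  summarise(row_mean = mean(c_across(where(is.numeric))))

row_summary
row_summary$row_mean
## [1] 44 32
```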
c_across() also lets us select the columns of interest in several ways. Here, we select a range of columns by giving the first and last column names, lifeExp_1952:lifeExp_2007, to compute row means.
gapminder_wide %>%
  rowwise() %>%
  summarise(lifeExp_mean = mean(c_across(lifeExp_1952:lifeExp_2007))) %>%
  head()
## # A tibble: 6 x 1
##   lifeExp_mean
##          <dbl>
## 1         59.0
## 2         37.9
## 3         48.8
## 4         54.6
## 5         44.7
## 6         44.8
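Other tidy-select helpers work inside c_across() too. A sketch on a hypothetical toy data frame comparing a column range, starts_with() and an explicit set of columns; the first two select the same columns and give the same means, while the third uses only two of the years.

```r
library(dplyr)

# Hypothetical toy data in the same wide layout
toy <- data.frame(
  country      = c("A", "B"),
  lifeExp_1952 = c(40, 30),
  lifeExp_1957 = c(44, 32),
  lifeExp_1962 = c(48, 34)
)

selections <- toy %>%
  rowwise() %>%
  summarise(
    by_range  = mean(c_across(lifeExp_1952:lifeExp_1962)),     # first:last column
    by_prefix = mean(c_across(starts_with("lifeExp_"))),       # name prefix
    two_years = mean(c_across(c(lifeExp_1952, lifeExp_1957)))  # named columns only
  )

selections$by_range
## [1] 44 32
selections$two_years
## [1] 42 31
```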
Compute row means using base R’s rowMeans()
We can also compute row means with base R’s rowMeans(). Piping a dataframe that has only numerical columns directly into rowMeans() gives us the row means as a vector.
gapminder_wide %>% select(-country,-continent) %>% rowMeans()
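We can also call rowMeans() inside dplyr’s summarise(), passing it the numerical columns with across(), to get the row means as a tibble. (In dplyr 1.1.0 and later, summarise() warns when it returns more than one row per group and suggests reframe() instead.)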
gapminder_wide %>%
  summarise(row_mean = rowMeans(across(where(is.numeric))))
## # A tibble: 6 x 1
##   row_mean
##      <dbl>
## 1     59.0
## 2     37.9
## 3     48.8
## 4     54.6
## 5     44.7
## 6     44.8
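If you would rather keep the original columns next to the row means, the same rowMeans() call can go inside mutate() instead of summarise(). A minimal sketch on a hypothetical toy data frame; no rowwise() is needed because rowMeans() already works on all rows at once.

```r
library(dplyr)

# Hypothetical toy data with one character column
toy <- data.frame(
  country      = c("A", "B"),
  lifeExp_1952 = c(40, 30),
  lifeExp_1957 = c(44, 32),
  lifeExp_1962 = c(48, 34)
)

# mutate() keeps every original column and appends the row means;
# across() hands the numeric columns to rowMeans() as a data frame
with_row_means <- toy %>%
  mutate(row_mean = rowMeans(across(where(is.numeric))))

with_row_means$row_mean
## [1] 44 32
```

In dplyr 1.1.0 and later, pick(where(is.numeric)) can be used in place of across() here.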
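In terms of speed, base R’s rowMeans() is usually much faster than rowwise(): rowMeans() computes all the row means in a single vectorised call, whereas rowwise() evaluates the summary expression separately for each row.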
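To check the speed difference on your own machine, here is a rough sketch with simulated data (2,000 rows and 12 uniform random columns, a hypothetical stand-in for a larger wide table). It times both approaches with system.time() and confirms they give the same row means; no timings are shown because they depend on the machine and package versions.

```r
library(dplyr)

# Hypothetical simulated data: 2,000 rows x 12 numeric columns
set.seed(1)
sim <- as.data.frame(matrix(runif(2000 * 12, min = 30, max = 80), ncol = 12))

# rowwise() + c_across(): mean() is evaluated once per row
t_rowwise <- system.time(
  via_rowwise <- sim %>%
    rowwise() %>%
    summarise(row_mean = mean(c_across(everything())))
)

# rowMeans(): a single vectorised call over all rows
t_rowmeans <- system.time(
  via_rowmeans <- rowMeans(sim)
)

# Both approaches should agree up to floating-point rounding
all.equal(via_rowwise$row_mean, unname(via_rowmeans))

# Elapsed seconds for each approach
c(rowwise = t_rowwise[["elapsed"]], rowMeans = t_rowmeans[["elapsed"]])
```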