How To Reshape Tidy Data to Wide Data with pivot_wider() from tidyr

pivot_wider(): tidy data to wide data

tidyr: pivot_wider()
Reshaping the data from one for form to another is one of the most common data munging activities. tidyr, R package part of tidyverse, provides core functions to manipulate datasets in wide or long form.

In this post, we will see examples of one of tidyr’s core function pivot_wider() to convert data in long tidy form to data in wide form.

pivot_wider() “widens” data, increasing the number of columns and decreasing the number of rows.

And it is exact inverse of tidyr’s pivot_longer() function to convert data in wide form to long/tidy form.

pivot_wider(): tidy data to wide data

Let us first load tidyverse sure of R packages. Let us also make sure the version of tidyr package. We need tidyr version 1.0.0 and above.

library(tidyverse)

For reshaping tidy dataframe, we will use mobile subscription growth over the years across the world from tidytuesday project.

mobile <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-11-10/mobile.csv')

Our data is in tidy form or long form with each variable in a separate column.

## Parsed with column specification:
## cols(
##   entity = col_character(),
##   code = col_character(),
##   year = col_double(),
##   total_pop = col_double(),
##   gdp_per_cap = col_double(),
##   mobile_subs = col_double(),
##   continent = col_character()
## )

How To Reshape long data to wide using single variable with pivot_wider()?

At first we will focus on reshaping the dataframe by one variable, mobile_subs. We will aim to reshape the tidy data to wider form such that year variable is on columns and the mobile_subs are values in the wide dataframe.

We can reshape using tidyr’s pivot_wider() function. We first select only the variables of interest to reshape and then use pivot_wider() with arguments “names_from” and “values_from”.

We specify which variable or column should be used as column names in the reshaped wider data with names_from. And which variable or column should be values in the wider form using values_from argument.

mobile %>% 
  select(code,continent,year,mobile_subs) %>%
  pivot_wider(names_from=year, values_from=mobile_subs) 

Now we get the same data frame in wider form with years as column names and mobile_subs as values.

## # A tibble: 248 x 30
##    code  continent   `1990`  `1991`  `1992`   `1993`   `1994`  `1995`   `1996`
##    <chr> <chr>        <dbl>   <dbl>   <dbl>    <dbl>    <dbl>   <dbl>    <dbl>
##  1 AFG   Asia       0        0       0       0        0        0       0      
##  2 ALB   Europe     0        0       0       0        0        0       0.0744 
##  3 DZA   Africa     0.00181  0.0180  0.0176  0.0172   0.00475  0.0162  0.0398 
##  4 ASM   Oceania    0        0       1.41    1.77     2.32     2.36    2.41   
##  5 AND   Europe     0        0       1.31    1.28     1.25     4.42    8.53   
##  6 AGO   Africa     0        0       0       0.00821  0.0132   0.0140  0.0225 
##  7 AIA   Americas  NA       NA      NA      NA       NA       NA      NA      
##  8 ATG   Americas   0       NA      NA      NA       NA       NA       1.72   
##  9 ARG   Americas   0.0367   0.0753  0.138   0.328    0.698    1.16    1.88   
## 10 ARM   Asia       0        0       0       0        0        0       0.00947
## # … with 238 more rows, and 21 more variables: `1997` <dbl>, `1998` <dbl>,

Note that when you are pivoting by single variable, it is a good idea to keep the dataframe simple as needed. For example, when there are multiple variables in the tidy data, pivoting with a single variable will result in a really wide dataframe. And that may not be useful.

Here is an example of that using the mobile dataframe, where pivot with single variable but the data frame contains multiple variables.

mobile %>% 
  pivot_wider(names_from=year, values_from=mobile_subs) 

We get a wider dataframe with over 6000 columns and most of the values in them are “NA”.

## # A tibble: 6,184 x 33
##    entity code  total_pop gdp_per_cap continent `1990` `1991` `1992` `1993`
##    <chr>  <chr>     <dbl>       <dbl> <chr>      <dbl>  <dbl>  <dbl>  <dbl>
##  1 Afgha… AFG    13032161          NA Asia           0     NA     NA     NA
##  2 Afgha… AFG    14069854          NA Asia          NA      0     NA     NA
##  3 Afgha… AFG    15472076          NA Asia          NA     NA      0     NA
##  4 Afgha… AFG    17053213          NA Asia          NA     NA     NA      0
##  5 Afgha… AFG    18553819          NA Asia          NA     NA     NA     NA
##  6 Afgha… AFG    19789880          NA Asia          NA     NA     NA     NA
##  7 Afgha… AFG    20684982          NA Asia          NA     NA     NA     NA
##  8 Afgha… AFG    21299350          NA Asia          NA     NA     NA     NA
##  9 Afgha… AFG    21752257          NA Asia          NA     NA     NA     NA
## 10 Afgha… AFG    22227543          NA Asia          NA     NA     NA     NA
## # … with 6,174 more rows, and 24 more variables: `1994` <dbl>, `1995` <dbl>,

How To pivot long to wide using multiple variables with pivot_wider()?

Often we are interested in reshaping a tidy dataframe into wider dataframe using multiple variables instead of one. Here is a cartoon illustration of the reshaping problem.

pivot_wider() with multiple columns

Pivot_wider() function from tidyr can be used for this reshaping challenge as well. This time we provide the variables as a vector instead of single variable to the argument values_from.

For example, here we reshape tidy dataframe by two variables or columns.

mobile %>% 
  select(code,continent,year,
         mobile_subs,gdp_per_cap) %>%
  pivot_wider(names_from=year, 
              values_from=c(mobile_subs, 
                            gdp_per_cap)) 

Note that when we pivot a tidy data by two variables, pivot_wider() adds the variable names to the column names by default.

## # A tibble: 248 x 58
##    code  continent mobile_subs_1990 mobile_subs_1991 mobile_subs_1992
##    <chr> <chr>                <dbl>            <dbl>            <dbl>
##  1 AFG   Asia               0                 0                0     
##  2 ALB   Europe             0                 0                0     
##  3 DZA   Africa             0.00181           0.0180           0.0176
##  4 ASM   Oceania            0                 0                1.41  
##  5 AND   Europe             0                 0                1.31  
##  6 AGO   Africa             0                 0                0     
##  7 AIA   Americas          NA                NA               NA     
##  8 ATG   Americas           0                NA               NA     
##  9 ARG   Americas           0.0367            0.0753           0.138 
## 10 ARM   Asia               0                 0                0     
## # … with 238 more rows, and 53 more variables: mobile_subs_1993 <dbl>,
## #   mobile_subs_1994 <dbl>, mobile_subs_1995 <dbl>, mobile_subs_1996 <dbl>,
## #   mobile_subs_1997 <dbl>, mobile_subs_1998 <dbl>, mobile_subs_1999 <dbl>,
## #   mobile_subs_2000 <dbl>, mobile_subs_2001 <dbl>, mobile_subs_2002 <dbl>,
## #   mobile_subs_2003 <dbl>, mobile_subs_2004 <dbl>, mobile_subs_2005 <dbl>,
## #   mobile_subs_2006 <dbl>, mobile_subs_2007 <dbl>, mobile_subs_2008 <dbl>,
## #   mobile_subs_2009 <dbl>, mobile_subs_2010 <dbl>, mobile_subs_2011 <dbl>,
## #   mobile_subs_2012 <dbl>, mobile_subs_2013 <dbl>, mobile_subs_2014 <dbl>,
## #   mobile_subs_2015 <dbl>, mobile_subs_2016 <dbl>, mobile_subs_2017 <dbl>,
## #   gdp_per_cap_1990 <dbl>, gdp_per_cap_1991 <dbl>, gdp_per_cap_1992 <dbl>,
## #   gdp_per_cap_1993 <dbl>, gdp_per_cap_1994 <dbl>, gdp_per_cap_1995 <dbl>,
## #   gdp_per_cap_1996 <dbl>, gdp_per_cap_1997 <dbl>, gdp_per_cap_1998 <dbl>,

In the above example, we saw how to pivot by two columns using pivot_wider(). We can easily extend to more than two variables the same way as shown below.

mobile %>% 
  select(code,continent,year,
         mobile_subs, gdp_per_cap, total_pop) %>%
  pivot_wider(names_from=year, 
              values_from=c(mobile_subs,
                            gdp_per_cap,
                            total_pop)) 
## # A tibble: 248 x 86
##    code  continent mobile_subs_1990 mobile_subs_1991 mobile_subs_1992
##    <chr> <chr>                <dbl>            <dbl>            <dbl>
##  1 AFG   Asia               0                 0                0     
##  2 ALB   Europe             0                 0                0     
##  3 DZA   Africa             0.00181           0.0180           0.0176
##  4 ASM   Oceania            0                 0                1.41  
##  5 AND   Europe             0                 0                1.31  
##  6 AGO   Africa             0                 0                0     
##  7 AIA   Americas          NA                NA               NA     
##  8 ATG   Americas           0                NA               NA     
##  9 ARG   Americas           0.0367            0.0753           0.138 
## 10 ARM   Asia               0                 0                0     
## # … with 238 more rows, and 81 more variables: mobile_subs_1993 <dbl>,
## #   mobile_subs_1994 <dbl>, mobile_subs_1995 <dbl>, mobile_subs_1996 <dbl>,
## #   mobile_subs_1997 <dbl>, mobile_subs_1998 <dbl>, mobile_subs_1999 <dbl>,
## #
## #   gdp_per_cap_1990 <dbl>, gdp_per_cap_1991 <dbl>, gdp_per_cap_1992 <dbl>,
## #
## #   total_pop_1992 <dbl>, total_pop_1993 <dbl>, total_pop_1994 <dbl>,

Customize column name delimiter with names_sep in pivot_wider()

When pivoting with multiple variables, we also have control on how the names of the columns in the output wider dataframe. For example, using the argument name_sep, we can specify the delimiter. In the example below, we use “.” instead of under score “_”, the default delimiter.


mobile %>% 
  select(code,continent,year,
         mobile_subs, gdp_per_cap, total_pop) %>%
  pivot_wider(names_from=year, 
              values_from=c(mobile_subs,
                            gdp_per_cap),
               names_sep = ".")