Reshaping the data from one for form to another is one of the most common data munging activities. tidyr, R package part of tidyverse, provides core functions to manipulate datasets in wide or long form.
In this post, we will see examples of one of tidyr’s core function pivot_wider() to convert data in long tidy form to data in wide form.
pivot_wider() “widens” data, increasing the number of columns and decreasing the number of rows.
And it is exact inverse of tidyr’s pivot_longer() function to convert data in wide form to long/tidy form.
Let us first load tidyverse sure of R packages. Let us also make sure the version of tidyr package. We need tidyr version 1.0.0 and above.
library(tidyverse)
For reshaping tidy dataframe, we will use mobile subscription growth over the years across the world from tidytuesday project.
mobile <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-11-10/mobile.csv')
Our data is in tidy form or long form with each variable in a separate column.
## Parsed with column specification: ## cols( ## entity = col_character(), ## code = col_character(), ## year = col_double(), ## total_pop = col_double(), ## gdp_per_cap = col_double(), ## mobile_subs = col_double(), ## continent = col_character() ## )
How To Reshape long data to wide using single variable with pivot_wider()?
At first we will focus on reshaping the dataframe by one variable, mobile_subs. We will aim to reshape the tidy data to wider form such that year variable is on columns and the mobile_subs are values in the wide dataframe.
We can reshape using tidyr’s pivot_wider() function. We first select only the variables of interest to reshape and then use pivot_wider() with arguments “names_from” and “values_from”.
We specify which variable or column should be used as column names in the reshaped wider data with names_from. And which variable or column should be values in the wider form using values_from argument.
mobile %>% select(code,continent,year,mobile_subs) %>% pivot_wider(names_from=year, values_from=mobile_subs)
Now we get the same data frame in wider form with years as column names and mobile_subs as values.
## # A tibble: 248 x 30 ## code continent `1990` `1991` `1992` `1993` `1994` `1995` `1996` ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 AFG Asia 0 0 0 0 0 0 0 ## 2 ALB Europe 0 0 0 0 0 0 0.0744 ## 3 DZA Africa 0.00181 0.0180 0.0176 0.0172 0.00475 0.0162 0.0398 ## 4 ASM Oceania 0 0 1.41 1.77 2.32 2.36 2.41 ## 5 AND Europe 0 0 1.31 1.28 1.25 4.42 8.53 ## 6 AGO Africa 0 0 0 0.00821 0.0132 0.0140 0.0225 ## 7 AIA Americas NA NA NA NA NA NA NA ## 8 ATG Americas 0 NA NA NA NA NA 1.72 ## 9 ARG Americas 0.0367 0.0753 0.138 0.328 0.698 1.16 1.88 ## 10 ARM Asia 0 0 0 0 0 0 0.00947 ## # … with 238 more rows, and 21 more variables: `1997` <dbl>, `1998` <dbl>,
Note that when you are pivoting by single variable, it is a good idea to keep the dataframe simple as needed. For example, when there are multiple variables in the tidy data, pivoting with a single variable will result in a really wide dataframe. And that may not be useful.
Here is an example of that using the mobile dataframe, where pivot with single variable but the data frame contains multiple variables.
mobile %>% pivot_wider(names_from=year, values_from=mobile_subs)
We get a wider dataframe with over 6000 columns and most of the values in them are “NA”.
## # A tibble: 6,184 x 33 ## entity code total_pop gdp_per_cap continent `1990` `1991` `1992` `1993` ## <chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> ## 1 Afgha… AFG 13032161 NA Asia 0 NA NA NA ## 2 Afgha… AFG 14069854 NA Asia NA 0 NA NA ## 3 Afgha… AFG 15472076 NA Asia NA NA 0 NA ## 4 Afgha… AFG 17053213 NA Asia NA NA NA 0 ## 5 Afgha… AFG 18553819 NA Asia NA NA NA NA ## 6 Afgha… AFG 19789880 NA Asia NA NA NA NA ## 7 Afgha… AFG 20684982 NA Asia NA NA NA NA ## 8 Afgha… AFG 21299350 NA Asia NA NA NA NA ## 9 Afgha… AFG 21752257 NA Asia NA NA NA NA ## 10 Afgha… AFG 22227543 NA Asia NA NA NA NA ## # … with 6,174 more rows, and 24 more variables: `1994` <dbl>, `1995` <dbl>,
How To pivot long to wide using multiple variables with pivot_wider()?
Often we are interested in reshaping a tidy dataframe into wider dataframe using multiple variables instead of one. Here is a cartoon illustration of the reshaping problem.
Pivot_wider() function from tidyr can be used for this reshaping challenge as well. This time we provide the variables as a vector instead of single variable to the argument values_from.
For example, here we reshape tidy dataframe by two variables or columns.
mobile %>% select(code,continent,year, mobile_subs,gdp_per_cap) %>% pivot_wider(names_from=year, values_from=c(mobile_subs, gdp_per_cap))
Note that when we pivot a tidy data by two variables, pivot_wider() adds the variable names to the column names by default.
## # A tibble: 248 x 58 ## code continent mobile_subs_1990 mobile_subs_1991 mobile_subs_1992 ## <chr> <chr> <dbl> <dbl> <dbl> ## 1 AFG Asia 0 0 0 ## 2 ALB Europe 0 0 0 ## 3 DZA Africa 0.00181 0.0180 0.0176 ## 4 ASM Oceania 0 0 1.41 ## 5 AND Europe 0 0 1.31 ## 6 AGO Africa 0 0 0 ## 7 AIA Americas NA NA NA ## 8 ATG Americas 0 NA NA ## 9 ARG Americas 0.0367 0.0753 0.138 ## 10 ARM Asia 0 0 0 ## # … with 238 more rows, and 53 more variables: mobile_subs_1993 <dbl>, ## # mobile_subs_1994 <dbl>, mobile_subs_1995 <dbl>, mobile_subs_1996 <dbl>, ## # mobile_subs_1997 <dbl>, mobile_subs_1998 <dbl>, mobile_subs_1999 <dbl>, ## # mobile_subs_2000 <dbl>, mobile_subs_2001 <dbl>, mobile_subs_2002 <dbl>, ## # mobile_subs_2003 <dbl>, mobile_subs_2004 <dbl>, mobile_subs_2005 <dbl>, ## # mobile_subs_2006 <dbl>, mobile_subs_2007 <dbl>, mobile_subs_2008 <dbl>, ## # mobile_subs_2009 <dbl>, mobile_subs_2010 <dbl>, mobile_subs_2011 <dbl>, ## # mobile_subs_2012 <dbl>, mobile_subs_2013 <dbl>, mobile_subs_2014 <dbl>, ## # mobile_subs_2015 <dbl>, mobile_subs_2016 <dbl>, mobile_subs_2017 <dbl>, ## # gdp_per_cap_1990 <dbl>, gdp_per_cap_1991 <dbl>, gdp_per_cap_1992 <dbl>, ## # gdp_per_cap_1993 <dbl>, gdp_per_cap_1994 <dbl>, gdp_per_cap_1995 <dbl>, ## # gdp_per_cap_1996 <dbl>, gdp_per_cap_1997 <dbl>, gdp_per_cap_1998 <dbl>,
In the above example, we saw how to pivot by two columns using pivot_wider(). We can easily extend to more than two variables the same way as shown below.
mobile %>% select(code,continent,year, mobile_subs, gdp_per_cap, total_pop) %>% pivot_wider(names_from=year, values_from=c(mobile_subs, gdp_per_cap, total_pop))
## # A tibble: 248 x 86 ## code continent mobile_subs_1990 mobile_subs_1991 mobile_subs_1992 ## <chr> <chr> <dbl> <dbl> <dbl> ## 1 AFG Asia 0 0 0 ## 2 ALB Europe 0 0 0 ## 3 DZA Africa 0.00181 0.0180 0.0176 ## 4 ASM Oceania 0 0 1.41 ## 5 AND Europe 0 0 1.31 ## 6 AGO Africa 0 0 0 ## 7 AIA Americas NA NA NA ## 8 ATG Americas 0 NA NA ## 9 ARG Americas 0.0367 0.0753 0.138 ## 10 ARM Asia 0 0 0 ## # … with 238 more rows, and 81 more variables: mobile_subs_1993 <dbl>, ## # mobile_subs_1994 <dbl>, mobile_subs_1995 <dbl>, mobile_subs_1996 <dbl>, ## # mobile_subs_1997 <dbl>, mobile_subs_1998 <dbl>, mobile_subs_1999 <dbl>, ## # ## # gdp_per_cap_1990 <dbl>, gdp_per_cap_1991 <dbl>, gdp_per_cap_1992 <dbl>, ## # ## # total_pop_1992 <dbl>, total_pop_1993 <dbl>, total_pop_1994 <dbl>,
Customize column name delimiter with names_sep in pivot_wider()
When pivoting with multiple variables, we also have control on how the names of the columns in the output wider dataframe. For example, using the argument name_sep, we can specify the delimiter. In the example below, we use “.” instead of under score “_”, the default delimiter.
mobile %>% select(code,continent,year, mobile_subs, gdp_per_cap, total_pop) %>% pivot_wider(names_from=year, values_from=c(mobile_subs, gdp_per_cap), names_sep = ".")
1 comment
Comments are closed.