• Skip to secondary menu
  • Skip to main content
  • Skip to primary sidebar

Python and R Tips

Learn Data Science with Python and R

  • Home
  • Python
  • Pandas
    • Pandas 101
  • tidyverse
    • tidyverse 101
  • R
  • Linux
  • Conferences
  • Python Books
  • About
    • Privacy Policy
You are here: Home / R / How To Reshape Tidy Data to Wide Data with pivot_wider() from tidyr

How To Reshape Tidy Data to Wide Data with pivot_wider() from tidyr

November 18, 2020 by cmdlinetips

tidyr: pivot_wider()
tidyr: pivot_wider()
Reshaping the data from one for form to another is one of the most common data munging activities. tidyr, R package part of tidyverse, provides core functions to manipulate datasets in wide or long form.

In this post, we will see examples of one of tidyr’s core function pivot_wider() to convert data in long tidy form to data in wide form.

pivot_wider() “widens” data, increasing the number of columns and decreasing the number of rows.

And it is exact inverse of tidyr’s pivot_longer() function to convert data in wide form to long/tidy form.

pivot_wider(): tidy data to wide data

Let us first load tidyverse sure of R packages. Let us also make sure the version of tidyr package. We need tidyr version 1.0.0 and above.

library(tidyverse)

For reshaping tidy dataframe, we will use mobile subscription growth over the years across the world from tidytuesday project.

mobile <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-11-10/mobile.csv')

Our data is in tidy form or long form with each variable in a separate column.

## Parsed with column specification:
## cols(
##   entity = col_character(),
##   code = col_character(),
##   year = col_double(),
##   total_pop = col_double(),
##   gdp_per_cap = col_double(),
##   mobile_subs = col_double(),
##   continent = col_character()
## )

How To Reshape long data to wide using single variable with pivot_wider()?

At first we will focus on reshaping the dataframe by one variable, mobile_subs. We will aim to reshape the tidy data to wider form such that year variable is on columns and the mobile_subs are values in the wide dataframe.

We can reshape using tidyr’s pivot_wider() function. We first select only the variables of interest to reshape and then use pivot_wider() with arguments “names_from” and “values_from”.

We specify which variable or column should be used as column names in the reshaped wider data with names_from. And which variable or column should be values in the wider form using values_from argument.

mobile %>% 
  select(code,continent,year,mobile_subs) %>%
  pivot_wider(names_from=year, values_from=mobile_subs) 

Now we get the same data frame in wider form with years as column names and mobile_subs as values.

## # A tibble: 248 x 30
##    code  continent   `1990`  `1991`  `1992`   `1993`   `1994`  `1995`   `1996`
##    <chr> <chr>        <dbl>   <dbl>   <dbl>    <dbl>    <dbl>   <dbl>    <dbl>
##  1 AFG   Asia       0        0       0       0        0        0       0      
##  2 ALB   Europe     0        0       0       0        0        0       0.0744 
##  3 DZA   Africa     0.00181  0.0180  0.0176  0.0172   0.00475  0.0162  0.0398 
##  4 ASM   Oceania    0        0       1.41    1.77     2.32     2.36    2.41   
##  5 AND   Europe     0        0       1.31    1.28     1.25     4.42    8.53   
##  6 AGO   Africa     0        0       0       0.00821  0.0132   0.0140  0.0225 
##  7 AIA   Americas  NA       NA      NA      NA       NA       NA      NA      
##  8 ATG   Americas   0       NA      NA      NA       NA       NA       1.72   
##  9 ARG   Americas   0.0367   0.0753  0.138   0.328    0.698    1.16    1.88   
## 10 ARM   Asia       0        0       0       0        0        0       0.00947
## # … with 238 more rows, and 21 more variables: `1997` <dbl>, `1998` <dbl>,

Note that when you are pivoting by single variable, it is a good idea to keep the dataframe simple as needed. For example, when there are multiple variables in the tidy data, pivoting with a single variable will result in a really wide dataframe. And that may not be useful.

Here is an example of that using the mobile dataframe, where pivot with single variable but the data frame contains multiple variables.

mobile %>% 
  pivot_wider(names_from=year, values_from=mobile_subs) 

We get a wider dataframe with over 6000 columns and most of the values in them are “NA”.

## # A tibble: 6,184 x 33
##    entity code  total_pop gdp_per_cap continent `1990` `1991` `1992` `1993`
##    <chr>  <chr>     <dbl>       <dbl> <chr>      <dbl>  <dbl>  <dbl>  <dbl>
##  1 Afgha… AFG    13032161          NA Asia           0     NA     NA     NA
##  2 Afgha… AFG    14069854          NA Asia          NA      0     NA     NA
##  3 Afgha… AFG    15472076          NA Asia          NA     NA      0     NA
##  4 Afgha… AFG    17053213          NA Asia          NA     NA     NA      0
##  5 Afgha… AFG    18553819          NA Asia          NA     NA     NA     NA
##  6 Afgha… AFG    19789880          NA Asia          NA     NA     NA     NA
##  7 Afgha… AFG    20684982          NA Asia          NA     NA     NA     NA
##  8 Afgha… AFG    21299350          NA Asia          NA     NA     NA     NA
##  9 Afgha… AFG    21752257          NA Asia          NA     NA     NA     NA
## 10 Afgha… AFG    22227543          NA Asia          NA     NA     NA     NA
## # … with 6,174 more rows, and 24 more variables: `1994` <dbl>, `1995` <dbl>,

How To pivot long to wide using multiple variables with pivot_wider()?

Often we are interested in reshaping a tidy dataframe into wider dataframe using multiple variables instead of one. Here is a cartoon illustration of the reshaping problem.

pivot_wider() with multiple columns

Pivot_wider() function from tidyr can be used for this reshaping challenge as well. This time we provide the variables as a vector instead of single variable to the argument values_from.

For example, here we reshape tidy dataframe by two variables or columns.

mobile %>% 
  select(code,continent,year,
         mobile_subs,gdp_per_cap) %>%
  pivot_wider(names_from=year, 
              values_from=c(mobile_subs, 
                            gdp_per_cap)) 

Note that when we pivot a tidy data by two variables, pivot_wider() adds the variable names to the column names by default.

## # A tibble: 248 x 58
##    code  continent mobile_subs_1990 mobile_subs_1991 mobile_subs_1992
##    <chr> <chr>                <dbl>            <dbl>            <dbl>
##  1 AFG   Asia               0                 0                0     
##  2 ALB   Europe             0                 0                0     
##  3 DZA   Africa             0.00181           0.0180           0.0176
##  4 ASM   Oceania            0                 0                1.41  
##  5 AND   Europe             0                 0                1.31  
##  6 AGO   Africa             0                 0                0     
##  7 AIA   Americas          NA                NA               NA     
##  8 ATG   Americas           0                NA               NA     
##  9 ARG   Americas           0.0367            0.0753           0.138 
## 10 ARM   Asia               0                 0                0     
## # … with 238 more rows, and 53 more variables: mobile_subs_1993 <dbl>,
## #   mobile_subs_1994 <dbl>, mobile_subs_1995 <dbl>, mobile_subs_1996 <dbl>,
## #   mobile_subs_1997 <dbl>, mobile_subs_1998 <dbl>, mobile_subs_1999 <dbl>,
## #   mobile_subs_2000 <dbl>, mobile_subs_2001 <dbl>, mobile_subs_2002 <dbl>,
## #   mobile_subs_2003 <dbl>, mobile_subs_2004 <dbl>, mobile_subs_2005 <dbl>,
## #   mobile_subs_2006 <dbl>, mobile_subs_2007 <dbl>, mobile_subs_2008 <dbl>,
## #   mobile_subs_2009 <dbl>, mobile_subs_2010 <dbl>, mobile_subs_2011 <dbl>,
## #   mobile_subs_2012 <dbl>, mobile_subs_2013 <dbl>, mobile_subs_2014 <dbl>,
## #   mobile_subs_2015 <dbl>, mobile_subs_2016 <dbl>, mobile_subs_2017 <dbl>,
## #   gdp_per_cap_1990 <dbl>, gdp_per_cap_1991 <dbl>, gdp_per_cap_1992 <dbl>,
## #   gdp_per_cap_1993 <dbl>, gdp_per_cap_1994 <dbl>, gdp_per_cap_1995 <dbl>,
## #   gdp_per_cap_1996 <dbl>, gdp_per_cap_1997 <dbl>, gdp_per_cap_1998 <dbl>,

In the above example, we saw how to pivot by two columns using pivot_wider(). We can easily extend to more than two variables the same way as shown below.

mobile %>% 
  select(code,continent,year,
         mobile_subs, gdp_per_cap, total_pop) %>%
  pivot_wider(names_from=year, 
              values_from=c(mobile_subs,
                            gdp_per_cap,
                            total_pop)) 
## # A tibble: 248 x 86
##    code  continent mobile_subs_1990 mobile_subs_1991 mobile_subs_1992
##    <chr> <chr>                <dbl>            <dbl>            <dbl>
##  1 AFG   Asia               0                 0                0     
##  2 ALB   Europe             0                 0                0     
##  3 DZA   Africa             0.00181           0.0180           0.0176
##  4 ASM   Oceania            0                 0                1.41  
##  5 AND   Europe             0                 0                1.31  
##  6 AGO   Africa             0                 0                0     
##  7 AIA   Americas          NA                NA               NA     
##  8 ATG   Americas           0                NA               NA     
##  9 ARG   Americas           0.0367            0.0753           0.138 
## 10 ARM   Asia               0                 0                0     
## # … with 238 more rows, and 81 more variables: mobile_subs_1993 <dbl>,
## #   mobile_subs_1994 <dbl>, mobile_subs_1995 <dbl>, mobile_subs_1996 <dbl>,
## #   mobile_subs_1997 <dbl>, mobile_subs_1998 <dbl>, mobile_subs_1999 <dbl>,
## #
## #   gdp_per_cap_1990 <dbl>, gdp_per_cap_1991 <dbl>, gdp_per_cap_1992 <dbl>,
## #
## #   total_pop_1992 <dbl>, total_pop_1993 <dbl>, total_pop_1994 <dbl>,

Customize column name delimiter with names_sep in pivot_wider()

When pivoting with multiple variables, we also have control on how the names of the columns in the output wider dataframe. For example, using the argument name_sep, we can specify the delimiter. In the example below, we use “.” instead of under score “_”, the default delimiter.


mobile %>% 
  select(code,continent,year,
         mobile_subs, gdp_per_cap, total_pop) %>%
  pivot_wider(names_from=year, 
              values_from=c(mobile_subs,
                            gdp_per_cap),
               names_sep = ".")

Share this:

  • Click to share on Facebook (Opens in new window) Facebook
  • Click to share on X (Opens in new window) X

Related posts:

pivot_longer(): tidyrtidyr’s pivot_longer(): Reshape Wide Data to Long/Tidy Data Default ThumbnailPandas Melt: Reshape Wide Data to Long/Tidy Data Default Thumbnailtidyr 1.0.0 is here. pivot_longer & pivot_wider replace spread & gather Pandas Melt Example with IdentifiersPandas Melt: Reshape Wide to Tidy with identifiers

Filed Under: R, tidyr pivot_wider(), tidyverse 101 Tagged With: pivot_wider in R

Reader Interactions

Trackbacks

  1. How To Reshape Wide Data to Tidy Data with pivot_longer() in tidyr? - Python and R Tips says:
    November 22, 2020 at 8:33 pm

    […] and pivot_wider() are two core functions of tidyr version 1.0.0 and above for reshaping data. In this post, we will […]

Primary Sidebar

Subscribe to Python and R Tips and Learn Data Science

Learn Pandas in Python and Tidyverse in R

Tags

Altair Basic NumPy Book Review Data Science Data Science Books Data Science Resources Data Science Roundup Data Visualization Dimensionality Reduction Dropbox Dropbox Free Space Dropbox Tips Emacs Emacs Tips ggplot2 Linux Commands Linux Tips Mac Os X Tips Maximum Likelihood Estimation in R MLE in R NumPy Pandas Pandas 101 Pandas Dataframe Pandas Data Frame pandas groupby() Pandas select columns Pandas select_dtypes Python Python 3 Python Boxplot Python Tips R rstats R Tips Seaborn Seaborn Boxplot Seaborn Catplot Shell Scripting Sparse Matrix in Python tidy evaluation tidyverse tidyverse 101 Vim Vim Tips

RSS RSS

  • How to convert row names to a column in Pandas
  • How to resize an image with PyTorch
  • Fashion-MNIST data from PyTorch
  • Pandas case_when() with multiple examples
  • An Introduction to Statistical Learning: with Applications in Python Is Here
  • 10 Tips to customize ggplot2 title text
  • 8 Plot types with Matplotlib in Python
  • PCA on S&P 500 Stock Return Data
  • Linear Regression with Matrix Decomposition Methods
  • Numpy’s random choice() function

Copyright © 2025 · Lifestyle Pro on Genesis Framework · WordPress · Log in

Go to mobile version