• Skip to secondary menu
  • Skip to main content
  • Skip to primary sidebar

Python and R Tips

Learn Data Science with Python and R

  • Home
  • Python
  • Pandas
    • Pandas 101
  • tidyverse
    • tidyverse 101
  • R
  • Linux
  • Conferences
  • Python Books
  • About
    • Privacy Policy
You are here: Home / R / tidyr’s pivot_longer(): Reshape Wide Data to Long/Tidy Data

tidyr’s pivot_longer(): Reshape Wide Data to Long/Tidy Data

November 22, 2020 by cmdlinetips

One of the most common activities while doing data analysis is to reshape data from one form to another. For human eyes and data collection, often it is easier to work with data in wider form. However, for analyzing data it is more convenient to have the data in tidy/long form in most circumstances. tidyr, R package part of tidyverse, offer a number of functions to manipulate datasets in wide or long form.

pivot_longer() and pivot_wider() are two core functions of tidyr version 1.0.0 and above for reshaping data. In this post, we will see examples of using pivot_longer() to convert data in wide form to data in long or tidy form.

As tidyr defines, Tidy data is data where:

  • Every column is variable.
  • Every row is an observation.
  • Every cell is a single value.
  • Want to learn more about tidy data? Check out the original paper from Hadley Wickham.

    Let us get started by loading tidyverse, suite of R packages from RStudio.

    library(tidyverse)
    

    We will use historical mobile phone data from TidyTuesday project. For this post, we need data in wide form. We used pivot_wider() from tidyr as shown in this post to get mobile data in wide form and saved the results in cmdlinetips’s github page.

    Reshape a simple wide data to tidy/long data

    To start with, we will use a data set in simple wide form and load it.

    wide_df  <- readr::read_tsv("https://raw.githubusercontent.com/cmdlinetips/data/master/mobile_subscription_wide_simple_tidytuesday.tsv")
    

    We can see that the data is in wide form and the column names are years and the values in the dataframe is mobile subscribers.

    head(wide_df)
    ## # A tibble: 6 x 30
    ##   code  continent  `1990` `1991` `1992`  `1993`  `1994` `1995` `1996`  `1997`
    ##   <chr> <chr>       <dbl>  <dbl>  <dbl>   <dbl>   <dbl>  <dbl>  <dbl>   <dbl>
    ## 1 AFG   Asia      0       0      0      0       0       0      0       0     
    ## 2 ALB   Europe    0       0      0      0       0       0      0.0744  0.107 
    ## 3 DZA   Africa    0.00181 0.0180 0.0176 0.0172  0.00475 0.0162 0.0398  0.0582
    ## 4 ASM   Oceania   0       0      1.41   1.77    2.32    2.36   2.41    2.55  
    ## 5 AND   Europe    0       0      1.31   1.28    1.25    4.42   8.53   13.4   
    ## 6 AGO   Africa    0       0      0      0.00821 0.0132  0.0140 0.0225  0.0467
    ## # … with 20 more variables: `1998` <dbl>, `1999` <dbl>, `2000` <dbl>,
    ## #   `2001` <dbl>, `2002` <dbl>, `2003` <dbl>, `2004` <dbl>, `2005` <dbl>,
    ## #   `2006` <dbl>, `2007` <dbl>, `2008` <dbl>, `2009` <dbl>, `2010` <dbl>,
    ## #   `2011` <dbl>, `2012` <dbl>, `2013` <dbl>, `2014` <dbl>, `2015` <dbl>,
    ## #   `2016` <dbl>, `2017` <dbl>
    

    Let us start with a simple example, where we keep one of the columns as the same in tidy form. To do that we will remove one column and use the remaining to reshape to tidy form

    pivot_longer:  Reshape wide data tidy/long data
    pivot_longer: Reshape wide data tidy/long data

    We use pivot_longer() and specify which column we want to keep in the tidy form first. And then specify a variable for the column names using “names_to” argument and a variable for the values in the wide form using “values_to” argument.

    In our example, we would like to keep the continent column in the wide dataframe and specify names_to=”year” and values_to = “mobile_subs”.

    wide_df %>%
      select(-code) %>%
      pivot_longer(-continent,
                   names_to="year",
                   values_to = "mobile_subs")
    

    And we get a dataframe in tidy or long form with three columns.

    ## # A tibble: 6,944 x 3
    ##    continent year  mobile_subs
    ##    <chr>     <chr>       <dbl>
    ##  1 Asia      1990            0
    ##  2 Asia      1991            0
    ##  3 Asia      1992            0
    ##  4 Asia      1993            0
    ##  5 Asia      1994            0
    ##  6 Asia      1995            0
    ##  7 Asia      1996            0
    ##  8 Asia      1997            0
    ##  9 Asia      1998            0
    ## 10 Asia      1999            0
    ## # … with 6,934 more rows
    

    Sometimes, you might want to keep more than one variables as its in the reshaped data. The way to do it is to specify the names of variables to keep as vector with negative sign or negate operator.

    wide_df %>%
      pivot_longer(-c(continent,code),
                   names_to="year",
                   values_to = "mobile_subs")
    

    This will keep the two columns we specified as it is and reshape the rest of the data. Now we get tidy dataframe with four columns.

    ## # A tibble: 6,944 x 4
    ##    code  continent year  mobile_subs
    ##    <chr> <chr>     <chr>       <dbl>
    ##  1 AFG   Asia      1990            0
    ##  2 AFG   Asia      1991            0
    ##  3 AFG   Asia      1992            0
    ##  4 AFG   Asia      1993            0
    ##  5 AFG   Asia      1994            0
    ##  6 AFG   Asia      1995            0
    ##  7 AFG   Asia      1996            0
    ##  8 AFG   Asia      1997            0
    ##  9 AFG   Asia      1998            0
    ## 10 AFG   Asia      1999            0
    ## # … with 6,934 more rows
    

    pivot_longer() with multiple variables

    Most often our wide dataframe is more complex and might have multiple variables that need to be reshaped. To illustrate how to use pivot_longer to reshape wide dataframe with multiple variable, we will be using historical mobile data with multiple variables.

    Here is a cartoon of wide dataframe with three variables and the resulting tidy data we would like to have.

    pivot_longer with multiple variables

    pivot_longer with multiple variables

    We have wide dataframe with mobile subscribers, gdp and total population in wide form at cmdlinetips.com’s github page. Let us load that data directly from github.
    wide_multi_df  <- readr::read_tsv('https://raw.githubusercontent.com/cmdlinetips/data/master/mobile_subscription_wide_multiple_variables_tidytuesday.tsv')
    

    And our data in wide form looks like this. Note that column names has variable name and year information and the delimiter separating them is dot “.”.

    wide_multi_df
    ## # A tibble: 248 x 86
    ##    code  continent mobile_subs.1990 mobile_subs.1991 mobile_subs.1992
    ##    <chr> <chr>                <dbl>            <dbl>            <dbl>
    ##  1 AFG   Asia               0                 0                0     
    ##  2 ALB   Europe             0                 0                0     
    ##  3 DZA   Africa             0.00181           0.0180           0.0176
    ##  4 ASM   Oceania            0                 0                1.41  
    ##  5 AND   Europe             0                 0                1.31  
    ##  6 AGO   Africa             0                 0                0     
    ##  7 AIA   Americas          NA                NA               NA     
    ##  8 ATG   Americas           0                NA               NA     
    ##  9 ARG   Americas           0.0367            0.0753           0.138 
    ## 10 ARM   Asia               0                 0                0     
    ## # … with 238 more rows, and 81 more variables: mobile_subs.1993 <dbl>,
    ## #   mobile_subs.1994 <dbl>, mobile_subs.1995 <dbl>, mobile_subs.1996 <dbl>,
    

    To reshape wide data with multiple variables using pivot_longer, we need first select columns of interest corresponding to multiple variables. In the example below, we suse dplyr’s starts_with() function select columns starting with strings corresponding to three variables. Then we use names_to and values_to arguments of pivot_longer() to create new variables in tidy/long form.

    tidy_df <- wide_multi_df %>%
      pivot_longer( #cols = mobile_subs.1990:total_pop.2017,
                    cols = c(starts_with("mobile"),
                             starts_with("gdp"),
                             starts_with("total")),
                    names_to = "var_name",
                    values_to = "val"
    )
    

    And we get data in tidy form with four columns of interest.

    tidy_df
    
    ## # A tibble: 20,832 x 4
    ##    code  continent var_name           val
    ##    <chr> <chr>     <chr>            <dbl>
    ##  1 AFG   Asia      mobile_subs.1990     0
    ##  2 AFG   Asia      mobile_subs.1991     0
    ##  3 AFG   Asia      mobile_subs.1992     0
    ##  4 AFG   Asia      mobile_subs.1993     0
    ##  5 AFG   Asia      mobile_subs.1994     0
    ##  6 AFG   Asia      mobile_subs.1995     0
    ##  7 AFG   Asia      mobile_subs.1996     0
    ##  8 AFG   Asia      mobile_subs.1997     0
    ##  9 AFG   Asia      mobile_subs.1998     0
    ## 10 AFG   Asia      mobile_subs.1999     0
    ## # … with 20,822 more rows
    

    Often we like the data in some kind of intermediate wide form. Here is a cartoon illustration of the reshape we are interested in.

    tidy data to intermediate wide
    tidy data to intermediate wide

    For example in this case, we would like to have three variables to be on separate columns. We can use pivot_wider() to create the dataframe with three variables on separate column. At first, we have to separate the variable name and year variables. One of the ways to do that is to use separate() function tidyr.

    tidy_df %>%
      separate(var_name,c("var_n","year"), sep="\\.") %>%
      pivot_wider(names_from = var_n,
                  values_from=val)
    

    And here is the user friendly dataframe reshaped from tidy dataframe.

    ## # A tibble: 6,944 x 6
    ##    code  continent year  mobile_subs gdp_per_cap total_pop
    ##    <chr> <chr>     <chr>       <dbl>       <dbl>     <dbl>
    ##  1 AFG   Asia      1990            0          NA  13032161
    ##  2 AFG   Asia      1991            0          NA  14069854
    ##  3 AFG   Asia      1992            0          NA  15472076
    ##  4 AFG   Asia      1993            0          NA  17053213
    ##  5 AFG   Asia      1994            0          NA  18553819
    ##  6 AFG   Asia      1995            0          NA  19789880
    ##  7 AFG   Asia      1996            0          NA  20684982
    ##  8 AFG   Asia      1997            0          NA  21299350
    ##  9 AFG   Asia      1998            0          NA  21752257
    ## 10 AFG   Asia      1999            0          NA  22227543
    ## # … with 6,934 more rows
    

    Share this:

    • Click to share on Facebook (Opens in new window) Facebook
    • Click to share on X (Opens in new window) X

    Related posts:

    How To Reshape Tidy Data to Wide Data with pivot_wider() from tidyr Default ThumbnailPandas Melt: Reshape Wide Data to Long/Tidy Data Pandas Melt Example with IdentifiersPandas Melt: Reshape Wide to Tidy with identifiers Default Thumbnailtidyr 1.0.0 is here. pivot_longer & pivot_wider replace spread & gather

    Filed Under: R, tidyverse 101 Tagged With: pivot_longer wide to long, R, tidyverse 101

    Primary Sidebar

    Subscribe to Python and R Tips and Learn Data Science

    Learn Pandas in Python and Tidyverse in R

    Tags

    Altair Basic NumPy Book Review Data Science Data Science Books Data Science Resources Data Science Roundup Data Visualization Dimensionality Reduction Dropbox Dropbox Free Space Dropbox Tips Emacs Emacs Tips ggplot2 Linux Commands Linux Tips Mac Os X Tips Maximum Likelihood Estimation in R MLE in R NumPy Pandas Pandas 101 Pandas Dataframe Pandas Data Frame pandas groupby() Pandas select columns Pandas select_dtypes Python Python 3 Python Boxplot Python Tips R rstats R Tips Seaborn Seaborn Boxplot Seaborn Catplot Shell Scripting Sparse Matrix in Python tidy evaluation tidyverse tidyverse 101 Vim Vim Tips

    RSS RSS

    • How to convert row names to a column in Pandas
    • How to resize an image with PyTorch
    • Fashion-MNIST data from PyTorch
    • Pandas case_when() with multiple examples
    • An Introduction to Statistical Learning: with Applications in Python Is Here
    • 10 Tips to customize ggplot2 title text
    • 8 Plot types with Matplotlib in Python
    • PCA on S&P 500 Stock Return Data
    • Linear Regression with Matrix Decomposition Methods
    • Numpy’s random choice() function

    Copyright © 2025 · Lifestyle Pro on Genesis Framework · WordPress · Log in

    Go to mobile version