
tidyr 1.0.0 is here. pivot_longer & pivot_wider replace spread & gather

September 16, 2019 by cmdlinetips

tidyr version 1.0.0 is here with a lot of changes. tidyr has been around for about five years and has finally reached version 1.0.0. There are four big changes in the new version. One of the biggest is the pair of new functions, pivot_longer() and pivot_wider(), for reshaping tabular datasets. These functions supersede the existing spread() and gather(). This change can be a bit disruptive to your workflow, but it is kind of expected.

tidyr 1.0.0 now on CRAN: https://t.co/sLnI2SlUsp — new pivot_longer() and pivot_wider() functions, better rectangling tools (unnest_longer(), unnest_wider(), hoist()), improved expand_grid(), new (un)nesting interface, and much much more! #rstats

— Hadley Wickham (@hadleywickham) September 13, 2019

A lot of data wrangling is about reshaping data from one form to another. In R, tidyverse’s tidyr package is at the core of reshaping tabular datasets. If you have used tidyr’s key functions spread() and gather() and found that you could not remember how you did the same operation last time, you are not alone. Even Hadley Wickham had to look up the documentation to use these functions. Earlier this year, Hadley Wickham announced that there would be simpler and “easier to understand” functions for reshaping data in the next version of tidyr.

You may have heard a rumour that gather/spread are going away. This is simply not true (they’ll stay around forever) but I am working on better replacements which you can learn about at https://t.co/sU2GzWeBaf. Now is a great time for feedback! #rstats

— Hadley Wickham (@hadleywickham) March 19, 2019

The new functions for reshaping data, pivot_longer() and pivot_wider(), are here now and they are substantially more powerful. These functions borrow ideas from existing packages like data.table and cdata.

Other big changes include a new set of functions for rectangling, that is, converting nested lists into tidy data frames.

  • unnest_auto()
  • unnest_longer()
  • unnest_wider()
  • hoist()
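To make rectangling a bit more concrete, here is a minimal sketch on a small made-up nested tibble. The data and the column names (id, info, name, langs) are invented for illustration only; the calls are the tidyr 1.0.0 rectangling functions listed above.

```r
library(tidyr)
library(tibble)

# Hypothetical nested data: one list-column holding a named list per row
users <- tibble(
  id = 1:2,
  info = list(
    list(name = "Ada", langs = c("R", "Python")),
    list(name = "Bob", langs = "R")
  )
)

# unnest_wider(): each component of the named list becomes its own column
wide <- unnest_wider(users, info)

# unnest_longer(): each element of the langs list-column becomes its own row
long <- unnest_longer(wide, langs)

# hoist(): pull out selected components by name and leave the rest nested
hoisted <- hoist(users, info, name = "name")
```

Here unnest_wider() should give one column per component (name and langs), unnest_longer() should give one row per language, and hoist() extracts only name while keeping the remainder in info.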

tidyr 1.0.0 also has four new functions to make nesting easier.

  • pack()/unpack()
  • chop()/unchop()
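As a rough sketch of what these two pairs do, consider a tiny made-up tibble (the columns g, x, y and the packed column name xy are assumptions for illustration). chop() and unchop() change the number of rows, while pack() and unpack() change the number of columns.

```r
library(tidyr)
library(tibble)

df <- tibble(g = c("a", "a", "b"), x = 1:3, y = c(0.1, 0.2, 0.3))

# chop(): collapse x and y into list-columns, one row per value of g
chopped <- chop(df, c(x, y))

# unchop(): expand the list-columns back to one row per element
restored <- unchop(chopped, c(x, y))

# pack(): bundle x and y into a single data-frame column called "xy"
packed <- pack(df, xy = c(x, y))

# unpack(): spread the data-frame column back into ordinary columns
unpacked <- unpack(packed, xy)
```

In this sketch chopped has two rows (one per group) and packed has two columns (g and xy); the inverse functions bring back the original three-row, three-column shape.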

In addition to these updates, the new tidyr version also has a new expand_grid() function, a variant of base::expand.grid() that creates all possible combinations of its input variables.
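A quick illustration of expand_grid() with two made-up inputs, x and y:

```r
library(tidyr)

# All combinations of two small vectors; the result is a tibble
grid <- expand_grid(x = 1:2, y = c("a", "b", "c"))
grid
```

Unlike base::expand.grid(), expand_grid() varies the first input slowest, so the rows come out sorted by x, and it leaves character vectors as character instead of converting them to factors.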

Getting started with pivot_longer and pivot_wider

We can’t wait to learn the new functions and their uses. To start with, here is a first exploration of tidyr 1.0.0. In this part, we will walk step by step through the simpler uses of pivot_longer() and pivot_wider() using the gapminder data set. We will start with untidy wide data, use pivot_longer() to tidy it, and then use pivot_wider() to turn the longer tidy data back into a wider data frame.

pivot_longer() is the replacement for gather() and pivot_wider() is the replacement for spread(). Both are designed to be simpler and to handle more cases than gather() and spread(). RStudio highly recommends that you use the new functions; gather() and spread() are not going away, but they will no longer be actively developed.
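To see how the old and new interfaces line up, here is a small side-by-side sketch. The tibble below is made up for illustration (countries "A" and "B" are placeholders, not real gapminder rows).

```r
library(tidyr)
library(tibble)

# Made-up wide data: one column per year
wide <- tibble(
  country = c("A", "B"),
  `1952` = c(43.1, 30.0),
  `1957` = c(45.7, 32.0)
)

# Old interface: key/value
old_way <- gather(wide, key = "year", value = "lifeExp", -country)

# New interface: names_to/values_to
new_way <- pivot_longer(wide, -country, names_to = "year", values_to = "lifeExp")

# The two results hold the same rows; only the row order differs,
# so sort both before comparing
sort_rows <- function(d) as.data.frame(d[order(d$country, d$year), ])
same <- isTRUE(all.equal(sort_rows(old_way), sort_rows(new_way)))
same
```

gather() stacks the result one column at a time, while pivot_longer() goes one input row at a time, which is why the comparison sorts both results first.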

Let us first install the new version of tidyr, 1.0.0, and verify that we have it.

install.packages("tidyr")
# check the installed package version
packageVersion("tidyr")
## [1] ‘1.0.0’

Let us load the new version of the tidyr package and the other packages we need.

library(tidyr)
library(readr)
library(dplyr)

Let us use the gapminder dataset in wide form from the Carpentries website.

# read the wide-form gapminder data from the Carpentries website
data_url <- "https://goo.gl/ioc2Td"
gapminder <- read_csv(data_url)
head(gapminder)

We can see that the gapminder data frame is in wide form and not tidy. For example, the column names are not variable names but values: each one encodes both the type of variable and the year.

## # A tibble: 6 x 38
##   continent country gdpPercap_1952 gdpPercap_1957 gdpPercap_1962
##   <chr>     <chr>            <dbl>          <dbl>          <dbl>
## 1 Africa    Algeria          2449.          3014.          2551.
## 2 Africa    Angola           3521.          3828.          4269.
## 3 Africa    Benin            1063.           960.           949.
## 4 Africa    Botswa…           851.           918.           984.
## 5 Africa    Burkin…           543.           617.           723.
## 6 Africa    Burundi           339.           380.           355.
## # … with 33 more variables: gdpPercap_1967 <dbl>, gdpPercap_1972 <dbl>,
## #   gdpPercap_1977 <dbl>, gdpPercap_1982 <dbl>, gdpPercap_1987 <dbl>,
## #   gdpPercap_1992 <dbl>, gdpPercap_1997 <dbl>, gdpPercap_2002 <dbl>,
## #   gdpPercap_2007 <dbl>, lifeExp_1952 <dbl>, lifeExp_1957 <dbl>,
## #   lifeExp_1962 <dbl>, lifeExp_1967 <dbl>, lifeExp_1972 <dbl>,

For our illustration here, let us simplify the wide gapminder data frame by keeping only continent, country, and the columns starting with “life”, which hold the lifeExp variable for each year.

gapminder_life <- gapminder %>%
  select(continent, country, starts_with("life"))
head(gapminder_life)

Now we can see that column names specify lifeExp for each year in our data set.

## # A tibble: 6 x 14
##   continent country lifeExp_1952 lifeExp_1957 lifeExp_1962 lifeExp_1967
##   <chr>     <chr>          <dbl>        <dbl>        <dbl>        <dbl>
## 1 Africa    Algeria         43.1         45.7         48.3         51.4
## 2 Africa    Angola          30.0         32.0         34           36.0
## 3 Africa    Benin           38.2         40.4         42.6         44.9
## 4 Africa    Botswa…         47.6         49.6         51.5         53.3
## 5 Africa    Burkin…         32.0         34.9         37.8         40.7
## 6 Africa    Burundi         39.0         40.5         42.0         43.5
## # … with 8 more variables: lifeExp_1972 <dbl>, lifeExp_1977 <dbl>,

To convert the wide data frame to tidy form, where each column is a variable and each row is an observation, we can use the pivot_longer() function from the new version of tidyr.

pivot_longer() makes datasets longer by increasing the number of rows and decreasing the number of columns.

In the simplest use case here, we first specify which columns need to be reshaped. Since the first two columns, continent and country, are already variables in tidy form, we specify that we need to reshape all columns except these two. Then we give the name of the new column that will hold the old column names using the “names_to” argument, and the name of the new column that will hold the values using the “values_to” argument. Note that these arguments take the new variable names in quotes, as they are not present in the data frame yet.

gapminder_life %>% 
  pivot_longer(-c(continent,country), names_to = "year", values_to = "lifeExp")

The result from pivot_longer() is a tibble with four columns, where the first two columns are the old ones and the remaining two are the new ones that we created. We can see that the variable year contains the column names of the wide data frame (still with the “lifeExp_” prefix) and that lifeExp contains the actual values.

## # A tibble: 1,704 x 4
##    continent country year         lifeExp
##    <chr>     <chr>   <chr>          <dbl>
##  1 Africa    Algeria lifeExp_1952    43.1
##  2 Africa    Algeria lifeExp_1957    45.7
##  3 Africa    Algeria lifeExp_1962    48.3
##  4 Africa    Algeria lifeExp_1967    51.4
##  5 Africa    Algeria lifeExp_1972    54.5
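The year column above still carries the “lifeExp_” prefix and is of type character. One possible way to clean this up is the names_prefix argument of pivot_longer(), followed by an explicit conversion. The sketch below uses a small made-up stand-in for gapminder_life (countries "A" and "B" are placeholders), so it runs without downloading anything.

```r
library(tidyr)
library(tibble)

# Small made-up stand-in for gapminder_life
life_wide <- tibble(
  country = c("A", "B"),
  lifeExp_1952 = c(43.1, 30.0),
  lifeExp_1957 = c(45.7, 32.0)
)

life_long <- pivot_longer(
  life_wide,
  -country,
  names_to = "year",
  names_prefix = "lifeExp_",  # strip this prefix from the old column names
  values_to = "lifeExp"
)

# year is still character after pivoting; convert it explicitly
life_long$year <- as.integer(life_long$year)
life_long
```

The same names_prefix argument should work on the real gapminder_life data frame, since all of its reshaped columns share the “lifeExp_” prefix.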

Let us see an example of how to use pivot_wider() to convert a data frame in tidy form to a data frame in non-tidy/wider form. Let us use the tidy data frame from the above example.

gapminder_tidy <- gapminder_life %>% 
  pivot_longer(-c(continent,country), 
               names_to = "year", values_to = "lifeExp")

Now we have the tidy tall data frame. Let us use pivot_wider() to reshape the tidy data into a wide data frame. First, we need to specify which column in the tidy data frame should supply the column names in the wide form. In our example, the values of year should become the column names of the wide/non-tidy data, so we pass year to the argument “names_from”.

Then we specify which column/variable should supply the values in the non-tidy data frame using the argument “values_from”.

gapminder_tidy %>% 
  pivot_wider(names_from = year, values_from = lifeExp)

The result from pivot_wider() is our original gapminder_life data frame in wide form.

## # A tibble: 142 x 14
##    continent country lifeExp_1952 lifeExp_1957 lifeExp_1962 lifeExp_1967
##    <chr>     <chr>          <dbl>        <dbl>        <dbl>        <dbl>
##  1 Africa    Algeria         43.1         45.7         48.3         51.4
##  2 Africa    Angola          30.0         32.0         34           36.0
##  3 Africa    Benin           38.2         40.4         42.6         44.9
##  4 Africa    Botswa…         47.6         49.6         51.5         53.3
##  5 Africa    Burkin…         32.0         34.9         37.8         40.7

Even in these simple examples, the argument names of pivot_longer() and pivot_wider() (names_to, values_to, names_from, values_from) make more sense than the key and value arguments of gather() and spread().

Look forward to examples of the other new functions in tidyr 1.0.0 soon.
