How to create a list of plot objects and save them as files

How to make a list of ggplots using map() in purrr

In this post, we will learn a really nice trick for creating multiple ggplots from a dataframe and saving them to files with ggsave(), using the magic of the tidyverse package purrr. We will use purrr’s map() function to create multiple plots from a dataframe and another purrr function, pwalk(), to save the plots as files. I learned this awesome trick from the online version of the upcoming second edition of the R for Data Science book.

This is extremely useful when you are making similar plots for different values of a group or variable in a dataframe. Normally, when I want to create multiple similar plots from a dataframe, I use a for loop: filter the dataframe for each group of interest, make a plot from that subset of the data, and then save the plot as a file.
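For comparison, the for-loop approach described above might look something like the following. This is only a sketch: it uses the palmerpenguins data that the rest of this post works with, and the names out_dir, penguins_complete and species_df, as well as the choice to write into a temporary directory, are assumptions made for illustration.

```r
library(tidyverse)
library(palmerpenguins)

# Assumption: write into a temporary directory to avoid cluttering the working directory
out_dir <- tempdir()

# Drop rows with missing values so every subset plots cleanly
penguins_complete <- penguins %>% drop_na()

for (sp in levels(penguins_complete$species)) {
  # Filter the dataframe down to the current species
  species_df <- penguins_complete %>% filter(species == sp)

  # Make the plot on the subset of the data
  p <- ggplot(species_df, aes(bill_length_mm, bill_depth_mm, color = sex)) +
    geom_point()

  # Save the plot as a file named after the species
  ggsave(str_c(sp, ".png"), plot = p, path = out_dir, width = 8, height = 6)
}
```

The loop works, but the filtering, plotting and saving steps are tangled together inside it, which is what the purrr version separates out.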

With purrr’s goodness, we can skip the for loop, create multiple plots from a dataframe, and save all the plots as files.

Let us get started by loading tidyverse and the Palmer penguins dataset.

library(tidyverse)
library(palmerpenguins)

Our goal is to make a scatter plot between two numerical variables for each value of a categorical/character variable. To make it easier, we will simplify the penguins data by dropping rows with missing values and selecting just a few columns. This leaves us with five columns of data and no missing values.

penguins <- penguins %>%
  drop_na() %>%
  select(species:bill_depth_mm, sex)

penguins %>% head()

## # A tibble: 6 × 5
##   species island    bill_length_mm bill_depth_mm sex   
##   <fct>   <fct>              <dbl>         <dbl> <fct> 
## 1 Adelie  Torgersen           39.1          18.7 male  
## 2 Adelie  Torgersen           39.5          17.4 female
## 3 Adelie  Torgersen           40.3          18   female
## 4 Adelie  Torgersen           36.7          19.3 female
## 5 Adelie  Torgersen           39.3          20.6 male  
## 6 Adelie  Torgersen           38.9          17.8 female

In this penguins dataset, we aim to make a scatter plot of bill length against bill depth for each penguin species. The first step is to split our dataframe into smaller dataframes, one for each value of the species variable.

Using the base R function split(), we can split the original dataframe into multiple smaller dataframes. Since we have three different penguin species in our data, applying split() on the species variable gives us three smaller dataframes, one for each penguin species, in a list named by species. Note the “.$” notation in the split() call: with the %>% pipe, the dot stands for the dataframe being piped in, so .$species refers to its species column.

penguins %>%
  split(.$species)

## $Adelie
## # A tibble: 146 × 5
##    species island    bill_length_mm bill_depth_mm sex   
##    <fct>   <fct>              <dbl>         <dbl> <fct> 
##  1 Adelie  Torgersen           39.1          18.7 male  
##  2 Adelie  Torgersen           39.5          17.4 female
##  3 Adelie  Torgersen           40.3          18   female
##  4 Adelie  Torgersen           36.7          19.3 female
##  5 Adelie  Torgersen           39.3          20.6 male  
##  6 Adelie  Torgersen           38.9          17.8 female
##  7 Adelie  Torgersen           39.2          19.6 male  
##  8 Adelie  Torgersen           41.1          17.6 female
##  9 Adelie  Torgersen           38.6          21.2 male  
## 10 Adelie  Torgersen           34.6          21.1 male  
## # … with 136 more rows
## 
## $Chinstrap
## # A tibble: 68 × 5
##    species   island bill_length_mm bill_depth_mm sex   
##    <fct>     <fct>           <dbl>         <dbl> <fct> 
##  1 Chinstrap Dream            46.5          17.9 female
##  2 Chinstrap Dream            50            19.5 male  
##  3 Chinstrap Dream            51.3          19.2 male  
##  4 Chinstrap Dream            45.4          18.7 female
##  5 Chinstrap Dream            52.7          19.8 male  
##  6 Chinstrap Dream            45.2          17.8 female
##  7 Chinstrap Dream            46.1          18.2 female
##  8 Chinstrap Dream            51.3          18.2 male  
##  9 Chinstrap Dream            46            18.9 female
## 10 Chinstrap Dream            51.3          19.9 male  
## # … with 58 more rows
## 
## $Gentoo
## # A tibble: 119 × 5
##    species island bill_length_mm bill_depth_mm sex   
##    <fct>   <fct>           <dbl>         <dbl> <fct> 
##  1 Gentoo  Biscoe           46.1          13.2 female
##  2 Gentoo  Biscoe           50            16.3 male  
##  3 Gentoo  Biscoe           48.7          14.1 female
##  4 Gentoo  Biscoe           50            15.2 male  
##  5 Gentoo  Biscoe           47.6          14.5 male  
##  6 Gentoo  Biscoe           46.5          13.5 female
##  7 Gentoo  Biscoe           45.4          14.6 female
##  8 Gentoo  Biscoe           46.7          15.3 male  
##  9 Gentoo  Biscoe           43.3          13.4 female
## 10 Gentoo  Biscoe           46.8          15.4 male  
## # … with 109 more rows
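To make the “.$” notation less mysterious, here is a small sketch comparing the piped call with the plain function call it corresponds to. The variable names by_pipe and by_call are made up for this illustration.

```r
library(tidyverse)
library(palmerpenguins)

# Piped form: the dot is the dataframe on the left-hand side of %>%
by_pipe <- penguins %>% split(.$species)

# Plain form: the same call written without the pipe
by_call <- split(penguins, penguins$species)

# Both give a list with one element per species, named by species
names(by_pipe)
names(by_call)
```

Keep in mind that the dot placeholder belongs to the %>% pipe. R's native pipe |> does not understand `.$species`, so the plain form is the safer choice there.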

To split the data, we can also use the tidyverse (dplyr) function group_split() instead of split() from base R. Like split(), group_split()

returns a list of tibbles. Each tibble contains the rows of .tbl for the associated group and all the columns, including the grouping variables.

However, note that, unlike split(), group_split() does not name the elements of the list it returns, and that the function is still in the experimental lifecycle stage.

penguins %>%
  group_split(species) 

## <list_of<
##   tbl_df<
##     species       : factor<b22a0>
##     island        : factor<ccf33>
##     bill_length_mm: double
##     bill_depth_mm : double
##     sex           : factor<8f119>
##   >
## >[3]>
## [[1]]
## # A tibble: 146 × 5
##    species island    bill_length_mm bill_depth_mm sex   
##    <fct>   <fct>              <dbl>         <dbl> <fct> 
##  1 Adelie  Torgersen           39.1          18.7 male  
##  2 Adelie  Torgersen           39.5          17.4 female
##  3 Adelie  Torgersen           40.3          18   female
##  4 Adelie  Torgersen           36.7          19.3 female
##  5 Adelie  Torgersen           39.3          20.6 male  
##  6 Adelie  Torgersen           38.9          17.8 female
##  7 Adelie  Torgersen           39.2          19.6 male  
##  8 Adelie  Torgersen           41.1          17.6 female
##  9 Adelie  Torgersen           38.6          21.2 male  
## 10 Adelie  Torgersen           34.6          21.1 male  
## # … with 136 more rows
## 
## [[2]]
## # A tibble: 68 × 5
##    species   island bill_length_mm bill_depth_mm sex   
##    <fct>     <fct>           <dbl>         <dbl> <fct> 
##  1 Chinstrap Dream            46.5          17.9 female
##  2 Chinstrap Dream            50            19.5 male  
##  3 Chinstrap Dream            51.3          19.2 male      
## # … 
## 
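One practical difference shows up in the [[1]], [[2]] indices in the output above: the list returned by group_split() has no names, whereas split() names each element by species. If you want to use group_split() and still have names available later, one possible workaround is sketched below. The variable names by_species and species_names are assumptions for illustration, and this is just one of several ways to do it.

```r
library(tidyverse)
library(palmerpenguins)

by_species <- penguins %>%
  drop_na() %>%
  group_split(species)

# group_split() returns an unnamed list
names(by_species)

# Take the species label from the first row of each smaller tibble
species_names <- map_chr(by_species, ~ as.character(.x$species[1]))

# Attach the labels as list names
by_species <- set_names(by_species, species_names)
names(by_species)
```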

List of plots using map() function in purrr

Using the result of split(), we can create a list of plots (ggplot objects) with the map() function from the purrr package. In this example, map() makes a scatter plot for each species. Note that the first argument to ggplot() is .x; in purrr’s formula shorthand (~), .x stands for the current list element, here the smaller dataframe for one species.

plots <- penguins %>%
  split(.$species) %>%
  map(~ggplot(.x, aes(bill_length_mm, bill_depth_mm, color=sex)) + 
        geom_point()+
        theme_bw(16))

The resulting plots variable is a list of ggplot objects, named by species. Printing it draws each plot in turn.

plots

## $Adelie

## 
## $Chinstrap

## 
## $Gentoo
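If the ~ and .x shorthand feels cryptic, the same list of plots can be built with an ordinary named function. The sketch below assumes a helper called make_scatter(), a name invented for this example, and also shows how to pull a single plot out of the list by species name.

```r
library(tidyverse)
library(palmerpenguins)

# make_scatter() is a hypothetical helper, not part of any package
make_scatter <- function(df) {
  ggplot(df, aes(bill_length_mm, bill_depth_mm, color = sex)) +
    geom_point() +
    theme_bw(16)
}

# map() calls make_scatter() once per smaller dataframe
plots_by_fn <- penguins %>%
  drop_na() %>%
  split(.$species) %>%
  map(make_scatter)

# The list keeps the species names, so one plot can be picked out by name;
# print(gentoo_plot) would draw it
gentoo_plot <- plots_by_fn[["Gentoo"]]
names(plots_by_fn)
```

A named function is a little more typing, but it is easier to test on a single dataframe before mapping it over the whole list.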

How to save a list of plots as files using pwalk function in purrr

Now that we have created a list of plot objects, we can use another purrr function, pwalk(), to walk through each plot object and save it as a file. First, we create the file names by applying names() to the list of plots.

file_names <- stringr::str_c(names(plots), ".png")

file_names
## [1] "Adelie.png"    "Chinstrap.png" "Gentoo.png"

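We then pass the file names and plots to pwalk() to save the plots. In this example, we save the plots as PNG files by giving ggsave as the function argument to pwalk(). For each file name and plot pair, pwalk() calls ggsave() with the file name as its first argument (filename) and the plot as its second (plot); the path argument is passed along to every call.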

pwalk(list(file_names, plots), ggsave, path = ".")
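Because pwalk() hands the list elements to ggsave() by position, the order of file_names and plots in the list matters. A slightly more defensive variant, sketched below, names the list elements after the ggsave() arguments filename and plot so the matching is explicit. The out_dir variable and the use of a temporary directory are assumptions for this illustration.

```r
library(tidyverse)
library(palmerpenguins)

# Assumption: save into a temporary directory instead of "."
out_dir <- tempdir()

plots <- penguins %>%
  drop_na() %>%
  split(.$species) %>%
  map(~ ggplot(.x, aes(bill_length_mm, bill_depth_mm, color = sex)) +
        geom_point())

file_names <- str_c(names(plots), ".png")

# Naming the list elements matches them to ggsave(filename = , plot = )
# by name, so their order in the list no longer matters
pwalk(list(plot = plots, filename = file_names),
      ggsave,
      width = 8,
      height = 6,
      path = out_dir)
```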

We can also specify other arguments of the ggsave() function. Here we specify the width and height of the saved plots, in inches (the default unit for ggsave()).

pwalk(list(file_names, plots),
      ggsave, 
      width=8, 
      height=6,
      path = ".")

Here is a slightly improved version of the same plots, this time using the current small dataframe to get the species name and using it as the title of the plot.

# Build one scatter plot per species, titled with the species name
plots <- penguins %>%
  split(.$species) %>%
  map(~ggplot(.x, aes(bill_length_mm, bill_depth_mm, color=sex)) + 
        geom_point()+
        theme_bw(16)+
        labs(title=.x %>% pull(species) %>% unique()))

# Save each plot under its matching file name
pwalk(list(file_names, plots),
      ggsave, 
      width=8, 
      height=6,
      path = ".")
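A quick way to confirm that the files were written is to check for them after saving. The sketch below repeats the steps in a self-contained form and then checks the output folder. The folder name penguin_plots inside a temporary directory is an assumption for illustration, and as.character() is added around the species name only as a precaution, since species is a factor.

```r
library(tidyverse)
library(palmerpenguins)

# Assumption: a scratch folder inside the temporary directory
out_dir <- file.path(tempdir(), "penguin_plots")
dir.create(out_dir, showWarnings = FALSE)

plots <- penguins %>%
  drop_na() %>%
  split(.$species) %>%
  map(~ ggplot(.x, aes(bill_length_mm, bill_depth_mm, color = sex)) +
        geom_point() +
        labs(title = as.character(unique(.x$species))))

file_names <- str_c(names(plots), ".png")

pwalk(list(file_names, plots), ggsave, width = 8, height = 6, path = out_dir)

# One TRUE per file that was written
saved_ok <- file.exists(file.path(out_dir, file_names))
saved_ok

# List the PNG files now present in the folder
list.files(out_dir, pattern = "\\.png$")
```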

And we get the files saved in the specified location with the names we assigned. Here is what one of the plots we generated looks like.

Saving list of plot objects as files