In this post, we will learn a neat trick for creating multiple ggplots from a dataframe and saving them to files with ggsave(), using the purrr package from the tidyverse. We will use purrr's map() function to create multiple plots from a dataframe and another purrr function, pwalk(), to save the plots as files. I learned this trick from the online version of the upcoming second edition of the R for Data Science book.
This is extremely useful when you are making similar plots for different values of a group or variable in a dataframe. Normally, when I need to create multiple similar plots from a dataframe, I use a for loop: filter the dataframe for each group of interest, make the plot on that subset of the data, and then save the plot to a file.
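For comparison, the loop-based approach described above might look roughly like the sketch below. The temporary output folder, the `_loop` suffix in the file names, and the variable names are assumptions made for illustration.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

# Assumption: write to a temporary folder so the working directory stays clean
out_dir <- tempdir()

# Drop rows with missing values so every point can be drawn
penguins_complete <- tidyr::drop_na(palmerpenguins::penguins)

for (sp in levels(penguins_complete$species)) {
  # Keep only the rows for the current species
  df_sp <- filter(penguins_complete, species == sp)

  # Same kind of scatter plot for every species
  p <- ggplot(df_sp, aes(bill_length_mm, bill_depth_mm, color = sex)) +
    geom_point() +
    labs(title = sp)

  # One file per species (hypothetical naming scheme)
  ggsave(file.path(out_dir, paste0(sp, "_loop.png")), plot = p, width = 8, height = 6)
}
```

The loop works, but the filtering, plotting, and saving are tangled together in one body, which is what the purrr version separates into distinct steps.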
With purrr, we can skip the for loop, create multiple plots from a dataframe, and save all the plots as files.
Let us get started by loading tidyverse and the Palmer penguins dataset.
library(tidyverse)
library(palmerpenguins)
Our goal is to make a scatter plot between two numerical variables for each value of a categorical/character variable. To make it easier, we will simplify the penguins data by dropping rows with missing values and selecting just a few columns. The result has just five columns of data with no missing values.
penguins <- penguins %>%
  drop_na() %>%
  select(species:bill_depth_mm, sex)
penguins %>% head()
## # A tibble: 6 × 5
##   species island    bill_length_mm bill_depth_mm sex
##   <fct>   <fct>              <dbl>         <dbl> <fct>
## 1 Adelie  Torgersen           39.1          18.7 male
## 2 Adelie  Torgersen           39.5          17.4 female
## 3 Adelie  Torgersen           40.3          18   female
## 4 Adelie  Torgersen           36.7          19.3 female
## 5 Adelie  Torgersen           39.3          20.6 male
## 6 Adelie  Torgersen           38.9          17.8 female
With this penguins dataset, we aim to make a scatter plot of bill length against bill depth for each penguin species. The first step is to split our dataframe into smaller dataframes, such that each smaller dataframe contains the data for one value of the species variable.
Using the base R function split(), we can split the original dataframe into multiple smaller dataframes. Since we have three different penguin species in our data, applying split() on the species variable gives us three smaller dataframes, one for each penguin species. Note the ".$species" argument to split(): inside the pipe, the dot stands for the dataframe being piped in, so .$species picks out its species column.
penguins %>%
  split(.$species)
## $Adelie
## # A tibble: 146 × 5
##    species island    bill_length_mm bill_depth_mm sex
##    <fct>   <fct>              <dbl>         <dbl> <fct>
##  1 Adelie  Torgersen           39.1          18.7 male
##  2 Adelie  Torgersen           39.5          17.4 female
##  3 Adelie  Torgersen           40.3          18   female
##  4 Adelie  Torgersen           36.7          19.3 female
##  5 Adelie  Torgersen           39.3          20.6 male
##  6 Adelie  Torgersen           38.9          17.8 female
##  7 Adelie  Torgersen           39.2          19.6 male
##  8 Adelie  Torgersen           41.1          17.6 female
##  9 Adelie  Torgersen           38.6          21.2 male
## 10 Adelie  Torgersen           34.6          21.1 male
## # … with 136 more rows
##
## $Chinstrap
## # A tibble: 68 × 5
##    species   island bill_length_mm bill_depth_mm sex
##    <fct>     <fct>           <dbl>         <dbl> <fct>
##  1 Chinstrap Dream            46.5          17.9 female
##  2 Chinstrap Dream            50            19.5 male
##  3 Chinstrap Dream            51.3          19.2 male
##  4 Chinstrap Dream            45.4          18.7 female
##  5 Chinstrap Dream            52.7          19.8 male
##  6 Chinstrap Dream            45.2          17.8 female
##  7 Chinstrap Dream            46.1          18.2 female
##  8 Chinstrap Dream            51.3          18.2 male
##  9 Chinstrap Dream            46            18.9 female
## 10 Chinstrap Dream            51.3          19.9 male
## # … with 58 more rows
##
## $Gentoo
## # A tibble: 119 × 5
##    species island bill_length_mm bill_depth_mm sex
##    <fct>   <fct>           <dbl>         <dbl> <fct>
##  1 Gentoo  Biscoe           46.1          13.2 female
##  2 Gentoo  Biscoe           50            16.3 male
##  3 Gentoo  Biscoe           48.7          14.1 female
##  4 Gentoo  Biscoe           50            15.2 male
##  5 Gentoo  Biscoe           47.6          14.5 male
##  6 Gentoo  Biscoe           46.5          13.5 female
##  7 Gentoo  Biscoe           45.4          14.6 female
##  8 Gentoo  Biscoe           46.7          15.3 male
##  9 Gentoo  Biscoe           43.3          13.4 female
## 10 Gentoo  Biscoe           46.8          15.4 male
## # … with 109 more rows
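To see what the dot is doing, it may help to write the same split without the pipe. This is a minimal sketch using the packaged penguins data directly; the variable name `by_species` is just an illustrative choice.

```r
library(palmerpenguins)

# Without the pipe, the dataframe is named twice:
# once as the data to split and once to supply the grouping column
by_species <- split(penguins, penguins$species)

names(by_species)  # one list element per species level
class(by_species)  # a plain named list
```

The list names come from the levels of the species factor, which is what later lets us build file names from the list.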
To split the data, we can also use the tidyverse function group_split() instead of split() from base R. group_split() does much the same job as split(): it returns a list of tibbles, where each tibble contains the rows of .tbl for the associated group and all the columns, including the grouping variables. One difference is that the list it returns is unnamed, as the [[1]], [[2]] labels in the output below show.
However, note that the group_split() function is still in the experimental lifecycle stage.
penguins %>%
  group_split(species)
## <list_of<
##   tbl_df<
##     species       : factor<b22a0>
##     island        : factor<ccf33>
##     bill_length_mm: double
##     bill_depth_mm : double
##     sex           : factor<8f119>
##   >
## >[3]>
## [[1]]
## # A tibble: 146 × 5
##    species island    bill_length_mm bill_depth_mm sex
##    <fct>   <fct>              <dbl>         <dbl> <fct>
##  1 Adelie  Torgersen           39.1          18.7 male
##  2 Adelie  Torgersen           39.5          17.4 female
##  3 Adelie  Torgersen           40.3          18   female
##  4 Adelie  Torgersen           36.7          19.3 female
##  5 Adelie  Torgersen           39.3          20.6 male
##  6 Adelie  Torgersen           38.9          17.8 female
##  7 Adelie  Torgersen           39.2          19.6 male
##  8 Adelie  Torgersen           41.1          17.6 female
##  9 Adelie  Torgersen           38.6          21.2 male
## 10 Adelie  Torgersen           34.6          21.1 male
## # … with 136 more rows
##
## [[2]]
## # A tibble: 68 × 5
##   species   island bill_length_mm bill_depth_mm sex
##   <fct>     <fct>           <dbl>         <dbl> <fct>
## 1 Chinstrap Dream            46.5          17.9 female
## 2 Chinstrap Dream            50            19.5 male
## 3 Chinstrap Dream            51.3          19.2 male
## # …
##
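Because the later steps build file names from the names of the list, a list made with group_split() would need names added first. One possible way to do that, sketched here with dplyr's group_keys(), is shown below; the variable names `grouped` and `pieces` are illustrative.

```r
library(dplyr)
library(palmerpenguins)

grouped <- penguins %>% group_by(species)

# group_split() returns the pieces as an unnamed list
pieces <- grouped %>% group_split()

# group_keys() lists the groups in the same order as the pieces,
# so its species column can be used to name the list elements
names(pieces) <- grouped %>%
  group_keys() %>%
  pull(species) %>%
  as.character()

names(pieces)
```

With split() this extra step is unnecessary, which is one reason the rest of this post sticks with split().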
List of plots using map() function in purrr
Using the result of the split() function, we can create a list of plots (ggplot objects) with the map() function from the purrr package. In this example, map() makes a scatter plot for each species. The ~ starts a purrr-style anonymous function, and the first argument to ggplot(), .x, represents the smaller dataframe corresponding to the current species.
plots <- penguins %>%
  split(.$species) %>%
  map(~ggplot(.x, aes(bill_length_mm, bill_depth_mm, color=sex)) +
        geom_point() +
        theme_bw(16))
The resulting plots variable is a named list holding one plot per species. Printing it shows each plot under the name of its species.
plots
## $Adelie
##
## $Chinstrap
##
## $Gentoo
How to save a list of plots as files using pwalk function in purrr
Now that we have created a list of plot objects, we can use another purrr function, pwalk(), to walk through each plot object and save it as a file. First, we create the file names by applying the names() function to the list of plots.
file_names <- stringr::str_c(names(plots), ".png")
file_names
## [1] "Adelie.png" "Chinstrap.png" "Gentoo.png"
Then we pass the file names and the plots together as a list to pwalk(), along with the ggsave function, which pwalk() calls once per file name and plot pair. In this example, we save the plots as png files in the current directory.
pwalk(list(file_names, plots), ggsave, path = ".")
We can also specify other arguments of ggsave function. Here we specify height and width for the plots to be saved.
pwalk(list(file_names, plots), ggsave, width=8, height=6, path = ".")
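In the calls above, pwalk() hands the two list elements to ggsave() by position, so they land in its first two arguments, filename and plot. A slightly more explicit variant names the list elements after those arguments. The sketch below is self-contained and rebuilds a small list of plots; the `demo_` names, the `_demo` file suffix, and the temporary output folder are assumptions for illustration.

```r
library(ggplot2)
library(purrr)
library(palmerpenguins)

# Rebuild a small named list of plots so this sketch runs on its own
complete_rows <- penguins[complete.cases(penguins), ]
demo_plots <- map(
  split(complete_rows, complete_rows$species),
  ~ ggplot(.x, aes(bill_length_mm, bill_depth_mm)) + geom_point()
)

demo_files <- paste0(names(demo_plots), "_demo.png")
out_dir <- tempdir()  # assumption: save to a temporary folder

# The list names match ggsave() argument names: filename and plot.
# Extra arguments after the function are passed to every ggsave() call.
pwalk(
  list(filename = demo_files, plot = demo_plots),
  ggsave,
  path = out_dir, width = 8, height = 6
)

# Check that one file per species was written
file.exists(file.path(out_dir, demo_files))
```

Naming the elements makes it harder to swap the file names and plots by accident, at the cost of a little more typing.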
Here is a slightly improved version of the same plots, this time using the current small dataframe to get the species name and using it as the title of the plot.
plots <- penguins %>%
  split(.$species) %>%
  map(~ggplot(.x, aes(bill_length_mm, bill_depth_mm, color=sex)) +
        geom_point() +
        theme_bw(16) +
        labs(title=.x %>% pull(species) %>% unique()))
pwalk(list(file_names, plots), ggsave, width=8, height=6, path = ".")
And we get the files saved in the specified location with the names we assigned. Here is what one of the plots we generated looks like.