How to Write Functions to Make Plots with ggplot2 in R


Okay, here is a confession. I often preach writing functions to simplify life at work. Although I try to follow the “writing functions” mantra reasonably well, there is a grey area where I don’t use functions much. Any guesses? I often make tons of exploratory plots with ggplot2, repeating the same code with only slight changes to the variables.

It is not for lack of trying. I have tried writing functions to make plots using tidy evaluation (see evidence here). However, the habit never took off.

Here I am trying to reboot the habit of writing functions to make plots. I came across two fantastic resources: an RStudio 2020 talk, Best practices for programming with ggplot2 by Dewey Dunnington, and a recent lecture by Claus Wilke. I am starting with a simple example, thanks to Claus Wilke’s lesson on functions and functional programming.

Here is an example of saving yourself some effort by writing functions to make plots with ggplot2.

library(tidyverse)
library(palmerpenguins)
library(glue)  # not attached by library(tidyverse); needed for glue() in the plot title

Let us say we want to make a plot using the data corresponding to one group, i.e. a subset of the data. In this example we use the Palmer penguins data and make a plot for just one of the penguin species. The natural approach is to filter the data for the group of interest and make a plot.

penguins %>%
  filter(species == "Gentoo") %>%
  ggplot() +
  aes(bill_length_mm, body_mass_g, color=sex) +
  geom_point() +
  ggtitle("Species: Gentoo") +
  xlab("bill length (mm)") +
  ylab("body mass (g)") +
  theme(plot.title.position = "plot")

[Figure: scatter plot of bill length vs. body mass for Gentoo penguins, colored by sex]

If we want to create a plot for a different group, we often repeat most of the code, changing only the part that specifies the group of interest.

Here we make a plot for a different penguin species. Note that the code is almost the same except for the filter() and ggtitle() calls.

penguins %>%
  filter(species == "Chinstrap") %>%
  ggplot() +
  aes(bill_length_mm, body_mass_g, color=sex) +
  geom_point() +
  ggtitle("Species: Chinstrap") +
  xlab("bill length (mm)") +
  ylab("body mass (g)") +
  theme(plot.title.position = "plot")

[Figure: scatter plot of bill length vs. body mass for Chinstrap penguins, colored by sex]

We can avoid writing similar code by using variables and functions. Instead of hard-coding the value for the group, we can create a variable that holds the sub-group of interest, in this case the species, and use the variable name when filtering and plotting.

For example, we define a new variable “species_choice” with the species of interest.

species_choice <- "Adelie"
penguins %>%
  filter(species == species_choice) 

Another trick is to access variables with the same name from different environments. This lets us reuse a name that already exists as a column in the data. For example, we can specify the species of interest by

species <- "Adelie"

And then subset the data using the name “species” from two different environments. Here we access the species column of the data using the pronoun “.data” and the species variable in the current working environment using the pronoun “.env”.

penguins %>%
  filter(.data$species == .env$species)

Here “.data$species” gets us the column in the data frame, while “.env$species” is a variable in the local environment that we just created.
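To see why the pronouns matter, here is a small sketch (my own illustration, not taken from the resources mentioned above) comparing the filter with and without them. Without the pronouns, both sides of the comparison resolve to the column in the data, so the column is compared with itself and nothing gets filtered out.

```r
library(dplyr)
library(palmerpenguins)

species <- "Adelie"

# Without pronouns: both sides resolve to the data column,
# so the comparison is TRUE for every row and all rows are kept.
no_pronouns <- penguins %>%
  filter(species == species)

# With pronouns: the column is compared with the variable
# defined in the current environment.
with_pronouns <- penguins %>%
  filter(.data$species == .env$species)

nrow(no_pronouns)    # same as nrow(penguins)
nrow(with_pronouns)  # only the Adelie rows
```

This is the reason the pronouns are handy inside a function whose argument has the same name as a column.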

Now we can write a small function that takes a species name as input and makes the plot. Note that we use the glue() function from the glue package to insert the value of the variable into the plot title by putting curly braces around the variable name.

Glue offers interpreted string literals that are small, fast, and dependency-free. Glue does this by embedding R expressions in curly braces which are then evaluated and inserted into the argument string.
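As a quick illustration of the curly-brace behaviour described above, here is a minimal sketch. The variable name `title` is just for this example.

```r
library(glue)

species <- "Gentoo"

# The expression inside the braces is evaluated and inserted into the string.
title <- glue("Species: {species}")
title
#> Species: Gentoo

# Any R expression works inside the braces, not just a variable name.
glue("Species: {toupper(species)}")
#> Species: GENTOO
```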

make_plot <- function(species) {
  penguins %>%
    filter(.data$species == .env$species) %>%
    ggplot() +
    aes(bill_length_mm, body_mass_g, color=sex) +
    geom_point() +
    ggtitle(glue("Species: {species}")) +
    xlab("bill length (mm)") +
    ylab("body mass (g)") +
    theme(plot.title.position = "plot")
}

We can call the function to make a plot for a single species.

make_plot("Adelie")

With the function ready, we can make plots for all species easily without repeating ourselves. Here we use the “map” function from purrr, which takes each element of the vector species and uses it as input for make_plot(). The resulting plots are stored in a variable as a list.

species <- c("Adelie", "Chinstrap", "Gentoo")
plots <- map(species, make_plot)

We can get the plots from the list. We can get the first plot

plots[[1]]

and the second plot

plots[[2]]
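Indexing by position works, but it is easy to forget which plot is which. One possible variation, sketched here as a suggestion rather than as part of the original approach, is to name the input vector with set_names() so that map() returns a named list. The function is repeated so that the block runs on its own.

```r
library(tidyverse)
library(palmerpenguins)
library(glue)

# Same function as above, repeated so this block is self-contained.
make_plot <- function(species) {
  penguins %>%
    filter(.data$species == .env$species) %>%
    ggplot() +
    aes(bill_length_mm, body_mass_g, color = sex) +
    geom_point() +
    ggtitle(glue("Species: {species}")) +
    xlab("bill length (mm)") +
    ylab("body mass (g)") +
    theme(plot.title.position = "plot")
}

species <- c("Adelie", "Chinstrap", "Gentoo")

# set_names() names each element after itself; map() carries the names over.
plots <- map(set_names(species), make_plot)

names(plots)
#> [1] "Adelie"    "Chinstrap" "Gentoo"

# Retrieve a plot by species name instead of by position.
plots[["Gentoo"]]
```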
