ggplot2 Version 3.0.0 Brings Tidy Evaluation to ggplot

ggplot2 Version 3.0.0 Supports Tidy Eval
RStudio has unveiled major updates to ggplot2 with the new version 3.0.0. The new version became available on CRAN about two weeks ago. ggplot2 3.0.0 was originally announced as ggplot2 2.3.0, but the scale of the changes led RStudio to bump the version number to 3.0.0.

One of the biggest additions in the new version is that ggplot2 can now speak the language of tidy evaluation. Yes, tidy eval, one of the coolest features of dplyr and tidyr, is now available in ggplot2. Support for tidy evaluation makes it really easy to write general-purpose functions that make beautiful ggplots.

One can install the new version of ggplot2 from CRAN using

install.packages("ggplot2")

Then confirm that ggplot2 3.0.0 is installed using

packageVersion("ggplot2")
[1] ‘3.0.0’
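In a script, the same check can be made programmatically rather than by eye. Here is a minimal sketch; the variable name has_tidy_eval is just an illustrative choice, and the code assumes ggplot2 is already installed.

```r
# packageVersion() returns a version object that can be compared
# directly against a version string.
has_tidy_eval <- packageVersion("ggplot2") >= "3.0.0"

if (!has_tidy_eval) {
  message("ggplot2 is older than 3.0.0; run install.packages(\"ggplot2\") to upgrade.")
}
```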

Here is my first stab at a simple use of tidy evaluation in ggplot2. One of the common frustrations I face is that I want to write a simple function that takes a data frame with multiple variables and makes a plot of user-specified variables.

Let us start with a simple example, making the plots by hand first, to understand the problem a bit more. Let us first load the necessary libraries and the gapminder dataset.

library(ggplot2)
library(gapminder)

Let us have a peek at the gapminder dataset

head(gapminder, n=3)
      country year      pop continent lifeExp gdpPercap
1 Afghanistan 1952  8425333      Asia  28.801  779.4453
2 Afghanistan 1957  9240934      Asia  30.332  820.8530
3 Afghanistan 1962 10267083      Asia  31.997  853.1007
 

Let us say we want to look at the distributions of “gdpPercap” and “pop” across the continents as box plots. We can simply use ggplot to make a box plot, here for “pop”

ggplot(gapminder, aes(x=continent, y=pop)) +
       geom_boxplot() +
       geom_jitter(width = 0.2) +
       scale_y_log10() 


Often you may want to make more box plots from the same data frame, using other variables in place of the “continent vs pop” pair. One option is to write a function that takes the variables as input and makes the box plots. The challenge lies in how to pass the variables to the function. One approach that is already available is to write the function using aes_string() instead of aes()

get_box_plot <- function(x, y){
    ggplot(gapminder, aes_string(x = x, y = y))+
    geom_boxplot() + 
    geom_jitter(width = 0.2)+
    scale_y_log10()
}

Then we can specify the x and y variables as

x_var <- "continent"
y_var <- "pop"

and then call the function with the variable names as arguments

get_box_plot(x = x_var,
            y = y_var)

This works nicely and we can make box plots with other variables from the data frame. For example

get_box_plot(x = x_var,
            y = "gdpPercap")

However, we will run into problems if we want to pass the variables the way we do in dplyr. For example, if we specify the variable names as they are (without double quotes)

get_box_plot(x = continent,
            y = gdpPercap)

We will get the following error

Error in aes_string(x = x, y = y) : object 'continent' not found

One might think we are getting this error because we used “aes_string()” instead of “aes()” inside the function. Let us rewrite the function with “aes()” and try again.

get_box_plot_with_aes <- function(x, y){
    ggplot(gapminder, aes(x = x, y = y))+
    geom_boxplot() + 
    geom_jitter(width = 0.2)+
    scale_y_log10()
}

get_box_plot_with_aes(x = continent,
            y = gdpPercap)

We will get this error.

Error in FUN(X[[i]], ...) : object 'continent' not found

The issue is due to how the statement gets evaluated: R looks for an object named continent in the calling environment rather than for a column in the data frame, and no such object exists. This is one of the reasons tidyverse packages like dplyr and tidyr have tidy evaluation, a form of non-standard evaluation, which makes it possible to write R code like this. Tidy evaluation is now available in ggplot2, so let us see how to use it to write general functions that make plots.
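The evaluation issue can be reproduced in base R without ggplot2 at all. The sketch below uses two hypothetical helpers of my own, standard_eval() and capture_expr(): the first evaluates its argument the ordinary way and fails on a bare column name, while the second captures the expression without evaluating it. That second behaviour is roughly what tidy evaluation builds on. It assumes no object called continent exists in the workspace.

```r
# Hypothetical helper: evaluates its argument like any ordinary R function.
standard_eval <- function(x) x

# Hypothetical helper: captures the argument expression without evaluating it.
capture_expr <- function(x) substitute(x)

# Ordinary evaluation fails, because there is no object named `continent`
# outside the data frame. The error is caught and its message kept.
result <- tryCatch(
  standard_eval(continent),
  error = function(e) conditionMessage(e)
)

# Capturing succeeds: we get the symbol itself, not its value.
captured <- capture_expr(continent)
```

Here captured holds the symbol continent, which a plotting function can later look up inside the data frame.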

aes() in ggplot2 now supports quasiquotation, so one can use !! in combination with enquo(). Support for tidy eval basically replaces the existing use of aes_string(). aes_string() is now soft-deprecated, but it will remain available for a long time.

make_box_plots_teval_1 <- function(x, y){
  x_var <- enquo(x)
  y_var <- enquo(y)
  ggplot(gapminder, aes(x = !!x_var, y = !!y_var))+
    geom_boxplot() + 
    geom_jitter(width = 0.2) +
    scale_y_log10()
}

Now we can specify column names in the dataframe without the quotes and make plots.

## column names without quotes as arguments 
make_box_plots_teval_1(x = continent,
                        y = gdpPercap)

make_box_plots_teval_1(x = continent,
                        y = pop)
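To get a feel for what enquo() hands back inside such a function, here is a small sketch using rlang directly. The helper captured_name() is hypothetical and not part of ggplot2 or rlang; it only captures its argument and converts the captured expression to text with as_label().

```r
library(rlang)

# Hypothetical helper: capture the argument and return it as a string.
captured_name <- function(x) {
  x_var <- enquo(x)   # quosure: the unevaluated expression plus its environment
  as_label(x_var)     # turn the captured expression into text
}

label <- captured_name(gdpPercap)
print(label)
```

A string obtained this way could, for example, be used to build an axis label or a plot title from the same bare column name.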

Isn’t that neat? Not only that, faceting also gets tidy evaluation support in ggplot2 3.0.0. The new function “vars()” lets us pass the faceting variable without quotes. Both facet_wrap() and facet_grid() now support vars() inputs. Here is a simple example of a function that uses it.

make_box_plots_teval_2 <- function(x, y, facet_var){
  
  x_var <- enquo(x)
  y_var <- enquo(y)
  facet_var <- enquo(facet_var)
  
  ggplot(gapminder, aes(x = !!x_var, y = !!y_var))+
    geom_boxplot()+ 
    geom_jitter(width = 0.2)+
    scale_y_log10()+
    facet_wrap(vars(!!facet_var))
}

We can call the function with the faceting variable without quotes.

make_box_plots_teval_2(x = continent,
                        y = gdpPercap,
                        facet_var = year)
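If the column names arrive as strings, as in the aes_string() version earlier, one possible way to stay within tidy evaluation is to convert each string to a symbol with rlang's sym() and unquote it with !!. The function name get_box_plot_sym() below is hypothetical; treat this as a sketch rather than the recommended replacement.

```r
library(ggplot2)
library(gapminder)
library(rlang)

# Hypothetical name: takes column names as strings, like the aes_string() version.
get_box_plot_sym <- function(x, y) {
  # sym() turns each string into a symbol; !! unquotes it inside aes()
  ggplot(gapminder, aes(x = !!sym(x), y = !!sym(y))) +
    geom_boxplot() +
    geom_jitter(width = 0.2) +
    scale_y_log10()
}

p <- get_box_plot_sym("continent", "gdpPercap")
print(p)
```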