RStudio has unveiled major updates to ggplot2 with the new version 3.0.0, which has been available on CRAN for about two weeks. ggplot2 3.0.0 was originally announced as ggplot2 2.3.0, but the size of the changes led RStudio to bump the version number to 3.0.0.
One of the biggest additions in the new version is that ggplot2 can now speak the language of tidy evaluation. Yes, tidy eval, one of the coolest features of dplyr and tidyr, is now available in ggplot2. Support for tidy evaluation makes it really easy to write general-purpose functions that make ever-awesome, beautiful ggplots.
ggplot2 3.0.0 now on CRAN — https://t.co/m8fSDaaSbu ? Headline features are tidy eval, sf support, position_dodge2(), & viridis. Huge thanks to all 300+ contributors to this release, and particularly to @ClausWilke, @kara_woo, @_lionelhenry and @thomasp85 pic.twitter.com/igb9k0Z89C
— Hadley Wickham (@hadleywickham) July 4, 2018
One can install the new version of ggplot2 from CRAN using
install.packages("ggplot2")
And make sure you have installed ggplot2 3.0.0 using
packageVersion("ggplot2")
[1] ‘3.0.0’
Here is my first stab at a simple use of tidy evaluation in ggplot2. One common frustration I face is wanting to write a simple function that takes a data frame with multiple variables and makes a plot of user-specified variables.
Let us work through a simple example, first making the plots directly, to understand the problem a bit more. Let us first load the necessary libraries and the gapminder dataset.
library(ggplot2)
library(gapminder)
Let us have a peek at the gapminder dataset
head(gapminder, n=3)
      country year      pop continent lifeExp gdpPercap
1 Afghanistan 1952  8425333      Asia  28.801  779.4453
2 Afghanistan 1957  9240934      Asia  30.332  820.8530
3 Afghanistan 1962 10267083      Asia  31.997  853.1007
Let us say we want to look at the distributions of “gdpPercap” and “pop” across the continents as box plots. We can simply use ggplot to make a box plot, starting with “pop”
ggplot(gapminder, aes(x = continent, y = pop)) +
  geom_boxplot() +
  geom_jitter(width = 0.2) +
  scale_y_log10()
Often you may want to make more box plots using the same data frame but different variables, like “continent vs gdpPercap”. One option is to write a function that takes the variables as input and makes the box plots. The challenge lies in how to pass the variables to the function. One approach that is already available is to write the function using aes_string() instead of aes()
get_box_plot <- function(x, y){
  ggplot(gapminder, aes_string(x = x, y = y)) +
    geom_boxplot() +
    geom_jitter(width = 0.2) +
    scale_y_log10()
}
Then we can specify the x and y variables as
x_var <- "continent"
y_var <- "pop"
and then call the function with the variable names as arguments
get_box_plot(x = x_var, y = y_var)
This works nicely and we can make box plots with other variables from the data frame. For example
get_box_plot(x = x_var, y = "gdpPercap")
However, we will run into problems if we want to pass in the variables the way we do in dplyr. For example, if we specify the variable names as they are, without double quotes, like
get_box_plot(x = continent, y = gdpPercap)
We will get the following error
Error in aes_string(x = x, y = y) : object 'continent' not found
One might think we are getting this error because we used “aes_string()” instead of “aes()” inside the function. Let us rewrite the function with “aes()” and try again.
get_box_plot_with_aes <- function(x, y){
  ggplot(gapminder, aes(x = x, y = y)) +
    geom_boxplot() +
    geom_jitter(width = 0.2) +
    scale_y_log10()
}
get_box_plot_with_aes(x = continent, y = gdpPercap)
We will get this error.
Error in FUN(X[[i]], ...) : object 'continent' not found
The issue is due to how the arguments get evaluated. Inside the function, R tries to evaluate continent as an ordinary object in the calling environment, where no such object exists, rather than looking it up as a column of gapminder. This is one of the reasons tidyverse packages like dplyr and tidyr use tidy evaluation, a form of non-standard evaluation, to enable writing R code like this. Tidy evaluation is now available in ggplot2 too, so let us see how to use it to write general functions that make plots.
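The same failure can be reproduced without ggplot2 at all. Here is a minimal base R sketch; pick_column() and the scores data frame are made up for illustration and are not part of gapminder or ggplot2.

```r
# Hypothetical helper, for illustration only: returns one column of a data frame.
pick_column <- function(data, col) {
  data[[col]]
}

scores <- data.frame(math = c(90, 80), art = c(70, 60))

# A quoted name is an ordinary string, so this works.
pick_column(scores, "math")

# An unquoted name is evaluated as a variable in the calling environment.
# No object called `math` exists there, so R raises an error, which is
# captured here as a message instead of stopping the script.
err <- tryCatch(
  pick_column(scores, math),
  error = function(e) conditionMessage(e)
)
err
```

The captured message should be the same kind of "object not found" error we saw from get_box_plot(), because the function never gets a chance to treat the name as a column.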
aes() in ggplot2 now supports quasiquotation, so one can use !! in combination with enquo(): enquo() captures the expression the caller typed, and !! unquotes it inside aes(). The support for tidy eval basically replaces the existing use of aes_string(). aes_string() is now soft-deprecated, but will be available for use for a long time.
make_box_plots_teval_1 <- function(x, y){
  x_var <- enquo(x)
  y_var <- enquo(y)
  ggplot(gapminder, aes(x = !!x_var, y = !!y_var)) +
    geom_boxplot() +
    geom_jitter(width = 0.2) +
    scale_y_log10()
}
Now we can specify column names in the dataframe without the quotes and make plots.
## column names without quotes as arguments
make_box_plots_teval_1(x = continent, y = gdpPercap)
make_box_plots_teval_1(x = continent, y = pop)
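It can help to see the same enquo() and !! pattern outside of ggplot2. The sketch below uses rlang directly; column_mean() and the toy data frame are hypothetical names chosen for illustration, and eval_tidy() here only stands in for the evaluation that ggplot2 does for us inside aes().

```r
library(rlang)

# Hypothetical helper: mean of one column, with the column named without quotes.
column_mean <- function(data, col) {
  # enquo() captures the expression the caller typed (e.g. `a`)
  # instead of trying to evaluate it right away.
  col_quo <- enquo(col)
  # !! unquotes the captured expression into a larger expression.
  call_quo <- quo(mean(!!col_quo))
  # eval_tidy() evaluates it with the data frame's columns in scope.
  eval_tidy(call_quo, data = data)
}

toy <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
column_mean(toy, a)
column_mean(toy, b)
```

The two steps are the same as in make_box_plots_teval_1(): capture first with enquo(), then unquote with !! where a column name is expected.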
Isn’t that neat? Not just that, another new feature in ggplot2 3.0.0 is that faceting also gets tidy evaluation functionality. The new function “vars()” lets us pass the faceting variable without quotes. Both facet_wrap() and facet_grid() now support vars() inputs. Here is a simple example of a function that uses it.
make_box_plots_teval_2 <- function(x, y, facet_var){
  x_var <- enquo(x)
  y_var <- enquo(y)
  facet_var <- enquo(facet_var)
  ggplot(gapminder, aes(x = !!x_var, y = !!y_var)) +
    geom_boxplot() +
    geom_jitter(width = 0.2) +
    scale_y_log10() +
    facet_wrap(vars(!!facet_var))
}
We can call the function with the faceting variable without quotes.
make_box_plots_teval_2(x = continent, y = gdpPercap, facet_var = year)
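Since facet_grid() also accepts vars(), the same pattern should carry over to grid faceting. Here is a minimal sketch; make_box_plots_teval_3() and its row_var argument are names made up for this illustration, and it assumes ggplot2 3.0.0 or later along with the gapminder package.

```r
library(ggplot2)
library(gapminder)

# Hypothetical variant of the function above: facets by rows with facet_grid().
make_box_plots_teval_3 <- function(x, y, row_var) {
  x_var <- enquo(x)
  y_var <- enquo(y)
  row_var <- enquo(row_var)
  ggplot(gapminder, aes(x = !!x_var, y = !!y_var)) +
    geom_boxplot() +
    scale_y_log10() +
    facet_grid(rows = vars(!!row_var))
}

# enquo() captures whole expressions, so factor(year) can be passed as x.
p <- make_box_plots_teval_3(x = factor(year), y = gdpPercap, row_var = continent)
p
```

This gives one row of box plots per continent, with one box per survey year in each row.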