How to Make Boxplot in R with ggplot2?

Boxplot with jittered Data Points in R

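One of the many strengths of R is the tidyverse family of packages and the ability to make great-looking plots easily. The boxplot, or box-and-whisker plot, introduced by John Tukey, is great for visualizing data from multiple groups or distributions. A boxplot gives an efficient summary of the data using the minimum, the maximum, and the 25th, 50th, and 75th percentiles, and it can be combined with the actual data points. Note that ggplot2, by default, ends each whisker at the most extreme value within 1.5 times the interquartile range of the box and draws anything beyond that as a separate outlier point.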
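To make the summary concrete, here is a minimal base R sketch of the numbers a boxplot is built from. The vector x is made up purely for illustration and is not part of the gapminder data. The hinges reported by boxplot.stats() can differ slightly from the percentiles returned by quantile(), so treat the two as closely related rather than identical.

```r
# Toy values, made up purely for illustration (not gapminder data)
x <- c(2, 4, 4, 5, 7, 8, 9, 11, 12, 30)

# 25th, 50th, and 75th percentiles: the box edges and the middle line
q <- quantile(x, probs = c(0.25, 0.5, 0.75))

# Tukey's five numbers plus the values flagged as outliers
bs <- boxplot.stats(x)
bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # values more than 1.5 times the box length away from the box
```

In this toy vector the value 30 lies far above the rest, so it is reported in bs$out and the upper whisker stops at the largest remaining value.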

Let us learn how to make a boxplot using ggplot2 in R, starting with a basic boxplot and then adding more details to the plot. First, let us load the packages we need: ggplot2 for plotting and readr for reading the data into a data frame.

library(ggplot2)
library(readr)

Let us use the gapminder data from the Software Carpentry website. We can use readr's read_csv() to load the gapminder data from the URL as a data frame.

gapminder_url <- 'https://bit.ly/2cLzoxH'
gapminder <- read_csv(gapminder_url)
head(gapminder)

How To Make Basic Boxplot?

The gapminder data contains life expectancy for each country and continent over multiple years. Let us make a boxplot of life expectancy across continents. We first provide the gapminder data frame to ggplot() and then specify the aesthetics with the aes() function. Inside aes(), we specify the x-axis and y-axis variables. To draw the boxplot of lifeExp by continent, we add the geom_boxplot() layer.

ggplot(gapminder,aes(x=continent, y=lifeExp))+
      geom_boxplot()

The result is a basic boxplot of lifeExp for each continent. We can clearly see the trend: life expectancy is lower in Africa and higher in Europe and Oceania.
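The trend read off the plot can also be checked numerically by computing the median per group, which is the middle line of each box. The sketch below uses a tiny made-up data frame named toy (hypothetical values, not real gapminder numbers) with the same column names. With the real data, you would pass gapminder instead.

```r
# Made-up stand-in for the gapminder data frame (values are not real)
toy <- data.frame(
  continent = rep(c("Africa", "Europe"), each = 4),
  lifeExp   = c(40, 45, 50, 55, 65, 70, 75, 80)
)

# Median lifeExp per continent, i.e. the middle line of each box
med <- aggregate(lifeExp ~ continent, data = toy, FUN = median)
med
```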



How To Make Basic Boxplot with Colors?

Let us add colors to the basic boxplot so that each continent gets its own color. ggplot2 lets you color by a variable, here continent. We map that variable to the fill aesthetic inside aes() to color the boxes.

ggplot(gapminder,aes(x=continent, y=lifeExp, fill=continent)) +
      geom_boxplot()
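Because fill and the x-axis both map to continent, the legend that ggplot2 adds repeats what the axis labels already say. One optional tweak, sketched here on a made-up data frame named toy (hypothetical values standing in for gapminder), is to hide the legend with theme().

```r
library(ggplot2)

# Made-up stand-in for the gapminder data frame (values are not real)
set.seed(1)
toy <- data.frame(
  continent = rep(c("Africa", "Asia", "Europe"), each = 20),
  lifeExp   = c(rnorm(20, 50, 8), rnorm(20, 60, 8), rnorm(20, 72, 5))
)

p <- ggplot(toy, aes(x = continent, y = lifeExp, fill = continent)) +
  geom_boxplot() +
  theme(legend.position = "none")  # the x-axis already names each continent
p
```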

How To Make Boxplot with Data Points?

Although the boxplot with colors looks much better than the basic boxplot, we are still not showing the actual data. We are only plotting the summary of the data as boxes. We can add actual data points as an additional layer to the boxplot in ggplot by simply adding the function geom_point().

ggplot(gapminder,aes(x=continent, y=lifeExp, fill=continent)) +
      geom_boxplot()+
      geom_point()

How To Make Boxplot with Data Points and jitter?

Adding geom_point() as an additional layer plots all the data points for a continent on a single vertical line. That is not very useful, since points with the same or similar life expectancy overlap each other.

One way to avoid this and actually see the data on the boxplot is to randomly jitter the data points horizontally. ggplot2 lets you do that with the geom_jitter() function. You can control how far the points spread with the width argument and set the transparency of the points with the alpha argument.

ggplot(gapminder,aes(x=continent, y=lifeExp, fill=continent)) +
      geom_boxplot() +
      geom_jitter(width=0.25, alpha=0.5)
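Two optional refinements are sketched below on a made-up data frame named toy (hypothetical values standing in for gapminder). First, geom_boxplot() draws its own outlier points, so those observations appear twice once the jittered layer is added. Setting outlier.shape = NA hides the boxplot's copy. Second, geom_jitter() also jitters vertically by default. Setting height = 0 keeps every point at its true y value. Since the jitter is random, calling set.seed() right before drawing makes the figure reproducible.

```r
library(ggplot2)

# Made-up stand-in for the gapminder data frame (values are not real)
set.seed(42)
toy <- data.frame(
  continent = rep(c("Africa", "Asia", "Europe"), each = 30),
  lifeExp   = c(rnorm(30, 50, 8), rnorm(30, 60, 8), rnorm(30, 72, 5))
)

p <- ggplot(toy, aes(x = continent, y = lifeExp, fill = continent)) +
  geom_boxplot(outlier.shape = NA) +                  # hide the boxplot's own outlier points
  geom_jitter(width = 0.25, height = 0, alpha = 0.5)  # spread points sideways only

set.seed(42)  # jitter offsets are drawn when the plot is rendered
print(p)
```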