Ridgeline plots are a great way to visualize changes in multiple distributions or histograms over time or space. For a brief time they were called joyplots. The ggridges package from UT Austin professor Claus Wilke lets you make ridgeline plots in combination with ggplot2. Here is how Claus describes the ridgeline plot, with a brief history:
Ridgeline plots are partially overlapping line plots that create the impression of a mountain range. They can be quite useful for visualizing changes in distributions over time or space. These types of plots have also been called “joyplots”, in reference to the iconic cover art for Joy Division’s album Unknown Pleasures. However, given the unfortunate origin of the name Joy Division, the term “joyplot” is now discouraged.
Let us get started by plotting a few examples of ridgeline plots with ggridges.
How to Install ggridges?
# install the stable version from CRAN
install.packages("ggridges")
To install the latest development version from GitHub:
# requires the devtools package: install.packages("devtools")
library(devtools)
install_github("clauswilke/ggridges")
The ggridges package has two main geoms, geom_ridgeline() and geom_density_ridges(). geom_ridgeline() draws ridgelines using heights taken directly from the data. geom_density_ridges() first estimates densities from the data and then draws them as ridgelines.
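To make the difference between the two geoms concrete, here is a minimal sketch on made-up data. The data frames `d` and `sim` and all their values are hypothetical. The first part passes heights straight to geom_ridgeline(). The second part imitates what geom_density_ridges() does: it estimates a kernel density per group with base R's density() and hands the result to geom_ridgeline(). This illustrates the idea only and is not ggridges' internal code. One known difference is that geom_density_ridges() by default uses a single bandwidth shared by all groups, while this sketch lets density() pick a bandwidth for each group, so the curves will not match exactly.

```r
library(ggplot2)
library(ggridges)

# 1) geom_ridgeline(): heights come straight from the data.
# Hypothetical toy data: three ridges with baselines at y = 0, 1 and 3.
d <- data.frame(
  x = rep(1:5, 3),
  y = c(rep(0, 5), rep(1, 5), rep(3, 5)),
  height = c(0, 1, 3, 4, 0, 1, 2, 3, 5, 4, 0, 5, 4, 4, 1)
)
p_ridge <- ggplot(d, aes(x = x, y = y, height = height, group = y)) +
  geom_ridgeline(fill = "lightblue")

# 2) geom_density_ridges(): densities are estimated first.
# Hypothetical simulated data: three groups drawn from normal distributions.
set.seed(1)
sim <- data.frame(
  group = rep(c("a", "b", "c"), each = 200),
  value = c(rnorm(200, mean = 40, sd = 8),
            rnorm(200, mean = 55, sd = 8),
            rnorm(200, mean = 70, sd = 8))
)

# Manual version: one kernel density per group, evaluated on a common x grid
groups <- sort(unique(sim$group))
dens <- do.call(rbind, lapply(seq_along(groups), function(i) {
  v <- sim$value[sim$group == groups[i]]
  est <- density(v, from = min(sim$value), to = max(sim$value), n = 256)
  data.frame(group = groups[i], y = i, x = est$x, density = est$y)
}))
# Our own scaling choice: the tallest curve just reaches the next baseline
dens$height <- dens$density / max(dens$density)

p_manual <- ggplot(dens, aes(x = x, y = y, height = height, group = group)) +
  geom_ridgeline(alpha = 0.5) +
  scale_y_continuous(breaks = seq_along(groups), labels = groups)

# Built-in version: geom_density_ridges() does the estimation itself
p_builtin <- ggplot(sim, aes(x = value, y = group)) +
  geom_density_ridges(alpha = 0.5)

print(p_ridge)
print(p_manual)
print(p_builtin)
```

The manual and built-in plots should look broadly similar. Any differences come mainly from the bandwidth choice and from how each version scales the ridge heights.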
We will use the gapminder data to make ridgeline plots. Let us read the data from a Software Carpentry URL.
# load ggplot2 and ggridges for the plots below
library(ggplot2)
library(ggridges)
data_url = 'http://bit.ly/2cLzoxH'
# read data from url as dataframe
gapminder = read.csv(data_url)
The gapminder data covers 12 years, from 1952 to 2007 in five-year steps. Let us make a ridgeline plot of lifeExp for each year using ggridges and ggplot2. We first specify the data and the aesthetics for the plot. Then we add a ridgeline layer with geom_density_ridges(), which estimates the density of life expectancy for each year.
ggplot(gapminder, aes(y = as.factor(year), x = lifeExp)) +
  geom_density_ridges(alpha = 0.5) +
  scale_y_discrete(expand = c(0.01, 0)) +
  scale_x_continuous(expand = c(0, 0)) +
  theme(axis.text = element_text(size = 20))
We get a really nice looking plot of ridgelines, like a mountain range, showing life expectancy over the years. The pattern is clear: life expectancy increases over time. We can also see that life expectancy varies widely within each year, and that several years show multiple modes.
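To back up the visual impression with numbers, it helps to summarize life expectancy for each year. The sketch below re-reads the data from the same Software Carpentry URL used above, so it needs network access. It then uses base R's tapply() to compute the median and interquartile range (IQR) of lifeExp per year. A rising median corresponds to the ridges shifting right, and a large IQR corresponds to the wide spread within each year.

```r
# Re-read the gapminder data from the URL used in this post (needs network)
gapminder <- read.csv("http://bit.ly/2cLzoxH")

# Median and interquartile range of life expectancy, one value per year
med <- tapply(gapminder$lifeExp, gapminder$year, median)
iqr <- tapply(gapminder$lifeExp, gapminder$year, IQR)

by_year <- data.frame(
  year   = as.integer(names(med)),
  median = as.vector(med),
  IQR    = as.vector(iqr)
)
print(by_year)
```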
Note that the above ridgeline plot pools all countries in each year. One reason for the multiple modes is that, within a year, different continents could have different distributions of life expectancy. We can check this by mapping continent to the fill aesthetic, so that each continent gets its own color.
ggplot(gapminder, aes(y = as.factor(year), x = lifeExp, fill = continent)) +
  geom_density_ridges(alpha = 0.25) +
  scale_y_discrete(expand = c(0.01, 0)) +
  scale_x_continuous(expand = c(0, 0))
Here is the ridgeline plot with each continent colored separately. We can now see the different distributions of life expectancy for each continent within a year.