Boxplots with the actual data points overlaid are one of the best ways to visualize the distributions of multiple variables at the same time. Creating a good-looking boxplot in Python with Pandas is easy. In an earlier post, we saw a good example of how to create publication-quality boxplots with Pandas and Seaborn. If you haven’t heard of Seaborn, here is a short description:
Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
Often you may want to visualize multiple variables as boxplots such that each group has a specific color of your choosing, rather than one of the “palette” options available in Seaborn.
Let us see an example of how to make a boxplot using Seaborn such that we use a specific color for each box.
Let us first load the packages needed.
import pandas as pd
# import matplotlib
import matplotlib.pyplot as plt
# import seaborn
import seaborn as sns
# Jupyter/IPython magic: show plots inline in the notebook
%matplotlib inline
Let us load the gapminder data from the Software Carpentry website into a data frame named gapminder and subset it to make a smaller data frame. After subsetting, the data frame contains only the rows corresponding to the year 2007.
gapminder_2007 = gapminder[gapminder['year']==2007]
gapminder_2007.head(n=3)

        country  year         pop continent  lifeExp    gdpPercap
11  Afghanistan  2007  31889923.0      Asia   43.828   974.580338
23      Albania  2007   3600523.0    Europe   76.423  5937.029526
35      Algeria  2007  33333216.0    Africa   72.301  6223.367465
Let us say we want to make a boxplot visualizing the distribution of the lifeExp variable across the continents in the gapminder data. Let us also say we want a specific color for each continent, and that these colors are already available as hex codes (#RRGGBB).
continents = gapminder_2007.continent.unique().tolist()
# Hex code for each continent's color
continent_colors=["#F0F000","#F00000","#00A000","#00A0F0","#1010F0"]
Let us create a color dictionary with each continent as a key and its color as the value.
color_dict = dict(zip(continents, continent_colors))
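One thing to keep in mind is that zip pairs the two lists by position and silently stops at the shorter one, so a missing color would go unnoticed. The sketch below adds a length check before building the dictionary. The helper name make_color_dict is hypothetical, and the continent order is typed in by hand as an assumption about what unique() would return for this data.

```python
def make_color_dict(groups, colors):
    """Pair each group with a color, rejecting lists of different lengths."""
    if len(groups) != len(colors):
        raise ValueError(f"{len(groups)} groups but {len(colors)} colors")
    return dict(zip(groups, colors))


# Assumed order of first appearance of the continents (hand-typed for illustration)
demo_continents = ["Asia", "Europe", "Africa", "Americas", "Oceania"]
demo_colors = ["#F0F000", "#F00000", "#00A000", "#00A0F0", "#1010F0"]

demo_color_dict = make_color_dict(demo_continents, demo_colors)
print(demo_color_dict["Europe"])
```

Because the pairing is positional, the colors should be listed in the same order as the continents.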
Let us make a basic boxplot using Seaborn’s boxplot function, with lifeExp on the y-axis and continent on the x-axis, using the default colors available in Seaborn.
bplot=sns.boxplot(y='lifeExp', x='continent', data=gapminder_2007, width=0.5)
This boxplot has default colors specified by Seaborn and we want to change that.
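Now let us fill each box with its specified color using the artists list of the plot and the set_facecolor method of each box. Note that this relies on the boxes being stored in bplot.artists, which may depend on the Seaborn and matplotlib versions installed. If you want to know more about Artist objects, read this fantastic blogpost.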
for i in range(0,5):
    mybox = bplot.artists[i]
    mybox.set_facecolor(color_dict[continents[i]])
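With some newer matplotlib and Seaborn releases the artists list can come back empty, in which case the loop above fails with an IndexError. As an alternative (a different technique from the loop above, not a drop-in copy of it), the color dictionary can be passed to Seaborn's palette argument. The sketch below is a minimal example under stated assumptions: the small data frame demo and the dictionary demo_palette are made up for illustration and stand in for gapminder_2007 and color_dict.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up values standing in for gapminder_2007 (illustration only)
demo = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe", "Europe", "Africa", "Africa"],
    "lifeExp": [43.8, 70.2, 76.4, 79.1, 72.3, 51.0],
})
demo_palette = {"Asia": "#F0F000", "Europe": "#F00000", "Africa": "#00A000"}

fig, ax = plt.subplots()
# Map colors through hue so that each continent gets its dictionary color;
# dodge=False keeps one full-width box per continent, and saturation=1
# stops Seaborn from muting the requested colors.
sns.boxplot(y="lifeExp", x="continent", hue="continent", data=demo,
            palette=demo_palette, dodge=False, saturation=1, width=0.5, ax=ax)

# Some Seaborn versions add a legend for hue; it is redundant here, so drop it
if ax.get_legend() is not None:
    ax.get_legend().remove()
```

The dictionary keys must match the values in the continent column exactly, otherwise Seaborn has no color for that group.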
Now let us add the data points on top of the boxplot in black color using Seaborn’s stripplot.
bplot = sns.stripplot(y='lifeExp', x='continent', data=gapminder_2007, jitter=True, marker='o', alpha=0.8, color="black")
One can also specify colors by name instead of by hex code. Here is an example using color names to specify the box colors of a boxplot.
continent_colors=["tomato","darkturquoise","mediumpurple","springgreen","magenta"]
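A misspelled color name only shows up as an error at plotting time, so it can help to check the names first. The sketch below uses matplotlib's is_color_like and to_hex functions to validate each name and show its hex equivalent. The variable names named_colors, unknown, and hex_codes are chosen here for illustration.

```python
from matplotlib.colors import is_color_like, to_hex

named_colors = ["tomato", "darkturquoise", "mediumpurple", "springgreen", "magenta"]

# Collect any name that matplotlib does not recognise as a color
unknown = [name for name in named_colors if not is_color_like(name)]
if unknown:
    raise ValueError(f"Unrecognised color names: {unknown}")

# Translate each name to its #rrggbb hex code
hex_codes = {name: to_hex(name) for name in named_colors}
print(hex_codes["tomato"])  # -> #ff6347
```

The same dictionary-building step as before then works unchanged, since matplotlib accepts names and hex codes interchangeably.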
Here is the corresponding boxplot, this time showing the distribution of gdpPercap across the five continents, with the boxes colored using color names.
Here are two resources for learning color names in Python.