How to Make Boxplots in Python with Pandas and Seaborn?

Plot Boxplot and swarmplot in Python with Seaborn

The boxplot, introduced by John Tukey in his classic book Exploratory Data Analysis close to 50 years ago, is great for visualizing data distributions from multiple groups. A boxplot captures a summary of the data efficiently with a simple box and whiskers and makes it easy to compare across groups. It summarizes a sample using its 25th, 50th, and 75th percentiles, also known as the lower quartile, median, and upper quartile. The advantage of comparing quartiles is that they are robust to outliers: a few extreme values barely move them.
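
As a rough illustration of why quartiles are a robust summary, here is a minimal sketch using NumPy. The ten values are made up for this example and are not from the gapminder data. It computes the three percentiles, then replaces the largest value with an extreme outlier and computes them again.

```python
import numpy as np

# Hypothetical sample of ten values (not taken from gapminder)
values = np.array([52, 55, 58, 60, 62, 65, 68, 70, 72, 75])

# The three numbers that define the box of a boxplot
q1, median, q3 = np.percentile(values, [25, 50, 75])

# Replace the largest value with an extreme outlier
with_outlier = values.copy()
with_outlier[-1] = 750
q1_o, median_o, q3_o = np.percentile(with_outlier, [25, 50, 75])

print("quartiles without outlier:", q1, median, q3)
print("quartiles with outlier:   ", q1_o, median_o, q3_o)
print("means:", values.mean(), with_outlier.mean())
```

In this sample the quartiles stay the same after the change, while the mean shifts a lot. With other data the quartiles can move a little, but far less than the mean does.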

If you are interested in learning more about the history and evolution of boxplots, check out Hadley Wickham’s 2011 paper 40 years of Boxplots.

In this post, we will see how to make boxplots using Python’s Pandas and Seaborn. Let us first load the packages needed to plot boxplots in Python.

# import pandas
import pandas as pd
# import matplotlib
import matplotlib.pyplot as plt
# import seaborn
import seaborn as sns
# Jupyter/IPython magic to show plots inline; omit it in a plain Python script
%matplotlib inline

Let us load the gapminder data to make boxplots. We will download the gapminder data directly from the Software Carpentry GitHub page. Pandas’ read_csv can load data from a URL straight into a dataframe.

data_url = 'http://bit.ly/2cLzoxH'
# read data from url as pandas dataframe
gapminder = pd.read_csv(data_url)
print(gapminder.head(3))

Let us filter the gapminder data to keep all countries but only the year 2007. We will use a condition on the year column to subset the original dataframe.

gapminder_2007 = gapminder[gapminder['year']==2007]
gapminder_2007.shape
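
To make the filtering step concrete without downloading anything, here is a small sketch on a hypothetical four-row dataframe. The name toy and its values are invented for illustration; only the column names mirror gapminder. It shows that the comparison produces a boolean mask, and that DataFrame.query gives the same subset.

```python
import pandas as pd

# Hypothetical miniature frame; column names mirror gapminder, values are made up
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2002, 2007, 2002, 2007],
    "lifeExp": [70.1, 71.3, 55.2, 57.0],
})

# The comparison returns a boolean Series with one entry per row
mask = toy["year"] == 2007

# Indexing with the mask keeps only the rows where it is True
toy_2007 = toy[mask]

# The same subset written with DataFrame.query
toy_2007_q = toy.query("year == 2007")

print(toy_2007.shape)
```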

We will plot boxplots in four ways: first with Pandas’ boxplot function, and then in three ways with the Seaborn plotting library to get a much improved boxplot.

How to Make Boxplots with Pandas

Pandas has some built-in plotting capabilities. Once you have created a pandas dataframe, you can use its plotting methods to plot things quickly. One way to make a boxplot from a pandas dataframe is to use its boxplot method. To plot a boxplot of life expectancy by continent, we would use pandas like this:

gapminder_2007.boxplot(by='continent', 
                       column=['lifeExp'], 
                       grid=False)
Boxplot Using Pandas
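
If you want the numbers behind the boxes rather than just the picture, a groupby can compute the quartiles per group. This is a sketch on a hypothetical stand-in dataframe (toy_2007, with invented values); the same call pattern should apply to gapminder_2007.

```python
import pandas as pd

# Hypothetical stand-in for gapminder_2007 with made-up values
toy_2007 = pd.DataFrame({
    "continent": ["Africa"] * 4 + ["Europe"] * 4,
    "lifeExp": [48.0, 52.0, 56.0, 60.0, 74.0, 76.0, 78.0, 80.0],
})

# Lower quartile, median, and upper quartile of lifeExp for each continent;
# unstack() turns the quantile level into columns, one row per continent
quartiles = (
    toy_2007.groupby("continent")["lifeExp"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)

print(quartiles)
```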

The pandas boxplot looks okay for a first-pass analysis. One can clearly see the trend in the data. The key to making a good visualization is to start with something basic and iterate to make it better. Let us try Python’s Seaborn library to make boxplots.

How to Make Boxplot with Seaborn

To make a basic boxplot with Seaborn, we can pass the pandas dataframe as input to Seaborn’s boxplot function. In addition to the data, we can specify several options to customize the boxplot. Let us choose a color palette for the boxplot. Here, we have chosen the colorblind-friendly palette “colorblind”. Other color palette options available in Seaborn include deep, muted, bright, pastel, and dark. Let us also specify the width of the boxes in the boxplot.

bplot = sns.boxplot(y='lifeExp', x='continent', 
                 data=gapminder_2007, 
                 width=0.5,
                 palette="colorblind")
Boxplot in Python with Seaborn

Boxplot with data points using Seaborn

A boxplot alone is extremely useful for getting a summary of the data within and between groups. However, it is often good practice to overlay the actual data points on the boxplot. Using Seaborn, we can do that in a few ways. One way to make a boxplot with data points is to use Seaborn’s stripplot.

We will first use Seaborn’s boxplot as before, with no data points, and then add a layer of data points to the boxplot with stripplot. Stripplot has several options to make the result look better. For example, we can specify the marker used to show the data points, and it is better to use the jitter=True option to spread the data points horizontally so that they overlap less.

# make boxplot with Seaborn
bplot=sns.boxplot(y='lifeExp', x='continent', 
                 data=gapminder_2007, 
                 width=0.5,
                 palette="colorblind")

# add stripplot to boxplot with Seaborn
bplot=sns.stripplot(y='lifeExp', x='continent', 
                   data=gapminder_2007, 
                   jitter=True, 
                   marker='o', 
                   alpha=0.5,
                   color='black')
Boxplot with data points with Seaborn

Boxplot with Swarm plot using Seaborn

Adding the data points to the boxplot with stripplot definitely makes the boxplot look better. Another way to visualize data points on a Seaborn boxplot is to add a swarmplot instead of a stripplot. A swarmplot adjusts the positions of the points so that they do not overlap. We will first plot the boxplot with Seaborn and then add a swarmplot to display the data points.

# plot boxplot with seaborn
bplot=sns.boxplot(y='lifeExp', x='continent', 
                 data=gapminder_2007, 
                 width=0.5,
                 palette="colorblind")

# add swarmplot
bplot=sns.swarmplot(y='lifeExp', x='continent',
              data=gapminder_2007, 
              color='black',
              alpha=0.75)
Plot Boxplot and swarmplot in Python with Seaborn

Adjust x-axis and y-axis label font sizes

Now that we have made much better looking boxplots with Seaborn, we can improve other aspects of the plot. One thing to notice is that the font sizes of the x-axis and y-axis labels are small and may not be clearly visible. Here is how to change the font sizes of the x-axis and y-axis labels and add a title to the boxplot created by Seaborn.

bplot.axes.set_title("2007: Life Expectancy Vs Continent",
                    fontsize=16)

bplot.set_xlabel("Continent", 
                fontsize=14)

bplot.set_ylabel("Life Expectancy",
                fontsize=14)

bplot.tick_params(labelsize=10)
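
The pieces above can also be put together on an explicitly created Matplotlib Axes, which is handy when running as a script rather than in a notebook. The following is a sketch under a few assumptions: toy_2007 is a hypothetical stand-in for gapminder_2007 with invented values, the Agg backend is chosen only so the code runs without a display, and the palette option is left out to keep the example short.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for script use
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for gapminder_2007 with made-up values
toy_2007 = pd.DataFrame({
    "continent": ["Africa"] * 5 + ["Europe"] * 5,
    "lifeExp": [48.0, 52.0, 55.0, 58.0, 61.0, 73.0, 75.0, 77.0, 79.0, 81.0],
})

# Create the figure and axes up front and pass the axes to both Seaborn calls
fig, ax = plt.subplots(figsize=(8, 6))
sns.boxplot(x="continent", y="lifeExp", data=toy_2007, width=0.5, ax=ax)
sns.swarmplot(x="continent", y="lifeExp", data=toy_2007,
              color="black", alpha=0.75, ax=ax)

# Title, axis labels, and tick label sizes, as in the snippet above
ax.set_title("2007: Life Expectancy Vs Continent", fontsize=16)
ax.set_xlabel("Continent", fontsize=14)
ax.set_ylabel("Life Expectancy", fontsize=14)
ax.tick_params(labelsize=10)
```

Passing ax to each call makes it explicit which plot the labels belong to, instead of relying on the return value of the last Seaborn call.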

How to Save the Boxplot as jpg file?

Once we have made a boxplot that we like, we can save it as an image file, such as a jpeg file. Here is a way to save the boxplot as a jpg file at a specific resolution. Increasing the dpi option increases the resolution of the image.

# output file name
plot_file_name="boxplot_and_swarmplot_with_seaborn.jpg"

# save as jpeg
bplot.figure.savefig(plot_file_name,
                    format='jpeg',
                    dpi=100)
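
To see what dpi does, recall that the saved image size in pixels is the figure size in inches multiplied by the dpi. The sketch below checks this by saving a placeholder figure to an in-memory buffer and reading the size back with Pillow. The helper saved_pixel_size is a hypothetical name introduced for this example, and the sketch assumes Pillow is installed and that savefig is called without bbox_inches='tight'.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for script use
import matplotlib.pyplot as plt
from PIL import Image  # assumes Pillow is available

# Placeholder figure of 8 x 6 inches; any plot would do here
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot([0, 1], [0, 1])


def saved_pixel_size(figure, dpi):
    """Hypothetical helper: save the figure as JPEG in memory and
    return its (width, height) in pixels."""
    buffer = io.BytesIO()
    figure.savefig(buffer, format="jpeg", dpi=dpi)
    buffer.seek(0)
    return Image.open(buffer).size


low_res = saved_pixel_size(fig, 100)
high_res = saved_pixel_size(fig, 300)
print(low_res, high_res)
```

Tripling the dpi triples both the width and the height in pixels, so the file holds nine times as many pixels.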