Python and R Tips

Learn Data Science with Python and R

How to Make Boxplots in Python with Pandas and Seaborn?

March 14, 2018 by cmdlinetips

The boxplot, introduced by John Tukey in his classic book Exploratory Data Analysis close to 50 years ago, is great for visualizing data distributions from multiple groups. A boxplot captures the summary of the data efficiently with a simple box and whiskers and allows us to compare easily across groups. It summarizes a sample using the 25th, 50th and 75th percentiles, also known as the lower quartile, median and upper quartile. The advantage of comparing quartiles is that they are robust to outliers: a few extreme values barely move them.
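To make the quartile idea concrete, here is a minimal sketch using NumPy. The numbers are made up for illustration and are not taken from the gapminder data. It computes the three quartiles of a small sample, then replaces the largest value with an extreme one to show that the quartiles stay put while the mean shifts.

```python
import numpy as np

# Hypothetical sample (made-up numbers, for illustration only)
values = np.array([50, 55, 60, 65, 70, 75, 80])

# 25th, 50th and 75th percentiles: lower quartile, median, upper quartile
q1, median, q3 = np.percentile(values, [25, 50, 75])
print(q1, median, q3)

# Replace the largest value with an extreme outlier
with_outlier = values.copy()
with_outlier[-1] = 800

q1_o, median_o, q3_o = np.percentile(with_outlier, [25, 50, 75])
print(q1_o, median_o, q3_o)                  # same quartiles as before
print(values.mean(), with_outlier.mean())    # the mean changes a lot
```

In this toy sample the outlier only replaces a value that was already the maximum, so none of the quartiles change. With real data the quartiles can move a little, but typically far less than the mean does.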

If you are interested in learning more about the history and evolution of boxplots, check out Hadley Wickham’s 2011 paper 40 years of Boxplots.

In this post, we will see how to make boxplots using Python’s Pandas and Seaborn. Let us first load the packages we need to plot boxplots in Python.

# import pandas
import pandas as pd
# import matplotlib
import matplotlib.pyplot as plt
# import seaborn
import seaborn as sns
# show plots inline (IPython/Jupyter magic; omit in a plain Python script)
%matplotlib inline

Let us load the gapminder data to make boxplots. We will download the gapminder data directly from Software Carpentry’s GitHub page. Pandas’ read_csv can load the data from a URL as a dataframe.

data_url = 'http://bit.ly/2cLzoxH'
# read data from url as pandas dataframe
gapminder = pd.read_csv(data_url)
print(gapminder.head(3))

Let us filter the gapminder data so that we keep all countries but only the rows for the year 2007. We will use pandas to subset the original dataframe.

gapminder_2007 = gapminder[gapminder['year']==2007]
gapminder_2007.shape
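The filter above relies on a boolean mask. As a rough sketch of what happens, the example below builds a tiny made-up dataframe (the rows are hypothetical; only the column names mirror the gapminder data) and applies the same kind of filter in two equivalent ways.

```python
import pandas as pd

# Tiny made-up table with gapminder-like column names (values are hypothetical)
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "continent": ["Asia", "Asia", "Europe", "Europe"],
    "year": [2002, 2007, 2002, 2007],
    "lifeExp": [60.1, 61.5, 77.3, 78.0],
})

# Boolean mask: True for rows whose year is 2007
mask = toy["year"] == 2007
toy_2007 = toy[mask]

# The same filter written with DataFrame.query
toy_2007_q = toy.query("year == 2007")

# shape is (number of rows, number of columns)
print(toy_2007.shape)
```

Both forms return the same rows; which one to use is mostly a matter of taste.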

We will plot boxplots in four ways: first with Pandas’ boxplot function, and then in three ways with the Seaborn plotting library to get a much improved boxplot.

How to Make Boxplots with Pandas

Python’s pandas has some plotting capabilities of its own. Once you have created a pandas dataframe, you can use its plotting methods directly to plot things quickly. One way to make a boxplot from a pandas dataframe is to use the boxplot function that is part of pandas. Let us say we want a boxplot of life expectancy by continent. We would use pandas like this:

gapminder_2007.boxplot(by='continent', 
                       column=['lifeExp'], 
                       grid=False)

Figure: Boxplot Using Pandas
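If you want the numbers behind the boxes, one option is to compute the quartiles per group with groupby. This is a small sketch on made-up values, not the gapminder data; the same pattern should carry over to gapminder_2007 with its continent and lifeExp columns.

```python
import pandas as pd

# Made-up life expectancies for two groups (illustration only)
toy = pd.DataFrame({
    "continent": ["Asia"] * 4 + ["Europe"] * 4,
    "lifeExp": [60.0, 62.0, 70.0, 72.0, 74.0, 76.0, 78.0, 80.0],
})

# Lower quartile, median and upper quartile for each continent
quartiles = (toy.groupby("continent")["lifeExp"]
                .quantile([0.25, 0.5, 0.75])
                .unstack())
print(quartiles)
```

Each row of the result corresponds to one box: the columns 0.25 and 0.75 are the box edges and 0.5 is the line inside the box.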

The pandas boxplot looks okay for a first-pass analysis. One can clearly see the trend in the data. The key to making a good visualization is to start with something basic and iterate to make it better. Let us try Python’s Seaborn library to make boxplots.

How to Make Boxplot with Seaborn

To make a basic boxplot with Seaborn, we can pass the pandas dataframe to Seaborn’s boxplot function. In addition to the data, we can specify several options to customize the boxplot. Let us choose a color palette for the boxplot. Here, we have chosen the colorblind-friendly palette “colorblind”. Other color palettes available in Seaborn include deep, muted, bright, pastel, and dark. Let us also specify the width of the boxes.

bplot = sns.boxplot(y='lifeExp', x='continent', 
                 data=gapminder_2007, 
                 width=0.5,
                 palette="colorblind")

Figure: Boxplot in Python with Seaborn
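Seaborn’s boxplot also accepts an order argument that controls the left-to-right order of the boxes. As a sketch of one possible use, the example below sorts the groups by median so the trend is easier to read. The data are made up, and the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import pandas as pd
import seaborn as sns

# Made-up life expectancies for three groups (illustration only)
toy = pd.DataFrame({
    "continent": ["Asia"] * 4 + ["Africa"] * 4 + ["Europe"] * 4,
    "lifeExp": [60.0, 62.0, 70.0, 72.0,
                50.0, 52.0, 54.0, 58.0,
                74.0, 76.0, 78.0, 80.0],
})

# Continents sorted by median life expectancy, lowest first
order = (toy.groupby("continent")["lifeExp"]
            .median()
            .sort_values()
            .index
            .tolist())

# Draw the boxes in that order
ax = sns.boxplot(x="continent", y="lifeExp", data=toy,
                 order=order, width=0.5)
```

Here order ends up as Africa, Asia, Europe, so the boxes rise from left to right.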

Boxplot with data points using Seaborn

A boxplot alone is extremely useful for summarizing data within and between groups. However, it is often good practice to overlay the actual data points on the boxplot. Using Seaborn, we can do that in a few ways. One way is to use Seaborn’s stripplot.

We will first make the boxplot with Seaborn as before, with no data points, and then add a layer of data points with stripplot. While plotting with stripplot, we can use its options to make the plot look better. For example, we can specify which marker to use for the data points, and it is better to set jitter=True to spread the data points horizontally.

# make boxplot with Seaborn
bplot=sns.boxplot(y='lifeExp', x='continent', 
                 data=gapminder_2007, 
                 width=0.5,
                 palette="colorblind")

# add stripplot to boxplot with Seaborn
bplot=sns.stripplot(y='lifeExp', x='continent', 
                   data=gapminder_2007, 
                   jitter=True, 
                   marker='o', 
                   alpha=0.5,
                   color='black')

Figure: Boxplot with data points with Seaborn
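One detail to be aware of when overlaying points: the boxplot draws outliers as separate markers, and stripplot then draws the same observations again. A possible tweak, sketched below on made-up data, is to pass showfliers=False so that the boxplot layer omits its own outlier markers. The Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import pandas as pd
import seaborn as sns

# Made-up data; 43.0 sits far below the rest of its group (illustration only)
toy = pd.DataFrame({
    "continent": ["Asia"] * 5 + ["Europe"] * 5,
    "lifeExp": [43.0, 60.0, 62.0, 64.0, 66.0,
                74.0, 76.0, 78.0, 79.0, 80.0],
})

# Boxes without outlier markers
ax = sns.boxplot(x="continent", y="lifeExp", data=toy,
                 width=0.5, showfliers=False)

# Every observation, including the outlier, drawn once by stripplot
ax = sns.stripplot(x="continent", y="lifeExp", data=toy,
                   jitter=True, color="black", alpha=0.5)
```

Whether to hide the boxplot’s outlier markers is a judgment call; keeping them can be fine when the overlaid points are drawn in a different color.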

Boxplot with Swarm plot using Seaborn

Adding the data points to the boxplot with stripplot definitely makes the boxplot look better. Another way to visualize the data points on a Seaborn boxplot is to add a swarmplot instead of a stripplot. We will first make the boxplot with Seaborn and then add a swarmplot to display the data points.

# plot boxplot with seaborn
bplot=sns.boxplot(y='lifeExp', x='continent', 
                 data=gapminder_2007, 
                 width=0.5,
                 palette="colorblind")

# add swarmplot
bplot=sns.swarmplot(y='lifeExp', x='continent',
              data=gapminder_2007, 
              color='black',
              alpha=0.75)

Figure: Boxplot and swarmplot in Python with Seaborn

Adjust x-axis and y-axis label font sizes

Now that we have made much better looking boxplots with Seaborn, we can try to improve other aspects of the plot. One thing to notice is that the font sizes of the x-axis and y-axis labels are small and may not be clearly visible. Here is how to change the font sizes of the x-axis and y-axis labels and also add a title to the boxplot created by Seaborn.

bplot.axes.set_title("2007: Life Expectancy Vs Continent",
                    fontsize=16)

bplot.set_xlabel("Continent", 
                fontsize=14)

bplot.set_ylabel("Life Expectancy",
                fontsize=14)

bplot.tick_params(labelsize=10)
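An alternative, shown here only as a sketch, is to create the matplotlib figure and axes yourself and pass the axes to Seaborn through the ax argument. That also lets you choose the figure size up front. The data and the figure size of 8 by 6 inches are made up for illustration, and the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data (illustration only)
toy = pd.DataFrame({
    "continent": ["Asia"] * 4 + ["Europe"] * 4,
    "lifeExp": [60.0, 62.0, 70.0, 72.0, 74.0, 76.0, 78.0, 80.0],
})

# Create the figure and axes first, then let Seaborn draw on the axes
fig, ax = plt.subplots(figsize=(8, 6))
sns.boxplot(x="continent", y="lifeExp", data=toy, width=0.5, ax=ax)

# Title, axis labels and tick label sizes are set on the axes
ax.set_title("Life Expectancy Vs Continent (toy data)", fontsize=16)
ax.set_xlabel("Continent", fontsize=14)
ax.set_ylabel("Life Expectancy", fontsize=14)
ax.tick_params(labelsize=10)
```

The labels are set the same way as before; the only difference is that ax is created explicitly rather than taken from the return value of the Seaborn call.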

How to Save the Boxplot as jpg file?

Once we have made a boxplot that we like, we can easily save it as an image file, such as a JPEG. Here is a way to save the boxplot as a jpg file at a specific resolution. Increasing the dpi option increases the resolution of the image.

# output file name
plot_file_name="boxplot_and_swarmplot_with_seaborn.jpg"

# save as jpeg
bplot.figure.savefig(plot_file_name,
                    format='jpeg',
                    dpi=100)
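For line art like a boxplot, PNG is often a reasonable alternative to JPEG because it is lossless. The sketch below saves a small matplotlib boxplot as a PNG at 300 dpi and uses bbox_inches="tight" to trim surrounding whitespace. The data, the file name and the temporary output directory are all assumptions made for this example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt

# A small boxplot from made-up numbers (illustration only)
fig, ax = plt.subplots()
ax.boxplot([[50, 52, 54, 58], [74, 76, 78, 80]])
ax.set_xticklabels(["Africa", "Europe"])
ax.set_ylabel("Life Expectancy")

# Hypothetical output location: a fresh temporary directory
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "boxplot_example.png")

# Higher dpi gives more pixels; bbox_inches="tight" trims extra margins
fig.savefig(png_path, dpi=300, bbox_inches="tight")
print(os.path.exists(png_path))
```

The same savefig call works on bplot.figure from the Seaborn examples; only the file name and format change.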
