Catplot Python Seaborn: One Function to Rule All Plots With Categorical Variables

Catplot: Boxplot with jitter Seaborn

I just discovered catplot in Seaborn. Catplot is a relatively new addition to Seaborn that simplifies plotting that involves categorical variables. Seaborn v0.9.0, released in July 2018, renamed the older factorplot() to catplot() to make it more consistent with the terminology used in pandas and in Seaborn.

The catplot() function provides a single framework giving access to several types of plots that show the relationship between a numerical variable and one or more categorical variables, such as boxplots and stripplots. It can currently produce 8 different kinds of plots available in Seaborn, and you choose the one you need with the kind parameter.

The default kind in catplot() is "strip", corresponding to stripplot(). Here is the list of the different types of plots involving categorical variables that you can make with catplot(), along with the matching kind values.

Categorical scatterplots with catplot

  • stripplot() – with kind="strip"
  • swarmplot() – with kind="swarm"

Categorical distribution plots with catplot

  • boxplot() – with kind="box"
  • violinplot() – with kind="violin"
  • boxenplot() – with kind="boxen"

Categorical estimate plots with catplot

  • pointplot() – with kind="point"
  • barplot() – with kind="bar"
  • countplot() – with kind="count"

Let us see examples of using catplot() to make several of these plots, each involving a categorical variable and a numerical variable.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data_url = 'http://bit.ly/2cLzoxH'
gapminder = pd.read_csv(data_url)
gapminder.head(n=3)
	country	year	pop	continent	lifeExp	gdpPercap
0	Afghanistan	1952	8425333.0	Asia	28.801	779.445314
1	Afghanistan	1957	9240934.0	Asia	30.332	820.853030
2	Afghanistan	1962	10267083.0	Asia	31.997	853.100710


How To Make Stripplot with jitter using Seaborn catplot?

By default, catplot() creates a stripplot with jitter, showing the original data points. In the example below, we set the jitter width with jitter=0.25. Note that we did not specify kind to say what type of plot we want.

sns.catplot(x='continent', y='lifeExp', 
            data = gapminder,
            jitter = 0.25)
Seaborn Catplot: default stripplot

How To Make Simple stripplot with Seaborn catplot?

We can also make a stripplot without jitter by switching it off with jitter=False. To adjust the size of the plot, we can use height and aspect: height sets the height of each facet in inches, and aspect * height gives its width.

sns.catplot(x = 'continent', 
            y = 'lifeExp', 
            data = gapminder,
            jitter = False,
            height = 4,
            aspect=1.5)
Catplot: Stripplot with no jitter
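To check the height and aspect arithmetic described above, here is a small sketch that reads the figure size back from the object catplot() returns. The toy data is invented for this example; with a single facet and no legend, height=4 and aspect=1.5 should give a figure about 6 inches wide and 4 inches tall.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the gapminder columns used in this post
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "continent": np.repeat(["Asia", "Europe", "Africa"], 15),
    "lifeExp": rng.normal(loc=60, scale=8, size=45),
})

g = sns.catplot(x="continent", y="lifeExp", data=toy,
                jitter=False, height=4, aspect=1.5)

# catplot returns a FacetGrid; its figure reports the size in inches
# (older seaborn releases expose the same object as g.fig)
width, height = g.figure.get_size_inches()
print(width, height)
```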

How To Make Boxplot with Seaborn catplot?

To make a boxplot with catplot(), we pass the variables of interest and simply set kind='box'.

sns.catplot(x='continent', y = 'lifeExp', 
            data = gapminder,
            kind = 'box',
            height = 4,
            aspect=1.5)
Seaborn Catplot: Boxplot

How To Make Boxplot with Original Data points with Seaborn catplot?

Sometimes it is better to show the original data points in addition to the boxplot. To draw the original data points over the boxplot, we can use our usual trick of adding layers to the plot object.

We first make a boxplot with catplot() using kind='box' and then add a stripplot using the same variables. Note that this is the original stripplot() function, not the one available through catplot().

# make boxplot with Catplot
sns.catplot(x='continent', 
            y='lifeExp',
            kind="box",
            data=gapminder,
            height=4,
            aspect=1.5)
# add data points to boxplot with stripplot
sns.stripplot(x='continent', 
              y='lifeExp',
              data=gapminder,
              alpha=0.3,
              jitter=0.2,
              color='k');
Catplot: Boxplot with jitter Seaborn
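The layering above relies on stripplot() drawing on whichever axes is currently active. A slightly more explicit variant, sketched here on invented toy data, keeps the FacetGrid returned by catplot() and passes its axes to stripplot() through the ax argument, so the points land on the boxplot even if another figure has been created in between.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the gapminder columns used in this post
rng = np.random.default_rng(2)
toy = pd.DataFrame({
    "continent": np.repeat(["Asia", "Europe", "Africa"], 15),
    "lifeExp": rng.normal(loc=60, scale=8, size=45),
})

# Boxplot through catplot; keep the returned FacetGrid
g = sns.catplot(x="continent", y="lifeExp", kind="box",
                data=toy, height=4, aspect=1.5)

# Overlay the raw points on the same axes (g.ax is the grid's only axes here)
sns.stripplot(x="continent", y="lifeExp", data=toy,
              alpha=0.3, jitter=0.2, color="k", ax=g.ax)
```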

How To Make Boxen Plot with Seaborn catplot?

The boxen plot is also known as the letter-value plot. In the paper introducing letter-value plots, Hadley Wickham and coauthors nicely explain the shortcomings of standard boxplots and how letter-value plots address them:

Conventional boxplots are useful displays for conveying rough information about the central 50% and the extent of data. For small-sized data sets (n < 200), detailed estimates of tail behavior beyond the quartiles may not be trustworthy, so the information provided by boxplots is appropriately somewhat vague beyond the quartiles, and the expected number of "outliers" of size n is often less than 10. Larger data sets (n ~ 10,000-100,000) afford more precise estimates of quantiles beyond the quartiles, but conventional boxplots do not show this information about the tails, and, in addition, show large numbers of extreme, but not unexpected, observations.

The letter-value plot addresses both of these shortcomings:

(1) it conveys more detailed information in the tails using letter values, but only to the depths where the letter values are reliable estimates of their corresponding quantiles and (2) “outliers” are labeled as those observations beyond the most extreme letter value.

In Seaborn, we can make a letter-value plot, or boxen plot, using the kind='boxen' argument.

sns.catplot(x='continent',
            y='lifeExp', 
            data=gapminder,
            height=4,
            aspect=1.5,
            kind='boxen')
Catplot Boxen, a new type of boxplot with Seaborn
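To get a feel for what the nested boxes represent, here is a rough sketch that approximates letter values with plain quantiles: the median, then the fourths (1/4 and 3/4), the eighths (1/8 and 7/8), and so on. The helper approx_letter_values is made up for this post, and it is only an approximation; it is not the depth-based definition from the paper nor what Seaborn computes internally.

```python
import numpy as np

def approx_letter_values(values, n_levels=4):
    """Approximate letter values as (tail_area, lower, upper) quantile pairs.

    Hypothetical helper: level 1 is the median, level 2 the fourths,
    level 3 the eighths, each level halving the tail area.
    """
    values = np.asarray(values, dtype=float)
    pairs = []
    for level in range(1, n_levels + 1):
        tail = 0.5 ** level  # 1/2, 1/4, 1/8, ...
        lower = np.quantile(values, tail)
        upper = np.quantile(values, 1 - tail)
        pairs.append((tail, lower, upper))
    return pairs

# Toy data: the integers 0..100
levels = approx_letter_values(np.arange(101), n_levels=3)
for tail, lower, upper in levels:
    print(f"tail area {tail:.3f}: {lower:.1f} to {upper:.1f}")
```

Each extra level reaches further into the tails, which is why the plot is more informative than a boxplot for large data sets and why it stops at the depth where the estimates are still reliable.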

How To Make Violin Plot with Seaborn catplot?

Violin plots are similar to boxplots. In addition to the range of the data that a boxplot shows, a violin plot also shows the density of the data at different values.

We can use kind='violin' to make a violin plot with catplot().

sns.catplot(x='continent',
            y='lifeExp', 
            data=gapminder,
            height=4,
            aspect=1.5,
            kind='violin')
Violin plot with Seaborn catplot
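One thing to keep in mind is that the violin outline comes from a kernel density estimate, which by default extends a little beyond the smallest and largest observations. Extra keyword arguments given to catplot() are passed on to violinplot(), so a sketch like the following, on invented toy data, uses cut=0 to trim each violin to the observed range and inner=None to hide the inner box.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the gapminder columns used in this post
rng = np.random.default_rng(3)
toy = pd.DataFrame({
    "continent": np.repeat(["Asia", "Europe", "Africa"], 30),
    "lifeExp": rng.normal(loc=60, scale=8, size=90),
})

g = sns.catplot(x="continent", y="lifeExp", data=toy,
                kind="violin",
                cut=0,       # stop the density at the observed min and max
                inner=None,  # draw only the violin outline
                height=4, aspect=1.5)
```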

How To Make Point plot with Seaborn catplot?

A point plot in Seaborn is great for quickly visualizing a summary of the data and its uncertainty. A point plot shows the mean estimate and the uncertainty of that estimate with a point and an error bar for each level of the categorical variable. It is a great way to visualize the interaction between different variables.

We can make a point plot with catplot() using kind='point'. Visually, a point plot is easier to read when the categorical variables have a small number of levels. So here we first filter the gapminder data to fewer years and continents.

df = gapminder[gapminder['year'].isin([1952,1982,2007]) ]
df = df[~df['continent'].isin(['Oceania'])]

Then we make a point plot of life expectancy for two continents over three years.

sns.catplot(x="continent",
            y="lifeExp",
            hue="year",
            kind="point", 
            data=df[df.continent.isin(['Asia','Europe'])]);

We can clearly see the central tendency and the uncertainty. We can also see that life expectancy in 2007 is much higher than in the earlier years.

Catplot Pointplot with Seaborn
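To see what the points in such a plot summarize, the group means can be computed directly with pandas. The sketch below uses a tiny invented table (the numbers are not gapminder values) and applies the same kind of isin() filtering as above before taking the mean of each continent and year combination, which is the default estimate a point plot draws.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical toy table; the values are invented for illustration
toy = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe", "Europe", "Asia",
                  "Asia", "Europe", "Europe", "Oceania", "Oceania"],
    "year": [1952, 1952, 1952, 1952, 2007, 2007, 2007, 2007, 1952, 2007],
    "lifeExp": [40.0, 50.0, 60.0, 70.0, 65.0, 75.0, 76.0, 80.0, 69.0, 81.0],
})

# Keep only the years and continents of interest
subset = toy[toy["year"].isin([1952, 2007])
             & toy["continent"].isin(["Asia", "Europe"])]

# Mean per (continent, year) group: the height of each point
means = subset.groupby(["continent", "year"])["lifeExp"].mean()
print(means)

# The corresponding point plot
g = sns.catplot(x="continent", y="lifeExp", hue="year",
                kind="point", data=subset)
```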

How To Make Barplot with Counts using Seaborn catplot?

A count plot simply shows the number of observations in each category of a categorical variable as a bar. We can make a count plot with catplot() using kind='count'.

sns.catplot(x="continent", 
            kind="count", 
            data=gapminder);

We can clearly see that we have fewer observations for Oceania.

Catplot Countplot with Seaborn
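The bar heights in a count plot are just the per-category row counts, which pandas can report with value_counts(). Here is a minimal sketch on an invented toy column that computes the counts and reads the bar heights back from the plot for comparison.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical toy column; the counts are invented for illustration
toy = pd.DataFrame({
    "continent": ["Asia"] * 5 + ["Europe"] * 4 + ["Oceania"] * 2,
})

# Number of rows per category
counts = toy["continent"].value_counts()
print(counts)

# The same information as a count plot
g = sns.catplot(x="continent", kind="count", data=toy)

# Each bar is a rectangle whose height is the count for that category
bar_heights = sorted(patch.get_height() for patch in g.ax.patches)
```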