Heatmaps with Seaborn's ClusterMap

I recently stumbled onto Seaborn’s ClusterMap function for making heatmaps. Until now, I relied on Seaborn’s heatmap() function for simple heatmaps and on the pheatmap package in R for anything a bit more complex. Seaborn’s Clustermap function is great for making simple heatmaps as well as hierarchically-clustered heatmaps with dendrograms on rows, columns, or both.

Most often, when you make a heatmap, you would also like to cluster it row-wise or column-wise to see if any pattern emerges. Seaborn’s Clustermap does exactly that with hierarchical clustering.

In this post, we will see some simple examples of using Seaborn’s ClusterMap to make simple heatmaps and hierarchically-clustered heatmaps.

Let us first load Pandas, Seaborn and matplotlib.pyplot.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

We will use gapminder data from the Carpentries to make heatmaps using Seaborn’s ClusterMap.

data_url = 'http://bit.ly/2cLzoxH'
# read data from url as pandas dataframe
gapminder = pd.read_csv(data_url)
print(gapminder.head(3))

       country  year         pop continent  lifeExp   gdpPercap
0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314
1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030
2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710

In this post, we will make a heatmap of lifeExp over time for multiple countries. For the sake of simplicity, we will subset the gapminder data in a few ways. First, we will consider the data from two continents: Africa and Europe.

gapminder_df = gapminder[gapminder.continent.isin(['Africa', 'Europe'])]
gapminder_df.continent.unique()
gapminder_df.head()

And then we will select just four variables from the gapminder data.

df = gapminder_df[['country','continent', 'year','lifeExp']]
df.head(n=3)

    country continent  year  lifeExp
12  Albania    Europe  1952    55.23
13  Albania    Europe  1957    59.28
14  Albania    Europe  1962    64.82

Heatmap with Dendrograms with Data in Wide form

Let us first consider a case where you have data in wide form and use Seaborn’s cluster map to make the default heatmap.

Our data is in long, tidy form, so we can use the Pandas pivot_table() function to reshape it from long form to wide form.

# pandas pivot with multiple variables
heatmap_data = pd.pivot_table(df, values='lifeExp', 
                              index=['continent','country'], 
                              columns='year')
heatmap_data.head()
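To make the reshaping step concrete, here is a minimal sketch on a tiny toy table. The values and the names long_df and wide_df are made up for illustration and are not gapminder data; only the pivot_table() call mirrors the one above.

```python
import pandas as pd

# Toy long-form data (made-up values, NOT the gapminder data).
long_df = pd.DataFrame({
    "continent": ["X", "X", "X", "X", "Y", "Y"],
    "country":   ["a", "a", "b", "b", "c", "c"],
    "year":      [2000, 2005, 2000, 2005, 2000, 2005],
    "lifeExp":   [50.0, 52.0, 60.0, 61.5, 70.0, 71.0],
})

# Reshape: one row per (continent, country) pair, one column per year.
wide_df = pd.pivot_table(long_df, values="lifeExp",
                         index=["continent", "country"],
                         columns="year")
print(wide_df)
```

Each (continent, country) pair becomes one row and each year becomes one column, which is the layout the heatmap needs.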

# make heatmap with Seaborn ClusterMap
sns.clustermap(heatmap_data)
# savefig() has no figsize argument, so only dpi is set here
plt.savefig('heatmap_with_Seaborn_clustermap_python.jpg',
            dpi=150)

By default, we get a hierarchically clustered heatmap. Seaborn’s ClusterMap clusters both columns and rows and adds dendrograms to show the clustering.

Clustered Heatmap with Seaborn’s Clustermap
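To see what the clustering does to the row order, one option is to inspect the object that clustermap() returns. The sketch below uses synthetic data (two groups of rows at clearly different levels, deliberately interleaved) and assumes the returned grid exposes the clustered order as dendrogram_row.reordered_ind. The names toy, row_order and clustered_labels are just for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (not gapminder): three "low" rows and three "high" rows.
rng = np.random.default_rng(0)
low = rng.normal(40, 1, size=(3, 4))
high = rng.normal(75, 1, size=(3, 4))
toy = pd.DataFrame(np.vstack([low, high]),
                   index=["low1", "low2", "low3", "high1", "high2", "high3"],
                   columns=[1952, 1957, 1962, 1967])
# Interleave the rows so the input order hides the two groups.
toy = toy.iloc[[0, 3, 1, 4, 2, 5]]

g = sns.clustermap(toy)

# Positions of the input rows, in the order they appear in the heatmap.
row_order = g.dendrogram_row.reordered_ind
clustered_labels = [toy.index[i] for i in row_order]
print(clustered_labels)
```

In the printed order the "low" rows and the "high" rows should sit next to each other, even though they were interleaved in the input.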

In the above example, we saved the heatmap with matplotlib.pyplot’s savefig() without setting the figure size in ClusterMap itself. That seemed to cut off the edges of the heatmap.

A more reliable approach is to specify the figure size as an argument to Clustermap, as shown below.

sns.clustermap(heatmap_data, figsize=(18,12))
plt.savefig('clustered_heatmap_with_dendrograms_Seaborn_clustermap_python.jpg',dpi=150)

Having figsize inside the Clustermap function helps save heatmaps without clipping the dendrograms.

Clustered heatmap with dendrograms: Seaborn clustermap
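As an alternative to plt.savefig(), the grid object returned by clustermap() has its own savefig() method. The following is a sketch on synthetic data rather than the gapminder heatmap; the output path and the names toy, out_path and fig are placeholders chosen for this example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (not gapminder), just to have something to plot.
rng = np.random.default_rng(1)
toy = pd.DataFrame(rng.normal(60, 10, size=(6, 4)),
                   index=list("abcdef"),
                   columns=[1952, 1957, 1962, 1967])

# Set the figure size when the figure is created ...
g = sns.clustermap(toy, figsize=(6, 8))

# ... and save through the returned grid object.
out_path = os.path.join(tempfile.mkdtemp(), "toy_clustermap.png")
g.savefig(out_path, dpi=150)

# The figure behind the grid has the size we asked for.
fig = g.ax_heatmap.figure
print(fig.get_size_inches())
```

Keeping a reference to the grid (g) also makes it clear which figure is being saved when several plots are open.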

Heatmap with Dendrograms with Data in Long/Tidy form

One of the underused features of Seaborn’s ClusterMap function is that it can handle data in tidy (long) form and make heatmaps. This means one does not have to reshape the long form data to wide form before making the heatmap (like we did in the previous example).

Seaborn’s ClusterMap can handle the reshaping through pivot_kws argument. In this example below, we use the gapminder data in long form and use pivot_kws to specify a dictionary with information needed for reshaping.

sns.clustermap(df, figsize=(14,12),
               pivot_kws={'index': 'country',
                          'columns': 'year',
                          'values': 'lifeExp'})

Within the pivot_kws dictionary, we need to specify which variables to use as the index, columns and values, just as we did for the pivot_table() function. And we get the clustered heatmap.

Clustered Heatmap with data in long form: Seaborn ClusterMap
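One way to check what pivot_kws does is to compare the matrix stored on the returned grid with a manual pivot. The sketch below uses synthetic long-form data and assumes the grid keeps its plotted matrix in the data2d attribute, with rows and columns in clustered order. The names long_df, wide and g are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Synthetic long-form data (not gapminder): 4 countries x 3 years.
records = []
for i, country in enumerate(["a", "b", "c", "d"]):
    for j, year in enumerate([2000, 2005, 2010]):
        records.append({"country": country, "year": year,
                        "lifeExp": 50.0 + 7.0 * i + 1.5 * j * (i + 1)})
long_df = pd.DataFrame(records)

pivot_kws = {"index": "country", "columns": "year", "values": "lifeExp"}

# Manual reshape with the same keywords ...
wide = long_df.pivot(**pivot_kws)

# ... versus letting clustermap() reshape the long-form data itself.
g = sns.clustermap(long_df, pivot_kws=pivot_kws)

print(wide.shape, g.data2d.shape)
```

Both matrices have the same shape and labels; only the order of rows and columns may differ because of the clustering.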

Note that when we use pivot_kws here, we specify a single variable as the index, and we used country. To make a heatmap with two variables as the index, we can concatenate the two variables before making the heatmap. Here we create a new variable by concatenating two existing variables.

# concatenate two variables to create a new variable
# (work on a copy to avoid modifying a slice of gapminder_df)
df = df.copy()
df['continent_country'] = df['continent'].str.cat(df['country'], sep="_")
# make heatmap with long/tidy form data using pivot_kws
sns.clustermap(df,
               pivot_kws={'index': 'continent_country',
                          'columns': 'year',
                          'values': 'lifeExp'})

Now we have made a heatmap from tidy data using Seaborn’s ClusterMap.

Heatmap without Clustering Columns

By default, Seaborn’s Clustermap clusters both rows and columns and shows the dendrograms. We can make a heatmap without clustering the columns using the argument col_cluster=False.

sns.clustermap(heatmap_data, col_cluster=False, figsize=(8,12))
plt.savefig('heatmap_without_clustering_columns_Seaborn_clustermap_python.jpg', dpi=150)

Heatmap Without Clustering Columns Seaborn ClusterMap

Heatmap without Clustering Rows

Similarly, we can also make a heatmap without clustering rows using the argument row_cluster=False.

sns.clustermap(heatmap_data, row_cluster=False, figsize=(8,12))
plt.savefig('heatmap_without_clustering_rows_Seaborn_clustermap_python.jpg', dpi=150)

Simple Heatmap without Clustering Columns and Rows

We can make simple heatmaps without clustering columns or rows by using both row_cluster=False and col_cluster=False.

sns.clustermap(heatmap_data, row_cluster=False, col_cluster=False, figsize=(8,12))
plt.savefig('simple_heatmap_without_clustering_Seaborn_clustermap_python.jpg', dpi=150)
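The effect of the two switches can be checked on a small synthetic matrix. This sketch assumes, as before, that the returned grid stores the plotted matrix in data2d. The data and the variable names are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (not gapminder).
rng = np.random.default_rng(3)
toy = pd.DataFrame(rng.normal(60, 10, size=(6, 4)),
                   index=["r1", "r2", "r3", "r4", "r5", "r6"],
                   columns=[1952, 1957, 1962, 1967])

g_no_cols = sns.clustermap(toy, col_cluster=False)
g_no_rows = sns.clustermap(toy, row_cluster=False)
g_plain = sns.clustermap(toy, row_cluster=False, col_cluster=False)

# Columns keep their input order when column clustering is off.
cols_kept = list(g_no_cols.data2d.columns) == list(toy.columns)
# Rows keep their input order when row clustering is off.
rows_kept = list(g_no_rows.data2d.index) == list(toy.index)
# With both switched off, the plotted matrix is the input as-is.
plain_unchanged = g_plain.data2d.equals(toy)

print(cols_kept, rows_kept, plain_unchanged)
plt.close("all")  # free the three figures
```

For time series like lifeExp by year, keeping col_cluster=False is handy because the years stay in chronological order.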

Change Color Palette in Seaborn ClusterMap

To change the default color palette, we use the cmap argument and specify any color palette compatible with Seaborn.

sns.clustermap(df,
               pivot_kws={'index': 'country',
                          'columns': 'year',
                          'values': 'lifeExp'},
               figsize=(10,12),
               col_cluster=False,
               cmap="coolwarm")
plt.savefig('heatmap_change_color_palette_Seaborn_clustermap_python.jpg', dpi=150)

In this heatmap, we have used the “coolwarm” color map using the cmap argument.
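Any matplotlib colormap name should work the same way. As a small sketch on synthetic data, the following uses "viridis" and then reads the colormap back from the heatmap axes to confirm it was applied. The names toy, mesh and cmap_name are illustrative, and reading the colormap from the first collection on ax_heatmap is an assumption about how the heatmap is drawn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (not gapminder).
rng = np.random.default_rng(2)
toy = pd.DataFrame(rng.normal(60, 10, size=(5, 4)),
                   index=list("abcde"),
                   columns=[1952, 1957, 1962, 1967])

# Same call pattern as above, with a different colormap name.
g = sns.clustermap(toy, col_cluster=False, cmap="viridis")

# The heatmap cells are drawn as one mesh on the heatmap axes.
mesh = g.ax_heatmap.collections[0]
cmap_name = mesh.get_cmap().name
print(cmap_name)
```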

Seaborn’s Clustermap also offers a number of options to compute the distance or similarity matrix from the data to make the heatmap. Check out Seaborn’s Clustermap help page to find more options to fine-tune your heatmap.
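As one example of those options, the metric and method arguments control how distances between rows are computed and how clusters are merged. The sketch below uses a hand-made matrix (not gapminder) in which two rows rise and two rows fall, at two very different levels. With Euclidean distance the rows should group by level, and with correlation distance by trend. The helper row_labels() is hypothetical and written for this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hand-made data: "up" rows rise, "down" rows fall; suffix 1 = low level, 2 = high level.
toy = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0],
     [54.0, 53.0, 52.0, 50.5],
     [4.0, 3.0, 2.0, 1.0],
     [51.0, 52.0, 53.0, 54.5]],
    index=["up1", "down2", "down1", "up2"],
    columns=[1952, 1957, 1962, 1967],
)

def row_labels(grid, data):
    """Row labels in the order they appear in the clustered heatmap."""
    return [data.index[i] for i in grid.dendrogram_row.reordered_ind]

# Euclidean distance: rows at a similar level are close.
g_euclid = sns.clustermap(toy, col_cluster=False,
                          metric="euclidean", method="average")
# Correlation distance: rows with a similar trend are close.
g_corr = sns.clustermap(toy, col_cluster=False,
                        metric="correlation", method="average")

by_level = row_labels(g_euclid, toy)
by_trend = row_labels(g_corr, toy)
print(by_level)
print(by_trend)
plt.close("all")
```

Which metric is appropriate depends on the question: level differences (as with lifeExp across countries) suit Euclidean distance, while comparing shapes of trajectories suits correlation.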