I just recently stumbled onto Seaborn’s ClusterMap function for making heatmaps. Until now, I relied on Seaborn’s heatmap() function for simple heatmaps and on the pheatmap package in R for anything a bit more complex. Seaborn’s ClusterMap function is great for making both simple heatmaps and hierarchically-clustered heatmaps with dendrograms on the rows and/or columns.
Often when you make a heatmap, you would also like to cluster the rows or columns to see whether any pattern emerges. Seaborn’s ClusterMap solves exactly that problem with its ability to hierarchically cluster rows and columns.
In this post, we will see some simple examples of using Seaborn’s ClusterMap to make simple heatmaps and hierarchically-clustered heatmaps.
Let us first load Pandas, Seaborn and matplotlib.pyplot.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
We will use gapminder data from the Carpentries to make heatmaps using Seaborn’s ClusterMap.
data_url = 'http://bit.ly/2cLzoxH'
# read data from url as pandas dataframe
gapminder = pd.read_csv(data_url)
print(gapminder.head(3))

       country  year         pop continent  lifeExp   gdpPercap
0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314
1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030
2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710
In this post, we will make a heatmap of lifeExp over time for multiple countries. For the sake of simplicity, we will subset the gapminder data in a few ways. First, we will keep only the data from two continents, Africa and Europe.
gapminder_df = gapminder[gapminder.continent.isin(['Africa', 'Europe'])]
gapminder_df.continent.unique()
gapminder_df.head()
And then we will select just four variables from the gapminder data.
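# keep four columns; .copy() makes df an independent DataFrame,
# so adding a column to it later does not trigger SettingWithCopyWarning
df = gapminder_df[['country','continent', 'year','lifeExp']].copy()
df.head(n=3)

    country continent  year  lifeExp
12  Albania    Europe  1952    55.23
13  Albania    Europe  1957    59.28
14  Albania    Europe  1962    64.82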
Heatmap with Dendrograms with Data in Wide form
Let us first consider a case where the data is in wide form, and use Seaborn’s ClusterMap to make the default heatmap.
Our data is in long (tidy) form, so we first use the Pandas pivot_table() function to reshape it into wide form, with one row per country and one column per year.
# pandas pivot with multiple variables
heatmap_data = pd.pivot_table(df, values='lifeExp', index=['continent','country'], columns='year')
heatmap_data.head()
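The sketch below shows what this reshaping does on a small table. It uses a made-up four-row DataFrame with the same columns as df, and the values are invented for illustration. pivot_table() builds a (continent, country) row index and puts one column per year. If a combination occurs more than once, it averages the duplicates by default (aggfunc='mean').

```python
import pandas as pd

# Hypothetical toy data in the same long/tidy layout as df; values are made up
long_df = pd.DataFrame({
    "country": ["Albania", "Albania", "Algeria", "Algeria"],
    "continent": ["Europe", "Europe", "Africa", "Africa"],
    "year": [1952, 1957, 1952, 1957],
    "lifeExp": [55.0, 59.0, 43.0, 46.0],
})

# One row per (continent, country), one column per year,
# cells filled from lifeExp (duplicates would be averaged)
wide = pd.pivot_table(long_df, values="lifeExp",
                      index=["continent", "country"], columns="year")
print(wide)
```

The resulting row index is a two-level MultiIndex sorted by continent and then country. That is why heatmap_data labels each row with both names.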
# make heatmap with Seaborn ClusterMap
sns.clustermap(heatmap_data)
# savefig() has no figsize argument, so the figure keeps clustermap's default size
plt.savefig('heatmap_with_Seaborn_clustermap_python.jpg', dpi=150)
By default, we get a hierarchically clustered heatmap: Seaborn’s ClusterMap clusters both the rows and the columns and adds dendrograms to show the clustering.
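The sketch below shows how a dendrogram decides the row order. It calls SciPy directly with clustermap's default settings, method='average' and metric='euclidean', on a hypothetical six-row matrix. It illustrates the idea and does not reproduce seaborn's internal code. On a real ClusterMap, the row order that was used is available as g.dendrogram_row.reordered_ind, and the column order as g.dendrogram_col.reordered_ind.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical matrix: rows 0, 2, 4 are low, rows 1, 3, 5 are high
X = np.array([
    [1.0, 1.2, 0.9],
    [9.8, 10.1, 10.0],
    [1.1, 0.8, 1.0],
    [10.2, 9.9, 10.3],
    [0.9, 1.0, 1.1],
    [10.0, 10.0, 9.7],
])

# Average-linkage clustering on Euclidean distances between rows
Z = linkage(X, method="average", metric="euclidean")

# The dendrogram's leaf order is the order in which rows would be drawn
row_order = dendrogram(Z, no_plot=True)["leaves"]
print(row_order)
```

Similar rows end up next to each other, so the low rows and the high rows each form a contiguous block. Columns are clustered the same way on the transposed matrix.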
In the above example, we saved the heatmap with matplotlib.pyplot’s savefig(), outside the ClusterMap function. However, that seemed to cut off the edges of the heatmap.
Seaborn’s recommended approach to saving the heatmap is to specify the figure size as an argument to ClusterMap, as shown below.
sns.clustermap(heatmap_data, figsize=(18,12))
plt.savefig('clustered_heatmap_with_dendrograms_Seaborn_clustermap_python.jpg', dpi=150)
Setting figsize inside the ClusterMap function helps save heatmaps without clipping the dendrograms.
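A related option is the object that clustermap() returns. Seaborn's clustermap documentation notes that this ClusterGrid has its own savefig() method, meant for saving the figure without clipping the dendrograms, and that method passes bbox_inches="tight" by default. The sketch below uses a hypothetical random table in place of heatmap_data and writes to a temporary file. The `.figure` attribute assumes a reasonably recent seaborn.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for heatmap_data: 20 rows x 6 year columns
rng = np.random.default_rng(0)
wide = pd.DataFrame(rng.normal(60, 10, size=(20, 6)),
                    index=[f"country_{i}" for i in range(20)],
                    columns=range(1952, 1982, 5))

# Set the size when the figure is created ...
g = sns.clustermap(wide, figsize=(8, 12))

# ... and save through the returned ClusterGrid, whose savefig()
# uses bbox_inches="tight" unless told otherwise
out_path = os.path.join(tempfile.mkdtemp(), "clustermap_demo.png")
g.savefig(out_path, dpi=150)
```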
Heatmap with Dendrograms with Data in Long/Tidy form
One of the underused features of Seaborn’s ClusterMap function is that it can make heatmaps from data in tidy (long) form. This means one does not have to reshape long-form data to wide form before making the heatmap (as we did in the previous example).
Seaborn’s ClusterMap handles the reshaping through the pivot_kws argument. In the example below, we use the gapminder data in long form and pass pivot_kws a dictionary with the information needed for reshaping.
sns.clustermap(df, figsize=(14,12), pivot_kws={'index': 'country', 'columns': 'year', 'values': 'lifeExp'})
In the pivot_kws dictionary, we specify which variables to use as the index, columns and values, just as we did for the pivot_table() function. And we get the clustered heatmap.
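One way to convince yourself that pivot_kws only reshapes the data is to compare the plotted matrix with a hand-made DataFrame.pivot() on the same keys. The sketch below does that on hypothetical long data. It relies on the ClusterGrid attribute data2d, which in the seaborn versions I know holds the matrix that was drawn. That attribute is an assumption about seaborn internals rather than documented API. Both axes are sorted before comparing because clustering may have reordered them.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import pandas as pd
import seaborn as sns

# Hypothetical long/tidy data: 4 countries x 3 years, made-up lifeExp values
long_df = pd.DataFrame({
    "country": ["A", "B", "C", "D"] * 3,
    "year": [1952] * 4 + [1957] * 4 + [1962] * 4,
    "lifeExp": [40.0, 55.0, 41.0, 56.0,
                42.0, 58.0, 44.0, 60.0,
                45.0, 61.0, 47.0, 63.0],
})

pivot_kws = {"index": "country", "columns": "year", "values": "lifeExp"}

# The same reshaping done by hand
wide = long_df.pivot(**pivot_kws)

# clustermap reshapes internally when given pivot_kws
g = sns.clustermap(long_df, pivot_kws=pivot_kws, figsize=(6, 6))

# data2d (assumed internal attribute) may be in clustered order,
# so sort rows and columns before comparing with the hand pivot
plotted = g.data2d.sort_index().sort_index(axis=1)
print(plotted.equals(wide))
```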
Note that here we passed a single variable, country, as the index. To make a heatmap with two variables as the index, one option is to concatenate them into a new variable before making the heatmap. Here we create a new variable, continent_country, by concatenating two existing variables.
# concatenate two variables to create a new variable
df['continent_country'] = df['continent'].str.cat(df['country'], sep="_")
# make heatmap with long/tidy form data with pivot_kws
sns.clustermap(df, pivot_kws={'index': 'continent_country', 'columns': 'year', 'values': 'lifeExp'})
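The sketch below shows the labels this concatenation produces. It runs the same str.cat() step on a hypothetical four-row table and then reshapes with DataFrame.pivot(). As far as I can tell, pivot() is what clustermap applies pivot_kws to. Using assign() instead of assigning a column in place is my own substitution: it returns a new DataFrame, so no SettingWithCopyWarning can arise.

```python
import pandas as pd

# Hypothetical toy rows with the same columns as df; values are made up
toy = pd.DataFrame({
    "country": ["Albania", "Algeria", "Albania", "Algeria"],
    "continent": ["Europe", "Africa", "Europe", "Africa"],
    "year": [1952, 1952, 1957, 1957],
    "lifeExp": [55.0, 43.0, 59.0, 46.0],
})

# Build "continent_country" labels such as "Europe_Albania"
toy = toy.assign(
    continent_country=toy["continent"].str.cat(toy["country"], sep="_"))

# The combined label now works as a single pivot index
wide = toy.pivot(index="continent_country", columns="year", values="lifeExp")
print(wide)
```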
Now we have made a heatmap from tidy data using Seaborn’s ClusterMap.
Heatmap without Clustering Columns
By default, Seaborn’s ClusterMap clusters both rows and columns and shows the dendrograms. We can make a heatmap without clustering the columns using the argument col_cluster=False.
sns.clustermap(heatmap_data, col_cluster=False, figsize=(8,12))
plt.savefig('heatmap_without_clustering_columns_Seaborn_clustermap_python.jpg', dpi=150)
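The sketch below checks the effect of col_cluster=False on a hypothetical four-row table. The rows follow the documented g.dendrogram_row.reordered_ind, while the year columns stay in their original order. The column check reads the data2d attribute, which is an assumption about seaborn internals.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import pandas as pd
import seaborn as sns

# Hypothetical wide table: rows a/c are low, rows b/d are high
wide = pd.DataFrame(
    [[40.0, 45.0, 50.0],
     [70.0, 72.0, 75.0],
     [41.0, 44.0, 52.0],
     [69.0, 73.0, 74.0]],
    index=["a", "b", "c", "d"], columns=[1952, 1957, 1962])

g = sns.clustermap(wide, col_cluster=False, figsize=(6, 6))

# Rows are reordered by the row dendrogram ...
row_labels = list(wide.index[g.dendrogram_row.reordered_ind])
print(row_labels)

# ... while the year columns keep their original order
print(list(g.data2d.columns))
```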
Heatmap without Clustering Rows
Similarly, we can make a heatmap without clustering the rows using the argument row_cluster=False.
sns.clustermap(heatmap_data, row_cluster=False, figsize=(8,12))
plt.savefig('heatmap_without_clustering_rows_Seaborn_clustermap_python.jpg', dpi=150)
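This sketch shows the mirror case on a hypothetical table whose 1952/1957 columns resemble each other, as do its 1962/1967 columns. With row_cluster=False, the columns follow g.dendrogram_col.reordered_ind, while the rows keep their order. As before, reading data2d is an assumption about seaborn internals.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import pandas as pd
import seaborn as sns

# Hypothetical wide table with two groups of similar columns
wide = pd.DataFrame(
    [[40.0, 41.0, 70.0, 72.0],
     [45.0, 44.0, 75.0, 74.0],
     [50.0, 52.0, 80.0, 79.0]],
    index=["a", "b", "c"], columns=[1952, 1957, 1962, 1967])

g = sns.clustermap(wide, row_cluster=False, figsize=(6, 6))

# Columns follow the column dendrogram ...
col_labels = [int(c) for c in wide.columns[g.dendrogram_col.reordered_ind]]
print(col_labels)

# ... while the rows keep their original order
print(list(g.data2d.index))
```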
Simple Heatmap without Clustering Columns and Rows
We can make a simple heatmap, clustering neither columns nor rows, by setting both row_cluster=False and col_cluster=False.
sns.clustermap(heatmap_data, row_cluster=False, col_cluster=False, figsize=(8,12))
plt.savefig('simple_heatmap_without_clustering_Seaborn_clustermap_python.jpg', dpi=150)
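The sketch below checks that, with both flags off, the matrix is drawn exactly in the DataFrame's own row and column order. It uses a hypothetical three-row table and compares against the data2d attribute, which is again an assumption about seaborn internals. For this kind of plot, sns.heatmap(wide) gives a similar picture without the empty dendrogram panels.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import pandas as pd
import seaborn as sns

# Hypothetical wide table; values are made up
wide = pd.DataFrame(
    [[40.0, 45.0, 50.0],
     [70.0, 72.0, 75.0],
     [41.0, 44.0, 52.0]],
    index=["a", "b", "c"], columns=[1952, 1957, 1962])

# No linkage is computed in either direction, so nothing is reordered
g = sns.clustermap(wide, row_cluster=False, col_cluster=False, figsize=(6, 6))
print(g.data2d.equals(wide))
```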
Change Color Palette in Seaborn ClusterMap
To change the default color palette, we use the cmap argument and specify any color palette compatible with Seaborn.
sns.clustermap(df,
               pivot_kws={'index': 'country', 'columns': 'year', 'values': 'lifeExp'},
               figsize=(10,12),
               col_cluster=False,
               cmap="coolwarm")
plt.savefig('heatmap_change_color_palette_Seaborn_clustermap_python.jpg', dpi=150)
In this heatmap, we used the “coolwarm” colormap via the cmap argument.
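clustermap() forwards extra keyword arguments such as cmap to the underlying heatmap() call. That call accepts either a Matplotlib colormap name or a colormap object. The sketch below shows both on a hypothetical random table. The second form uses seaborn's "vlag" palette through sns.color_palette(..., as_cmap=True), which needs seaborn 0.11 or later. Reading the colormap back from the heatmap's first collection assumes that this collection is the QuadMesh drawn by heatmap().

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical wide table of random values
rng = np.random.default_rng(1)
wide = pd.DataFrame(rng.normal(60, 10, size=(8, 4)),
                    index=[f"c{i}" for i in range(8)],
                    columns=[1952, 1957, 1962, 1967])

# 1) A Matplotlib colormap name, passed through to heatmap()
g1 = sns.clustermap(wide, cmap="coolwarm", figsize=(6, 6))

# 2) A seaborn palette converted into a colormap object
vlag = sns.color_palette("vlag", as_cmap=True)
g2 = sns.clustermap(wide, cmap=vlag, figsize=(6, 6))

# Colormap actually used by the first heatmap
print(g1.ax_heatmap.collections[0].get_cmap().name)
```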
Seaborn’s ClusterMap also offers a number of options for computing the distance or similarity matrix used for clustering, such as the metric and method arguments, as well as options for standardizing the data (z_score and standard_scale). Check out Seaborn’s clustermap help page to find more options to fine-tune your heatmap.
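The sketch below combines several of these options on a hypothetical random table. It uses metric="correlation" (SciPy's 1 minus Pearson correlation between rows), method="complete" instead of the default average linkage, and z_score=0 to standardize each row. The check on data2d assumes that this internal attribute holds the standardized matrix.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical wide table of random values
rng = np.random.default_rng(2)
wide = pd.DataFrame(rng.normal(60, 10, size=(10, 5)),
                    index=[f"c{i}" for i in range(10)],
                    columns=[1952, 1957, 1962, 1967, 1972])

g = sns.clustermap(
    wide,
    metric="correlation",  # distance = 1 - Pearson r (scipy pdist metric)
    method="complete",     # complete linkage instead of the default "average"
    z_score=0,             # standardize each row before clustering and plotting
    figsize=(6, 6),
)

# After z_score=0 each plotted row has mean ~0 and standard deviation ~1
print(g.data2d.mean(axis=1).abs().max())
```

Standardizing rows lets the heatmap show how each country's values change relative to its own average, instead of being dominated by differences in overall level.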