How To Make Lower Triangle Heatmap with Correlation Matrix in Python?

Correlation Heatmap: Lower Triangle with Seaborn

Visualizing data as a heatmap is a great exploration technique for high-dimensional data. Sometimes you want to visualize the correlations between variables as a heatmap, rather than the raw data, to understand how the variables in your data relate to each other. In this post we will see examples of visualizing a correlation matrix as a heatmap in multiple ways. Since a correlation matrix is symmetric, visualizing the full matrix as a heatmap is redundant. Visualizing just the lower or upper triangle of the correlation matrix is more useful.

We will use really cool NumPy functions, Pandas and Seaborn to make lower triangular heatmaps in Python. Let us load the packages needed.

import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

To make the lower triangular correlation heatmaps, we will use the breast cancer dataset available from scikit-learn’s datasets.

# import breast cancer data from scikit-learn
from sklearn.datasets import load_breast_cancer
# load breast cancer dataset
data = load_breast_cancer()

Let us save the breast cancer data from scikit-learn as a Pandas dataframe. Here we use the data and the feature names as the columns of the dataframe. The breast cancer dataset has 30 features about breast cancer.

df = pd.DataFrame(data.data, columns=data.feature_names)
df.iloc[0:5,0:3]

	mean radius	mean texture	mean perimeter
0	17.99	10.38	122.80
1	20.57	17.77	132.90
2	19.69	21.25	130.00
3	11.42	20.38	77.58
4	20.29	14.34	135.10

Let us find how these 30 features are correlated among themselves. We can use Pandas’ corr() function on the whole dataframe to compute the correlation matrix. Here we compute Pearson correlation coefficients between the features by specifying method='pearson'.

This gives us a correlation matrix of size 30×30, and we can see that it is symmetric.

# compute correlation matrix using pandas corr() function
corr_df =  df.corr(method='pearson') 
# display the first few rows/columns of the correlation matrix using the iloc indexer in Pandas
corr_df.iloc[0:5,0:3]

	mean radius	mean texture	mean perimeter
mean radius	1.000000	0.323782	0.997855
mean texture	0.323782	1.000000	0.329533
mean perimeter	0.997855	0.329533	1.000000
mean area	0.987357	0.321086	0.986507
mean smoothness	0.170581	-0.023389	0.207278
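The symmetry can also be checked programmatically rather than by eye. The following is a minimal sketch on stand-in random data (the `toy` dataframe and its column names are hypothetical, not the breast cancer data). It confirms that a Pearson correlation matrix equals its transpose and counts the distinct variable pairs.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 100 rows, 5 numeric columns
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(100, 5)), columns=list("abcde"))
toy_corr = toy.corr(method="pearson")

# A correlation matrix equals its own transpose
is_symmetric = np.allclose(toy_corr.values, toy_corr.values.T)
# Every variable correlates perfectly with itself
diag_is_one = np.allclose(np.diag(toy_corr.values), 1.0)

# Number of distinct off-diagonal pairs among n variables
n = toy_corr.shape[0]
n_pairs = n * (n - 1) // 2
print(is_symmetric, diag_is_one, n_pairs)
```

By the same formula, the 30 features used in this post give 435 distinct pairs, so fewer than half of the 900 cells in the full matrix carry unique information.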

We can make simple heatmaps with Seaborn’s heatmap() function on the whole correlation matrix.

hmap=sns.heatmap(corr_df)
hmap.figure.savefig("Correlation_Heatmap_with_Seaborn.png",
                    format='png',
                    dpi=150)

We can see that the heatmap of correlation matrix has redundant information as the correlation matrix is symmetric.

Correlation Heatmap with Seaborn

How to Make Lower Triangle Heatmap with Seaborn?

It would be better to visualize either the upper triangular or the lower triangular part of the correlation matrix as a heatmap.

To do that we just need to extract the upper or lower triangular matrix of the correlation matrix, and NumPy has really cool functions to do that. NumPy’s numpy.tril() function takes a 2d NumPy array as input and gives the lower triangle of the array. Similarly, the numpy.triu() function takes a 2d NumPy array as input and gives the upper triangle of the array. Both functions have the option to return the diagonal elements as part of the triangular matrix.
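As a quick illustration of that diagonal option, here is a small sketch on a made-up 3×3 array (the array `a` is only for demonstration). Both functions take an offset argument `k`. The default `k=0` keeps the diagonal, while `k=-1` for np.tril() and `k=1` for np.triu() drop it. Elements outside the selected triangle are set to zero.

```python
import numpy as np

# Small demonstration array: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
a = np.arange(1, 10).reshape(3, 3)

lower_with_diag = np.tril(a)        # k=0 (default) keeps the diagonal
lower_no_diag = np.tril(a, k=-1)    # start one step below the diagonal
upper_with_diag = np.triu(a)        # k=0 (default) keeps the diagonal
upper_no_diag = np.triu(a, k=1)     # start one step above the diagonal

print(lower_with_diag)
print(upper_no_diag)
```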

NumPy’s tril() function to extract Lower Triangle Matrix

Let us extract lower triangular matrix of the correlation matrix with diagonal elements using np.tril() function and visualize lower triangular heatmap with Seaborn. We will use np.tril() function with np.ones() function to create a boolean matrix with same size as our correlation matrix. The boolean matrix will have True values on lower triangular matrix and False on upper triangular matrix.

np.tril(np.ones(corr_df.shape)).astype(bool)[0:5,0:5]

array([[ True, False, False, False, False],
       [ True,  True, False, False, False],
       [ True,  True,  True, False, False],
       [ True,  True,  True,  True, False],
       [ True,  True,  True,  True,  True]])

We can use the boolean matrix, with True on the lower triangle, to extract the lower triangular correlation matrix using Pandas’ where() function. The where() function returns a dataframe of the original size, but with NaN values in the upper triangle of the correlation matrix.

df_lt = corr_df.where(np.tril(np.ones(corr_df.shape)).astype(bool))

We can see that the correlations along the diagonal are all ones, as we kept the diagonal elements. The upper triangle has NaN values and the lower triangle has the correlation values.

df_lt.iloc[0:5,0:3]

	mean radius	mean texture	mean perimeter
mean radius	1.000000	NaN	NaN
mean texture	0.323782	1.000000	NaN
mean perimeter	0.997855	0.329533	1.000000
mean area	0.987357	0.321086	0.986507
mean smoothness	0.170581	-0.023389	0.207278

Now we can feed this data frame with lower triangular correlation matrix to Seaborn’s heatmap() function and get lower triangular correlation heatmap as we wanted.

hmap=sns.heatmap(df_lt,cmap="Spectral")
hmap.figure.savefig("Correlation_Heatmap_Lower_Triangle_with_Seaborn.png",
                    format='png',
                    dpi=150)

Here we have used Spectral color palette using cmap argument for the lower triangular correlation heatmap.

Correlation Heatmap: Lower Triangle with Seaborn

We can also extract the upper triangular part of the correlation matrix using the np.triu() function. However, the axis labels of an upper triangular heatmap would not sit next to the colored cells in Seaborn.
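The upper triangle can be extracted the same way, by swapping np.tril() for np.triu(). Here is a minimal sketch on a small made-up correlation matrix. The values and labels in `toy_corr` are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 correlation matrix, for illustration only
labels = ["a", "b", "c"]
toy_corr = pd.DataFrame(
    [[1.0, 0.2, 0.5],
     [0.2, 1.0, -0.3],
     [0.5, -0.3, 1.0]],
    index=labels, columns=labels)

# True on and above the diagonal, False below it
keep_upper = np.triu(np.ones(toy_corr.shape, dtype=bool))
toy_ut = toy_corr.where(keep_upper)

# k=1 starts one step above the diagonal, so the diagonal of ones is dropped too
keep_strict_upper = np.triu(np.ones(toy_corr.shape, dtype=bool), k=1)
toy_ut_no_diag = toy_corr.where(keep_strict_upper)

print(toy_ut)
print(toy_ut_no_diag)
```

Either resulting dataframe could be passed to Seaborn’s heatmap() function just like the lower triangular one.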

Lower Triangular Heatmap with Seaborn using mask

In the above example, we created a new lower triangular dataframe by subsetting the original correlation matrix. Instead, we can make a lower triangular heatmap without creating a new dataframe. Seaborn’s heatmap() function has a mask argument that lets you hide selected elements of the input data frame. In our example, we want to mask the upper triangular elements to make a lower triangle correlation heatmap.

Let us create a NumPy array to use as our mask.

mask_ut=np.triu(np.ones(corr_df.shape)).astype(bool)

Here we create a boolean matrix with True on the upper triangle and False on the lower triangle of the correlation matrix, using NumPy’s np.triu() function.

mask_ut[0:5,0:5]

array([[ True,  True,  True,  True,  True],
       [False,  True,  True,  True,  True],
       [False, False,  True,  True,  True],
       [False, False, False,  True,  True],
       [False, False, False, False,  True]])

The mask argument hides the cells where the mask is True, here the upper triangle, and gives us a heatmap of the lower triangular matrix.

hmap=sns.heatmap(corr_df, mask=mask_ut, cmap="Spectral")
hmap.figure.savefig("Correlation_Heatmap_Lower_Triangle_with_Seaborn_using_mask.png",
                    format='png',
                    dpi=150)

But this time, using mask, we did not have to create a new dataframe.

Correlation Heatmap: Lower Triangle with Seaborn using mask
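One difference from the first approach is worth noting. A mask built with np.triu() and its default offset is True on the diagonal as well, so the diagonal of ones is hidden along with the upper triangle. The sketch below uses a small made-up matrix (the values in `toy_corr` are hypothetical) and Pandas’ mask() method, which blanks cells where the condition is True, to mimic which cells a heatmap mask would leave visible. Passing `k=1` to np.triu() keeps the diagonal visible.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 correlation matrix, for illustration only
labels = ["a", "b", "c"]
toy_corr = pd.DataFrame(
    [[1.0, 0.2, 0.5],
     [0.2, 1.0, -0.3],
     [0.5, -0.3, 1.0]],
    index=labels, columns=labels)

# Same construction as mask_ut: True on and above the diagonal
mask_with_diag = np.triu(np.ones(toy_corr.shape, dtype=bool))
# k=1 starts the mask one step above the diagonal
mask_strict = np.triu(np.ones(toy_corr.shape, dtype=bool), k=1)

# DataFrame.mask() replaces cells with NaN where the condition is True
visible_without_diag = toy_corr.mask(mask_with_diag)
visible_with_diag = toy_corr.mask(mask_strict)

print(visible_without_diag)
print(visible_with_diag)
```

Under these assumptions, passing `mask_strict` instead of `mask_with_diag` as the mask argument of heatmap() should show the same cells as the dataframe built earlier with where().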