
Python and R Tips

Learn Data Science with Python and R


How To Make Lower Triangle Heatmap with Correlation Matrix in Python?

February 16, 2020 by cmdlinetips

Visualizing data as a heatmap is a great data exploration technique for high-dimensional data. Sometimes you would rather visualize the correlations as a heatmap instead of the raw data, to understand the relationships between the variables in your data. In this post we will see several ways to visualize a correlation matrix as a heatmap. Since a correlation matrix is symmetric, visualizing the full matrix is redundant: every value appears twice, once on each side of the diagonal. Visualizing just the lower or upper triangle of the correlation matrix is more useful.

We will use really cool NumPy functions, Pandas and Seaborn to make lower triangular heatmaps in Python. Let us load the packages needed.

import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

To make the lower triangular correlation heatmaps, we will use the breast cancer dataset available from scikit-learn's datasets.

# import breast cancer data from scikit-learn
from sklearn.datasets import load_breast_cancer
# load breast cancer dataset
data = load_breast_cancer()

Let us save the breast cancer data from scikit-learn as a Pandas dataframe. Here we use the data as the values and the feature names as the column names of the dataframe. The breast cancer dataset has 30 features.

df = pd.DataFrame(data.data, columns=data.feature_names)
df.iloc[0:5,0:3]

	mean radius	mean texture	mean perimeter
0	17.99	10.38	122.80
1	20.57	17.77	132.90
2	19.69	21.25	130.00
3	11.42	20.38	77.58
4	20.29	14.34	135.10
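As an alternative way to get the same dataframe, recent versions of scikit-learn can hand back pandas objects directly. This is a minimal sketch assuming scikit-learn 0.23 or later, where `load_breast_cancer()` accepts `as_frame=True`; with that flag, `.data` is already a DataFrame whose columns are the feature names.

```python
from sklearn.datasets import load_breast_cancer

# as_frame=True (scikit-learn >= 0.23) returns pandas objects instead of NumPy arrays
bunch = load_breast_cancer(as_frame=True)

# DataFrame with one column per feature, named after bunch.feature_names
features = bunch.data

print(features.shape)                 # (569, 30)
print(features.columns[:3].tolist())  # ['mean radius', 'mean texture', 'mean perimeter']
```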

Let us find out how these 30 features are correlated with each other. We can use Pandas' corr() function on the whole dataframe to compute the correlation matrix. Here we compute Pearson correlation coefficients between the features by specifying method='pearson' (Pearson is also the default method).

The result is a 30×30 correlation matrix. As the first few rows and columns below show, it is symmetric.

# compute correlation matrix using pandas corr() function
corr_df = df.corr(method='pearson')
# display first few rows/columns of correlation matrix using iloc in Pandas
corr_df.iloc[0:5,0:3]

	mean radius	mean texture	mean perimeter
mean radius	1.000000	0.323782	0.997855
mean texture	0.323782	1.000000	0.329533
mean perimeter	0.997855	0.329533	1.000000
mean area	0.987357	0.321086	0.986507
mean smoothness	0.170581	-0.023389	0.207278
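To see where these numbers come from, here is a minimal sketch that computes a Pearson correlation matrix straight from the definition (centered cross-products divided by the product of the column norms) and checks it against `corr()`. The helper `pearson_corr_matrix` and the small random dataframe are hypothetical, made up for illustration; they are not part of the original post or of pandas.

```python
import numpy as np
import pandas as pd


def pearson_corr_matrix(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: Pearson correlation matrix from the definition."""
    x = frame.to_numpy(dtype=float)
    centered = x - x.mean(axis=0)          # subtract each column's mean
    cross = centered.T @ centered          # (n - 1) times the covariance matrix
    scale = np.sqrt(np.diag(cross))        # sqrt of each column's sum of squares
    corr = cross / np.outer(scale, scale)  # the (n - 1) factors cancel here
    return pd.DataFrame(corr, index=frame.columns, columns=frame.columns)


# Small synthetic dataframe (not the breast cancer data)
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
toy["d"] = 2 * toy["a"] + rng.normal(scale=0.1, size=100)  # strongly tied to "a"

manual = pearson_corr_matrix(toy)
print(np.allclose(manual.to_numpy(), toy.corr(method="pearson").to_numpy()))  # True
print(np.allclose(manual.to_numpy(), manual.to_numpy().T))  # True: symmetric
```

The symmetry check is the property the rest of the post relies on: entry (i, j) always equals entry (j, i), so one triangle carries all the information.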

We can make simple heatmaps with Seaborn’s heatmap() function on the whole correlation matrix.

hmap=sns.heatmap(corr_df)
hmap.figure.savefig("Correlation_Heatmap_with_Seaborn.png",
                    format='png',
                    dpi=150)

We can see that the heatmap of the correlation matrix contains redundant information, because the correlation matrix is symmetric.

Correlation Heatmap with Seaborn

How to Make Lower Triangle Heatmap with Seaborn?

It would be better to visualize either the upper triangle or the lower triangle of the correlation matrix as a heatmap.

To do that, we just need to extract the upper or lower triangle of the correlation matrix, and NumPy has really cool functions for that. NumPy's numpy.tril() function takes a 2D NumPy array as input and returns a copy in which everything above the diagonal is set to zero, i.e. the lower triangle of the array. Similarly, numpy.triu() takes a 2D NumPy array and returns the upper triangle, with everything below the diagonal set to zero. Both functions keep the diagonal by default, and their k argument shifts the diagonal used as the cut-off, so you can also leave the diagonal elements out (for example, k=-1 for tril()).
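A quick way to see what `np.tril()` and `np.triu()` return, and how the `k` argument moves the cut-off diagonal, is to apply them to a small 3×3 array. This is only a toy illustration; the array has nothing to do with the breast cancer data.

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]

lower = np.tril(a)               # diagonal kept (default k=0)
strict_lower = np.tril(a, k=-1)  # diagonal dropped
upper = np.triu(a)               # diagonal kept (default k=0)
strict_upper = np.triu(a, k=1)   # diagonal dropped

print(lower)
# [[1 0 0]
#  [4 5 0]
#  [7 8 9]]
print(strict_lower)
# [[0 0 0]
#  [4 0 0]
#  [7 8 0]]
```

Note that the removed triangle is filled with zeros, not dropped: the result always has the same shape as the input.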

Numpy’s tril() function to extract Lower Triangle Matrix

Let us extract the lower triangle of the correlation matrix, including the diagonal, with the np.tril() function and visualize it as a lower triangular heatmap with Seaborn. We will use np.tril() together with np.ones() to create a boolean matrix of the same size as our correlation matrix. This boolean matrix has True values in the lower triangle (including the diagonal) and False values in the upper triangle.

# use the builtin bool: the np.bool alias was removed in NumPy 1.24
np.tril(np.ones(corr_df.shape)).astype(bool)[0:5,0:5]

array([[ True, False, False, False, False],
       [ True,  True, False, False, False],
       [ True,  True,  True, False, False],
       [ True,  True,  True,  True, False],
       [ True,  True,  True,  True,  True]])
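The same boolean matrix can be built in a couple of other ways. This sketch compares the float-then-cast version with creating a boolean array directly and with `np.tri()`, which returns ones at and below the main diagonal. A 5×5 size stands in for the 30×30 correlation matrix.

```python
import numpy as np

n = 5  # stands in for the 30 columns of the correlation matrix

mask_a = np.tril(np.ones((n, n))).astype(bool)  # float ones, then cast to bool
mask_b = np.tril(np.ones((n, n), dtype=bool))   # boolean from the start
mask_c = np.tri(n, dtype=bool)                  # ones at and below the diagonal

print(np.array_equal(mask_a, mask_b), np.array_equal(mask_b, mask_c))  # True True
print(mask_c.sum())  # 15 cells: 5 on the diagonal + 10 below it
```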

We can pass this boolean matrix to Pandas' where() function to extract the lower triangular correlation matrix. where() returns a dataframe of the original size that keeps the values where the condition is True and replaces the rest, here the upper triangle, with NaN.

df_lt = corr_df.where(np.tril(np.ones(corr_df.shape)).astype(bool))

We can see that the correlations along the diagonal are ones, since we kept the diagonal elements. The upper triangle has NaN values and the lower triangle has the correlation values.

df_lt.iloc[0:5,0:3]
	mean radius	mean texture	mean perimeter
mean radius	1.000000	NaN	NaN
mean texture	0.323782	1.000000	NaN
mean perimeter	0.997855	0.329533	1.000000
mean area	0.987357	0.321086	0.986507
mean smoothness	0.170581	-0.023389	0.207278
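Here is the same where() step on a tiny hand-made 3×3 matrix, so the effect is easy to check by eye. The matrix values are invented for illustration. The sketch also uses `DataFrame.mask()`, the mirror image of where(): it puts NaN where its condition is True, so masking with the negated condition gives the same result.

```python
import numpy as np
import pandas as pd

# Hand-made symmetric matrix standing in for corr_df
corr = pd.DataFrame(
    [[1.0, 0.3, 0.9],
     [0.3, 1.0, 0.2],
     [0.9, 0.2, 1.0]],
    index=["x", "y", "z"],
    columns=["x", "y", "z"],
)
keep = np.tril(np.ones(corr.shape, dtype=bool))  # True on and below the diagonal

lower_where = corr.where(keep)  # keep values where `keep` is True, NaN elsewhere
lower_mask = corr.mask(~keep)   # NaN where the condition is True: same result

print(lower_where.equals(lower_mask))        # True
print(int(lower_where.isna().sum().sum()))  # 3 NaN cells above the diagonal
```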

Now we can feed this dataframe with the lower triangular correlation matrix to Seaborn's heatmap() function and get the lower triangular correlation heatmap we wanted.

# start a new figure so this heatmap is not drawn over the previous one
plt.figure()
hmap = sns.heatmap(df_lt, cmap="Spectral")
hmap.figure.savefig("Correlation_Heatmap_Lower_Triangle_with_Seaborn.png",
                    format='png',
                    dpi=150)

Here we have used the Spectral color palette, set with the cmap argument, for the lower triangular correlation heatmap.

Correlation Heatmap: Lower Triangle with Seaborn

We can also extract the upper triangle of the correlation matrix using the np.triu() function. However, Seaborn places the row labels on the left and the column labels at the bottom of the heatmap, so in an upper triangular heatmap the labels would sit far away from the colored cells.
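For completeness, this sketch builds the upper triangular version with `np.triu()` and where(), mirroring the lower triangle code. Random synthetic data stands in for the breast cancer features, and the checks only confirm which cells were blanked out.

```python
import numpy as np
import pandas as pd

# Synthetic data standing in for the breast cancer features
rng = np.random.default_rng(1)
data = pd.DataFrame(rng.normal(size=(50, 4)), columns=["f1", "f2", "f3", "f4"])
corr = data.corr(method="pearson")

# Upper triangle (diagonal included); cells below the diagonal become NaN
df_ut = corr.where(np.triu(np.ones(corr.shape, dtype=bool)))

values = df_ut.to_numpy()
below = np.tril_indices_from(values, k=-1)  # row/column indices below the diagonal
print(np.isnan(values[below]).all())        # True
print(np.allclose(np.diag(values), 1.0))    # True
```

Passing `df_ut` to `sns.heatmap()` would draw the upper triangle; with the mask approach, the equivalent is a mask of `np.tril(np.ones(corr.shape, dtype=bool), k=-1)`, which hides everything strictly below the diagonal.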

Lower Triangular Heatmap with Seaborn using mask

In the above example, we created a new lower triangular dataframe by subsetting the original correlation matrix. Instead, we can make a lower triangular heatmap without creating a new dataframe. Seaborn's heatmap() function has a mask argument: cells where the mask is True are not drawn. In our example, we want to mask the upper triangular elements to make a lower triangle correlation heatmap.

Let us create a NumPy array to use as our mask.

mask_ut = np.triu(np.ones(corr_df.shape)).astype(bool)

Here we create a boolean matrix with True in the upper triangle (including the diagonal) and False in the lower triangle, using NumPy's np.triu() function.

mask_ut[0:5,0:5]

array([[ True,  True,  True,  True,  True],
       [False,  True,  True,  True,  True],
       [False, False,  True,  True,  True],
       [False, False, False,  True,  True],
       [False, False, False, False,  True]])
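Because `np.triu()` keeps the diagonal by default, this mask hides the diagonal of ones along with the upper triangle. If you want the diagonal to stay visible, as it does in the where() version, shift the mask up one diagonal with `k=1`. A small 4×4 size is used here just to make the counts easy to verify.

```python
import numpy as np

n = 4  # stands in for the 30 columns of the correlation matrix

# True on AND above the diagonal: the diagonal is hidden with the upper triangle
mask_hide_diag = np.triu(np.ones((n, n), dtype=bool))

# k=1 starts the mask one diagonal higher, so the diagonal stays visible
mask_show_diag = np.triu(np.ones((n, n), dtype=bool), k=1)

print(mask_hide_diag.sum(), mask_show_diag.sum())  # 10 6
print((~mask_show_diag).sum())  # 10 visible cells: diagonal + lower triangle
```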

Passing this array as the mask argument hides the upper triangle and gives us a heatmap of the lower triangle. Since np.triu() includes the diagonal by default, the diagonal cells are hidden as well.

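# start a new figure and keep the returned Axes, so savefig() saves this heatmap
plt.figure()
hmap = sns.heatmap(corr_df, mask=mask_ut, cmap="Spectral")
hmap.figure.savefig("Correlation_Heatmap_Lower_Triangle_with_Seaborn_using_mask.png",
                    format='png',
                    dpi=150)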

This time, by using mask, we did not have to create a new dataframe.

Correlation Heatmap: Lower Triangle with Seaborn using mask
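Putting the pieces together, here is a self-contained sketch of the mask approach with a few optional heatmap() settings: `vmin=-1, vmax=1, center=0` pin the color scale to the full range of a correlation coefficient, and `annot=True` prints the values in the cells. The synthetic data, the figure size and the use of the off-screen `Agg` backend are assumptions for illustration; drawing on an explicit `ax` from `plt.subplots()` keeps each saved figure separate.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data standing in for the breast cancer features
rng = np.random.default_rng(2)
data = pd.DataFrame(rng.normal(size=(200, 5)), columns=[f"f{i}" for i in range(5)])
data["f4"] = data["f0"] + rng.normal(scale=0.5, size=200)  # one real correlation
corr = data.corr(method="pearson")

# Cells where the mask is True are not drawn; k=1 keeps the diagonal visible
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# Fresh figure and axes, so nothing is drawn over an earlier heatmap
fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(
    corr,
    mask=mask,
    cmap="Spectral",
    vmin=-1, vmax=1, center=0,  # fixed color scale for correlations
    square=True,
    annot=True, fmt=".2f",
    ax=ax,
)
fig.tight_layout()

# Save to an in-memory buffer; pass a file name instead to write a PNG to disk
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=150)
plt.close(fig)
```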


Filed Under: Lower Triangular Heatmap Seaborn, Python Tagged With: Heatmap, Python
