Visualizing data with heatmaps is a great way to do exploratory data analysis when you have a dataset with multiple variables. Heatmaps can reveal general patterns in the dataset at a glance. And it is very easy to make beautiful heatmaps with the Seaborn library in Python.
Let us see 3 examples of creating heatmap visualizations with Seaborn. One of the manipulations to do before making a heatmap is to use Pandas' pivot functionality to reshape the data into the form a heatmap needs.
Let us first import the packages needed to make heatmaps.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
We will use the gapminder dataset to make heatmaps using Seaborn.
data_url = 'http://bit.ly/2cLzoxH'
gapminder = pd.read_csv(data_url)
print(gapminder.head(3))
For the first heatmap example, let us filter the original gapminder dataframe so that we have just three columns/variables: continent, year, and lifeExp.
>df1 = gapminder[['continent', 'year', 'lifeExp']]
>print(df1.head())
  continent  year  lifeExp
0      Asia  1952   28.801
1      Asia  1957   30.332
2      Asia  1962   31.997
3      Asia  1967   34.020
4      Asia  1972   36.088
Let us make a heatmap showing the life expectancy of each continent over the years. To make that heatmap, we need our data as a matrix where the rows are continents, the columns are years, and each element contains the life expectancy for that continent and year.
Heatmap with Seaborn Example 1
A quick look at our gapminder dataframe will tell you that the data is in tidy format, i.e. each variable has its own column and each observation has its own row. So we need to convert the data to wide form so that we can easily make the heatmap.
We can use Pandas’ pivot_table function to spread the data from long (tidy) form to wide form. See the earlier blog post for more examples of using Pandas’ pivot_table function to reshape the data.
Since we want continents on the rows and years on the columns, we specify the index and columns arguments accordingly. Each continent has many countries per year, so pivot_table combines them with its default aggregation, the mean; each cell is therefore the average life expectancy across the countries of that continent in that year.
# pandas pivot
heatmap1_data = pd.pivot_table(df1, values='lifeExp', index=['continent'], columns='year')
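To see what the pivot step does without downloading anything, here is a minimal sketch on a small made-up table. The dataframe `toy` and its values are hypothetical and only mimic the gapminder column names; the call to `pd.pivot_table` is the same as above.

```python
import pandas as pd

# Hypothetical toy data in long (tidy) form: one row per country-year observation.
toy = pd.DataFrame({
    "continent": ["Asia", "Asia", "Asia", "Asia", "Africa", "Africa"],
    "country": ["A", "B", "A", "B", "C", "C"],
    "year": [1952, 1952, 1957, 1957, 1952, 1957],
    "lifeExp": [30.0, 40.0, 32.0, 44.0, 38.0, 40.0],
})

# Long -> wide: continents become rows, years become columns.
# pivot_table averages the values that fall into the same cell (its default).
toy_wide = pd.pivot_table(toy, values="lifeExp", index=["continent"], columns="year")
print(toy_wide)
```

In this toy table the Asia cell for 1952 is the average of countries A and B, which mirrors how the continent-level heatmap summarizes many countries into one value per year.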
After pivoting, we have the data in the format we need. Now we can easily make the heatmap using Seaborn’s “heatmap” function. In our example here, we have chosen a specific color palette with the “cmap” argument.
sns.heatmap(heatmap1_data, cmap="YlGnBu")
We get a simple heatmap that instantly highlights the trend in the data, with the color scale and its values shown on the right. From this heatmap, we can see that life expectancy in both Africa and Asia improved over the years, with Asia doing much better than Africa, and so on.
Heatmap with Seaborn Example 2
Let us make another heatmap, but this time using each country’s life expectancy. Let us first subset the gapminder dataframe so that we keep the country column, and then use Pandas’ pivot_table function to reshape the data into wide form, ready for Seaborn’s heatmap function.
df2 = gapminder[['country', 'continent', 'year', 'lifeExp']]
heatmap2_data = pd.pivot_table(df2, values='lifeExp', index=['country'], columns='year')
heatmap2_data.head(n=5)
sns.heatmap(heatmap2_data, cmap="BuGn")
We can see that our heatmap shows each country’s life expectancy over the years. Seaborn’s heatmap function uses the data from all countries, but it automatically labels only a selection of the country names on the rows so that the labels do not overlap.
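If you would rather see every row label, one option is the `yticklabels` argument of `sns.heatmap`. The sketch below uses a hypothetical table of random values (`toy_wide`, with made-up country names) instead of gapminder, and a non-interactive Matplotlib backend so that it runs without a display; both are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical wide table: 40 made-up "countries" by 12 survey years.
rng = np.random.default_rng(0)
years = list(range(1952, 2008, 5))
countries = [f"country_{i:02d}" for i in range(40)]
toy_wide = pd.DataFrame(
    rng.uniform(30, 80, size=(len(countries), len(years))),
    index=countries,
    columns=years,
)

fig, ax = plt.subplots(figsize=(8, 10))
# yticklabels=True asks Seaborn to draw a label for every row
# instead of choosing a subset automatically.
sns.heatmap(toy_wide, cmap="BuGn", yticklabels=True, ax=ax)
n_labels = len(ax.get_yticklabels())
print(n_labels)
```

Passing an integer instead, for example `yticklabels=5`, labels every fifth row, which can be a reasonable middle ground when there are too many rows to label them all legibly.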
Heatmap Example 3: Customizing Heatmaps with Seaborn
Often we would like to customize our visualization so that it is more informative and better suited to our needs. Let us see some examples of customizing a heatmap with Seaborn.
Our previous heatmap showed all countries, but in a squished way. Let us customize the heatmap so that it is not squished and we can see more countries. Let us say we would like to keep the continent information in our heatmap, not just country information. Let us also change the color palette, so that we see the pattern in the heatmap more clearly.
Let us first prepare our data frame so that we keep the continent information on the heatmap. To do that we need to reshape our original gapminder dataframe with four variables so that our row or index has continent information in addition to country information. Pandas’ pivot_table comes to our rescue and we can simply specify both country and continent as index using the argument “index”.
df3 = gapminder[['country', 'continent', 'year', 'lifeExp']]
# pandas pivot with multiple variables
heatmap3_data = pd.pivot_table(df3, values='lifeExp', index=['continent', 'country'], columns='year')
We can see that the reshaped data from the pandas pivot has two index levels: continent and country. We can change the color palette to “RdBu”, which is one of the diverging colormaps available in Matplotlib, to show a clear difference between low and high values of life expectancy. By specifying the size of the figure we would like to create using plt.figure, we can make the heatmap taller so that we see more of the “continent-country” labels.
plt.figure(figsize=(8, 12))
sns.heatmap(heatmap3_data, cmap="RdBu")
We can see that we now have the customized heatmap we wanted. The row labels carry both continent and country information. The image is also taller, so more row labels are visible. Our new color palette clearly separates the countries with low life expectancy from the countries with high life expectancy.
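Here is a small self-contained sketch of the same idea, pivoting on two index variables and reading back the combined row labels. The dataframe `toy` is hypothetical, the figure size is chosen only to suit three rows, and the non-interactive backend is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data with the same column names as gapminder.
toy = pd.DataFrame({
    "continent": ["Asia", "Asia", "Asia", "Asia", "Africa", "Africa"],
    "country": ["A", "B", "A", "B", "C", "C"],
    "year": [1952, 1952, 1957, 1957, 1952, 1957],
    "lifeExp": [30.0, 40.0, 32.0, 44.0, 38.0, 40.0],
})

# Two index variables give a two-level row index (continent, country).
toy_wide = pd.pivot_table(
    toy, values="lifeExp", index=["continent", "country"], columns="year"
)

fig = plt.figure(figsize=(4, 3))
ax = sns.heatmap(toy_wide, cmap="RdBu", yticklabels=True)

# Seaborn combines the two index levels into a single label per row.
row_labels = [tick.get_text() for tick in ax.get_yticklabels()]
print(toy_wide.index.nlevels, row_labels)
```

Because continent is the first index level, the rows end up grouped by continent, which is what keeps countries from the same continent next to each other in the heatmap.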