Python and R Tips

Learn Data Science with Python and R

How To Make Histogram in Python with Pandas and Seaborn?

February 10, 2019 by cmdlinetips

Histograms are a great way to visualize the distribution of a single variable, and they are a must for initial exploratory analysis when there are only a few variables.

In Python, one can easily make histograms in many ways. Here we will see examples of making histograms with Pandas and Seaborn.

Let us first load Pandas, NumPy, pyplot from matplotlib, and Seaborn to make histograms in Python.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

We will use the gapminder dataset and download it directly from the Software Carpentry website.

data_url = 'http://bit.ly/2cLzoxH'
gapminder = pd.read_csv(data_url)
gapminder.head(n=3)

How To Plot Histogram with Pandas

Let us use Pandas’ hist function to make a histogram showing the distribution of life expectancy in years in our data. One of the key arguments to use while plotting histograms is the number of bins, specified here with the argument ‘bins’. The number of bins largely determines the shape of the histogram, so one should always experiment with a few different values of ‘bins’ while making a histogram.

gapminder['lifeExp'].hist(bins=100)
Histogram with Pandas

Let us change the bins to 10 and see what the histogram looks like.
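
The call that produced the 10-bin figure is not shown; presumably it is the same hist call with bins=10. Here is a minimal sketch of that step. To keep it runnable without a download, it uses a synthetic Series as a stand-in for gapminder['lifeExp'] (an assumption, not the real data).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for gapminder['lifeExp'] (assumption: not the real data)
rng = np.random.default_rng(0)
life_exp = pd.Series(rng.normal(60, 12, size=1000), name="lifeExp")

# Same call as before, but with 10 bins instead of 100
fig, ax = plt.subplots()
life_exp.hist(bins=10, ax=ax)

# One rectangle is drawn per bin
n_bars = len(ax.patches)
```

With the real data, the equivalent call would be gapminder['lifeExp'].hist(bins=10).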

Histogram with Pandas: fewer bins

We can see immediately that the histogram with a small number of bins does not look that great; smaller details of the distribution can easily disappear. When the number of bins is really high, one might see more patterns in the histogram.
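
To see what changing the number of bins does to the underlying counts, here is a small sketch using NumPy's histogram function on synthetic data (the values are a stand-in, not the gapminder data). The total count stays the same; only the bin width changes.

```python
import numpy as np

# Synthetic stand-in for life expectancy values (assumption: not the real data)
rng = np.random.default_rng(42)
values = rng.normal(loc=60, scale=12, size=1000)

# Same data, two different numbers of bins
counts_10, edges_10 = np.histogram(values, bins=10)
counts_100, edges_100 = np.histogram(values, bins=100)

# Both histograms span the same range, so 10 bins are ten times wider
width_10 = edges_10[1] - edges_10[0]
width_100 = edges_100[1] - edges_100[0]

print(f"10 bins:  width {width_10:.2f}, largest bin holds {counts_10.max()} values")
print(f"100 bins: width {width_100:.2f}, largest bin holds {counts_100.max()} values")
```

Wider bins average over more values, which is why fine details disappear with 10 bins.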


How To Customize Histograms with Pandas?

The default histogram that Pandas makes is pretty basic. It is okay for a first-pass quick look at the distribution of the data, but not great for a full illustration of the data.

For example, the Pandas histogram does not have any labels for the x-axis and y-axis. Let us customize the histogram using Pandas.

First, let us remove the grid that we see in the histogram, using grid=False as one of the arguments to the Pandas hist function. We can also specify the size of the tick labels on the x- and y-axes with xlabelsize/ylabelsize.

Then let us set the x-axis and y-axis labels, each with its own font size. We can also specify the range of the x-axis that we want to show in our histogram. For these options, we use matplotlib’s plt object directly, as that is easier.

gapminder['lifeExp'].hist(bins=100, grid=False, xlabelsize=12, ylabelsize=12)
plt.xlabel("Life Expectancy", fontsize=15)
plt.ylabel("Frequency", fontsize=15)
plt.xlim([22.0,90.0])
Customizing Histogram in Pandas

Now the histogram above is much better with easily readable labels.
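
As an alternative to the plt functions, the same customization can be sketched with matplotlib's Axes methods, which makes it explicit which plot is being modified. This is only one possible style, and the data here is a synthetic stand-in for gapminder['lifeExp'] (an assumption).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for gapminder['lifeExp'] (assumption: not the real data)
rng = np.random.default_rng(0)
life_exp = pd.Series(rng.normal(60, 12, size=1000), name="lifeExp")

# Create the Axes explicitly and hand it to Pandas
fig, ax = plt.subplots()
life_exp.hist(bins=100, grid=False, xlabelsize=12, ylabelsize=12, ax=ax)

# Axes methods mirror plt.xlabel, plt.ylabel and plt.xlim
ax.set_xlabel("Life Expectancy", fontsize=15)
ax.set_ylabel("Frequency", fontsize=15)
ax.set_xlim(22.0, 90.0)
```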

Sometimes, we may want to display our histogram in log-scale. Let us see how to make our x-axis log-scale. We can use matplotlib’s plt object and set the scale of the x-axis by calling its xscale function with the argument ‘log’.

gapminder['gdpPercap'].hist(bins=1000, grid=False)
plt.xlabel("gdpPercap", fontsize=15)
plt.ylabel("Frequency", fontsize=15)
plt.xscale('log')
Histogram with Log Scale in Pandas
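
With equal-width bins, the bars look progressively narrower toward the right once the x-axis is on a log scale. One possible variation, not part of the example above, is to build bin edges that are evenly spaced in log space with NumPy's logspace, so that the bars have equal visual width. The sketch below uses synthetic log-normal values as a stand-in for gapminder['gdpPercap'] (an assumption).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for gapminder['gdpPercap'] (assumption: not the real data)
rng = np.random.default_rng(1)
gdp = pd.Series(rng.lognormal(mean=8.5, sigma=1.2, size=1000), name="gdpPercap")

# 51 edges give 50 bins, evenly spaced in log10 space
log_bins = np.logspace(np.log10(gdp.min()), np.log10(gdp.max()), num=51)

fig, ax = plt.subplots()
gdp.hist(bins=log_bins, grid=False, ax=ax)
ax.set_xscale("log")
ax.set_xlabel("gdpPercap", fontsize=15)
ax.set_ylabel("Frequency", fontsize=15)

# Each edge is a constant multiple of the previous one
ratios = log_bins[1:] / log_bins[:-1]
```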

How To Make Histogram with Seaborn in Python?

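The plotting library Seaborn has a built-in function to make histograms. The Seaborn function to make a histogram is “distplot”, for distribution plot. As usual, Seaborn’s distplot can take a column from a Pandas dataframe as its argument to make a histogram. Note that starting with Seaborn version 0.11.0, distplot is deprecated in favor of the newer histplot and displot functions.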

sns.distplot(gapminder['lifeExp'])

By default, the histogram from Seaborn has multiple elements built right into it. Seaborn can infer the x-axis label and its range. It automatically chooses a bin size to make the histogram. Seaborn also plots a density curve in addition to the histogram.

Histogram with Seaborn

Let us customize the histogram from Seaborn. Seaborn’s distplot function has a lot of options to choose from and customize our histogram.

Let us first remove the density line that Seaborn plots automatically, change the color, and then increase the number of bins. We can use distplot’s argument kde=False to remove the density line on the histogram, color='red' to change the color of the histogram, and bins=100 to increase the number of bins. Then we get the following plot.

sns.distplot(gapminder['lifeExp'], kde=False, color='red', bins=100)
Customizing histogram with Seaborn

Let us use matplotlib’s pyplot plt object for more customization. Let us set the x-axis label, the y-axis label, and the title, each with its own font size. We can use plt’s xlabel, ylabel, and title functions with the fontsize argument as follows.

sns.distplot(gapminder['lifeExp'], kde=False, color='red', bins=100)
plt.title('Life Expectancy', fontsize=18)
plt.xlabel('Life Exp (years)', fontsize=16)
plt.ylabel('Frequency', fontsize=16)

And now the histogram looks like this, and it is way better than the first one we made.

Customizing histogram with Seaborn: Change x/y axis labels

How To Make Multiple Histograms with Seaborn in Python?

So far, we visualized just a single variable as a histogram. Sometimes, we would like to visualize the distributions of multiple variables as multiple histograms or density plots. Let us use Seaborn’s distplot to make histograms of multiple variables/distributions. Visualizing multiple variables as histograms may be useful as long as the number of distributions is not too large.

Let us start with two distributions and visualize them as histograms first, using our gapminder data.

The basic idea while plotting multiple histograms is to first make a histogram of one variable and then add the next histogram to the existing plot object.
In this example, we plot histograms of life expectancy for two continents, Africa and Americas. To do that, we first subset the original data frame for Africa and make a histogram with distplot.

df = gapminder[gapminder.continent == 'Africa']
sns.distplot(df['lifeExp'], kde=False, label='Africa')

Then subset the data frame for Americas and make the histogram as an additional layer.

df = gapminder[gapminder.continent == 'Americas']
sns.distplot(df['lifeExp'], kde=False, label='Americas')

Then we can use the plt object to customize our histogram’s labels like before.

# Plot formatting
plt.legend(prop={'size': 12})
plt.title('Life Expectancy of Two Continents')
plt.xlabel('Life Exp (years)')
# With kde=False, distplot shows raw counts rather than densities
plt.ylabel('Frequency')
Multiple Histograms with Seaborn

How To Make Multiple Density Curves with Seaborn in Python?

Sometimes simply plotting the density curves is more useful than the actual histograms. We can make density curves with the same approach as above, but with the “hist=False” argument to Seaborn’s distplot.

df = gapminder[gapminder.continent == 'Africa']
sns.distplot(df['lifeExp'], hist=False, kde=True, label='Africa')
df = gapminder[gapminder.continent == 'Americas']
sns.distplot(df['lifeExp'], hist=False, kde=True, label='Americas')
# Plot formatting
plt.legend(prop={'size': 12})
plt.title('Life Expectancy vs Continents')
plt.xlabel('Life Exp (years)')
plt.ylabel('Density')

Multiple Histogram Density Curves with Seaborn
