Probability Distributions in Python with SciPy and Seaborn

If you are new to data science, understanding probability distributions will be extremely useful. One of the best ways to understand a probability distribution is to simulate random numbers (that is, generate random variables) from it and visualize them.

9 Most Commonly Used Probability Distributions

There are at least two ways to draw samples from probability distributions in Python. One way is to use Python’s SciPy package to generate random numbers from multiple probability distributions. Here we will draw random numbers from 9 of the most commonly used probability distributions using scipy.stats. We will also visualize the probability distributions using Python’s Seaborn plotting library.

Another way to generate random numbers or draw samples from multiple probability distributions in Python is to use NumPy’s random module. We will not be using NumPy in this post, but will do so in a later one.

Let us load the Python packages needed to generate random numbers and plot them.

# for inline plots in jupyter
%matplotlib inline
# import matplotlib
import matplotlib.pyplot as plt

Let us import Seaborn for plotting.

# import seaborn
import seaborn as sns
# settings for seaborn plotting style
sns.set(color_codes=True)
# settings for seaborn plot sizes
sns.set(rc={'figure.figsize':(4.5,3)})

1. Generating Random Numbers from Uniform Distribution

We can import uniform distribution from scipy.stats and use it to generate uniform random numbers.

# import uniform distribution
from scipy.stats import uniform

Generate Uniform random numbers

We can generate random numbers from the uniform distribution with its rvs function, uniform.rvs. The loc argument is the lower bound and scale is the width of the interval, so samples fall in [loc, loc + scale]. To generate 10000 uniform random numbers between 0 and 10, we will use

# random numbers from uniform distribution
# generate 10000 numbers between 0 and 10
n = 10000
a = 0
b = 10
# samples lie in [loc, loc + scale], so scale is the interval width b - a
data_uniform = uniform.rvs(size=n, loc=a, scale=b - a)
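
To make the roles of loc and scale concrete, here is a minimal sketch that draws from an interval that does not start at 0. The bounds 2 and 5 and the seed passed to random_state are illustrative choices for this sketch, not values used elsewhere in this post.

```python
from scipy.stats import uniform

# Illustrative bounds (assumed for this sketch)
low, high = 2, 5

# uniform.rvs samples from [loc, loc + scale], so scale is the width
samples = uniform.rvs(loc=low, scale=high - low, size=1000, random_state=42)

print(samples.min(), samples.max())  # both values fall inside [2, 5]
```

Passing an integer to random_state makes the draw reproducible, which is handy when you want the same histogram every time you rerun a cell.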

Plot Uniform random numbers with Seaborn

We can use Seaborn’s distplot to plot the histogram of uniform random numbers. Seaborn’s distplot takes multiple arguments to customize the plot. Here, we specify the number of bins in the histogram with “bins=100”, set the color with “color=”, turn off the kernel density estimate with “kde=False”, and pass histogram styling such as line width through “hist_kws”. The call returns a plot object, which we can use to set labels for the x and y axes. Note that distplot is deprecated in Seaborn 0.11 and later, where histplot and displot replace it; the code in this post assumes a version that still provides distplot.

ax = sns.distplot(data_uniform,
                  bins=100,
                  kde=False,
                  color='skyblue',
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Uniform ', ylabel='Frequency')

[Figure: Uniform distribution in Python]
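
The bar heights in such a histogram are just bin counts, and they can be computed directly. The following is a sketch using NumPy's histogram function with the same binning as the plot (100 equal-width bins over [0, 10]); the seed is an illustrative choice.

```python
import numpy as np
from scipy.stats import uniform

data = uniform.rvs(loc=0, scale=10, size=10000, random_state=0)

# Same binning as the plot: 100 equal-width bins over [0, 10]
counts, edges = np.histogram(data, bins=100, range=(0, 10))

print(counts.sum())                # every sample lands in some bin
print(counts.min(), counts.max())  # bin heights scatter around 10000 / 100
```

For a uniform distribution each bin is expected to hold about the same number of samples, which is why the histogram looks roughly flat.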

2. How to Generate Random Numbers from Normal Distribution?

Let us import normal distribution from scipy.stats.

from scipy.stats import norm

Generate random numbers from Gaussian or Normal distribution. We can specify the mean and standard deviation of the normal distribution using the loc and scale arguments to norm.rvs. Note that scale is the standard deviation, not the variance. To generate 10000 random numbers from a normal distribution with mean = 0 and standard deviation = 1 (so the variance is also 1), we use the norm.rvs function as

# generate random numbers from N(0,1)
data_normal = norm.rvs(size=10000,loc=0,scale=1)
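
Because scale takes the standard deviation, a target variance has to be converted first. Here is a minimal sketch, assuming an illustrative target of mean 5 and variance 4 (values chosen for this example only):

```python
from scipy.stats import norm

# Target: mean 5, variance 4 (illustrative values)
mean, variance = 5.0, 4.0

# norm.rvs expects the standard deviation in scale, so take the square root
samples = norm.rvs(loc=mean, scale=variance ** 0.5, size=100000, random_state=1)

print(samples.mean(), samples.var())  # close to 5 and 4
```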

Plot the distribution of normal random variables using Seaborn’s distplot.

ax = sns.distplot(data_normal,
                  bins=100,
                  kde=False,
                  color='skyblue',
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Normal', ylabel='Frequency')

[Figure: Normal distribution in Python]

3. How To Generate Random Numbers from Bernoulli Distribution?

Let us import Bernoulli distribution from scipy.stats.

# import bernoulli
from scipy.stats import bernoulli

A Bernoulli random variable takes the value 1 with success probability p and the value 0 otherwise. To generate 10000 Bernoulli random numbers with success probability p = 0.3, we will use bernoulli.rvs with two arguments.

# generate bernoulli
data_bern = bernoulli.rvs(size=10000,p=0.3)
ax = sns.distplot(data_bern,
                  kde=False,
                  color="skyblue",
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Bernoulli', ylabel='Frequency')

We can see from the plot below that, out of 10000 trials with success probability 0.3, we get about 3000 successes.

[Figure: Bernoulli Distribution in Python]
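
The same count can be read off without a plot. This is a small sketch that tallies the successes directly; the seed is an illustrative choice, so the exact count will differ from the one in the figure.

```python
from scipy.stats import bernoulli

data = bernoulli.rvs(p=0.3, size=10000, random_state=0)

successes = int(data.sum())       # number of 1s
failures = data.size - successes  # number of 0s

print(successes, failures)
print(data.mean())  # observed success rate, close to 0.3
```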

4. How To Generate Random Numbers from Binomial Distribution?

Let us import binom from scipy.stats to generate random variables from the Binomial distribution.

from scipy.stats import binom

The Binomial distribution is a discrete probability distribution, like the Bernoulli distribution. It gives the number of successes in N Bernoulli trials. For example, to draw the number of successes in 10 Bernoulli trials with p = 0.5, we will use

binom.rvs(n=10,p=0.5)

We can also use binom.rvs to repeat the experiment with the size argument. If we want to repeat it 5 times, we will use

binom.rvs(size=5,n=10,p=0.5)
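
The link between the two distributions can be checked by simulation. The sketch below sums 10 Bernoulli trials per experiment and compares the result with the theoretical binomial mean n*p and variance n*p*(1-p); the number of experiments and the seed are illustrative choices.

```python
from scipy.stats import bernoulli, binom

n_trials, p = 10, 0.5

# Each row is one experiment of 10 Bernoulli trials;
# the row sum is the number of successes in that experiment
trials = bernoulli.rvs(p=p, size=(10000, n_trials), random_state=0)
successes = trials.sum(axis=1)

print(successes.mean(), successes.var())                # simulated
print(binom.mean(n_trials, p), binom.var(n_trials, p))  # theoretical
```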

Let us generate 10000 random numbers from the binomial distribution and plot the distribution.

data_binom = binom.rvs(n=10,p=0.5,size=10000)
ax = sns.distplot(data_binom,
                  kde=False,
                  color='skyblue',
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Binomial', ylabel='Frequency')

[Figure: Binomial Distribution in Python]

5. How To Generate Random Numbers from Poisson Distribution?

Let us import poisson from scipy.stats to generate Poisson random variables.

from scipy.stats import poisson

Generate Poisson Random Variables in SciPy

A Poisson random variable is typically used to model the number of times an event happens in a time interval. For example, the number of users who visit your website in an interval can be thought of as a Poisson process. The Poisson distribution is described in terms of the rate (mu) at which the events happen. We can generate Poisson random variables in Python using poisson.rvs.

Let us generate 10000 random numbers from a Poisson random variable with mu = 3 and plot them.

data_poisson = poisson.rvs(mu=3, size=10000)
ax = sns.distplot(data_poisson,
                  kde=False,
                  color='green',
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Poisson', ylabel='Frequency')

[Figure: Generate random numbers from Poisson distribution in Python]
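
A useful property to check on the samples is that the mean and the variance of a Poisson distribution both equal the rate mu. Here is a minimal sketch with mu = 3; the seed is an illustrative choice.

```python
import numpy as np
from scipy.stats import poisson

mu = 3
data = poisson.rvs(mu=mu, size=10000, random_state=0)

print(data.mean(), data.var())  # both should be close to mu

# Probability of seeing no events at all, which equals exp(-mu)
print(poisson.pmf(0, mu), np.exp(-mu))
```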

6. How to Generate Random Numbers from Beta Distribution?

We can understand the Beta distribution as a distribution for probabilities. It is a continuous distribution taking values from 0 to 1. It is defined by two parameters, alpha and beta, and depending on their values it can take very different shapes.

from scipy.stats import beta

Let us generate 10000 random numbers from the Beta distribution with alpha = 1 and beta = 1. Beta(1,1) is the uniform distribution on [0, 1], so its histogram is flat.

data_beta = beta.rvs(1, 1, size=10000)
ax = sns.distplot(data_beta,
                  kde=False,
                  bins=100,
                  color='skyblue',
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Beta(1,1)', ylabel='Frequency')

[Figure: Beta Distribution in Python]

Let us generate 10000 random numbers from the Beta distribution with alpha = 10 and beta = 1. The histogram of Beta(10,1) is concentrated near 1, with a long tail toward 0 (in statistical terms, it is left-skewed).

data_beta_a10b1 = beta.rvs(10, 1, size=10000)
# keep the returned axes so the labels below apply to this plot
ax = sns.distplot(data_beta_a10b1,
                  kde=False,
                  bins=50,
                  color='skyblue',
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Beta(10,1)', ylabel='Frequency')

Let us generate 10000 random numbers from the Beta distribution with alpha = 1 and beta = 10. The histogram of Beta(1,10) is concentrated near 0, with a long tail toward 1 (in statistical terms, it is right-skewed).

data_beta_a1b10 = beta.rvs(1, 10, size=10000)
ax = sns.distplot(data_beta_a1b10,
                  kde=False,
                  bins=100,
                  color='skyblue',
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Beta(1,10)', ylabel='Frequency')

Let us generate 10000 random numbers from the Beta distribution with alpha = 10 and beta = 10. The histogram of Beta(10,10) is symmetric and looks like a normal distribution.

data_beta_a10b10 = beta.rvs(10, 10, size=10000)
ax = sns.distplot(data_beta_a10b10,
                  kde=False,
                  bins=100,
                  color='skyblue',
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Beta(10,10)', ylabel='Frequency')
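
The four shapes can also be summarized numerically. The sketch below prints the theoretical mean and skewness for each parameter pair using beta.stats. A negative skewness means the mass sits near 1 with a long tail toward 0, a positive skewness means the opposite, and zero means the distribution is symmetric.

```python
from scipy.stats import beta

# The four (alpha, beta) pairs plotted in this section
for a, b in [(1, 1), (10, 1), (1, 10), (10, 10)]:
    # moments='mvsk' returns mean, variance, skewness and kurtosis
    mean, var, skew, kurt = beta.stats(a, b, moments='mvsk')
    print(f"Beta({a},{b}): mean={float(mean):.3f}, skewness={float(skew):.3f}")
```

The mean of a Beta distribution is alpha / (alpha + beta), so Beta(10,1) centers near 0.91 and Beta(1,10) near 0.09, while Beta(1,1) and Beta(10,10) both center at 0.5.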


7. How to Generate Random Numbers from Gamma Distribution?

from scipy.stats import gamma
data_gamma = gamma.rvs(a=5, size=10000)
ax = sns.distplot(data_gamma,
                  kde=False,
                  bins=100,
                  color='skyblue',
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Gamma', ylabel='Frequency')
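
The call above passes only the shape parameter a. As a minimal sketch of what that implies, assuming SciPy's default scale of 1: a gamma distribution with shape a and scale θ has mean a·θ and variance a·θ², so both should be close to 5 here. The seed is an illustrative choice.

```python
from scipy.stats import gamma

shape = 5
# scale is left at its default of 1
data = gamma.rvs(a=shape, size=10000, random_state=0)

print(data.mean(), data.var())               # simulated, close to 5 and 5
print(gamma.mean(shape), gamma.var(shape))   # theoretical values
```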

8. How To Generate Random Numbers from Log Normal Distribution?

from scipy.stats import lognorm
data_lognorm = lognorm.rvs(0.2, size=10000)
ax = sns.distplot(data_lognorm,kde=False,
                  bins=100,
                  color='skyblue',
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Log Normal', ylabel='Frequency')
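
The first argument to lognorm.rvs is the shape parameter s. As a sketch of what it controls: under SciPy's parameterization with the default scale of 1, s is the standard deviation of the logarithm of the samples, so taking logs should give roughly normal data with mean 0 and standard deviation 0.2. The seed is an illustrative choice.

```python
import numpy as np
from scipy.stats import lognorm

data = lognorm.rvs(0.2, size=10000, random_state=0)

# The log of a log-normal variable is normally distributed
log_data = np.log(data)
print(log_data.mean(), log_data.std())  # close to 0 and 0.2
```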

9. How To Generate Random Numbers from Negative Binomial Distribution?

The Negative Binomial distribution is another distribution with discrete outcomes and, as the name suggests, it is related to the Binomial and Bernoulli distributions.

from scipy.stats import nbinom
data_nbinom = nbinom.rvs(10, 0.5, size=10000)
ax = sns.distplot(data_nbinom,
                  kde=False,
                  color='skyblue',
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Negative Binomial', ylabel='Frequency')
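
As a sketch of the parameterization: in SciPy, nbinom.rvs(n, p) counts the number of failures observed before the n-th success, where p is the success probability of each trial. With n = 10 and p = 0.5 the theoretical mean is n(1 - p)/p = 10. The variable names and the seed below are illustrative choices.

```python
from scipy.stats import nbinom

n_successes, p = 10, 0.5
data = nbinom.rvs(n_successes, p, size=10000, random_state=0)

print(data.mean())                  # simulated, close to 10
print(nbinom.mean(n_successes, p))  # theoretical n * (1 - p) / p
```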