If you are a beginner in data science, understanding probability distributions will be extremely useful. One of the best ways to understand a probability distribution is to simulate random numbers (that is, generate random variables) from it and visualize them.
9 Most Commonly Used Probability Distributions
There are at least two ways to draw samples from probability distributions in Python. One way is to use Python’s SciPy package to generate random numbers from multiple probability distributions. Here we will draw random numbers from the 9 most commonly used probability distributions using scipy.stats. Not just that, we will also visualize the probability distributions using Python’s Seaborn plotting library.
Another way to generate random numbers or draw samples from multiple probability distributions in Python is to use NumPy’s random module. We will not use NumPy in this post, but will cover it in a later one.
Let us load the Python packages needed to generate random numbers and plot them.
# for inline plots in jupyter
%matplotlib inline
# import matplotlib
import matplotlib.pyplot as plt
Let us import Seaborn for plotting.
# import seaborn
import seaborn as sns
# settings for seaborn plotting style
sns.set(color_codes=True)
# settings for seaborn plot sizes
sns.set(rc={'figure.figsize':(4.5,3)})
1. Generating Random Numbers from Uniform Distribution
We can import uniform distribution from scipy.stats and use it to generate uniform random numbers.
# import uniform distribution
from scipy.stats import uniform
Generate Uniform random numbers
We can generate random numbers from the uniform distribution with its rvs function, uniform.rvs. The loc argument sets the lower bound and scale sets the width of the interval, so samples fall between loc and loc + scale. To generate 10000 uniform random numbers between 0 and 10, we will use
# random numbers from uniform distribution
# generate 10000 numbers from 0 to 10
n = 10000
a = 0
b = 10
data_uniform = uniform.rvs(size=n, loc=a, scale=b)
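As a quick sanity check on the loc and scale arguments, the sketch below draws a seeded sample and inspects its range and mean. The random_state seed and the variable names are illustrative additions, not part of the original code.

```python
from scipy.stats import uniform

# uniform.rvs samples from the interval [loc, loc + scale]
a, b = 0, 10
sample = uniform.rvs(size=10000, loc=a, scale=b, random_state=42)

# every value should lie inside the interval
print(sample.min() >= a, sample.max() <= a + b)
# the sample mean should be close to the midpoint, 5
print(sample.mean())
```

Passing random_state makes the draw reproducible, which is handy when comparing plots across runs.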
Plot Uniform random numbers with Seaborn
We can use Seaborn’s distplot to plot the histogram of uniform random numbers. Seaborn’s distplot takes in multiple arguments to customize the plot. We first create a plot object. Here, we specify the number of bins in the histogram with “bins=100” option, specify color with “color=” option and specify density plot option with “kde” and linewidth option with “hist_kws”. We can also set labels for x and y axis using the plot object we created.
ax = sns.distplot(data_uniform,
                  bins=100,
                  kde=False,
                  color='skyblue',
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Uniform', ylabel='Frequency')
2. How to Generate Random Numbers from Normal Distribution?
Let us import normal distribution from scipy.stats.
from scipy.stats import norm
Generate random numbers from the Gaussian or Normal distribution. We can specify the mean and standard deviation of the normal distribution using the loc and scale arguments to norm.rvs. Note that scale is the standard deviation, not the variance. To generate 10000 random numbers from a normal distribution with mean = 0 and standard deviation = 1 (and therefore variance = 1), we use the norm.rvs function as
# generate random numbers from N(0,1)
data_normal = norm.rvs(size=10000, loc=0, scale=1)
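Because scale is the standard deviation in norm.rvs, a target variance has to be converted with a square root first. The sketch below, with illustrative parameter values and seed, draws from a normal distribution with mean 1 and variance 4.

```python
from scipy.stats import norm

# scale is the standard deviation, so variance 4 means scale=2
sample = norm.rvs(size=10000, loc=1, scale=2, random_state=42)

print(sample.mean())  # should be close to 1
print(sample.var())   # should be close to 4
```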
Plot the distribution of normal random variables using Seaborn’s distplot.
ax = sns.distplot(data_normal,
                  bins=100,
                  kde=False,
                  color='skyblue',
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Normal', ylabel='Frequency')
3. How To Generate Random Numbers from Bernoulli Distribution?
Let us import Bernoulli distribution from scipy.stats.
# import bernoulli
from scipy.stats import bernoulli
A Bernoulli random variable takes the value 1 with probability p and 0 with probability 1 - p. To generate 10000 Bernoulli random numbers with success probability p = 0.3, we will use bernoulli.rvs with two arguments.
# generate bernoulli
data_bern = bernoulli.rvs(size=10000, p=0.3)
ax = sns.distplot(data_bern,
                  kde=False,
                  color="skyblue",
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Bernoulli', ylabel='Frequency')
We can see from the plot that, out of 10000 trials with success probability 0.3, we get about 3000 successes.
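The success count can also be read directly from the data rather than from the plot. A minimal sketch, using an illustrative seed:

```python
from scipy.stats import bernoulli

data = bernoulli.rvs(p=0.3, size=10000, random_state=42)

# each draw is 0 or 1, so the sum is the number of successes
n_success = int(data.sum())
print(n_success)    # expected to be near 3000
print(data.mean())  # empirical estimate of p
```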
4. How To Generate Random Numbers from Binomial Distribution?
Let us import binom module from scipy.stats to generate random variables from Binomial distribution.
from scipy.stats import binom
The Binomial distribution is a discrete probability distribution, like the Bernoulli. It gives the number of successes in N Bernoulli trials. For example, to find the number of successes in 10 Bernoulli trials with p = 0.5, we will use
binom.rvs(n=10,p=0.5)
We can also use binom.rvs to repeat the trials with size argument. If we want to repeat 5 times, we will use
binom.rvs(size=5,n=10,p=0.5)
Let us generate 10000 random numbers from the binomial distribution and plot them.
data_binom = binom.rvs(n=10, p=0.5, size=10000)
ax = sns.distplot(data_binom,
                  kde=False,
                  color='skyblue',
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Binomial', ylabel='Frequency')
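The link between the two distributions can be checked by summing Bernoulli draws and comparing against binom.rvs. The sketch below is illustrative; the seeds and variable names are assumptions.

```python
from scipy.stats import bernoulli, binom

# 10000 experiments, each consisting of 10 Bernoulli(0.5) trials
trials = bernoulli.rvs(p=0.5, size=(10000, 10), random_state=1)
successes = trials.sum(axis=1)

# the same quantity drawn directly from the binomial distribution
direct = binom.rvs(n=10, p=0.5, size=10000, random_state=2)

# both sample means should be near n * p = 5
print(successes.mean(), direct.mean())
```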
5. How To Generate Random Numbers from Poisson Distribution?
Let us import poisson module from scipy.stats to generate poisson random variables.
from scipy.stats import poisson
Generate Poisson Random Variables in SciPy
A Poisson random variable is typically used to model the number of times an event happens in a time interval. For example, the number of users who visit your website in an interval can be thought of as a Poisson process. The Poisson distribution is described in terms of the rate (mu) at which the events happen. We can generate Poisson random variables in Python using poisson.rvs.
Let us generate 10000 random numbers from a Poisson distribution with mu = 3 and plot them.
data_poisson = poisson.rvs(mu=3, size=10000)
ax = sns.distplot(data_poisson,
                  kde=False,
                  color='green',
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Poisson', ylabel='Frequency')
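A defining property of the Poisson distribution is that its mean and variance both equal mu. The sketch below checks this on a seeded sample; the seed is an illustrative assumption.

```python
from scipy.stats import poisson

sample = poisson.rvs(mu=3, size=10000, random_state=42)

# for a Poisson distribution, mean and variance are both mu
print(sample.mean())  # should be close to 3
print(sample.var())   # should be close to 3
```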
6. How to Generate Random Numbers from Beta Distribution?
We can think of the Beta distribution as a distribution over probabilities. It is a continuous distribution taking values from 0 to 1, and it is defined by two parameters, alpha and beta. Depending on the values of alpha and beta, it can take very different shapes.
from scipy.stats import beta
Let us generate 10000 random numbers from the Beta distribution with alpha = 1 and beta = 1. Beta(1,1) is the uniform distribution on the interval from 0 to 1, so its histogram is roughly flat.
data_beta = beta.rvs(1, 1, size=10000)
ax = sns.distplot(data_beta,
                  kde=False,
                  bins=100,
                  color='skyblue',
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Beta(1,1)', ylabel='Frequency')
Let us generate 10000 random numbers from the Beta distribution with alpha = 10 and beta = 1. The histogram of Beta(10,1) piles up near 1 with a long tail to the left (in statistical terms, it is negatively or left-skewed).
data_beta_a10b1 = beta.rvs(10, 1, size=10000)
ax = sns.distplot(data_beta_a10b1,
                  kde=False,
                  bins=50,
                  color='skyblue',
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Beta(10,1)', ylabel='Frequency')
Let us generate 10000 random numbers from the Beta distribution with alpha = 1 and beta = 10. The histogram of Beta(1,10) piles up near 0 with a long tail to the right (it is positively or right-skewed).
data_beta_a1b10 = beta.rvs(1, 10, size=10000)
ax = sns.distplot(data_beta_a1b10,
                  kde=False,
                  bins=100,
                  color='skyblue',
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Beta(1,10)', ylabel='Frequency')
Let us generate 10000 random numbers from the Beta distribution with alpha = 10 and beta = 10. The histogram of Beta(10,10) is symmetric and looks like a normal distribution.
data_beta_a10b10 = beta.rvs(10, 10, size=10000)
ax = sns.distplot(data_beta_a10b10,
                  kde=False,
                  bins=100,
                  color='skyblue',
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Beta(10,10)', ylabel='Frequency')
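The effect of alpha and beta on the shape can also be checked numerically. The sketch below compares the sample mean with the theoretical mean alpha / (alpha + beta) and computes the sample skewness with scipy.stats.skew. The results dictionary and the seed are illustrative assumptions.

```python
from scipy.stats import beta, skew

results = {}
for a, b in [(1, 1), (10, 1), (1, 10), (10, 10)]:
    sample = beta.rvs(a, b, size=10000, random_state=0)
    results[(a, b)] = {
        "mean": sample.mean(),       # empirical mean
        "expected": a / (a + b),     # theoretical mean of Beta(a, b)
        "skew": skew(sample),        # negative: tail to the left
    }

for params, stats in results.items():
    print(params, stats)
```

A negative skewness value indicates a longer left tail, a positive one a longer right tail, and a value near zero a roughly symmetric sample.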
7. How to Generate Random Numbers from Gamma Distribution?
from scipy.stats import gamma
data_gamma = gamma.rvs(a=5, size=10000)
ax = sns.distplot(data_gamma,
                  kde=False,
                  bins=100,
                  color='skyblue',
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Gamma', ylabel='Frequency')
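In gamma.rvs, the argument a is the shape parameter, and scale defaults to 1. Under that default, both the mean and the variance of the distribution equal a. A minimal check, with an illustrative seed:

```python
from scipy.stats import gamma

shape = 5
sample = gamma.rvs(a=shape, size=10000, random_state=42)

# with the default scale=1, mean and variance both equal the shape
print(sample.mean())  # should be close to 5
print(sample.var())   # should be close to 5
```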
8. How To Generate Random Numbers from Log Normal Distribution?
from scipy.stats import lognorm
data_lognorm = lognorm.rvs(0.2, size=10000)
ax = sns.distplot(data_lognorm,
                  kde=False,
                  bins=100,
                  color='skyblue',
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Log Normal', ylabel='Frequency')
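The first argument to lognorm.rvs is the shape parameter s, which is the standard deviation of the logarithm of the samples. With the default scale of 1, the logarithm has mean 0. The sketch below checks this by taking logs of a seeded sample; the seed and variable names are illustrative.

```python
import numpy as np
from scipy.stats import lognorm

s = 0.2
sample = lognorm.rvs(s, size=10000, random_state=42)

# the log of a log-normal sample should look normal
logs = np.log(sample)
print(logs.mean())  # should be close to 0
print(logs.std())   # should be close to s = 0.2
```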
9. How To Generate Random Numbers from Negative Binomial Distribution?
The Negative Binomial distribution is another distribution with discrete outcomes and, as the name suggests, it is related to the Binomial and Bernoulli distributions.
from scipy.stats import nbinom
data_nbinom = nbinom.rvs(10, 0.5, size=10000)
ax = sns.distplot(data_nbinom,
                  kde=False,
                  color='skyblue',
                  hist_kws={"linewidth": 15,'alpha':1})
ax.set(xlabel='Negative Binomial', ylabel='Frequency')
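In SciPy's parameterization, nbinom.rvs(n, p) returns the number of failures observed before the n-th success, where each trial succeeds with probability p. The mean is therefore n * (1 - p) / p. A short check, with an illustrative seed:

```python
from scipy.stats import nbinom

n, p = 10, 0.5
sample = nbinom.rvs(n, p, size=10000, random_state=42)

# number of failures before the n-th success
expected_mean = n * (1 - p) / p
print(sample.mean(), expected_mean)  # sample mean should be near 10
```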