Understanding probability distributions, and knowing how to simulate random numbers from a specific distribution, is very useful for learning probability and for applying it effectively in data science. Here we will look at how to simulate random numbers from 9 of the most commonly used probability distributions in R and visualize each of them as a histogram or bar plot using ggplot2.
R, being a statistical programming language, has most of the commonly used probability distributions readily available in core R. Each probability distribution in R has a short name, like unif for the uniform distribution and norm for the normal distribution.
Each probability distribution comes with four related functions: the probability density function (PDF, or probability mass function, PMF, for discrete distributions), the cumulative distribution function (CDF), the quantile function, and the random number generating function. These functions share the prefixes d, p, q, and r, respectively, across all probability distributions.
For example, the four core functions available for the uniform distribution are:
- dunif: to get the pdf (density)
- punif: to get the cdf
- qunif: to get quantiles
- runif: to sample random numbers from the uniform distribution
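To make the four prefixes concrete, here is a minimal sketch using the uniform distribution on 0 to 10. The variable names and the seed value are our own choices for illustration.

```r
# d: density (PDF) of Uniform(0, 10) at x = 2, which is 1 / (max - min)
pdf_val <- dunif(2, min = 0, max = 10)     # 0.1

# p: cumulative distribution function, P(X <= 2)
cdf_val <- punif(2, min = 0, max = 10)     # 0.2

# q: quantile function, the value x such that P(X <= x) = 0.2
q_val <- qunif(0.2, min = 0, max = 10)     # 2

# r: random draws from the distribution (seed chosen arbitrarily)
set.seed(42)
draws <- runif(5, min = 0, max = 10)
```

Note that qunif undoes punif: feeding the CDF value 0.2 back into the quantile function returns the original x = 2.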
In this post, we will mainly focus on the random number generating functions, like runif, for 9 commonly used probability distributions and on visualizing the simulated numbers with ggplot2.
1. Uniform Distribution
One can simulate random numbers from a uniform distribution in R using the runif function. runif takes the number of random numbers to be simulated and the range (min and max) of the distribution. Here we simulate 10000 random numbers from the uniform distribution on 0 to 10.
# simulate uniform random numbers
data_uniform = runif(n=10000,min=0,max=10)
Let us plot the simulated uniform random numbers as a histogram.
library(ggplot2)  # load ggplot2 for plotting
ggplot() + aes(x=data_uniform) + geom_histogram(bins=100,fill="magenta1")
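Because the draws are random, every run produces a slightly different histogram. Here is a small sketch, assuming we want reproducible results: calling set.seed before sampling fixes the random number stream (the seed 42 is an arbitrary choice of ours). It also checks that the sample mean lands near the midpoint of the range, (0 + 10) / 2 = 5.

```r
# Same seed, same draws
set.seed(42)
first_run <- runif(n = 10000, min = 0, max = 10)
set.seed(42)
second_run <- runif(n = 10000, min = 0, max = 10)

same_draws <- identical(first_run, second_run)  # TRUE

# The sample mean should be close to the midpoint of the range
sample_mean <- mean(first_run)
```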
2. Normal Distribution
One can simulate random numbers from a normal (Gaussian) distribution in R using the rnorm function. rnorm takes the number of random numbers to be simulated, and the mean and standard deviation of the normal distribution to sample from. Here we simulate 10000 random numbers from the normal distribution with mean=0 and sd=1.
data_normal = rnorm(10000, mean=0, sd=1)
Let us plot the normal random numbers as a histogram with ggplot to visualize their distribution.
ggplot() + aes(x=data_normal)+ geom_histogram(bins=100,fill="magenta1")
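One way to sanity-check the simulated normal numbers is to compare an empirical proportion with the value pnorm gives. The sketch below, with variable names and seed of our own choosing, looks at the share of draws within one standard deviation of the mean.

```r
set.seed(1)
x <- rnorm(10000, mean = 0, sd = 1)

# Share of simulated values within one standard deviation of the mean
empirical <- mean(abs(x) <= 1)

# Theoretical probability from the CDF, about 0.683
theoretical <- pnorm(1) - pnorm(-1)
```

With 10000 draws the two numbers should agree to roughly two decimal places.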
3. Bernoulli Distribution
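The Bernoulli distribution is a discrete probability distribution with just two possible outcomes, like head or tail, 0 or 1, or success or failure. Here we simulate 10000 Bernoulli trials with success probability 0.3 using rbernoulli from the purrr package, which returns TRUE/FALSE values that we convert to 1/0.
library(purrr)  # rbernoulli() comes from purrr, not base R
data_bernoulli = as.numeric(rbernoulli(10000,0.3))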
ggplot()+ aes(x=data_bernoulli)+ geom_bar(fill="magenta2",width = 0.1)
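rbernoulli is provided by the purrr package rather than core R, and as far as we know recent purrr releases mark it as deprecated. A base R alternative, sketched here with our own variable names, is rbinom with size = 1, since a Bernoulli trial is a binomial with a single trial.

```r
set.seed(1)
# One trial per draw, success probability 0.3: values are 0 or 1
bern <- rbinom(n = 10000, size = 1, prob = 0.3)

# Counts of 0s and 1s
bern_counts <- table(bern)

# The share of 1s should be close to 0.3
bern_share <- mean(bern)
```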
4. Binomial Distribution
The Binomial distribution is related to the Bernoulli distribution and is useful for answering questions about the probability of k successes out of N independent Bernoulli trials. For example, it can answer a question like: if we toss a coin with probability of heads p 10 times, what is the probability of seeing 8 heads? Here we simulate 10000 random numbers, each the number of heads in 10 tosses of a fair coin (p=0.5).
data_binomial = rbinom(n=10000,size=10,prob=0.5)
ggplot() + aes(x=data_binomial) + geom_bar(fill="magenta3",width = 0.2)
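The coin question above can be answered exactly with dbinom and approximately with the simulated numbers. This is a minimal sketch for a fair coin (p = 0.5); the variable names and seed are ours.

```r
# Exact probability of 8 heads in 10 tosses of a fair coin
exact <- dbinom(8, size = 10, prob = 0.5)   # 45/1024, about 0.044

# Approximate the same probability by simulation
set.seed(1)
heads <- rbinom(n = 10000, size = 10, prob = 0.5)
simulated <- mean(heads == 8)
```

The simulated share should be close to the exact value, and it gets closer as n grows.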
5. Poisson Distribution
data_poisson = rpois(n=10000,lambda=3)
ggplot()+ aes(x=data_poisson)+ geom_bar(fill="magenta1",width=.2)
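A well-known property of the Poisson distribution is that its mean and variance are both equal to lambda. The sketch below checks this on simulated counts with lambda = 3; the variable names and seed are our own.

```r
set.seed(1)
counts <- rpois(n = 10000, lambda = 3)

# Both should be close to lambda = 3
pois_mean <- mean(counts)
pois_var <- var(counts)
```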
6. Beta Distribution
The Beta distribution is a continuous distribution taking values between 0 and 1. It depends on two parameters, a and b, and can take very different shapes depending on their values. Here we simulate 10000 random numbers with a=1 and b=1.
data_beta_a1b1 = rbeta(n=10000,1,1)
ggplot() + aes(x=data_beta_a1b1) + geom_histogram(fill="magenta1",bins=100)
Here is what the Beta distribution looks like when a=1 and b=1.
Here is what the Beta distribution looks like when a=1 and b=10.
Here is what the Beta distribution looks like when a=10 and b=1.
Here is what the Beta distribution looks like when a=10 and b=10.
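The four parameter combinations can be simulated in one go. This is a sketch under our own naming (beta_df, params, value are hypothetical names); it stacks the draws into one data frame and compares each sample mean with the theoretical mean a / (a + b).

```r
set.seed(1)
n <- 10000

# One data frame with a label column for the parameter pair
beta_df <- rbind(
  data.frame(params = "a=1, b=1",   value = rbeta(n, shape1 = 1,  shape2 = 1)),
  data.frame(params = "a=1, b=10",  value = rbeta(n, shape1 = 1,  shape2 = 10)),
  data.frame(params = "a=10, b=1",  value = rbeta(n, shape1 = 10, shape2 = 1)),
  data.frame(params = "a=10, b=10", value = rbeta(n, shape1 = 10, shape2 = 10))
)

# Sample mean per parameter pair; theory says a / (a + b)
beta_means <- tapply(beta_df$value, beta_df$params, mean)
```

Passing beta_df to ggplot with aes(x=value), geom_histogram, and facet_wrap(~ params) would draw the four histograms side by side.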
7. Gamma Distribution
# shape = 5; rate is left at its default of 1
data_gamma = rgamma(n=10000,shape=5)
ggplot() + aes(x=data_gamma) + geom_histogram(fill="magenta1",bins=100)
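In rgamma the second positional argument is shape, and rate defaults to 1. A quick check, sketched with our own variable names, is that the sample mean is close to shape / rate and the sample variance close to shape / rate^2, both equal to 5 here.

```r
set.seed(1)
g <- rgamma(n = 10000, shape = 5, rate = 1)

gamma_mean <- mean(g)   # close to shape / rate = 5
gamma_var <- var(g)     # close to shape / rate^2 = 5
```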
8. Log Normal Distribution
# meanlog = 0.2; sdlog is left at its default of 1
data_lognorm = rlnorm(n=10000,meanlog=0.2)
ggplot() + aes(x=data_lognorm) + geom_histogram(fill="magenta1",bins=100)
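The parameters of rlnorm describe the distribution on the log scale: meanlog and sdlog are the mean and standard deviation of log(x), not of x itself. The sketch below, using our own variable names, confirms this by taking logs of the simulated values.

```r
set.seed(1)
ln <- rlnorm(n = 10000, meanlog = 0.2, sdlog = 1)

# On the log scale the draws are normal with mean 0.2 and sd 1
log_mean <- mean(log(ln))
log_sd <- sd(log(ln))

# The median on the original scale is exp(meanlog), about 1.22
ln_median <- median(ln)
```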
9. Negative Binomial Distribution
# size = 10 target successes, each trial succeeds with prob = 0.5
data_negbinom = rnbinom(n=10000,size=10,prob=0.5)
ggplot() + aes(x=data_negbinom) + geom_histogram(fill="magenta1",bins=100)
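In R's parameterization, rnbinom(n, size, prob) returns the number of failures observed before the size-th success, so the values are non-negative integers with mean size * (1 - prob) / prob. A minimal sketch with our own variable names:

```r
set.seed(1)
nb <- rnbinom(n = 10000, size = 10, prob = 0.5)

# Theoretical mean: 10 * (1 - 0.5) / 0.5 = 10
nb_mean <- mean(nb)

# Theoretical variance: 10 * (1 - 0.5) / 0.5^2 = 20
nb_var <- var(nb)
```

Since the values are counts, geom_bar (as used for the binomial and Poisson plots) is an alternative to geom_histogram here.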