How To Generate Random Numbers from Probability Distributions in R?

Understanding probability distributions, and how to simulate random numbers from them, helps in understanding probability and in using it effectively in data science. Here we will look at how to simulate random numbers from 9 commonly used probability distributions in R and visualize each of them as a histogram using ggplot2.

R is a statistical programming language, so most of the commonly used probability distributions are readily available in core R. Each probability distribution in R has a short name, like unif for the uniform distribution and norm for the normal distribution.

Each probability distribution comes with four related functions: the probability density function (PDF), or probability mass function (PMF) for discrete distributions; the cumulative distribution function (CDF); the quantile function; and a random number generating function. These functions share the prefixes d, p, q, and r, respectively, across all probability distributions.

For example, the four core functions available for uniform distribution are

dunif: to get pdf (the d functions give the pmf for discrete distributions)
punif: to get cdf
qunif: to get quantile
runif: to sample random numbers from the uniform distribution
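
As a quick illustration of the four prefixes, the sketch below calls each uniform-distribution function once. The interval from 0 to 10, the evaluation points, and the variable names are chosen here for illustration only.

```r
# density (PDF) of Uniform(0, 10) at x = 3, i.e. 1 / (10 - 0)
d = dunif(3, min = 0, max = 10)     # 0.1
# CDF: probability that a draw is <= 3
p = punif(3, min = 0, max = 10)     # 0.3
# quantile: the value x for which P(X <= x) = 0.3
q = qunif(0.3, min = 0, max = 10)   # 3
# five random draws between 0 and 10 (different on every run)
r = runif(5, min = 0, max = 10)
```

Note that punif and qunif are inverses of each other: the quantile at 0.3 returns the point whose cumulative probability is 0.3.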

In this post, we will mainly focus on the random number generating functions, like runif, for 9 commonly used probability distributions and visualize the results with ggplot2.

1. Uniform Distribution

One can simulate random numbers from a uniform distribution in R using the runif function. runif takes the number of random numbers to simulate and the range of the distribution. Here we simulate 10000 random numbers from a uniform distribution between 0 and 10.

# load ggplot2, which is used for all the plots in this post
library(ggplot2)
# simulate uniform random numbers
data_uniform = runif(n=10000,min=0,max=10)

Let us plot the simulated uniform random numbers as a histogram.

ggplot()+
  aes(x=data_uniform) +
  geom_histogram(bins=100,fill="magenta1")

2. Normal Distribution

One can simulate random numbers from a normal (Gaussian) distribution in R using the rnorm function. rnorm takes the number of random numbers to simulate, plus the mean and standard deviation of the normal distribution to sample from. Here we simulate 10000 random numbers from a normal distribution with mean=0 and sd=1.

data_normal = rnorm(10000, mean=0, sd=1)

Let us plot the normal random numbers as a histogram with ggplot to visualize their distribution.

ggplot() +
  aes(x=data_normal)+
  geom_histogram(bins=100,fill="magenta1")

3. Bernoulli Distribution

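The Bernoulli distribution is a discrete probability distribution with just two possible outcomes, like heads or tails, 0 or 1, or success or failure. A Bernoulli trial is a binomial draw with size=1, so we can simulate it with base R's rbinom. Here we simulate 10000 trials with success probability 0.3.

# Bernoulli(0.3) draws via rbinom with size=1 (rbernoulli needs the purrr package, where it is deprecated)
data_bernoulli = rbinom(n=10000,size=1,prob=0.3)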

ggplot()+
  aes(x=data_bernoulli)+
  geom_bar(fill="magenta2",width = 0.1)

4. Binomial Distribution

The binomial distribution is related to the Bernoulli distribution and is useful for answering questions about the probability of k successes out of N independent Bernoulli trials. For example, it can answer a question like: if we toss a coin with probability of heads p 10 times, what is the probability of seeing 8 heads? Here we simulate 10000 such experiments of 10 tosses each with p=0.5.

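# 10000 experiments of 10 tosses each; spell out prob instead of relying on partial matching of p
data_binomial = rbinom(n=10000,size=10,prob=0.5)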
ggplot()+
  aes(x=data_binomial)+
  geom_bar(fill="magenta3",width = 0.2)
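
The coin-toss question above can be answered exactly with dbinom and approximately from simulated draws. The following is a minimal sketch, assuming a fair coin (p=0.5); the seed and the variable names are illustrative choices.

```r
set.seed(42)  # arbitrary seed, only so the run is reproducible

# exact probability of exactly 8 heads in 10 tosses of a fair coin
p_exact = dbinom(8, size = 10, prob = 0.5)

# estimate: share of 10000 simulated experiments with exactly 8 heads
sims = rbinom(n = 10000, size = 10, prob = 0.5)
p_sim = mean(sims == 8)

c(exact = p_exact, simulated = p_sim)
```

The exact value is 45/1024, or about 0.044, and the simulated share should land close to it.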

5. Poisson Distribution

data_poisson = rpois(n=10000,lambda=3)

ggplot()+
  aes(x=data_poisson)+
  geom_bar(fill="magenta1",width=.2)

6. Beta Distribution

The beta distribution is a continuous distribution that takes values between 0 and 1. It depends on two shape parameters, a and b, and takes different shapes depending on their values.

# a=1 and b=1 are the two shape parameters
data_beta_a1b1 = rbeta(n=10000,shape1=1,shape2=1)

ggplot() +
  aes(x=data_beta_a1b1) +
  geom_histogram(fill="magenta1",bins=100)

Here is what the beta distribution looks like when a=1 and b=1.

Here is what the beta distribution looks like when a=1 and b=10.

Here is what the beta distribution looks like when a=10 and b=1.

Here is what the beta distribution looks like when a=10 and b=10.
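
The other three shapes can be produced the same way by changing the two shape parameters. The sketch below assumes ggplot2 is installed; the names params and data_beta and the faceted layout are illustrative choices, not necessarily how the original figures were made.

```r
library(ggplot2)

# the four (a, b) pairs described above
params = data.frame(a = c(1, 1, 10, 10), b = c(1, 10, 1, 10))

# draw 10000 values per pair and stack them into one data frame
data_beta = do.call(rbind, lapply(seq_len(nrow(params)), function(i) {
  data.frame(
    label = paste0("a=", params$a[i], ", b=", params$b[i]),
    value = rbeta(n = 10000, shape1 = params$a[i], shape2 = params$b[i])
  )
}))

# one histogram panel per parameter pair
ggplot(data_beta) +
  aes(x = value) +
  geom_histogram(fill = "magenta1", bins = 100) +
  facet_wrap(~label, scales = "free_y")
```

With a=1 and b=1 the beta distribution is the uniform distribution on 0 to 1, so that panel should look flat, while a=10 and b=10 gives a symmetric peak around 0.5.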

7. Gamma Distribution

# shape=5; the rate is left at its default of 1
data_gamma = rgamma(n=10000,shape=5)
ggplot() +
  aes(x=data_gamma) +
  geom_histogram(fill="magenta1",bins=100)

8. Log Normal Distribution

# meanlog=0.2 is the mean on the log scale; sdlog is left at its default of 1
data_lognorm = rlnorm(10000,meanlog=0.2)
ggplot() +
  aes(x=data_lognorm) +
  geom_histogram(fill="magenta1",bins=100)
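
A log-normal variable is one whose logarithm is normally distributed, so the parameters of rlnorm describe the log scale and not the raw values. The sketch below checks this on a fresh sample; the seed and the variable names are illustrative.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

# same parameters as above, with the default sdlog written out
x = rlnorm(10000, meanlog = 0.2, sdlog = 1)

# taking logs should recover a normal sample
log_x = log(x)
mean(log_x)  # should be close to meanlog = 0.2
sd(log_x)    # should be close to sdlog = 1
```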

9. Negative Binomial Distribution

# size=10 is the target number of successes, prob=0.5 the success probability per trial
data_negbinom = rnbinom(10000, size=10, prob=0.5)
ggplot() + 
  aes(x=data_negbinom) +
  geom_bar(fill="magenta1",width=.2)
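
In R's parameterization, rnbinom(n, size, prob) returns the number of failures observed before size successes occur, so the theoretical mean is size*(1-prob)/prob. The sketch below compares that value with a simulated mean; the seed and the variable names are illustrative.

```r
set.seed(7)  # arbitrary seed, only so the run is reproducible

# number of failures before the 10th success, with success probability 0.5
x = rnbinom(10000, size = 10, prob = 0.5)

# theoretical mean: size * (1 - prob) / prob
mean_theory = 10 * (1 - 0.5) / 0.5   # 10
mean_sim = mean(x)

c(theory = mean_theory, simulated = mean_sim)
```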