Maximum likelihood is a very general approach developed by R. A. Fisher, when he was an undergrad. In an earlier post, Introduction to Maximum Likelihood Estimation in R, we introduced the idea of likelihood and how it is a powerful approach for parameter estimation.
We learned that Maximum Likelihood estimates are one of the most common ways to estimate the unknown parameter from the data. Basically, Maximum Likelihood Estimation method gets the estimate of parameter by finding the parameter value that maximizes the probability of observing the data given parameter.
We simulated data from Poisson distribution, which has a single parameter lambda describing the distribution.
# generate data from Poisson distribution # with the parameter lambda=5 data <- rpois(n=100, lambda=5)
We pretended that we did not know the lambda and we just have the data. We estimated the likelihood of seeing the data given the parameter lambda. And we used brute-force approach to find the lambda that has the highest likelihood. Basically, we chose a reasonable range for the parameter lambda.
lambdas <- seq(1,15, by=0.5)
We wrote a simple function to compute likelihood and computed likelihood for each value in the chosen range, found which value gives highest likelihood. Once we find the parameter value, we basically estimated the parameter that maximizes the likelihood.
# function to compute log likelihood # for Poisson distribution llh_poisson <- function(lambda, y){ # log(likelihood) by summing llh <- sum(dpois(y, lambda, log=TRUE)) return(llh) }
The above approach of getting MLE works great for a likelihood function with a single parameter, may be for even two parameters. You can see immediately that for a distribution with multiple parameters, our search space can get really big. The approach of searching the multi-dimensional parameter space will be very slow.
Yes, there are faster ways out there 🙂 . The basic problem we are trying to address here is we have a function- here a likelihood function and we are trying to maximize the function. This problem is pretty common and occurs in a scenarios. A widely known area and closely related to this problem is called “optimization” problem, And there are a number of numerical approaches to do that, like the familiar Newton-Raphson and Gradient Descent methods.
How to Use Maximum Likelihood Estimation in R for Poisson Distribution?
R has functions to do the optimization. One of the optimizers is the function optim and we will be using function called mle from stats4 package. The mle function from stats4 uses optim function under the hood.
How to Estimate a Single Oarameter using MLE
We will write a function to compute the likelihood (We already did it, llh_poisson) and use the likelihood function as input to the optimizing function mle with some starting points. We will demonstrate first using Poisson distributed data and estimate the parameter lambda by MLE.
Let us use mle function with the likelihood fuction llh_poisson and starting value for the parameter. Most of the optimizers, maximizes a function by starting with a parameter value and iteratively updates the current parameter estimate. Ideally as the number of iterations increases, the parameter estimate approaches the true parameter value.
# get mle estimates of parameters fit_poisson <- mle(llh_poisson, start = list(lambda = 1))
We can use the summary on the fit_poisson object to see the ML estimate with summary function. We can see that we use mle function as mle(minuslogl = llh_poisson, start = list(lambda = 1)). Out ML estimate of lambda 5.16, which is pretty close to the true lambda=5.
>summary(fit_poisson) Maximum likelihood estimation Call: mle(minuslogl = llh_poisson, start = list(lambda = 1)) Coefficients: Estimate Std. Error lambda 5.160001 0.2271564 -2 log L: 464.5542
We can also get confidence interval for the estimate using confint function
>confint(poisson_fit) Profiling... 2.5 % 97.5 % 4.727506 5.618133
How To Use Maximum Likelihood Estimation in R for Normal/Gaussian Distribution?
In the above example, we did parameter estimation for data from Poisson model. Poisson model has a single parameter describing it.
Let us consider a slightly more complicated model, Normal/Gaussian distribution. Normal/Gaussian distribution is described by two parameters, mean (mu) and variance.
Let us simulate some data from Normal/Gaussian distribution using rnorm function with specific mean and variance. Note that rnorm function takes standard deviation as input, not variance.
# set seed for random numbers set.seed(42) # simulate data from Normal distribution x <- rnorm(200, mean=2, sd=2);
Let us plot the distribution of our data as histogram using ggplot2.
data.frame(x=x) %>% ggplot(aes(x=x))+ geom_histogram(bins=30, color="blue",fill="dodgerblue") + theme_bw(base_size = 16) + xlab("Data")
We can clearly see that the mean is 2 and variance is around 4.
Let us now write the likelihood function for the data under Normal/Gaussian distribution with two unknown parameters. Like before we will compute negative log likelihood. The probability density function for Normal distribution in R is dnorm and it takes a data point and two parameters as input. Additionally, we specify we want to compute log-likelihood with log=TRUE argument.
# likelihood function for normal distribution # with two unknowns neg_log_lik_gaussian <- function(mu,sigma) { -sum(dnorm(x, mean=mu, sd=sigma, log=TRUE)) }
We can use the above likelihood function and find the estimates of the parameters which maximize the likelihood using mle function. We provide the likelihood function, and starting values for the parameters with start argument and specify the numerical optimization method to use with method option.
gaussian_fit <- mle(neg_log_lik_gaussian, start=list(mu=1, sigma=1), method="L-BFGS-B")
And we can see the estimate from the fit using summary function. Our maximum likelihood estimate for mean is 1.945 and sigma is 1.944, both are pretty close to the true mean=2 and sd=2.
>summary(gaussian_fit) Maximum likelihood estimation Call: mle(minuslogl = neg_log_lik, start = list(mu = 1, sigma = 1), method = "L-BFGS-B") Coefficients: Estimate Std. Error mu 1.945031 0.1374822 sigma 1.944292 0.0972145 -2 log L: 833.5344
We can also get the uncertainty in our estimates using confint function as before.
Profiling... 2.5 % 97.5 % mu 1.674269 2.215793 sigma 1.768288 2.151591