Maximum Likelihood Estimation (MLE) is a central method of statistical inference for estimating model parameters. By choosing the parameter values under which the observed data are most probable, MLE yields estimators with strong large-sample guarantees, which is why it is a standard tool in statistical practice.
-
Definition of Maximum Likelihood Estimation (MLE)
- MLE is a method for estimating the parameters of a statistical model.
- It selects the parameter values that maximize the likelihood of observing the given data.
- MLE is widely used due to its desirable properties in large samples.
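In symbols, for data x1, …, xn and parameter θ (standard notation, stated here only for reference), the estimator selected by MLE is:

```latex
\hat{\theta}_{\mathrm{MLE}} \;=\; \arg\max_{\theta}\, L(\theta \mid x_1, \dots, x_n)
```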
-
Likelihood function
- The likelihood function measures the probability of the observed data given specific parameter values.
- It is defined as the product of the probability density (or mass) functions for all observed data points.
- The likelihood function is not a probability distribution itself; it is a function of the parameters.
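For n independent and identically distributed observations with density or mass function f(x | θ), the likelihood takes the product form described above (the i.i.d. assumption is the standard simplifying case, not a requirement of the method):

```latex
L(\theta \mid x_1, \dots, x_n) \;=\; \prod_{i=1}^{n} f(x_i \mid \theta)
```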
-
Log-likelihood function
- The log-likelihood function is the natural logarithm of the likelihood function.
- It simplifies calculations, especially when dealing with products of probabilities, by converting them into sums.
- Maximizing the log-likelihood is equivalent to maximizing the likelihood function.
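Writing ℓ for the log-likelihood, the product above turns into a sum, which is what makes differentiation tractable:

```latex
\ell(\theta) \;=\; \log L(\theta \mid x_1, \dots, x_n) \;=\; \sum_{i=1}^{n} \log f(x_i \mid \theta)
```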
-
Steps to derive MLE
- Specify the likelihood function based on the statistical model and observed data.
- Take the natural logarithm to obtain the log-likelihood function.
- Differentiate the log-likelihood with respect to the parameters and set the derivatives to zero to find critical points.
- Solve the resulting equations to obtain the MLE estimates.
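As a brief worked example of these steps, suppose counts x1, …, xn are modeled as Poisson(λ):

```latex
L(\lambda) = \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{x_i}}{x_i!},
\qquad
\ell(\lambda) = -n\lambda + \Big(\textstyle\sum_{i=1}^{n} x_i\Big)\log\lambda - \sum_{i=1}^{n}\log x_i!,
\qquad
\frac{d\ell}{d\lambda} = -n + \frac{\sum_i x_i}{\lambda} = 0
\;\Longrightarrow\;
\hat{\lambda} = \bar{x}
```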
-
Properties of MLE (consistency, asymptotic normality, efficiency)
- Consistency: MLE estimates converge in probability to the true parameter values as sample size increases.
- Asymptotic normality: MLE estimates are approximately normally distributed for large samples.
- Efficiency: in large samples, the variance of the MLE approaches the Cramér-Rao lower bound, the smallest variance attainable by any unbiased estimator.
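Under standard regularity conditions, the consistency and asymptotic normality statements are usually summarized in one limit, where θ0 is the true parameter value and I(θ0) the Fisher Information defined below:

```latex
\sqrt{n}\,\big(\hat{\theta}_n - \theta_0\big) \;\xrightarrow{d}\; \mathcal{N}\!\big(0,\; I(\theta_0)^{-1}\big)
```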
-
Score function
- The score function is the gradient (first derivative) of the log-likelihood function with respect to the parameters.
- It indicates the direction in which the log-likelihood increases most steeply.
- Setting the score function to zero helps find the MLE.
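For a scalar parameter this reads as follows, with the zero-mean property holding under the usual regularity conditions:

```latex
U(\theta) \;=\; \frac{\partial \ell(\theta)}{\partial \theta},
\qquad
\mathbb{E}_{\theta}\big[U(\theta)\big] = 0
```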
-
Fisher Information
- Fisher Information quantifies the amount of information that an observable random variable carries about an unknown parameter.
- It is defined as the expected value of the squared score function.
- Higher Fisher Information indicates more precise estimates of the parameter.
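Under the same regularity conditions, the two standard forms of the Fisher Information coincide:

```latex
I(\theta) \;=\; \mathbb{E}_{\theta}\big[U(\theta)^{2}\big] \;=\; -\,\mathbb{E}_{\theta}\!\left[\frac{\partial^{2} \ell(\theta)}{\partial \theta^{2}}\right]
```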
-
Cramér-Rao lower bound
- The Cramér-Rao lower bound provides a theoretical lower limit on the variance of unbiased estimators.
- It states that the variance of any unbiased estimator is at least as large as the inverse of the Fisher Information.
- MLE is asymptotically efficient, meaning it achieves this lower bound in large samples.
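For an unbiased estimator based on n i.i.d. observations, with I(θ) the per-observation Fisher Information, the bound reads:

```latex
\operatorname{Var}\big(\hat{\theta}\big) \;\ge\; \frac{1}{n\,I(\theta)}
```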
-
MLE for common distributions (e.g., Normal, Binomial, Poisson)
- For the Normal distribution, the MLEs of the mean and variance are the sample mean and the sample variance computed with divisor n (the biased form, not the usual n - 1 version).
- For the Binomial distribution, MLE for the probability of success is the ratio of successes to total trials.
- For the Poisson distribution, the MLE of the rate parameter is the sample mean of the observed counts (a quick numerical check of all three appears after this list).
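A minimal numerical check of these closed forms, assuming NumPy is available; the parameter values, seed, and sample sizes below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)  # illustrative seed

# Normal: MLEs are the sample mean and the variance with divisor n (not n - 1).
x = rng.normal(loc=5.0, scale=2.0, size=1_000)
mu_hat = x.mean()
sigma2_hat = np.mean((x - mu_hat) ** 2)      # same as x.var(ddof=0)

# Binomial: MLE of the success probability is successes / total trials.
trials = 20
successes = rng.binomial(n=trials, p=0.3, size=500)
p_hat = successes.sum() / (trials * successes.size)

# Poisson: MLE of the rate is the sample mean of the counts.
counts = rng.poisson(lam=4.0, size=500)
lam_hat = counts.mean()

print(mu_hat, sigma2_hat, p_hat, lam_hat)
```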
-
Numerical methods for finding MLE (e.g., Newton-Raphson)
- The Newton-Raphson method is an iterative root-finding technique; for MLE it is applied to the score equation, i.e. to finding where the score function equals zero.
- It updates the parameter estimate using the score function and the second derivative of the log-likelihood; the variant that replaces the observed second derivative with the expected Fisher Information is known as Fisher scoring.
- Convergence can be rapid, but it requires a good initial guess and may fail if the likelihood function is not well-behaved (a minimal sketch follows below).
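A minimal sketch of Newton-Raphson in this setting, assuming NumPy; it targets the location parameter of a Cauchy distribution, a standard case with no closed-form MLE, and the sample size, seed, and true location are illustrative assumptions:

```python
import numpy as np

def cauchy_location_mle(x, tol=1e-8, max_iter=50):
    """Newton-Raphson for the MLE of a Cauchy location parameter theta.

    Log-likelihood (up to a constant): l(theta) = -sum(log(1 + (x - theta)^2))
    Score:   U(theta) = sum(2 * (x - theta) / (1 + (x - theta)^2))
    Hessian: H(theta) = sum(2 * ((x - theta)^2 - 1) / (1 + (x - theta)^2)^2)
    Update:  theta <- theta - U(theta) / H(theta)
    """
    theta = np.median(x)                     # a good initial guess matters
    for _ in range(max_iter):
        r = x - theta
        score = np.sum(2 * r / (1 + r**2))
        hessian = np.sum(2 * (r**2 - 1) / (1 + r**2) ** 2)
        step = score / hessian
        theta -= step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(0)
sample = rng.standard_cauchy(200) + 3.0      # true location = 3.0 (illustrative)
print(cauchy_location_mle(sample))           # should land near 3.0
```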