Quasi-likelihood Estimation
Quasi-likelihood estimation lets you fit models to overdispersed data without having to fully specify the response variable's distribution. Instead of relying on a complete probability model (the way maximum likelihood does), you only need to get two things right: the mean function and the variance function.
This matters because real-world data, especially count data and proportions, frequently show more variability than standard distributions like Poisson or binomial can accommodate. Quasi-likelihood gives you a principled way to handle that extra variation while still producing valid inference.
Quasi-likelihood for Overdispersion
Introduction to Quasi-likelihood
In a standard GLM, the distribution you choose (Poisson, binomial, etc.) locks in a specific mean-variance relationship. For Poisson regression, the variance must equal the mean. Overdispersion occurs when the observed variance exceeds what that assumed relationship predicts.
Quasi-likelihood addresses this by stepping back from the full distributional assumption. Rather than saying "the data follow a Poisson distribution," you say "the mean and variance are related in this particular way, and there may be extra variation on top." This relaxation is what makes the method flexible: you don't need to identify the exact distribution generating your data.
The key trade-off is that you give up a true likelihood function. You can't compute AIC directly or use standard likelihood ratio tests without modification. But you gain robustness to distributional misspecification, which is often the more pressing concern with overdispersed data.
Construction and Properties of Quasi-likelihood
The quasi-likelihood function is built from just two ingredients:
- The mean function $\mu_i = E(Y_i)$, linked to the linear predictor $\eta_i = x_i^T\beta$ through a link function $g$, so that $g(\mu_i) = \eta_i$
- The variance function $V(\mu)$, which specifies how the variance relates to the mean
No full probability distribution is needed. The quasi-log-likelihood for a single observation takes the form:

$$Q(\mu; y) = \int_y^{\mu} \frac{y - t}{\phi\, V(t)}\, dt$$

where $\phi$ is the dispersion parameter that scales the variance to account for overdispersion.
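For concreteness, with the Poisson-type choice $V(t) = t$ and $\phi = 1$, the integral has the closed form $Q(\mu; y) = y\log(\mu/y) - (\mu - y)$. A short check (pure standard library; the function names are illustrative) confirms that this differs from the exact Poisson log-likelihood only by a term that is constant in $\mu$, which is why maximizing either gives the same estimates:

```python
from math import lgamma, log

def quasi_loglik_poisson_type(mu, y):
    """Closed form of Q(mu; y) = integral from y to mu of (y - t)/t dt,
    i.e. the quasi-log-likelihood with V(t) = t and phi = 1 (assumes y > 0)."""
    return y * log(mu / y) - (mu - y)

def poisson_loglik(mu, y):
    """Exact Poisson log-likelihood log f(y; mu)."""
    return y * log(mu) - mu - lgamma(y + 1)

# Differences across mu values agree, so the two expressions differ
# only by an additive constant that does not affect estimation:
y = 4
d_quasi = quasi_loglik_poisson_type(5.0, y) - quasi_loglik_poisson_type(2.0, y)
d_pois = poisson_loglik(5.0, y) - poisson_loglik(2.0, y)
```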
Despite not coming from a true likelihood, quasi-likelihood estimates have strong theoretical properties. Under correct specification of the mean and variance functions, the parameter estimates are consistent (they converge to the true values as sample size grows) and asymptotically normal (so you can build confidence intervals and do hypothesis tests in the usual way).
Quasi-likelihood also covers a wider range of mean-variance relationships than the exponential family alone. Any reasonable variance function can be plugged in, not just those tied to binomial, Poisson, or gamma distributions.
Estimating Parameters with Quasi-likelihood
Specification of Mean and Variance Functions
Setting up a quasi-likelihood model requires two choices:
- A link function $g$ that connects the mean $\mu$ to the linear predictor $\eta = x^T\beta$. Common choices:
  - Log link ($\log \mu = \eta$) for count data
  - Logit link ($\log\frac{\mu}{1-\mu} = \eta$) for binary/proportion data
  - Identity link ($\mu = \eta$) for continuous data
- A variance function $V(\mu)$ that describes how the variance scales with the mean. Standard options:
  - $V(\mu) = 1$: constant variance (normal/Gaussian)
  - $V(\mu) = \mu$: variance proportional to the mean (Poisson-type)
  - $V(\mu) = \mu(1-\mu)$: binomial-type variance
  - $V(\mu) = \mu^2$: variance proportional to the squared mean (gamma-type)

The actual variance of each observation is then $\mathrm{Var}(Y_i) = \phi\, V(\mu_i)$, where $\phi$ captures the overdispersion.
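These two choices can be written down directly as code. The sketch below (all helper names are illustrative, not from any particular library) pairs a log link with the standard variance-function menu and evaluates the modelled variance $\phi\, V(\mu)$:

```python
import numpy as np

# Log link for count data: g(mu) = log(mu), inverse g^{-1}(eta) = exp(eta)
log_link = {"g": np.log, "g_inv": np.exp}

# Candidate variance functions V(mu)
variance_functions = {
    "constant": lambda mu: np.ones_like(mu),  # V(mu) = 1        (Gaussian-type)
    "poisson":  lambda mu: mu,                # V(mu) = mu       (Poisson-type)
    "binomial": lambda mu: mu * (1.0 - mu),   # V(mu) = mu(1-mu) (binomial-type)
    "gamma":    lambda mu: mu ** 2,           # V(mu) = mu^2     (gamma-type)
}

def model_variance(mu, phi, variance="poisson"):
    """Modelled variance of each observation: phi * V(mu)."""
    return phi * variance_functions[variance](mu)
```

For example, a quasi-Poisson specification with $\hat\phi = 1.5$ assigns an observation with mean 2 a variance of 3 rather than the Poisson value of 2.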
Quasi-likelihood Estimating Equations and Optimization
Parameter estimation proceeds through these steps:
- Form the quasi-score function. For parameter vector $\beta$, the quasi-score is:

$$U(\beta) = \sum_{i=1}^{n} \frac{\partial \mu_i}{\partial \beta} \, \frac{y_i - \mu_i}{\phi\, V(\mu_i)}$$

This looks very similar to the score function in standard GLMs. The key difference is that $\phi\, V(\mu_i)$ appears explicitly to handle the extra variance.
- Set the quasi-score equal to zero and solve for $\beta$. Because the equations are nonlinear, this requires iterative methods.
- Solve iteratively using Fisher scoring (which is equivalent to iteratively reweighted least squares in this context). Each iteration updates the parameter estimates using the current working weights derived from $V(\mu)$ and the link function.
- Estimate the dispersion parameter $\phi$ separately, after obtaining $\hat\beta$. Two common approaches:
  - Pearson-based: $\hat\phi = \dfrac{1}{n - p} \displaystyle\sum_{i=1}^{n} \dfrac{(y_i - \hat\mu_i)^2}{V(\hat\mu_i)}$
  - Deviance-based: $\hat\phi = \dfrac{D}{n - p}$, where $D$ is the deviance and $p$ is the number of estimated parameters
An important detail: the regression parameter estimates don't depend on $\phi$. The dispersion parameter only affects the standard errors and test statistics, not the point estimates themselves. This is why $\phi$ can be estimated after fitting.
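The steps above can be sketched as a small Fisher-scoring loop. This is a minimal quasi-Poisson fitter (log link, $V(\mu) = \mu$) in plain numpy, not a production implementation; the function name and starting values are this sketch's own choices:

```python
import numpy as np

def fit_quasi_poisson(X, y, n_iter=50, tol=1e-10):
    """Fisher scoring / IRLS for a log-link model with V(mu) = mu.

    Returns (beta_hat, phi_hat). phi cancels out of the score equations,
    so it is estimated afterwards from the Pearson residuals.
    """
    n, p = X.shape
    beta = np.zeros(p)
    beta[0] = np.log(y.mean() + 1e-8)  # crude intercept start
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        # For the log link, d(mu)/d(eta) = mu, so the IRLS working
        # weights (dmu/deta)^2 / V(mu) simplify to mu.
        w = mu
        z = eta + (y - mu) / mu  # working response
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    mu = np.exp(X @ beta)
    phi = np.sum((y - mu) ** 2 / mu) / (n - p)  # Pearson-based dispersion
    return beta, phi

# Simulated overdispersed counts (negative binomial, so Var > mean):
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
mu_true = np.exp(0.5 + 0.8 * x)
y = rng.negative_binomial(2.0, 2.0 / (2.0 + mu_true)).astype(float)

beta_hat, phi_hat = fit_quasi_poisson(X, y)
```

Because the data are genuinely overdispersed, the recovered coefficients sit near the true values (0.5, 0.8) while the Pearson dispersion estimate comes out well above 1.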
Interpreting Quasi-likelihood Results
Parameter Estimates and Inference
The regression coefficients from quasi-likelihood have the same interpretation as in standard GLMs. For example, with a log link, $e^{\beta_j}$ still represents the multiplicative change in the expected response for a one-unit increase in predictor $x_j$, holding other predictors constant.
What changes is the precision of those estimates. The estimated dispersion parameter $\hat\phi$ tells you how much extra variation exists:
- $\hat\phi \approx 1$: no meaningful overdispersion (a standard GLM would have been fine)
- $\hat\phi > 1$: overdispersion is present; standard errors need inflating
- $\hat\phi < 1$: underdispersion (less common, but possible)
Standard errors under quasi-likelihood are inflated by a factor of $\sqrt{\hat\phi}$ compared to the nominal GLM standard errors. This means confidence intervals get wider and p-values get larger, reflecting the genuine uncertainty that overdispersion introduces. Alternatively, sandwich (robust) standard errors can be used, which don't rely on the variance function being exactly correct.
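In practice the adjustment is just a rescaling. A schematic with made-up numbers (both the dispersion value and the nominal standard errors are illustrative):

```python
import math

phi_hat = 2.5                    # estimated dispersion (illustrative value)
nominal_se = [0.04, 0.11, 0.07]  # standard errors from a plain Poisson fit

# Quasi-likelihood standard errors: multiply each nominal SE by sqrt(phi_hat)
quasi_se = [se * math.sqrt(phi_hat) for se in nominal_se]
# Every interval widens by the same factor, sqrt(2.5) ~ 1.58.
```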
For hypothesis testing, you can use quasi-likelihood F-tests to compare nested models. These replace the chi-square-based likelihood ratio tests used in standard GLMs, with the F-distribution accounting for the estimated dispersion parameter.
Goodness-of-fit and Model Evaluation
Two primary statistics assess model fit:
- Deviance: measures the discrepancy between the fitted model and a saturated model (one with a parameter for every observation). Calculated from the quasi-log-likelihood difference.
- Pearson chi-square statistic: $X^2 = \displaystyle\sum_{i=1}^{n} \dfrac{(y_i - \hat\mu_i)^2}{V(\hat\mu_i)}$, comparing observed to expected values.
Under a well-fitting model, the ratio of either statistic to its degrees of freedom should be close to 1. Ratios substantially greater than 1 suggest remaining overdispersion or structural misfit.
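The ratio check is a one-liner. The sketch below (helper name is this example's own) computes $X^2$ over its residual degrees of freedom for two toy data sets: one genuinely Poisson around the fitted means, one overdispersed:

```python
import numpy as np

def pearson_ratio(y, mu_hat, n_params, variance=lambda m: m):
    """Pearson X^2 divided by its residual degrees of freedom."""
    x2 = np.sum((y - mu_hat) ** 2 / variance(mu_hat))
    return x2 / (len(y) - n_params)

rng = np.random.default_rng(1)
mu_hat = np.full(500, 3.0)

# Data that really are Poisson around mu_hat: ratio should sit near 1.
ratio_ok = pearson_ratio(rng.poisson(mu_hat), mu_hat, n_params=1)

# Overdispersed data with the same mean (geometric, variance 12 vs mean 3):
y_over = rng.negative_binomial(1, 1.0 / 4.0, size=500)
ratio_over = pearson_ratio(y_over, mu_hat, n_params=1)
```

The first ratio lands near 1, the second well above it, matching the rule of thumb in the text.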
For diagnostics, plot standardized Pearson residuals against fitted values and against each predictor. Patterns in these plots (curves, funnels, clusters) indicate that the mean or variance function may be misspecified.
For model comparison, standard AIC doesn't apply directly because there's no true likelihood. Instead, use:
- QAIC (quasi-AIC): $\mathrm{QAIC} = -\dfrac{2\,\hat{Q}}{\hat\phi} + 2p$, where $\hat{Q}$ is the maximized quasi-log-likelihood
- QBIC (quasi-BIC): similar adjustment using the BIC penalty
These criteria let you compare competing quasi-likelihood models on the same dataset, balancing fit against complexity.
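A sketch of the QAIC comparison for quasi-Poisson models, where the Poisson log-likelihood plays the role of $\hat{Q}$. The data, fitted means, and dispersion value below are invented for illustration; the convention assumed here is that $\hat\phi$ is estimated once, from the richest candidate model, and reused for every model compared:

```python
from math import lgamma, log

def quasi_poisson_loglik(y, mu):
    """Poisson log-likelihood, which serves as the quasi-log-likelihood
    for the V(mu) = mu variance function."""
    return sum(yi * log(mi) - mi - lgamma(yi + 1) for yi, mi in zip(y, mu))

def qaic(y, mu, n_params, phi):
    """QAIC = -2 * quasi-log-likelihood / phi + 2 * n_params."""
    return -2.0 * quasi_poisson_loglik(y, mu) / phi + 2 * n_params

# Hypothetical fitted means from two nested models on the same data:
y = [2, 5, 1, 7, 3]
mu_small = [3.6] * 5                  # intercept-only fit
mu_large = [2.4, 4.8, 1.5, 6.2, 3.1]  # fit with one predictor
phi_hat = 1.8                         # dispersion from the larger model

qaic_small = qaic(y, mu_small, n_params=1, phi=phi_hat)
qaic_large = qaic(y, mu_large, n_params=2, phi=phi_hat)
```

Here the larger model's better fit outweighs its extra parameter, so it gets the lower (better) QAIC.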
Robustness and Efficiency of Quasi-likelihood Estimates
Robustness to Model Misspecification
The central appeal of quasi-likelihood is its robustness. Because you never commit to a full distribution, there's less that can go wrong. Specifically:
- If the mean function is correctly specified, the parameter estimates remain consistent even if the variance function is wrong. You'll lose some efficiency, but the estimates still converge to the right values.
- If the variance function is also correct, you get valid standard errors and test statistics directly from the model output.
- If the variance function is wrong but the mean is right, sandwich standard errors still give you valid inference.
This layered protection makes quasi-likelihood attractive for count data where the true data-generating process might involve zero-inflation, clustering, or other complications that are hard to model explicitly.
Efficiency and Finite-sample Performance
Efficiency depends on how well your variance function matches reality:
- With mild overdispersion ($\hat\phi$ slightly above 1), quasi-likelihood estimates are nearly as efficient as maximum likelihood estimates from a correctly specified model. The cost of not specifying the full distribution is minimal.
- With severe overdispersion, quasi-likelihood can actually outperform maximum likelihood if the assumed distribution is wrong. A misspecified Poisson model, for instance, will produce misleadingly small standard errors, while quasi-Poisson correctly inflates them.
In finite samples (as opposed to the asymptotic theory), performance depends on sample size, the degree of overdispersion, and how well the chosen link and variance functions match the data. Simulation studies are the standard way to evaluate this for a specific application. Key things to vary in such studies include sample size, the true $\phi$, and the form of the variance function.
Sensitivity analysis is also good practice: fit the model with different variance functions (e.g., $V(\mu) = \mu$ vs. $V(\mu) = \mu^2$) and check whether your substantive conclusions change. If they're stable across reasonable choices, you can be more confident in your results.
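Such a check can be run with a single log-link fitter that accepts the variance function as an argument. This is a bare-bones sketch (function name and simulation settings are this example's own assumptions, and there is no convergence monitoring):

```python
import numpy as np

def fit_log_link(X, y, variance, n_iter=100):
    """Quasi-likelihood fit with a log link and a user-chosen V(mu)."""
    n, p = X.shape
    beta = np.zeros(p)
    beta[0] = np.log(y.mean())
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        w = mu ** 2 / variance(mu)    # IRLS weights: (dmu/deta)^2 / V(mu)
        z = X @ beta + (y - mu) / mu  # working response for the log link
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ z)
    return beta

rng = np.random.default_rng(2)
x = rng.normal(size=3000)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.3 + 0.6 * x)).astype(float)

b_poisson = fit_log_link(X, y, variance=lambda m: m)       # V(mu) = mu
b_gamma = fit_log_link(X, y, variance=lambda m: m ** 2)    # V(mu) = mu^2
# Stable slope estimates across the two variance choices support the
# substantive conclusion about the predictor's effect.
```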