Goodness-of-fit measures help you evaluate how well a GLM explains your data and makes predictions. In logistic and Poisson regression, you can't just look at residual sum of squares like in OLS. Instead, you rely on deviance, likelihood ratio tests, pseudo-R-squared values, ROC curves, and diagnostics for overdispersion to judge model quality and guide model selection.
Deviance for Model Fit
Measuring Lack of Fit
Deviance measures how far your fitted model is from a perfect fit. It plays the same role in GLMs that residual sum of squares plays in ordinary linear regression. Smaller deviance means better fit.
Formally, deviance equals twice the log-likelihood ratio between your fitted model and a saturated model (a model with one parameter per observation that fits the data exactly):

D = 2 [ℓ(saturated) − ℓ(fitted)]

where ℓ denotes the log-likelihood.
Because the saturated model's log-likelihood is the best you can achieve, deviance is always non-negative.
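To make the definition concrete, here's a minimal sketch that computes the deviance of a Poisson fit by hand (for Poisson data, the formula above reduces to D = 2 Σ [yᵢ ln(yᵢ/μ̂ᵢ) − (yᵢ − μ̂ᵢ)]). The observed counts and fitted means are invented for illustration:

```python
import math

# Toy observed counts and fitted means from a hypothetical Poisson GLM.
y = [2, 5, 1, 8, 3]
mu = [2.6, 4.1, 1.9, 6.8, 3.4]

# Poisson deviance: D = 2 * sum[ y*log(y/mu) - (y - mu) ],
# with the convention that y*log(y/mu) = 0 when y = 0.
def poisson_deviance(y, mu):
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total

print(round(poisson_deviance(y, mu), 3))  # → 1.101
```

Note that plugging the observations in as their own fitted means (`poisson_deviance(y, y)`) gives exactly 0, matching the saturated-model interpretation.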
Comparing Null and Residual Deviances
Two deviance values appear in standard GLM output:
- Null deviance: the deviance of the intercept-only model (no predictors). This is your baseline.
- Residual deviance: the deviance of your proposed model with predictors included.
Comparing these two tells you how much explanatory power your predictors add:
- A large drop (e.g., null deviance of 200 down to residual deviance of 80) means the predictors are capturing a lot of the variation in the response.
- A small drop (e.g., 200 down to 190) suggests the predictors aren't contributing much beyond the intercept.
The difference between null and residual deviance is itself a test statistic, which connects directly to the likelihood ratio test below.
Likelihood Ratio Test for Models
Comparing Nested Models
The likelihood ratio test (LRT) compares two nested models, where the simpler (reduced) model is a special case of the more complex (full) model. "Nested" means the reduced model's parameters are a subset of the full model's parameters.
Here's how it works:
1. Fit both the reduced and full models.
2. Compute the LRT statistic as the difference in their deviances: G = D_reduced − D_full = 2 [ℓ(full) − ℓ(reduced)].
3. Under the null hypothesis (that the extra parameters in the full model are all zero), G follows a chi-squared distribution: G ~ χ²_df.
4. The degrees of freedom df equal the difference in the number of parameters between the two models.
5. If the p-value is small (e.g., below 0.05), reject the null and conclude the full model fits significantly better.
Applications of LRT
- Variable selection: Test whether adding a predictor (or group of predictors) significantly improves fit.
- Interaction terms: Compare a model with and without an interaction to see if it's warranted.
- Overall model significance: Compare your full model against the intercept-only model. This is equivalent to comparing null deviance to residual deviance.
- Stepwise model building: Sequentially add or remove predictors, using the LRT at each step to decide whether the change is justified.
For example, if you add three predictors to a logistic regression and the deviance drops by 15.2 on 3 degrees of freedom, you'd compare 15.2 to a χ² distribution with 3 degrees of freedom. The resulting p-value of about 0.0017 indicates those predictors significantly improve fit.
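You can check that arithmetic directly. The sketch below implements the chi-squared survival function for integer degrees of freedom using only the standard library (via the usual recurrence for the regularized upper incomplete gamma function), so no stats package is assumed:

```python
import math

def chi2_sf(x, df):
    """P(X > x) for a chi-squared variable with positive integer df.
    Builds Q(df/2, x/2) from the base cases Q(1/2, y) = erfc(sqrt(y))
    and Q(1, y) = exp(-y) using the recurrence
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1)."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a + 1e-9 < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Deviance drop of 15.2 on 3 degrees of freedom:
p = chi2_sf(15.2, 3)
print(round(p, 4))  # ≈ 0.0017, well below 0.05
```

Since 15.2 far exceeds the 0.05 critical value of χ²₃ (about 7.81), the conclusion matches the eyeball check.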
Predictive Accuracy of GLMs

Pseudo-R-Squared Measures
In linear regression, R² has a clean interpretation as the proportion of variance explained. GLMs don't have an exact equivalent, but several pseudo-R-squared measures approximate the idea:
- McFadden's pseudo-R-squared: 1 − ℓ(fitted)/ℓ(null), where ℓ denotes log-likelihood. Values between 0.2 and 0.4 are often considered a good fit (the scale doesn't map directly onto OLS R²).
- Cox and Snell's: Based on the likelihood ratio but bounded below 1, so it can never actually reach 1.0.
- Nagelkerke's: A rescaled version of Cox and Snell's that can reach 1.0, making it easier to interpret.
A McFadden's value of 0.3 means the fitted model's log-likelihood is 30% of the way from the null model's log-likelihood toward the maximum attainable. That's decent, but don't compare pseudo-R-squared values across different types (McFadden vs. Nagelkerke), since they're on different scales. Use them alongside other diagnostics, not as a standalone verdict.
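As a quick illustration, McFadden's measure is a one-liner given the two log-likelihoods; the values below are hypothetical:

```python
def mcfadden_r2(ll_model, ll_null):
    # McFadden's pseudo-R-squared: 1 - l(model)/l(null).
    return 1.0 - ll_model / ll_null

# Hypothetical log-likelihoods from a logistic regression:
ll_null = -120.0   # intercept-only model
ll_model = -84.0   # model with predictors

print(round(mcfadden_r2(ll_model, ll_null), 3))  # → 0.3
```

An intercept-only model scores exactly 0 by this measure, since its log-likelihood equals the null log-likelihood.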
ROC Curves and AUC
For binary response models (logistic regression), ROC curves visualize how well the model discriminates between the two outcome classes.
- The x-axis plots the false positive rate (1 − specificity): the proportion of actual negatives incorrectly classified as positive.
- The y-axis plots the true positive rate (sensitivity): the proportion of actual positives correctly classified.
- Each point on the curve corresponds to a different classification threshold (the cutoff probability above which you predict "positive").
The area under the ROC curve (AUC) summarizes discriminatory power in a single number:
| AUC Range | Interpretation |
|---|---|
| 0.9–1.0 | Excellent discrimination |
| 0.8–0.9 | Good discrimination |
| 0.7–0.8 | Acceptable |
| 0.5–0.6 | Barely better than random guessing |
| 0.5 | No discrimination (random) |
An AUC of 0.5 means the model is no better than flipping a coin. An AUC of 0.85 means that if you randomly pick one positive and one negative case, the model assigns a higher predicted probability to the positive case 85% of the time.
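That pairwise interpretation doubles as a way to compute AUC without plotting anything: compare every (positive, negative) pair and count how often the positive case gets the higher predicted probability, with ties counted as half. A sketch with made-up scores and labels:

```python
def auc_by_pairs(scores, labels):
    """AUC as P(score of a positive > score of a negative), ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities and true labels:
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,    0,   1,   0]
print(auc_by_pairs(scores, labels))  # → 0.75
```

The O(n²) pair loop is fine for illustration; production code typically uses a rank-based formula instead.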
Cross-Validation Techniques
Cross-validation estimates how well your model will perform on new, unseen data. This is important because a model can fit the training data well but generalize poorly (overfitting).
k-fold cross-validation works as follows:
- Split the data randomly into k equally sized subsets (folds).
- For each fold, train the model on the other k − 1 folds and evaluate predictions on the held-out fold.
- Repeat for all folds.
- Average the performance metric (e.g., deviance, AUC, classification accuracy) across all folds.
Common choices are k = 5 or k = 10. Leave-one-out cross-validation (LOOCV) is the special case where k equals the number of observations, so each observation serves as its own test set once. LOOCV has low bias but can be computationally expensive and have high variance.
If your cross-validated performance is much worse than your training performance, that's a sign of overfitting.
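The folding logic itself is just index bookkeeping. Here's a sketch of k-fold splitting in plain Python; the model fitting and scoring you'd plug in around it are omitted:

```python
import random

def kfold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed for reproducibility
    fold_size, extra = divmod(n, k)
    start = 0
    for i in range(k):
        # First `extra` folds absorb the remainder when k doesn't divide n.
        stop = start + fold_size + (1 if i < extra else 0)
        test = idx[start:stop]
        train = idx[:start] + idx[stop:]
        yield train, test
        start = stop

# Every observation lands in exactly one test fold:
folds = list(kfold_indices(20, 5))
all_test = sorted(i for _, test in folds for i in test)
print(all_test == list(range(20)))  # → True
```

For each `(train, test)` pair you'd refit the GLM on `train` and record deviance, AUC, or accuracy on `test`, then average across folds.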
Overdispersion in GLMs
Understanding Overdispersion
Overdispersion occurs when the observed variance in the response exceeds what the model assumes. This is especially relevant for Poisson regression, where the model assumes the variance equals the mean (Var(Y) = μ). In practice, the observed variance is often larger than μ.
Why it matters: overdispersion doesn't bias your coefficient estimates, but it does cause standard errors to be too small. That means confidence intervals are too narrow, test statistics are inflated, and you'll find "significant" predictors that aren't truly significant. Your inferences become unreliable.
Diagnosing Overdispersion
Visual diagnostics:
- On a residuals vs. fitted values plot, overdispersion often shows up as a fan-shaped pattern where residual spread increases with fitted values.
- A scale-location plot (√|standardized residuals| vs. fitted values) makes non-constant variance easier to spot: you'll see an upward trend instead of a flat band.
Formal tests:
- Dispersion parameter check: Divide the residual deviance (or Pearson chi-square statistic) by the residual degrees of freedom. For a well-fitting Poisson model, this ratio should be close to 1. A value noticeably greater than 1 (say, 2.5 or higher) signals overdispersion.
- Pearson chi-square test: Compares observed and expected frequencies. A significant result suggests the Poisson variance assumption is violated.
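The dispersion check is a few lines once you have fitted means. A sketch using the Pearson statistic, with invented counts, fitted means, and parameter count:

```python
# Hypothetical observed counts and fitted means from a Poisson GLM.
y  = [0, 3, 1, 7, 2, 9, 4, 1, 6, 2]
mu = [1.1, 2.4, 1.8, 4.0, 2.2, 4.8, 3.1, 1.5, 3.9, 2.2]
n_params = 2  # e.g., intercept plus one predictor

# Pearson chi-square statistic: sum of (y - mu)^2 / mu.
pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))

# Dispersion ratio: Pearson statistic over residual degrees of freedom.
# Should be near 1 for a well-fitting Poisson model.
phi = pearson / (len(y) - n_params)
print(round(phi, 2))  # → 1.14 here: close to 1, little sign of overdispersion
```

The same ratio can be formed with the residual deviance in the numerator; both versions should be near 1 when the Poisson variance assumption holds.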
Addressing Overdispersion
If you detect overdispersion, you have several options:
- Quasi-Poisson model: Introduces a dispersion parameter φ that scales the variance as Var(Y) = φμ. The coefficient estimates stay the same as standard Poisson, but the standard errors are multiplied by √φ, giving you more honest inference.
- Negative binomial regression: Models the variance as Var(Y) = μ + αμ², where α is an overdispersion parameter estimated from the data. This is a full likelihood-based model (unlike quasi-Poisson) and works well when overdispersion is substantial.
- Adding predictors or interactions: Sometimes overdispersion is caused by omitted variables. Including relevant covariates can capture the excess variability and bring the dispersion closer to what the model expects.
For example, in a study of doctor visit counts, a Poisson model might show a dispersion ratio of 3.1, indicating serious overdispersion. Switching to negative binomial regression, or adding covariates like age and number of chronic conditions, could reduce the excess variability and produce more trustworthy results.
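As a sketch of the quasi-Poisson adjustment under a scenario like the one above, suppose the estimated dispersion ratio φ is 3.1; the coefficient table below is entirely hypothetical:

```python
import math

phi = 3.1  # estimated dispersion ratio from the Poisson fit

# Hypothetical Poisson coefficients and their (too-small) standard errors.
coefs = {"intercept": 0.42, "age": 0.031, "chronic_conditions": 0.27}
ses   = {"intercept": 0.10, "age": 0.008, "chronic_conditions": 0.06}

# Quasi-Poisson: same point estimates, standard errors scaled by sqrt(phi).
adj_ses = {name: se * math.sqrt(phi) for name, se in ses.items()}

for name in coefs:
    z = coefs[name] / adj_ses[name]
    print(f"{name}: se {ses[name]:.3f} -> {adj_ses[name]:.3f}, z = {z:.2f}")
```

With φ = 3.1 every standard error grows by a factor of about 1.76, so z-statistics shrink accordingly; predictors that looked strongly significant under the naive Poisson fit may become borderline.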