🥖Linear Modeling Theory Unit 14 Review

14.3 Poisson Regression for Count Data

Written by the Fiveable Content Team • Last updated August 2025

Poisson regression models the relationship between predictors and the number of events occurring in a fixed interval. It uses a log link function to ensure predictions stay non-negative, which is exactly what you need for count data. This technique shows up constantly in epidemiology, quality control, traffic analysis, and anywhere else you're counting how often something happens.

Count Data and the Poisson Distribution

Properties of Count Data

Count data are non-negative integers representing how many times an event occurs within a fixed interval of time or space. Think of the number of cars passing through a toll booth per hour, or the number of defective items in a batch of manufactured products. These values are discrete: they can only be whole numbers (0, 1, 2, etc.), never fractions.

This matters for modeling because you can't use ordinary linear regression here. OLS regression could predict negative counts or fractional counts, neither of which makes sense. That's why we need a model built specifically for this type of data.

The Poisson Distribution

The Poisson distribution is a discrete probability distribution that gives the probability of a certain number of events occurring in a fixed interval, assuming events happen at a known constant rate and independently of each other.

It's characterized by a single parameter, λ (lambda), which represents the average number of events per interval. The probability mass function is:

P(X = k) = \frac{\lambda^k \cdot e^{-\lambda}}{k!}

where X is the random variable for the number of events, k is a specific count value, λ is the average rate, and e ≈ 2.71828.
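
The formula translates directly into code. Here is a minimal sketch of the Poisson probability mass function using only the Python standard library (the function name and example rate are illustrative, not from the text):

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing exactly k events when the average rate is lam.
    Implements P(X = k) = lam^k * e^(-lam) / k!"""
    return (lam ** k) * math.exp(-lam) / math.factorial(k)

# With an average of 3 events per interval, the probability of seeing
# exactly 2 events is 9 * e^(-3) / 2, about 0.224:
p = poisson_pmf(2, 3.0)
```

Summing `poisson_pmf(k, lam)` over all k yields 1, as it must for any probability distribution.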

A unique and important property: the mean and variance of the Poisson distribution are both equal to λ. This equidispersion assumption will come back later when we discuss overdispersion.

The Poisson distribution is appropriate when:

  • Events are relatively rare within the interval
  • The rate of occurrence is constant across the interval
  • Events occur independently of one another

Examples include the number of earthquakes in a region per year or the number of mutations in a DNA sequence per generation.

Poisson Regression Models


Formulation of Poisson Regression Models

Poisson regression is a generalized linear model (GLM) used when the response variable is a count that follows (or approximately follows) a Poisson distribution. The model relates the mean of the response variable μ to predictors through a log link function:

\log(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p

where β₀ is the intercept and β₁, β₂, …, βₚ are coefficients for the predictor variables X₁, X₂, …, Xₚ.

Why the log link? Because exponentiating both sides gives μ = e^(β₀ + β₁X₁ + ⋯), which is always positive. This guarantees predicted counts are non-negative, no matter what values the predictors take.
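
A tiny sketch makes the positivity guarantee concrete. The coefficient values here are made up for illustration:

```python
import math

# Hypothetical fitted coefficients (illustrative values, not from a real fit)
beta0, beta1 = 0.5, 0.3

def predicted_count(x1: float) -> float:
    """Invert the log link: mu = exp(beta0 + beta1 * x1)."""
    return math.exp(beta0 + beta1 * x1)

# exp of any real number is strictly positive, so the predicted count can
# never be negative, even for extreme predictor values:
tiny_but_positive = predicted_count(-100)
```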

Estimation of Poisson Regression Models

The coefficients are estimated using maximum likelihood estimation (MLE), which finds the parameter values that make the observed data most probable.

Here's how the process works:

  1. Write the likelihood function as the product of Poisson probabilities across all observations, given the predictors and coefficients.
  2. Take the natural logarithm to get the log-likelihood function. This is mathematically equivalent (same maximum) but much easier to work with because products become sums.
  3. Use an iterative numerical algorithm to find the coefficient values that maximize the log-likelihood. Common algorithms include Newton-Raphson and Fisher scoring.

Unlike OLS regression, there's no closed-form solution here. The software iterates until the estimates converge.
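
The three steps above can be sketched in pure Python for a one-predictor model. A convenient fact: for the Poisson model with its canonical log link, the observed and expected information coincide, so Newton-Raphson and Fisher scoring are the same algorithm here. This is a teaching sketch with made-up data, not a substitute for a statistics library:

```python
import math

def fit_poisson(xs, ys, iters=25):
    """Newton-Raphson fit of log(mu_i) = b0 + b1 * x_i by maximizing
    the Poisson log-likelihood. Pure standard-library sketch."""
    # Common GLM starting point: intercept at the log of the mean count
    b0 = math.log(sum(ys) / len(ys))
    b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)
            g0 += y - mu           # score (gradient) for b0
            g1 += (y - mu) * x     # score (gradient) for b1
            h00 += mu              # information matrix entries
            h01 += mu * x
            h11 += mu * x * x
        # Newton step: solve the 2x2 system H * step = g by hand
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Illustrative counts that grow roughly exponentially with x
xs = [0, 1, 2, 3, 4, 5]
ys = [2, 3, 4, 5, 7, 9]
b0, b1 = fit_poisson(xs, ys)
```

At convergence the score equations are satisfied: the residuals y − μ sum to zero, as do the x-weighted residuals.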

Interpreting Poisson Regression Coefficients


Interpretation in Terms of Log of Expected Count

Each coefficient βⱼ represents the change in the log of the expected count for a one-unit increase in Xⱼ, holding all other predictors constant.

For example, if β = 0.3, then increasing that predictor by one unit raises the log expected count by 0.3. This isn't very intuitive on its own, which is why we usually exponentiate.

Interpretation in Terms of Expected Count

Exponentiating the coefficient gives you the multiplicative effect on the expected count. This is the interpretation you'll use most often.

Binary predictor example: If β = 0.5 for a binary variable, then e^0.5 ≈ 1.65. The group coded as 1 has an expected count 1.65 times that of the reference group, a 65% increase, holding other predictors constant.

Continuous predictor example: If β = 0.1 for a continuous variable, then e^0.1 ≈ 1.11. Each one-unit increase in that predictor multiplies the expected count by 1.11, an 11% increase, holding other predictors constant.

The general pattern: e^β > 1 means the predictor increases the expected count, e^β < 1 means it decreases it, and e^β = 1 (i.e., β = 0) means no effect.
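
The two worked examples above reduce to a couple of lines of code. The coefficient values are the same illustrative ones used in the text:

```python
import math

beta_binary = 0.5   # illustrative coefficient for a binary predictor
beta_cont = 0.1     # illustrative coefficient for a continuous predictor

# Exponentiating gives the multiplicative effect (rate ratio)
rr_binary = math.exp(beta_binary)  # about 1.65: a ~65% higher expected count
rr_cont = math.exp(beta_cont)      # about 1.11: ~11% more per one-unit increase

# Effects compound multiplicatively: a 3-unit increase multiplies the
# expected count by exp(beta)^3
rr_three_units = math.exp(beta_cont * 3)
```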

Poisson Regression Model Fit

Goodness-of-Fit Measures

Assessing fit means checking how well the model's predicted counts match the observed data and whether the model's assumptions hold.

  • Deviance statistic: Measures the discrepancy between observed and expected counts under the model. Smaller deviance = better fit.
  • Pearson chi-square statistic: Another measure of observed-vs-expected discrepancy. Like deviance, smaller values indicate better fit.
  • Likelihood ratio test: Compares nested Poisson models. The test statistic is the difference in deviances between the two models, and it follows a chi-square distribution with degrees of freedom equal to the difference in the number of parameters.

A quick rule of thumb: if the deviance (or Pearson chi-square) is roughly equal to the residual degrees of freedom, the model fits reasonably well.
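
Both statistics are simple sums over observations, so the rule of thumb is easy to check by hand. The observed counts and fitted means below are illustrative, not from a real fit:

```python
import math

def poisson_deviance(ys, mus):
    """Residual deviance: 2 * sum[y * log(y/mu) - (y - mu)],
    with the convention that y * log(y/mu) = 0 when y = 0."""
    d = 0.0
    for y, mu in zip(ys, mus):
        term = y * math.log(y / mu) if y > 0 else 0.0
        d += term - (y - mu)
    return 2 * d

def pearson_chi2(ys, mus):
    """Pearson chi-square: sum of (y - mu)^2 / mu."""
    return sum((y - mu) ** 2 / mu for y, mu in zip(ys, mus))

# Illustrative observed counts and fitted means
ys = [2, 3, 4, 5, 7, 9]
mus = [2.2, 2.9, 3.9, 5.2, 6.9, 9.1]
n_params = 2                              # intercept + one slope
df = len(ys) - n_params                   # residual degrees of freedom
ratio = poisson_deviance(ys, mus) / df    # near 1 suggests an adequate fit
```

A perfect fit (every μ equal to its y) gives a deviance of exactly zero.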

Overdispersion and Alternative Models

Overdispersion occurs when the variance of the response exceeds its mean, violating the Poisson assumption that Var(Y) = μ.

To detect it, compare the deviance or Pearson chi-square to the residual degrees of freedom. If the ratio is substantially greater than 1, overdispersion is likely present. For instance, a deviance of 200 with 100 degrees of freedom gives a ratio of 2, suggesting the variance is roughly twice the mean.

Why does this matter? Overdispersion doesn't bias your coefficient estimates, but it makes the standard errors too small. That means confidence intervals are too narrow and p-values are too optimistic, so you'll find "significant" effects that aren't really there.

Two common alternatives when overdispersion is present:

  • Negative binomial regression: Adds an extra parameter that allows the variance to exceed the mean. This is a separate model with its own likelihood.
  • Quasi-Poisson regression: Estimates a dispersion parameter from the data and inflates the standard errors accordingly. It's not a full likelihood-based model but a practical adjustment.
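
The quasi-Poisson adjustment is mechanically simple, as this sketch shows: estimate the dispersion parameter φ from the Pearson chi-square, then multiply every standard error by √φ. The numbers below are illustrative:

```python
import math

def dispersion(ys, mus, n_params):
    """Estimate the dispersion parameter:
    phi = Pearson chi-square / residual degrees of freedom."""
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(ys, mus))
    return chi2 / (len(ys) - n_params)

def adjust_se(se_poisson, phi):
    """Quasi-Poisson standard error: inflate the Poisson SE by sqrt(phi)."""
    return se_poisson * math.sqrt(phi)

# Illustrative: with phi = 2 (variance roughly twice the mean), every
# standard error grows by sqrt(2) ~ 1.41, widening confidence intervals
phi = 2.0
se_adjusted = adjust_se(0.10, phi)
```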

Finally, residual diagnostics remain important. Plot Pearson residuals or deviance residuals against fitted values and predictor variables. Look for systematic patterns (which suggest a missing predictor or wrong functional form) and outliers (which may indicate influential observations or model violations).