Fiveable

🎳Intro to Econometrics Unit 11 Review


11.5 Count data models


Written by the Fiveable Content Team • Last updated August 2025

Count data models are essential tools in econometrics for analyzing dependent variables that take non-negative integer values. These models address a real problem: when your outcome variable is a count (0, 1, 2, 3...), ordinary linear regression can produce nonsensical predictions like negative counts or fractional values, and its standard error estimates will be wrong.

This topic covers the core count data models (Poisson and negative binomial regression), how to handle datasets with too many zeros (zero-inflated and hurdle models), and how to interpret and compare these models.

Count data models overview

Count data models are designed for dependent variables that represent counts: things like the number of doctor visits per year, traffic accidents at an intersection, or patent citations. These outcomes are discrete (whole numbers) and bounded below at zero, which violates key assumptions of OLS regression.

Why not just use OLS? Three reasons:

  • OLS assumes the dependent variable is continuous and can take any value, including negatives. Counts can't be negative.
  • OLS assumes constant variance (homoskedasticity), but with count data the variance typically increases with the mean.
  • OLS can generate predicted values below zero, which makes no sense for counts.

The main count data models you need to know are Poisson regression, negative binomial regression, zero-inflated models, and hurdle models. Each relaxes different assumptions to handle different features of real count data.

Poisson regression model

Poisson distribution assumptions

The Poisson regression model assumes the dependent variable follows a Poisson distribution, which is defined by a single parameter $\lambda$ that represents both the mean and the variance. The key assumptions are:

  • Events occur independently of one another
  • Events occur at a constant average rate within a given interval (of time, space, etc.)
  • The distribution is characterized entirely by $\lambda$

Because $\lambda$ controls everything, the Poisson distribution works best for relatively rare events where counts tend to be low.

Poisson probability mass function

The probability of observing a specific count $y$ is given by the probability mass function (PMF):

$$P(Y = y \mid \lambda) = \frac{e^{-\lambda} \lambda^y}{y!}$$

where $y = 0, 1, 2, \ldots$ and $\lambda > 0$.

To connect $\lambda$ to your explanatory variables, Poisson regression uses a log-linear link function:

$$\log(\lambda) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_k x_k$$

The log link ensures that $\lambda$ is always positive (since $e$ raised to any power is positive), which makes sense because expected counts can't be negative.
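The link function and PMF are easy to verify numerically. A minimal sketch, using illustrative coefficient and covariate values (not from any fitted model):

```python
import numpy as np
from scipy.stats import poisson

# Hypothetical coefficients and covariates (illustrative values only)
beta = np.array([0.5, 0.3, -0.2])   # beta0, beta1, beta2
x = np.array([1.0, 2.0, 1.5])       # includes the intercept term x0 = 1

# Log-linear link: lambda = exp(x'beta) is positive for any beta
lam = np.exp(x @ beta)

# Probability of observing each count y = 0..4 under Poisson(lambda)
probs = poisson.pmf(np.arange(5), lam)

print(lam)       # positive regardless of the sign of x'beta
print(probs[0])  # equals exp(-lam), the chance of a zero count
```

Flipping the signs of the coefficients changes $\lambda$ but can never make it negative, which is exactly what the log link buys you.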

Equidispersion property

A defining feature of the Poisson distribution is equidispersion: the mean equals the variance.

$$E(Y) = Var(Y) = \lambda$$

This is a strong assumption, and it's frequently violated in practice. Most real-world count data exhibit overdispersion, where the variance exceeds the mean. Less commonly, you'll see underdispersion (variance less than the mean).

When overdispersion is present but you use Poisson regression anyway, the coefficient estimates themselves remain consistent, but the standard errors will be too small. That means confidence intervals are too narrow and p-values are too low, leading you to find "significant" effects that may not actually be significant.

Maximum likelihood estimation

Poisson regression is estimated using maximum likelihood estimation (MLE). MLE finds the parameter values that make the observed data most probable.

The log-likelihood function is:

$$\log L(\beta) = \sum_{i=1}^{n} \left[ y_i \log(\lambda_i) - \lambda_i - \log(y_i!) \right]$$

where $\lambda_i = \exp(\beta_0 + \beta_1 x_{1i} + \ldots + \beta_k x_{ki})$ for each observation $i$. Software maximizes this function numerically to find the $\beta$ estimates.
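You can replicate what the software does with a few lines of numerical optimization. A sketch, not production code: simulate data from known coefficients, write the negative log-likelihood above, and minimize it.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(0)

# Simulate data from a known Poisson regression (illustrative parameters)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))

def neg_loglik(beta):
    # Poisson log-likelihood: sum of y*log(lam) - lam - log(y!)
    eta = X @ beta                     # log(lam), so y*log(lam) = y*eta
    return -(y * eta - np.exp(eta) - gammaln(y + 1)).sum()

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print(res.x)  # should be close to beta_true
```

Note the use of `gammaln(y + 1)` for $\log(y!)$, which avoids overflow for larger counts.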

Negative binomial regression

Negative binomial distribution

The negative binomial (NB) distribution is the go-to alternative when count data show overdispersion. It adds a dispersion parameter $\alpha$ that allows the variance to differ from the mean:

$$Var(Y) = \mu + \alpha \mu^2$$

When $\alpha = 0$, the variance equals the mean and the NB distribution collapses to the Poisson. As $\alpha$ increases, the distribution allows for greater overdispersion.

Overdispersion vs equidispersion

Overdispersion means the observed variance in your count data is larger than the mean. This is extremely common in practice and can arise from:

  • Unobserved heterogeneity: individuals differ in ways your model doesn't capture, creating extra variation
  • Clustering: events are not truly independent (e.g., accidents tend to cluster in bad weather)
  • Excess zeros: more zeros than a Poisson distribution predicts

If you ignore overdispersion and stick with Poisson regression, your standard errors will be biased downward. You'll get overly narrow confidence intervals and reject null hypotheses too often.

Poisson-gamma mixture

The negative binomial distribution has an elegant theoretical foundation: it can be derived as a Poisson-gamma mixture. The idea is that each observation has its own Poisson rate $\lambda_i$, but these rates vary across observations according to a gamma distribution.

This captures unobserved heterogeneity. Different individuals have different underlying rates of experiencing the event, and the gamma distribution models that variation. The result is a distribution with two parameters: the mean $\mu$ and the dispersion parameter $\alpha$.
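The mixture derivation can be checked by simulation: draw each observation's rate from a gamma distribution with mean $\mu$, then draw a Poisson count at that rate. A sketch with illustrative values of $\mu$ and $\alpha$:

```python
import numpy as np

rng = np.random.default_rng(1)

mu, alpha = 3.0, 0.8  # illustrative mean and dispersion
n = 100_000

# Gamma rates with shape 1/alpha and scale alpha*mu have mean mu
rates = rng.gamma(shape=1 / alpha, scale=alpha * mu, size=n)
y = rng.poisson(rates)  # Poisson draw at each observation's own rate

print(y.mean())  # ≈ mu = 3.0
print(y.var())   # ≈ mu + alpha*mu**2 = 10.2 (overdispersed)
print(rng.poisson(mu, size=n).var())  # ≈ 3.0 (equidispersed benchmark)
```

The simulated variance lands near $\mu + \alpha\mu^2$ rather than $\mu$, which is the NB variance formula from the previous subsection.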

Negative binomial probability mass function

In its traditional form, the NB PMF is:

$$P(Y = y \mid r, p) = \binom{y + r - 1}{y} p^r (1-p)^y$$

For regression purposes, this is reparameterized using the mean $\mu$ and dispersion $\alpha$:

$$P(Y = y \mid \mu, \alpha) = \frac{\Gamma\left(y + \frac{1}{\alpha}\right)}{\Gamma(y+1)\,\Gamma\left(\frac{1}{\alpha}\right)} \left(\frac{1}{1 + \alpha\mu}\right)^{\frac{1}{\alpha}} \left(\frac{\alpha\mu}{1 + \alpha\mu}\right)^y$$

You won't typically compute this by hand, but understanding the role of $\alpha$ is important: it's what gives the NB model its flexibility over Poisson.
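The mean-dispersion form maps onto scipy's `nbinom` distribution with $n = 1/\alpha$ and $p = 1/(1 + \alpha\mu)$. A quick check that the formula above and scipy agree ($\mu$ and $\alpha$ values are arbitrary):

```python
import numpy as np
from scipy.stats import nbinom
from scipy.special import gammaln

mu, alpha = 2.5, 0.6  # illustrative mean and dispersion

def nb_pmf(y, mu, alpha):
    # Mean-dispersion parameterization, computed on the log scale for stability
    r = 1.0 / alpha
    log_p = (gammaln(y + r) - gammaln(y + 1) - gammaln(r)
             + r * np.log(1 / (1 + alpha * mu))
             + y * np.log(alpha * mu / (1 + alpha * mu)))
    return np.exp(log_p)

y = np.arange(10)
# scipy's nbinom(n, p) matches with n = 1/alpha, p = 1/(1 + alpha*mu)
scipy_pmf = nbinom.pmf(y, 1 / alpha, 1 / (1 + alpha * mu))
print(np.allclose(nb_pmf(y, mu, alpha), scipy_pmf))  # True
```

Working on the log scale with `gammaln` is the standard trick for evaluating gamma-function ratios without overflow.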


Zero-inflated models

Zero-inflated Poisson (ZIP) model

The zero-inflated Poisson (ZIP) model is designed for count data with more zeros than a standard Poisson distribution can explain. It assumes zeros come from two distinct sources:

  1. A binary process (typically logistic regression) that determines whether an observation is a "certain zero" or a "potential counter"
  2. A Poisson process that generates counts for the "potential counters" (who can still get a zero by chance)

For example, in modeling doctor visits, some people never go to the doctor regardless of circumstances (structural zeros), while others might go but happened not to during the study period (sampling zeros from the Poisson process).
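The two-source structure is easy to simulate, and the resulting share of zeros matches $\pi + (1-\pi)e^{-\lambda}$. A sketch with illustrative values of the mixing probability and Poisson rate:

```python
import numpy as np

rng = np.random.default_rng(2)

pi, lam, n = 0.3, 2.0, 100_000  # illustrative mixing prob and Poisson rate

# Two-part data-generating process: structural zeros with probability pi,
# otherwise a Poisson(lam) draw (which can still be zero by chance)
structural_zero = rng.random(n) < pi
counts = np.where(structural_zero, 0, rng.poisson(lam, size=n))

observed_zero_share = (counts == 0).mean()
implied = pi + (1 - pi) * np.exp(-lam)   # ZIP zero probability
print(observed_zero_share, implied)      # both ≈ 0.39
```

A plain Poisson with the same rate would predict only $e^{-2} \approx 0.135$ zeros, so this simulated data shows exactly the kind of zero inflation a ZIP model is built for.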

Zero-inflated negative binomial (ZINB) model

The ZINB model replaces the Poisson component with a negative binomial distribution. This handles both excess zeros and overdispersion among the positive counts. The structure is the same two-part setup:

  • A binary model for the probability of being a structural zero
  • A negative binomial model for the count process

The ZINB is the most flexible of the standard count models, but that flexibility comes at a cost: more parameters to estimate, which requires larger sample sizes.

Excess zeros in count data

Excess zeros occur when you observe more zeros than your count distribution predicts. These zeros can come from two conceptually different sources:

  • Structural zeros (true zeros): the event is impossible for certain observations. A non-driver will never have a traffic accident, no matter how long you observe them.
  • Sampling zeros (chance zeros): the event is possible but didn't happen during the observation period. A careful driver might have zero accidents this year but could have one next year.

Distinguishing between these two types of zeros is exactly what zero-inflated models are built to do.

Vuong test for model selection

The Vuong test helps you decide whether a zero-inflated model fits significantly better than its standard counterpart (e.g., ZIP vs. Poisson, or ZINB vs. NB).

  • It's a likelihood ratio-based test for comparing non-nested models
  • The test statistic is based on the observation-level differences in log-likelihoods between the two models
  • A significant positive value favors the zero-inflated model; a significant negative value favors the standard model; a non-significant value means neither model is clearly better
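Given the per-observation log-likelihood contributions from each fitted model, the statistic itself is a one-liner. In this sketch the contributions are simulated placeholders rather than output from real fits, purely to show the mechanics:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical per-observation log-likelihoods from two fitted models
ll_zip = rng.normal(-1.4, 0.5, size=500)            # placeholder: ZIP model
ll_pois = ll_zip - rng.normal(0.05, 0.3, size=500)  # placeholder: Poisson model

m = ll_zip - ll_pois  # observation-level log-likelihood differences
vuong = np.sqrt(len(m)) * m.mean() / m.std(ddof=1)
print(vuong)  # |V| > 1.96 favors one model at the 5% level
```

In practice you would pull the per-observation log-likelihoods from your estimation software rather than simulating them.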

Hurdle models

Hurdle Poisson model

The hurdle Poisson model takes a different approach to excess zeros than zero-inflated models. Instead of assuming two types of zeros, it treats all zeros as coming from one process and all positive counts from another.

The model has two parts:

  1. A binary model (usually logistic) for whether the count is zero or positive (whether the "hurdle" is crossed)
  2. A truncated Poisson model for the positive counts (1, 2, 3, ...)

Different covariates can affect each part. For instance, whether someone visits a doctor at all might depend on insurance status, while how many times they visit might depend on health conditions.

Hurdle negative binomial model

The hurdle negative binomial model swaps in a truncated negative binomial for the count component, allowing for overdispersion among positive counts. The structure is identical:

  • Binary model for zero vs. positive
  • Truncated negative binomial for positive counts

This is the most flexible hurdle specification and is useful when positive counts show more variance than a truncated Poisson can handle.

Two-part model structure

Hurdle models are sometimes called two-part models because they cleanly separate two decisions:

  1. Does the event happen at all? (Binary part)
  2. If yes, how many times? (Truncated count part)

The binary part is estimated with logistic regression. The truncated count part is estimated with MLE on the subset of positive observations. Each part can have its own set of covariates and parameters.
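The two parts can be estimated separately by MLE. A self-contained sketch on simulated data (all parameter values hypothetical), fitting a logit for the hurdle and a truncated Poisson for the positive counts:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln
from scipy.stats import poisson

rng = np.random.default_rng(5)
n = 5000
x = rng.normal(size=n)

# Part 1: does the count clear the hurdle? (hypothetical parameters)
gamma_true = np.array([-0.2, 0.8])
positive = rng.random(n) < expit(gamma_true[0] + gamma_true[1] * x)

# Part 2: truncated-Poisson counts for observations that cleared it,
# drawn by inverse CDF restricted to q in (P(Y=0), 1]
beta_true = np.array([0.5, 0.3])
lam = np.exp(beta_true[0] + beta_true[1] * x[positive])
p0 = poisson.pmf(0, lam)
u = rng.uniform(size=positive.sum())
y_pos = poisson.ppf(p0 + u * (1 - p0), lam)  # draws from Y | Y > 0

# Fit the binary part by logistic MLE
def nll_logit(g):
    eta = g[0] + g[1] * x
    return -(np.where(positive, eta, 0.0) - np.log1p(np.exp(eta))).sum()

# Fit the count part by truncated-Poisson MLE on the positive subsample
xp = x[positive]
def nll_trunc(b):
    lam = np.exp(b[0] + b[1] * xp)
    return -(y_pos * np.log(lam) - lam - gammaln(y_pos + 1)
             - np.log1p(-np.exp(-lam))).sum()

g_hat = minimize(nll_logit, np.zeros(2)).x
b_hat = minimize(nll_trunc, np.zeros(2)).x
print(g_hat, b_hat)  # close to (-0.2, 0.8) and (0.5, 0.3)
```

Notice the two likelihoods share no parameters, which is why the parts can be maximized separately; that separability is the defining computational convenience of hurdle models.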

Hurdle vs. zero-inflated: The key conceptual difference is that hurdle models treat all zeros the same (one process generates them), while zero-inflated models distinguish between structural zeros and sampling zeros (two processes can generate them).

Truncated count distributions

Truncated distributions are regular count distributions conditioned on $Y > 0$. They redistribute the probability mass that would have been assigned to zero across the positive integers.

The truncated Poisson PMF:

$$P(Y = y \mid Y > 0, \lambda) = \frac{\lambda^y}{(e^{\lambda} - 1)\, y!}, \quad y = 1, 2, \ldots$$

The truncated negative binomial PMF:

$$P(Y = y \mid Y > 0, r, p) = \frac{\binom{y+r-1}{y} p^r (1-p)^y}{1 - p^r}, \quad y = 1, 2, \ldots$$

Notice the denominators adjust for removing zero from the support of the distribution.
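The truncated Poisson is just the ordinary PMF rescaled by $1 - P(Y = 0)$, which is easy to confirm with scipy ($\lambda$ chosen arbitrarily):

```python
import numpy as np
from scipy.stats import poisson

lam = 1.8  # illustrative rate

y = np.arange(1, 60)
# Truncated PMF: redistribute the mass P(Y=0) over y >= 1
trunc_pmf = poisson.pmf(y, lam) / (1 - poisson.pmf(0, lam))

print(trunc_pmf.sum())               # 1.0: all mass sits on y >= 1
print(trunc_pmf[0])                  # matches lam / ((e^lam - 1) * 1!)
print(lam / (np.exp(lam) - 1))       # the closed-form value for y = 1
```

Dividing by $1 - e^{-\lambda}$ is algebraically the same as the $(e^{\lambda} - 1)$ denominator in the formula above, since $e^{-\lambda}\lambda^y / (1 - e^{-\lambda}) = \lambda^y / (e^{\lambda} - 1)$.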


Model interpretation

Incidence rate ratios (IRRs)

The most intuitive way to interpret coefficients in Poisson and negative binomial regression is through incidence rate ratios (IRRs):

$$IRR = e^{\beta}$$

An IRR represents the multiplicative change in the expected count for a one-unit increase in the independent variable, holding everything else constant.

  • $IRR = 1.25$: the expected count increases by 25%
  • $IRR = 0.80$: the expected count decreases by 20%
  • $IRR = 1.00$: no effect

For example, if you're modeling doctor visits and the IRR for age is 1.03, each additional year of age is associated with a 3% increase in expected visits.
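Computing an IRR from a raw coefficient is a one-line transformation; the coefficient value below is hypothetical:

```python
import numpy as np

# Converting a Poisson/NB coefficient to an incidence rate ratio
beta_age = 0.0296            # hypothetical coefficient on age
irr = np.exp(beta_age)
print(irr)                   # ≈ 1.03: a 3% increase per year of age

# Percent change in the expected count for a one-unit increase
pct_change = 100 * (irr - 1)
print(round(pct_change, 1))  # 3.0
```

Note that the percent change is $100(e^{\beta} - 1)$, not $100\beta$; the two are close only when $\beta$ is small.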

Marginal effects calculation

Marginal effects give you the change in the expected count (in the original units) for a one-unit change in an independent variable. For Poisson and NB regression:

$$\frac{\partial E(Y \mid \mathbf{x})}{\partial x_k} = \beta_k \cdot \lambda$$

Because $\lambda$ varies across observations, marginal effects are typically reported at the mean of the covariates (marginal effects at the mean) or averaged across all observations (average marginal effects).
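The distinction between the two reporting conventions is clearest in code. With hypothetical fitted coefficients, the average marginal effect and the marginal effect at the mean generally disagree because the exponential is nonlinear:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical fitted Poisson model: lambda_i = exp(b0 + b1 * x_i)
b0, b1 = 0.2, 0.5
x = rng.normal(size=1000)
lam = np.exp(b0 + b1 * x)

ame = (b1 * lam).mean()                  # average marginal effect
mem = b1 * np.exp(b0 + b1 * x.mean())    # marginal effect at the mean
print(ame, mem)  # AME exceeds MEM here (Jensen's inequality)
```

Because $\exp$ is convex, averaging $\lambda_i$ across observations gives a larger value than evaluating $\lambda$ at the average covariates, so the AME is at least as large as the MEM in this setup.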

For zero-inflated and hurdle models, marginal effects are more complex because they involve both the binary and count components. Software packages handle this computation, but you should understand that the total marginal effect combines the effect on the probability of a positive count and the effect on the count itself.

Predicted probabilities

Predicted probabilities tell you the likelihood of observing a specific count value (0, 1, 2, ...) given the covariate values. For Poisson and NB models, you plug the fitted $\hat{\lambda}$ into the PMF.

For zero-inflated and hurdle models, predicted probabilities combine both model components. For instance, in a ZIP model, the predicted probability of zero is:

$$P(Y = 0) = \pi + (1 - \pi) \cdot e^{-\lambda}$$

where $\pi$ is the probability of being a structural zero. Comparing predicted vs. observed count distributions is a useful way to assess whether your model captures the data's features.

Model fit assessment

Choosing among count models requires comparing their fit to the data. The main tools are:

  • AIC and BIC: Lower values indicate better fit, with a penalty for model complexity. BIC penalizes extra parameters more heavily than AIC.
  • Deviance statistics: Measure the discrepancy between the fitted model and a saturated model.
  • Residual analysis: Pearson and deviance residuals help identify outliers and check model assumptions. If residuals show systematic patterns, the model may be misspecified.
  • Likelihood ratio tests: Compare nested models (e.g., Poisson vs. negative binomial, since Poisson is a special case of NB when $\alpha = 0$).
  • Vuong test: Compare non-nested models (e.g., standard vs. zero-inflated).

A practical approach: start with Poisson, test for overdispersion, move to NB if needed, then check whether a zero-inflated or hurdle specification improves the fit.
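The information criteria are simple functions of the maximized log-likelihood. A sketch with hypothetical log-likelihood values for a Poisson fit ($k = 3$ parameters) and an NB fit ($k = 4$):

```python
import numpy as np

def aic_bic(loglik, k, n):
    # Standard information criteria from a fitted model's log-likelihood
    return 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik

# Hypothetical maximized log-likelihoods on a sample of n = 500
n = 500
aic_p, bic_p = aic_bic(-1250.0, 3, n)    # Poisson fit
aic_nb, bic_nb = aic_bic(-1180.0, 4, n)  # negative binomial fit

print(aic_p, aic_nb)                   # 2506.0 vs 2368.0
print(aic_nb < aic_p, bic_nb < bic_p)  # NB preferred despite the extra parameter
```

Here the NB's likelihood gain far outweighs the complexity penalty, the typical pattern when the data are genuinely overdispersed.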

Applications of count models

Healthcare utilization data

Healthcare data frequently involve counts: number of doctor visits, hospital admissions, or prescriptions filled. These datasets almost always feature excess zeros (many people don't visit a doctor in a given period) and overdispersion (a small group of patients accounts for a disproportionate share of visits).

Zero-inflated and hurdle models are particularly well-suited here because the zeros have a natural two-part interpretation: some people are healthy and unlikely to seek care (structural zeros), while others could seek care but didn't during the observation window (sampling zeros).

Accident frequency analysis

Transportation safety research uses count models to study crash frequency at intersections, road segments, or regions. Negative binomial regression is the standard in this field because crash data are almost always overdispersed.

Excess zeros can arise from underreporting (minor crashes go unreported) or from genuinely safe locations. Zero-inflated NB models can separate these effects.

Patent citation counts

Innovation researchers model how often a patent is cited by subsequent patents as a measure of its importance. Citation counts tend to be heavily right-skewed with many zeros (patents that are never cited). Negative binomial regression handles the overdispersion, while zero-inflated variants can account for low-quality or highly specialized patents that are structurally unlikely to receive citations.

Rare event modeling

Count models also apply to rare events: occurrences of rare diseases, extreme weather events, or industrial accidents. The Poisson distribution's original motivation was modeling rare events (Ladislaus Bortkiewicz famously used it to model deaths from horse kicks in the Prussian army).

For rare events, the key challenge is that most observations will be zero, with occasional small counts. Zero-inflated models help distinguish between locations or time periods where the event is impossible versus those where it's possible but simply didn't occur.