🥖Linear Modeling Theory Unit 13 Review

13.1 Exponential Family of Distributions

Written by the Fiveable Content Team • Last updated August 2025

Exponential family of distributions

Definition and properties

The exponential family is a broad class of probability distributions that share a common mathematical form. This shared structure is what makes GLMs possible: because these distributions all follow the same template, you can build a single modeling framework that handles normal, binomial, Poisson, and many other response types.

A distribution belongs to the exponential family if its density (or mass) function can be written as:

$$f(x; \theta) = h(x) \exp\bigl(\eta(\theta)\, T(x) - A(\theta)\bigr)$$

Each piece of this formula has a specific role:

  • $\eta(\theta)$ is the natural (canonical) parameter, a reparameterization of the original distribution parameter(s) that puts the density into this standard form.
  • $T(x)$ is the sufficient statistic, a function of the data that captures all the information the data contain about $\theta$.
  • $h(x)$ is the base measure, a term that depends only on the data and acts as a normalizing weight.
  • $A(\theta)$ is the log-partition function (also called the cumulant function). It ensures the density integrates (or sums) to 1, and it turns out to be the key to deriving moments.

Two properties worth highlighting:

  • Sufficiency. Because $T(x)$ is sufficient, you don't lose any information about $\theta$ by reducing your entire dataset to $T(x)$. This connects directly to the factorization theorem: a statistic is sufficient if and only if the joint density factors into one part that depends on $\theta$ and on the data only through that statistic, and another part that depends only on the data.
  • Moments from derivatives. The mean and variance of $T(X)$ can be read off from derivatives of $A(\theta)$, with no integration required. (More on this below.)

The family includes both discrete and continuous distributions. Which specific distribution you get depends on the choices of $\eta$, $T$, $h$, and $A$.
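
To make the template concrete, here is a minimal Python check, using the Poisson components listed later in this guide ($h(x) = 1/x!$, $T(x) = x$, $\theta = \log\lambda$, $A(\theta) = e^\theta$), that assembling those pieces reproduces the ordinary Poisson pmf:

```python
import math

def poisson_canonical(x, lam):
    """Poisson pmf assembled from its exponential-family components."""
    theta = math.log(lam)              # natural parameter: log(lambda)
    h = 1.0 / math.factorial(x)        # base measure h(x) = 1/x!
    A = math.exp(theta)                # log-partition function A(theta) = e^theta
    return h * math.exp(theta * x - A) # T(x) = x

def poisson_pmf(x, lam):
    """Textbook Poisson pmf for comparison."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

# The two expressions agree term by term:
for x in range(8):
    assert abs(poisson_canonical(x, 2.5) - poisson_pmf(x, 2.5)) < 1e-12
```

The same exercise works for any member of the family: pick the component mappings, plug them into the template, and you recover the familiar density.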

Versatility and applicability

Many of the distributions you already know belong to this family:

  • Normal (Gaussian)
  • Binomial
  • Poisson
  • Gamma
  • Beta, Exponential, Geometric, Negative Binomial

Because they all share the same canonical form, a single estimation and inference pipeline (maximum likelihood, score equations, Fisher information) works across all of them. That's exactly the idea behind GLMs.

Common distributions in the exponential family

For each distribution below, the components refer to the canonical form $f(x;\theta) = h(x)\exp\bigl(\eta\, T(x) - A(\theta)\bigr)$. Working through these mappings is the best way to build intuition for the general formula.

Normal (Gaussian) distribution

The normal distribution has two natural parameters because it has two unknown quantities ($\mu$ and $\sigma^2$):

  • Natural parameter: $\theta = \bigl(\mu/\sigma^2,\; -1/(2\sigma^2)\bigr)$
  • Sufficient statistic: $T(x) = (x,\; x^2)$
  • Base measure: $h(x) = (2\pi)^{-1/2}$
  • Log-partition function: $A(\theta) = -\dfrac{\theta_1^2}{4\theta_2} - \dfrac{1}{2}\log(-2\theta_2)$

Notice how the sufficient statistic is a vector here. For a sample of size $n$, you only need $\sum x_i$ and $\sum x_i^2$ to estimate both parameters.
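
As a sketch of that compression (simulated data with an assumed seed and assumed true parameters), the MLEs of both parameters can be computed from $n$, $\sum x_i$, and $\sum x_i^2$ alone:

```python
import random

random.seed(1)  # assumed seed, for reproducibility
data = [random.gauss(3.0, 2.0) for _ in range(100_000)]  # true mu = 3, sigma^2 = 4

# The whole sample collapses to two sufficient statistics (plus n):
n = len(data)
s1 = sum(data)                   # sum of x_i
s2 = sum(x * x for x in data)    # sum of x_i^2

mu_hat = s1 / n                  # MLE of mu
var_hat = s2 / n - mu_hat ** 2   # MLE of sigma^2

# Both estimates land near the true values using only (n, s1, s2):
assert abs(mu_hat - 3.0) < 0.1 and abs(var_hat - 4.0) < 0.1
```

No other function of the raw observations is needed; any two datasets with the same $(n, \sum x_i, \sum x_i^2)$ yield identical estimates.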


Binomial and Poisson distributions

Binomial (with $n$ fixed and success probability $p$):

  • Natural parameter: $\theta = \log\bigl(p/(1-p)\bigr)$ (the log-odds)
  • Sufficient statistic: $T(x) = x$
  • Base measure: $h(x) = \binom{n}{x}$
  • Log-partition function: $A(\theta) = n\log(1 + e^\theta)$

The natural parameter here is the logit of $p$. This is why logistic regression uses a logit link: it connects the linear predictor directly to the canonical parameter.
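
A quick sketch of that link and its inverse (the helper names are hypothetical, not from any particular library):

```python
import math

def logit(p):
    """Canonical link for the binomial family: the log-odds."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Inverse link: maps a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

p = 0.8
eta = logit(p)                          # the natural parameter for this p
assert abs(inv_logit(eta) - p) < 1e-12  # the round trip recovers p
```

Because the inverse logit always lands in $(0, 1)$, any real-valued linear predictor maps to a valid probability.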

Poisson (with rate $\lambda$):

  • Natural parameter: $\theta = \log(\lambda)$
  • Sufficient statistic: $T(x) = x$
  • Base measure: $h(x) = 1/x!$
  • Log-partition function: $A(\theta) = e^\theta$

Similarly, the natural parameter is $\log(\lambda)$, which is why Poisson regression defaults to a log link.
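
The same round trip for the Poisson (hypothetical helper names again): the log link sends the mean to the natural parameter, and exponentiating sends it back, so any real-valued linear predictor maps to a positive mean.

```python
import math

def log_link(mu):
    """Canonical link for the Poisson family: theta = log(mu)."""
    return math.log(mu)

def mean_from_natural(theta):
    """Inverse link: e^theta recovers the mean."""
    return math.exp(theta)

lam = 4.2
theta = log_link(lam)
assert abs(mean_from_natural(theta) - lam) < 1e-12
```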

Gamma and other distributions

Gamma (with shape $\alpha$ and rate $\beta$):

  • Natural parameter: $\theta = (\alpha - 1,\; -\beta)$
  • Sufficient statistic: $T(x) = (\log x,\; x)$
  • Base measure: $h(x) = 1$
  • Log-partition function: $A(\theta) = -\alpha\log(-\theta_2) + \log\Gamma(\alpha)$, where $\alpha = \theta_1 + 1$

Other exponential-family members (Beta, Exponential, Geometric, Negative Binomial) each have their own specific mappings for $\eta$, $T$, $h$, and $A$. The procedure for deriving them is always the same: start from the standard density, algebraically rearrange it into the canonical form, and read off the components.
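
To illustrate that procedure for the Gamma case, here is a small pure-Python check (shape–rate parameterization assumed) that the components above reassemble the standard density:

```python
import math

def gamma_canonical(x, alpha, beta):
    """Gamma pdf assembled from its exponential-family components."""
    theta1, theta2 = alpha - 1.0, -beta       # natural parameters
    T1, T2 = math.log(x), x                   # sufficient statistics
    A = -alpha * math.log(-theta2) + math.lgamma(alpha)  # log-partition
    return math.exp(theta1 * T1 + theta2 * T2 - A)       # h(x) = 1

def gamma_pdf(x, alpha, beta):
    """Textbook shape-rate Gamma density for comparison."""
    return beta ** alpha * x ** (alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)

for x in (0.5, 1.0, 3.7):
    assert abs(gamma_canonical(x, 2.5, 1.3) - gamma_pdf(x, 2.5, 1.3)) < 1e-12
```

The exponent $\theta_1 \log x + \theta_2 x$ becomes $x^{\alpha-1} e^{-\beta x}$ after exponentiation, and $e^{-A} = \beta^\alpha / \Gamma(\alpha)$ supplies the normalizing constant.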

Natural parameters and sufficient statistics

Role in exponential family distributions

Natural parameters are not just an arbitrary reparameterization. They're chosen so that the distribution takes the clean canonical form shown above. In that form, the natural parameter and the sufficient statistic appear together as a dot product $\eta(\theta)\, T(x)$ in the exponent. This pairing is what gives the exponential family its analytical tractability.

Sufficient statistics compress the data without losing information about $\theta$. For example, if you have $n$ observations from a Poisson distribution, the single number $\sum x_i$ is sufficient for $\lambda$. You could throw away the individual observations and still estimate $\lambda$ just as well.
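
A sketch of that claim with made-up data: two Poisson samples that disagree observation by observation but share the same sum produce the same estimate, and their log-likelihoods differ only by a $\lambda$-free constant coming from $h(x) = 1/x!$:

```python
import math

# Two Poisson samples that differ element-wise but share the sufficient statistic.
a = [1, 2, 3, 4]
b = [0, 0, 5, 5]
assert sum(a) == sum(b)

def log_lik(data, lam):
    """Poisson log-likelihood of a sample."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)

# Identical MLEs, since the MLE depends on the data only through the sum:
mle_a = sum(a) / len(a)
mle_b = sum(b) / len(b)
assert mle_a == mle_b == 2.5

# The gap between the two log-likelihoods does not depend on lambda:
diff_low = log_lik(a, 1.0) - log_lik(b, 1.0)
diff_high = log_lik(a, 7.5) - log_lik(b, 7.5)
assert abs(diff_low - diff_high) < 1e-12
```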


Relationship between natural parameters and sufficient statistics

The natural parameter and sufficient statistic are tightly coupled:

  • They always appear multiplied together in the exponent of the canonical form.
  • Changing the natural parameter changes which member of the exponential family you're working with; the sufficient statistic tells you what function of the data is relevant for that parameter.

For the normal distribution, the natural parameters are functions of $\mu$ and $\sigma^2$, while the sufficient statistics are $\sum x_i$ and $\sum x_i^2$. For the Poisson, the natural parameter is $\log \lambda$ and the sufficient statistic is simply $\sum x_i$.

Importance in inference and modeling

These properties make estimation straightforward:

  • Maximum likelihood estimation: The MLE for exponential family distributions reduces to matching the expected sufficient statistic to the observed sufficient statistic. That is, you solve $\mathbb{E}_{\hat{\theta}}[T(X)] = T_{\text{obs}}$.
  • Bayesian inference: Conjugate priors exist naturally for exponential family likelihoods, which simplifies posterior computation.
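
The moment-matching recipe in the first bullet is one line of arithmetic in the canonical cases (a sketch with made-up data):

```python
# Solve E_theta[T(X)] = T_obs for theta-hat in two canonical cases.

# Poisson: T = sum of counts, E_lambda[T] = n * lambda  =>  lambda_hat = sample mean.
counts = [3, 1, 4, 1, 5, 9, 2, 6]
lam_hat = sum(counts) / len(counts)
assert lam_hat == 3.875

# Binomial (n fixed): T = number of successes, E_p[T] = n * p  =>  p_hat = x / n.
x, n = 7, 20
p_hat = x / n
assert p_hat == 0.35
```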

This is a big part of why GLMs work so well in practice. The exponential family structure guarantees that score equations are well-behaved and that iterative fitting algorithms (like IRLS) converge reliably.

Mean and variance of exponential family distributions

Deriving the mean

One of the most useful results: the mean of the sufficient statistic equals the first derivative of the log-partition function with respect to the natural parameter.

$$\mathbb{E}[T(X)] = \frac{\partial A(\theta)}{\partial \theta}$$

To see this in action:

  • Poisson: $A(\theta) = e^\theta$, so $\mathbb{E}[X] = \frac{\partial}{\partial \theta} e^\theta = e^\theta = \lambda$. You recover the rate parameter directly.
  • Normal: $\mathbb{E}[X] = \frac{\partial A}{\partial \theta_1} = -\frac{\theta_1}{2\theta_2} = \mu$. You recover the location parameter.
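
You can sanity-check the Poisson case numerically: differentiate $A(\theta) = e^\theta$ with a central difference and compare against $\lambda$ (a quick sketch):

```python
import math

def A(theta):
    """Poisson log-partition function."""
    return math.exp(theta)

lam = 3.0
theta = math.log(lam)          # natural parameter
eps = 1e-6
# Central difference approximates A'(theta):
mean = (A(theta + eps) - A(theta - eps)) / (2 * eps)
assert abs(mean - lam) < 1e-5  # matches E[X] = lambda
```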

Deriving the variance

Take one more derivative and you get the variance:

$$\operatorname{Var}[T(X)] = \frac{\partial^2 A(\theta)}{\partial \theta^2}$$

  • Poisson: $\operatorname{Var}[X] = \frac{\partial^2}{\partial \theta^2} e^\theta = e^\theta = \lambda$. The variance equals the mean, which is the defining equidispersion property of the Poisson.
  • Normal: $\operatorname{Var}[X] = \frac{\partial^2 A}{\partial \theta_1^2} = -\frac{1}{2\theta_2} = \sigma^2$.
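
The same numerical check works for the second derivative (a sketch, using a second central difference):

```python
import math

def A(theta):
    """Poisson log-partition function."""
    return math.exp(theta)

lam = 3.0
theta = math.log(lam)
eps = 1e-4
# Second central difference approximates A''(theta) = Var[T(X)]:
var = (A(theta + eps) - 2.0 * A(theta) + A(theta - eps)) / eps ** 2
assert abs(var - lam) < 1e-4   # matches Var[X] = lambda
```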

Because $A(\theta)$ is always convex (its second derivative is a variance, which is non-negative), this also guarantees that the variance is non-negative for any member of the family.

Power of the exponential family representation

The log-partition function $A(\theta)$ acts as a moment-generating device. Differentiating it once gives the mean; differentiating it twice gives the variance. Higher cumulants follow from higher derivatives.

This eliminates the need for explicit integration or summation to compute moments. For GLM theory specifically, the relationship $\operatorname{Var}[T(X)] = A''(\theta)$ is what defines the variance function, which in turn determines how the variance of the response relates to its mean. That connection is central to how GLMs handle non-constant variance across different distribution families.