Fiveable

📊Actuarial Mathematics Unit 1 Review


1.8 Moment generating functions and transformations


Written by the Fiveable Content Team • Last updated August 2025

Definition of moment generating functions

The moment generating function (MGF) of a random variable uniquely characterizes its probability distribution. If two random variables share the same MGF, they follow the same distribution. This makes MGFs one of the most useful identification tools in actuarial probability.

The MGF of a random variable X is defined as:

M_X(t) = E[e^{tX}]

where t is a real number. For a continuous random variable, this expands to M_X(t) = \int_{-\infty}^{\infty} e^{tx} f_X(x) \, dx, and for a discrete random variable, M_X(t) = \sum_x e^{tx} p_X(x).

The MGF exists if E[e^{tX}] is finite for all t in some open interval around zero (i.e., for t \in (-h, h) where h > 0).
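As a quick sanity check of the definition, the sketch below compares a Monte Carlo estimate of E[e^{tX}] against the closed-form exponential MGF \lambda/(\lambda - t) covered later in this guide (the rate, evaluation point, and sample size are illustrative choices):

```python
import math
import random

random.seed(42)

lam, t = 2.0, 0.5                      # Exponential rate lam = 2; need t < lam
closed_form = lam / (lam - t)          # M_X(t) = lam / (lam - t) = 4/3

# Monte Carlo estimate of E[e^{tX}] straight from the definition
n = 200_000
estimate = sum(math.exp(t * random.expovariate(lam)) for _ in range(n)) / n

print(closed_form, estimate)           # both should be close to 1.3333
```

The two values agree to a couple of decimal places; increasing the sample size tightens the match.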

Laplace transforms

Laplace transforms are closely related to MGFs. The Laplace transform of a function f(t) is defined as:

F(s) = \int_0^{\infty} e^{-st} f(t) \, dt

where s is a complex number. Notice the structural similarity: for a non-negative random variable X with density f_X(x), the MGF evaluated at -s equals the Laplace transform of the density. Laplace transforms appear in actuarial work when solving differential equations in ruin theory and analyzing aggregate claim distributions.
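The identity F(s) = M_X(-s) can be checked symbolically-by-hand and then verified numerically. A sketch using the Exponential(\lambda) density, where both sides reduce to \lambda/(s + \lambda) (the rate and test points are illustrative):

```python
import math

lam = 2.0   # Exponential(2) density f(x) = lam * e^{-lam x}, x >= 0

def mgf(t):                    # M_X(t) = lam / (lam - t), valid for t < lam
    return lam / (lam - t)

def laplace_of_density(s):     # F(s) = integral of e^{-sx} * lam * e^{-lam x} = lam/(s + lam)
    return lam / (s + lam)

# The Laplace transform of the density equals the MGF evaluated at -s
for s in (0.5, 1.0, 3.0):
    assert math.isclose(laplace_of_density(s), mgf(-s), rel_tol=1e-12)
```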

Existence of MGFs

Not every distribution has an MGF. For the MGF to exist, E[e^{tX}] must be finite in a neighborhood of t = 0. Distributions with heavy tails can violate this condition.

  • The Cauchy distribution has no MGF because E[e^{tX}] is infinite for every t \neq 0.
  • The lognormal distribution also lacks an MGF, despite having finite moments of all orders.
  • When an MGF doesn't exist, you can use the characteristic function instead (covered at the end of this guide).

Properties of moment generating functions

MGFs have several properties that make distribution analysis much more tractable, especially when working with sums of independent random variables.

Uniqueness property

If two random variables X and Y have MGFs that are equal in a neighborhood of zero, i.e., M_X(t) = M_Y(t) for all t \in (-h, h), then X and Y have the same distribution.

This is the property you'll use most often on exam problems: compute an MGF, recognize it as belonging to a known distribution, and conclude the random variable follows that distribution.

MGF of linear transformations

For a linear transformation Y = aX + b, the MGF is:

M_Y(t) = E[e^{t(aX+b)}] = e^{bt} \cdot M_X(at)

This follows directly from the definition. The constant b contributes a factor of e^{bt}, while the scaling constant a replaces t with at in the original MGF.

MGF of linear combination of independent random variables

If X and Y are independent random variables, then for Z = aX + bY:

M_Z(t) = M_X(at) \cdot M_Y(bt)

Independence is critical here. Without it, you can't factor the joint expectation E[e^{t(aX+bY)}] into the product of individual expectations. This extends naturally to n independent random variables:

M_{a_1X_1 + \cdots + a_nX_n}(t) = \prod_{i=1}^{n} M_{X_i}(a_i t)
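As an illustration (parameters chosen arbitrarily), the rule can be checked against the normal family, which is closed under linear combinations: Z = aX + bY is normal with mean a\mu_1 + b\mu_2 and variance a^2\sigma_1^2 + b^2\sigma_2^2, so its MGF must equal M_X(at) \cdot M_Y(bt):

```python
import math

def mgf_normal(mu, var, t):
    # Normal MGF: exp(mu*t + var*t^2 / 2)
    return math.exp(mu * t + 0.5 * var * t * t)

a, b = 2.0, -1.0               # Z = 2X - Y
mu1, v1 = 1.0, 4.0             # X ~ N(1, 4)
mu2, v2 = 3.0, 9.0             # Y ~ N(3, 9), independent of X

for t in (-1.0, 0.25, 0.8):
    lhs = mgf_normal(a * mu1 + b * mu2, a * a * v1 + b * b * v2, t)
    rhs = mgf_normal(mu1, v1, a * t) * mgf_normal(mu2, v2, b * t)
    assert math.isclose(lhs, rhs, rel_tol=1e-12)
```

Equality holds exactly because the exponents match term by term: \mu_1(at) + \mu_2(bt) = (a\mu_1 + b\mu_2)t, and likewise for the quadratic terms.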

Moments and moment generating functions

The name "moment generating function" comes from the fact that you can extract every moment of a distribution by differentiating the MGF.

Relationship between moments and MGFs

The n-th moment of X is obtained by taking the n-th derivative of the MGF and evaluating at t = 0:

E[X^n] = M_X^{(n)}(0)

Why does this work? Expand e^{tX} as a Taylor series:

M_X(t) = E[e^{tX}] = E\left[\sum_{n=0}^{\infty} \frac{(tX)^n}{n!}\right] = \sum_{n=0}^{\infty} \frac{E[X^n]}{n!} t^n

Each coefficient of t^n contains E[X^n]/n!, so differentiating n times and setting t = 0 isolates E[X^n].

Deriving moments from MGFs

Here's the step-by-step process:

  1. Write down the MGF M_X(t).

  2. Differentiate with respect to t once. Evaluate at t = 0 to get the mean: E[X] = M_X'(0).

  3. Differentiate again. Evaluate at t = 0 to get the second raw moment: E[X^2] = M_X''(0).

  4. Compute the variance using \text{Var}(X) = E[X^2] - (E[X])^2 = M_X''(0) - [M_X'(0)]^2.

  5. Continue differentiating for higher moments as needed.
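The steps above can be mimicked numerically: central finite differences of the MGF at t = 0 approximate the derivatives without any symbolic work. A sketch using the Exponential(2) MGF, where the exact answers are E[X] = 1/2, E[X^2] = 1/2, and \text{Var}(X) = 1/4 (the step size h is an illustrative choice):

```python
lam = 2.0

def M(t):                                  # exponential MGF, valid for t < lam
    return lam / (lam - t)

h = 1e-5
m1 = (M(h) - M(-h)) / (2 * h)              # ~ M'(0)  = E[X]   = 1/lam   = 0.5
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h**2    # ~ M''(0) = E[X^2] = 2/lam^2 = 0.5
var = m2 - m1 ** 2                         # Var(X) = E[X^2] - E[X]^2 = 0.25
```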

Central moments vs raw moments

  • Raw moments are computed about the origin: E[X^n]
  • Central moments are computed about the mean: E[(X - \mu)^n]

The first central moment is always zero. The second central moment is the variance. The third and fourth central moments (often standardized) give skewness and kurtosis, which describe the asymmetry and tail weight of the distribution.

You can convert between raw and central moments. For example, the variance equals the second raw moment minus the square of the first: \text{Var}(X) = E[X^2] - \mu^2.


Transformations using moment generating functions

MGFs are especially handy for finding the distribution of sums and differences of independent random variables, since multiplication of MGFs is much simpler than convolving densities.

MGF of sum of independent random variables

If X and Y are independent:

M_{X+Y}(t) = M_X(t) \cdot M_Y(t)

This is the special case of the linear combination formula with a = b = 1. It extends to any finite number of independent summands:

M_{X_1 + X_2 + \cdots + X_n}(t) = \prod_{i=1}^{n} M_{X_i}(t)

This property is what makes MGFs so powerful for proving closure properties (e.g., the sum of independent normals is normal, the sum of independent Poissons is Poisson).
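The Poisson closure property can be verified directly from the MGF in the table below: the product of two Poisson MGFs is the MGF of a Poisson with the summed rate, since the exponents add. A sketch (the rates are illustrative):

```python
import math

def mgf_poisson(lam, t):
    # Poisson MGF: exp(lam * (e^t - 1))
    return math.exp(lam * (math.exp(t) - 1.0))

lam1, lam2 = 1.5, 2.5          # illustrative rates

for t in (-0.5, 0.3, 1.0):
    product = mgf_poisson(lam1, t) * mgf_poisson(lam2, t)
    combined = mgf_poisson(lam1 + lam2, t)
    assert math.isclose(product, combined, rel_tol=1e-12)
```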

MGF of difference of random variables

For independent X and Y, write X - Y = X + (-Y). The MGF of -Y is M_Y(-t), so:

M_{X-Y}(t) = M_X(t) \cdot M_Y(-t)
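For example, if X and Y are i.i.d. normal, the \mu terms cancel in M_X(t) \cdot M_Y(-t), leaving the MGF of N(0, 2\sigma^2). A sketch with illustrative parameters:

```python
import math

mu, var = 1.5, 2.0             # X, Y i.i.d. N(1.5, 2.0) -- illustrative values

def mgf_normal(m, v, t):
    return math.exp(m * t + 0.5 * v * t * t)

# X - Y should be N(0, 2*var): the mean terms cancel in M_X(t) * M_Y(-t)
for t in (-0.8, 0.3, 1.2):
    lhs = mgf_normal(mu, var, t) * mgf_normal(mu, var, -t)
    rhs = mgf_normal(0.0, 2.0 * var, t)
    assert math.isclose(lhs, rhs, rel_tol=1e-12)
```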

MGF of product of independent random variables

There is no general product rule analogous to the sum rule. The MGF of XY cannot typically be expressed in terms of M_X(t) and M_Y(t) alone. For products, you usually need to work directly from the definition or use other techniques (such as the distribution of the product derived from the joint density).

Applications of moment generating functions

Determining distributions from MGFs

The standard exam technique:

  1. Compute the MGF of the random variable in question (often a sum of independent random variables).
  2. Simplify the expression.
  3. Compare the result to the table of known MGFs.
  4. By the uniqueness property, identify the distribution.

Example: Suppose X_1, X_2, \ldots, X_n are i.i.d. Exponential with rate \lambda. The MGF of each is M_{X_i}(t) = \frac{\lambda}{\lambda - t}. The MGF of the sum S = X_1 + \cdots + X_n is:

M_S(t) = \left(\frac{\lambda}{\lambda - t}\right)^n

This is the MGF of a Gamma distribution with shape n and rate \lambda. So S \sim \text{Gamma}(n, \lambda).
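A simulation backs up this identification: summing n i.i.d. exponentials should reproduce the Gamma(n, \lambda) mean n/\lambda and variance n/\lambda^2. A sketch (the shape, rate, and sample count are illustrative):

```python
import random
import statistics

random.seed(7)

n_terms, lam = 5, 2.0          # S = X_1 + ... + X_5, each X_i ~ Exponential(2)
samples = [sum(random.expovariate(lam) for _ in range(n_terms))
           for _ in range(100_000)]

# Gamma(shape=5, rate=2): mean = n/lam = 2.5, variance = n/lam^2 = 1.25
mean_S = statistics.fmean(samples)
var_S = statistics.variance(samples)
print(mean_S, var_S)
```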

Deriving probability distributions

You can recover a PDF or PMF from an MGF by expanding it as a Taylor series around t = 0 and reading off the moments, then matching to a known distribution family. In practice, direct recognition (as above) is far more common on exams than full inversion.

Calculating probabilities using MGFs

MGFs are not typically used to compute individual probabilities like P(X \leq x) directly. For that, you'd use the CDF. However, once you've identified the distribution via its MGF, you can use the known CDF or probability tables for that distribution to find probabilities.

Common moment generating functions

The following table summarizes MGFs you should have memorized:

| Distribution | Parameters | MGF M_X(t) | Domain of t |
| --- | --- | --- | --- |
| Bernoulli | p | 1 - p + pe^t | all t |
| Binomial | n, p | (pe^t + 1 - p)^n | all t |
| Poisson | \lambda | e^{\lambda(e^t - 1)} | all t |
| Geometric | p | \frac{pe^t}{1 - (1-p)e^t} | t < -\ln(1-p) |
| Exponential | \lambda | \frac{\lambda}{\lambda - t} | t < \lambda |
| Gamma | \alpha, \beta | \left(\frac{\beta}{\beta - t}\right)^\alpha | t < \beta |
| Normal | \mu, \sigma^2 | e^{\mu t + \frac{1}{2}\sigma^2 t^2} | all t |

MGF of normal distribution

M_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}

This exists for all real t. For the standard normal (\mu = 0, \sigma^2 = 1), it simplifies to M_X(t) = e^{t^2/2}. You can verify: M_X'(0) = \mu and M_X''(0) - [M_X'(0)]^2 = \sigma^2.


MGF of exponential distribution

M_X(t) = \frac{\lambda}{\lambda - t}, \quad t < \lambda

The restriction t < \lambda is important. Differentiating: M_X'(t) = \frac{\lambda}{(\lambda - t)^2}, so E[X] = M_X'(0) = \frac{1}{\lambda}. Differentiating again gives E[X^2] = \frac{2}{\lambda^2}, so \text{Var}(X) = \frac{1}{\lambda^2}.

MGF of gamma distribution

M_X(t) = \left(\frac{\beta}{\beta - t}\right)^\alpha, \quad t < \beta

Note that the exponential distribution is the special case \alpha = 1. The mean is E[X] = \frac{\alpha}{\beta} and the variance is \text{Var}(X) = \frac{\alpha}{\beta^2}.

MGF of binomial distribution

M_X(t) = (pe^t + 1 - p)^n

This exists for all t. Differentiating and evaluating at t = 0: E[X] = np and \text{Var}(X) = np(1-p). You can also use this MGF to prove that the sum of independent Bernoulli random variables is binomial, since the MGF of a single Bernoulli trial is pe^t + 1 - p, and multiplying n of these gives the binomial MGF.
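The Bernoulli-to-binomial argument is easy to check numerically: raising the Bernoulli MGF to the n-th power should reproduce the binomial MGF at every t. A sketch (n and p are illustrative):

```python
import math

n, p = 10, 0.3                 # illustrative parameters

def mgf_bernoulli(t):
    # Single trial: 1 - p + p*e^t
    return 1 - p + p * math.exp(t)

def mgf_binomial(t):
    # n trials: (p*e^t + 1 - p)^n
    return (p * math.exp(t) + 1 - p) ** n

# n independent Bernoulli trials: the MGFs multiply into the binomial MGF
for t in (-1.0, 0.2, 0.7):
    assert math.isclose(mgf_bernoulli(t) ** n, mgf_binomial(t), rel_tol=1e-12)
```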

Limitations of moment generating functions

Non-existence of MGFs for certain distributions

The main limitation: some distributions simply don't have an MGF. Heavy-tailed distributions like the Cauchy and lognormal are the classic examples. If you encounter a distribution where E[e^{tX}] = \infty for all t \neq 0, the MGF approach won't work, and you'll need to use characteristic functions instead.

Convergence issues with MGFs

Even when an MGF exists, it may only converge on a restricted interval around zero (e.g., t < \lambda for the exponential). This can limit its usefulness in certain theoretical arguments. The characteristic function avoids this problem entirely because |e^{itX}| = 1 for all t, guaranteeing convergence.

Characteristic functions vs moment generating functions

Definition of characteristic functions

The characteristic function of a random variable X replaces the real exponential in the MGF with a complex exponential:

\phi_X(t) = E[e^{itX}]

where i = \sqrt{-1} and t is real. For continuous random variables: \phi_X(t) = \int_{-\infty}^{\infty} e^{itx} f_X(x) \, dx. This is the Fourier transform of the density.

Properties of characteristic functions

Characteristic functions share the key properties of MGFs:

  • Uniqueness: \phi_X(t) = \phi_Y(t) for all t implies X and Y have the same distribution.
  • Independence and sums: If X and Y are independent, \phi_{X+Y}(t) = \phi_X(t) \cdot \phi_Y(t).
  • Moment extraction: E[X^n] = \frac{\phi_X^{(n)}(0)}{i^n}, provided the n-th moment exists.
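The moment-extraction property works with complex arithmetic but is still mechanical. A sketch that recovers the mean of a Poisson(\lambda) from its characteristic function \phi_X(t) = e^{\lambda(e^{it} - 1)}, using a central finite difference for \phi_X'(0) (the rate and step size are illustrative):

```python
import cmath

lam = 4.0   # Poisson(4) -- illustrative rate

def phi(t):
    # Characteristic function of Poisson(lam): exp(lam * (e^{it} - 1))
    return cmath.exp(lam * (cmath.exp(1j * t) - 1.0))

# E[X] = phi'(0) / i, with phi'(0) approximated by a central difference
h = 1e-6
d1 = (phi(h) - phi(-h)) / (2 * h)
mean = (d1 / 1j).real          # should recover E[X] = lam = 4.0
```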

Advantages of characteristic functions over MGFs

The characteristic function always exists for any random variable, because |e^{itX}| = 1 ensures the expectation is bounded. This is the decisive advantage over MGFs.

Characteristic functions are the standard tool for proving the Central Limit Theorem: you show that the characteristic function of the standardized sum converges to e^{-t^2/2} (the characteristic function of the standard normal), then invoke the continuity theorem.

The tradeoff is that characteristic functions involve complex arithmetic, which makes direct computation less convenient than MGFs when the MGF exists. For exam purposes, use MGFs whenever they exist. Reserve characteristic functions for distributions where MGFs fail or for theoretical proofs requiring universal applicability.