Fiveable

🔀Stochastic Processes Unit 2 Review

2.1 Discrete probability distributions

Written by the Fiveable Content Team • Last updated August 2025

Discrete probability distributions describe the likelihood of outcomes for random variables that take on countable values. They're the building blocks for modeling random events in stochastic processes, and nearly every topic in this course builds on them.

This section covers the key discrete distributions (Bernoulli, binomial, geometric, Poisson, and others), along with the core machinery: probability mass functions, cumulative distribution functions, expected value, variance, moment generating functions, and joint distributions.

Types of discrete distributions

Discrete distributions model scenarios where a random variable can only take on isolated, countable values (integers, for instance). In stochastic processes, they show up constantly: modeling arrivals to a queue, counting defects in manufacturing, tracking failures in a reliability system.

The distributions you need to know for this unit:

  • Bernoulli and Binomial (success/failure trials)
  • Geometric and Negative Binomial (waiting for successes)
  • Poisson (counting events over an interval)
  • Hypergeometric (sampling without replacement)

Each has a specific PMF, expected value, and variance that you should be comfortable deriving and applying.

Probability mass functions

The probability mass function (PMF) is the primary way to specify a discrete distribution. It tells you the probability that a random variable equals each of its possible values.

Definition of PMF

A PMF assigns a probability to every possible value of a discrete random variable X. It's written as P(X = x), and it answers the question: "What's the probability that X equals exactly x?"

Properties of valid PMFs

For a PMF to be valid, two conditions must hold:

  • Non-negativity: P(X = x) \geq 0 for all x
  • Normalization: \sum_{x} P(X = x) = 1

If either condition fails, you don't have a legitimate probability distribution.
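The two conditions can be checked directly for any candidate PMF. A minimal sketch, assuming the PMF is stored as a dict mapping values to probabilities (the floating-point tolerance is an illustrative choice):

```python
def is_valid_pmf(pmf, tol=1e-9):
    """Check non-negativity and normalization for a PMF given as {value: prob}."""
    non_negative = all(p >= 0 for p in pmf.values())
    normalized = abs(sum(pmf.values()) - 1.0) < tol
    return non_negative and normalized

fair_die = {k: 1/6 for k in range(1, 7)}
broken = {0: 0.5, 1: 0.6}           # probabilities sum to 1.1, not valid
print(is_valid_pmf(fair_die))        # True
print(is_valid_pmf(broken))          # False
```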

Cumulative distribution functions

The cumulative distribution function (CDF) gives a running total of probability. It's especially useful when you need to compute probabilities involving inequalities, like P(X \leq 5).

Definition of CDF

The CDF of a discrete random variable X is defined as:

F(x) = P(X \leq x) = \sum_{t \leq x} P(X = t)

It's a non-decreasing, right-continuous step function. As x \to -\infty, F(x) \to 0, and as x \to \infty, F(x) \to 1.

Relationship between PMF and CDF

You can go back and forth between the two:

  • CDF from PMF: Sum up PMF values: F(x) = \sum_{t \leq x} P(X = t)
  • PMF from CDF: Take differences at jump points: P(X = x) = F(x) - \lim_{y \to x^-} F(y)

For integer-valued random variables, this simplifies to P(X = x) = F(x) - F(x - 1).
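Both directions can be carried out mechanically for an integer-valued PMF. A small sketch (the dict representation is an assumption, not part of the definitions above):

```python
def cdf_from_pmf(pmf):
    """Running sums of an integer-valued PMF, as {value: P(X <= value)}."""
    cdf, running = {}, 0.0
    for x in sorted(pmf):
        running += pmf[x]
        cdf[x] = running
    return cdf

die_pmf = {k: 1/6 for k in range(1, 7)}
die_cdf = cdf_from_pmf(die_pmf)
# Recover the PMF by differencing consecutive CDF values: F(x) - F(x - 1)
recovered = {x: die_cdf[x] - die_cdf.get(x - 1, 0.0) for x in die_cdf}
print(die_cdf[3])   # P(X <= 3), which is 0.5 for a fair die
```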

Expected value

The expected value summarizes where a distribution is "centered." Think of it as the long-run average if you could repeat the random experiment infinitely many times.

Definition of expected value

For a discrete random variable X:

E[X] = \sum_{x} x \cdot P(X = x)

Each possible value gets weighted by its probability. Values that are more likely pull the expected value toward them.
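The weighted-sum definition translates directly into code. A minimal sketch using a fair six-sided die as the example distribution:

```python
def expected_value(pmf):
    """E[X] = sum of x * P(X = x) over the support."""
    return sum(x * p for x, p in pmf.items())

die_pmf = {k: 1/6 for k in range(1, 7)}
print(expected_value(die_pmf))  # 3.5 (up to floating-point rounding)
```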

Linearity of expectation

This is one of the most useful properties in probability:

E[X + Y] = E[X] + E[Y]

This holds regardless of whether X and Y are independent. That's what makes it so powerful. You can also pull out constants: E[aX + b] = aE[X] + b.

Variance and standard deviation

Variance measures how spread out a distribution is around its mean. Two distributions can have the same expected value but very different variances.

Definition of variance

Var(X) = E[(X - E[X])^2] = \sum_{x} (x - E[X])^2 \cdot P(X = x)

An equivalent shortcut that's often faster in practice:

Var(X) = E[X^2] - (E[X])^2

Properties of variance

  • Scaling: Var(aX) = a^2 \, Var(X) (the square matters; variance is not linear)
  • Shifting: Var(X + b) = Var(X) (adding a constant doesn't change spread)
  • Sum of independents: If X and Y are independent, Var(X + Y) = Var(X) + Var(Y)

Note the independence requirement for the sum rule; unlike linearity of expectation, it fails for dependent variables, where you need the covariance term: Var(X + Y) = Var(X) + Var(Y) + 2\,Cov(X, Y).
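Both forms of the variance give the same answer, which is easy to confirm numerically. A sketch using a Bernoulli(0.3) variable, whose variance should be p(1 - p) = 0.21:

```python
def expected_value(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) via the shortcut E[X^2] - (E[X])^2."""
    mean = expected_value(pmf)
    second_moment = sum(x**2 * p for x, p in pmf.items())
    return second_moment - mean**2

p = 0.3
bern = {0: 1 - p, 1: p}   # Bernoulli(p) PMF
print(variance(bern))      # approximately p * (1 - p) = 0.21
```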

Standard deviation vs variance

The standard deviation is \sigma = \sqrt{Var(X)}. Its main advantage is that it's in the same units as X itself, making it more interpretable. Variance is in squared units, which is useful for calculations but harder to reason about directly.

Moment generating functions

Moment generating functions (MGFs) encode the entire distribution of a random variable into a single function. They're especially handy for finding moments and for working with sums of independent variables.

Definition of MGF

M_X(t) = E[e^{tX}] = \sum_{x} e^{tx} \cdot P(X = x)

The MGF exists if this sum converges in some neighborhood of t = 0. When it exists, it uniquely determines the distribution.

Properties and applications of MGFs

  • Extracting moments: The n-th moment is E[X^n] = M_X^{(n)}(0), the n-th derivative evaluated at t = 0. So E[X] = M_X'(0) and E[X^2] = M_X''(0).
  • Sums of independents: If X and Y are independent, M_{X+Y}(t) = M_X(t) \cdot M_Y(t). This makes MGFs a clean way to derive the distribution of a sum.
  • Identifying distributions: If you compute an MGF and recognize it as the MGF of a known distribution, you've identified the distribution of your random variable.
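The product property can be verified numerically for finite-support PMFs: compute the PMF of X + Y by convolution, evaluate its MGF at some t, and compare with the product of the individual MGFs. A sketch (the dict representation and the value t = 0.4 are illustrative choices):

```python
import math

def mgf(pmf, t):
    """M_X(t) = sum of e^{t x} * P(X = x) over the support."""
    return sum(math.exp(t * x) * p for x, p in pmf.items())

def convolve(pmf_x, pmf_y):
    """PMF of X + Y for independent X and Y."""
    out = {}
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            out[x + y] = out.get(x + y, 0.0) + px * py
    return out

X = {0: 0.7, 1: 0.3}   # Bernoulli(0.3)
Y = {0: 0.5, 1: 0.5}   # Bernoulli(0.5)
t = 0.4
# M_{X+Y}(t) and M_X(t) * M_Y(t) should agree up to rounding
print(abs(mgf(convolve(X, Y), t) - mgf(X, t) * mgf(Y, t)))
```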

Common discrete distributions

Bernoulli and binomial distributions

A Bernoulli random variable models a single trial with two outcomes: success (X = 1) with probability p, or failure (X = 0) with probability 1 - p.

  • PMF: P(X = x) = p^x(1-p)^{1-x} for x \in \{0, 1\}
  • E[X] = p, Var(X) = p(1-p)

The Binomial distribution counts the number of successes in n independent Bernoulli trials.

  • PMF: P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} for k = 0, 1, \ldots, n
  • E[X] = np, Var(X) = np(1-p)

Applications include quality control (number of defective items in a batch) and survey sampling (number of respondents choosing a particular option).
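The binomial PMF is straightforward to compute from the formula, and summing k · P(X = k) over the support should recover the mean np. A sketch with illustrative parameters n = 10, p = 0.2:

```python
from math import comb

def binomial_pmf(n, p, k):
    """P(X = k) = C(n, k) * p^k * (1-p)^(n-k) for Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.2
pmf = {k: binomial_pmf(n, p, k) for k in range(n + 1)}
mean = sum(k * q for k, q in pmf.items())
print(mean)  # should be close to n * p = 2.0
```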

Geometric and negative binomial distributions

The Geometric distribution models the number of trials until the first success.

  • PMF: P(X = k) = (1-p)^{k-1} p for k = 1, 2, 3, \ldots
  • E[X] = 1/p, Var(X) = (1-p)/p^2

Be careful: some textbooks define the geometric as the number of failures before the first success, which shifts the support to k = 0, 1, 2, \ldots and changes the PMF to P(X = k) = (1-p)^k p. Check which convention your course uses.

The Negative Binomial generalizes this to the number of trials until the r-th success.

  • PMF: P(X = k) = \binom{k-1}{r-1} p^r (1-p)^{k-r} for k = r, r+1, \ldots
  • E[X] = r/p, Var(X) = r(1-p)/p^2

These distributions are natural models for waiting times in stochastic processes.
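Because the geometric support is infinite, you can't sum it exhaustively, but a long truncated sum approximates E[X] = 1/p to high precision (the truncation point 2000 is an illustrative choice; the neglected tail is astronomically small):

```python
def geometric_pmf(p, k):
    """P(X = k) = (1-p)^(k-1) * p: first success on trial k (k = 1, 2, ...)."""
    return (1 - p)**(k - 1) * p

p = 0.25
# Truncated sum over a long prefix of the support approximates E[X] = 1/p
mean = sum(k * geometric_pmf(p, k) for k in range(1, 2000))
print(mean)  # approximately 1 / 0.25 = 4
```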

Poisson distribution

The Poisson distribution models the count of events in a fixed interval, given that events occur independently at a constant average rate \lambda.

  • PMF: P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!} for k = 0, 1, 2, \ldots
  • E[X] = \lambda, Var(X) = \lambda

The fact that the mean equals the variance is a distinctive feature. Typical applications: customer arrivals per hour, number of typos per page, radioactive decay events per second.

The Poisson also arises as a limit of the binomial when n is large, p is small, and \lambda = np stays moderate.
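That limit is easy to see numerically: with n = 10,000 and p = \lambda/n, the binomial PMF is already very close to the Poisson PMF at every point. A sketch (the parameter values and the comparison range are illustrative choices):

```python
from math import comb, exp, factorial

def poisson_pmf(lam, k):
    """P(X = k) = e^{-lam} * lam^k / k! for Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

def binomial_pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Binomial(n, lam/n) approaches Poisson(lam) as n grows
lam, n = 3.0, 10_000
max_gap = max(abs(binomial_pmf(n, lam / n, k) - poisson_pmf(lam, k))
              for k in range(20))
print(max_gap)  # small: the Poisson approximation is already tight here
```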

Hypergeometric distribution

The Hypergeometric distribution models successes in n draws from a finite population of size N containing K successes, without replacement.

  • PMF: P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}
  • E[X] = nK/N, Var(X) = n \frac{K}{N} \frac{N-K}{N} \frac{N-n}{N-1}

The key difference from the binomial: draws are dependent because there's no replacement. As N \to \infty with K/N \to p, the hypergeometric converges to the binomial.
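The PMF and mean formula can be checked directly with binomial coefficients. A sketch with illustrative parameters (N = 50, K = 10, n = 5, so E[X] = nK/N = 1):

```python
from math import comb

def hypergeom_pmf(N, K, n, k):
    """P(X = k): k successes in n draws without replacement
    from a population of N items containing K successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N, K, n = 50, 10, 5
pmf = {k: hypergeom_pmf(N, K, n, k) for k in range(n + 1)}
mean = sum(k * q for k, q in pmf.items())
print(mean)  # should be close to n * K / N = 1.0
```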

Joint distributions

When you're working with two or more random variables simultaneously, you need joint distributions to capture how they behave together.

Joint probability mass functions

The joint PMF of X and Y is P(X = x, Y = y), giving the probability that X = x and Y = y at the same time.

Validity requirements are the same as for single-variable PMFs: non-negativity, and the double sum over all (x, y) pairs must equal 1.

Marginal and conditional distributions

Marginal PMFs recover the distribution of a single variable by summing out the other:

P(X = x) = \sum_{y} P(X = x, Y = y)

Conditional PMFs describe one variable given a specific value of the other:

P(Y = y \mid X = x) = \frac{P(X = x, Y = y)}{P(X = x)}

This is only defined when P(X = x) > 0.
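Both operations reduce to sums and a division over a joint PMF table. A sketch using a small made-up joint PMF stored as {(x, y): prob} (the numbers are illustrative, not from the text):

```python
# Joint PMF of (X, Y) as {(x, y): prob} -- a small made-up example
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

def marginal_x(joint, x):
    """P(X = x) by summing the joint PMF over all y."""
    return sum(p for (xi, y), p in joint.items() if xi == x)

def conditional_y_given_x(joint, y, x):
    """P(Y = y | X = x); requires P(X = x) > 0."""
    px = marginal_x(joint, x)
    return joint.get((x, y), 0.0) / px

print(marginal_x(joint, 0))                # 0.1 + 0.3 = 0.4
print(conditional_y_given_x(joint, 1, 0))  # 0.3 / 0.4 = 0.75
```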

Independent vs dependent random variables

X and Y are independent if and only if their joint PMF factors into the product of marginals for every pair of values:

P(X = x, Y = y) = P(X = x) \cdot P(Y = y) \quad \text{for all } x, y

Equivalently, independence means conditioning on one variable doesn't change the distribution of the other: P(Y = y \mid X = x) = P(Y = y).

If this factorization fails for even a single pair (x, y), the variables are dependent, and you'll need the full joint PMF (or covariance information) to analyze their combined behavior.

Sums of discrete random variables

Adding random variables together comes up constantly in stochastic processes (total service time, aggregate demand, cumulative arrivals, etc.).

Convolution formula for PMFs

For independent random variables X and Y, the PMF of Z = X + Y is:

P(Z = z) = \sum_{x} P(X = x) \cdot P(Y = z - x)

This is called the convolution of the two PMFs. You iterate over all values x that X can take and accumulate the products.

For sums of three or more independent variables, apply convolution iteratively (or use MGFs, which is usually cleaner).
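The convolution formula above translates into a double loop over the two supports. A sketch computing the distribution of the sum of two fair dice, where the total 7 is the most likely outcome with probability 6/36:

```python
def convolve_pmfs(pmf_x, pmf_y):
    """PMF of Z = X + Y for independent X and Y, via convolution."""
    out = {}
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            out[x + y] = out.get(x + y, 0.0) + px * py
    return out

die = {k: 1/6 for k in range(1, 7)}
two_dice = convolve_pmfs(die, die)
print(two_dice[7])  # 6/36, the most likely total of two fair dice
```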

Distribution of sum of independent variables

Certain families of distributions are "closed" under summation of independent variables:

  • Sum of n i.i.d. Bernoulli(p) variables \to Binomial(n, p)
  • Sum of independent Poisson(\lambda_1) and Poisson(\lambda_2) \to Poisson(\lambda_1 + \lambda_2)
  • Sum of independent Negative Binomial(r_1, p) and Negative Binomial(r_2, p) \to Negative Binomial(r_1 + r_2, p)

Recognizing these closure properties saves you from doing convolution by hand.
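The first closure property can be verified by brute force: convolving n Bernoulli(p) PMFs should reproduce the Binomial(n, p) PMF exactly. A sketch with illustrative parameters n = 5, p = 0.3:

```python
from functools import reduce
from math import comb

def convolve_pmfs(pmf_x, pmf_y):
    """PMF of X + Y for independent X and Y, via convolution."""
    out = {}
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            out[x + y] = out.get(x + y, 0.0) + px * py
    return out

# Convolving n i.i.d. Bernoulli(p) PMFs should give Binomial(n, p)
n, p = 5, 0.3
bern = {0: 1 - p, 1: p}
total = reduce(convolve_pmfs, [bern] * n)
binom = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
gap = max(abs(total[k] - binom[k]) for k in range(n + 1))
print(gap)  # ~0: the closure property checks out numerically
```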

Transformations of discrete random variables

Transformations create new random variables as functions of existing ones. This is how you go from a model of raw data to a model of some derived quantity.

PMF and CDF under transformations

Given Y = g(X) where X has a known PMF, the PMF of Y is:

P(Y = y) = \sum_{x:\, g(x) = y} P(X = x)

You collect all values of x that map to the same y and add their probabilities. If g is one-to-one, each sum has only one term.
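This "collect and add" rule is a one-pass accumulation in code. A sketch with X uniform on {-2, -1, 0, 1, 2} and Y = X², where the non-injective map folds ±x onto the same value:

```python
def transform_pmf(pmf, g):
    """PMF of Y = g(X): sum P(X = x) over all x with g(x) = y."""
    out = {}
    for x, p in pmf.items():
        y = g(x)
        out[y] = out.get(y, 0.0) + p
    return out

# X uniform on {-2, -1, 0, 1, 2}; Y = X^2 collapses +/-x to one value
X = {x: 0.2 for x in (-2, -1, 0, 1, 2)}
Y = transform_pmf(X, lambda x: x * x)
print(Y)  # P(Y=0) = 0.2, P(Y=1) = 0.4, P(Y=4) = 0.4
```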

Functions of multiple discrete variables

For a transformation (U, V) = (g_1(X, Y),\, g_2(X, Y)):

P(U = u, V = v) = \sum_{\substack{(x,y):\, g_1(x,y) = u \\ g_2(x,y) = v}} P(X = x, Y = y)

Once you have the joint PMF of (U, V), you can extract marginals and conditionals using the standard formulas from the joint distributions section.

Applications and examples

Modeling real-world scenarios

  • Queueing systems: Customer arrivals are often modeled as Poisson; the number of customers served in a time window can be binomial or geometric depending on the service mechanism.
  • Inventory management: Daily demand for a product might follow a Poisson distribution with rate \lambda = 12 units/day, which lets you set reorder points and safety stock levels.
  • Reliability engineering: The number of component failures in a system over a fixed period can be modeled as Poisson or binomial, depending on whether components are independent and identical.

Solving problems using discrete distributions

  • Quality control: Use the hypergeometric distribution to find the probability that a sample of 10 items from a lot of 200 contains 2 or more defectives.
  • Risk assessment: Model the number of insurance claims per month as Poisson(\lambda) to estimate the probability of exceeding a threshold.
  • Network analysis: Node degree distributions in random graphs often follow Poisson (Erdős–Rényi model) or power-law distributions, which determine connectivity and resilience properties.

The common thread: pick the distribution whose assumptions match your scenario, then use PMFs, CDFs, expected values, and variances to answer quantitative questions about the system.