1.4 Random variables

Written by the Fiveable Content Team • Last updated August 2025
Definition of random variables

A random variable is a function that assigns a numerical value to each outcome of a random experiment. This lets you work with outcomes mathematically, which is essential for everything in stochastic processes.

Random variables can take on different values, and the likelihood of each value is governed by an underlying probability distribution.

Formal mathematical definition

A random variable X is a function that maps the sample space \Omega of a random experiment to the real numbers \mathbb{R}:

X: \Omega \rightarrow \mathbb{R}

For each outcome \omega \in \Omega, X(\omega) is a real number. The probability of an event A related to X is:

P(X \in A) = P(\{\omega \in \Omega : X(\omega) \in A\})

Technically, X also needs to be measurable (the preimage of any Borel set must be in the sigma-algebra on \Omega). For this course, the key idea is that X translates abstract outcomes into numbers you can compute with.

Intuitive understanding

Think of a random variable as a rule that converts experimental outcomes into numbers. If you flip two coins, the sample space is \{HH, HT, TH, TT\}. Defining X = "number of heads" gives you X(HH) = 2, X(HT) = 1, X(TH) = 1, X(TT) = 0. Now you can ask questions like P(X = 1) = 1/2.

Other examples: the waiting time in a queue, the daily closing price of a stock, or the number of arrivals to a server in a given interval.
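
The two-coin example can be sketched in a few lines of Python (the names `omega` and `X` here are illustrative, not from any library):

```python
from fractions import Fraction

# Sample space of two fair coin flips; each outcome has probability 1/4.
omega = ["HH", "HT", "TH", "TT"]

# X assigns a number to each outcome: the number of heads.
def X(outcome):
    return outcome.count("H")

# P(X = 1) = P({omega : X(omega) = 1}) = |{HT, TH}| / 4
p_x_equals_1 = Fraction(sum(1 for w in omega if X(w) == 1), len(omega))
print(p_x_equals_1)  # 1/2
```

The point is the mapping itself: the function X turns abstract outcomes into numbers, and probabilities of X-events come from counting the outcomes that map into them.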

Discrete vs continuous variables

Random variables fall into two main categories based on the values they can take.

Discrete random variables have a countable set of possible values (finite or countably infinite). Examples: the number of defective items in a batch, or the number of customers arriving at a store per hour.

Continuous random variables can take any value within some interval of the real line. Examples: the height of a randomly selected person, or the exact time until the next bus arrives.

The distinction matters because discrete and continuous variables use different mathematical tools (sums vs. integrals) throughout the course.

Probability distributions

A probability distribution describes the likelihood of each value a random variable can take. Different types of distributions apply depending on whether the variable is discrete or continuous.

Probability mass functions (PMFs)

A probability mass function describes the distribution of a discrete random variable. The PMF of X, written p_X(x), gives the probability that X equals a specific value x:

p_X(x) = P(X = x)

Two properties must hold:

  1. p_X(x) \geq 0 for all x
  2. \sum_x p_X(x) = 1, where the sum runs over all possible values of x

For example, if X counts the number of heads in two fair coin flips, then p_X(0) = 1/4, p_X(1) = 1/2, p_X(2) = 1/4.

Probability density functions (PDFs)

A probability density function describes the distribution of a continuous random variable. The PDF of X, written f_X(x), represents the relative likelihood of X taking a value near x.

A critical point: f_X(x) is not a probability. For continuous variables, P(X = x) = 0 for any single point. Instead, you get probabilities by integrating the PDF over an interval:

P(a \leq X \leq b) = \int_a^b f_X(x)\, dx

Two properties must hold:

  1. f_X(x) \geq 0 for all x
  2. \int_{-\infty}^{\infty} f_X(x)\, dx = 1
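
As a sketch of how integration turns a density into probabilities, the snippet below uses the Exponential(1) density f(x) = e^{-x} (chosen purely for illustration) and a simple trapezoidal rule:

```python
import math

# Exponential(1) density, an example PDF chosen for illustration.
def f(x):
    return math.exp(-x) if x >= 0 else 0.0

# Trapezoidal rule: approximate the integral of f over [a, b].
def integrate(f, a, b, n=100_000):
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

# P(1 <= X <= 2) = integral of f from 1 to 2 = e^-1 - e^-2
prob = integrate(f, 1.0, 2.0)
print(round(prob, 6))  # close to exp(-1) - exp(-2) ≈ 0.232544

# Property 2: integrating far into the tail, the total approaches 1.
total = integrate(f, 0.0, 50.0)
```

Note that f(0.5) ≈ 0.607 is a perfectly valid density value even though no single point carries positive probability — only the integrals are probabilities.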

Cumulative distribution functions (CDFs)

The cumulative distribution function works for both discrete and continuous random variables. The CDF of X, written F_X(x), gives the probability that X takes a value less than or equal to x:

F_X(x) = P(X \leq x)

  • For a discrete variable: F_X(x) = \sum_{t \leq x} p_X(t)
  • For a continuous variable: F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt

Every CDF is non-decreasing, right-continuous, and satisfies \lim_{x \to -\infty} F_X(x) = 0 and \lim_{x \to \infty} F_X(x) = 1. For continuous variables, the PDF is the derivative of the CDF: f_X(x) = F_X'(x).
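
For the discrete case, the CDF is just a running sum of the PMF. A minimal sketch using the two-coin PMF from above:

```python
# PMF of the number of heads in two fair coin flips (from the text).
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

# F_X(x) = sum of p_X(t) over all support points t <= x
def cdf(x):
    return sum(p for value, p in pmf.items() if value <= x)

# The CDF is a step function: flat between jumps, reaching 1 at the top.
print(cdf(-1))   # 0    (below the support)
print(cdf(0))    # 0.25
print(cdf(1.5))  # 0.75 (flat between the jumps at 1 and 2)
print(cdf(2))    # 1.0
```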

Expected value

The expected value (also called the mean or expectation) measures the central tendency of a random variable. It represents the long-run average you'd observe over many independent trials. It's denoted E[X].

Definition of expected value

For a discrete random variable X with PMF p_X(x):

E[X] = \sum_x x \cdot p_X(x)

For a continuous random variable X with PDF f_X(x):

E[X] = \int_{-\infty}^{\infty} x \cdot f_X(x)\, dx

These sums/integrals must converge absolutely for the expected value to exist. (The Cauchy distribution is a classic example where E[X] does not exist.)
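
The discrete formula is a one-line weighted sum. A sketch for a fair six-sided die (an example of my choosing), using exact fractions to avoid rounding:

```python
from fractions import Fraction

# E[X] = sum over x of x * p_X(x), for a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mean = sum(x * p for x, p in pmf.items())
print(mean)  # 7/2
```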

Properties of expected value

The expected value satisfies several useful properties:

  1. Linearity: For constants a and b, E[aX + b] = aE[X] + b
  2. Non-negativity: If X \geq 0, then E[X] \geq 0
  3. Monotonicity: If X \leq Y, then E[X] \leq E[Y]

Linearity of expectation

This is one of the most useful results in probability. For any random variables X and Y:

E[X + Y] = E[X] + E[Y]

This holds regardless of whether X and Y are independent. That's what makes it so powerful. You can break a complicated random variable into simpler pieces, compute each expectation separately, and add them up.

More generally, for random variables X_1, X_2, \ldots, X_n and constants a_1, a_2, \ldots, a_n:

E\left[\sum_{i=1}^n a_i X_i\right] = \sum_{i=1}^n a_i E[X_i]
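
To see that independence is not needed, here is a small exact-enumeration sketch where X and Y are deliberately dependent (the setup is my own example): X is the number of heads on the first of two coin flips and Y is the total number of heads.

```python
from fractions import Fraction
from itertools import product

# Four equally likely outcomes of two fair coin flips.
outcomes = list(product("HT", repeat=2))
p = Fraction(1, 4)

def x_val(w):  # X = heads on flip 1
    return 1 if w[0] == "H" else 0

def y_val(w):  # Y = total heads; Y depends on X
    return sum(1 for c in w if c == "H")

E_X = sum(p * x_val(w) for w in outcomes)
E_Y = sum(p * y_val(w) for w in outcomes)
E_sum = sum(p * (x_val(w) + y_val(w)) for w in outcomes)

print(E_X, E_Y, E_sum)  # 1/2 1 3/2
```

Even though Y is built from X, E[X + Y] = E[X] + E[Y] holds exactly.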

Variance and standard deviation

While the expected value tells you where the distribution is centered, variance and standard deviation tell you how spread out it is. A higher variance means the values are more dispersed around the mean.

Definition of variance

The variance of X, denoted \text{Var}(X) or \sigma_X^2, is the expected squared deviation from the mean:

\text{Var}(X) = E[(X - E[X])^2]

For a discrete variable: \text{Var}(X) = \sum_x (x - E[X])^2 \cdot p_X(x)

For a continuous variable: \text{Var}(X) = \int_{-\infty}^{\infty} (x - E[X])^2 \cdot f_X(x)\, dx

A very handy computational shortcut is the alternative formula:

\text{Var}(X) = E[X^2] - (E[X])^2

This is often much easier to compute than working directly from the definition.

Definition of standard deviation

The standard deviation is the square root of the variance:

\sigma_X = \sqrt{\text{Var}(X)}

The standard deviation has the same units as X itself, making it more interpretable than variance when you want to describe spread in the original scale.

Properties of variance

  1. Non-negativity: \text{Var}(X) \geq 0 for any random variable X, with equality only if X is constant
  2. Scaling: For a constant a, \text{Var}(aX) = a^2 \text{Var}(X)
  3. Shift invariance: For a constant b, \text{Var}(X + b) = \text{Var}(X)
  4. Sum of independent variables: If X and Y are independent, \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)

Note that property 4 requires independence, unlike linearity of expectation. For dependent variables, you need the covariance term: \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\text{Cov}(X, Y).

Calculating variance and standard deviation

  1. Compute the expected value E[X]

  2. Compute E[X^2] (the second moment)

  3. Apply the shortcut: \text{Var}(X) = E[X^2] - (E[X])^2

  4. Take the square root for the standard deviation: \sigma_X = \sqrt{\text{Var}(X)}
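
The four steps can be followed literally in code. A sketch for a fair six-sided die (my example), with exact fractions until the final square root:

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Step 1: E[X]
mean = sum(x * p for x, p in pmf.items())              # 7/2
# Step 2: E[X^2], the second moment
second_moment = sum(x**2 * p for x, p in pmf.items())  # 91/6
# Step 3: Var(X) = E[X^2] - (E[X])^2
variance = second_moment - mean**2                     # 91/6 - 49/4 = 35/12
# Step 4: standard deviation (irrational, so switch to float)
std_dev = float(variance) ** 0.5

print(variance)  # 35/12
```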

Moment-generating functions

Moment-generating functions (MGFs) encode all the moments of a distribution into a single function. They're particularly useful for identifying distributions and proving results about sums of independent random variables.

Definition of moment-generating functions

The MGF of a random variable X is defined as:

M_X(t) = E[e^{tX}]

where t is a real number.

For a discrete variable: M_X(t) = \sum_x e^{tx} \cdot p_X(x)

For a continuous variable: M_X(t) = \int_{-\infty}^{\infty} e^{tx} \cdot f_X(x)\, dx

The MGF may not exist for all distributions (it requires that E[e^{tX}] be finite in some open interval around t = 0).

Properties of moment-generating functions

  1. Uniqueness: If two random variables have the same MGF (in a neighborhood of t = 0), they have the same distribution. This makes MGFs a powerful tool for proving that two variables share a distribution.
  2. Moment extraction: The n-th moment of X is obtained by differentiating n times and evaluating at t = 0: E[X^n] = M_X^{(n)}(0)
  3. Sum of independents: If X and Y are independent, the MGF of their sum factors: M_{X+Y}(t) = M_X(t) \cdot M_Y(t)

Applications of moment-generating functions

  • Deriving moments: The mean is E[X] = M_X'(0) and the variance is \text{Var}(X) = M_X''(0) - (M_X'(0))^2
  • Identifying distributions: If you compute an MGF and recognize its functional form, you can immediately identify the distribution
  • Proving convergence results: MGFs are used in proofs of the Central Limit Theorem and other limit theorems
  • Sums of independent variables: The factoring property makes it straightforward to find the distribution of sums (e.g., showing that the sum of independent Poissons is Poisson)
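
To illustrate moment extraction, the sketch below takes the Poisson(\lambda) MGF, M(t) = e^{\lambda(e^t - 1)} (a standard result), and recovers the mean and variance by numerical differentiation at t = 0. The value lam = 3 and the step sizes are arbitrary choices for the demo:

```python
import math

# MGF of a Poisson(lam) variable: M(t) = exp(lam * (e^t - 1)).
lam = 3.0
def M(t):
    return math.exp(lam * (math.exp(t) - 1.0))

# First derivative at 0 via a central difference: M'(0) = E[X].
h = 1e-5
M1 = (M(h) - M(-h)) / (2 * h)

# Second derivative at 0: M''(0) = E[X^2]; then use the variance shortcut.
h2 = 1e-4
M2 = (M(h2) - 2 * M(0.0) + M(-h2)) / h2**2
variance = M2 - M1**2

print(round(M1, 4), round(variance, 2))  # both ≈ lam = 3
```

For a Poisson distribution both the mean and the variance equal \lambda, which is what the numerical derivatives recover (up to finite-difference error).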

Joint distributions

Joint distributions describe the probability behavior of two or more random variables considered together. They capture not just each variable's individual behavior, but also the relationship between them.

Joint probability mass functions

The joint PMF of two discrete random variables X and Y, written p_{X,Y}(x, y), gives the probability that X = x and Y = y simultaneously:

p_{X,Y}(x, y) = P(X = x, Y = y)

Properties:

  1. p_{X,Y}(x, y) \geq 0 for all x and y
  2. \sum_x \sum_y p_{X,Y}(x, y) = 1

Joint probability density functions

The joint PDF of two continuous random variables X and Y, written f_{X,Y}(x, y), represents the relative likelihood of (X, Y) falling near the point (x, y). Probabilities come from double integrals over regions:

P((X, Y) \in A) = \iint_A f_{X,Y}(x, y)\, dx\, dy

Properties:

  1. f_{X,Y}(x, y) \geq 0 for all x and y
  2. \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx\, dy = 1

Marginal distributions

You recover the distribution of a single variable from the joint distribution by summing or integrating out the other variable.

  • Discrete: p_X(x) = \sum_y p_{X,Y}(x, y)
  • Continuous: f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy

The marginal distribution tells you about each variable on its own, but it discards information about the relationship between the variables. Two different joint distributions can produce the same marginals.

Conditional distributions

Conditional distributions describe the behavior of one variable given a specific value of another.

  • Discrete: p_{Y|X}(y|x) = \frac{p_{X,Y}(x, y)}{p_X(x)}, provided p_X(x) > 0
  • Continuous: f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)}, provided f_X(x) > 0

Notice the structure: the conditional distribution is the joint divided by the marginal. This is the multivariate version of P(A|B) = P(A \cap B)/P(B).
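
The joint-to-marginal-to-conditional pipeline is short enough to sketch directly. The joint PMF below is an arbitrary example (not from the text), stored as a dictionary keyed by (x, y):

```python
from fractions import Fraction

# An example joint PMF for (X, Y) with X, Y each in {0, 1}.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
}

# Marginal of X: sum the joint over all y.
def p_X(x):
    return sum(p for (xx, y), p in joint.items() if xx == x)

# Conditional PMF of Y given X = x: joint divided by marginal.
def p_Y_given_X(y, x):
    return joint[(x, y)] / p_X(x)

print(p_X(0))             # 1/8 + 3/8 = 1/2
print(p_Y_given_X(1, 0))  # (3/8) / (1/2) = 3/4
```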

Independence of random variables

Two random variables are independent if knowing the value of one gives you no information about the other. Independence simplifies many calculations and is a key assumption in stochastic modeling.

Definition of independence

X and Y are independent if and only if their joint distribution factors into the product of their marginals:

  • Discrete: p_{X,Y}(x, y) = p_X(x) \cdot p_Y(y) for all x, y
  • Continuous: f_{X,Y}(x, y) = f_X(x) \cdot f_Y(y) for all x, y

Equivalently, X and Y are independent if and only if P(X \in A, Y \in B) = P(X \in A) \cdot P(Y \in B) for all events A and B.

To check independence, verify that the factorization holds for every pair of values. If it fails for even one pair, the variables are dependent.

Properties of independent variables

When X and Y are independent:

  1. P(X \in A, Y \in B) = P(X \in A) \cdot P(Y \in B) for any events A and B
  2. E[XY] = E[X] \cdot E[Y]
  3. \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)
  4. M_{X+Y}(t) = M_X(t) \cdot M_Y(t)
  5. E[g(X)h(Y)] = E[g(X)] \cdot E[h(Y)] for any functions g and h

Be careful: E[XY] = E[X] \cdot E[Y] does not imply independence. Uncorrelated variables (zero covariance) are not necessarily independent.
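
The classic counterexample is worth computing once: take X uniform on \{-1, 0, 1\} and Y = X^2. Y is a function of X (maximally dependent), yet the product rule for expectations still holds:

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}; Y = X^2 is determined by X, so they are dependent.
support = [-1, 0, 1]
p = Fraction(1, 3)

E_X = sum(p * x for x in support)          # 0
E_Y = sum(p * x**2 for x in support)       # 2/3
E_XY = sum(p * x * x**2 for x in support)  # E[X^3] = 0

print(E_XY == E_X * E_Y)  # True: uncorrelated...
# ...but not independent: P(X=1, Y=0) = 0, while P(X=1)*P(Y=0) = (1/3)(1/3).
```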

Examples of independent variables

Independent:

  • Outcomes of two separate coin tosses
  • Numbers of customers arriving at two different stores during non-overlapping time intervals
  • Heights of randomly selected individuals from different populations

Dependent:

  • The number of defective items in a sample and the total number of items inspected
  • Temperature and humidity at the same location
  • Stock prices of two companies in the same industry

Functions of random variables

A function of random variables creates a new random variable by applying some transformation. For example, if X is a random variable and Y = X^2, then Y is also a random variable with its own distribution.

Distribution of functions of random variables

To find the distribution of Y = g(X), the general CDF method works as follows:

  1. Write the CDF of Y: F_Y(y) = P(Y \leq y) = P(g(X) \leq y)
  2. Express \{g(X) \leq y\} in terms of X (e.g., if g is increasing, this becomes \{X \leq g^{-1}(y)\})
  3. Evaluate using the known distribution of X
  4. Differentiate F_Y(y) to get the PDF f_Y(y) (for continuous variables)

For a monotone differentiable function g with inverse g^{-1}, there's a direct formula:

f_Y(y) = f_X(g^{-1}(y)) \cdot \left|\frac{d}{dy}g^{-1}(y)\right|

The absolute value of the derivative accounts for whether g is increasing or decreasing.
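
As a concrete check of the change-of-variables formula (my example, not from the text): let X ~ Exponential(1) and Y = e^X, a monotone increasing transformation with g^{-1}(y) = \ln y. The formula gives f_Y(y) = e^{-\ln y} \cdot (1/y) = 1/y^2 for y \geq 1, and the CDF method agrees:

```python
import math

# X ~ Exponential(1), Y = e^X.
# Change-of-variables: f_Y(y) = f_X(ln y) * |d/dy ln y| = e^{-ln y} * (1/y).
def f_Y(y):
    return math.exp(-math.log(y)) * (1.0 / y)   # simplifies to 1/y^2, y >= 1

# CDF method cross-check: P(Y <= b) = P(X <= ln b) = 1 - e^{-ln b} = 1 - 1/b,
# which also equals the integral of 1/y^2 over [1, b].
b = 5.0
cdf_via_X = 1.0 - math.exp(-math.log(b))
cdf_via_Y = 1.0 - 1.0 / b

print(round(f_Y(2.0), 6))                  # 1/4 = 0.25
print(abs(cdf_via_X - cdf_via_Y) < 1e-12)  # True: the two routes agree
```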

Transformations of random variables

Common transformations you'll encounter:

  • Linear: Y = aX + b. If X has PDF f_X, then f_Y(y) = \frac{1}{|a|} f_X\left(\frac{y - b}{a}\right)
  • Exponential: Y = e^X. Apply the CDF method or the change-of-variables formula with g^{-1}(y) = \ln(y)
  • Square: Y = X^2. This is not monotone, so you need to handle positive and negative values of X separately
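
For the non-monotone square case, both roots x = \pm\sqrt{y} contribute, giving f_Y(y) = [f_X(\sqrt{y}) + f_X(-\sqrt{y})] / (2\sqrt{y}). A sketch with X standard normal, where Y = X^2 is known to be chi-square with 1 degree of freedom:

```python
import math

# Standard normal density.
def f_X(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

# Y = X^2 is not monotone, so both roots x = ±sqrt(y) contribute:
# f_Y(y) = [f_X(sqrt(y)) + f_X(-sqrt(y))] / (2 * sqrt(y)),  for y > 0.
def f_Y(y):
    r = math.sqrt(y)
    return (f_X(r) + f_X(-r)) / (2.0 * r)

# Known target: the chi-square(1) density  y^{-1/2} e^{-y/2} / sqrt(2*pi).
def chi2_1(y):
    return math.exp(-y / 2.0) / math.sqrt(2.0 * math.pi * y)

for y in (0.5, 1.0, 4.0):
    print(abs(f_Y(y) - chi2_1(y)) < 1e-12)  # True at each test point
```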

These transformations come up frequently when modeling stochastic processes, since you often need to derive the distribution of some quantity that depends on underlying random variables.