1.4 Random variables

Written by the Fiveable Content Team • Last updated August 2025
Definition of random variables

A random variable is a function that assigns a numerical value to each outcome of a random experiment. This lets you work with outcomes mathematically, which is essential for everything in stochastic processes.

Random variables can take on different values, and the likelihood of each value is governed by an underlying probability distribution.

Formal mathematical definition

A random variable X is a function that maps the sample space \Omega of a random experiment to the real numbers \mathbb{R}:

X: \Omega \rightarrow \mathbb{R}

For each outcome \omega \in \Omega, X(\omega) is a real number. The probability of an event A related to X is:

P(X \in A) = P(\{\omega \in \Omega : X(\omega) \in A\})

Technically, X also needs to be measurable (the preimage of any Borel set must be in the sigma-algebra on \Omega). For this course, the key idea is that X translates abstract outcomes into numbers you can compute with.

Intuitive understanding

Think of a random variable as a rule that converts experimental outcomes into numbers. If you flip two coins, the sample space is \{HH, HT, TH, TT\}. Defining X = "number of heads" gives you X(HH) = 2, X(HT) = 1, X(TH) = 1, X(TT) = 0. Now you can ask questions like P(X = 1) = 1/2.

Other examples: the waiting time in a queue, the daily closing price of a stock, or the number of arrivals to a server in a given interval.
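
The two-coin example can be sketched in a few lines of Python (the names `omega` and `X` here are illustrative, not from any library):

```python
from fractions import Fraction

# Sample space of two fair coin flips; each outcome has probability 1/4.
omega = ["HH", "HT", "TH", "TT"]

# X assigns a number to each outcome: the number of heads.
def X(outcome):
    return outcome.count("H")

# P(X = 1) = P({omega : X(omega) = 1}) = |{HT, TH}| / 4
p_x_equals_1 = Fraction(sum(1 for w in omega if X(w) == 1), len(omega))
print(p_x_equals_1)  # 1/2
```

The point is the mapping itself: the function X turns abstract outcomes into numbers, and probabilities of X-events come from counting the outcomes that map into them.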

Discrete vs continuous variables

Random variables fall into two main categories based on the values they can take.

Discrete random variables have a countable set of possible values (finite or countably infinite). Examples: the number of defective items in a batch, or the number of customers arriving at a store per hour.

Continuous random variables can take any value within some interval of the real line. Examples: the height of a randomly selected person, or the exact time until the next bus arrives.

The distinction matters because discrete and continuous variables use different mathematical tools (sums vs. integrals) throughout the course.

Probability distributions

A probability distribution describes the likelihood of each value a random variable can take. Different types of distributions apply depending on whether the variable is discrete or continuous.

Probability mass functions (PMFs)

A probability mass function describes the distribution of a discrete random variable. The PMF of X, written p_X(x), gives the probability that X equals a specific value x:

p_X(x) = P(X = x)

Two properties must hold:

  1. p_X(x) \geq 0 for all x
  2. \sum_x p_X(x) = 1, where the sum runs over all possible values of x

For example, if X counts the number of heads in two fair coin flips, then p_X(0) = 1/4, p_X(1) = 1/2, p_X(2) = 1/4.

Probability density functions (PDFs)

A probability density function describes the distribution of a continuous random variable. The PDF of X, written f_X(x), represents the relative likelihood of X taking a value near x.

A critical point: f_X(x) is not a probability. For continuous variables, P(X = x) = 0 for any single point. Instead, you get probabilities by integrating the PDF over an interval:

P(a \leq X \leq b) = \int_a^b f_X(x)\, dx

Two properties must hold:

  1. f_X(x) \geq 0 for all x
  2. \int_{-\infty}^{\infty} f_X(x)\, dx = 1
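
As a sketch of how integration turns a density into probabilities, the snippet below uses the Exponential(1) density f(x) = e^{-x} (chosen purely for illustration) and a simple trapezoidal rule:

```python
import math

# Exponential(1) density, an example PDF chosen for illustration.
def f(x):
    return math.exp(-x) if x >= 0 else 0.0

# Trapezoidal rule: approximate the integral of f over [a, b].
def integrate(f, a, b, n=100_000):
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

# P(1 <= X <= 2) = integral of f from 1 to 2 = e^-1 - e^-2
prob = integrate(f, 1.0, 2.0)
print(round(prob, 6))  # close to exp(-1) - exp(-2) ≈ 0.232544

# Property 2: integrating far into the tail, the total approaches 1.
total = integrate(f, 0.0, 50.0)
```

Note that f(0.5) ≈ 0.607 is a perfectly valid density value even though no single point carries positive probability — only the integrals are probabilities.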

Cumulative distribution functions (CDFs)

The cumulative distribution function works for both discrete and continuous random variables. The CDF of X, written F_X(x), gives the probability that X takes a value less than or equal to x:

F_X(x) = P(X \leq x)

  • For a discrete variable: F_X(x) = \sum_{t \leq x} p_X(t)
  • For a continuous variable: F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt

Every CDF is non-decreasing, right-continuous, and satisfies \lim_{x \to -\infty} F_X(x) = 0 and \lim_{x \to \infty} F_X(x) = 1. For continuous variables, the PDF is the derivative of the CDF: f_X(x) = F_X'(x).
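
For the discrete case, the CDF is just a running sum of the PMF. A minimal sketch using the two-coin PMF from above:

```python
# PMF of the number of heads in two fair coin flips (from the text).
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

# F_X(x) = sum of p_X(t) over all support points t <= x
def cdf(x):
    return sum(p for value, p in pmf.items() if value <= x)

# The CDF is a step function: flat between jumps, reaching 1 at the top.
print(cdf(-1))   # 0    (below the support)
print(cdf(0))    # 0.25
print(cdf(1.5))  # 0.75 (flat between the jumps at 1 and 2)
print(cdf(2))    # 1.0
```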

Expected value

The expected value (also called the mean or expectation) measures the central tendency of a random variable. It represents the long-run average you'd observe over many independent trials. It's denoted E[X].

Definition of expected value

For a discrete random variable X with PMF p_X(x):

E[X] = \sum_x x \cdot p_X(x)

For a continuous random variable X with PDF f_X(x):

E[X] = \int_{-\infty}^{\infty} x \cdot f_X(x)\, dx

These sums/integrals must converge absolutely for the expected value to exist. (The Cauchy distribution is a classic example where E[X] does not exist.)
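
The discrete formula is a one-line weighted sum. A sketch for a fair six-sided die (an example of my choosing), using exact fractions to avoid rounding:

```python
from fractions import Fraction

# E[X] = sum over x of x * p_X(x), for a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mean = sum(x * p for x, p in pmf.items())
print(mean)  # 7/2
```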

Properties of expected value

The expected value satisfies several useful properties:

  1. Linearity: For constants a and b, E[aX + b] = aE[X] + b
  2. Non-negativity: If X \geq 0, then E[X] \geq 0
  3. Monotonicity: If X \leq Y, then E[X] \leq E[Y]

Linearity of expectation

This is one of the most useful results in probability. For any random variables X and Y:

E[X + Y] = E[X] + E[Y]

This holds regardless of whether X and Y are independent. That's what makes it so powerful. You can break a complicated random variable into simpler pieces, compute each expectation separately, and add them up.

More generally, for random variables X_1, X_2, \ldots, X_n and constants a_1, a_2, \ldots, a_n:

E\left[\sum_{i=1}^n a_i X_i\right] = \sum_{i=1}^n a_i E[X_i]
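
To see that independence is not needed, here is a small exact-enumeration sketch where X and Y are deliberately dependent (the setup is my own example): X is the number of heads on the first of two coin flips and Y is the total number of heads.

```python
from fractions import Fraction
from itertools import product

# Four equally likely outcomes of two fair coin flips.
outcomes = list(product("HT", repeat=2))
p = Fraction(1, 4)

def x_val(w):  # X = heads on flip 1
    return 1 if w[0] == "H" else 0

def y_val(w):  # Y = total heads; Y depends on X
    return sum(1 for c in w if c == "H")

E_X = sum(p * x_val(w) for w in outcomes)
E_Y = sum(p * y_val(w) for w in outcomes)
E_sum = sum(p * (x_val(w) + y_val(w)) for w in outcomes)

print(E_X, E_Y, E_sum)  # 1/2 1 3/2
```

Even though Y is built from X, E[X + Y] = E[X] + E[Y] holds exactly.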

Variance and standard deviation

While the expected value tells you where the distribution is centered, variance and standard deviation tell you how spread out it is. A higher variance means the values are more dispersed around the mean.

Definition of variance

The variance of X, denoted \text{Var}(X) or \sigma_X^2, is the expected squared deviation from the mean:

\text{Var}(X) = E[(X - E[X])^2]

For a discrete variable: \text{Var}(X) = \sum_x (x - E[X])^2 \cdot p_X(x)

For a continuous variable: \text{Var}(X) = \int_{-\infty}^{\infty} (x - E[X])^2 \cdot f_X(x)\, dx

A very handy computational shortcut is the alternative formula:

\text{Var}(X) = E[X^2] - (E[X])^2

This is often much easier to compute than working directly from the definition.

Definition of standard deviation

The standard deviation is the square root of the variance:

\sigma_X = \sqrt{\text{Var}(X)}

The standard deviation has the same units as X itself, making it more interpretable than variance when you want to describe spread in the original scale.

Properties of variance

  1. Non-negativity: \text{Var}(X) \geq 0 for any random variable X, with equality only if X is constant
  2. Scaling: For a constant a, \text{Var}(aX) = a^2 \text{Var}(X)
  3. Shift invariance: For a constant b, \text{Var}(X + b) = \text{Var}(X)
  4. Sum of independent variables: If X and Y are independent, \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)

Note that property 4 requires independence, unlike linearity of expectation. For dependent variables, you need the covariance term: \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\text{Cov}(X, Y).

Calculating variance and standard deviation

  1. Compute the expected value E[X]

  2. Compute E[X^2] (the second moment)

  3. Apply the shortcut: \text{Var}(X) = E[X^2] - (E[X])^2

  4. Take the square root for the standard deviation: \sigma_X = \sqrt{\text{Var}(X)}
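
The four steps can be followed literally in code. A sketch for a fair six-sided die (my example), with exact fractions until the final square root:

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Step 1: E[X]
mean = sum(x * p for x, p in pmf.items())              # 7/2
# Step 2: E[X^2], the second moment
second_moment = sum(x**2 * p for x, p in pmf.items())  # 91/6
# Step 3: Var(X) = E[X^2] - (E[X])^2
variance = second_moment - mean**2                     # 91/6 - 49/4 = 35/12
# Step 4: standard deviation (irrational, so switch to float)
std_dev = float(variance) ** 0.5

print(variance)  # 35/12
```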

Moment-generating functions

Moment-generating functions (MGFs) encode all the moments of a distribution into a single function. They're particularly useful for identifying distributions and proving results about sums of independent random variables.

Definition of moment-generating functions

The MGF of a random variable X is defined as:

M_X(t) = E[e^{tX}]

where t is a real number.

For a discrete variable: M_X(t) = \sum_x e^{tx} \cdot p_X(x)

For a continuous variable: M_X(t) = \int_{-\infty}^{\infty} e^{tx} \cdot f_X(x)\, dx

The MGF may not exist for all distributions (it requires that E[e^{tX}] be finite in some open interval around t = 0).

Properties of moment-generating functions

  1. Uniqueness: If two random variables have the same MGF (in a neighborhood of t = 0), they have the same distribution. This makes MGFs a powerful tool for proving that two variables share a distribution.
  2. Moment extraction: The n-th moment of X is obtained by differentiating n times and evaluating at t = 0: E[X^n] = M_X^{(n)}(0)
  3. Sum of independents: If X and Y are independent, the MGF of their sum factors: M_{X+Y}(t) = M_X(t) \cdot M_Y(t)

Applications of moment-generating functions

  • Deriving moments: The mean is E[X] = M_X'(0) and the variance is \text{Var}(X) = M_X''(0) - (M_X'(0))^2
  • Identifying distributions: If you compute an MGF and recognize its functional form, you can immediately identify the distribution
  • Proving convergence results: MGFs are used in proofs of the Central Limit Theorem and other limit theorems
  • Sums of independent variables: The factoring property makes it straightforward to find the distribution of sums (e.g., showing that the sum of independent Poissons is Poisson)
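
To illustrate moment extraction, the sketch below takes the Poisson(\lambda) MGF, M(t) = e^{\lambda(e^t - 1)} (a standard result), and recovers the mean and variance by numerical differentiation at t = 0. The value lam = 3 and the step sizes are arbitrary choices for the demo:

```python
import math

# MGF of a Poisson(lam) variable: M(t) = exp(lam * (e^t - 1)).
lam = 3.0
def M(t):
    return math.exp(lam * (math.exp(t) - 1.0))

# First derivative at 0 via a central difference: M'(0) = E[X].
h = 1e-5
M1 = (M(h) - M(-h)) / (2 * h)

# Second derivative at 0: M''(0) = E[X^2]; then use the variance shortcut.
h2 = 1e-4
M2 = (M(h2) - 2 * M(0.0) + M(-h2)) / h2**2
variance = M2 - M1**2

print(round(M1, 4), round(variance, 2))  # both ≈ lam = 3
```

For a Poisson distribution both the mean and the variance equal \lambda, which is what the numerical derivatives recover (up to finite-difference error).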

Joint distributions

Joint distributions describe the probability behavior of two or more random variables considered together. They capture not just each variable's individual behavior, but also the relationship between them.

Joint probability mass functions

The joint PMF of two discrete random variables X and Y, written p_{X,Y}(x, y), gives the probability that X = x and Y = y simultaneously:

p_{X,Y}(x, y) = P(X = x, Y = y)

Properties:

  1. p_{X,Y}(x, y) \geq 0 for all x and y
  2. \sum_x \sum_y p_{X,Y}(x, y) = 1

Joint probability density functions

The joint PDF of two continuous random variables X and Y, written f_{X,Y}(x, y), represents the relative likelihood of (X, Y) falling near the point (x, y). Probabilities come from double integrals over regions:

P((X, Y) \in A) = \iint_A f_{X,Y}(x, y)\, dx\, dy

Properties:

  1. f_{X,Y}(x, y) \geq 0 for all x and y
  2. \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx\, dy = 1

Marginal distributions

You recover the distribution of a single variable from the joint distribution by summing or integrating out the other variable.

  • Discrete: p_X(x) = \sum_y p_{X,Y}(x, y)
  • Continuous: f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy

The marginal distribution tells you about each variable on its own, but it discards information about the relationship between the variables. Two different joint distributions can produce the same marginals.

Conditional distributions

Conditional distributions describe the behavior of one variable given a specific value of another.

  • Discrete: p_{Y|X}(y|x) = \frac{p_{X,Y}(x, y)}{p_X(x)}, provided p_X(x) > 0
  • Continuous: f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)}, provided f_X(x) > 0

Notice the structure: the conditional distribution is the joint divided by the marginal. This is the multivariate version of P(A|B) = P(A \cap B)/P(B).
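
The joint-to-marginal-to-conditional pipeline is short enough to sketch directly. The joint PMF below is an arbitrary example (not from the text), stored as a dictionary keyed by (x, y):

```python
from fractions import Fraction

# An example joint PMF for (X, Y) with X, Y each in {0, 1}.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
}

# Marginal of X: sum the joint over all y.
def p_X(x):
    return sum(p for (xx, y), p in joint.items() if xx == x)

# Conditional PMF of Y given X = x: joint divided by marginal.
def p_Y_given_X(y, x):
    return joint[(x, y)] / p_X(x)

print(p_X(0))             # 1/8 + 3/8 = 1/2
print(p_Y_given_X(1, 0))  # (3/8) / (1/2) = 3/4
```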

Independence of random variables

Two random variables are independent if knowing the value of one gives you no information about the other. Independence simplifies many calculations and is a key assumption in stochastic modeling.

Definition of independence

X and Y are independent if and only if their joint distribution factors into the product of their marginals:

  • Discrete: p_{X,Y}(x, y) = p_X(x) \cdot p_Y(y) for all x, y
  • Continuous: f_{X,Y}(x, y) = f_X(x) \cdot f_Y(y) for all x, y

Equivalently, X and Y are independent if and only if P(X \in A, Y \in B) = P(X \in A) \cdot P(Y \in B) for all events A and B.

To check independence, verify that the factorization holds for every pair of values. If it fails for even one pair, the variables are dependent.

Properties of independent variables

When X and Y are independent:

  1. P(X \in A, Y \in B) = P(X \in A) \cdot P(Y \in B) for any events A and B
  2. E[XY] = E[X] \cdot E[Y]
  3. \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)
  4. M_{X+Y}(t) = M_X(t) \cdot M_Y(t)
  5. E[g(X)h(Y)] = E[g(X)] \cdot E[h(Y)] for any functions g and h

Be careful: E[XY] = E[X] \cdot E[Y] does not imply independence. Uncorrelated variables (zero covariance) are not necessarily independent.
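
The classic counterexample is worth computing once: take X uniform on \{-1, 0, 1\} and Y = X^2. Y is a function of X (maximally dependent), yet the product rule for expectations still holds:

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}; Y = X^2 is determined by X, so they are dependent.
support = [-1, 0, 1]
p = Fraction(1, 3)

E_X = sum(p * x for x in support)          # 0
E_Y = sum(p * x**2 for x in support)       # 2/3
E_XY = sum(p * x * x**2 for x in support)  # E[X^3] = 0

print(E_XY == E_X * E_Y)  # True: uncorrelated...
# ...but not independent: P(X=1, Y=0) = 0, while P(X=1)*P(Y=0) = (1/3)(1/3).
```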

Examples of independent variables

Independent:

  • Outcomes of two separate coin tosses
  • Numbers of customers arriving at two different stores during non-overlapping time intervals
  • Heights of randomly selected individuals from different populations

Dependent:

  • The number of defective items in a sample and the total number of items inspected
  • Temperature and humidity at the same location
  • Stock prices of two companies in the same industry

Functions of random variables

A function of random variables creates a new random variable by applying some transformation. For example, if X is a random variable and Y = X^2, then Y is also a random variable with its own distribution.

Distribution of functions of random variables

To find the distribution of Y = g(X), the general CDF method works as follows:

  1. Write the CDF of Y: F_Y(y) = P(Y \leq y) = P(g(X) \leq y)
  2. Express \{g(X) \leq y\} in terms of X (e.g., if g is increasing, this becomes \{X \leq g^{-1}(y)\})
  3. Evaluate using the known distribution of X
  4. Differentiate F_Y(y) to get the PDF f_Y(y) (for continuous variables)

For a monotone differentiable function g with inverse g^{-1}, there's a direct formula:

f_Y(y) = f_X(g^{-1}(y)) \cdot \left|\frac{d}{dy}g^{-1}(y)\right|

The absolute value of the derivative accounts for whether g is increasing or decreasing.
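
As a concrete check of the change-of-variables formula (my example, not from the text): let X ~ Exponential(1) and Y = e^X, a monotone increasing transformation with g^{-1}(y) = \ln y. The formula gives f_Y(y) = e^{-\ln y} \cdot (1/y) = 1/y^2 for y \geq 1, and the CDF method agrees:

```python
import math

# X ~ Exponential(1), Y = e^X.
# Change-of-variables: f_Y(y) = f_X(ln y) * |d/dy ln y| = e^{-ln y} * (1/y).
def f_Y(y):
    return math.exp(-math.log(y)) * (1.0 / y)   # simplifies to 1/y^2, y >= 1

# CDF method cross-check: P(Y <= b) = P(X <= ln b) = 1 - e^{-ln b} = 1 - 1/b,
# which also equals the integral of 1/y^2 over [1, b].
b = 5.0
cdf_via_X = 1.0 - math.exp(-math.log(b))
cdf_via_Y = 1.0 - 1.0 / b

print(round(f_Y(2.0), 6))                  # 1/4 = 0.25
print(abs(cdf_via_X - cdf_via_Y) < 1e-12)  # True: the two routes agree
```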

Transformations of random variables

Common transformations you'll encounter:

  • Linear: Y = aX + b. If X has PDF f_X, then f_Y(y) = \frac{1}{|a|} f_X\left(\frac{y - b}{a}\right)
  • Exponential: Y = e^X. Apply the CDF method or the change-of-variables formula with g^{-1}(y) = \ln(y)
  • Square: Y = X^2. This is not monotone, so you need to handle positive and negative values of X separately
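
For the non-monotone square case, both roots x = \pm\sqrt{y} contribute, giving f_Y(y) = [f_X(\sqrt{y}) + f_X(-\sqrt{y})] / (2\sqrt{y}). A sketch with X standard normal, where Y = X^2 is known to be chi-square with 1 degree of freedom:

```python
import math

# Standard normal density.
def f_X(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

# Y = X^2 is not monotone, so both roots x = ±sqrt(y) contribute:
# f_Y(y) = [f_X(sqrt(y)) + f_X(-sqrt(y))] / (2 * sqrt(y)),  for y > 0.
def f_Y(y):
    r = math.sqrt(y)
    return (f_X(r) + f_X(-r)) / (2.0 * r)

# Known target: the chi-square(1) density  y^{-1/2} e^{-y/2} / sqrt(2*pi).
def chi2_1(y):
    return math.exp(-y / 2.0) / math.sqrt(2.0 * math.pi * y)

for y in (0.5, 1.0, 4.0):
    print(abs(f_Y(y) - chi2_1(y)) < 1e-12)  # True at each test point
```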

These transformations come up frequently when modeling stochastic processes, since you often need to derive the distribution of some quantity that depends on underlying random variables.