Definition of random variables
A random variable is a function that assigns a numerical value to each outcome of a random experiment. This lets you work with outcomes mathematically, which is essential for everything in stochastic processes.
Random variables can take on different values, and the likelihood of each value is governed by an underlying probability distribution.
Formal mathematical definition
A random variable $X$ is a function that maps the sample space $\Omega$ of a random experiment to the real numbers:
$$X : \Omega \to \mathbb{R}$$
For each outcome $\omega \in \Omega$, $X(\omega)$ is a real number. The probability of an event related to $X$ is:
$$P(X \in A) = P(\{\omega \in \Omega : X(\omega) \in A\})$$
Technically, $X$ also needs to be measurable (the preimage of any Borel set must be in the sigma-algebra on $\Omega$). For this course, the key idea is that $X$ translates abstract outcomes into numbers you can compute with.
Intuitive understanding
Think of a random variable as a rule that converts experimental outcomes into numbers. If you flip two coins, the sample space is $\{HH, HT, TH, TT\}$. Defining $X$ = "number of heads" gives you $X(HH) = 2$, $X(HT) = 1$, $X(TH) = 1$, $X(TT) = 0$. Now you can ask questions like $P(X \geq 1)$.
Other examples: the waiting time in a queue, the daily closing price of a stock, or the number of arrivals to a server in a given interval.
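The two-coin example can be sketched in a few lines of Python. The names (`sample_space`, `X`) are our own; the point is simply that a random variable is a function from outcomes to numbers.

```python
# Sketch of the two-coin example: a random variable is just a function
# from outcomes to numbers.
from itertools import product

sample_space = list(product("HT", repeat=2))  # ('H','H'), ('H','T'), ...

def X(outcome):
    """Number of heads in the outcome."""
    return outcome.count("H")

# Fair coins: all 4 outcomes are equally likely, so P(X >= 1) is the
# fraction of outcomes with at least one head.
p_at_least_one = sum(1 for w in sample_space if X(w) >= 1) / len(sample_space)
print(p_at_least_one)  # 0.75
```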
Discrete vs continuous variables
Random variables fall into two main categories based on the values they can take.
Discrete random variables have a countable set of possible values (finite or countably infinite). Examples: the number of defective items in a batch, or the number of customers arriving at a store per hour.
Continuous random variables can take any value within some interval of the real line. Examples: the height of a randomly selected person, or the exact time until the next bus arrives.
The distinction matters because discrete and continuous variables use different mathematical tools (sums vs. integrals) throughout the course.
Probability distributions
A probability distribution describes the likelihood of each value a random variable can take. Different types of distributions apply depending on whether the variable is discrete or continuous.
Probability mass functions (PMFs)
A probability mass function describes the distribution of a discrete random variable. The PMF of $X$, written $p_X(x)$, gives the probability that $X$ equals a specific value $x$:
$$p_X(x) = P(X = x)$$
Two properties must hold:
- $p_X(x) \geq 0$ for all $x$
- $\sum_x p_X(x) = 1$, where the sum runs over all possible values of $X$
For example, if $X$ counts the number of heads in two fair coin flips, then $p_X(0) = 1/4$, $p_X(1) = 1/2$, $p_X(2) = 1/4$.
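This PMF is small enough to write down directly and check against the two properties:

```python
# Sketch: the PMF of X = number of heads in two fair flips, checked
# against the two defining properties.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

assert all(p >= 0 for p in pmf.values())     # p_X(x) >= 0
assert abs(sum(pmf.values()) - 1) < 1e-12    # probabilities sum to 1

print(pmf[1])  # P(X = 1) = 0.5
```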
Probability density functions (PDFs)
A probability density function describes the distribution of a continuous random variable. The PDF of $X$, written $f_X(x)$, represents the relative likelihood of $X$ taking a value near $x$.
A critical point: $f_X(x)$ is not a probability. For continuous variables, $P(X = x) = 0$ for any single point. Instead, you get probabilities by integrating the PDF over an interval:
$$P(a \leq X \leq b) = \int_a^b f_X(x)\,dx$$
Two properties must hold:
- $f_X(x) \geq 0$ for all $x$
- $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
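As a numerical sketch, take a rate-1 exponential density, $f(x) = e^{-x}$ for $x \geq 0$ (an assumed example distribution), and approximate the integral with a midpoint Riemann sum:

```python
# Sketch: for a continuous variable, probabilities come from integrating
# the PDF. Assumed example: a rate-1 exponential, f(x) = e^(-x) for x >= 0,
# integrated with a midpoint Riemann sum.
import math

def f(x):
    return math.exp(-x) if x >= 0 else 0.0

def prob(a, b, n=100_000):
    """Approximate P(a <= X <= b) as the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Exact answer for comparison: P(1 <= X <= 2) = e^(-1) - e^(-2)
print(prob(1.0, 2.0), math.exp(-1) - math.exp(-2))
```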
Cumulative distribution functions (CDFs)
The cumulative distribution function works for both discrete and continuous random variables. The CDF of $X$, written $F_X(x)$, gives the probability that $X$ takes a value less than or equal to $x$:
$$F_X(x) = P(X \leq x)$$
- For a discrete variable: $F_X(x) = \sum_{t \leq x} p_X(t)$
- For a continuous variable: $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$
Every CDF is non-decreasing, right-continuous, and satisfies $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to \infty} F_X(x) = 1$. For continuous variables, the PDF is the derivative of the CDF: $f_X(x) = F_X'(x)$.
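For the discrete case, the CDF is just an accumulated PMF. A sketch using the two-coin example:

```python
# Sketch: the CDF of a discrete variable accumulates the PMF
# (two-coin X again: values 0, 1, 2 with probabilities 1/4, 1/2, 1/4).
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

def F(x):
    """CDF: P(X <= x)."""
    return sum(p for v, p in pmf.items() if v <= x)

print(F(-1), F(0), F(1.5), F(2))  # 0 0.25 0.75 1.0
```

Note that `F(1.5)` is well-defined even though 1.5 is not a possible value of $X$: the CDF is a step function defined for every real $x$.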
Expected value
The expected value (also called the mean or expectation) measures the central tendency of a random variable. It represents the long-run average you'd observe over many independent trials. It's denoted $E[X]$.
Definition of expected value
For a discrete random variable with PMF $p_X(x)$:
$$E[X] = \sum_x x\, p_X(x)$$
For a continuous random variable with PDF $f_X(x)$:
$$E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\,dx$$
These sums/integrals must converge absolutely for the expected value to exist. (The Cauchy distribution is a classic example where $E[X]$ does not exist.)
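The discrete formula is a one-line probability-weighted average:

```python
# Sketch: expected value of a discrete variable is the
# probability-weighted average of its values.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}  # X = heads in two fair flips

mean = sum(x * p for x, p in pmf.items())
print(mean)  # 1.0
```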
Properties of expected value
The expected value satisfies several useful properties:
- Linearity: For constants $a$ and $b$, $E[aX + b] = a\,E[X] + b$
- Non-negativity: If $X \geq 0$, then $E[X] \geq 0$
- Monotonicity: If $X \leq Y$, then $E[X] \leq E[Y]$

Linearity of expectation
This is one of the most useful results in probability. For any random variables $X$ and $Y$:
$$E[X + Y] = E[X] + E[Y]$$
This holds regardless of whether $X$ and $Y$ are independent. That's what makes it so powerful. You can break a complicated random variable into simpler pieces, compute each expectation separately, and add them up.
More generally, for random variables $X_1, \ldots, X_n$ and constants $a_1, \ldots, a_n$:
$$E\left[\sum_{i=1}^{n} a_i X_i\right] = \sum_{i=1}^{n} a_i\, E[X_i]$$
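A quick simulation sketch of the "no independence needed" point, using an assumed strongly dependent pair (a die roll and its reflection):

```python
# Sketch: linearity of expectation without independence.
# X is a fair die roll; Y = 7 - X is completely determined by X,
# yet E[X + Y] = E[X] + E[Y] still holds (both sides are 7 here,
# since X + Y = 7 on every trial).
import random

random.seed(42)
n = 100_000
xs = [random.randint(1, 6) for _ in range(n)]
ys = [7 - x for x in xs]  # maximally dependent on X

mean = lambda v: sum(v) / len(v)
lhs = mean([x + y for x, y in zip(xs, ys)])  # estimate of E[X + Y]
rhs = mean(xs) + mean(ys)                    # estimate of E[X] + E[Y]
print(lhs, rhs)
```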
Variance and standard deviation
While the expected value tells you where the distribution is centered, variance and standard deviation tell you how spread out it is. A higher variance means the values are more dispersed around the mean.
Definition of variance
The variance of $X$, denoted $\mathrm{Var}(X)$ or $\sigma^2$, is the expected squared deviation from the mean:
$$\mathrm{Var}(X) = E\left[(X - E[X])^2\right]$$
For a discrete variable: $\mathrm{Var}(X) = \sum_x (x - \mu)^2\, p_X(x)$, where $\mu = E[X]$
For a continuous variable: $\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2\, f_X(x)\,dx$
A very handy computational shortcut is the alternative formula:
$$\mathrm{Var}(X) = E[X^2] - (E[X])^2$$
This is often much easier to compute than working directly from the definition.
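A sketch of the shortcut on a fair six-sided die (an assumed example):

```python
# Sketch: the shortcut Var(X) = E[X^2] - (E[X])^2 on a fair six-sided die.
pmf = {x: 1 / 6 for x in range(1, 7)}

ex = sum(x * p for x, p in pmf.items())        # E[X] = 3.5
ex2 = sum(x * x * p for x, p in pmf.items())   # E[X^2] = 91/6
var = ex2 - ex ** 2                            # 35/12, about 2.9167
print(var)
```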
Definition of standard deviation
The standard deviation $\sigma$ is the square root of the variance:
$$\sigma = \sqrt{\mathrm{Var}(X)}$$
The standard deviation has the same units as $X$ itself, making it more interpretable than variance when you want to describe spread in the original scale.
Properties of variance
1. Non-negativity: $\mathrm{Var}(X) \geq 0$ for any random variable $X$, with equality only if $X$ is constant
2. Scaling: For a constant $a$, $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$
3. Shift invariance: For a constant $b$, $\mathrm{Var}(X + b) = \mathrm{Var}(X)$
4. Sum of independent variables: If $X$ and $Y$ are independent, $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$
Note that property 4 requires independence, unlike linearity of expectation. For dependent variables, you need the covariance term: $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$.
Calculating variance and standard deviation
1. Compute the expected value $E[X]$
2. Compute $E[X^2]$ (the second moment)
3. Apply the shortcut: $\mathrm{Var}(X) = E[X^2] - (E[X])^2$
4. Take the square root for the standard deviation: $\sigma = \sqrt{\mathrm{Var}(X)}$
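The four steps can be wrapped into one small helper for a discrete PMF (the helper name is our own; applied here to the two-coin example):

```python
# Sketch: the four calculation steps wrapped into one helper
# for a discrete PMF.
import math

def mean_var_sd(pmf):
    ex = sum(x * p for x, p in pmf.items())       # step 1: E[X]
    ex2 = sum(x * x * p for x, p in pmf.items())  # step 2: E[X^2]
    var = ex2 - ex ** 2                           # step 3: shortcut formula
    return ex, var, math.sqrt(var)                # step 4: sd = sqrt(var)

ex, var, sd = mean_var_sd({0: 0.25, 1: 0.5, 2: 0.25})
print(ex, var, sd)  # 1.0, 0.5, ~0.7071
```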
Moment-generating functions
Moment-generating functions (MGFs) encode all the moments of a distribution into a single function. They're particularly useful for identifying distributions and proving results about sums of independent random variables.
Definition of moment-generating functions
The MGF of a random variable $X$ is defined as:
$$M_X(t) = E\left[e^{tX}\right]$$
where $t$ is a real number.
For a discrete variable: $M_X(t) = \sum_x e^{tx}\, p_X(x)$
For a continuous variable: $M_X(t) = \int_{-\infty}^{\infty} e^{tx}\, f_X(x)\,dx$
The MGF may not exist for all distributions (it requires that $E[e^{tX}]$ be finite in some open interval around $t = 0$).
Properties of moment-generating functions
- Uniqueness: If two random variables have the same MGF (in a neighborhood of $t = 0$), they have the same distribution. This makes MGFs a powerful tool for proving that two variables share a distribution.
- Moment extraction: The $n$-th moment of $X$ is obtained by differentiating $M_X(t)$ $n$ times and evaluating at $t = 0$: $E[X^n] = M_X^{(n)}(0)$
- Sum of independents: If $X$ and $Y$ are independent, the MGF of their sum factors: $M_{X+Y}(t) = M_X(t)\, M_Y(t)$
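As a sketch of moment extraction, take a Bernoulli($p$) variable (an assumed example), whose MGF is $M_X(t) = 1 - p + p e^t$. Every moment $E[X^n]$ equals $p$, so numerical derivatives at $t = 0$ should all come out near $p$:

```python
# Sketch: extracting moments from an MGF by numerical differentiation.
# Assumed example: Bernoulli(p), with M(t) = 1 - p + p*e^t and
# E[X^n] = p for every n >= 1.
import math

p = 0.3
M = lambda t: 1 - p + p * math.exp(t)

h = 1e-4
m1 = (M(h) - M(-h)) / (2 * h)              # central difference ~ M'(0) = E[X]
m2 = (M(h) - 2 * M(0) + M(-h)) / h ** 2    # ~ M''(0) = E[X^2]
print(m1, m2)  # both close to 0.3
```

In practice you would differentiate symbolically; the finite differences here are just to make the property concrete.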
Applications of moment-generating functions
- Deriving moments: The mean is $E[X] = M_X'(0)$ and the variance is $\mathrm{Var}(X) = M_X''(0) - (M_X'(0))^2$
- Identifying distributions: If you compute an MGF and recognize its functional form, you can immediately identify the distribution
- Proving convergence results: MGFs are used in proofs of the Central Limit Theorem and other limit theorems
- Sums of independent variables: The factoring property makes it straightforward to find the distribution of sums (e.g., showing that the sum of independent Poissons is Poisson)

Joint distributions
Joint distributions describe the probability behavior of two or more random variables considered together. They capture not just each variable's individual behavior, but also the relationship between them.
Joint probability mass functions
The joint PMF of two discrete random variables $X$ and $Y$, written $p_{X,Y}(x, y)$, gives the probability that $X = x$ and $Y = y$ simultaneously:
$$p_{X,Y}(x, y) = P(X = x, Y = y)$$
Properties:
- $p_{X,Y}(x, y) \geq 0$ for all $x$ and $y$
- $\sum_x \sum_y p_{X,Y}(x, y) = 1$
Joint probability density functions
The joint PDF of two continuous random variables $X$ and $Y$, written $f_{X,Y}(x, y)$, represents the relative likelihood of $(X, Y)$ falling near the point $(x, y)$. Probabilities come from double integrals over regions:
$$P((X, Y) \in A) = \iint_A f_{X,Y}(x, y)\,dx\,dy$$
Properties:
- $f_{X,Y}(x, y) \geq 0$ for all $x$ and $y$
- $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = 1$
Marginal distributions
You recover the distribution of a single variable from the joint distribution by summing or integrating out the other variable.
- Discrete: $p_X(x) = \sum_y p_{X,Y}(x, y)$
- Continuous: $f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$
The marginal distribution tells you about each variable on its own, but it discards information about the relationship between the variables. Two different joint distributions can produce the same marginals.
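A minimal sketch of marginalization, using an assumed 2×2 joint PMF:

```python
# Sketch: marginals from a joint PMF by summing out the other variable.
# The 2x2 joint table is an assumed toy example.
joint = {
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.3, (1, 1): 0.4,
}

p_X, p_Y = {}, {}
for (x, y), p in joint.items():
    p_X[x] = p_X.get(x, 0.0) + p  # sum over y
    p_Y[y] = p_Y.get(y, 0.0) + p  # sum over x

print(p_X)  # X marginal: P(X=0) = 0.3, P(X=1) = 0.7
print(p_Y)  # Y marginal: P(Y=0) = 0.4, P(Y=1) = 0.6
```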
Conditional distributions
Conditional distributions describe the behavior of one variable given a specific value of another.
- Discrete: $p_{Y \mid X}(y \mid x) = \dfrac{p_{X,Y}(x, y)}{p_X(x)}$, provided $p_X(x) > 0$
- Continuous: $f_{Y \mid X}(y \mid x) = \dfrac{f_{X,Y}(x, y)}{f_X(x)}$, provided $f_X(x) > 0$
Notice the structure: the conditional distribution is the joint divided by the marginal. This is the multivariate version of $P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}$.
Independence of random variables
Two random variables are independent if knowing the value of one gives you no information about the other. Independence simplifies many calculations and is a key assumption in stochastic modeling.
Definition of independence
$X$ and $Y$ are independent if and only if their joint distribution factors into the product of their marginals:
- Discrete: $p_{X,Y}(x, y) = p_X(x)\, p_Y(y)$ for all $x, y$
- Continuous: $f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$ for all $x, y$
Equivalently, $X$ and $Y$ are independent if and only if $P(X \in A, Y \in B) = P(X \in A)\, P(Y \in B)$ for all events $A$ and $B$.
To check independence, verify that the factorization holds for every pair of values. If it fails for even one pair, the variables are dependent.
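This check is mechanical for a discrete joint table. A sketch with two assumed toy tables, one that factors and one that doesn't:

```python
# Sketch: testing independence by checking the factorization
# p(x, y) = p_X(x) * p_Y(y) at every pair. Both joint tables are
# assumed toy examples.
def marginals(joint):
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

def is_independent(joint, tol=1e-9):
    px, py = marginals(joint)
    return all(abs(p - px[x] * py[y]) < tol for (x, y), p in joint.items())

indep = {(0, 0): 0.06, (0, 1): 0.14, (1, 0): 0.24, (1, 1): 0.56}
dep = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
print(is_independent(indep), is_independent(dep))  # True False
```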
Properties of independent variables
When $X$ and $Y$ are independent:
- $P(X \in A, Y \in B) = P(X \in A)\, P(Y \in B)$ for any events $A$ and $B$
- $E[g(X)\, h(Y)] = E[g(X)]\, E[h(Y)]$ for any functions $g$ and $h$
Be careful: $E[XY] = E[X]\, E[Y]$ does not imply independence. Uncorrelated variables (zero covariance) are not necessarily independent.
Examples of independent variables
Independent:
- Outcomes of two separate coin tosses
- Numbers of customers arriving at two different stores during non-overlapping time intervals
- Heights of randomly selected individuals from different populations
Dependent:
- The number of defective items in a sample and the total number of items inspected
- Temperature and humidity at the same location
- Stock prices of two companies in the same industry
Functions of random variables
A function of random variables creates a new random variable by applying some transformation. For example, if $X$ is a random variable and $Y = g(X)$, then $Y$ is also a random variable with its own distribution.
Distribution of functions of random variables
To find the distribution of $Y = g(X)$, the general CDF method works as follows:
- Write the CDF of $Y$: $F_Y(y) = P(Y \leq y) = P(g(X) \leq y)$
- Express the event in terms of $X$ (e.g., if $g$ is increasing, this becomes $P(X \leq g^{-1}(y)) = F_X(g^{-1}(y))$)
- Evaluate using the known distribution of $X$
- Differentiate $F_Y(y)$ to get the PDF (for continuous variables)
For a monotone differentiable function $g$ with inverse $g^{-1}$, there's a direct formula:
$$f_Y(y) = f_X\!\left(g^{-1}(y)\right) \left| \frac{d}{dy}\, g^{-1}(y) \right|$$
The absolute value of the derivative accounts for whether $g$ is increasing or decreasing.
Transformations of random variables
Common transformations you'll encounter:
- Linear: $Y = aX + b$ with $a \neq 0$. If $X$ has PDF $f_X$, then $f_Y(y) = \frac{1}{|a|}\, f_X\!\left(\frac{y - b}{a}\right)$
- Exponential: $Y = e^X$. Apply the CDF method or the change-of-variables formula with $g^{-1}(y) = \ln y$
- Square: $Y = X^2$. This is not monotone, so you need to handle positive and negative values of $X$ separately
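The non-monotone case is easiest to see with a discrete $X$. In this sketch (an assumed PMF on $\{-1, 0, 1\}$), values of $x$ with the same square pool their probability, which is the discrete analogue of handling the positive and negative branches separately:

```python
# Sketch: the non-monotone transformation Y = X^2 for a discrete X on
# {-1, 0, 1} (assumed PMF). Both x = -1 and x = 1 map to y = 1, so
# their probabilities are pooled.
pmf_X = {-1: 0.3, 0: 0.4, 1: 0.3}

pmf_Y = {}
for x, p in pmf_X.items():
    pmf_Y[x * x] = pmf_Y.get(x * x, 0.0) + p

print(pmf_Y)  # P(Y = 0) = 0.4, P(Y = 1) = 0.3 + 0.3 = 0.6
```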
These transformations come up frequently when modeling stochastic processes, since you often need to derive the distribution of some quantity that depends on underlying random variables.