🎲 Intro to Probability

Expected Value Formulas


Why This Matters

Expected value is the backbone of probability theory. It quantifies what "should" happen on average when randomness is involved. Whether you're analyzing casino games, insurance policies, or scientific experiments, expected value gives you a single number that summarizes a random variable's long-run behavior.

Don't just memorize these formulas in isolation. The real skill is understanding when to use each one and why it works. Can you recognize a binomial setup and immediately write down E(X) = np? Do you know why linearity of expectation is so powerful even without independence? Focus on connecting each formula to the scenario it applies to.


Core Definitions: Discrete vs. Continuous

The fundamental expected value formulas differ based on whether your random variable takes countable values or spans a continuous range. The discrete case uses summation; the continuous case uses integration.

Expected Value of a Discrete Random Variable

E(X) = \sum x_i \cdot P(x_i)

Multiply each possible value by its probability, then sum all the products. Think of it as a weighted average where values with higher probabilities pull the expected value toward them. The result represents the "balance point" of the distribution, the center of its probability mass.
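This weighted average is a one-liner in code. A minimal sketch, using a fair six-sided die as a hypothetical example (not from the text):

```python
# Expected value of a discrete random variable: sum of value * probability.
# Hypothetical example distribution: a fair six-sided die.
pmf = {1: 1/6, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 1/6}

# E(X) = sum over x of x * P(x) -- each value weighted by its probability.
ev = sum(x * p for x, p in pmf.items())
print(ev)  # ≈ 3.5, the balance point of the die's distribution
```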

Expected Value of a Continuous Random Variable

E(X) = \int_{-\infty}^{\infty} x \cdot f(x) \, dx

Here f(x) is the probability density function (PDF). If you imagine cutting the shape of the PDF out of cardboard, the expected value is the point where it would balance on your finger. Integration replaces summation because there are uncountably many possible values.
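When the integral has no closed form (or you just want a sanity check), you can approximate it numerically. A sketch using a midpoint Riemann sum on a hypothetical uniform PDF over [0, 10]:

```python
# Approximate E(X) = integral of x * f(x) dx with a midpoint Riemann sum.
# Hypothetical example: uniform PDF on [0, 10], so f(x) = 1/10 on that interval.
def f(x):
    return 0.1 if 0 <= x <= 10 else 0.0

n = 100_000                 # number of slices
a, b = 0.0, 10.0            # integration limits (f is zero outside [0, 10])
dx = (b - a) / n
ev = sum((a + (i + 0.5) * dx) * f(a + (i + 0.5) * dx) * dx for i in range(n))
print(ev)  # ≈ 5.0, the balance point of the interval
```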

Compare: Discrete vs. Continuous EV: both calculate a weighted average, but discrete uses \sum while continuous uses \int. If a problem gives you a PDF, you're integrating. If it gives you a probability table, you're summing.


Key Properties: Simplifying Calculations

These properties are your computational shortcuts. Linearity especially shows up constantly because it works even when variables aren't independent.

Linearity of Expectation

E(X + Y) = E(X) + E(Y)

This is always true, regardless of whether X and Y are independent. That's what makes it so useful. It extends to any linear combination with constants:

E(aX + b) = aE(X) + b

The problem-solving strategy here is to break a complex random variable into simpler pieces, find each piece's expected value, and add them up. For example, to find the expected number of heads in 10 coin flips, you can treat each flip as its own random variable and sum 10 individual expected values.
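The coin-flip strategy above can be sketched directly: each flip contributes its own expected value, and linearity lets you just add them.

```python
# Linearity of expectation: E(X1 + ... + X10) = E(X1) + ... + E(X10),
# with no independence assumption needed.
# Each fair-coin flip Xi is Bernoulli with E(Xi) = 0.5.
per_flip = 0.5
expected_heads = sum(per_flip for _ in range(10))
print(expected_heads)  # 5.0 expected heads in 10 flips
```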

Expected Value of a Function of a Random Variable

E(g(X)) = \sum g(x_i) \, P(x_i) \quad \text{(discrete)}

E(g(X)) = \int g(x) \, f(x) \, dx \quad \text{(continuous)}

This is called LOTUS (Law of the Unconscious Statistician). The key insight: you don't need to find the distribution of g(X) first. You just apply g directly inside the expectation using the original distribution of X. The most common application is finding E(X^2), which you need for the variance shortcut formula \text{Var}(X) = E(X^2) - [E(X)]^2.
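A sketch of LOTUS feeding the variance shortcut, again using a fair die as a hypothetical example: g(x) = x^2 is applied inside the sum, with no need to derive the distribution of X^2.

```python
# LOTUS: E(g(X)) = sum of g(x) * P(x), using X's original distribution.
# Hypothetical example: a fair six-sided die with g(x) = x^2.
pmf = {x: 1/6 for x in range(1, 7)}

e_x = sum(x * p for x, p in pmf.items())        # E(X) = 3.5
e_x2 = sum(x**2 * p for x, p in pmf.items())    # E(X^2) = 91/6, via LOTUS
variance = e_x2 - e_x**2                        # Var(X) = E(X^2) - [E(X)]^2
print(variance)  # ≈ 2.9167 (exactly 35/12)
```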

Compare: Linearity vs. LOTUS: linearity handles sums of random variables (E(X + Y)); LOTUS handles functions of a single random variable (E(g(X))). They solve different types of problems.


Conditional Expectation: Incorporating Information

When you learn something about one variable, it changes what you expect from another. Conditional expectation formalizes how new information updates your predictions.

Conditional Expected Value

E(X \mid Y = y) = \sum x_i \, P(X = x_i \mid Y = y) \quad \text{(discrete)}

For the continuous case, you replace the conditional PMF with the conditional PDF. This gives you the "best guess" for X once you know Y takes a specific value. If E(X \mid Y) changes as Y changes, that tells you X and Y are dependent.

Law of Total Expectation

E(X) = E\big(E(X \mid Y)\big)

The overall expected value equals the average of the conditional expected values, weighted by how likely each conditioning event is. This is sometimes called iterated expectation.

Use this when calculating E(X) directly is hard, but calculating it conditional on some other variable is easy. A classic setup: "First flip a coin. If heads, roll one die; if tails, roll two dice. What's the expected total?" You find the conditional expectation for each coin outcome, then average them.
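The coin-and-dice setup above works out to a two-line calculation: each conditional expectation is easy, and the law of total expectation averages them with the coin's probabilities.

```python
# Law of total expectation: E(X) = E(E(X | Y)) -- average the conditional
# expectations, weighted by the probability of each conditioning event.
e_given_heads = 3.5   # E(total | heads): one fair die
e_given_tails = 7.0   # E(total | tails): two fair dice (3.5 + 3.5, by linearity)

e_total = 0.5 * e_given_heads + 0.5 * e_given_tails
print(e_total)  # 5.25
```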

Compare: Conditional EV vs. Law of Total Expectation: conditional EV gives you the answer for a specific scenario; the law of total expectation averages across all scenarios to recover the unconditional answer.


Named Distributions: Memorize These Formulas

For common distributions, the expected value formulas have already been derived. Know these cold so you can apply them instantly.

Expected Value of a Binomial Distribution

E(X) = np

Here n is the number of trials and p is the probability of success on each trial. In 100 coin flips with p = 0.5, you'd expect 50 heads on average. This formula comes directly from linearity: a binomial is the sum of n independent Bernoulli trials, each with expected value p, so E(X) = n \cdot p.
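You can confirm E(X) = np by brute force: sum k \cdot P(X = k) over the binomial PMF and compare against np. A sketch for the 100-flip example:

```python
from math import comb

# Check E(X) = np by computing the definition directly:
# E(X) = sum over k of k * C(n, k) * p^k * (1-p)^(n-k).
n, p = 100, 0.5
ev = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
print(ev)  # ≈ 50.0, which is n * p
```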

Expected Value of a Poisson Distribution

E(X) = \lambda

The parameter \lambda is the average rate of occurrence. For Poisson, the mean and the variance both equal \lambda. This distribution models counts of events in a fixed interval: emails per hour, accidents per month, typos per page.
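The same brute-force check works for the Poisson mean: sum k \cdot P(X = k) using the PMF. The series is infinite, but a sketch that truncates it (the tail terms are negligibly small for a moderate \lambda, here a hypothetical \lambda = 4) recovers \lambda:

```python
from math import exp, factorial

# Check E(X) = lambda by summing k * P(X = k) with the Poisson PMF
# P(X = k) = e^(-lam) * lam^k / k!, truncated at k = 99 (tiny tail).
lam = 4.0  # hypothetical rate, e.g. 4 emails per hour
ev = sum(k * exp(-lam) * lam**k / factorial(k) for k in range(100))
print(ev)  # ≈ 4.0, matching lambda
```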

Compare: Binomial vs. Poisson: binomial has a fixed number of trials; Poisson models events over a continuous interval with no fixed trial count. As n \to \infty and p \to 0 with np = \lambda held constant, the binomial approaches the Poisson.

Expected Value of an Exponential Distribution

E(X) = \frac{1}{\lambda}

Here \lambda is the rate parameter. The exponential distribution models waiting times between events in a Poisson process. If events happen at rate \lambda = 3 per hour, the average wait between events is 1/3 of an hour (20 minutes). The exponential distribution also has the memoryless property: how long you've already waited doesn't affect how much longer you expect to wait.
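Tying this back to the continuous EV formula: numerically integrating x \cdot \lambda e^{-\lambda x} recovers 1/\lambda. A sketch for the \lambda = 3 example, truncating the upper limit where the tail is negligible:

```python
from math import exp

# Approximate E(X) = integral from 0 to infinity of x * lam * e^(-lam * x) dx
# with a midpoint Riemann sum, truncated at x = 10 (the tail beyond is tiny).
lam = 3.0                   # rate: 3 events per hour, as in the text
n, upper = 200_000, 10.0
dx = upper / n
ev = sum((i + 0.5) * dx * lam * exp(-lam * (i + 0.5) * dx) * dx
         for i in range(n))
print(ev)  # ≈ 0.3333 hours, i.e. 1/lambda = 20 minutes
```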

Expected Value of a Uniform Distribution

E(X) = \frac{a + b}{2}

This is just the midpoint of the interval [a, b]. Since all values are equally likely, the average lands right in the center by symmetry. Quick sanity check: if X is uniform on [0, 10], the expected value is 5.

Compare: Exponential vs. Uniform: both are continuous, but exponential is skewed right (most probability mass near zero) while uniform is perfectly symmetric. Exponential's expected value depends only on the rate; uniform's depends only on the interval boundaries.


Quick Reference Table

| Concept | Formula |
| --- | --- |
| Discrete EV | E(X) = \sum x_i P(x_i) |
| Continuous EV | E(X) = \int x f(x) \, dx |
| Linearity | E(X + Y) = E(X) + E(Y), \quad E(aX + b) = aE(X) + b |
| LOTUS | E(g(X)) computed using the original distribution of X |
| Conditional EV | E(X \mid Y); Law of Total Expectation: E(X) = E(E(X \mid Y)) |
| Binomial | E(X) = np |
| Poisson | E(X) = \lambda |
| Exponential | E(X) = 1/\lambda |
| Uniform | E(X) = (a + b)/2 |

Self-Check Questions

  1. What property allows you to calculate E(X_1 + X_2 + \cdots + X_n) without knowing whether the variables are independent?

  2. Compare the expected value formulas for Poisson and exponential distributions. How are their parameters related, and why does this make sense given the Poisson process interpretation?

  3. If X is uniform on [2, 8] and Y is binomial with n = 12 and p = 0.5, which has the larger expected value? Show your work.

  4. Explain when you would use the Law of Total Expectation instead of calculating E(X) directly. Give an example scenario.

  5. You need to find E(X^2) for a discrete random variable. Would you use linearity of expectation or LOTUS? Write the formula you'd apply.