Expected value is the backbone of probability theory. It quantifies what "should" happen on average when randomness is involved. Whether you're analyzing casino games, insurance policies, or scientific experiments, expected value gives you a single number that summarizes a random variable's long-run behavior.
Don't just memorize these formulas in isolation. The real skill is understanding when to use each one and why it works. Can you recognize a binomial setup and immediately write down $E[X] = np$? Do you know why linearity of expectation is so powerful even without independence? Focus on connecting each formula to the scenario it applies to.
The fundamental expected value formulas differ based on whether your random variable takes countable values or spans a continuous range. The discrete case uses summation; the continuous case uses integration.
Multiply each possible value by its probability, then sum all the products: $E[X] = \sum_x x \, P(X = x)$. Think of it as a weighted average where values with higher probabilities pull the expected value toward them. The result represents the "balance point" of the distribution, the center of its probability mass.
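A minimal sketch of this weighted average in Python, using a fair six-sided die as an assumed example distribution (the die is not from the text):

```python
# Discrete expected value: multiply each value by its probability, then sum.
# Assumed example distribution: a fair six-sided die.
pmf = {x: 1 / 6 for x in range(1, 7)}

ev = sum(x * p for x, p in pmf.items())
print(ev)  # ~ 3.5, the balance point of the die's probability mass
```

Any probability table works the same way: swap in your own value-to-probability mapping for `pmf`.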
Here $E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$, where $f(x)$ is the probability density function (PDF). If you imagine cutting the shape of the PDF out of cardboard, the expected value is the point where it would balance on your finger. Integration replaces summation because there are uncountably many possible values.
Compare: Discrete vs. Continuous EV: both calculate a weighted average, but discrete uses $\sum_x x \, P(X = x)$ while continuous uses $\int x \, f(x) \, dx$. If a problem gives you a PDF, you're integrating. If it gives you a probability table, you're summing.
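The integral version can be checked numerically. A sketch, assuming an exponential PDF with rate 3 (a rate chosen to echo the exponential example later in this guide):

```python
import math

# Continuous expected value: integrate x * f(x) over the support.
# Assumed PDF for illustration: exponential with rate lam = 3,
# f(x) = lam * exp(-lam * x) for x >= 0, whose mean should be 1/lam.
lam = 3.0

def f(x):
    return lam * math.exp(-lam * x)

# Midpoint Riemann sum over [0, 20]; the tail beyond 20 is negligible.
dx = 1e-4
ev = sum((i * dx + dx / 2) * f(i * dx + dx / 2) * dx for i in range(200_000))
print(round(ev, 4))  # ~ 0.3333, i.e. 1/lam
```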
These properties are your computational shortcuts. Linearity especially shows up constantly because it works even when variables aren't independent.
The rule $E[X + Y] = E[X] + E[Y]$ is always true, regardless of whether $X$ and $Y$ are independent. That's what makes it so useful. It extends to any linear combination with constants: $E[aX + bY] = aE[X] + bE[Y]$.
The problem-solving strategy here is to break a complex random variable into simpler pieces, find each piece's expected value, and add them up. For example, to find the expected number of heads in 10 coin flips, you can treat each flip as its own random variable and sum 10 individual expected values.
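The coin-flip decomposition can be written out directly, with a simulation check (a fair coin and 10 flips per trial, as in the setup above):

```python
import random

# Linearity: E[X1 + ... + X10] = E[X1] + ... + E[X10].
# Each fair-flip indicator has expected value 0.5, so ten flips give 5.0.
total_ev = sum(0.5 for _ in range(10))
print(total_ev)  # 5.0

# Simulation check: average number of heads over many 10-flip trials.
random.seed(0)
trials = 100_000
avg_heads = sum(
    sum(random.random() < 0.5 for _ in range(10)) for _ in range(trials)
) / trials
print(round(avg_heads, 1))  # should land near 5.0
```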
This is called LOTUS (Law of the Unconscious Statistician). The key insight: you don't need to find the distribution of $g(X)$ first. You just apply $g$ directly inside the expectation using the original distribution of $X$: $E[g(X)] = \sum_x g(x) \, P(X = x)$ in the discrete case, with the analogous integral in the continuous case. The most common application is finding $E[X^2]$, which you need for the variance shortcut formula $\text{Var}(X) = E[X^2] - (E[X])^2$.
Compare: Linearity vs. LOTUS: linearity handles sums of random variables ($E[X + Y]$); LOTUS handles functions of a single random variable ($E[g(X)]$). They solve different types of problems.
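A sketch of LOTUS for $E[X^2]$ and the variance shortcut, again using an assumed fair-die distribution:

```python
# LOTUS: apply g(x) = x**2 inside the expectation, weighting by X's own pmf.
# Assumed example distribution: a fair six-sided die.
pmf = {x: 1 / 6 for x in range(1, 7)}

ex = sum(x * p for x, p in pmf.items())      # E[X]   = 3.5
ex2 = sum(x**2 * p for x, p in pmf.items())  # E[X^2] = 91/6
var = ex2 - ex**2                            # shortcut: E[X^2] - (E[X])^2
print(ex2, var)  # ~ 15.1667 and ~ 2.9167
```

Notice that the distribution of $X^2$ itself never had to be worked out.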
When you learn something about one variable, it changes what you expect from another. Conditional expectation formalizes how new information updates your predictions.
In the discrete case, $E[Y \mid X = x] = \sum_y y \, P(Y = y \mid X = x)$; for the continuous case, you replace the conditional PMF with the conditional PDF. This gives you the "best guess" for $Y$ once you know $X$ takes a specific value. If $E[Y \mid X = x]$ changes as $x$ changes, that tells you $X$ and $Y$ are dependent.
The overall expected value equals the average of the conditional expected values, weighted by how likely each conditioning event is: $E[Y] = E\big[E[Y \mid X]\big] = \sum_x E[Y \mid X = x] \, P(X = x)$. This is sometimes called iterated expectation.
Use this when calculating $E[Y]$ directly is hard, but calculating it conditional on some other variable is easy. A classic setup: "First flip a coin. If heads, roll one die; if tails, roll two dice. What's the expected total?" You find the conditional expectation for each coin outcome, then average them.
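The coin-then-dice example can be computed and verified in a few lines (fair coin and fair dice assumed):

```python
import random

# Law of total expectation: E[T] = sum over scenarios of E[T | scenario] * P(scenario).
# Heads -> one die, E[T | heads] = 3.5; tails -> two dice, E[T | tails] = 7.0.
ev_total = 0.5 * 3.5 + 0.5 * 7.0
print(ev_total)  # 5.25

# Simulation check of the same two-stage experiment.
random.seed(1)

def experiment():
    dice = 1 if random.random() < 0.5 else 2
    return sum(random.randint(1, 6) for _ in range(dice))

trials = 200_000
avg = sum(experiment() for _ in range(trials)) / trials
print(round(avg, 2))  # should land near 5.25
```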
Compare: Conditional EV vs. Law of Total Expectation: conditional EV gives you the answer for a specific scenario; the law of total expectation averages across all scenarios to recover the unconditional answer.
For common distributions, the expected value formulas have already been derived. Know these cold so you can apply them instantly.
Here $E[X] = np$, where $n$ is the number of trials and $p$ is the probability of success on each trial. In 100 coin flips with $p = 0.5$, you'd expect 50 heads on average. This formula comes directly from linearity: a binomial is the sum of $n$ independent Bernoulli trials, each with expected value $p$, so $E[X] = np$.
The parameter $\lambda$ is the average rate of occurrence, and $E[X] = \lambda$. For Poisson, the mean and the variance both equal $\lambda$. This distribution models counts of events in a fixed interval: emails per hour, accidents per month, typos per page.
Compare: Binomial vs. Poisson: binomial has a fixed number of trials; Poisson models events over a continuous interval with no fixed trial count. As $n \to \infty$ and $p \to 0$ with $np = \lambda$ held constant, the binomial approaches the Poisson.
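That limit can be seen numerically. A sketch comparing the two PMFs as $n$ grows with $np$ fixed (the rate value 2 is an arbitrary assumption for illustration):

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hold lam = n * p fixed while n grows and p shrinks.
lam = 2.0
for n in (10, 100, 1000):
    p = lam / n
    gap = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(10))
    print(n, round(gap, 5))  # the gap shrinks as n increases
```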
Here $E[X] = 1/\lambda$, where $\lambda$ is the rate parameter. The exponential distribution models waiting times between events in a Poisson process. If events happen at rate $\lambda = 3$ per hour, the average wait between events is $1/3$ of an hour (20 minutes). The exponential distribution also has the memoryless property: how long you've already waited doesn't affect how much longer you expect to wait.
This is just the midpoint of the interval $[a, b]$: $E[X] = \frac{a + b}{2}$. Since all values are equally likely, the average lands right in the center by symmetry. Quick sanity check: if $X$ is uniform on $[0, 10]$, the expected value is 5.
Compare: Exponential vs. Uniform: both are continuous, but exponential is skewed right (most probability mass near zero) while uniform is perfectly symmetric. Exponential's expected value depends only on the rate; uniform's depends only on the interval boundaries.
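Both closed-form means can be sanity-checked by simulation; the rate 3 per hour and the interval $[0, 10]$ below echo the examples in this section:

```python
import random

# Simulation check of the exponential and uniform means.
random.seed(2)
n = 200_000
exp_mean = sum(random.expovariate(3.0) for _ in range(n)) / n  # target 1/3
uni_mean = sum(random.uniform(0, 10) for _ in range(n)) / n    # target 5.0
print(round(exp_mean, 3), round(uni_mean, 2))
```

Note that `random.expovariate` takes the rate $\lambda$ directly, not the mean.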
| Concept | Formula |
|---|---|
| Discrete EV | $E[X] = \sum_x x \, P(X = x)$ |
| Continuous EV | $E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$ |
| Linearity | $E[X + Y] = E[X] + E[Y]$, $E[aX + bY] = aE[X] + bE[Y]$ |
| LOTUS | $E[g(X)] = \sum_x g(x) \, P(X = x)$, computed using the original distribution of $X$ |
| Conditional EV | $E[Y \mid X = x] = \sum_y y \, P(Y = y \mid X = x)$; Law of Total Expectation: $E[Y] = E[E[Y \mid X]]$ |
| Binomial | $E[X] = np$ |
| Poisson | $E[X] = \lambda$ |
| Exponential | $E[X] = 1/\lambda$ |
| Uniform | $E[X] = \frac{a + b}{2}$ |
What property allows you to calculate $E[X + Y]$ without knowing whether the variables are independent?
Compare the expected value formulas for Poisson and exponential distributions. How are their parameters related, and why does this make sense given the Poisson process interpretation?
If $X$ is uniform on $[0, 10]$ and $Y$ is binomial with $n = 100$ and $p = 0.5$, which has the larger expected value? Show your work.
Explain when you would use the Law of Total Expectation instead of calculating $E[Y]$ directly. Give an example scenario.
You need to find $E[X^2]$ for a discrete random variable. Would you use linearity of expectation or LOTUS? Write the formula you'd apply.