Foundations of Decision Theory
Risk and expected utility give you a systematic way to make choices when outcomes are uncertain. Instead of just asking "what's the average result?", these tools let you factor in how much you care about different outcomes and how comfortable you are with uncertainty.
This matters throughout Bayesian statistics: whether you're choosing between estimators, designing a medical trial, or allocating a portfolio, you need a principled way to weigh potential gains against potential losses. This section covers the core machinery: expected value, utility functions, risk measures, and how they all connect to Bayesian inference.
Expected Value
The expected value of a random variable is the weighted average of all possible outcomes, where the weights are the probabilities: $E[X] = \sum_i x_i \, P(X = x_i)$.
For continuous random variables, the sum becomes an integral: $E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$, where $f$ is the probability density function.
Expected value tells you what outcome you'd get on average over many repetitions. It's the starting point for evaluating decisions, but it has a real limitation: it treats every dollar (or unit of outcome) equally, regardless of context. Losing $10,000 when you have $11,000 is very different from losing $10,000 when you have $1,000,000, but expected value doesn't capture that distinction. That's where utility comes in.
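As a minimal sketch, the discrete formula is just a probability-weighted sum; the payoffs and probabilities below are made up for illustration:

```python
# Expected value of a discrete gamble: outcomes weighted by probabilities.
outcomes = [100, -50, 0]    # possible payoffs in dollars (illustrative)
probs    = [0.3, 0.2, 0.5]  # must sum to 1

expected_value = sum(x * p for x, p in zip(outcomes, probs))
print(expected_value)  # 0.3*100 + 0.2*(-50) + 0.5*0 = 20.0
```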
Utility Functions
A utility function maps outcomes (monetary or otherwise) to a numerical scale representing how much satisfaction or value you actually derive from them. The key idea: the subjective value of an outcome often differs from its face value.
Common forms include:
- Linear: $u(x) = x$. Every additional dollar is worth the same amount of utility.
- Logarithmic: $u(x) = \log(x)$. Each additional dollar is worth less than the last. Widely used for modeling wealth.
- Exponential: $u(x) = 1 - e^{-\alpha x}$, where $\alpha > 0$ controls risk sensitivity.
The shape of the utility function encodes your risk attitude:
- Concave (curves downward) → risk-averse
- Linear (straight line) → risk-neutral
- Convex (curves upward) → risk-seeking
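These risk attitudes can be checked numerically via Jensen's inequality: a concave utility values the average outcome more highly than the average of the utilities. A small sketch using the three forms above (the $\alpha = 0.01$ and the $100/$400 bracket are arbitrary choices):

```python
import math

def linear(x):      return x
def logarithmic(x): return math.log(x)
def exponential(x, alpha=0.01):  # 1 - e^(-alpha*x), concave for alpha > 0
    return 1 - math.exp(-alpha * x)

# A strictly concave utility satisfies u(mean) > mean of utilities.
lo, hi = 100.0, 400.0
mid = (lo + hi) / 2
for u in (logarithmic, exponential):
    assert u(mid) > (u(lo) + u(hi)) / 2      # concave -> risk-averse
assert linear(mid) == (linear(lo) + linear(hi)) / 2  # linear -> risk-neutral
```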
Risk Aversion vs. Risk Seeking
These terms describe how you respond to uncertainty when the expected value is held constant.
A risk-averse person prefers a guaranteed $50 over a coin flip for $100 or nothing (both have an expected value of $50). A risk-seeking person would take the coin flip, or even prefer it to a guaranteed $55.
Risk aversion is the most commonly observed attitude in practice. It explains why people buy insurance (paying a premium to eliminate uncertainty) and why investors demand higher expected returns for riskier assets. Risk-seeking behavior shows up in specific contexts like gambling or speculative investments.
The degree of risk aversion is quantified by the curvature of the utility function. A more sharply concave function means stronger risk aversion.
Expected Utility Theory
Expected utility theory combines probability with utility to produce a single number you can use to rank decisions. Instead of maximizing expected value, you maximize expected utility.
Von Neumann-Morgenstern Axioms
Expected utility theory rests on four axioms about rational preferences:
- Completeness: For any two alternatives A and B, you can always say you prefer A, prefer B, or are indifferent.
- Transitivity: If you prefer A to B and B to C, then you prefer A to C. Preferences are consistent.
- Continuity: If you prefer A to B to C, there exists some probability mix of A and C that you'd find equally preferable to B. No outcome is infinitely good or bad.
- Independence: If you prefer A to B, then you also prefer a lottery mixing A with some irrelevant option C over the same lottery mixing B with C. Irrelevant alternatives don't change your ranking.
If your preferences satisfy these axioms, a utility function exists such that you'll always prefer the option with higher expected utility. Violations of these axioms (and there are well-documented ones) motivate alternative theories like prospect theory.
Utility Maximization Principle
A rational decision-maker picks the action that maximizes expected utility:
$a^* = \arg\max_{a \in A} \sum_{s \in S} u(a, s)\, p(s)$
Here, $A$ is the set of available actions, $S$ is the set of possible states of the world, $u(a, s)$ is the utility of taking action $a$ when state $s$ occurs, and $p(s)$ is the probability of state $s$. In a Bayesian context, $p(s)$ is typically the posterior probability after observing data.
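A sketch of this maximization over a discrete action and state set; the actions, utilities, and state probabilities are invented for illustration:

```python
# Pick the action maximizing expected utility over discrete states.
states = ["boom", "bust"]
p = {"boom": 0.6, "bust": 0.4}             # e.g., posterior state probabilities
utility = {                                 # u(action, state), illustrative
    "invest": {"boom": 100, "bust": -40},
    "hold":   {"boom": 10,  "bust": 10},
}

def expected_utility(action):
    return sum(utility[action][s] * p[s] for s in states)

best = max(utility, key=expected_utility)
print(best, expected_utility(best))  # invest: 0.6*100 + 0.4*(-40) = 44.0
```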
Certainty Equivalent
The certainty equivalent (CE) is the guaranteed amount that gives you the same utility as a risky prospect. You find it by:
- Computing the expected utility of the risky prospect: $EU = \sum_i p_i \, u(x_i)$
- Inverting the utility function: $CE = u^{-1}(EU)$
For a risk-averse person, the certainty equivalent is less than the expected value. The difference between the expected value and the certainty equivalent is the risk premium: the amount you'd pay to eliminate the uncertainty.
For example, suppose $u(x) = \sqrt{x}$ and you face a 50/50 gamble between $100 and $400. The expected utility is $0.5\sqrt{100} + 0.5\sqrt{400} = 15$. The certainty equivalent is $15^2 = 225$ dollars. The expected value is $250, so the risk premium is $25.
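A quick numerical check of this kind of calculation, assuming a square-root utility $u(x) = \sqrt{x}$:

```python
import math

# Certainty equivalent of a 50/50 gamble between $100 and $400
# under the (assumed) utility u(x) = sqrt(x).
u    = math.sqrt
uinv = lambda y: y ** 2            # inverse of sqrt on [0, inf)

eu = 0.5 * u(100) + 0.5 * u(400)   # expected utility: 0.5*10 + 0.5*20 = 15.0
ce = uinv(eu)                      # certainty equivalent: 15^2 = 225.0
ev = 0.5 * 100 + 0.5 * 400         # expected value: 250.0
risk_premium = ev - ce             # 25.0
print(eu, ce, risk_premium)
```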
Risk Measures
Risk measures quantify the uncertainty or potential for loss in a decision problem. They complement expected utility by giving you concrete numbers to compare alternatives.
Variance and Standard Deviation
Variance measures how spread out a distribution is around its mean: $\mathrm{Var}(X) = E\big[(X - E[X])^2\big]$.
Standard deviation is the square root of variance, which puts the measure back in the original units: $\sigma = \sqrt{\mathrm{Var}(X)}$.
These are the most common risk measures, but they have limitations. Variance penalizes upside and downside deviations equally, which doesn't match how most people think about risk. They're also sensitive to outliers and work best when the distribution is roughly symmetric.
Value at Risk (VaR)
Value at Risk is the loss threshold that won't be exceeded, at a specified confidence level, over a given time period. For example, a 1-day 95% VaR of $1 million means: "On 95% of days, losses won't exceed $1 million."
VaR can be computed using:
- Historical simulation (use past data directly)
- Variance-covariance method (assume normality)
- Monte Carlo simulation (generate scenarios from a model)
The main weakness of VaR: it tells you nothing about how bad things get in the worst 5% (or 1%) of cases. Two portfolios could share the same VaR yet have very different tail behavior.
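A minimal historical-simulation sketch: sort observed losses and read off the quantile. The loss sample and the order-statistic convention are illustrative choices:

```python
import math

def var_historical(losses, confidence=0.95):
    """Historical-simulation VaR: smallest loss such that at least
    `confidence` of the observations are <= it (one common convention)."""
    s = sorted(losses)
    idx = math.ceil(confidence * len(s)) - 1
    return s[idx]

# Made-up daily losses (positive = loss, negative = gain).
losses = [5, -2, 1, 12, 3, -1, 8, 20, 2, 4, 0, 7, 15, 1, 3, 6, 2, 9, 1, 4]
print(var_historical(losses, 0.95))  # 19th of 20 sorted values -> 15
```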
Conditional Value at Risk (CVaR)
Conditional Value at Risk (also called Expected Shortfall) addresses VaR's blind spot. It measures the average loss in the scenarios that exceed the VaR threshold.
If your 95% VaR is $1 million, CVaR answers: "In the worst 5% of cases, what's the average loss?"
CVaR is a coherent risk measure (which VaR is not) because it satisfies subadditivity: the CVaR of a combined portfolio is never worse than the sum of the individual CVaRs. This means diversification is never penalized, which VaR doesn't guarantee.
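CVaR can be sketched the same way, by averaging the worst tail of the sorted losses (the data and the worst-k convention are illustrative):

```python
def cvar(losses, tail_frac=0.05):
    """Expected shortfall: average of the worst `tail_frac` of losses."""
    s = sorted(losses, reverse=True)       # largest losses first
    k = max(1, round(len(s) * tail_frac))  # at least one observation
    worst = s[:k]
    return sum(worst) / len(worst)

# Made-up losses with one extreme tail event.
losses = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
          11, 12, 13, 14, 15, 16, 17, 18, 19, 100]
print(cvar(losses))  # worst 5% of 20 points is the single loss 100 -> 100.0
```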

Bayesian Decision Analysis
Bayesian decision analysis ties together everything above with Bayesian inference. You use posterior distributions to represent your updated uncertainty, then choose actions that minimize expected loss (or equivalently, maximize expected utility).
Prior and Posterior Distributions
The prior distribution $p(\theta)$ encodes what you believe about unknown parameters $\theta$ before seeing data. After observing data $D$, Bayes' theorem gives you the posterior distribution: $p(\theta \mid D) = \dfrac{p(D \mid \theta)\, p(\theta)}{p(D)}$.
The posterior becomes the basis for all subsequent decisions. Your choice of prior can meaningfully affect the posterior, especially with small samples, so prior selection deserves careful thought.
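As a concrete sketch of the update, a conjugate Beta-Binomial model lets the posterior be written down in closed form (the pseudo-counts and data below are invented):

```python
# Conjugate Beta-Binomial update: prior Beta(a, b), observe k successes
# in n trials, posterior is Beta(a + k, b + n - k).
a, b = 2, 2      # prior pseudo-counts (illustrative)
k, n = 7, 10     # observed data: 7 successes in 10 trials

a_post, b_post = a + k, b + n - k        # posterior: Beta(9, 5)
posterior_mean = a_post / (a_post + b_post)
print(posterior_mean)  # 9 / 14
```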
Loss Functions
A loss function $L(\theta, a)$ quantifies the cost of taking action $a$ when the true parameter value is $\theta$. Common choices:
- Squared error loss: $L(\theta, a) = (\theta - a)^2$. Penalizes large errors heavily. The optimal action (Bayes estimator) is the posterior mean.
- Absolute error loss: $L(\theta, a) = |\theta - a|$. Penalizes errors linearly. The Bayes estimator is the posterior median.
- 0-1 loss: $L(\theta, a) = 0$ if $a = \theta$, and $1$ otherwise. Used for classification. The Bayes estimator is the posterior mode.
The shape of your loss function directly determines which summary of the posterior is optimal. This is one of the most practical insights in Bayesian decision theory.
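Given posterior samples, each Bayes estimator is just the matching sample summary. A sketch with simulated draws standing in for real posterior output:

```python
import random
import statistics

# Simulated "posterior" draws; a real analysis would use MCMC samples.
random.seed(0)
samples = [random.gauss(5.0, 2.0) for _ in range(10_000)]

post_mean   = statistics.mean(samples)    # optimal under squared error loss
post_median = statistics.median(samples)  # optimal under absolute error loss
# Under 0-1 loss the optimum is the posterior mode, which for samples
# needs a density estimate; for a symmetric posterior it sits near the mean.
print(round(post_mean, 2), round(post_median, 2))  # both close to 5.0
```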
Bayes Risk
The posterior expected loss (or posterior risk) for action $a$ given observed data $D$ is: $\rho(a \mid D) = \int L(\theta, a)\, p(\theta \mid D)\, d\theta$
The Bayes risk goes one step further: it's the expected loss averaged over all possible datasets, using the marginal distribution of the data: $r = \int \rho(a^*(D) \mid D)\, p(D)\, dD$,
where $a^*(D)$ is the optimal action for each dataset. A decision rule that minimizes Bayes risk is called a Bayes rule. Comparing Bayes risks across different decision strategies tells you which approach performs best overall.
Risk Attitudes
Risk-Neutral Behavior
A risk-neutral decision-maker has a linear utility function: $u(x) = x$. Every dollar of expected value is worth the same, regardless of uncertainty. Decisions are based purely on expected values, ignoring variance or higher moments.
This is rarely observed in practice, but it serves as a useful theoretical benchmark. Example: being indifferent between a guaranteed $50 and a 50/50 chance at $100 or $0.
Risk-Averse Behavior
Risk-averse decision-makers have concave utility functions with decreasing marginal utility: each additional dollar of wealth adds less utility than the last. This means you'd prefer a guaranteed $45 over a 50/50 shot at $100 or $0 (expected value $50), because the pain of losing outweighs the pleasure of gaining.
Risk aversion is the most common attitude observed empirically. It drives insurance markets, explains why most investors demand a premium for holding risky assets, and is the default assumption in most financial models.
Risk-Seeking Behavior
Risk-seeking decision-makers have convex utility functions with increasing marginal utility. They prefer a 50/50 chance at $100 or $0 over a guaranteed $55 (even though the guaranteed amount has higher expected value).
This behavior appears in specific contexts: gambling, lottery ticket purchases, and some speculative investment strategies. Prospect theory suggests people can be risk-seeking for losses (preferring a gamble over a sure loss) even if they're risk-averse for gains.
Utility Elicitation Methods
To apply expected utility theory in practice, you need to figure out what someone's utility function actually looks like. This is harder than it sounds.
Direct Assessment Techniques
You ask individuals to assign utility values to outcomes directly. The standard gamble method is the most theoretically grounded: you ask someone to choose between a certain outcome and a lottery, then adjust the lottery probabilities until they're indifferent. The indifference point reveals a point on their utility function.
These methods are straightforward but prone to biases. People struggle with probabilities, and their responses can shift depending on how questions are framed.
Indirect Assessment Techniques
Instead of asking for utility values directly, you infer the utility function from observed choices. The certainty equivalent method presents a series of lotteries and asks what guaranteed amount would make the person indifferent. The probability equivalent method fixes the outcomes and asks what probability would make them indifferent.
These tend to be more reliable for complex decisions because they rely on choices rather than introspection. However, they require carefully designed scenarios to avoid confounding factors.

Consistency Checks
After eliciting a utility function, you should verify it's internally consistent. This means testing whether the axioms hold:
- Does transitivity hold? (If A is preferred to B and B to C, is A preferred to C?)
- Does independence hold across different framings?
- Are responses stable when the same question is asked in different formats?
Inconsistencies often reveal cognitive biases or fatigue effects, and they signal that the elicited function may need adjustment.
Applications in Finance
Portfolio Optimization
Bayesian portfolio optimization improves on classical mean-variance analysis by treating expected returns and covariances as uncertain quantities with their own distributions. The Black-Litterman model is a prominent example: it starts with market equilibrium returns as a prior and blends in an investor's subjective views to produce a posterior distribution over expected returns.
Utility functions enter by determining how aggressively the portfolio tilts toward higher-return (but riskier) assets. A more risk-averse investor's optimal portfolio will be more conservative, even given the same posterior beliefs.
Option Pricing
Bayesian methods address a key weakness of classical option pricing: parameter uncertainty. In the standard Black-Scholes-Merton model, volatility is treated as known. Bayesian approaches place a prior on volatility (or other parameters), update it with market data, and propagate that uncertainty through to option prices.
This produces option valuations that reflect not just the best-guess parameters but also how confident you are in those guesses. Stochastic volatility models with Bayesian estimation are a common application.
Risk Management Strategies
Bayesian decision theory supports dynamic risk management by allowing risk assessments to update as new data arrives. For example, a Bayesian VaR estimate incorporates parameter uncertainty into the loss distribution, producing wider confidence intervals when data is scarce and tighter ones as evidence accumulates.
This applies across market risk, credit risk, and operational risk. The ability to formally combine prior knowledge with incoming data makes Bayesian approaches particularly useful when historical data is limited or market conditions are changing.
Critiques and Limitations
Violations of Expected Utility Theory
Several well-known paradoxes show that real human choices systematically violate the axioms:
- Allais paradox: People's choices shift when the same outcomes are presented with different surrounding probabilities, violating the independence axiom.
- Ellsberg paradox: People prefer gambles with known probabilities over gambles with unknown probabilities, even when expected values are identical. This reveals ambiguity aversion, which expected utility theory doesn't capture.
- St. Petersburg paradox: A game with infinite expected value but finite willingness to pay, which motivated the original development of utility theory (Bernoulli's logarithmic utility).
- Preference reversals: People sometimes rank option A above B in a direct choice but assign a higher price to B when asked to value each separately.
Prospect Theory vs. Expected Utility
Prospect theory (Kahneman and Tversky, 1979) offers a descriptive alternative that better matches observed behavior. Key differences from expected utility:
- Reference dependence: People evaluate outcomes as gains or losses relative to a reference point, not as final wealth levels.
- Loss aversion: Losses hurt roughly twice as much as equivalent gains feel good.
- Probability weighting: People overweight small probabilities and underweight large ones, rather than using objective probabilities directly.
- Diminishing sensitivity: The value function is concave for gains and convex for losses, meaning people are risk-averse for gains but risk-seeking for losses.
Prospect theory is descriptive (how people do decide), while expected utility theory is normative (how people should decide if they want to be consistent).
Behavioral Economics Insights
Beyond prospect theory, behavioral research has identified systematic patterns that affect decision-making:
- Framing effects: The same choice presented as a gain vs. a loss leads to different decisions.
- Anchoring: Irrelevant numbers influence estimates (e.g., being shown a random number before estimating a quantity).
- Availability heuristic: People overestimate the probability of events that are easy to recall.
- Mental accounting: People treat money differently depending on which mental "account" it belongs to, rather than treating wealth as fungible.
These findings don't invalidate Bayesian decision theory, but they highlight the gap between normative models and actual behavior. More descriptively accurate models continue to be developed.
Computational Methods
Many Bayesian decision problems involve integrals that can't be solved analytically. Computational methods make these problems tractable.
Monte Carlo Simulation
Monte Carlo methods estimate expectations by generating random samples and averaging. To estimate $E[g(\theta)]$:
- Draw samples $\theta^{(1)}, \dots, \theta^{(N)}$ from the relevant distribution (e.g., the posterior).
- Compute $g(\theta^{(i)})$ for each sample.
- Approximate: $E[g(\theta)] \approx \frac{1}{N} \sum_{i=1}^{N} g(\theta^{(i)})$
The error shrinks like $1/\sqrt{N}$, so quadrupling the number of samples cuts the error in half. Monte Carlo is especially useful when the decision involves multiple uncertain parameters.
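A sketch of the recipe, estimating the second moment of a standard normal (true value 1); the sample size is an arbitrary choice:

```python
import random

# Monte Carlo estimate of E[g(theta)] with theta ~ Normal(0, 1), g(x) = x^2.
random.seed(42)

def mc_estimate(n):
    return sum(random.gauss(0, 1) ** 2 for _ in range(n)) / n

est = mc_estimate(100_000)
print(est)  # close to the true value 1.0
```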
Markov Chain Monte Carlo (MCMC)
When you can't sample directly from the posterior (which is common with non-conjugate priors or complex models), MCMC algorithms construct a Markov chain whose stationary distribution is the target posterior.
- Metropolis-Hastings: Proposes candidate samples and accepts/rejects them based on a probability ratio. Very general but can be slow to converge.
- Gibbs sampling: Samples each parameter conditionally on the current values of all others. Efficient when conditional distributions are easy to sample from.
Once you have MCMC samples from the posterior, you can plug them into Monte Carlo estimates of expected utility or expected loss, just as described above.
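A minimal random-walk Metropolis-Hastings sketch, targeting a standard normal that is known here only up to a constant, as in realistic problems (step size and sample count are arbitrary):

```python
import math
import random

random.seed(1)

def log_target(x):
    # Log of the unnormalized N(0, 1) density; the constant cancels
    # in the acceptance ratio, so it can be dropped.
    return -0.5 * x * x

def metropolis(n_samples, step=1.0):
    x, out = 0.0, []
    for _ in range(n_samples):
        proposal = x + random.gauss(0, step)
        # Accept with probability min(1, target(proposal) / target(x)).
        delta = log_target(proposal) - log_target(x)
        if random.random() < math.exp(min(0.0, delta)):
            x = proposal
        out.append(x)
    return out

samples = metropolis(50_000)
mean = sum(samples) / len(samples)
print(round(mean, 2))  # near 0, the target's mean
```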
Sensitivity Analysis
Sensitivity analysis checks how robust your decision is to changes in assumptions or inputs. This is critical because Bayesian decisions depend on the prior, the likelihood model, and the loss function.
- Local sensitivity: Vary one input slightly and see how the optimal decision changes.
- Global sensitivity: Vary inputs across their full plausible range to map out how decisions shift.
For example, you might check whether the optimal portfolio allocation changes substantially if you use a different prior on expected returns, or if you increase the risk aversion parameter by 20%. If the decision is stable across reasonable variations, you can be more confident in it.
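As a sketch of the local sensitivity check described above, the classic CARA/normal closed form $w^* = (\mu - r)/(\alpha\sigma^2)$ makes the dependence of the risky-asset weight on the risk aversion parameter $\alpha$ explicit (the market inputs are invented):

```python
# Optimal risky-asset weight under CARA utility with normal returns:
# w* = (mu - r) / (alpha * sigma^2). Inputs are illustrative.
mu, r, sigma = 0.08, 0.02, 0.20   # expected return, risk-free rate, volatility

def optimal_weight(alpha):
    return (mu - r) / (alpha * sigma ** 2)

base   = optimal_weight(3.0)        # 0.06 / 0.12 = 0.5
bumped = optimal_weight(3.0 * 1.2)  # 20% more risk-averse -> smaller weight
print(round(base, 3), round(bumped, 3))
```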