Monte Carlo methods are the backbone of modern computational engineering—they let you tackle problems that would be impossible to solve analytically. When you're facing a 50-dimensional integral, a complex Bayesian posterior, or a system with millions of possible states, these techniques transform "unsolvable" into "solvable with enough samples." You're being tested on your ability to recognize when each method applies, why it works, and how to reduce computational cost while maintaining accuracy.
The core insight connecting everything here is that randomness, used strategically, can approximate deterministic quantities. Whether you're estimating an integral, sampling from a weird distribution, or simulating a physical system, the underlying principle is the same: the Law of Large Numbers guarantees convergence. Don't just memorize algorithm names—understand what problem each method solves and when you'd choose one over another.
Foundational Principles
These methods establish the theoretical basis for why random sampling can solve deterministic problems. The key mechanism is statistical convergence: as sample size increases, estimates approach true values with quantifiable uncertainty.
Basic Monte Carlo Simulation
Random sampling to estimate numerical results—the foundational technique that transforms intractable analytical problems into computational ones
Law of Large Numbers drives convergence—your estimate approaches the true expected value as n→∞, with error scaling as O(1/√n) by the Central Limit Theorem
Dimension-independent convergence rate—unlike grid-based methods, Monte Carlo doesn't suffer from the curse of dimensionality, making it essential for high-dimensional problems
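To see the rate in action, here is a minimal sketch (assuming NumPy) that estimates E[X²] for X∼N(0,1), whose true value is 1, and prints the standard error shrinking like 1/√n:

```python
import numpy as np

rng = np.random.default_rng(0)

# Estimate E[X^2] for X ~ N(0, 1); the true value is 1.
# Error shrinks like 1/sqrt(n): every 100x increase in samples
# buys roughly one more digit of accuracy.
for n in (10**2, 10**4, 10**6):
    x = rng.standard_normal(n)
    samples = x**2
    estimate = samples.mean()
    std_error = samples.std(ddof=1) / np.sqrt(n)
    print(f"n={n:>7}  estimate={estimate:.4f}  std_error={std_error:.4f}")
```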
Monte Carlo Integration
Estimates integrals via random sampling—approximates ∫f(x)dx by averaging f(xᵢ) over random samples, scaled by the domain volume
High-dimensional integrals become tractable—where quadrature rules require O(n^d) points for d dimensions, Monte Carlo needs only O(n) samples regardless of dimensionality
Variance determines accuracy—the standard error of your estimate is σ/√n, so reducing variance directly improves efficiency
Compare: Basic Monte Carlo Simulation vs. Monte Carlo Integration—both rely on the Law of Large Numbers, but simulation focuses on expected values of random processes while integration targets definite integrals of functions. FRQs often ask you to set up the estimator for each case—know the difference.
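Here is what the integration estimator looks like in code: a sketch assuming NumPy, with the domain length playing the role of the volume scale.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_integrate(f, a, b, n=100_000):
    """Estimate the integral of f over [a, b] as
    (b - a) * (average of f at n uniform random points)."""
    x = rng.uniform(a, b, size=n)
    return (b - a) * f(x).mean()

# Integral of x^2 on [0, 2]; the exact answer is 8/3 ≈ 2.6667.
print(mc_integrate(lambda x: x**2, 0.0, 2.0))
```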
Variance Reduction Strategies
Raw Monte Carlo can be painfully slow to converge. These techniques squeeze more accuracy from fewer samples by exploiting problem structure to reduce estimator variance without introducing bias.
Importance Sampling
Concentrates samples in high-impact regions—instead of sampling uniformly, you sample from a proposal distribution q(x) that emphasizes where f(x)p(x) is large
Reweights samples to correct for sampling from q instead of p—each sample gets multiplied by the importance weight w(x) = p(x)/q(x), which keeps the estimate unbiased
Critical for rare event simulation—when you need to estimate probabilities on the order of 10⁻⁹, uniform sampling would require billions of samples; importance sampling can reduce this dramatically
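Below is a minimal sketch of the reweighting trick, assuming NumPy and SciPy. The event P(X>4) for X∼N(0,1) stands in for a true rare event so the answer (about 3.17×10⁻⁵) is easy to check.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 10_000

# Target: P(X > 4) for X ~ N(0, 1), about 3.17e-5. A naive estimator
# would see roughly zero hits in 10,000 draws from N(0, 1) itself.

# Proposal q = N(4, 1) concentrates samples in the rare region.
x = rng.normal(loc=4.0, scale=1.0, size=n)

# Importance weight w(x) = p(x) / q(x) corrects for the shifted proposal.
w = norm.pdf(x) / norm.pdf(x, loc=4.0, scale=1.0)

estimate = np.mean((x > 4.0) * w)
print(estimate, norm.sf(4.0))  # estimate vs. exact tail probability
```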
Stratified Sampling
Divides the domain into strata and samples each separately—guarantees representation from all regions, eliminating the chance of "missing" important areas
Reduces variance in heterogeneous populations—if your function varies significantly across regions, stratification captures this structure rather than leaving it to chance
Variance reduction is proportional to between-strata variance—the more different your strata are from each other, the bigger the payoff
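One way the mechanics might look, as a sketch assuming NumPy: equal-width strata on [0,1], a small batch of samples per stratum, then an average of the stratum means.

```python
import numpy as np

rng = np.random.default_rng(0)

def stratified_estimate(f, n_strata=100, per_stratum=10):
    """Estimate the integral of f on [0, 1] with equal-width strata:
    sample inside each stratum, then average the stratum means."""
    edges = np.linspace(0.0, 1.0, n_strata + 1)
    stratum_means = [
        f(rng.uniform(lo, hi, size=per_stratum)).mean()
        for lo, hi in zip(edges[:-1], edges[1:])
    ]
    # Equal-width strata have equal probability, so a plain average works.
    return np.mean(stratum_means)

print(stratified_estimate(lambda x: x**2))  # exact answer is 1/3
```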
Latin Hypercube Sampling
Ensures even coverage across all dimensions simultaneously—divides each dimension into n equal intervals and places exactly one sample in each "row" and "column" of the hypercube
Avoids clustering and gaps—unlike pure random sampling, which can leave large regions unexplored by chance, LHS guarantees spatial coverage
Particularly powerful for sensitivity analysis—when you need to understand how outputs depend on multiple inputs, LHS efficiently explores the entire parameter space
Compare: Stratified Sampling vs. Latin Hypercube Sampling—both impose structure on sample placement, but stratified sampling divides the output space into regions while LHS ensures coverage of the input space across all dimensions. Use stratified when you know which output regions matter; use LHS when exploring high-dimensional input spaces.
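A from-scratch sketch of the LHS construction, assuming NumPy (SciPy also ships a ready-made version as scipy.stats.qmc.LatinHypercube):

```python
import numpy as np

rng = np.random.default_rng(0)

def latin_hypercube(n, d):
    """n points in [0, 1]^d: each axis is cut into n equal intervals,
    and every interval on every axis contains exactly one point."""
    samples = np.empty((n, d))
    for j in range(d):
        # One random point per interval, then shuffle so intervals
        # are paired randomly across dimensions.
        points = (np.arange(n) + rng.uniform(size=n)) / n
        samples[:, j] = rng.permutation(points)
    return samples

print(latin_hypercube(5, 2))  # 5 points covering both axes evenly
```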
Variance Reduction Techniques (General)
Control variates use correlated known quantities—if you know E[g(X)] exactly and g(X) correlates with your target f(X), subtract off the "error" in g to reduce variance in f
Antithetic variates exploit symmetry—pair each sample x with a "mirror" sample that induces negative correlation, reducing variance when the function is monotonic
Combining methods multiplies benefits—you can layer importance sampling, control variates, and stratification together for dramatic efficiency gains
Compare: Importance Sampling vs. Control Variates—importance sampling changes where you sample, while control variates adjust what you compute with each sample. Importance sampling requires a good proposal distribution; control variates require a correlated function with known expectation.
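Here is a sketch of both tricks on the same toy target E[e^U] with U∼Uniform(0,1), assuming NumPy. The control variate g(U)=U has known mean 0.5, and the antithetic pair is (U, 1−U).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Target: E[exp(U)] for U ~ Uniform(0, 1); exact value is e - 1 ≈ 1.7183.
u = rng.uniform(size=n)
f = np.exp(u)

# Control variate: g(U) = U, whose mean 0.5 is known exactly.
g = u
b = np.cov(f, g)[0, 1] / np.var(g, ddof=1)   # near-optimal coefficient
cv_estimate = np.mean(f - b * (g - 0.5))

# Antithetic variates: pair U with 1 - U; since exp is monotonic,
# the paired values are negatively correlated and their average
# has lower variance than two independent draws.
av_estimate = np.mean((np.exp(u) + np.exp(1.0 - u)) / 2.0)

print(cv_estimate, av_estimate)
```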
Sampling from Complex Distributions
When you can't sample directly from your target distribution, these methods construct samples through clever indirect approaches. The core challenge is generating samples that faithfully represent distributions you can only evaluate pointwise.
Rejection Sampling
Generates samples using an easy-to-sample proposal distribution—draw from q(x), then accept with probability p(x)/(M·q(x)), where M bounds the ratio p(x)/q(x)
Acceptance rate determines efficiency—if your proposal poorly matches the target, most samples get rejected and computation is wasted
Requires bounded density ratio—you must find M such that p(x)≤Mq(x) everywhere, which becomes impossible in high dimensions where the target concentrates in tiny regions
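A minimal sketch assuming NumPy, with Beta(2,2) as a stand-in target: its density p(x)=6x(1−x) peaks at 1.5, so a Uniform(0,1) proposal with M=1.5 satisfies the bound.

```python
import numpy as np

rng = np.random.default_rng(0)

def rejection_sample(n):
    """Sample from Beta(2, 2), p(x) = 6x(1 - x) on [0, 1], using a
    Uniform(0, 1) proposal and bound M = 1.5 (the density's peak)."""
    M = 1.5
    samples = []
    while len(samples) < n:
        x = rng.uniform()
        # Accept with probability p(x) / (M * q(x)); here q(x) = 1.
        if rng.uniform() < 6.0 * x * (1.0 - x) / M:
            samples.append(x)
    return np.array(samples)

s = rejection_sample(5_000)
print(s.mean())  # near 0.5; the acceptance rate is 1/M ≈ 67%
```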
Markov Chain Monte Carlo (MCMC)
Constructs a Markov chain whose stationary distribution is your target—instead of independent samples, you generate a sequence of dependent samples that eventually "forget" their starting point
Handles high-dimensional, unnormalized distributions—you only need to evaluate p(x) up to a constant, making MCMC essential for Bayesian posteriors where the normalizing constant is intractable
Convergence requires careful diagnostics—the chain must "burn in" before samples represent the target, and autocorrelation means effective sample size is less than actual sample size
Compare: Rejection Sampling vs. MCMC—rejection sampling produces independent samples but fails in high dimensions due to vanishing acceptance rates; MCMC produces dependent samples but scales to arbitrary dimensions. If your problem is low-dimensional and you have a good proposal, use rejection; otherwise, MCMC is your tool.
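As a concrete diagnostic, here is a crude effective-sample-size estimate you might sketch with NumPy; truncating the autocorrelation sum at the first negative term is one common convention.

```python
import numpy as np

def effective_sample_size(chain, max_lag=1_000):
    """Crude ESS: n / (1 + 2 * sum of autocorrelations), truncating
    the sum at the first negative autocorrelation."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = np.dot(x, x) / n
    tau = 1.0  # integrated autocorrelation time
    for k in range(1, min(max_lag, n - 1)):
        rho = np.dot(x[:-k], x[k:]) / ((n - k) * var)
        if rho < 0:
            break
        tau += 2.0 * rho
    return n / tau

# An AR(1) chain with correlation 0.9 has tau ≈ 19, so 10,000
# dependent draws carry only about 500 effective samples.
rng = np.random.default_rng(0)
chain = np.zeros(10_000)
for t in range(1, len(chain)):
    chain[t] = 0.9 * chain[t - 1] + rng.standard_normal()
print(effective_sample_size(chain))
```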
MCMC Algorithms
These specific algorithms implement the MCMC framework with different proposal mechanisms. The choice of algorithm depends on your distribution's structure and the tradeoffs between simplicity and efficiency.
Metropolis-Hastings Algorithm
Proposes moves and accepts/rejects based on a probability ratio—the acceptance probability is min(1, p(x′)q(x∣x′) / (p(x)q(x′∣x))), ensuring detailed balance with the target
Flexible proposal distributions—you can use symmetric random walks, independent proposals, or problem-specific moves; tuning the proposal is critical for efficiency
Foundation for most MCMC methods—Gibbs sampling, Hamiltonian Monte Carlo, and many others are special cases or extensions of Metropolis-Hastings
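A random-walk Metropolis sketch, assuming NumPy. The target is a hypothetical unnormalized two-mode mixture; the symmetric proposal makes the q terms cancel, leaving only the target ratio in the acceptance step.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Unnormalized log-density: an equal mixture of N(-2, 1) and N(2, 1).
    return np.logaddexp(-0.5 * (x - 2.0)**2, -0.5 * (x + 2.0)**2)

def metropolis_hastings(n_steps, step=1.0, x0=0.0):
    chain = np.empty(n_steps)
    x, lp = x0, log_target(x0)
    for t in range(n_steps):
        x_new = x + step * rng.standard_normal()  # symmetric proposal
        lp_new = log_target(x_new)
        # Accept with probability min(1, p(x') / p(x)); the symmetric
        # proposal cancels the q terms in the general ratio.
        if np.log(rng.uniform()) < lp_new - lp:
            x, lp = x_new, lp_new
        chain[t] = x
    return chain

chain = metropolis_hastings(50_000)
print(chain[5_000:].mean())  # ≈ 0 by symmetry, after discarding burn-in
```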
Gibbs Sampling
Samples each variable from its full conditional distribution—cycles through variables, updating each according to p(xᵢ ∣ x₋ᵢ) while holding others fixed
Always accepts proposed moves—unlike Metropolis-Hastings, there's no rejection step because you're sampling exactly from the conditional
Ideal for conjugate Bayesian models—when conditionals have closed forms (common in hierarchical models), Gibbs is simple to implement and efficient
Compare: Metropolis-Hastings vs. Gibbs Sampling—Metropolis-Hastings works with any proposal and handles arbitrary targets, while Gibbs requires tractable conditional distributions but avoids rejection. If you can derive the conditionals, Gibbs is usually faster; if not, fall back to Metropolis-Hastings.
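A Gibbs sketch for a bivariate standard normal with correlation ρ, assuming NumPy. Both full conditionals are one-dimensional normals, so every update is an exact draw and nothing is rejected.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_bivariate_normal(n_steps, rho=0.8):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Each full conditional is N(rho * other, 1 - rho^2), so every
    update is an exact draw and no move is ever rejected."""
    x = y = 0.0
    s = np.sqrt(1.0 - rho**2)
    chain = np.empty((n_steps, 2))
    for t in range(n_steps):
        x = rng.normal(rho * y, s)  # draw x | y
        y = rng.normal(rho * x, s)  # draw y | x
        chain[t] = (x, y)
    return chain

chain = gibbs_bivariate_normal(20_000)
print(np.corrcoef(chain[2_000:].T))  # off-diagonal ≈ 0.8 after burn-in
```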
Quick Reference Table
Concept | Best Examples
Foundational estimation | Basic Monte Carlo, Monte Carlo Integration
Variance reduction | Importance Sampling, Control Variates, Antithetic Variates, Stratified Sampling
Space-filling designs | Latin Hypercube Sampling, Stratified Sampling
Direct sampling methods | Rejection Sampling
Markov chain methods | MCMC, Metropolis-Hastings, Gibbs Sampling
Rare event estimation | Importance Sampling
High-dimensional integration | Monte Carlo Integration, Latin Hypercube Sampling
Bayesian inference | MCMC, Gibbs Sampling, Metropolis-Hastings
Self-Check Questions
Convergence comparison: Both basic Monte Carlo and MCMC rely on convergence theorems. What theorem governs each, and why does MCMC require a "burn-in" period while basic Monte Carlo does not?
Method selection: You need to estimate P(X>10) where X∼N(0,1). Why would basic Monte Carlo be inefficient, and which variance reduction technique would you apply? Set up the estimator.
Compare and contrast: Explain why rejection sampling becomes impractical in high dimensions while MCMC remains viable. What property of high-dimensional spaces causes this difference?
Algorithm choice: You're working with a Bayesian hierarchical model where all conditional distributions are conjugate. Would you choose Metropolis-Hastings or Gibbs sampling? Justify your answer and explain what changes if the conditionals aren't tractable.
Variance reduction: A Monte Carlo estimate has standard error of 0.1 with n=1000 samples. How many samples would you need to reduce the error to 0.01 using basic Monte Carlo? Name two variance reduction techniques that could achieve similar accuracy with fewer samples, and explain the mechanism of each.