Forward vs Inverse Modeling in Geophysics
Defining Forward and Inverse Modeling
Forward and inverse modeling are two complementary ways of relating subsurface properties to geophysical measurements. Understanding the distinction is fundamental to everything else in this topic.
Forward modeling starts with a known (or assumed) subsurface model and predicts what geophysical data you'd observe. It requires a mathematical description of the physics linking subsurface properties to measurements. For example:
- Calculating gravity anomalies from a given density distribution
- Computing seismic travel times through a specified velocity model
Inverse modeling works in the opposite direction: you start with observed data and try to recover the subsurface model that produced it. The goal is to find a model whose predicted data (via forward modeling) matches the observations as closely as possible.
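As a concrete sketch of the forward direction, the vertical gravity anomaly of a buried sphere has a simple closed form. The function below is illustrative (the name and parameter values are invented for this example, not taken from any library):

```python
import numpy as np

G_NEWTON = 6.674e-11  # gravitational constant (m^3 kg^-1 s^-2)

def sphere_gravity_anomaly(x, depth, radius, delta_rho):
    """Vertical gravity anomaly (in mGal) of a buried sphere.

    x         : horizontal offsets from the point above the sphere's center (m)
    depth     : depth to the sphere's center (m)
    radius    : sphere radius (m)
    delta_rho : density contrast with the host rock (kg/m^3)
    """
    mass = (4.0 / 3.0) * np.pi * radius**3 * delta_rho
    gz = G_NEWTON * mass * depth / (x**2 + depth**2) ** 1.5
    return gz * 1e5  # convert m/s^2 to mGal

# Predicted data along a 1 km profile over a 50 m sphere at 100 m depth
x = np.linspace(-500.0, 500.0, 101)
gz = sphere_gravity_anomaly(x, depth=100.0, radius=50.0, delta_rho=500.0)
```

Running the forward model is cheap and deterministic: given the sphere, the anomaly curve follows immediately, peaking directly above the body.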
Challenges in Inverse Modeling
Inverse problems are typically ill-posed, which means they violate one or more of the conditions for a well-posed problem (existence, uniqueness, and stability of the solution). In practice, this shows up in two major ways:
- Non-uniqueness: Multiple subsurface models can explain the observed data equally well. You can't always tell which one is "correct."
- Instability: Small changes in the data (like noise) can produce large changes in the estimated model.
The relationship between model parameters and geophysical data is also often non-linear. A linear problem would let you solve for the model in one step using matrix algebra. Non-linearity means you typically need iterative optimization, where you repeatedly update your model estimate, run the forward problem, compare to data, and adjust.
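To illustrate the linear case, a synthetic problem d = Gm can be solved in one step with ordinary least squares (the matrices below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear forward problem d = G m:
# 20 measurements of 5 model parameters.
G = rng.normal(size=(20, 5))
m_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
d = G @ m_true + 0.01 * rng.normal(size=20)  # observations with a little noise

# Linear => one-step solution by matrix algebra, no iteration required
m_est, *_ = np.linalg.lstsq(G, d, rcond=None)
```

A non-linear forward operator has no such one-step solution, hence the iterative update loop described above.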
Inversion Techniques for Subsurface Properties
Mathematical Formulation of Inversion
At its core, inversion is an optimization problem. You define an objective function that measures the misfit between predicted and observed data, then adjust model parameters to minimize it.
Common misfit measures include:
- Least-squares (L2-norm): $\phi_d = \sum_i \left(d_i^{\text{obs}} - d_i^{\text{pred}}\right)^2$. Sensitive to outliers because large residuals get squared.
- L1-norm: $\phi_d = \sum_i \left|d_i^{\text{obs}} - d_i^{\text{pred}}\right|$. More robust to outliers since residuals aren't squared.
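A minimal sketch of the two misfit measures (the array values are invented to show the outlier effect):

```python
import numpy as np

def l2_misfit(d_obs, d_pred):
    """Least-squares misfit: sum of squared residuals."""
    return np.sum((d_obs - d_pred) ** 2)

def l1_misfit(d_obs, d_pred):
    """L1 misfit: sum of absolute residuals, more robust to outliers."""
    return np.sum(np.abs(d_obs - d_pred))

d_obs = np.array([1.0, 2.0, 3.0, 100.0])   # last point is a gross outlier
d_pred = np.array([1.1, 2.1, 2.9, 3.0])
# The squared residual of the outlier (~97^2) dominates the L2 misfit,
# while the L1 misfit grows only linearly with it.
```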
Because the problem is ill-posed, minimizing misfit alone isn't enough. Regularization adds a penalty term to the objective function that encodes your expectations about the model:
- Tikhonov regularization penalizes rough or complex models, favoring smooth solutions. The objective function becomes something like $\phi(m) = \sum_i \left(d_i^{\text{obs}} - d_i^{\text{pred}}(m)\right)^2 + \lambda \|Lm\|^2$, where $\lambda$ controls the trade-off between fitting the data and keeping the model smooth, and $L$ is a roughness operator (e.g., a finite-difference derivative).
- Total variation regularization allows sharp boundaries in the model, which is useful when you expect distinct geological layers or contacts.
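A sketch of first-order Tikhonov regularization for a linear problem, using a first-difference roughness operator (the function name and the closed-form normal-equations solve are illustrative choices):

```python
import numpy as np

def tikhonov_solve(G, d, lam):
    """Minimize ||G m - d||^2 + lam * ||L m||^2, where L is a
    first-difference operator that penalizes jumps between
    adjacent model parameters (favoring smooth models)."""
    n = G.shape[1]
    L = np.diff(np.eye(n), axis=0)       # rows like [-1, 1, 0, ...]
    A = G.T @ G + lam * (L.T @ L)        # regularized normal equations
    return np.linalg.solve(A, G.T @ d)
```

Raising `lam` trades data fit for smoothness; choosing it well (e.g., by an L-curve or cross-validation) is a problem in itself.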

Optimization Algorithms and Probabilistic Inversion
Gradient-based methods are the workhorse of deterministic inversion. They iteratively update model parameters by computing how the objective function changes with respect to each parameter:
- Start with an initial model guess.
- Compute the gradient of the objective function (i.e., the direction of steepest increase in misfit).
- Update the model in the direction that reduces misfit.
- Repeat until convergence.
Two common variants:
- Steepest descent: Updates directly along the negative gradient. Simple but can converge slowly, especially near the solution.
- Conjugate gradient: Uses information from previous iterations to choose better search directions, improving convergence rate.
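The loop above, specialized to a linear forward operator and a fixed step length, reduces to a few lines (a sketch; practical codes choose the step by line search and add a convergence test):

```python
import numpy as np

def steepest_descent(G, d, step=0.005, n_iter=3000):
    """Minimize the L2 misfit ||G m - d||^2 by repeatedly stepping
    along the negative gradient 2 G^T (G m - d)."""
    m = np.zeros(G.shape[1])              # initial model guess
    for _ in range(n_iter):
        residual = G @ m - d              # forward model minus observations
        gradient = 2.0 * G.T @ residual   # direction of steepest increase
        m -= step * gradient              # move downhill
    return m
```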
Markov chain Monte Carlo (MCMC) methods take a fundamentally different approach. Instead of seeking one best model, they sample the space of possible models to build up a posterior probability distribution:
- Start with an initial model.
- Propose a new model by perturbing the current one.
- Evaluate whether the new model fits the data better (accounting for prior information).
- Accept or reject the proposal based on a probability rule.
- Repeat thousands to millions of times.
The resulting ensemble of accepted models maps out which parameter values are probable given the data. Common MCMC algorithms include the Metropolis-Hastings algorithm and the Gibbs sampler.
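A minimal random-walk Metropolis sampler, following the five steps above (names and tuning constants are illustrative; the Hastings correction is omitted because the Gaussian proposal is symmetric):

```python
import numpy as np

def metropolis(log_posterior, m0, step=1.0, n_samples=5000, seed=0):
    """Random-walk Metropolis sampling of p(m | d).

    log_posterior : function giving log p(m | d) up to an additive constant
    m0            : initial model vector
    """
    rng = np.random.default_rng(seed)
    m = np.asarray(m0, dtype=float)
    samples = []
    for _ in range(n_samples):
        proposal = m + step * rng.normal(size=m.size)   # perturb current model
        # Accept with probability min(1, p(proposal) / p(current))
        if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(m):
            m = proposal
        samples.append(m.copy())
    return np.array(samples)
```

Histograms of the returned samples (after discarding burn-in) approximate the posterior distribution.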
Limitations of Inversion Results
Non-Uniqueness and Resolution
No matter how good your inversion algorithm is, the results carry inherent limitations.
Non-uniqueness means multiple models can fit the data equally well. The data simply don't contain enough information to pin down a single answer. This is especially problematic when data coverage is sparse or when different physical properties produce similar signals.
Resolution describes how well the inversion can distinguish features at different scales and locations. It depends on:
- The spatial and temporal sampling density of your data
- The frequency content of the signal (higher frequencies resolve finer features)
- The physics of the measurement itself
The resolution matrix provides a formal way to assess this. In an ideal case, it would be an identity matrix (perfect resolution). In practice, off-diagonal elements show how features at one location "smear" into neighboring locations in the inverted model.
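For a linear, damped least-squares inversion the resolution matrix has a closed form; the sketch below uses zeroth-order Tikhonov damping (a penalty `lam` on model size) for simplicity:

```python
import numpy as np

def resolution_matrix(G, lam):
    """Model resolution matrix R for damped least squares:
    m_est = R m_true for noise-free data. R = identity would mean
    perfect resolution; off-diagonal entries measure smearing."""
    n = G.shape[1]
    GtG = G.T @ G
    return np.linalg.solve(GtG + lam * np.eye(n), GtG)
```

As the damping `lam` grows, the diagonal of R sinks below 1 and features smear into their neighbors.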
Uncertainty Quantification and Model Validation
Uncertainty in inverted models comes from several sources:
- Measurement errors (noise in the data)
- Modeling errors (simplified physics, missing geological complexity)
- Ill-posedness of the inverse problem itself
The model covariance matrix quantifies this uncertainty. Its diagonal elements give the variance of each model parameter (how uncertain that parameter is), while off-diagonal elements reveal correlations between parameters (when one parameter is poorly constrained, others linked to it may be as well).
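For a linear problem with independent Gaussian data noise of standard deviation sigma, the least-squares model covariance has a simple closed form (a sketch under exactly those assumptions):

```python
import numpy as np

def model_covariance(G, sigma):
    """Covariance of the least-squares model estimate for data with
    independent Gaussian noise of standard deviation sigma.

    Diagonal entries are parameter variances; off-diagonal entries
    are trade-offs (correlations) between parameters."""
    return sigma**2 * np.linalg.inv(G.T @ G)
```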
Two key validation strategies:
- Sensitivity analysis: Perturb the input data or model parameters and observe how the inversion result changes. If small perturbations cause large changes, the result is poorly constrained.
- Cross-validation: Test whether the inverted model can predict data it wasn't trained on.
  - Leave-one-out: Remove one data point, invert the rest, and check how well the result predicts the removed point.
  - K-fold: Split data into K subsets, use each in turn as a validation set while inverting the remaining data.
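The leave-one-out variant can be sketched for a linear, damped inversion (the function name and the small damping value are illustrative):

```python
import numpy as np

def loo_prediction_errors(G, d, lam):
    """Leave-one-out cross-validation: drop each datum in turn, invert
    the rest with damped least squares, and record how well the result
    predicts the held-out point."""
    n = len(d)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Gi, di = G[keep], d[keep]
        m = np.linalg.solve(Gi.T @ Gi + lam * np.eye(G.shape[1]), Gi.T @ di)
        errors[i] = d[i] - G[i] @ m   # prediction error on the held-out datum
    return errors
```

Large leave-one-out errors at particular data points flag observations the model cannot generalize to, or outliers.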

Deterministic vs Probabilistic Inversion
Deterministic Inversion
Deterministic inversion seeks a single "best" model. The process follows these steps:
- Choose an initial model and define an objective function.
- Use a gradient-based algorithm to iteratively adjust model parameters.
- Stop when a convergence criterion is met (e.g., misfit drops below a threshold) or a maximum number of iterations is reached.
The output is a point estimate of the subsurface properties. This is the most likely or optimal model given the data and your chosen objective function. The main trade-off: deterministic methods are computationally efficient, but they provide limited information about uncertainty. You get one answer, not a range of plausible answers.
Probabilistic Inversion
Probabilistic inversion estimates the full posterior probability distribution of model parameters, conditioned on the observed data and prior information. This follows from Bayes' theorem:
$p(m \mid d) \propto p(d \mid m)\, p(m)$, where $p(m \mid d)$ is the posterior, $p(d \mid m)$ is the likelihood, and $p(m)$ is the prior.
MCMC sampling is the most common approach. The ensemble of models generated by the chain represents the posterior distribution, from which you can extract:
- Marginal distributions for individual parameters
- Credible intervals (e.g., the 95% credible interval for velocity at a given depth)
- Correlation structure between parameters
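Given a chain of samples, these summaries are one-liners. The chain below is a synthetic stand-in (a Gaussian draw) so the sketch is self-contained:

```python
import numpy as np

# Stand-in for MCMC output: samples of one parameter, e.g. velocity (m/s)
# at a given depth. A real chain comes from a sampler, not a Gaussian draw.
rng = np.random.default_rng(5)
chain = rng.normal(loc=2500.0, scale=50.0, size=10_000)

mean = chain.mean()
lo, hi = np.percentile(chain, [2.5, 97.5])   # 95% credible interval
```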
The trade-off is computational cost. Exploring a high-dimensional parameter space thoroughly requires many forward model evaluations.
Comparison and Hybrid Approaches
Deterministic: Computationally efficient, produces a single best-fit model. Best for large-scale problems or situations where computational resources are limited.
Probabilistic: Computationally expensive, but gives a full picture of uncertainty. Best when quantifying uncertainty matters for decision-making or risk assessment.
Hybrid approaches try to get the best of both worlds:
- Ensemble Kalman filtering maintains an ensemble of models (capturing uncertainty) but updates them sequentially as new data arrive, avoiding the full cost of MCMC.
- Particle swarm optimization uses a population of candidate solutions ("particles") that explore the parameter space collectively, balancing exploration of new regions with convergence toward promising solutions.
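A compact particle-swarm sketch (the inertia and pull coefficients are conventional textbook values; the function name is made up):

```python
import numpy as np

def particle_swarm_minimize(f, bounds, n_particles=30, n_iter=200, seed=0):
    """Minimize f over a box. Each particle is pulled toward its own best
    position and the swarm's best, balancing exploration and convergence."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, len(lo)))  # positions
    v = np.zeros_like(x)                                  # velocities
    p_best, p_val = x.copy(), np.array([f(xi) for xi in x])
    g_best = p_best[np.argmin(p_val)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.uniform(size=(2,) + x.shape)         # random pulls
        v = 0.7 * v + 1.5 * r1 * (p_best - x) + 1.5 * r2 * (g_best - x)
        x = np.clip(x + v, lo, hi)                        # stay inside the box
        vals = np.array([f(xi) for xi in x])
        improved = vals < p_val
        p_best[improved], p_val[improved] = x[improved], vals[improved]
        g_best = p_best[np.argmin(p_val)].copy()
    return g_best
```

Unlike gradient methods, the swarm needs only forward evaluations of `f`, which makes it attractive when gradients are unavailable or the misfit surface has many local minima.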
In practice, the choice between deterministic, probabilistic, and hybrid methods depends on the size of the problem, the computational budget, and how critical uncertainty estimates are for the application.