Optimization sits at the heart of scientific computing—whether you're training a neural network, fitting a model to experimental data, or solving engineering design problems, you're asking the same fundamental question: how do we find the best solution? The algorithms in this guide represent different strategies for navigating complex mathematical landscapes, each with trade-offs between speed, accuracy, and computational cost. Understanding these trade-offs is what separates someone who can run code from someone who can solve real problems.
You're being tested on more than just definitions here. Exam questions will ask you to select the right algorithm for a given problem, explain why one method converges faster than another, or analyze when an algorithm might fail. The key concepts are gradient-based vs. derivative-free methods, local vs. global optimization, and computational complexity trade-offs. Don't just memorize what each algorithm does—know what makes it the right (or wrong) choice for a specific scenario.
These algorithms use derivative information to determine which direction leads "downhill" toward a minimum. The gradient points in the direction of steepest ascent, so we step in the opposite direction. The key differences come down to how much derivative information they use and how they handle step sizes.
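To make "step opposite the gradient" concrete, here is a minimal gradient descent sketch; the quadratic test function, learning rate, and stopping tolerance are illustrative choices, not from any particular library.

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, tol=1e-8, max_iter=1000):
    """Minimal gradient descent: step opposite the gradient until it vanishes."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:  # gradient ~ 0 means a stationary point
            break
        x = x - lr * g               # move downhill by a fixed step size
    return x

# Example: minimize f(x, y) = (x - 3)^2 + 2*(y + 1)^2, so grad = (2(x-3), 4(y+1))
grad_f = lambda x: np.array([2 * (x[0] - 3), 4 * (x[1] + 1)])
print(gradient_descent(grad_f, x0=[0.0, 0.0]))  # converges near (3, -1)
```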
Compare: Gradient Descent vs. Newton's Method—both follow derivatives downhill, but Newton's method uses curvature information for faster convergence at the cost of computing the Hessian. If an FRQ asks about trade-offs between convergence speed and computational cost, this is your go-to comparison.
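A one-iteration sketch of that trade-off, assuming a small quadratic test problem of our own (so the gradient and Hessian are known in closed form): gradient descent takes a cheap fixed-size step, while Newton's method pays for a linear solve with the Hessian and, on a quadratic, lands on the minimizer in a single step.

```python
import numpy as np

# Illustrative quadratic f(x) = 0.5 * x^T A x - b^T x
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
grad = lambda x: A @ x - b   # gradient of the quadratic
hess = lambda x: A           # Hessian is constant for a quadratic

x = np.zeros(2)
gd_step = x - 0.1 * grad(x)                           # cheap, small progress
newton_step = x - np.linalg.solve(hess(x), grad(x))   # one O(n^3) solve, exact here
print(gd_step, newton_step)  # newton_step equals the true minimizer A^{-1} b
```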
Compare: Newton's Method vs. BFGS—both aim for fast convergence using curvature information, but BFGS approximates the Hessian rather than computing it exactly. Choose Newton when the problem dimension is small and you need maximum speed; choose BFGS when the Hessian is too expensive to compute.
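In practice you rarely hand-code BFGS. A minimal sketch using SciPy's built-in implementation on the Rosenbrock test function (the test function and starting point are our choices): BFGS builds its Hessian approximation from successive gradient differences, so we supply only first derivatives.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

# Only the gradient is provided; BFGS never forms the true Hessian.
result = minimize(rosen, x0=np.zeros(5), jac=rosen_der, method="BFGS")
print(result.x)  # approaches the known minimizer (1, 1, 1, 1, 1)
```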
Gradient-based methods can get stuck in local minima. These algorithms sacrifice guaranteed convergence for the ability to explore broadly and find global optima in rugged landscapes. They're essential when you don't know if your objective function is convex.
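As one concrete example of trading guaranteed convergence for exploration, here is a bare-bones simulated annealing sketch; the cooling schedule, proposal scale, and multimodal test function are illustrative assumptions, not a canonical recipe.

```python
import numpy as np

def simulated_annealing(f, x0, temp=1.0, cooling=0.999, scale=0.5, steps=5000, seed=0):
    """Accept uphill moves with probability exp(-delta/T) to escape local minima."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    best_x, best_f = x, fx
    for _ in range(steps):
        cand = x + rng.normal(scale=scale, size=x.shape)  # random neighbor
        fc = f(cand)
        if fc < fx or rng.random() < np.exp(-(fc - fx) / temp):
            x, fx = cand, fc                              # accept (possibly uphill)
            if fx < best_f:
                best_x, best_f = x, fx
        temp *= cooling                                   # cool: fewer uphill moves over time
    return best_x, best_f

# Rastrigin-style function: many local minima, global minimum at the origin
f = lambda x: 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))
print(simulated_annealing(f, x0=np.array([3.0, -2.5])))  # stochastic; often lands near (0, 0)
```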
Compare: Genetic Algorithms vs. Particle Swarm—both are population-based metaheuristics that don't require gradients, but GAs use discrete operations (crossover, mutation) while PSO uses continuous velocity updates. GAs excel in discrete/combinatorial spaces; PSO is often better for continuous optimization.
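To see what "continuous velocity updates" means in code, here is a minimal PSO sketch; the inertia and attraction coefficients (w, c1, c2) are common textbook defaults, and the test function and search range are our own choices.

```python
import numpy as np

def pso(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Each particle's velocity blends inertia, its own best, and the swarm's best."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))        # positions
    v = np.zeros_like(x)                              # velocities
    pbest = x.copy()                                  # per-particle best positions
    pbest_f = np.apply_along_axis(f, 1, x)
    g = pbest[np.argmin(pbest_f)].copy()              # swarm-wide best position
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)  # continuous velocity update
        x = x + v
        fx = np.apply_along_axis(f, 1, x)
        improved = fx < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, f(g)

f = lambda x: 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))  # multimodal test
print(pso(f, dim=2))  # stochastic; often lands near the global minimum at (0, 0)
```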
Many real problems have constraints—budgets, physical limits, or feasibility requirements. These methods find optima while respecting boundaries, either by staying strictly inside the feasible region or by carefully navigating its structure.
Compare: Interior Point vs. Trust Region—both handle constrained optimization but with different philosophies. Interior point methods work globally by staying inside constraints; trust region methods work locally by controlling how far each step can go. Interior point dominates for linear/convex programs; trust region excels for nonlinear problems with complex curvature.
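A sketch of constrained optimization in SciPy, whose trust-constr method internally combines trust-region and interior-point machinery; the toy objective and constraint below are our own illustration.

```python
import numpy as np
from scipy.optimize import minimize, LinearConstraint

# Illustrative problem: minimize (x - 2)^2 + (y - 1)^2 subject to x + y <= 2, x, y >= 0.
obj = lambda x: (x[0] - 2) ** 2 + (x[1] - 1) ** 2
con = LinearConstraint([[1, 1]], -np.inf, 2)  # encodes x + y <= 2

result = minimize(obj, x0=[0.5, 0.5], method="trust-constr",
                  constraints=[con], bounds=[(0, None), (0, None)])
print(result.x)  # optimum sits on the constraint boundary, near (1.5, 0.5)
```

The unconstrained minimizer (2, 1) violates x + y <= 2, so the solver must settle for the closest feasible point on the boundary, which is exactly the situation these methods are built for.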
| Concept | Best Examples |
|---|---|
| First-order gradient methods | Gradient Descent, SGD, Conjugate Gradient |
| Second-order / curvature methods | Newton's Method, BFGS, Trust Region |
| Large-scale / memory-limited | Conjugate Gradient, BFGS, SGD |
| Global optimization | Simulated Annealing, Genetic Algorithms, Particle Swarm |
| Population-based search | Genetic Algorithms, Particle Swarm Optimization |
| Constrained optimization | Interior Point Methods, Trust Region Methods |
| Deep learning training | SGD (and variants like Adam, RMSprop) |
| Black-box / derivative-free | Genetic Algorithms, Particle Swarm, Simulated Annealing |
Which two algorithms both use second-order derivative information (or approximations of it), and what's the key computational trade-off between them?
You're optimizing a function with many local minima and no access to gradient information. Which category of algorithms should you consider, and name two specific methods?
Compare and contrast SGD and full-batch Gradient Descent: when would the "noise" in SGD actually be an advantage rather than a drawback?
An FRQ describes a linear programming problem with 10,000 variables and 50,000 constraints. Which algorithm would likely outperform the simplex method, and why?
Both Conjugate Gradient and BFGS avoid computing the full Hessian matrix. What problem type is Conjugate Gradient specifically designed for, and how does this differ from BFGS's typical use case?