📈 Nonlinear Optimization
Unit 4 Review

4.2 Conjugate gradient methods

Written by the Fiveable Content Team • Last updated September 2025
Conjugate gradient methods combine steepest descent with conjugate directions for faster optimization. They generate search directions without storing large matrices, making them great for big problems. These methods are like a supercharged version of regular gradient descent.

Fletcher-Reeves and Polak-Ribière are two popular conjugate gradient techniques. They use different formulas to update search directions, with Polak-Ribière often performing better in practice. These methods are like turbo-boosters for optimization algorithms.

Conjugate Gradient Methods

Principles of Conjugate Directions

  • Conjugate directions form basis for efficient optimization algorithms
  • Vectors u and v are conjugate with respect to matrix A if $u^T A v = 0$
  • Conjugate direction methods generate sequence of search directions mutually conjugate to each other
  • Optimizing along each direction sequentially leads to faster convergence than steepest descent
  • Conjugate gradient method combines ideas from steepest descent and conjugate directions
  • Generates conjugate directions without explicitly storing matrix A (see the sketch after this list)
  • Particularly effective for large-scale problems with sparse matrices
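
A minimal sketch of the linear conjugate gradient iteration for the quadratic $\tfrac{1}{2}x^T A x - b^T x$, assuming A is symmetric positive definite. It only ever asks for a matrix-vector product, never the stored matrix; the function and argument names (`conjugate_gradient`, `A_mv`) are illustrative, not from any particular library.

```python
import numpy as np

def conjugate_gradient(A_mv, b, x0=None, tol=1e-10, max_iter=None):
    """Minimize 0.5 * x^T A x - b^T x (i.e. solve A x = b) for SPD A.
    A_mv(v) returns A @ v, so A never has to be stored densely."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    r = b - A_mv(x)              # residual = negative gradient of the quadratic
    d = r.copy()                 # first direction = steepest descent
    rs_old = r @ r
    max_iter = n if max_iter is None else max_iter
    for _ in range(max_iter):
        Ad = A_mv(d)
        alpha = rs_old / (d @ Ad)          # exact minimizer along d
        x += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs_old) * d      # new direction, A-conjugate to the old ones
        rs_old = rs_new
    return x

# Example with a tiny SPD matrix; only A @ v is ever requested
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(lambda v: A @ v, b))   # close to np.linalg.solve(A, b)
```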

Fletcher-Reeves and Polak-Ribière Methods

  • Fletcher-Reeves method calculates conjugate directions using gradient information
  • Update formula for search direction: $d_{k+1} = -g_{k+1} + \beta_k^{FR} d_k$
  • Fletcher-Reeves parameter: $\beta_k^{FR} = \frac{g_{k+1}^T g_{k+1}}{g_k^T g_k}$
  • Polak-Ribière method modifies Fletcher-Reeves approach for improved performance
  • Polak-Ribière parameter: $\beta_k^{PR} = \frac{g_{k+1}^T (g_{k+1} - g_k)}{g_k^T g_k}$
  • Both methods generate conjugate directions without storing entire matrix
  • Polak-Ribière often performs better in practice due to its ability to self-correct (both update rules are sketched below)
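
The two β formulas above are one-liners once the gradients are available. A small sketch with hypothetical helper names; the max(0, ·) clipping in the Polak-Ribière helper is the common "PR+" safeguard rather than the plain formula.

```python
import numpy as np

def beta_fletcher_reeves(g_new, g_old):
    # beta_FR = (g_{k+1}^T g_{k+1}) / (g_k^T g_k)
    return (g_new @ g_new) / (g_old @ g_old)

def beta_polak_ribiere(g_new, g_old):
    # beta_PR = g_{k+1}^T (g_{k+1} - g_k) / (g_k^T g_k)
    # Clipping at zero (the "PR+" variant) lets the method fall back to
    # steepest descent when the update would be unhelpful.
    return max(0.0, (g_new @ (g_new - g_old)) / (g_old @ g_old))

def next_direction(g_new, d_old, beta):
    # d_{k+1} = -g_{k+1} + beta_k * d_k
    return -g_new + beta * d_old
```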

Advanced Conjugate Gradient Techniques

  • Hestenes-Stiefel method provides alternative update formula for conjugate directions
  • Hestenes-Stiefel parameter: $\beta_k^{HS} = \frac{g_{k+1}^T (g_{k+1} - g_k)}{d_k^T (g_{k+1} - g_k)}$
  • Nonlinear conjugate gradient extends method to nonquadratic objective functions
  • Applies conjugate gradient ideas to general nonlinear optimization problems
  • Requires line search to determine step size along search direction (a minimal nonlinear CG sketch follows this list)
  • Various beta formulas available for nonlinear case (Fletcher-Reeves, Polak-Ribiรจre, Hestenes-Stiefel)
  • Choice of beta formula affects convergence behavior and robustness
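
A compact sketch of a nonlinear conjugate gradient loop, assuming a simple Armijo backtracking line search and the PR+ β rule; the function name, line-search constants, and test problem are illustrative choices, not prescribed by the method.

```python
import numpy as np

def nonlinear_cg(f, grad, x0, tol=1e-6, max_iter=5000):
    """Nonlinear CG sketch: PR+ beta update plus Armijo backtracking line search."""
    x = np.asarray(x0, dtype=float).copy()
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:                 # safeguard: fall back to steepest descent
            d = -g
        # Armijo backtracking: shrink alpha until sufficient decrease holds
        alpha, c, rho, fx = 1.0, 1e-4, 0.5, f(x)
        while f(x + alpha * d) > fx + c * alpha * (g @ d):
            alpha *= rho
        x_new = x + alpha * d
        g_new = grad(x_new)
        beta = max(0.0, (g_new @ (g_new - g)) / (g @ g))   # PR+ rule
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x

# Example: the Rosenbrock function, a classic nonquadratic test problem
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])
print(nonlinear_cg(f, grad, [-1.2, 1.0]))   # iterates decrease f and move toward (1, 1)
```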

Convergence and Preconditioning

Quadratic Convergence Properties

  • For quadratic objective functions, conjugate gradient method has a finite termination property (in exact arithmetic)
  • Converges in at most n iterations for n-dimensional quadratic problem
  • Convergence rate depends on eigenvalue distribution of Hessian matrix
  • Clustered eigenvalues lead to faster convergence (illustrated in the demo after this list)
  • Ill-conditioned problems with widely spread eigenvalues converge more slowly
  • Practical performance often better than worst-case bound suggests
  • Superlinear convergence observed for well-behaved nonlinear problems
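
To see the effect of the eigenvalue distribution, one can count iterations of SciPy's `cg` solver on two synthetic SPD systems of the same size: one whose spectrum has only two clusters and one with a widely spread spectrum. The helper name `cg_iterations` and the specific spectra are arbitrary choices for illustration.

```python
import numpy as np
from scipy.sparse.linalg import cg

def cg_iterations(eigenvalues, seed=0):
    """Build an SPD matrix with the given spectrum and count CG iterations."""
    rng = np.random.default_rng(seed)
    n = len(eigenvalues)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal basis
    A = Q @ np.diag(eigenvalues) @ Q.T                  # SPD with chosen eigenvalues
    b = rng.standard_normal(n)
    iters = 0
    def count(xk):
        nonlocal iters
        iters += 1
    cg(A, b, callback=count)
    return iters

n = 50
clustered = cg_iterations(np.r_[np.full(n - 5, 1.0), np.full(5, 100.0)])  # two clusters
spread = cg_iterations(np.logspace(0, 4, n))                              # condition number 1e4
print(clustered, spread)   # the clustered spectrum needs far fewer iterations
```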

Preconditioning Techniques

  • Preconditioning improves convergence of conjugate gradient method
  • Transforms original problem into equivalent problem with better conditioning
  • Preconditioner M approximates Hessian matrix A so that $M^{-1}A$ has a more favorable eigenvalue distribution
  • Preconditioned conjugate gradient implicitly solves system $M^{-1}Ax = M^{-1}b$
  • Common preconditioners include diagonal scaling, incomplete Cholesky factorization
  • Jacobi preconditioner uses diagonal of matrix A (see the sketch after this list)
  • Symmetric successive over-relaxation (SSOR) preconditioner based on matrix splitting
  • Effective preconditioner balances improved convergence with computational cost
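
A minimal preconditioned CG sketch, assuming the preconditioner is supplied as a routine that applies $M^{-1}$ to a vector; the Jacobi case below just divides by the diagonal of A. Names (`preconditioned_cg`, `apply_M_inv`) are illustrative.

```python
import numpy as np

def preconditioned_cg(A, b, apply_M_inv, tol=1e-10, max_iter=None):
    """Preconditioned CG: apply_M_inv(r) returns M^{-1} r for an SPD preconditioner M."""
    n = b.shape[0]
    x = np.zeros(n)
    r = b - A @ x
    z = apply_M_inv(r)           # preconditioned residual
    d = z.copy()
    rz_old = r @ z
    max_iter = n if max_iter is None else max_iter
    for _ in range(max_iter):
        Ad = A @ d
        alpha = rz_old / (d @ Ad)
        x += alpha * d
        r -= alpha * Ad
        if np.linalg.norm(r) < tol:
            break
        z = apply_M_inv(r)
        rz_new = r @ z
        d = z + (rz_new / rz_old) * d
        rz_old = rz_new
    return x

# Jacobi preconditioner: M = diag(A), so applying M^{-1} is elementwise division
A = np.array([[10.0, 1.0, 0.0], [1.0, 8.0, 2.0], [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
diag = np.diag(A).copy()
print(preconditioned_cg(A, b, lambda r: r / diag))
```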

Restart Strategies and Implementation

  • Restart strategies periodically reset conjugate gradient algorithm
  • Helps maintain conjugacy in presence of rounding errors or nonlinearity
  • Full restart resets search direction to negative gradient every n iterations
  • Partial restart retains some information from previous iterations
  • Powell restart resets when consecutive gradients are far from orthogonal (see the sketch after this list)
  • Limited-memory methods store small number of vectors to approximate Hessian
  • Implementation considerations include efficient matrix-vector products
  • Exploit sparsity structure of problem for computational efficiency
  • Termination criteria based on gradient norm or relative change in objective function
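
A small sketch of a combined restart test: a periodic reset every n iterations plus Powell's orthogonality check, which restarts when $|g_{k+1}^T g_k| \ge \nu \, \|g_{k+1}\|^2$. The threshold $\nu$ (0.2 here) and the function name are illustrative choices.

```python
import numpy as np

def should_restart(g_new, g_old, k, n, nu=0.2):
    """Return True if the next CG direction should be reset to -g_new."""
    periodic = k > 0 and k % n == 0                      # full restart every n iterations
    powell = abs(g_new @ g_old) >= nu * (g_new @ g_new)  # successive gradients not orthogonal enough
    return periodic or powell

# Inside a nonlinear CG loop one would use it as:
#   d = -g_new if should_restart(g_new, g, k, n) else -g_new + beta * d
```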