📈 Nonlinear Optimization
Unit 4 Review

4.2 Conjugate gradient methods

Written by the Fiveable Content Team • Last updated September 2025
Conjugate gradient methods combine steepest descent with conjugate directions for faster optimization. They generate search directions without storing large matrices, making them great for big problems. These methods are like a supercharged version of regular gradient descent.

Fletcher-Reeves and Polak-Ribière are two popular conjugate gradient techniques. They use different formulas to update search directions, with Polak-Ribière often performing better in practice. These methods are like turbo-boosters for optimization algorithms.

Conjugate Gradient Methods

Principles of Conjugate Directions

  • Conjugate directions form basis for efficient optimization algorithms
  • Vectors u and v are conjugate with respect to matrix A if $u^T A v = 0$
  • Conjugate direction methods generate sequence of search directions mutually conjugate to each other
  • Optimizing along each direction sequentially leads to faster convergence than steepest descent
  • Conjugate gradient method combines ideas from steepest descent and conjugate directions
  • Generates conjugate directions without explicitly storing matrix A (see the sketch after this list)
  • Particularly effective for large-scale problems with sparse matrices
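
A minimal sketch of the linear conjugate gradient iteration for the quadratic $\tfrac{1}{2}x^T A x - b^T x$, assuming A is symmetric positive definite. It only ever asks for a matrix-vector product, never the stored matrix; the function and argument names (`conjugate_gradient`, `A_mv`) are illustrative, not from any particular library.

```python
import numpy as np

def conjugate_gradient(A_mv, b, x0=None, tol=1e-10, max_iter=None):
    """Minimize 0.5 * x^T A x - b^T x (i.e. solve A x = b) for SPD A.
    A_mv(v) returns A @ v, so A never has to be stored densely."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    r = b - A_mv(x)              # residual = negative gradient of the quadratic
    d = r.copy()                 # first direction = steepest descent
    rs_old = r @ r
    max_iter = n if max_iter is None else max_iter
    for _ in range(max_iter):
        Ad = A_mv(d)
        alpha = rs_old / (d @ Ad)          # exact minimizer along d
        x += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs_old) * d      # new direction, A-conjugate to the old ones
        rs_old = rs_new
    return x

# Example with a tiny SPD matrix; only A @ v is ever requested
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(lambda v: A @ v, b))   # close to np.linalg.solve(A, b)
```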

Fletcher-Reeves and Polak-Ribière Methods

  • Fletcher-Reeves method calculates conjugate directions using gradient information
  • Update formula for search direction: $d_{k+1} = -g_{k+1} + \beta_k^{FR} d_k$
  • Fletcher-Reeves parameter: $\beta_k^{FR} = \frac{g_{k+1}^T g_{k+1}}{g_k^T g_k}$
  • Polak-Ribière method modifies Fletcher-Reeves approach for improved performance
  • Polak-Ribière parameter: $\beta_k^{PR} = \frac{g_{k+1}^T (g_{k+1} - g_k)}{g_k^T g_k}$
  • Both methods generate conjugate directions without storing entire matrix
  • Polak-Ribière often performs better in practice due to its ability to self-correct (both update rules are sketched below)
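
The two β formulas above are one-liners once the gradients are available. A small sketch with hypothetical helper names; the max(0, ·) clipping in the Polak-Ribière helper is the common "PR+" safeguard rather than the plain formula.

```python
import numpy as np

def beta_fletcher_reeves(g_new, g_old):
    # beta_FR = (g_{k+1}^T g_{k+1}) / (g_k^T g_k)
    return (g_new @ g_new) / (g_old @ g_old)

def beta_polak_ribiere(g_new, g_old):
    # beta_PR = g_{k+1}^T (g_{k+1} - g_k) / (g_k^T g_k)
    # Clipping at zero (the "PR+" variant) lets the method fall back to
    # steepest descent when the update would be unhelpful.
    return max(0.0, (g_new @ (g_new - g_old)) / (g_old @ g_old))

def next_direction(g_new, d_old, beta):
    # d_{k+1} = -g_{k+1} + beta_k * d_k
    return -g_new + beta * d_old
```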

Advanced Conjugate Gradient Techniques

  • Hestenes-Stiefel method provides alternative update formula for conjugate directions
  • Hestenes-Stiefel parameter: $\beta_k^{HS} = \frac{g_{k+1}^T (g_{k+1} - g_k)}{d_k^T (g_{k+1} - g_k)}$
  • Nonlinear conjugate gradient extends method to nonquadratic objective functions
  • Applies conjugate gradient ideas to general nonlinear optimization problems
  • Requires line search to determine step size along search direction (a minimal nonlinear CG sketch follows this list)
  • Various beta formulas available for nonlinear case (Fletcher-Reeves, Polak-Ribiรจre, Hestenes-Stiefel)
  • Choice of beta formula affects convergence behavior and robustness
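
A compact sketch of a nonlinear conjugate gradient loop, assuming a simple Armijo backtracking line search and the PR+ β rule; the function name, line-search constants, and test problem are illustrative choices, not prescribed by the method.

```python
import numpy as np

def nonlinear_cg(f, grad, x0, tol=1e-6, max_iter=5000):
    """Nonlinear CG sketch: PR+ beta update plus Armijo backtracking line search."""
    x = np.asarray(x0, dtype=float).copy()
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:                 # safeguard: fall back to steepest descent
            d = -g
        # Armijo backtracking: shrink alpha until sufficient decrease holds
        alpha, c, rho, fx = 1.0, 1e-4, 0.5, f(x)
        while f(x + alpha * d) > fx + c * alpha * (g @ d):
            alpha *= rho
        x_new = x + alpha * d
        g_new = grad(x_new)
        beta = max(0.0, (g_new @ (g_new - g)) / (g @ g))   # PR+ rule
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x

# Example: the Rosenbrock function, a classic nonquadratic test problem
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])
print(nonlinear_cg(f, grad, [-1.2, 1.0]))   # iterates decrease f and move toward (1, 1)
```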

Convergence and Preconditioning

Quadratic Convergence Properties

  • For quadratic objective functions, conjugate gradient method has a finite termination property (in exact arithmetic)
  • Converges in at most n iterations for n-dimensional quadratic problem
  • Convergence rate depends on eigenvalue distribution of Hessian matrix
  • Clustered eigenvalues lead to faster convergence (illustrated in the demo after this list)
  • Ill-conditioned problems with widely spread eigenvalues converge more slowly
  • Practical performance often better than worst-case bound suggests
  • Superlinear convergence observed for well-behaved nonlinear problems
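
To see the effect of the eigenvalue distribution, one can count iterations of SciPy's `cg` solver on two synthetic SPD systems of the same size: one whose spectrum has only two clusters and one with a widely spread spectrum. The helper name `cg_iterations` and the specific spectra are arbitrary choices for illustration.

```python
import numpy as np
from scipy.sparse.linalg import cg

def cg_iterations(eigenvalues, seed=0):
    """Build an SPD matrix with the given spectrum and count CG iterations."""
    rng = np.random.default_rng(seed)
    n = len(eigenvalues)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal basis
    A = Q @ np.diag(eigenvalues) @ Q.T                  # SPD with chosen eigenvalues
    b = rng.standard_normal(n)
    iters = 0
    def count(xk):
        nonlocal iters
        iters += 1
    cg(A, b, callback=count)
    return iters

n = 50
clustered = cg_iterations(np.r_[np.full(n - 5, 1.0), np.full(5, 100.0)])  # two clusters
spread = cg_iterations(np.logspace(0, 4, n))                              # condition number 1e4
print(clustered, spread)   # the clustered spectrum needs far fewer iterations
```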

Preconditioning Techniques

  • Preconditioning improves convergence of conjugate gradient method
  • Transforms original problem into equivalent problem with better conditioning
  • Preconditioner M approximates Hessian matrix A so that $M^{-1}A$ has a more favorable eigenvalue distribution
  • Preconditioned conjugate gradient implicitly solves system $M^{-1}Ax = M^{-1}b$
  • Common preconditioners include diagonal scaling, incomplete Cholesky factorization
  • Jacobi preconditioner uses diagonal of matrix A (see the sketch after this list)
  • Symmetric successive over-relaxation (SSOR) preconditioner based on matrix splitting
  • Effective preconditioner balances improved convergence with computational cost
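
A minimal preconditioned CG sketch, assuming the preconditioner is supplied as a routine that applies $M^{-1}$ to a vector; the Jacobi case below just divides by the diagonal of A. Names (`preconditioned_cg`, `apply_M_inv`) are illustrative.

```python
import numpy as np

def preconditioned_cg(A, b, apply_M_inv, tol=1e-10, max_iter=None):
    """Preconditioned CG: apply_M_inv(r) returns M^{-1} r for an SPD preconditioner M."""
    n = b.shape[0]
    x = np.zeros(n)
    r = b - A @ x
    z = apply_M_inv(r)           # preconditioned residual
    d = z.copy()
    rz_old = r @ z
    max_iter = n if max_iter is None else max_iter
    for _ in range(max_iter):
        Ad = A @ d
        alpha = rz_old / (d @ Ad)
        x += alpha * d
        r -= alpha * Ad
        if np.linalg.norm(r) < tol:
            break
        z = apply_M_inv(r)
        rz_new = r @ z
        d = z + (rz_new / rz_old) * d
        rz_old = rz_new
    return x

# Jacobi preconditioner: M = diag(A), so applying M^{-1} is elementwise division
A = np.array([[10.0, 1.0, 0.0], [1.0, 8.0, 2.0], [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
diag = np.diag(A).copy()
print(preconditioned_cg(A, b, lambda r: r / diag))
```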

Restart Strategies and Implementation

  • Restart strategies periodically reset conjugate gradient algorithm
  • Helps maintain conjugacy in presence of rounding errors or nonlinearity
  • Full restart resets search direction to negative gradient every n iterations
  • Partial restart retains some information from previous iterations
  • Powell restart resets when consecutive gradients are far from orthogonal (see the sketch after this list)
  • Limited-memory methods store small number of vectors to approximate Hessian
  • Implementation considerations include efficient matrix-vector products
  • Exploit sparsity structure of problem for computational efficiency
  • Termination criteria based on gradient norm or relative change in objective function
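
A small sketch of a combined restart test: a periodic reset every n iterations plus Powell's orthogonality check, which restarts when $|g_{k+1}^T g_k| \ge \nu \, \|g_{k+1}\|^2$. The threshold $\nu$ (0.2 here) and the function name are illustrative choices.

```python
import numpy as np

def should_restart(g_new, g_old, k, n, nu=0.2):
    """Return True if the next CG direction should be reset to -g_new."""
    periodic = k > 0 and k % n == 0                      # full restart every n iterations
    powell = abs(g_new @ g_old) >= nu * (g_new @ g_new)  # successive gradients not orthogonal enough
    return periodic or powell

# Inside a nonlinear CG loop one would use it as:
#   d = -g_new if should_restart(g_new, g, k, n) else -g_new + beta * d
```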