Fiveable

🎛️Control Theory Unit 8 Review

8.2 Calculus of variations

Written by the Fiveable Content Team • Last updated August 2025
Calculus of variations provides the mathematical foundation for optimizing functionals, which are mappings from function spaces to real numbers. In control theory, these tools are essential for finding optimal trajectories and control strategies. This guide covers the core concepts, the Euler-Lagrange equation, constrained problems, Hamilton's principle, direct methods, and applications to optimal control.

Fundamental concepts of calculus of variations

Calculus of variations deals with finding extrema (maxima or minima) of functionals rather than ordinary functions. While regular calculus optimizes functions of numbers, calculus of variations optimizes over entire function spaces. This distinction is what makes it so useful in control theory, where you're searching for the best function (a trajectory or control input over time), not just the best number.

Functionals and function spaces

A functional assigns a real number to each function in a given function space. Think of it as a "function of functions." For example, the arc length of a curve $y(x)$ between two points is a functional: it takes in a function and outputs a single number.

Function spaces are sets of functions sharing certain properties:

  • $C[a,b]$: continuous functions on $[a,b]$
  • $L^2[a,b]$: square-integrable functions on $[a,b]$
  • $C^1[a,b]$: continuously differentiable functions on $[a,b]$

The choice of function space depends on the problem. If your solution needs to be smooth, you work in $C^1$. If you only need finite energy, $L^2$ may suffice.
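To make "function in, number out" concrete, here is a minimal numerical sketch of the arc-length functional (the helper name `arc_length` and the quadrature approach are our own illustrative choices):

```python
import numpy as np
from scipy.integrate import quad

def arc_length(f_prime, a, b):
    """Arc-length functional: maps a curve (given by its derivative) to a number."""
    value, _ = quad(lambda x: np.sqrt(1.0 + f_prime(x) ** 2), a, b)
    return value

# Two different input functions give two different output numbers:
L_line = arc_length(lambda x: 1.0, 0.0, 1.0)          # y = x   on [0, 1]
L_parabola = arc_length(lambda x: 2.0 * x, 0.0, 1.0)  # y = x^2 on [0, 1]
```

The straight line gives $\sqrt{2} \approx 1.414$; the parabola's arc between the same endpoints is longer, as expected.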

Weak and strong variations

Variations are perturbations applied to a candidate function to analyze whether it's truly an extremum.

  • Weak variations are infinitesimal perturbations that satisfy the boundary conditions. They're "weak" because both the function and its derivatives change by small amounts. These are used to derive necessary conditions like the Euler-Lagrange equation.
  • Strong variations allow finite changes to the function and may produce large changes in derivatives, even if the function change is small. These are used to establish sufficient conditions for extrema.

The distinction matters because a function can satisfy the necessary conditions from weak variations without being a true extremum under strong variations.

Necessary and sufficient conditions for extrema

  • Necessary conditions must hold at any extremum, but satisfying them does not guarantee that a candidate is actually an extremum. The Euler-Lagrange equation is the primary necessary condition.
  • Sufficient conditions confirm that a candidate function actually is an extremum.

The key sufficient conditions are:

  • Legendre condition: For a minimum, $\frac{\partial^2 F}{\partial y'^2} \geq 0$ along the extremal. For a maximum, this quantity must be non-positive.
  • Jacobi condition: Ensures no conjugate points exist in the interval, which rules out saddle points and confirms the extremum is genuine.

Together, the Euler-Lagrange equation, Legendre condition, and Jacobi condition form the classical toolkit for verifying extrema.

Euler-Lagrange equation

The Euler-Lagrange equation is the central result in calculus of variations. It provides a necessary condition that any extremizing function must satisfy, converting a variational problem into a differential equation.

Derivation of Euler-Lagrange equation

Consider a functional of the form:

$$J[y] = \int_{a}^{b} F(x, y, y') \, dx$$

where $y(x)$ is the unknown function and $F$ is a known integrand depending on $x$, $y$, and $y' = \frac{dy}{dx}$.

The derivation proceeds in these steps:

  1. Suppose $y^*(x)$ is the extremizing function. Introduce a perturbed function $y(x) = y^*(x) + \epsilon \eta(x)$, where $\eta(x)$ is an arbitrary smooth function satisfying $\eta(a) = \eta(b) = 0$ (so the endpoints stay fixed), and $\epsilon$ is a small parameter.
  2. Substitute into $J$ to get $J(\epsilon)$, which is now an ordinary function of $\epsilon$.
  3. Set $\frac{dJ}{d\epsilon}\big|_{\epsilon=0} = 0$ (the first variation must vanish).
  4. Apply integration by parts to eliminate the $\eta'(x)$ term.
  5. Since $\eta(x)$ is arbitrary, the integrand itself must vanish, yielding:

$$\frac{\partial F}{\partial y} - \frac{d}{dx} \frac{\partial F}{\partial y'} = 0$$

This is the Euler-Lagrange equation. Any function that extremizes $J[y]$ must satisfy it.
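The equation can be formed symbolically. A minimal SymPy sketch, using the arc-length integrand $F = \sqrt{1 + y'^2}$ as an illustrative example (variable names are ours):

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')
yp = sp.diff(y(x), x)

F = sp.sqrt(1 + yp ** 2)  # arc-length integrand

# Euler-Lagrange: dF/dy - d/dx (dF/dy') = 0
EL = sp.simplify(sp.diff(F, y(x)) - sp.diff(sp.diff(F, yp), x))
# EL reduces to -y'' / (1 + y'^2)^(3/2), so extremals satisfy y'' = 0:
# straight lines are the shortest paths, as expected.
```

Setting the expression to zero and clearing the denominator leaves $y'' = 0$, i.e. the shortest path between two points is a straight line.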

First and second order conditions

The Euler-Lagrange equation is a first-order necessary condition (first-order in the sense of the variational calculus, though the resulting ODE may be second-order in $x$).

To determine whether a solution is a minimum, maximum, or saddle point, you need second-order conditions:

  • Legendre condition: Check the sign of $F_{y'y'} = \frac{\partial^2 F}{\partial y'^2}$. For a minimum, $F_{y'y'} \geq 0$ (strengthened: $F_{y'y'} > 0$). For a maximum, $F_{y'y'} \leq 0$.
  • Jacobi condition: Analyze the Jacobi equation (a linear ODE derived from the second variation) and verify that no conjugate points exist in the open interval $(a, b)$.

A solution satisfying the Euler-Lagrange equation, the strengthened Legendre condition, and the Jacobi condition is a local minimum (or maximum, depending on sign).
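As a quick check of the Legendre condition, treat $y'$ as a plain symbol and differentiate the integrand twice. A sketch for the arc-length integrand (our illustrative choice):

```python
import sympy as sp

p = sp.symbols('p', real=True)            # p stands for y'
F = sp.sqrt(1 + p ** 2)                   # arc-length integrand
F_pp = sp.simplify(sp.diff(F, p, 2))      # F_{y'y'} = (1 + p^2)^(-3/2)

# Strictly positive for every real p, so the strengthened Legendre
# condition holds and straight-line extremals are local minima.
```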

Generalizations and extensions

The basic Euler-Lagrange equation extends to several more complex settings:

  • Higher-order derivatives: If $F$ depends on $y, y', y'', \ldots, y^{(n)}$, the equation becomes:

$$\sum_{k=0}^{n} (-1)^k \frac{d^k}{dx^k} \frac{\partial F}{\partial y^{(k)}} = 0$$

  • Multiple functions: If the functional depends on $y_1(x), y_2(x), \ldots, y_m(x)$, you get a separate Euler-Lagrange equation for each $y_i$.
  • Multiple independent variables: For functionals like $\iint F(x_1, x_2, y, y_{x_1}, y_{x_2}) \, dx_1 \, dx_2$, the Euler-Lagrange equation becomes a PDE.

Variational problems with constraints

Most real-world optimization problems involve constraints. In control theory, you might need to minimize fuel usage while reaching a target state, or optimize a trajectory subject to physical limitations.

Holonomic and non-holonomic constraints

  • Holonomic constraints involve only the functions and independent variables: $g(x, y_1, \ldots, y_m) = 0$. These are algebraic relationships that restrict which configurations are allowed.
  • Non-holonomic constraints also involve derivatives: $g(x, y_1, \ldots, y_m, y_1', \ldots, y_m') = 0$. These restrict velocities or rates of change and cannot generally be integrated into purely algebraic form.

Holonomic constraints are handled straightforwardly with Lagrange multipliers. Non-holonomic constraints require more advanced techniques such as the Lagrange-d'Alembert principle.

Lagrange multipliers and constrained optimization

The method of Lagrange multipliers converts a constrained problem into an unconstrained one by introducing auxiliary variables.

  1. Start with the functional to optimize, $J[y] = \int_a^b F(x, y, y') \, dx$, subject to a constraint $g(x, y, y') = 0$.
  2. Introduce a Lagrange multiplier $\lambda$ and form the augmented integrand: $\bar{F} = F + \lambda g$.
  3. Apply the Euler-Lagrange equation to $\bar{F}$ as if it were unconstrained.
  4. Solve the resulting system of equations (Euler-Lagrange equations plus the constraint equation) for $y$ and $\lambda$.

The multiplier $\lambda$ has a physical interpretation: it represents the sensitivity of the optimal cost to changes in the constraint.

Isoperimetric problems and applications

Isoperimetric problems are constrained variational problems where the constraint is an integral condition:

$$\int_{a}^{b} g(x, y, y') \, dx = c$$

The classic example: find the closed curve of fixed perimeter that encloses the maximum area. The answer is a circle.

To solve these problems:

  1. Introduce a Lagrange multiplier $\lambda$ and form the modified functional:

$$J[y] = \int_{a}^{b} \left[ F(x, y, y') + \lambda \, g(x, y, y') \right] dx$$

  2. Apply the Euler-Lagrange equation to the combined integrand $F + \lambda g$.
  3. Use the integral constraint to determine $\lambda$.

In control theory, isoperimetric problems arise when optimizing performance subject to resource limits (e.g., total fuel, total time, or total energy expenditure).
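A small worked instance (the functional and constraint are our own illustrative choice): minimize $\int_0^1 y'^2 \, dx$ with $y(0) = y(1) = 0$, subject to $\int_0^1 y \, dx = c$. The augmented integrand $y'^2 + \lambda y$ has Euler-Lagrange equation $2y'' = \lambda$:

```python
import sympy as sp

x, lam, c = sp.symbols('x lam c')
y = sp.Function('y')

# Euler-Lagrange equation of the augmented integrand y'^2 + lam*y
sol = sp.dsolve(sp.Eq(2 * sp.diff(y(x), x, 2), lam), y(x),
                ics={y(0): 0, y(1): 0})
y_sol = sol.rhs                              # parabola in x, linear in lam

# The integral constraint pins down the multiplier lam
lam_val = sp.solve(sp.Eq(sp.integrate(y_sol, (x, 0, 1)), c), lam)[0]
y_opt = sp.expand(y_sol.subs(lam, lam_val))  # 6*c*x*(1 - x)
```

The optimal curve is the parabola $y = 6c\,x(1-x)$, with $\lambda = -24c$ measuring how the optimal cost responds to tightening the constraint.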

Hamilton's principle and least action

Hamilton's principle provides a variational foundation for all of classical mechanics. Rather than writing force-balance equations directly, you define a single scalar quantity (the action) and require it to be stationary. The equations of motion then follow automatically.

Formulation of Hamilton's principle

The action integral is:

$$S[q] = \int_{t_1}^{t_2} L(q, \dot{q}, t) \, dt$$

where $L$ is the Lagrangian, $q$ represents generalized coordinates, and $\dot{q}$ represents generalized velocities.

Hamilton's principle states: the actual path $q(t)$ taken by the system makes the action stationary among all paths with the same endpoints. Mathematically, $\delta S = 0$ for all variations that vanish at $t_1$ and $t_2$.

Applying the calculus of variations to this condition yields the Euler-Lagrange equations for the system, which are equivalent to Newton's equations of motion.

Principle of least action in mechanics

In mechanics, the Lagrangian is defined as:

$$L = T - V$$

where $T$ is kinetic energy and $V$ is potential energy. Hamilton's principle then says the system evolves to make $\int_{t_1}^{t_2} (T - V) \, dt$ stationary.
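For example, for a mass on a spring, $T = \frac{1}{2}m\dot{q}^2$ and $V = \frac{1}{2}kq^2$, and the Euler-Lagrange equation of the action recovers Newton's law. A SymPy sketch (symbol names are ours):

```python
import sympy as sp

t, m, k = sp.symbols('t m k', positive=True)
q = sp.Function('q')
qd = sp.diff(q(t), t)

L = m * qd ** 2 / 2 - k * q(t) ** 2 / 2   # L = T - V

# Euler-Lagrange: d/dt (dL/dq') - dL/dq = 0
eom = sp.simplify(sp.diff(sp.diff(L, qd), t) - sp.diff(L, q(t)))
# eom = m*q'' + k*q, i.e. m*q'' = -k*q (Hooke's law), with no force
# resolution required: the equation of motion falls out of one scalar L.
```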

This variational formulation is often more convenient than Newton's laws for complex systems because:

  • It works naturally with generalized coordinates (no need to resolve forces along coordinate axes).
  • Constraints can be incorporated directly through the choice of coordinates or via Lagrange multipliers.
  • It generalizes readily to field theories and relativistic mechanics.

Note: the name "least action" is slightly misleading. The action is stationary (first variation is zero), but it's not always a minimum. In most practical mechanics problems, however, it does turn out to be a minimum.

Conservation laws and Noether's theorem

Noether's theorem establishes a deep connection between symmetries and conservation laws: every continuous symmetry of the action integral corresponds to a conserved quantity.

Symmetry → Conserved quantity:

  • Time translation ($L$ doesn't depend on $t$) → Energy
  • Spatial translation ($L$ doesn't depend on position) → Linear momentum
  • Rotational symmetry ($L$ doesn't depend on angle) → Angular momentum

This theorem is one of the most powerful results in theoretical physics. In control theory, identifying symmetries can simplify optimal control problems by reducing the number of independent variables.

Direct methods in calculus of variations

Direct methods bypass the Euler-Lagrange equation entirely. Instead of solving a differential equation, they approximate the solution within a finite-dimensional subspace and minimize the functional directly. This converts the infinite-dimensional variational problem into a finite-dimensional optimization problem that can be solved numerically.

Ritz and Galerkin methods

Ritz method:

  1. Choose a set of basis functions $\phi_1(x), \phi_2(x), \ldots, \phi_n(x)$ that satisfy the boundary conditions.
  2. Approximate the solution as $y_n(x) = \sum_{i=1}^{n} c_i \phi_i(x)$.
  3. Substitute into the functional $J[y]$ to get $J(c_1, \ldots, c_n)$, an ordinary function of the coefficients.
  4. Minimize with respect to $c_1, \ldots, c_n$ by setting $\frac{\partial J}{\partial c_i} = 0$ for each $i$.
  5. Solve the resulting system of algebraic equations for the coefficients.

Galerkin method: Similar setup, but instead of minimizing the functional, you require the residual of the Euler-Lagrange equation to be orthogonal to each basis function:

$$\int_a^b R(x) \, \phi_i(x) \, dx = 0, \quad i = 1, \ldots, n$$

where $R(x)$ is the residual. For self-adjoint problems, the Ritz and Galerkin methods produce identical results.
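A compact SymPy sketch of the Ritz recipe on a standard test problem (our own choice): minimize $J[y] = \int_0^1 (y'^2 - y^2 - 2xy) \, dx$ with $y(0) = y(1) = 0$, whose Euler-Lagrange equation is $y'' + y = -x$ with exact solution $y = \sin x / \sin 1 - x$:

```python
import sympy as sp

x, c1, c2 = sp.symbols('x c1 c2')

# Steps 1-2: two basis functions that already satisfy y(0) = y(1) = 0
y = c1 * x * (1 - x) + c2 * x ** 2 * (1 - x)
yp = sp.diff(y, x)

# Step 3: the functional becomes an ordinary function J(c1, c2)
J = sp.integrate(yp ** 2 - y ** 2 - 2 * x * y, (x, 0, 1))

# Steps 4-5: stationarity in the coefficients gives a linear system
coeffs = sp.solve([sp.diff(J, c1), sp.diff(J, c2)], [c1, c2])
y_ritz = y.subs(coeffs)
```

At $x = 1/2$ the two-term approximation agrees with the exact value $\sin(1/2)/\sin 1 - 1/2 \approx 0.0697$ to about three decimal places.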

Finite element approximations

The finite element method (FEM) is a specific implementation of direct methods that divides the domain into small subdomains (elements) and uses piecewise polynomial basis functions.

  1. Discretize the domain $[a,b]$ into elements (e.g., subintervals).
  2. Define local basis functions (typically piecewise linear, quadratic, or cubic polynomials) on each element.
  3. Assemble the global system by combining contributions from all elements.
  4. Solve the resulting algebraic system.

FEM is particularly well-suited for problems with complex geometries and non-uniform material properties. Its mathematical foundation rests on Sobolev spaces and weak formulations of the variational problem.
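A minimal 1-D sketch of these four steps, assuming the model problem $-y'' = 1$ on $[0,1]$ with $y(0) = y(1) = 0$ (equivalently, minimizing $\int_0^1 (\frac{1}{2}y'^2 - y) \, dx$), for which piecewise linear elements happen to be exact at the nodes:

```python
import numpy as np

n = 10                       # number of elements on [0, 1]
h = 1.0 / n
nodes = np.linspace(0.0, 1.0, n + 1)

# Steps 1-3: assemble stiffness matrix and load vector for the
# interior hat functions (each hat integrates to h against f = 1)
K = np.zeros((n - 1, n - 1))
f = np.full(n - 1, h)
for i in range(n - 1):
    K[i, i] = 2.0 / h
    if i > 0:
        K[i, i - 1] = K[i - 1, i] = -1.0 / h

# Step 4: solve the algebraic system; boundary nodes stay at zero
y = np.zeros(n + 1)
y[1:-1] = np.linalg.solve(K, f)

exact = nodes * (1.0 - nodes) / 2.0   # exact solution y = x(1-x)/2
```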

Convergence and error analysis

The quality of direct method approximations depends on the basis functions chosen and the refinement level.

  • Convergence: As the number of basis functions (or elements) increases, the approximate solution should approach the exact solution in a suitable norm. Proving this rigorously requires tools from functional analysis.
  • A priori error estimates bound the error before computing the solution, typically in terms of the element size $h$ and polynomial degree $p$. For example, FEM with piecewise linear elements often gives $O(h^2)$ convergence in the $L^2$ norm.
  • A posteriori error estimates use the computed solution to assess accuracy and guide adaptive mesh refinement, concentrating elements where the error is largest.

Applications in control theory

Calculus of variations provides the theoretical backbone for optimal control. The goal is to find control inputs that minimize a cost functional while the system obeys its dynamics.

Optimal control problems and formulations

A standard optimal control problem has three components:

  • System dynamics: $\dot{x} = f(x, u, t)$, where $x$ is the state vector and $u$ is the control input.
  • Cost functional: $J[x, u] = \int_{t_0}^{t_f} L(x, u, t) \, dt + \phi(x(t_f), t_f)$, where $L$ is the running cost and $\phi$ is the terminal cost.
  • Constraints: Boundary conditions, state constraints, or control limits.

The objective is to find the optimal control $u^*(t)$ that minimizes $J$ subject to the dynamics and constraints. This is where the calculus of variations connects directly to control engineering.

Pontryagin's maximum principle

Pontryagin's maximum principle extends the Euler-Lagrange framework to optimal control problems with control constraints. It introduces the Hamiltonian:

$$H(x, u, \lambda, t) = L(x, u, t) + \lambda^T f(x, u, t)$$

where $\lambda$ is the vector of costate (adjoint) variables.

The necessary conditions are:

  1. State equation: $\dot{x} = \frac{\partial H}{\partial \lambda} = f(x, u, t)$
  2. Costate equation: $\dot{\lambda} = -\frac{\partial H}{\partial x}$
  3. Optimality condition: The optimal control $u^*(t)$ maximizes $H$ at each time $t$:

$$H(x^*, u^*, \lambda^*, t) = \max_u H(x^*, u, \lambda^*, t)$$

  4. Boundary conditions: Specified at $t_0$ for the state and at $t_f$ for the costate (transversality conditions).

Solving these conditions typically leads to a two-point boundary value problem (state conditions at $t_0$, costate conditions at $t_f$), which is one of the main computational challenges.
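A minimal worked instance (the problem is our own illustrative choice): steer $\dot{x} = u$ from $x(0) = 0$ to $x(1) = 1$ while minimizing $\int_0^1 \frac{1}{2}u^2 \, dt$. Here $H = \frac{1}{2}u^2 + \lambda u$; since $u$ is unconstrained, stationarity of $H$ in $u$ gives $u = -\lambda$, and $\dot{\lambda} = -\partial H/\partial x = 0$. Solving the resulting two-point boundary value problem with SciPy:

```python
import numpy as np
from scipy.integrate import solve_bvp

def odes(t, z):
    x, lam = z
    u = -lam                                     # stationarity of H in u
    return np.vstack([u, np.zeros_like(lam)])    # x' = u, lam' = 0

def bc(z0, z1):
    return np.array([z0[0], z1[0] - 1.0])        # x(0) = 0, x(1) = 1

t = np.linspace(0.0, 1.0, 11)
sol = solve_bvp(odes, bc, t, np.zeros((2, t.size)))
u_opt = -sol.sol(t)[1]                           # recovered optimal control
```

The costate comes out constant ($\lambda = -1$), so the optimal control is the constant $u^*(t) = 1$ and the optimal state is the straight line $x^*(t) = t$.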

Dynamic programming and Hamilton-Jacobi-Bellman equation

Dynamic programming takes a different approach based on Bellman's principle of optimality: an optimal policy has the property that, regardless of the initial state and decision, the remaining decisions must be optimal with respect to the resulting state.

This leads to the Hamilton-Jacobi-Bellman (HJB) equation for the optimal cost-to-go function $V(x, t)$:

$$-\frac{\partial V}{\partial t} = \min_u \left( L(x, u, t) + \frac{\partial V}{\partial x} f(x, u, t) \right)$$

with boundary condition $V(x, t_f) = \phi(x, t_f)$.

Once $V(x, t)$ is found, the optimal control is:

$$u^*(x, t) = \arg\min_u \left( L(x, u, t) + \frac{\partial V}{\partial x} f(x, u, t) \right)$$

A key advantage of dynamic programming over Pontryagin's principle is that it yields a feedback (closed-loop) control law $u^*(x, t)$ rather than an open-loop trajectory. The main disadvantage is the "curse of dimensionality": the HJB equation is a PDE in the state space, which becomes computationally intractable for high-dimensional systems.
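One scalar case where the HJB equation is tractable (our own illustrative setup): $\dot{x} = u$ with cost $\int_0^T (x^2 + u^2) \, dt$ and zero terminal cost. The quadratic ansatz $V(x,t) = p(t)x^2$ turns the PDE into the scalar Riccati ODE $\dot{p} = p^2 - 1$, $p(T) = 0$, with feedback law $u^*(x,t) = -p(t)x$; the closed form is $p(t) = \tanh(T - t)$:

```python
import numpy as np
from scipy.integrate import solve_ivp

T = 5.0
# Integrate the Riccati equation backward from the terminal condition p(T) = 0
sol = solve_ivp(lambda t, p: p ** 2 - 1.0, [T, 0.0], [0.0],
                rtol=1e-10, atol=1e-12, dense_output=True)

p0 = sol.y[0, -1]                   # p(0); closed form is tanh(T - t)
u_gain = lambda t: -sol.sol(t)[0]   # feedback gain: u*(x, t) = u_gain(t) * x
```

The result is a genuine feedback law: at every time $t$, the control is computed from the current state $x$, not from a precomputed trajectory.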

Numerical methods for variational problems

Analytical solutions to variational problems are rare in practice. Numerical methods are essential for solving the differential equations and optimization problems that arise from the calculus of variations.

Discretization techniques and algorithms

Discretization converts the continuous problem into a finite-dimensional one:

  • Finite difference methods approximate derivatives with difference quotients (e.g., $y'(x_i) \approx \frac{y_{i+1} - y_{i-1}}{2h}$).
  • Finite element methods approximate the solution with piecewise polynomials over a mesh.
  • Collocation methods require the differential equation to be satisfied exactly at selected points.

The resulting algebraic optimization problem can be solved with gradient-based methods (steepest descent, conjugate gradient), Newton's method, or interior-point methods. The choice depends on problem size, structure, and required accuracy.
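A tiny end-to-end sketch combining a finite-difference discretization with steepest descent (the problem, step size, and iteration count are our own choices): minimize $J[y] = \int_0^1 \frac{1}{2}y'^2 \, dx$ with $y(0) = 0$, $y(1) = 1$, whose minimizer is the straight line $y = x$:

```python
import numpy as np

n = 20
h = 1.0 / n
y = np.zeros(n + 1)
y[-1] = 1.0                        # boundary conditions y(0)=0, y(1)=1

# Forward-difference discretization: J ≈ sum((y[i+1]-y[i])**2) / (2h)
for _ in range(5000):
    grad = (2.0 * y[1:-1] - y[:-2] - y[2:]) / h  # dJ/dy_i at interior nodes
    y[1:-1] -= 0.01 * grad                       # steepest-descent step

x = np.linspace(0.0, 1.0, n + 1)   # discrete minimizer approaches y = x
```

For this small, well-conditioned problem plain gradient descent suffices; the methods named above (conjugate gradient, Newton, interior-point) become important as the discretization grows.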

Shooting methods and boundary value problems

Optimal control problems often produce two-point boundary value problems (from Pontryagin's principle). Shooting methods convert these into initial value problems:

  1. Single shooting: Guess the unknown initial values (e.g., the initial costate $\lambda(t_0)$), integrate forward to $t_f$, and check whether the boundary conditions at $t_f$ are satisfied. Adjust the guess and repeat.
  2. Multiple shooting: Divide $[t_0, t_f]$ into subintervals, guess initial values on each subinterval, integrate each segment independently, and enforce continuity between segments.

Multiple shooting is more robust than single shooting for stiff or sensitive problems because errors don't propagate across the entire time interval. Both approaches use root-finding algorithms (e.g., Newton's method) to iteratively correct the guesses.
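A single-shooting sketch on a simple two-point boundary value problem (chosen for illustration): $y'' = -y$, $y(0) = 0$, $y(\pi/2) = 1$, with exact solution $y = \sin x$, so the correct slope is $y'(0) = 1$:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

def terminal_miss(s):
    """Integrate the IVP with guessed slope s = y'(0); return y(pi/2) - 1."""
    sol = solve_ivp(lambda t, z: [z[1], -z[0]], [0.0, np.pi / 2], [0.0, s],
                    rtol=1e-10, atol=1e-12)
    return sol.y[0, -1] - 1.0

# Root-find on the guess until the far boundary condition is met
s_star = brentq(terminal_miss, 0.0, 2.0)   # converges to y'(0) = 1
```

Here `brentq` plays the role of the root-finder correcting the guess; a multiple-shooting variant would apply the same idea per subinterval with added continuity equations.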

Computational challenges and solutions

Several challenges arise in practice:

  • High dimensionality: Large state or control spaces lead to many variables. Sparse matrix techniques and problem decomposition can help.
  • Stiffness: When the system has widely separated time scales, standard integrators require very small step sizes. Implicit integration methods (e.g., backward differentiation formulas) handle stiffness more efficiently.
  • Ill-conditioning: Small changes in the data can cause large changes in the solution. Regularization techniques and careful scaling of variables improve numerical stability.
  • Curse of dimensionality: For dynamic programming, the computational cost grows exponentially with state dimension. Approximate dynamic programming and reinforcement learning methods offer partial remedies for high-dimensional problems.