
3.4 Chain Rule and Directional Derivatives


Written by the Fiveable Content Team • Last updated August 2025

Chain Rule for Multivariable Functions

The multivariable chain rule extends the familiar single-variable chain rule to handle composite functions where variables depend on other variables. You need it whenever you're differentiating a function whose inputs are themselves functions of something else.

Chain Rule for Composite Functions

In single-variable calculus, if y = f(g(t)), then \frac{dy}{dt} = f'(g(t)) \cdot g'(t). The multivariable version follows the same logic but accounts for multiple paths of dependence.

Suppose f(x, y) where x = x(t) and y = y(t). Both x and y contribute to how f changes as t changes, so you sum both contributions:

\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}

Each term captures one "pathway" from t to f. Drawing a tree diagram helps: t branches into x and y, and both feed into f. Each branch contributes a product of derivatives along its path, and you add them all up.
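The pathway-sum formula can be checked numerically. The sketch below uses a hypothetical example (f(x, y) = x²y with x(t) = cos t, y(t) = sin t — not from the text) and compares the chain-rule sum against directly differentiating the composite with central differences:

```python
import math

# Hypothetical example: f(x, y) = x^2 * y with x(t) = cos(t), y(t) = sin(t)
def f(x, y):
    return x**2 * y

def x_of(t):
    return math.cos(t)

def y_of(t):
    return math.sin(t)

def df_dt_chain(t, h=1e-6):
    """Sum the two pathway contributions: f_x * dx/dt + f_y * dy/dt."""
    x, y = x_of(t), y_of(t)
    f_x = (f(x + h, y) - f(x - h, y)) / (2 * h)   # partial f / partial x
    f_y = (f(x, y + h) - f(x, y - h)) / (2 * h)   # partial f / partial y
    dx_dt = (x_of(t + h) - x_of(t - h)) / (2 * h)
    dy_dt = (y_of(t + h) - y_of(t - h)) / (2 * h)
    return f_x * dx_dt + f_y * dy_dt

def df_dt_direct(t, h=1e-6):
    """Differentiate the composite g(t) = f(x(t), y(t)) directly."""
    g = lambda s: f(x_of(s), y_of(s))
    return (g(t + h) - g(t - h)) / (2 * h)

t0 = 0.7
print(df_dt_chain(t0), df_dt_direct(t0))  # the two values agree
```

The agreement of the two numbers is exactly the content of the chain rule: summing one derivative product per pathway reproduces the derivative of the full composite.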

Computing the Chain Rule with Two Intermediate Variables

When the inner functions depend on two variables instead of one, the chain rule expands accordingly. If f(u, v) where u = g(x, y) and v = h(x, y), here's the process:

  1. Identify the outer function f(u, v) and the inner functions u = g(x, y), v = h(x, y)
  2. Compute the partial derivatives of the outer function: \frac{\partial f}{\partial u} and \frac{\partial f}{\partial v}
  3. Compute the partial derivatives of each inner function with respect to x and y
  4. Assemble using the chain rule:

\frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial g}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial h}{\partial x}

\frac{\partial f}{\partial y} = \frac{\partial f}{\partial u}\frac{\partial g}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial h}{\partial y}

Example: Let f(u, v) = u^2 + uv, where u = x + 2y and v = xy. To find \frac{\partial f}{\partial x}:

  • \frac{\partial f}{\partial u} = 2u + v, and \frac{\partial f}{\partial v} = u
  • \frac{\partial u}{\partial x} = 1, and \frac{\partial v}{\partial x} = y
  • So \frac{\partial f}{\partial x} = (2u + v)(1) + (u)(y) = 2u + v + uy

The pattern generalizes naturally: for any number of intermediate variables, you add one term per pathway from the independent variable to f.
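The worked example above can be verified numerically: at an evaluation point of our choosing, say (x, y) = (1, 2), the assembled formula 2u + v + uy should match a finite-difference derivative of the fully substituted composite.

```python
def F(x, y):
    # Composite: f(u, v) = u^2 + u*v with u = x + 2y, v = x*y
    u = x + 2*y
    v = x * y
    return u**2 + u*v

def chain_rule_fx(x, y):
    # Assembled chain-rule result from the example: 2u + v + u*y
    u = x + 2*y
    v = x * y
    return 2*u + v + u*y

# At (1, 2): u = 5, v = 2, so the formula gives 10 + 2 + 10 = 22
h = 1e-6
x, y = 1.0, 2.0
numeric = (F(x + h, y) - F(x - h, y)) / (2 * h)  # direct partial in x
print(chain_rule_fx(x, y), numeric)  # both ≈ 22
```

Substituting back u = 5 and v = 2 before evaluating is what the chain rule lets you skip: the pathway sum works entirely in terms of the intermediate variables.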


Applications of the Multivariable Chain Rule

The chain rule is essential whenever quantities depend on each other through intermediate variables:

  • Parametric motion: If a particle's position (x(t), y(t)) moves through a temperature field T(x, y), the chain rule gives the rate of temperature change the particle experiences over time
  • Coordinate transformations: Converting between polar and Cartesian coordinates requires the chain rule to relate partial derivatives in each system
  • Optimization algorithms: Gradient descent in machine learning relies on the chain rule (called "backpropagation") to compute how a loss function changes with respect to model parameters through many layers of composition

Directional Derivatives and Gradients

Partial derivatives tell you the rate of change along the coordinate axes. But what if you want the rate of change in some other direction, like 30° from the x-axis? That's what directional derivatives do. The gradient ties it all together by encoding the rates of change in every direction at once.


The Gradient Vector

The gradient of f(x, y) is the vector of its partial derivatives:

\nabla f = \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right\rangle

For three variables, it extends to \nabla f = \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right\rangle.

Three key geometric facts about the gradient:

  • It points in the direction of steepest increase of f
  • Its magnitude |\nabla f| equals the maximum rate of change of f at that point
  • It is always perpendicular to level curves (in 2D) or level surfaces (in 3D). If you picture a topographic map, the gradient at any point aims straight uphill, cutting across contour lines at right angles.
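The perpendicularity fact is easy to check numerically. The sketch below uses an assumed example (f(x, y) = x² + y², whose level curves are circles, not a function from the text) and confirms that the gradient at a point is orthogonal to the tangent direction of the level curve through that point:

```python
# Assumed example: f(x, y) = x^2 + y^2, whose level curves are circles
def f(x, y):
    return x**2 + y**2

def grad_f(x, y, h=1e-6):
    # Central-difference approximation of the gradient
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return fx, fy

gx, gy = grad_f(3.0, 4.0)      # analytically <6, 8>: points radially outward
tangent = (-4.0, 3.0)          # tangent to the level circle x^2 + y^2 = 25 at (3, 4)
dot = gx * tangent[0] + gy * tangent[1]
print(dot)  # ≈ 0: the gradient is normal to the level curve
```

On the topographic-map picture, the radially outward gradient is the "straight uphill" direction, and the circle's tangent is the contour line it crosses at a right angle.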

Calculating Directional Derivatives

The directional derivative of f at a point in the direction of a unit vector \mathbf{u} is:

D_\mathbf{u} f = \nabla f \cdot \mathbf{u}

The vector \mathbf{u} must be a unit vector (|\mathbf{u}| = 1). If you're given a direction that isn't unit length, normalize it first.

Step-by-step process:

  1. Compute \nabla f = \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right\rangle at the point of interest
  2. Determine the unit vector \mathbf{u} = \langle a, b \rangle in your desired direction. If given an angle \theta from the positive x-axis, then \mathbf{u} = \langle \cos\theta, \sin\theta \rangle
  3. Take the dot product: D_\mathbf{u} f = \nabla f \cdot \mathbf{u}

Interpreting the result:

  • Positive value: f is increasing in that direction
  • Negative value: f is decreasing in that direction
  • Zero: f is momentarily constant in that direction (you're moving along a level curve)

Example: Let f(x, y) = x^2 + 3xy at the point (1, 2) in the direction of \mathbf{v} = \langle 3, 4 \rangle.

  1. \nabla f = \langle 2x + 3y,\; 3x \rangle = \langle 8, 3 \rangle at (1, 2)
  2. Normalize: |\mathbf{v}| = 5, so \mathbf{u} = \langle 3/5,\; 4/5 \rangle
  3. D_\mathbf{u} f = \langle 8, 3 \rangle \cdot \langle 3/5, 4/5 \rangle = 24/5 + 12/5 = 36/5
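The three steps of this example translate directly into code, and a finite-difference check confirms that 36/5 = 7.2 really is the rate of change of f along the direction of v:

```python
import math

def f(x, y):
    return x**2 + 3*x*y

# Step 1: gradient at (1, 2), from the analytic partials <2x + 3y, 3x>
grad = (8.0, 3.0)

# Step 2: normalize v = <3, 4> to a unit vector
v = (3.0, 4.0)
norm = math.hypot(v[0], v[1])        # 5.0
u = (v[0] / norm, v[1] / norm)       # <3/5, 4/5>

# Step 3: dot product gives the directional derivative
D_u = grad[0] * u[0] + grad[1] * u[1]

# Cross-check: differentiate f along the line through (1, 2) in direction u
h = 1e-6
numeric = (f(1.0 + h*u[0], 2.0 + h*u[1]) - f(1.0 - h*u[0], 2.0 - h*u[1])) / (2*h)
print(D_u, numeric)  # both ≈ 7.2, i.e. 36/5
```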

Direction of Steepest Ascent and Descent

Since D_\mathbf{u} f = |\nabla f| \cos\theta, where \theta is the angle between \nabla f and \mathbf{u}:

  • Maximum rate of change occurs when \theta = 0 (\mathbf{u} parallel to \nabla f), giving D_\mathbf{u} f = |\nabla f|
  • Minimum (most negative) rate of change occurs when \theta = \pi (\mathbf{u} opposite to \nabla f), giving D_\mathbf{u} f = -|\nabla f|
  • Zero rate of change occurs when \theta = \pi/2 (\mathbf{u} perpendicular to \nabla f)

To find the direction of steepest ascent at a point:

  1. Compute \nabla f at that point
  2. Normalize it: \mathbf{u}_{\text{steepest}} = \frac{\nabla f}{|\nabla f|}

The direction of steepest descent is simply -\nabla f. This is exactly what gradient descent algorithms exploit: to minimize a function, take steps in the direction of -\nabla f.
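A minimal gradient descent sketch makes the steepest-descent idea concrete. The objective below is an assumed quadratic, f(x, y) = (x − 1)² + (y + 2)², chosen so its minimizer (1, −2) is known in advance; the step size and iteration count are likewise illustrative choices:

```python
# Assumed quadratic objective: f(x, y) = (x - 1)^2 + (y + 2)^2, minimized at (1, -2)
def grad(x, y):
    return (2 * (x - 1), 2 * (y + 2))   # analytic gradient of the objective

x, y = 5.0, 5.0        # arbitrary starting point
lr = 0.1               # step size (illustrative choice)
for _ in range(200):
    gx, gy = grad(x, y)
    x -= lr * gx       # step along -grad f, the steepest-descent direction
    y -= lr * gy
print(round(x, 4), round(y, 4))  # → 1.0 -2.0, the minimizer
```

Each update moves opposite the gradient, so f decreases at the fastest available local rate; backpropagation in machine learning is this same idea with the gradient computed via the chain rule through many layers.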

Connecting the Gradient and Directional Derivatives

Think of the gradient as containing all directional derivative information at a point. The directional derivative in any direction \mathbf{u} is just the projection of \nabla f onto \mathbf{u}:

D_\mathbf{u} f = \nabla f \cdot \mathbf{u} = |\nabla f| \cos\theta

This means:

  • The partial derivative \frac{\partial f}{\partial x} is the directional derivative in the \mathbf{i} direction
  • The partial derivative \frac{\partial f}{\partial y} is the directional derivative in the \mathbf{j} direction
  • Every other directional derivative is a weighted combination of these, determined by the dot product with \mathbf{u}

The gradient's perpendicularity to level curves is worth remembering for exams. If f(x, y) = c defines a level curve, then \nabla f at any point on that curve is normal to it. This connects directly to finding tangent lines and normal vectors to implicitly defined curves.