Chain Rule for Multivariable Functions
The multivariable chain rule extends the familiar single-variable chain rule to handle composite functions where variables depend on other variables. You need it whenever you're differentiating a function whose inputs are themselves functions of something else.
Chain Rule for Composite Functions
In single-variable calculus, if $y = f(g(x))$, then $\frac{dy}{dx} = f'(g(x))\,g'(x)$. The multivariable version follows the same logic but accounts for multiple paths of dependence.
Suppose $z = f(x, y)$ where $x = x(t)$ and $y = y(t)$. Both $x$ and $y$ contribute to how $z$ changes as $t$ changes, so you sum both contributions:

$$\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}$$
Each term captures one "pathway" from $t$ to $z$. Drawing a tree diagram helps: $z$ branches into $x$ and $y$, and both feed into $t$. Each branch contributes a product of derivatives along its path, and you add them all up.
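The two-term sum can be verified symbolically. Here is a minimal SymPy sketch, using hypothetical choices $z = x^2 + xy$, $x = \cos t$, $y = \sin t$ (any smooth functions would do):

```python
import sympy as sp

t = sp.symbols('t')
x, y = sp.symbols('x y')
z = x**2 + x*y                 # hypothetical outer function z = f(x, y)
x_t = sp.cos(t)                # hypothetical inner function x(t)
y_t = sp.sin(t)                # hypothetical inner function y(t)

# Chain rule: dz/dt = (dz/dx)(dx/dt) + (dz/dy)(dy/dt)
dz_dt_chain = (sp.diff(z, x)*sp.diff(x_t, t)
               + sp.diff(z, y)*sp.diff(y_t, t)).subs({x: x_t, y: y_t})

# Direct differentiation after substituting, for comparison
dz_dt_direct = sp.diff(z.subs({x: x_t, y: y_t}), t)

assert sp.simplify(dz_dt_chain - dz_dt_direct) == 0
```

The assertion confirms that summing the two pathways reproduces the derivative obtained by substituting first and differentiating directly.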
Computing the Chain Rule with Two Intermediate Variables
When the inner functions depend on two variables instead of one, the chain rule expands accordingly. If $z = f(x, y)$ where $x = x(s, t)$ and $y = y(s, t)$, here's the process:
- Identify the outer function $f(x, y)$ and the inner functions $x(s, t)$, $y(s, t)$
- Compute the partial derivatives of the outer function: $\frac{\partial z}{\partial x}$ and $\frac{\partial z}{\partial y}$
- Compute the partial derivatives of each inner function with respect to $s$ and $t$
- Assemble using the chain rule:

$$\frac{\partial z}{\partial s} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial s}, \qquad \frac{\partial z}{\partial t} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial t}$$
Example: Let $z = x^2 y$, where $x = st$ and $y = s + t$. To find $\frac{\partial z}{\partial s}$:
- $\frac{\partial z}{\partial x} = 2xy$ and $\frac{\partial z}{\partial y} = x^2$
- $\frac{\partial x}{\partial s} = t$ and $\frac{\partial y}{\partial s} = 1$
- So $\frac{\partial z}{\partial s} = 2xy \cdot t + x^2 \cdot 1 = 2st^2(s + t) + s^2 t^2 = 3s^2 t^2 + 2st^3$
The pattern generalizes naturally: for any number of intermediate variables, you add one term per pathway from the independent variable to $z$.
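The assembly steps can be checked symbolically. A short SymPy sketch, using the illustrative choices $z = x^2 y$, $x = st$, $y = s + t$:

```python
import sympy as sp

s, t = sp.symbols('s t')
x, y = sp.symbols('x y')
z = x**2 * y                        # sample outer function
x_st, y_st = s*t, s + t             # sample inner functions

# Assemble dz/ds term by term: (dz/dx)(dx/ds) + (dz/dy)(dy/ds)
dz_ds = (sp.diff(z, x)*sp.diff(x_st, s)
         + sp.diff(z, y)*sp.diff(y_st, s)).subs({x: x_st, y: y_st})

# Compare against differentiating z(s, t) directly
dz_ds_direct = sp.diff(z.subs({x: x_st, y: y_st}), s)
assert sp.expand(dz_ds) == sp.expand(dz_ds_direct)
```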

Applications of the Multivariable Chain Rule
The chain rule is essential whenever quantities depend on each other through intermediate variables:
- Parametric motion: If a particle's position $(x(t), y(t))$ moves through a temperature field $T(x, y)$, the chain rule gives the rate of temperature change the particle experiences over time
- Coordinate transformations: Converting between polar and Cartesian coordinates requires the chain rule to relate partial derivatives in each system
- Optimization algorithms: Gradient descent in machine learning relies on the chain rule (called "backpropagation") to compute how a loss function changes with respect to model parameters through many layers of composition
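The parametric-motion application can be sketched concretely. Assuming a hypothetical temperature field $T = 100 - x^2 - 2y^2$ and a circular path $(\cos t, \sin t)$ (both invented for illustration):

```python
import sympy as sp

t = sp.symbols('t')
x, y = sp.symbols('x y')

T = 100 - x**2 - 2*y**2            # hypothetical temperature field T(x, y)
x_t, y_t = sp.cos(t), sp.sin(t)    # hypothetical particle path

# Rate of temperature change along the path, via the chain rule:
# dT/dt = (dT/dx)(dx/dt) + (dT/dy)(dy/dt)
dT_dt = (sp.diff(T, x)*sp.diff(x_t, t)
         + sp.diff(T, y)*sp.diff(y_t, t)).subs({x: x_t, y: y_t})

# Matches differentiating T(x(t), y(t)) directly
assert sp.simplify(dT_dt - sp.diff(T.subs({x: x_t, y: y_t}), t)) == 0
```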
Directional Derivatives and Gradients
Partial derivatives tell you the rate of change along the coordinate axes. But what if you want the rate of change in some other direction, like 30° from the $x$-axis? That's what directional derivatives do. The gradient ties it all together by encoding the rates of change in every direction at once.

The Gradient Vector
The gradient of $f(x, y)$ is the vector of its partial derivatives:

$$\nabla f = \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right\rangle$$

For three variables, it extends to $\nabla f = \langle f_x, f_y, f_z \rangle$.
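Computing a gradient symbolically takes one line per component. A short SymPy sketch with a hypothetical $f(x, y) = x^2 y + \sin y$:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + sp.sin(y)           # hypothetical function f(x, y)

# Gradient = vector of partial derivatives
grad_f = [sp.diff(f, x), sp.diff(f, y)]   # [2*x*y, x**2 + cos(y)]

assert sp.simplify(grad_f[0] - 2*x*y) == 0
assert sp.simplify(grad_f[1] - (x**2 + sp.cos(y))) == 0
```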
Three key geometric facts about the gradient:
- It points in the direction of steepest increase of $f$
- Its magnitude $|\nabla f|$ equals the maximum rate of change of $f$ at that point
- It is always perpendicular to level curves (in 2D) or level surfaces (in 3D). If you picture a topographic map, the gradient at any point aims straight uphill, cutting across contour lines at right angles.
Calculating Directional Derivatives
The directional derivative of $f$ at a point in the direction of a unit vector $\mathbf{u}$ is:

$$D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$$

The vector $\mathbf{u}$ must be a unit vector ($|\mathbf{u}| = 1$). If you're given a direction $\mathbf{v}$ that isn't unit length, normalize it first: $\mathbf{u} = \mathbf{v}/|\mathbf{v}|$.
Step-by-step process:
- Compute $\nabla f$ at the point of interest
- Determine the unit vector $\mathbf{u}$ in your desired direction. If given an angle $\theta$ from the positive $x$-axis, then $\mathbf{u} = \langle \cos\theta, \sin\theta \rangle$
- Take the dot product: $D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$
Interpreting the result:
- Positive value: $f$ is increasing in that direction
- Negative value: $f$ is decreasing in that direction
- Zero: $f$ is momentarily constant in that direction (you're moving along a level curve)
Example: Find the directional derivative of $f(x, y) = x^2 + y^2$ at the point $(1, 2)$ in the direction of $\mathbf{v} = \langle 3, 4 \rangle$.
- $\nabla f = \langle 2x, 2y \rangle = \langle 2, 4 \rangle$ at $(1, 2)$
- Normalize: $|\mathbf{v}| = 5$, so $\mathbf{u} = \langle 3/5, 4/5 \rangle$
- Dot product: $D_{\mathbf{u}} f = 2 \cdot \tfrac{3}{5} + 4 \cdot \tfrac{4}{5} = \tfrac{22}{5}$
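The three-step procedure can be checked numerically. A short sketch assuming the sample values $f(x, y) = x^2 + y^2$, the point $(1, 2)$, and direction $\mathbf{v} = \langle 3, 4 \rangle$:

```python
import math

# Step 1: gradient of f(x, y) = x**2 + y**2 is (2x, 2y); evaluate at (1, 2)
grad = (2*1, 2*2)                    # (2, 4)

# Step 2: normalize the direction v = (3, 4) to a unit vector
v = (3, 4)
norm = math.hypot(*v)                # |v| = 5
u = (v[0]/norm, v[1]/norm)           # (3/5, 4/5)

# Step 3: dot product gives the directional derivative
D_u = grad[0]*u[0] + grad[1]*u[1]    # 6/5 + 16/5 = 22/5
assert abs(D_u - 22/5) < 1e-12
```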
Direction of Steepest Ascent and Descent
Since $D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u} = |\nabla f| \cos\theta$, where $\theta$ is the angle between $\nabla f$ and $\mathbf{u}$:
- Maximum rate of change occurs when $\theta = 0$ ($\mathbf{u}$ parallel to $\nabla f$), giving $|\nabla f|$
- Minimum (most negative) rate of change occurs when $\theta = \pi$ ($\mathbf{u}$ opposite to $\nabla f$), giving $-|\nabla f|$
- Zero rate of change occurs when $\theta = \pi/2$ ($\mathbf{u}$ perpendicular to $\nabla f$)
To find the direction of steepest ascent at a point:
- Compute $\nabla f$ at that point
- Normalize it: $\mathbf{u} = \nabla f / |\nabla f|$

The direction of steepest descent is simply $-\nabla f$. This is exactly what gradient descent algorithms exploit: to minimize a function, take steps in the direction of $-\nabla f$.
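The descent idea fits in a few lines of code. A minimal sketch, assuming a hypothetical bowl-shaped $f(x, y) = (x-1)^2 + (y+2)^2$ whose minimum sits at $(1, -2)$:

```python
# Minimal gradient descent on the sample function f(x, y) = (x-1)**2 + (y+2)**2
def grad_f(x, y):
    return (2*(x - 1), 2*(y + 2))    # gradient computed by hand

x, y = 0.0, 0.0                      # arbitrary starting point
lr = 0.1                             # step size (learning rate)
for _ in range(200):
    gx, gy = grad_f(x, y)
    x, y = x - lr*gx, y - lr*gy      # step in the direction of -grad(f)

# The iterates converge to the minimizer (1, -2)
assert abs(x - 1) < 1e-6 and abs(y + 2) < 1e-6
```

Each iteration moves against the gradient, so $f$ decreases as fast as a fixed-size step allows.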
Connecting the Gradient and Directional Derivatives
Think of the gradient as containing all directional derivative information at a point. The directional derivative in any direction $\mathbf{u}$ is just the projection of $\nabla f$ onto $\mathbf{u}$:

$$D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$$
This means:
- The partial derivative $f_x$ is the directional derivative in the $\langle 1, 0 \rangle$ direction
- The partial derivative $f_y$ is the directional derivative in the $\langle 0, 1 \rangle$ direction
- Every other directional derivative is a weighted combination of these, determined by the dot product with $\mathbf{u}$
The gradient's perpendicularity to level curves is worth remembering for exams. If $f(x, y) = c$ defines a level curve, then $\nabla f$ at any point on that curve is normal to it. This connects directly to finding tangent lines and normal vectors to implicitly defined curves.
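The normality claim is easy to verify for a specific curve. A SymPy sketch assuming the level curve $x^2 + y^2 = 25$ (a circle of radius 5) and the point $(3, 4)$ on it:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + y**2                      # level curve f = 25 is a circle of radius 5

# Gradient evaluated at the point (3, 4) on that curve
grad = sp.Matrix([sp.diff(f, x), sp.diff(f, y)]).subs({x: 3, y: 4})  # (6, 8)

# A tangent direction to the circle at (3, 4) is (-4, 3);
# the gradient is perpendicular (normal) to it
tangent = sp.Matrix([-4, 3])
assert grad.dot(tangent) == 0
```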