Chain Rule for Multivariable Functions
The multivariable chain rule extends the familiar single-variable chain rule to handle composite functions where variables depend on other variables. You need it whenever you're differentiating a function whose inputs are themselves functions of something else.
Chain Rule for Composite Functions
In single-variable calculus, if $y = f(g(x))$, then $\frac{dy}{dx} = f'(g(x))\,g'(x)$. The multivariable version follows the same logic but accounts for multiple paths of dependence.
Suppose $z = f(x, y)$ where $x = x(t)$ and $y = y(t)$. Both $x$ and $y$ contribute to how $z$ changes as $t$ changes, so you sum both contributions:

$$\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}$$
Each term captures one "pathway" from $t$ to $z$. Drawing a tree diagram helps: $z$ branches into $x$ and $y$, and both feed into $t$. Each branch contributes a product of derivatives along its path, and you add them all up.
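The two-term sum can be verified symbolically. Here is a minimal SymPy sketch, using hypothetical choices $z = x^2 + xy$, $x = \cos t$, $y = \sin t$ (any smooth functions would do):

```python
import sympy as sp

t = sp.symbols('t')
x, y = sp.symbols('x y')
z = x**2 + x*y                 # hypothetical outer function z = f(x, y)
x_t = sp.cos(t)                # hypothetical inner function x(t)
y_t = sp.sin(t)                # hypothetical inner function y(t)

# Chain rule: dz/dt = (dz/dx)(dx/dt) + (dz/dy)(dy/dt)
dz_dt_chain = (sp.diff(z, x)*sp.diff(x_t, t)
               + sp.diff(z, y)*sp.diff(y_t, t)).subs({x: x_t, y: y_t})

# Direct differentiation after substituting, for comparison
dz_dt_direct = sp.diff(z.subs({x: x_t, y: y_t}), t)

assert sp.simplify(dz_dt_chain - dz_dt_direct) == 0
```

The assertion confirms that summing the two pathways reproduces the derivative obtained by substituting first and differentiating directly.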
Computing the Chain Rule with Two Intermediate Variables
When the inner functions depend on two variables instead of one, the chain rule expands accordingly. If $z = f(x, y)$ where $x = x(s, t)$ and $y = y(s, t)$, here's the process:
- Identify the outer function $f(x, y)$ and the inner functions $x(s, t)$, $y(s, t)$
- Compute the partial derivatives of the outer function: $\frac{\partial z}{\partial x}$ and $\frac{\partial z}{\partial y}$
- Compute the partial derivatives of each inner function with respect to $s$ and $t$
- Assemble using the chain rule:

$$\frac{\partial z}{\partial s} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial s}, \qquad \frac{\partial z}{\partial t} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial t}$$
Example: Let $z = x^2 y$, where $x = st$ and $y = s + t$. To find $\frac{\partial z}{\partial s}$:
- $\frac{\partial z}{\partial x} = 2xy$ and $\frac{\partial z}{\partial y} = x^2$
- $\frac{\partial x}{\partial s} = t$ and $\frac{\partial y}{\partial s} = 1$
- So $\frac{\partial z}{\partial s} = 2xy \cdot t + x^2 \cdot 1 = 2st^2(s + t) + s^2 t^2 = 3s^2 t^2 + 2st^3$
The pattern generalizes naturally: for any number of intermediate variables, you add one term per pathway from the independent variable to $z$.
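The assembly steps can be checked symbolically. A short SymPy sketch, using the illustrative choices $z = x^2 y$, $x = st$, $y = s + t$:

```python
import sympy as sp

s, t = sp.symbols('s t')
x, y = sp.symbols('x y')
z = x**2 * y                        # sample outer function
x_st, y_st = s*t, s + t             # sample inner functions

# Assemble dz/ds term by term: (dz/dx)(dx/ds) + (dz/dy)(dy/ds)
dz_ds = (sp.diff(z, x)*sp.diff(x_st, s)
         + sp.diff(z, y)*sp.diff(y_st, s)).subs({x: x_st, y: y_st})

# Compare against differentiating z(s, t) directly
dz_ds_direct = sp.diff(z.subs({x: x_st, y: y_st}), s)
assert sp.expand(dz_ds) == sp.expand(dz_ds_direct)
```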

Applications of the Multivariable Chain Rule
The chain rule is essential whenever quantities depend on each other through intermediate variables:
- Parametric motion: If a particle's position $(x(t), y(t))$ moves through a temperature field $T(x, y)$, the chain rule gives the rate of temperature change the particle experiences over time
- Coordinate transformations: Converting between polar and Cartesian coordinates requires the chain rule to relate partial derivatives in each system
- Optimization algorithms: Gradient descent in machine learning relies on the chain rule (called "backpropagation") to compute how a loss function changes with respect to model parameters through many layers of composition
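The parametric-motion application can be sketched concretely. Assuming a hypothetical temperature field $T = 100 - x^2 - 2y^2$ and a circular path $(\cos t, \sin t)$ (both invented for illustration):

```python
import sympy as sp

t = sp.symbols('t')
x, y = sp.symbols('x y')

T = 100 - x**2 - 2*y**2            # hypothetical temperature field T(x, y)
x_t, y_t = sp.cos(t), sp.sin(t)    # hypothetical particle path

# Rate of temperature change along the path, via the chain rule:
# dT/dt = (dT/dx)(dx/dt) + (dT/dy)(dy/dt)
dT_dt = (sp.diff(T, x)*sp.diff(x_t, t)
         + sp.diff(T, y)*sp.diff(y_t, t)).subs({x: x_t, y: y_t})

# Matches differentiating T(x(t), y(t)) directly
assert sp.simplify(dT_dt - sp.diff(T.subs({x: x_t, y: y_t}), t)) == 0
```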
Directional Derivatives and Gradients
Partial derivatives tell you the rate of change along the coordinate axes. But what if you want the rate of change in some other direction, like 30° from the $x$-axis? That's what directional derivatives do. The gradient ties it all together by encoding the rates of change in every direction at once.

The Gradient Vector
The gradient of $f(x, y)$ is the vector of its partial derivatives:

$$\nabla f = \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right\rangle$$

For three variables, it extends to $\nabla f = \langle f_x, f_y, f_z \rangle$.
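Computing a gradient symbolically takes one line per component. A short SymPy sketch with a hypothetical $f(x, y) = x^2 y + \sin y$:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + sp.sin(y)           # hypothetical function f(x, y)

# Gradient = vector of partial derivatives
grad_f = [sp.diff(f, x), sp.diff(f, y)]   # [2*x*y, x**2 + cos(y)]

assert sp.simplify(grad_f[0] - 2*x*y) == 0
assert sp.simplify(grad_f[1] - (x**2 + sp.cos(y))) == 0
```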
Three key geometric facts about the gradient:
- It points in the direction of steepest increase of $f$
- Its magnitude $|\nabla f|$ equals the maximum rate of change of $f$ at that point
- It is always perpendicular to level curves (in 2D) or level surfaces (in 3D). If you picture a topographic map, the gradient at any point aims straight uphill, cutting across contour lines at right angles.
Calculating Directional Derivatives
The directional derivative of $f$ at a point in the direction of a unit vector $\mathbf{u}$ is:

$$D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$$

The vector $\mathbf{u}$ must be a unit vector ($|\mathbf{u}| = 1$). If you're given a direction $\mathbf{v}$ that isn't unit length, normalize it first: $\mathbf{u} = \mathbf{v}/|\mathbf{v}|$.
Step-by-step process:
- Compute $\nabla f$ at the point of interest
- Determine the unit vector $\mathbf{u}$ in your desired direction. If given an angle $\theta$ from the positive $x$-axis, then $\mathbf{u} = \langle \cos\theta, \sin\theta \rangle$
- Take the dot product: $D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$
Interpreting the result:
- Positive value: $f$ is increasing in that direction
- Negative value: $f$ is decreasing in that direction
- Zero: $f$ is momentarily constant in that direction (you're moving along a level curve)
Example: Find the directional derivative of $f(x, y) = x^2 + y^2$ at the point $(1, 2)$ in the direction of $\mathbf{v} = \langle 3, 4 \rangle$.
- $\nabla f = \langle 2x, 2y \rangle = \langle 2, 4 \rangle$ at $(1, 2)$
- Normalize: $|\mathbf{v}| = 5$, so $\mathbf{u} = \langle 3/5, 4/5 \rangle$
- Dot product: $D_{\mathbf{u}} f = 2 \cdot \tfrac{3}{5} + 4 \cdot \tfrac{4}{5} = \tfrac{22}{5}$
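The three-step procedure can be checked numerically. A short sketch assuming the sample values $f(x, y) = x^2 + y^2$, the point $(1, 2)$, and direction $\mathbf{v} = \langle 3, 4 \rangle$:

```python
import math

# Step 1: gradient of f(x, y) = x**2 + y**2 is (2x, 2y); evaluate at (1, 2)
grad = (2*1, 2*2)                    # (2, 4)

# Step 2: normalize the direction v = (3, 4) to a unit vector
v = (3, 4)
norm = math.hypot(*v)                # |v| = 5
u = (v[0]/norm, v[1]/norm)           # (3/5, 4/5)

# Step 3: dot product gives the directional derivative
D_u = grad[0]*u[0] + grad[1]*u[1]    # 6/5 + 16/5 = 22/5
assert abs(D_u - 22/5) < 1e-12
```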
Direction of Steepest Ascent and Descent
Since $D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u} = |\nabla f| \cos\theta$, where $\theta$ is the angle between $\nabla f$ and $\mathbf{u}$:
- Maximum rate of change occurs when $\theta = 0$ ($\mathbf{u}$ parallel to $\nabla f$), giving $|\nabla f|$
- Minimum (most negative) rate of change occurs when $\theta = \pi$ ($\mathbf{u}$ opposite to $\nabla f$), giving $-|\nabla f|$
- Zero rate of change occurs when $\theta = \pi/2$ ($\mathbf{u}$ perpendicular to $\nabla f$)
To find the direction of steepest ascent at a point:
- Compute $\nabla f$ at that point
- Normalize it: $\mathbf{u} = \nabla f / |\nabla f|$

The direction of steepest descent is simply $-\nabla f$. This is exactly what gradient descent algorithms exploit: to minimize a function, take steps in the direction of $-\nabla f$.
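The descent idea fits in a few lines of code. A minimal sketch, assuming a hypothetical bowl-shaped $f(x, y) = (x-1)^2 + (y+2)^2$ whose minimum sits at $(1, -2)$:

```python
# Minimal gradient descent on the sample function f(x, y) = (x-1)**2 + (y+2)**2
def grad_f(x, y):
    return (2*(x - 1), 2*(y + 2))    # gradient computed by hand

x, y = 0.0, 0.0                      # arbitrary starting point
lr = 0.1                             # step size (learning rate)
for _ in range(200):
    gx, gy = grad_f(x, y)
    x, y = x - lr*gx, y - lr*gy      # step in the direction of -grad(f)

# The iterates converge to the minimizer (1, -2)
assert abs(x - 1) < 1e-6 and abs(y + 2) < 1e-6
```

Each iteration moves against the gradient, so $f$ decreases as fast as a fixed-size step allows.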
Connecting the Gradient and Directional Derivatives
Think of the gradient as containing all directional derivative information at a point. The directional derivative in any direction $\mathbf{u}$ is just the projection of $\nabla f$ onto $\mathbf{u}$:

$$D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$$
This means:
- The partial derivative $f_x$ is the directional derivative in the $\langle 1, 0 \rangle$ direction
- The partial derivative $f_y$ is the directional derivative in the $\langle 0, 1 \rangle$ direction
- Every other directional derivative is a weighted combination of these, determined by the dot product with $\mathbf{u}$
The gradient's perpendicularity to level curves is worth remembering for exams. If $f(x, y) = c$ defines a level curve, then $\nabla f$ at any point on that curve is normal to it. This connects directly to finding tangent lines and normal vectors to implicitly defined curves.
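The normality claim is easy to verify for a specific curve. A SymPy sketch assuming the level curve $x^2 + y^2 = 25$ (a circle of radius 5) and the point $(3, 4)$ on it:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + y**2                      # level curve f = 25 is a circle of radius 5

# Gradient evaluated at the point (3, 4) on that curve
grad = sp.Matrix([sp.diff(f, x), sp.diff(f, y)]).subs({x: 3, y: 4})  # (6, 8)

# A tangent direction to the circle at (3, 4) is (-4, 3);
# the gradient is perpendicular (normal) to it
tangent = sp.Matrix([-4, 3])
assert grad.dot(tangent) == 0
```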