Directional derivatives generalize partial derivatives by measuring how a function changes along any direction, not just along the coordinate axes. They tie together gradient vectors and unit vectors into a single computation, and they're the foundation for identifying directions of steepest ascent and descent in optimization.
Directional Derivatives and Unit Vectors
Calculating Directional Derivatives
The directional derivative of $f$ at a point $P$ in the direction of a unit vector $\vec{u}$ is the rate of change of $f$ as you move away from $P$ along $\vec{u}$. It's denoted $D_{\vec{u}} f$.
The formula is a dot product:

$$D_{\vec{u}} f = \nabla f \cdot \vec{u}$$
To compute it step by step:
- Find all partial derivatives of $f$ and assemble the gradient $\nabla f$.
- Make sure your direction vector is a unit vector (magnitude 1). If it isn't, normalize it first.
- Take the dot product of the gradient with the unit vector.
Example. Let $f(x, y) = x^2 + y^2$ and suppose you want the directional derivative at $(1, 2)$ in the direction of $\vec{v} = \langle 3, 4 \rangle$.
- $\nabla f = \langle 2x, 2y \rangle$, so $\nabla f(1, 2) = \langle 2, 4 \rangle$.
- Normalize: $\|\vec{v}\| = \sqrt{3^2 + 4^2} = 5$, so $\vec{u} = \langle 3/5, 4/5 \rangle$.
- Dot product: $D_{\vec{u}} f = \langle 2, 4 \rangle \cdot \langle 3/5, 4/5 \rangle = 6/5 + 16/5 = 22/5$.
The function is increasing at a rate of $22/5$ per unit length in that direction.
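The steps above can be checked numerically with a finite-difference approximation. This is a minimal Python sketch, using $f(x, y) = x^2 + y^2$, the point $(1, 2)$, and the direction $\langle 3, 4 \rangle$ as sample inputs:

```python
import math

def f(x, y):
    # sample function: f(x, y) = x^2 + y^2
    return x**2 + y**2

def grad_f(x, y):
    # analytic gradient: <2x, 2y>
    return (2 * x, 2 * y)

def directional_derivative(gx, gy, vx, vy):
    # normalize the direction vector, then dot it with the gradient
    mag = math.hypot(vx, vy)
    ux, uy = vx / mag, vy / mag
    return gx * ux + gy * uy

gx, gy = grad_f(1, 2)                         # <2, 4>
exact = directional_derivative(gx, gy, 3, 4)  # ≈ 22/5 = 4.4

# finite-difference check: step a tiny distance h along the unit direction
h = 1e-6
ux, uy = 3 / 5, 4 / 5
approx = (f(1 + h * ux, 2 + h * uy) - f(1, 2)) / h
print(exact, round(approx, 3))
```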
Unit Vectors and Their Properties
A unit vector has magnitude 1 and specifies a pure direction. The standard basis unit vectors are $\hat{i} = \langle 1, 0, 0 \rangle$, $\hat{j} = \langle 0, 1, 0 \rangle$, and $\hat{k} = \langle 0, 0, 1 \rangle$, pointing along the $x$-, $y$-, and $z$-axes respectively.
To convert any nonzero vector $\vec{v}$ into a unit vector, divide by its magnitude:

$$\vec{u} = \frac{\vec{v}}{\|\vec{v}\|}$$
The directional derivative formula requires a unit vector. If you plug in a non-unit vector $\vec{v}$, the answer comes out scaled by the factor $\|\vec{v}\|$. This is a common mistake on exams.
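To see why normalization matters, here's a small sketch (the gradient and direction values are chosen purely for illustration) comparing the dot product with the unit vector against the dot product with the raw vector:

```python
import math

def normalize(vx, vy):
    # divide each component by the vector's magnitude
    mag = math.hypot(vx, vy)
    return vx / mag, vy / mag

def dot(ax, ay, bx, by):
    return ax * bx + ay * by

grad = (2.0, 4.0)   # a sample gradient value
v = (3.0, 4.0)      # direction vector, not yet unit length

u = normalize(*v)
correct = dot(*grad, *u)  # ≈ 4.4
wrong = dot(*grad, *v)    # 22.0 -- inflated by the factor |v| = 5
print(correct, wrong)
```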
Interpreting Directional Derivatives
The sign of $D_{\vec{u}} f$ tells you what the function is doing in that direction:
- Positive: $f$ is increasing along $\vec{u}$
- Negative: $f$ is decreasing along $\vec{u}$
- Zero: $f$ is momentarily constant along $\vec{u}$, meaning $\vec{u}$ is tangent to a level curve (or level surface)
Geometrically, the directional derivative is the scalar projection of $\nabla f$ onto $\vec{u}$. Since $D_{\vec{u}} f = \nabla f \cdot \vec{u} = \|\nabla f\| \cos\theta$, where $\theta$ is the angle between the gradient and $\vec{u}$, the directional derivative depends entirely on that angle.
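The $\cos\theta$ relationship can be demonstrated numerically by sweeping unit vectors at various angles away from a sample gradient:

```python
import math

grad = (3.0, 4.0)                      # sample gradient; magnitude 5
grad_mag = math.hypot(*grad)
base = math.atan2(grad[1], grad[0])    # angle of the gradient itself

results = {}
for theta_deg in (0, 45, 90, 135, 180):
    # unit vector at angle theta measured from the gradient direction
    a = base + math.radians(theta_deg)
    u = (math.cos(a), math.sin(a))
    d = grad[0] * u[0] + grad[1] * u[1]
    results[theta_deg] = d
    # d agrees with |grad| * cos(theta)
    print(theta_deg, round(d, 4), round(grad_mag * math.cos(math.radians(theta_deg)), 4))
```

At $\theta = 0$ the dot product hits its maximum of 5, at $90°$ it vanishes (tangent to a level curve), and at $180°$ it bottoms out at $-5$.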

Partial Derivatives and Linearity
Partial Derivatives and Gradient Vectors
Partial derivatives are the special case of directional derivatives along coordinate axes. For $f(x, y)$:
- $\dfrac{\partial f}{\partial x}$ is the rate of change holding $y$ constant (equivalent to $D_{\hat{i}} f$)
- $\dfrac{\partial f}{\partial y}$ is the rate of change holding $x$ constant (equivalent to $D_{\hat{j}} f$)
The gradient vector packages all partial derivatives together:

$$\nabla f = \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right\rangle$$
Two key properties of the gradient:
- It points in the direction of greatest rate of increase of $f$.
- It is perpendicular to the level curves of $f$ at every point. This perpendicularity is what makes level curves "flat" relative to the gradient: moving along a level curve gives a directional derivative of zero.
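The perpendicularity claim can be spot-checked in code. For the sample function $f(x, y) = x^2 + y^2$ (chosen here for illustration), the level curves are circles centered at the origin, and the gradient dotted with the circle's tangent direction is exactly zero:

```python
# level curves of f(x, y) = x^2 + y^2 are circles centered at the origin;
# the tangent direction to the circle at (x, y) is <-y, x>
def grad_f(x, y):
    return (2 * x, 2 * y)

for x, y in [(1.0, 2.0), (3.0, -1.0), (0.5, 0.5)]:
    gx, gy = grad_f(x, y)
    tx, ty = -y, x              # tangent to the level curve through (x, y)
    print(gx * tx + gy * ty)    # prints 0.0: zero rate of change along the level curve
```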
Linearity of Directional Derivatives
Directional derivatives are linear operators. For functions $f$ and $g$ and scalars $a$ and $b$:

$$D_{\vec{u}}(a f + b g) = a\, D_{\vec{u}} f + b\, D_{\vec{u}} g$$
This follows directly from the linearity of the gradient (since the gradient of a linear combination is the linear combination of the gradients) and the linearity of the dot product. In practice, this means you can compute directional derivatives of complicated expressions term by term.
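The term-by-term claim is easy to verify numerically. All gradient values, scalars, and the direction below are illustrative, not taken from the text:

```python
def dot(g, u):
    return g[0] * u[0] + g[1] * u[1]

# sample gradients of f and g at the same point, plus a unit direction
grad_f = (2.0, 4.0)
grad_g = (-1.0, 3.0)
u = (0.6, 0.8)        # already a unit vector
a, b = 2.0, -3.0

# gradient of a*f + b*g is a*grad_f + b*grad_g (linearity of the gradient)
grad_comb = (a * grad_f[0] + b * grad_g[0], a * grad_f[1] + b * grad_g[1])

lhs = dot(grad_comb, u)                        # D_u(a f + b g)
rhs = a * dot(grad_f, u) + b * dot(grad_g, u)  # a D_u f + b D_u g
print(lhs, rhs)
```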

Chain Rule for Directional Derivatives
When you have a composition of differentiable functions, the chain rule extends to directional derivatives. If $h = f \circ g$ where $g: \mathbb{R}^n \to \mathbb{R}^m$ and $f: \mathbb{R}^m \to \mathbb{R}$, then:

$$D_{\vec{u}} h(\vec{x}) = \nabla f(g(\vec{x})) \cdot \big( J_g(\vec{x})\, \vec{u} \big)$$

where $J_g$ is the Jacobian matrix of $g$. For the simpler case where $g: \mathbb{R}^n \to \mathbb{R}$ and $f: \mathbb{R} \to \mathbb{R}$, this reduces to:

$$D_{\vec{u}}(f \circ g) = f'(g(\vec{x}))\, D_{\vec{u}} g(\vec{x})$$
This is the same "outer derivative times inner derivative" pattern from single-variable calculus, just applied in a directional context.
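A quick numerical sanity check of the scalar case, with sample choices $f = \sin$ (outer) and $g(x, y) = x^2 + y^2$ (inner), compared against a finite difference on the composition:

```python
import math

def g(x, y):
    # inner function, R^2 -> R (sample choice)
    return x**2 + y**2

def grad_g(x, y):
    return (2 * x, 2 * y)

f, fprime = math.sin, math.cos   # outer function R -> R and its derivative

x, y = 1.0, 2.0
u = (0.6, 0.8)                   # unit direction

# chain rule: D_u (f o g) = f'(g(x, y)) * D_u g(x, y)
du_g = grad_g(x, y)[0] * u[0] + grad_g(x, y)[1] * u[1]
chain = fprime(g(x, y)) * du_g

# finite-difference check on the composition itself
h = 1e-6
fd = (f(g(x + h * u[0], y + h * u[1])) - f(g(x, y))) / h
print(round(chain, 4), round(fd, 4))
```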
Steepest Ascent and Descent
Finding the Direction of Steepest Ascent
Since $D_{\vec{u}} f = \|\nabla f\| \cos\theta$, the directional derivative is maximized when $\cos\theta = 1$, i.e., when $\vec{u}$ points in the same direction as $\nabla f$.
- The direction of steepest ascent is $\vec{u} = \dfrac{\nabla f}{\|\nabla f\|}$
- The maximum directional derivative equals $\|\nabla f\|$, the magnitude of the gradient
So the gradient does double duty: its direction tells you where the function increases fastest, and its magnitude tells you how fast.
Finding the Direction of Steepest Descent
By the same cosine argument, the directional derivative is minimized when $\cos\theta = -1$, i.e., when $\vec{u}$ points opposite to $\nabla f$.
- The direction of steepest descent is $\vec{u} = -\dfrac{\nabla f}{\|\nabla f\|}$
- The minimum directional derivative equals $-\|\nabla f\|$
Notice the symmetry: the steepest ascent and steepest descent have the same magnitude but opposite signs.
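Both facts are easy to confirm with a sample gradient value (the numbers below are illustrative):

```python
import math

grad = (3.0, 4.0)            # sample gradient; |grad| = 5
mag = math.hypot(*grad)

ascent = (grad[0] / mag, grad[1] / mag)      # +grad / |grad|
descent = (-ascent[0], -ascent[1])           # -grad / |grad|

d_up = grad[0] * ascent[0] + grad[1] * ascent[1]      # ≈ +|grad| = 5
d_down = grad[0] * descent[0] + grad[1] * descent[1]  # ≈ -|grad| = -5
print(d_up, d_down)
```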
Applications of Steepest Ascent and Descent
Gradient descent in machine learning works by repeatedly stepping in the direction of $-\nabla f$ to minimize a cost function. Each iteration updates parameters as $\theta \leftarrow \theta - \alpha \nabla f(\theta)$, where $\alpha$ is the learning rate.
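A minimal gradient-descent loop, assuming a sample cost $f(x, y) = x^2 + y^2$ and an illustrative learning rate:

```python
def grad(x, y):
    # gradient of the sample cost f(x, y) = x^2 + y^2
    return (2 * x, 2 * y)

alpha = 0.1          # learning rate (sample value)
x, y = 3.0, 4.0      # arbitrary starting point

for _ in range(100):
    gx, gy = grad(x, y)
    # step in the direction of -grad, scaled by the learning rate
    x, y = x - alpha * gx, y - alpha * gy

print(round(x, 6), round(y, 6))   # converges toward the minimum at (0, 0)
```

Each update shrinks both coordinates by the factor $(1 - 2\alpha)$, so the iterates decay geometrically toward the origin.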
At a critical point, $\nabla f = \vec{0}$, which means the directional derivative is zero in every direction. These are the candidates for local maxima, minima, and saddle points.
Beyond machine learning, steepest ascent/descent analysis shows up in physics (heat flow follows the negative gradient of temperature), engineering (stress analysis on surfaces), and economics (maximizing utility or profit over multiple variables). The core idea is always the same: the gradient tells you the best direction to move.