
Gradient

In mathematics and physics, the gradient of a scalar-valued function f of several variables is a vector field that points in the direction of the function's steepest increase at each point and whose magnitude equals the rate of that increase. For a function f(x, y, z) in three dimensions, the gradient is formally defined as the vector \nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right), where the components are the partial derivatives of f with respect to each variable. This operator, symbolized by the nabla \nabla, was introduced by William Rowan Hamilton in 1853 as part of his work on quaternions and vector analysis. The gradient plays a central role in multivariable calculus, where it enables the computation of directional derivatives: the directional derivative of f in the direction of a unit vector \mathbf{u} is given by the dot product \nabla f \cdot \mathbf{u}. Geometrically, level surfaces (or level curves in 2D) of f are perpendicular to the gradient vector at every point, making \nabla f normal to these surfaces and useful for finding tangent planes.

In physics, the gradient describes conservative force fields, such as the gravitational or electric field, where the force on a particle is \mathbf{F} = - \nabla V for a potential V. Beyond pure analysis, the gradient is foundational in optimization algorithms like gradient descent, which iteratively adjust parameters to minimize a loss function by moving opposite to the gradient direction. It also appears in fluid dynamics, where pressure gradients drive flow, and in computer graphics, for shading based on surface normals derived from gradients. These applications underscore the gradient's versatility across disciplines, from theoretical analysis to practical computations in engineering and machine learning.
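As a concrete check of these formulas, the short sketch below (an illustrative example, not part of the original text; the function f(x, y, z) = x^2 + yz and the direction \mathbf{u} are arbitrary choices) estimates the gradient by central finite differences and forms the directional derivative \nabla f \cdot \mathbf{u}:

```python
import numpy as np

# Illustrative sketch (assumed example): numerically estimate the gradient
# of f(x, y, z) = x**2 + y*z via central differences, then use it to
# compute a directional derivative  D_u f = grad(f) . u.

def f(p):
    x, y, z = p
    return x**2 + y * z

def numerical_gradient(f, p, h=1e-6):
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = h
        grad[i] = (f(p + step) - f(p - step)) / (2 * h)  # central difference
    return grad

p = np.array([1.0, 2.0, 3.0])
g = numerical_gradient(f, p)                 # analytic value: (2x, z, y) = (2, 3, 2)
u = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)   # unit direction
print(g, g @ u)                              # directional derivative grad(f) . u
```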

Basic Concepts

Motivation and Intuition

The concept of the gradient emerged in the 19th century as part of the development of vector calculus, building on the foundations of partial derivatives established earlier. Partial derivatives, which capture how a multivariable function varies with respect to one variable while treating the others as constant, were systematically developed by Leonhard Euler around 1734, with notation refinements by Carl Gustav Jacob Jacobi in the 1840s. The gradient itself took shape through William Rowan Hamilton's introduction of quaternions in 1843 and the nabla operator in 1853, which laid groundwork for vector operations, and was fully articulated in modern form by J. Willard Gibbs and Oliver Heaviside in the 1880s as they separated scalar and vector components in calculus.

Intuitively, the gradient generalizes the idea of a slope to functions of multiple variables, providing a directional measure of change in a scalar field across multidimensional space. At any point, it indicates the path of most rapid increase in the function's value, much like following the steepest uphill route on hilly terrain, with its magnitude reflecting the sharpness of that rise. This vectorial perspective allows for a unified understanding of variation in all directions, bridging single-variable derivatives to complex spatial behaviors without relying on isolated one-dimensional slices.

Physically, the gradient underlies many natural processes by quantifying how scalar quantities like temperature or potential evolve in space, driving flows and forces accordingly. In heat conduction, for example, the temperature gradient determines the direction of heat flow, where heat moves perpendicular to isotherms from hotter to cooler regions, as the heat flux is proportional to this gradient per Fourier's law, established in 1822. Likewise, in gravitational contexts, the negative gradient of the potential points toward decreasing potential, aligning with the direction of the attractive force and exemplifying how such vectors model conservative systems in classical mechanics.

As a cornerstone of vector calculus, the gradient establishes essential intuition for analyzing scalar fields (functions assigning values to points in space) before formal mathematical treatments. It underscores why tracking multidimensional changes matters for modeling real-world scenarios involving multiple influences, such as environmental variations or force fields, setting the stage for deeper explorations in optimization and field theory.

Notation

The gradient of a scalar function f, denoted \nabla f or \mathbf{\nabla} f, represents the vector field consisting of its partial derivatives, where \nabla is the nabla symbol or del operator. In vector form, it is often written using boldface notation, such as \mathbf{\nabla} f, to emphasize its status as a vector. The nabla operator \nabla itself is a vector differential operator, commonly expressed in Cartesian coordinates as \nabla = \hat{\mathbf{i}} \frac{\partial}{\partial x} + \hat{\mathbf{j}} \frac{\partial}{\partial y} + \hat{\mathbf{k}} \frac{\partial}{\partial z}, acting on f to yield the gradient vector. Variations in notation include index form, where the i-th component of the gradient is \frac{\partial f}{\partial x_i} for coordinates x_i, useful in higher-dimensional or tensorial settings. In computational and optimization contexts, the gradient may appear as a column matrix or column vector, such as \nabla f = \begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{pmatrix}, facilitating numerical implementations.

Conventions distinguish the gradient from related operators: applied to a scalar field, \nabla f produces a vector, whereas the divergence \nabla \cdot \mathbf{v} (for a vector field \mathbf{v}) yields a scalar, and the curl \nabla \times \mathbf{v} yields a vector, ensuring no ambiguity in multivariable calculus. In mathematics, \nabla f is the predominant notation, while physics texts often prefer \operatorname{grad} f for clarity in electromagnetic or fluid dynamics applications. This notation will appear consistently in subsequent equations, such as the simple two-dimensional example \nabla (x^2 + y^2) = (2x, 2y), illustrating a vector pointing in the direction of steepest ascent without specifying coordinate systems here.

Definition

In Cartesian Coordinates

In Cartesian coordinates, the gradient of a scalar-valued function f: \mathbb{R}^n \to \mathbb{R} defined on an open set in Euclidean space is a vector whose components are the partial derivatives of f with respect to each coordinate variable. Specifically, at a point \mathbf{x} = (x_1, x_2, \dots, x_n), the gradient is given by \nabla f(\mathbf{x}) = \left( \frac{\partial f}{\partial x_1}(\mathbf{x}), \frac{\partial f}{\partial x_2}(\mathbf{x}), \dots, \frac{\partial f}{\partial x_n}(\mathbf{x}) \right), assuming the partial derivatives exist at \mathbf{x}. In two dimensions, for f(x, y), the gradient takes the form \nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right), while in three dimensions, for f(x, y, z), it is \nabla f(x, y, z) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right) = \frac{\partial f}{\partial x} \mathbf{i} + \frac{\partial f}{\partial y} \mathbf{j} + \frac{\partial f}{\partial z} \mathbf{k}. These expressions hold wherever the partial derivatives exist on an open domain in \mathbb{R}^2 or \mathbb{R}^3.

To compute the gradient, evaluate each partial derivative separately by treating the other variables as constants and differentiating with respect to the respective coordinate; the resulting components are assembled into a vector at the point of interest. For example, consider f(x, y) = x^2 + y^2; the partial derivative with respect to x is 2x, and with respect to y is 2y, yielding \nabla f(x, y) = (2x, 2y). This gradient is normal to the level curves of f, which are circles centered at the origin.
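The same computation can be done symbolically; a minimal sketch using the SymPy library (assuming it is available) reproduces the example above:

```python
import sympy as sp

# Symbolic check (assumed example): gradient of f(x, y) = x**2 + y**2,
# computed component by component.
x, y = sp.symbols('x y')
f = x**2 + y**2
grad_f = [sp.diff(f, v) for v in (x, y)]
print(grad_f)  # [2*x, 2*y], matching  grad f = (2x, 2y)
```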

In Curvilinear Coordinates

In orthogonal curvilinear coordinates, the gradient of a scalar field f accounts for the local geometry through scale factors, which adjust the partial derivatives to reflect the varying metric of the coordinate basis. These systems are particularly useful for problems with cylindrical or spherical symmetry, where the coordinate curves align with the physical geometry. The general expression for the gradient in an orthogonal curvilinear system with coordinates (u_1, u_2, u_3) and corresponding scale factors h_1, h_2, h_3 is \nabla f = \sum_{i=1}^3 \frac{1}{h_i} \frac{\partial f}{\partial u_i} \hat{e}_i, where \hat{e}_i are the unit basis vectors along each coordinate direction. The scale factors h_i are defined as h_i = |\partial \mathbf{r}/\partial u_i|, quantifying the infinitesimal arc length per unit change in u_i. Cartesian coordinates represent a special case where all h_i = 1.

In cylindrical coordinates (\rho, \phi, z), the scale factors are h_\rho = 1, h_\phi = \rho, and h_z = 1, so the gradient takes the form \nabla f = \frac{\partial f}{\partial \rho} \hat{e}_\rho + \frac{1}{\rho} \frac{\partial f}{\partial \phi} \hat{e}_\phi + \frac{\partial f}{\partial z} \hat{e}_z. This expression arises from the metric in cylindrical systems, where the azimuthal direction stretches with radius \rho. For spherical coordinates (r, \theta, \phi), the scale factors are h_r = 1, h_\theta = r, and h_\phi = r \sin \theta, giving \nabla f = \frac{\partial f}{\partial r} \hat{e}_r + \frac{1}{r} \frac{\partial f}{\partial \theta} \hat{e}_\theta + \frac{1}{r \sin \theta} \frac{\partial f}{\partial \phi} \hat{e}_\phi. The dependence on \sin \theta in the \phi-component reflects the contraction of azimuthal circles toward the poles.

A representative example is the gradient of a radial potential, such as the electric potential V = 1/(4\pi \epsilon_0 r) of a unit point charge, which depends only on the radial coordinate r. In spherical coordinates, \partial V / \partial r = -1/(4\pi \epsilon_0 r^2) and the angular derivatives vanish, yielding \nabla V = -\frac{1}{4\pi \epsilon_0 r^2} \hat{e}_r. This purely radial form aligns with the symmetry of the field. These formulations are essential in fields like electrostatics, where the electric field is the negative gradient of the potential in symmetric geometries, and in fluid dynamics, for computing gradients in axisymmetric or spherical flows.
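One way to sanity-check the spherical formula is to compare it against the Cartesian gradient of the same radial function. The sketch below (an assumed example using SymPy, with f = 1/r standing in for the potential up to the constant 1/(4\pi\epsilon_0)) verifies that the two expressions agree:

```python
import sympy as sp

# Sketch (assumed example): verify the spherical-coordinate gradient formula
# on f = 1/r by comparing with the Cartesian gradient of 1/sqrt(x^2+y^2+z^2).
x, y, z = sp.symbols('x y z', positive=True)
r = sp.sqrt(x**2 + y**2 + z**2)

f_cart = 1 / r
grad_cart = sp.Matrix([sp.diff(f_cart, v) for v in (x, y, z)])

# Spherical formula: only the radial term survives, grad f = (df/dr) e_r,
# with e_r = (x, y, z)/r in Cartesian components.
rs = sp.symbols('r_s', positive=True)
df_dr = sp.diff(1 / rs, rs).subs(rs, r)     # -1/r**2
grad_sph = df_dr * sp.Matrix([x, y, z]) / r

print(sp.simplify(grad_cart - grad_sph))    # zero vector -> formulas agree
```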

In General Coordinate Systems

In a general coordinate system on a smooth manifold M equipped with a Riemannian metric, the gradient of a smooth function f: M \to \mathbb{R} is defined abstractly as the unique vector field \nabla f on M such that for every smooth vector field X on M, the inner product satisfies \langle \nabla f, X \rangle = df(X), where df denotes the differential of f and \langle \cdot, \cdot \rangle is the metric. This definition assumes M is a smooth manifold and the metric provides a smoothly varying positive definite inner product on each tangent space T_p M, enabling the identification of tangent and cotangent spaces via the musical isomorphism. The differential df is a smooth 1-form, and \nabla f arises as its image under the sharp operator (^\sharp) induced by the metric, which maps covectors to vectors by raising indices.

In local coordinates (x^1, \dots, x^n) on M, where the metric tensor has components g_{ij} (with inverse g^{ij}), the gradient takes the explicit form \nabla f = g^{ij} \frac{\partial f}{\partial x^j} \frac{\partial}{\partial x^i}, with summation over repeated indices i, j = 1, \dots, n. This coordinate expression leverages the metric to contract the covector \frac{\partial f}{\partial x^j} \, dx^j (the local representation of df) against g^{ij} to yield vector components. The assumption here is that f is smooth, ensuring the partial derivatives exist and the expression defines a smooth vector field.

From the perspective of differential forms, the gradient \nabla f corresponds to the 1-form df via the metric's musical isomorphism in the Riemannian setting, which provides a way to associate vector fields to 1-forms without relying on a specific coordinate chart. This view emphasizes the coordinate-free nature of the construction, where the metric bridges the duality between tangent vectors and covectors. As a representative example, in flat \mathbb{R}^n with the standard metric g_{ij} = \delta_{ij} (the Kronecker delta), the inverse metric is g^{ij} = \delta^{ij}, so the general formula reduces to the familiar Cartesian gradient \nabla f = \sum_{i=1}^n \frac{\partial f}{\partial x^i} \frac{\partial}{\partial x^i}.
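Numerically, the coordinate formula is just the inverse metric applied to the vector of partial derivatives. A minimal sketch, assuming polar coordinates on the plane with metric g = diag(1, r^2) and made-up values for the partials of f, illustrates the index raising:

```python
import numpy as np

# Minimal numeric sketch (assumed example): in a 2D chart the gradient
# components are  (grad f)^i = g^{ij} df_j,  i.e. the inverse metric
# applied to the partial derivatives of f.

# Polar-coordinate metric on the plane: g = diag(1, r**2) in (r, theta).
r = 2.0
g = np.diag([1.0, r**2])
g_inv = np.linalg.inv(g)          # g^{ij} = diag(1, 1/r**2)

# Partial derivatives of some f(r, theta) at this point: (df/dr, df/dtheta).
df = np.array([3.0, 5.0])

grad = g_inv @ df                 # components of grad f in the coordinate basis
print(grad)                       # [3.0, 1.25]; theta-component scaled by 1/r**2
```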

Relationships to Derivatives

Connection to Total Derivative

For a scalar-valued function f: \mathbb{R}^n \to \mathbb{R}, the total derivative Df(\mathbf{x}) at a point \mathbf{x} is the linear map from \mathbb{R}^n to \mathbb{R} that approximates the change in f for small displacements \mathbf{h}, given by Df(\mathbf{x})(\mathbf{h}) = \nabla f(\mathbf{x}) \cdot \mathbf{h}. This representation shows that the gradient \nabla f(\mathbf{x}) fully encodes the total derivative as a vector, providing the best linear approximation to the function's variation in any direction. The total differential of f expands this as df = \sum_{i=1}^n \frac{\partial f}{\partial x_i} \, dx_i = \nabla f \cdot d\mathbf{x}, where the dx_i are infinitesimal changes in the coordinates, directly linking the partial derivatives in the gradient to the overall rate of change. This form arises from the definition of differentiability, where f is differentiable at \mathbf{x} if \lim_{\mathbf{h} \to \mathbf{0}} \frac{f(\mathbf{x} + \mathbf{h}) - f(\mathbf{x}) - \nabla f(\mathbf{x}) \cdot \mathbf{h}}{\|\mathbf{h}\|} = 0, with the linear term \nabla f(\mathbf{x}) \cdot \mathbf{h} constituting the total derivative; when the partial derivatives exist and are continuous near \mathbf{x}, this limit condition holds, confirming the gradient's role in the approximation.

A key application is the directional derivative, which measures the instantaneous rate of change of f along a unit vector \mathbf{u}, defined as \nabla f(\mathbf{x}) \cdot \mathbf{u}. This is a special case of the total derivative where \mathbf{h} = t \mathbf{u} for small t, reducing to the scalar projection of the gradient onto the direction \mathbf{u}. The connection extends to the multivariable chain rule: for a differentiable path \mathbf{g}(t): \mathbb{R} \to \mathbb{R}^n, the derivative of the composition f(\mathbf{g}(t)) is \frac{d}{dt} f(\mathbf{g}(t)) = \nabla f(\mathbf{g}(t)) \cdot \mathbf{g}'(t). This follows from applying the total derivative along the curve, where \mathbf{g}'(t) acts as the tangential displacement; a sketch of the proof applies the linear approximation along the path to match the definition of the one-variable derivative.
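The chain-rule identity can be verified numerically; the sketch below (a hypothetical example with f(x, y) = x^2 y and the path \mathbf{g}(t) = (\cos t, \sin t)) compares a finite-difference derivative of the composition with the dot product \nabla f \cdot \mathbf{g}'(t):

```python
import numpy as np

# Quick numeric check (assumed example) of the chain rule
#   d/dt f(g(t)) = grad f(g(t)) . g'(t)
# for f(x, y) = x**2 * y and the path g(t) = (cos t, sin t).

def f(x, y):
    return x**2 * y

def grad_f(x, y):
    return np.array([2 * x * y, x**2])      # analytic gradient

def g(t):
    return np.array([np.cos(t), np.sin(t)])

def g_prime(t):
    return np.array([-np.sin(t), np.cos(t)])

t, h = 0.7, 1e-6
lhs = (f(*g(t + h)) - f(*g(t - h))) / (2 * h)   # numerical d/dt f(g(t))
rhs = grad_f(*g(t)) @ g_prime(t)                 # chain-rule value
print(lhs, rhs)                                  # agree to ~1e-9
```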

Linear Approximations

The gradient of a differentiable scalar-valued function f: \mathbb{R}^n \to \mathbb{R} at a point \mathbf{x} provides the best linear approximation of f near \mathbf{x}. Specifically, f(\mathbf{x} + \mathbf{h}) \approx f(\mathbf{x}) + \nabla f(\mathbf{x}) \cdot \mathbf{h}, with the error satisfying o(\|\mathbf{h}\|) as \mathbf{h} \to \mathbf{0}. This formula arises from the first-order Taylor expansion in multiple variables, where the gradient captures the linear change in f along any direction \mathbf{h}. This approximation is particularly useful for estimating values when exact computation is difficult.

Geometrically, the linear approximation defines the tangent plane to the graph of f at the point (\mathbf{x}, f(\mathbf{x})) in \mathbb{R}^{n+1}. The equation is z = f(\mathbf{x}) + \nabla f(\mathbf{x}) \cdot (\mathbf{u} - \mathbf{x}), providing the closest affine approximation to the graph locally at that point. This extends the one-dimensional tangent line concept to higher dimensions, where the gradient serves as the normal to the level sets but here determines the plane's slope in all directions.

For illustration, consider f(x,y) = \sin x + \cos y near (0,0). Here, f(0,0) = 1 and \nabla f(0,0) = (1, 0), so the linear approximation is L(x,y) = 1 + x. For small increments (h,k), f(h,k) = \sin h + \cos k \approx h + (1 - k^2/2), confirming that the linear term 1 + h captures the dominant first-order behavior while neglecting higher-order contributions like -k^2/2. A higher-order refinement incorporates the Hessian matrix Hf(\mathbf{x}) for the quadratic term, yielding the second-order approximation f(\mathbf{x} + \mathbf{h}) \approx f(\mathbf{x}) + \nabla f(\mathbf{x}) \cdot \mathbf{h} + \frac{1}{2} \mathbf{h}^T Hf(\mathbf{x}) \mathbf{h}.

In optimization, the condition \nabla f(\mathbf{x}) = \mathbf{0} identifies critical points, which correspond to local minima when the function increases in every direction away from \mathbf{x} (for instance, when the Hessian there is positive definite). This underpins methods like gradient descent, where the gradient's direction and magnitude guide iterative improvements toward minima. The total derivative Df(\mathbf{x}) formalizes this as the linear map whose standard matrix is the 1 \times n row vector with entries the partial derivatives of f.
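The worked example above translates directly into code. This sketch (assuming the same function f(x, y) = \sin x + \cos y, with the gradient and Hessian at the origin computed by hand) compares the exact value with the first- and second-order approximations at a small displacement:

```python
import numpy as np

# Hypothetical illustration: compare first- and second-order Taylor
# approximations of f(x, y) = sin(x) + cos(y) around (0, 0).

def f(x, y):
    return np.sin(x) + np.cos(y)

# At (0,0): f = 1, grad f = (cos 0, -sin 0) = (1, 0),
# Hessian = [[-sin 0, 0], [0, -cos 0]] = [[0, 0], [0, -1]].
grad = np.array([1.0, 0.0])
H = np.array([[0.0, 0.0],
              [0.0, -1.0]])

h = np.array([0.1, 0.2])                       # small displacement (h, k)
linear = 1.0 + grad @ h                        # L(h, k) = 1 + h
quadratic = linear + 0.5 * h @ H @ h           # adds the -k**2/2 term

print(f(*h), linear, quadratic)
# exact ~ 1.0799, linear = 1.1, quadratic = 1.08: the quadratic term
# -k**2/2 captures most of the remaining error.
```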

Fréchet Derivative

The Fréchet derivative generalizes the concept of the total derivative to functions between normed vector spaces, particularly Banach spaces, providing a linear approximation that is uniform in all directions. For a function f: X \to Y, where X and Y are Banach spaces and U \subseteq X is an open set containing x, the Fréchet derivative of f at x, denoted Df(x) or T, is a bounded linear operator T: X \to Y such that f(x + h) = f(x) + T(h) + o(\|h\|) as h \to 0, where the little-o notation indicates that \|o(\|h\|)\| / \|h\| \to 0 as \|h\| \to 0. This condition ensures that the linear term T(h) captures the first-order behavior of f uniformly over the space, making it a stronger notion of differentiability than directional variants.

In the specific case of finite-dimensional Euclidean spaces, such as f: \mathbb{R}^n \to \mathbb{R}, the Fréchet derivative aligns directly with the classical gradient. Here, the bounded linear operator T is represented by the inner product T(h) = \nabla f(x) \cdot h, where \nabla f(x) is the gradient vector of f at x. The defining condition then becomes \frac{|f(x + h) - f(x) - \nabla f(x) \cdot h|}{\|h\|} \to 0 as \|h\| \to 0, illustrating how the gradient represents the Fréchet derivative in this setting by providing the best linear approximation to f near x.

An illustrative example arises in function spaces, common in the calculus of variations, where functionals map infinite-dimensional spaces like C[0,1] (continuous functions on [0,1] with the sup norm) to \mathbb{R}. Consider the integral functional \phi(f) = \int_0^1 f(x)^2 \, dx for f \in C[0,1]. The Fréchet derivative at f is the bounded linear functional A(h) = 2 \int_0^1 f(x) h(x) \, dx, satisfying \phi(f + h) = \phi(f) + A(h) + o(\|h\|_\infty) as \|h\|_\infty \to 0. This derivative, often identified through the integral pairing with multiplication by 2f(x), highlights how Fréchet differentiability facilitates optimization in such spaces by linearizing variations around a function.

The Fréchet derivative is distinguished from the weaker Gâteaux derivative, which only requires the existence of directional derivatives along each direction h (i.e., the limit along rays t h as t \to 0) that together form a linear map, but without uniformity over all directions. Fréchet differentiability always implies Gâteaux differentiability (and the two derivatives coincide), and a continuous Gâteaux derivative implies Fréchet differentiability, but Gâteaux differentiability alone does not guarantee the stronger uniform approximation essential for applications in Banach spaces.
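The expansion \phi(f + h) = \phi(f) + A(h) + o(\|h\|_\infty) can be illustrated on a grid. The sketch below (an assumed example with arbitrary choices of the base function f and the perturbation direction, using trapezoidal quadrature) shows the remainder vanishing faster than the size of the perturbation:

```python
import numpy as np

# Sketch (assumed example): check the Fréchet derivative of the functional
#   phi(f) = integral_0^1 f(x)**2 dx,  A(h) = 2 * integral_0^1 f(x) h(x) dx,
# by evaluating both on a grid and shrinking the perturbation eps*h.

xs = np.linspace(0.0, 1.0, 2001)

def integral(v):
    return np.trapz(v, xs)                    # simple quadrature on the grid

f = np.sin(np.pi * xs)                        # base function f
h = np.cos(3 * np.pi * xs)                    # fixed perturbation direction

for eps in (1e-1, 1e-2, 1e-3):
    lhs = integral((f + eps * h)**2) - integral(f**2)   # phi(f + eps*h) - phi(f)
    rhs = 2 * integral(f * (eps * h))                    # A(eps*h)
    print(eps, (lhs - rhs) / eps)             # remainder/eps shrinks like eps: o(||h||)
```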

Properties and Applications

Level Sets

In multivariable calculus, the level set of a scalar function f: \mathbb{R}^n \to \mathbb{R} at a constant value c is defined as the set L_c = \{ \mathbf{x} \in \mathbb{R}^n \mid f(\mathbf{x}) = c \}. Wherever the gradient \nabla f(\mathbf{x}_0) \neq \mathbf{0} at a point \mathbf{x}_0 \in L_c, this gradient vector is perpendicular to the tangent space of the level set at \mathbf{x}_0. To see this, consider a smooth curve \mathbf{r}(t) on the level set L_c passing through \mathbf{x}_0 at t=0, so f(\mathbf{r}(t)) = c for all t near 0. Differentiating with respect to t yields \frac{d}{dt} f(\mathbf{r}(t)) = \nabla f(\mathbf{r}(t)) \cdot \mathbf{r}'(t) = 0, implying that \nabla f(\mathbf{x}_0) is orthogonal to the tangent vector \mathbf{r}'(0). Since this holds for any tangent direction, \nabla f(\mathbf{x}_0) is normal to the entire tangent space of L_c at \mathbf{x}_0.

This perpendicularity has key implications for analysis and geometry. The integral curves of the gradient field, known as gradient flow lines, are everywhere orthogonal to the level sets, providing a natural way to traverse from one level set to another along the direction of maximum change. In implicit differentiation, the gradient enables computation of tangent spaces or normals to surfaces defined implicitly by f(\mathbf{x}) = c, such as in computer graphics or optimization, without explicit parameterization.

A simple example in two dimensions is f(x,y) = x^2 + y^2, whose level sets L_c = \{ (x,y) \mid x^2 + y^2 = c \} for c > 0 are circles centered at the origin. The gradient \nabla f = (2x, 2y) points radially outward, perpendicular to the tangent (circumferential) direction at every point on the circle. In physics, equipotential surfaces, the level sets of the potential V, have the electric field \mathbf{E} = -\nabla V normal to them, explaining why field lines are orthogonal to equipotentials in electrostatics. At points where \nabla f(\mathbf{x}_0) = \mathbf{0}, known as critical points, the level set L_c may develop singularities, such as cusps or isolated points, and need not form a smooth manifold; the perpendicularity property fails there, complicating local analysis.
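The circle example admits a one-line numerical check; in the sketch below (an assumed example, with the point on the level set chosen arbitrarily), the dot product of the gradient with the circle's tangent direction vanishes:

```python
import numpy as np

# Small numeric illustration (assumed example): for f(x, y) = x**2 + y**2,
# the gradient at a point on the level circle x**2 + y**2 = c is orthogonal
# to the circle's tangent direction there.

theta = 1.2
c = 4.0
p = np.sqrt(c) * np.array([np.cos(theta), np.sin(theta)])  # point on the level set

grad = 2 * p                                   # grad f = (2x, 2y)
tangent = np.array([-p[1], p[0]])              # tangent to the circle at p

print(grad @ tangent)                          # 0 (up to rounding): orthogonal
```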

Conservative Vector Fields and Gradient Theorem

A vector field \mathbf{V} defined on a domain in \mathbb{R}^n is called conservative if there exists a scalar potential function f such that \mathbf{V} = \nabla f. In \mathbb{R}^3, for a simply connected domain, a continuously differentiable vector field \mathbf{V} is conservative if and only if its curl is zero, i.e., \nabla \times \mathbf{V} = \mathbf{0}. This irrotational condition ensures that line integrals of \mathbf{V} are path-independent, meaning the integral from point \mathbf{a} to \mathbf{b} yields the same value regardless of the path taken.

The gradient theorem, also known as the fundamental theorem for line integrals, states that if \mathbf{V} = \nabla f for a scalar function f with continuous partial derivatives on a domain, then for any piecewise smooth curve C parameterized by \mathbf{r}(t) from t = a to t = b, the line integral is given by \int_C \mathbf{V} \cdot d\mathbf{r} = f(\mathbf{r}(b)) - f(\mathbf{r}(a)). This result generalizes the one-dimensional fundamental theorem of calculus to higher dimensions. The proof relies on the chain rule and the fundamental theorem of calculus. Consider the composition g(t) = f(\mathbf{r}(t)); then g'(t) = \nabla f(\mathbf{r}(t)) \cdot \mathbf{r}'(t) = \mathbf{V}(\mathbf{r}(t)) \cdot \mathbf{r}'(t). Integrating both sides from a to b yields \int_a^b g'(t) \, dt = \int_a^b \mathbf{V}(\mathbf{r}(t)) \cdot \mathbf{r}'(t) \, dt = g(b) - g(a) = f(\mathbf{r}(b)) - f(\mathbf{r}(a)), which is exactly the line integral along C.

For the potential f to exist, the domain must be simply connected (open, connected, and such that every closed curve can be continuously shrunk to a point), ensuring that \nabla \times \mathbf{V} = \mathbf{0} implies conservativeness. In non-simply connected domains, additional conditions may be needed, but the curl-zero test suffices in simply connected regions.

This theorem has key applications in physics, where conservative fields like gravitational or electrostatic forces allow work done by the field to be computed as a potential difference, independent of path. For instance, the gravitational field \mathbf{F} = - \frac{GM m}{r^2} \hat{r} derives from the scalar function f = \frac{GM m}{r} via \mathbf{F} = \nabla f, so the work done by the field from \mathbf{a} to \mathbf{b} is f(\mathbf{b}) - f(\mathbf{a}); in the usual physics convention this f is the negative of the potential energy. Similarly, in electrostatics, the electric field \mathbf{E} = - \nabla V yields work per unit charge as a voltage difference.
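The theorem is easy to confirm numerically: integrate \mathbf{V} = \nabla f along any discretized path and compare with the difference of endpoint values. The sketch below (a hypothetical example with f(x, y) = x^2 y and an arbitrary wiggly path from (0,0) to (1,1)) does exactly this:

```python
import numpy as np

# Numeric sanity check (assumed example) of the gradient theorem: the line
# integral of V = grad f along any path equals f(end) - f(start).
# Here f(x, y) = x**2 * y, so V(x, y) = (2xy, x**2).

def f(x, y):
    return x**2 * y

def V(x, y):
    return np.array([2 * x * y, x**2])

# A wiggly path r(t) from (0, 0) to (1, 1), discretized for quadrature.
t = np.linspace(0.0, 1.0, 20001)
rx = t
ry = t**2 + 0.3 * np.sin(2 * np.pi * t)
drx, dry = np.gradient(rx, t), np.gradient(ry, t)

Vx, Vy = V(rx, ry)
line_integral = np.trapz(Vx * drx + Vy * dry, t)

print(line_integral, f(1, 1) - f(0, 0))   # both ~ 1.0: path-independent
```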

Direction of Steepest Ascent

The gradient \nabla f(\mathbf{x}) at a point \mathbf{x} in the domain of a differentiable scalar function f points in the direction of steepest ascent of f, meaning it maximizes the directional derivative among all unit vectors. The magnitude |\nabla f(\mathbf{x})| equals the supremum of the directional derivatives \nabla f(\mathbf{x}) \cdot \mathbf{u} over all unit vectors \mathbf{u} with |\mathbf{u}| = 1, and the maximizing direction is given by the unit vector \nabla f(\mathbf{x}) / |\nabla f(\mathbf{x})|. This property arises because the directional derivative \nabla f(\mathbf{x}) \cdot \mathbf{u} represents the rate of change of f along the direction \mathbf{u}, and the maximum occurs when \mathbf{u} aligns with \nabla f(\mathbf{x}).

To see this formally, apply the Cauchy-Schwarz inequality to the inner product: |\nabla f(\mathbf{x}) \cdot \mathbf{u}| \leq |\nabla f(\mathbf{x})| \cdot |\mathbf{u}| = |\nabla f(\mathbf{x})|, since |\mathbf{u}| = 1. Equality holds exactly when \mathbf{u} is parallel to \nabla f(\mathbf{x}), confirming that the gradient direction achieves the supremum and that |\nabla f(\mathbf{x})| is the maximum rate of increase. The direction of steepest descent, which maximizes the rate of decrease, is then -\nabla f(\mathbf{x}) / |\nabla f(\mathbf{x})|. This duality between ascent and descent directions is fundamental in analyzing the local behavior of functions.

In optimization, the steepest ascent property underpins gradient ascent algorithms, where iterates are updated as \mathbf{x}_{k+1} = \mathbf{x}_k + t_k \nabla f(\mathbf{x}_k) for a step size t_k > 0 to maximize concave or nonconvex objectives, such as likelihood functions in statistical models. Similarly, the flow lines of the gradient field (curves \mathbf{r}(t) satisfying \frac{d\mathbf{r}}{dt} = \nabla f(\mathbf{r}(t))) trace paths of steepest ascent, representing trajectories that follow the field's direction at each point. These paths align with the normals to the level sets of f, pointing toward regions of higher function values.

As an illustrative example, consider the function f(x, y) = -x^2 - y^2 in \mathbb{R}^2, which models a downward-opening paraboloid with a global maximum at the origin. At a point (x_0, y_0) away from the origin, \nabla f(x_0, y_0) = (-2x_0, -2y_0), so the unit direction of steepest ascent is (-x_0, -y_0)/\sqrt{x_0^2 + y_0^2}, directing movement inward toward the peak; following this repeatedly simulates hill-climbing to the maximum.
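A few lines of code suffice to simulate this hill-climbing. The sketch below (an assumed example with a fixed step size; in practice t_k is often chosen by a line search) runs gradient ascent on f(x, y) = -x^2 - y^2 from an arbitrary starting point:

```python
import numpy as np

# Minimal gradient-ascent sketch (assumed example) on f(x, y) = -x**2 - y**2,
# whose unique maximum is at the origin. Iterates follow +grad f.

def grad_f(p):
    return -2.0 * p                      # gradient of -x**2 - y**2

p = np.array([3.0, -4.0])                # starting point
step = 0.1                               # fixed step size t_k

for k in range(50):
    p = p + step * grad_f(p)             # x_{k+1} = x_k + t_k * grad f(x_k)

print(p)                                 # ~ (0, 0): converged to the maximum
```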

Generalizations

Jacobian Matrix

The Jacobian matrix generalizes the gradient to functions mapping from \mathbb{R}^n to \mathbb{R}^m, where m > 1. For a differentiable function \mathbf{F}: \mathbb{R}^n \to \mathbb{R}^m with components F_1, \dots, F_m, the Jacobian matrix J_\mathbf{F} at a point \mathbf{x} \in \mathbb{R}^n is the m \times n matrix whose i-th row is the gradient vector \nabla F_i(\mathbf{x}), given by J_\mathbf{F}(\mathbf{x}) = \begin{pmatrix} \frac{\partial F_1}{\partial x_1} & \cdots & \frac{\partial F_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial F_m}{\partial x_1} & \cdots & \frac{\partial F_m}{\partial x_n} \end{pmatrix}. This matrix represents the best linear approximation to \mathbf{F} near \mathbf{x}, capturing how changes in the input variables affect each output component. When m = 1, so \mathbf{F} = f: \mathbb{R}^n \to \mathbb{R} is scalar-valued, the Jacobian matrix reduces to a 1 \times n row vector that is the transpose of the standard column vector \nabla f. In this case, J_f(\mathbf{x}) = (\nabla f(\mathbf{x}))^T, linking the two concepts directly as the Jacobian extends the directional information of the gradient to multiple outputs.

Key properties of the Jacobian include the chain rule for compositions: if \mathbf{F}: \mathbb{R}^m \to \mathbb{R}^p and \mathbf{G}: \mathbb{R}^n \to \mathbb{R}^m are differentiable, then J_{\mathbf{F} \circ \mathbf{G}}(\mathbf{x}) = J_\mathbf{F}(\mathbf{G}(\mathbf{x})) \cdot J_\mathbf{G}(\mathbf{x}). When the Jacobian is square (m = n), its determinant \det J_\mathbf{F}(\mathbf{x}) measures the local scaling of volumes under the transformation \mathbf{F}, with |\det J_\mathbf{F}(\mathbf{x})| giving the factor by which infinitesimal volumes in the input space are multiplied in the output space. If \det J_\mathbf{F}(\mathbf{x}) \neq 0, then \mathbf{F} is locally invertible near \mathbf{x}, establishing it as a local diffeomorphism by the inverse function theorem.

A representative example is the transformation from polar to Cartesian coordinates in \mathbb{R}^2, defined by x = r \cos \theta, y = r \sin \theta. The Jacobian matrix is J = \begin{pmatrix} \cos \theta & -r \sin \theta \\ \sin \theta & r \cos \theta \end{pmatrix}, with determinant \det J = r. This positive value for r > 0 indicates that the transformation stretches areas by a factor of r, explaining the extra factor of r in polar integrals.

Applications of the Jacobian include change of variables in multiple integrals, where for a transformation \mathbf{T}: \mathbb{R}^n \to \mathbb{R}^n, the integral satisfies \int_{\mathbf{T}(D)} f(\mathbf{y}) \, d\mathbf{y} = \int_D f(\mathbf{T}(\mathbf{u})) |\det J_\mathbf{T}(\mathbf{u})| \, d\mathbf{u}. The absolute value of the determinant ensures the integral accounts for orientation-preserving or reversing effects while preserving the total measure. Additionally, the invertibility condition via a nonzero determinant is essential for confirming local diffeomorphisms in analysis and geometry.
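The polar-coordinate example can be reproduced symbolically; a minimal SymPy sketch (assuming the library is available) computes the Jacobian matrix and its determinant:

```python
import sympy as sp

# Symbolic sketch (assumed example): Jacobian of the polar-to-Cartesian map
# x = r*cos(theta), y = r*sin(theta), and its determinant.
r, theta = sp.symbols('r theta', positive=True)
x = r * sp.cos(theta)
y = r * sp.sin(theta)

J = sp.Matrix([x, y]).jacobian([r, theta])
print(J)                      # Matrix([[cos(theta), -r*sin(theta)], [sin(theta), r*cos(theta)]])
print(sp.simplify(J.det()))   # r, the area-scaling factor in polar integrals
```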

Gradient of Vector Fields

The gradient of a vector field \mathbf{V}: \mathbb{R}^3 \to \mathbb{R}^3 is a second-order tensor, represented as a 3 \times 3 matrix whose entries are the partial derivatives of the components of \mathbf{V}. Specifically, the components are given by (\nabla \mathbf{V})_{ij} = \frac{\partial V_i}{\partial x_j}, where the i-th row corresponds to the gradient of the scalar component V_i. In explicit matrix form, \nabla \mathbf{V} = \begin{pmatrix} \frac{\partial V_1}{\partial x_1} & \frac{\partial V_1}{\partial x_2} & \frac{\partial V_1}{\partial x_3} \\ \frac{\partial V_2}{\partial x_1} & \frac{\partial V_2}{\partial x_2} & \frac{\partial V_2}{\partial x_3} \\ \frac{\partial V_3}{\partial x_1} & \frac{\partial V_3}{\partial x_2} & \frac{\partial V_3}{\partial x_3} \end{pmatrix}. This matrix is a special case of the Jacobian matrix for vector-valued functions from \mathbb{R}^3 to \mathbb{R}^3.

The gradient tensor can be decomposed into its symmetric and antisymmetric parts, which capture the deformation and rotation of the field, respectively. The trace of \nabla \mathbf{V} equals the divergence \nabla \cdot \mathbf{V} = \sum_{i=1}^3 \frac{\partial V_i}{\partial x_i}, measuring the net flux out of a volume element. The antisymmetric part relates to the curl \nabla \times \mathbf{V}, where the curl vector is twice the axial vector associated with this antisymmetric tensor.

In fluid dynamics, the gradient of the velocity field \mathbf{u} plays a central role in describing local flow kinematics. A divergence of zero, \nabla \cdot \mathbf{u} = 0, characterizes incompressible flows, where fluid elements neither expand nor contract, simplifying the Navier-Stokes equations. The curl \nabla \times \mathbf{u} defines the vorticity \boldsymbol{\omega}, which quantifies the local rotation or spinning of fluid parcels around an axis. For example, consider a simple shear flow with velocity field \mathbf{u} = (y, 0, 0). The gradient tensor is \nabla \mathbf{u} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, yielding \nabla \cdot \mathbf{u} = 0 (incompressible) and \boldsymbol{\omega} = \nabla \times \mathbf{u} = (0, 0, -1), indicating uniform vorticity in the negative z-direction due to shearing.
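For the shear-flow example, the gradient tensor is constant, so the divergence and vorticity can be read off directly; the sketch below (an illustrative example, with the curl assembled by hand from the tensor's off-diagonal entries) confirms the values quoted above:

```python
import numpy as np

# Sketch (assumed example): build the gradient tensor of the shear flow
# u = (y, 0, 0) analytically, then read off divergence and vorticity.

# (grad u)_{ij} = d u_i / d x_j; only d u_1 / d x_2 = 1 is nonzero.
grad_u = np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])

div_u = np.trace(grad_u)                       # divergence = trace = 0

# Curl components from the antisymmetric entries:
# omega = (du3/dx2 - du2/dx3, du1/dx3 - du3/dx1, du2/dx1 - du1/dx2).
omega = np.array([grad_u[2, 1] - grad_u[1, 2],
                  grad_u[0, 2] - grad_u[2, 0],
                  grad_u[1, 0] - grad_u[0, 1]])

print(div_u, omega)                            # 0.0, [0, 0, -1]
```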

On Riemannian Manifolds

In a Riemannian manifold (M, g), the gradient of a smooth scalar function f: M \to \mathbb{R} is the unique vector field \nabla f satisfying g(\nabla f, X) = df(X) for every smooth vector field X on M, where df is the differential of f. Equivalently, \nabla f is obtained by applying the musical isomorphism induced by the metric g, which raises the index of the covector df, yielding \nabla f = g^{-1}(df). This definition ensures that \nabla f points in the direction of steepest ascent of f with respect to the geometry defined by g.

In local coordinates (x^i) on M, the components of the gradient are given by \nabla f = g^{ij} \frac{\partial f}{\partial x^j} \frac{\partial}{\partial x^i}, where g^{ij} are the entries of the inverse metric tensor and summation over repeated indices is implied. This expression arises directly from contracting the covector components \frac{\partial f}{\partial x^j} with g^{ij}, without involvement of connection terms, since the covariant derivative of a scalar function is simply its differential. The squared norm of the gradient is then |\nabla f|^2 = g(\nabla f, \nabla f) = g^{ij} \frac{\partial f}{\partial x^i} \frac{\partial f}{\partial x^j}, which quantifies the maximum rate of change of f at each point. The integral curves of \nabla f, known as gradient flow lines, satisfy the ordinary differential equation \frac{d\gamma}{dt} = \nabla f(\gamma(t)); f increases monotonically along them, and they cross each level set of f orthogonally.

A classic example occurs on the unit sphere S^2 \subset \mathbb{R}^3 endowed with the Riemannian metric induced from the Euclidean inner product. For the height function f(p) = z, where p = (x, y, z) \in S^2 and z is the third coordinate, the gradient at p is the orthogonal projection of the ambient gradient (0, 0, 1) onto the tangent plane T_p S^2, given explicitly by \nabla f(p) = (0, 0, 1) - z p = (-xz, -yz, 1 - z^2). This vanishes at the poles (0, 0, \pm 1), the critical points of f, and points along the meridians elsewhere, directing the gradient flow toward the north pole.

When the metric is flat, such as in Cartesian coordinates where the metric is \delta_{ij} and the Christoffel symbols \Gamma^k_{ij} = 0, the expression simplifies to the classical gradient \nabla f = \sum_i \frac{\partial f}{\partial x^i} \frac{\partial}{\partial x^i}, recovering the familiar directional derivative structure. This flat limit highlights how the Riemannian gradient generalizes the Euclidean case to account for intrinsic geometry via the metric.
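The sphere example reduces to a Euclidean projection, which is easy to verify numerically; the sketch below (with an arbitrarily chosen point p on S^2) computes \nabla f(p) = (0, 0, 1) - z p and checks that it is tangent to the sphere:

```python
import numpy as np

# Numeric sketch (assumed example): Riemannian gradient of the height
# function f(p) = z on the unit sphere, as the tangential projection of the
# ambient gradient (0, 0, 1). Check that the result is tangent to the sphere.

p = np.array([0.6, 0.0, 0.8])                # point on S^2 (|p| = 1)
ambient = np.array([0.0, 0.0, 1.0])          # Euclidean gradient of f(x, y, z) = z

grad_sphere = ambient - (ambient @ p) * p    # = (-xz, -yz, 1 - z**2)
print(grad_sphere)                           # [-0.48, 0.0, 0.36]
print(grad_sphere @ p)                       # 0: tangent to the sphere at p
```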
