Partial derivative
In multivariable calculus, a partial derivative measures the rate of change of a function of multiple variables with respect to one specific variable, while treating all other variables as constants.[1] For a function f(x, y), the partial derivative with respect to x at a point (a, b), denoted \frac{\partial f}{\partial x}(a, b), represents how f changes as x varies near a, with y fixed at b.[2] This concept generalizes the single-variable derivative and is essential for analyzing functions in higher dimensions, such as those arising in physics, economics, and engineering.[3]

The formal definition of the partial derivative \frac{\partial f}{\partial x} at (a, b) is the limit \frac{\partial f}{\partial x}(a, b) = \lim_{h \to 0} \frac{f(a + h, b) - f(a, b)}{h}, provided the limit exists; a similar limit defines the partial derivative with respect to y.[2] Computationally, it involves differentiating f as if the other variables were constants, using standard rules such as the product rule or chain rule.[4] Geometrically, partial derivatives correspond to the slopes of tangent lines to the function's graph along axis-parallel directions, aiding in approximations via tangent planes.[5]

Partial derivatives underpin key applications, including linear approximations of multivariable functions, identification of local extrema through critical points where all first partials vanish, and the formation of the gradient vector, which points in the direction of steepest ascent.[1] Higher-order partial derivatives, such as \frac{\partial^2 f}{\partial x \partial y}, describe curvature and concavity; under continuity assumptions, mixed partials are equal by Clairaut's theorem, enabling the Hessian matrix for second-order optimization tests.[6] In fields such as thermodynamics and fluid dynamics, partial derivatives model rates such as heat flow or pressure changes while isolating specific influences.[7] The notation \partial originated in the mid-18th century, with early developments traced to mathematicians such as Leonhard Euler and Alexis Clairaut in the context of solving problems in mechanics and geometry.[8]
Fundamentals
Definition
In multivariable calculus, the partial derivative measures the rate of change of a function with respect to one of its variables while treating all other variables as constants. This concept extends the familiar derivative from single-variable functions to functions of multiple variables, allowing analysis of how the function varies along specific directions in the domain.[9][2]

Consider a function f: \mathbb{R}^n \to \mathbb{R} defined on an open subset of \mathbb{R}^n. The partial derivative of f with respect to the i-th variable x_i at a point \mathbf{a} = (a_1, \dots, a_n) is given by the limit \frac{\partial f}{\partial x_i}(\mathbf{a}) = \lim_{h \to 0} \frac{f(\mathbf{a} + h \mathbf{e}_i) - f(\mathbf{a})}{h}, provided the limit exists, where \mathbf{e}_i is the i-th standard basis vector in \mathbb{R}^n, with 1 in the i-th position and 0 elsewhere. This definition assumes familiarity with the concept of limits and the ordinary derivative from single-variable calculus.[2][10]

This formulation generalizes the single-variable derivative, where for a function g: \mathbb{R} \to \mathbb{R}, the derivative g'(a) = \lim_{h \to 0} \frac{g(a + h) - g(a)}{h} captures the instantaneous rate of change at a. In the multivariable case, the partial derivative isolates the contribution of one input variable by fixing the others, effectively reducing the problem to a one-dimensional derivative along the corresponding coordinate axis.[9][10]

Geometrically, the partial derivative \frac{\partial f}{\partial x_i}(\mathbf{a}) is the slope of the tangent line to the curve obtained by intersecting the graph of f with the hyperplane on which all variables except x_i are fixed at their values in \mathbf{a}. This tangent line lies within the tangent hyperplane to the graph at (\mathbf{a}, f(\mathbf{a})), providing insight into the function's local behavior along the i-th coordinate direction.[10][2]
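The limit definition translates directly into a finite-difference approximation. The following Python sketch is a minimal illustration, not drawn from the cited sources; the function f and the point (a, b) are arbitrary choices made for demonstration:

```python
# Numerical sketch of the limit definition of a partial derivative.
# The function f and the evaluation point are illustrative, not from the text.

def f(x, y):
    return x**2 * y + y**3  # an arbitrary smooth example function

def partial_x(f, a, b, h=1e-6):
    # Forward difference (f(a + h, b) - f(a, b)) / h, mirroring the
    # limit definition with a small but finite h.
    return (f(a + h, b) - f(a, b)) / h

def partial_y(f, a, b, h=1e-6):
    return (f(a, b + h) - f(a, b)) / h

# At (a, b) = (2, 3) the exact values are df/dx = 2ab = 12 and
# df/dy = a**2 + 3*b**2 = 31.
print(partial_x(f, 2.0, 3.0))  # approximately 12.0
print(partial_y(f, 2.0, 3.0))  # approximately 31.0
```

Shrinking h improves the approximation up to the point where floating-point cancellation dominates, which is why the limit itself, not any finite quotient, defines the derivative.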
Notation
The notation for partial derivatives draws from established conventions in calculus, primarily adapting the Leibniz notation for ordinary derivatives. The most widely used form is \frac{\partial f}{\partial x}, where f is a function of multiple variables and the partial derivative is taken with respect to x while treating other variables as constants. This notation emphasizes the fractional form reminiscent of total derivatives but uses the distinctive \partial symbol to signify the partial nature of the operation.[11]

Alternative notations include the subscript form f_x, commonly employed for functions of several variables to denote the partial derivative with respect to x.[12] Another variant is the operator notation D_x f, which treats the partial derivative as an application of the operator D_x to the function f.[12] These forms are particularly useful in contexts requiring brevity, such as in proofs or when composing multiple derivatives.

The \partial symbol was first used in 1770 by the Marquis de Condorcet in his "Mémoire sur les équations aux différences partielles" to denote partial differences. Adrien-Marie Legendre introduced the modern notation \frac{\partial u}{\partial x} in 1786 in his "Mémoire sur la manière de distinguer les maxima des minima dans le calcul des variations," though he later abandoned it. The notation was revived and popularized by Carl Gustav Jacob Jacobi in 1841, becoming a standard in multivariable calculus.[13]

For functions of many variables, indexed notation aids clarity, such as \frac{\partial}{\partial x_i} for the partial derivative with respect to the i-th variable x_i.[14] In tensor calculus and related fields, the comma notation f_{,i} is conventional for the partial derivative \frac{\partial f}{\partial x^i}, often appearing in index notation for efficiency in expressions involving multiple indices.[15]

A key distinction exists between \partial and the ordinary differential symbol d: the latter denotes total derivatives, applicable to functions of a single variable or to total differentials in which all variables are allowed to vary, whereas \partial specifically indicates differentiation with respect to one variable while the others are held fixed. Thus d is appropriate when there are no other independent variables to hold constant, as in single-variable calculus, while \partial is essential in multivariable settings to avoid ambiguity.[11]
Computation and Examples
Basic Computation
To compute a partial derivative, treat all variables other than the one of interest as constants and apply the standard rules of differentiation from single-variable calculus.[2][16]

Consider the function f(x,y) = x^2 y + \sin(y). To find \partial f / \partial x, differentiate with respect to x while holding y constant: the term x^2 y yields 2xy by the power rule, and \sin(y) is constant with respect to x, so its derivative is zero. Thus \partial f / \partial x = 2xy.[2][17] For \partial f / \partial y, differentiate with respect to y while holding x constant: the term x^2 y yields x^2 by the power rule, and \sin(y) yields \cos(y) by the trigonometric derivative rule. Thus \partial f / \partial y = x^2 + \cos(y).[2][17]

Now consider a function of three variables, f(x,y,z) = x y z. To compute \partial f / \partial x, treat y and z as constants: this yields y z. Similarly, \partial f / \partial y = x z and \partial f / \partial z = x y.[18][19]

Partial derivatives can be evaluated at specific points by substituting the coordinates into the resulting expression. For the function f(x,y) = x^2 y + \sin(y), at the point (1,0), \partial f / \partial x = 2(1)(0) = 0.[2][16]
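These computations can be checked symbolically. The following Python sketch, assuming the SymPy library is available, reproduces the worked examples above:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 * y + sp.sin(y)

print(sp.diff(f, x))   # 2*x*y: sin(y) is constant with respect to x
print(sp.diff(f, y))   # x**2 + cos(y)

# Evaluate a partial derivative at the point (1, 0)
print(sp.diff(f, x).subs({x: 1, y: 0}))   # 0

# Three variables: each partial treats the other two as constants
g = x * y * z
print(sp.diff(g, x), sp.diff(g, y), sp.diff(g, z))   # y*z  x*z  x*y
```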
Higher-Order Partial Derivatives
Higher-order partial derivatives arise when partial derivatives of a multivariable function are themselves differentiated with respect to one or more variables, extending the process beyond the first order. For a function f of two variables x and y, the second-order partial derivatives include the pure second partials \frac{\partial^2 f}{\partial x^2} and \frac{\partial^2 f}{\partial y^2}, as well as the mixed partials such as \frac{\partial^2 f}{\partial x \partial y}, obtained by differentiating first with respect to one variable and then the other.[20] These derivatives measure rates of change of the first-order partials, providing information about curvature and higher-level behavior of the function.[20]

To illustrate the computation, consider the function f(x,y) = x^3 y^2. The first partial derivative with respect to x is \frac{\partial f}{\partial x} = 3x^2 y^2. Differentiating this with respect to y yields the mixed second partial \frac{\partial^2 f}{\partial y \partial x} = 6x^2 y. Alternatively, starting with \frac{\partial f}{\partial y} = 2x^3 y and differentiating with respect to x gives \frac{\partial^2 f}{\partial x \partial y} = 6x^2 y, demonstrating that the order of differentiation does not matter when the relevant partial derivatives are continuous.[20][21]

For higher orders, the notation generalizes accordingly: the n-th-order pure partial with respect to a single variable x_i is denoted \frac{\partial^n f}{\partial x_i^n}, while mixed higher-order partials, such as a third-order one involving two differentiations with respect to x and one with respect to y, can be written as \frac{\partial^3 f}{\partial x^2 \partial y} or, in subscript notation, f_{xxy}.[20]

The second-order partials are often arranged into the Hessian matrix, a square matrix whose entries are the second partial derivatives H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}.[22]
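The same symbolic approach extends to higher orders. The sketch below, again assuming SymPy, computes the mixed partials of f(x,y) = x^3 y^2 in both orders and assembles the Hessian matrix:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 * y**2

fx = sp.diff(f, x)                  # 3*x**2*y**2
f_xy = sp.diff(fx, y)               # 6*x**2*y, differentiating in x then y
f_yx = sp.diff(sp.diff(f, y), x)    # 6*x**2*y, the other order
print(f_xy == f_yx)                 # True: the mixed partials agree here

# Hessian matrix of second partials, H_ij = d^2 f / (dx_i dx_j)
print(sp.hessian(f, (x, y)))
# Matrix([[6*x*y**2, 6*x**2*y], [6*x**2*y, 2*x**3]])
```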
Related Concepts
Total Derivative
In multivariable calculus, the total derivative of a scalar-valued function f: \mathbb{R}^n \to \mathbb{R} at a point a \in \mathbb{R}^n is the best linear approximation to the change in f near a, represented by a linear map Df(a): \mathbb{R}^n \to \mathbb{R}.[23] Specifically, f is differentiable at a if there exists a linear map such that \lim_{\mathbf{h} \to \mathbf{0}} \frac{|f(a + \mathbf{h}) - f(a) - Df(a)(\mathbf{h})|}{\|\mathbf{h}\|} = 0, where Df(a)(\mathbf{h}) captures the first-order variation in all directions.[24] For functions with continuous partial derivatives, this linear map is given by the dot product of the gradient vector \nabla f(a) with the increment vector \mathbf{h}, so Df(a)(\mathbf{h}) = \nabla f(a) \cdot \mathbf{h} = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(a) h_i.[25]

The total differential df expresses this approximation as df = \sum_{i=1}^n \frac{\partial f}{\partial x_i} dx_i, where each dx_i represents an infinitesimal change in the corresponding input variable.[26] Unlike a single partial derivative, which holds all other variables constant and measures change along one axis, the total derivative accounts for simultaneous variations in all variables, providing the full linear response of f to a multivariable increment.[23]

For vector-valued functions f: \mathbb{R}^n \to \mathbb{R}^m, the total derivative generalizes to the Jacobian matrix Df(a), an m \times n matrix whose entries are the partial derivatives \frac{\partial f_j}{\partial x_i}(a); in the scalar case (m = 1), it reduces to the row vector of partials.[25]

The total derivative plays a central role in the chain rule for composite functions: if f is differentiable at a and g: \mathbb{R}^m \to \mathbb{R} is differentiable at f(a), then D(g \circ f)(a) = Dg(f(a)) \circ Df(a), or in matrix form, the Jacobian of the composition is the product of the individual Jacobians.[26] This extends the single-variable chain rule to multivariable settings, enabling the computation of derivatives along paths or through function compositions.[25]
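The chain rule in Jacobian form can be verified symbolically. The sketch below, assuming SymPy and using illustrative maps F: \mathbb{R}^2 \to \mathbb{R}^2 and G: \mathbb{R}^2 \to \mathbb{R} chosen for demonstration, compares the Jacobian of the composition with the product of the individual Jacobians:

```python
import sympy as sp

x, y, u, v = sp.symbols('x y u v')

F = sp.Matrix([x**2 + y, x*y])      # illustrative map R^2 -> R^2
G = sp.Matrix([sp.sin(u) + v])      # illustrative map R^2 -> R

Jf = F.jacobian([x, y])             # 2x2 matrix of partials dF_j/dx_i
Jg = G.jacobian([u, v])             # 1x2 matrix of partials

# Jacobian of G after F, computed directly from the composed expression ...
composition = G.subs({u: F[0], v: F[1]})
J_direct = composition.jacobian([x, y])

# ... and via the chain rule D(G after F) = DG(F(a)) * DF(a)
J_chain = Jg.subs({u: F[0], v: F[1]}) * Jf

print(sp.simplify(J_direct - J_chain))   # Matrix([[0, 0]])
```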
Gradient
The gradient of a scalar-valued function f: \mathbb{R}^n \to \mathbb{R}, denoted \nabla f, is the vector whose components are the partial derivatives of f with respect to each variable: \nabla f(\mathbf{x}) = \left( \frac{\partial f}{\partial x_1}(\mathbf{x}), \frac{\partial f}{\partial x_2}(\mathbf{x}), \dots, \frac{\partial f}{\partial x_n}(\mathbf{x}) \right). This vector points in the direction of the greatest rate of increase of f at the point \mathbf{x}, and it is defined wherever the partial derivatives exist.[27][28][29]

Key properties of the gradient include its magnitude \|\nabla f(\mathbf{x})\|, which equals the rate of steepest ascent of f at \mathbf{x}, and the fact that \nabla f(\mathbf{x}) is orthogonal to the level surface of f passing through \mathbf{x}.[30][31] These properties arise because the directional derivative of f in the direction of a unit vector \mathbf{u} is maximized when \mathbf{u} aligns with \nabla f, and because the level sets satisfy \nabla f \cdot d\mathbf{r} = 0 for tangent vectors d\mathbf{r}.[32][33]

For example, consider f(x, y) = x^2 + y^2. The gradient is \nabla f(x, y) = (2x, 2y), which at (1, 1) gives (2, 2) with magnitude \sqrt{8} \approx 2.828, indicating the rate of steepest ascent there.[30][31]

The gradient connects to the total derivative of f at a point \mathbf{a} via the relation Df(\mathbf{a})(\mathbf{h}) = \nabla f(\mathbf{a}) \cdot \mathbf{h}, where Df(\mathbf{a}) is the linear approximation and \mathbf{h} is a direction vector; this expresses the total derivative as a dot product with the gradient.[28][27]
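A short computation reproduces the example. The following sketch, assuming SymPy, evaluates the gradient of f(x, y) = x^2 + y^2 at (1, 1) and checks its orthogonality to the level curve through that point:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + y**2

grad = sp.Matrix([sp.diff(f, x), sp.diff(f, y)])   # (2x, 2y)
g_at = grad.subs({x: 1, y: 1})                     # (2, 2)

print(g_at.T)          # Matrix([[2, 2]])
print(g_at.norm())     # 2*sqrt(2), i.e. sqrt(8) ~ 2.828

# Orthogonality to the level set: the level curve x**2 + y**2 = 2 through
# (1, 1) has tangent direction (-1, 1), and the dot product vanishes.
tangent = sp.Matrix([-1, 1])
print(g_at.dot(tangent))   # 0
```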
Directional Derivative
The directional derivative of a differentiable scalar-valued function f: \mathbb{R}^n \to \mathbb{R} at a point a \in \mathbb{R}^n in the direction of a unit vector u \in \mathbb{R}^n with \|u\| = 1 is given by the dot product D_u f(a) = \nabla f(a) \cdot u, where \nabla f(a) is the gradient vector of f at a.[34] This measures the instantaneous rate of change of f along the line passing through a in the direction specified by u.[35]

When the direction u aligns with one of the standard basis vectors e_i (the i-th unit vector along the coordinate axes), the directional derivative reduces to the corresponding partial derivative: D_{e_i} f(a) = \frac{\partial f}{\partial x_i}(a).[36] Thus, partial derivatives are special cases of directional derivatives restricted to axis-aligned directions, while the general form extends this concept to arbitrary directions in the domain.[37]

For example, consider the function f(x, y) = xy evaluated at the point (1, 1) in the direction of the unit vector u = \left( \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right). The gradient is \nabla f(x, y) = (y, x), so at (1, 1), \nabla f(1, 1) = (1, 1). The directional derivative is then D_u f(1, 1) = (1, 1) \cdot \left( \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right) = \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}} = \sqrt{2}.[34]

When extended to arbitrary (not necessarily unit) direction vectors, the directional derivative of a differentiable function is linear in the direction: D_{c u + v} f(a) = c\, D_u f(a) + D_v f(a) for scalars c and vectors u, v, since each side is a dot product with the fixed gradient \nabla f(a).[33] Over unit directions, the maximum value of D_u f(a) is \|\nabla f(a)\|, achieved when u is parallel to the gradient vector \nabla f(a); conversely, it is zero when u is perpendicular to \nabla f(a).[34]
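The worked example translates directly into code. The sketch below, assuming NumPy, evaluates D_u f(1, 1) for f(x, y) = xy and confirms that it attains the maximum value \|\nabla f(1, 1)\| because u is parallel to the gradient there:

```python
import numpy as np

def grad_f(x, y):
    return np.array([y, x])          # gradient of f(x, y) = x*y

a = np.array([1.0, 1.0])
u = np.array([1.0, 1.0]) / np.sqrt(2.0)   # unit direction vector

D_u = grad_f(*a) @ u                 # dot product grad f(a) . u
print(D_u)                           # 1.4142... ~ sqrt(2)

# The maximum over all unit directions is the gradient's magnitude,
# attained here because u is parallel to grad f(1, 1) = (1, 1):
print(np.linalg.norm(grad_f(*a)))    # 1.4142... ~ sqrt(2)
```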
Properties
Symmetry of Mixed Partials
In multivariable calculus, Clairaut's theorem asserts that if a function f(x, y) of two variables has continuous partial derivatives f_x, f_y, f_{xy}, and f_{yx} in a neighborhood of a point (a, b), then the mixed second partial derivatives are equal at that point: f_{xy}(a, b) = f_{yx}(a, b).[38] This result, named after the French mathematician Alexis Clairaut, who first stated and sketched a proof of it in 1740, establishes a key symmetry property for sufficiently smooth functions.[38]

A standard proof begins with the second-order increment. Consider the difference f(a + h, b + k) - f(a + h, b) - f(a, b + k) + f(a, b). By the mean value theorem applied to the function g(t) = f(t, b + k) - f(t, b) on [a, a + h], there exists \xi between a and a + h such that g(a + h) - g(a) = h g'(\xi) = h \left[ f_x(\xi, b + k) - f_x(\xi, b) \right]. Applying the mean value theorem again, to t \mapsto f_x(\xi, t) on [b, b + k], there exists \eta between b and b + k such that this equals h k\, f_{xy}(\xi, \eta). Repeating the argument with the roles of the two variables exchanged expresses the same difference as h k\, f_{yx}(\xi', \eta') for some \xi', \eta'. Dividing by h k and letting h, k \to 0, continuity of the mixed partials ensures that both limits equal the same value, so f_{xy}(a, b) = f_{yx}(a, b).[38]

Without the continuity assumption, the mixed partials may differ, as shown by the counterexample f(x, y) = \frac{xy(x^2 - y^2)}{x^2 + y^2} for (x, y) \neq (0, 0) and f(0, 0) = 0. The first partials f_x(0, 0) = 0 and f_y(0, 0) = 0 exist, but f_{xy}(0, 0) = -1 while f_{yx}(0, 0) = 1.[39]
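The failure of symmetry at the origin can be observed numerically. The following Python sketch approximates both mixed partials of the counterexample with nested central differences; as an implementation detail of this illustration, the inner step is taken much smaller than the outer step so that the inner derivative is effectively resolved first:

```python
# Numerical check that the two mixed partials of the counterexample
# disagree at the origin: f_xy(0,0) ~ -1 while f_yx(0,0) ~ +1.

def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x**2 - y**2) / (x**2 + y**2)

def f_x(x, y, h=1e-7):
    return (f(x + h, y) - f(x - h, y)) / (2 * h)  # central difference in x

def f_y(x, y, h=1e-7):
    return (f(x, y + h) - f(x, y - h)) / (2 * h)  # central difference in y

k = 1e-3  # outer step, deliberately much larger than the inner step h
f_xy = (f_x(0.0, k) - f_x(0.0, -k)) / (2 * k)  # d/dy of f_x at (0, 0)
f_yx = (f_y(k, 0.0) - f_y(-k, 0.0)) / (2 * k)  # d/dx of f_y at (0, 0)
print(f_xy, f_yx)  # approximately -1.0 and 1.0
```

This matches the exact values f_x(0, y) = -y and f_y(x, 0) = x, whose derivatives at the origin are -1 and +1 respectively.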