In mathematics, particularly in calculus, the derivative of a function measures the instantaneous rate of change of the function with respect to one of its variables, equivalent to the slope of the tangent line to the function's graph at a given point.[1] Formally, for a differentiable function f, the derivative f'(x) at a point x is defined as the limit
f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h},
provided this limit exists, representing the sensitivity of the function's output to infinitesimal changes in its input.[2] This concept underpins much of differential calculus and extends to higher-order derivatives, such as the second derivative f''(x), which describes concavity or acceleration-like behavior.[3]

The development of the derivative traces back to the late 17th century, when Isaac Newton and Gottfried Wilhelm Leibniz independently formulated the foundations of calculus, introducing notation like \frac{dy}{dx} for the derivative and applying it to problems in physics and geometry.[4] Earlier precursors, including work by Archimedes on tangents and the Persian scholar Sharaf al-Dīn al-Ṭūsī on cubic polynomials in the 12th century, anticipated aspects of instantaneous rates of change, but Newton and Leibniz's systematic framework revolutionized mathematics.[5] Augustin-Louis Cauchy later provided a more precise limit-based definition in the 19th century, solidifying the derivative's role in analysis.[6]

Derivatives have broad applications across science, engineering, and economics, enabling the modeling of dynamic systems and optimization.[7] In physics, the first derivative of position with respect to time yields velocity, while the second derivative gives acceleration, fundamental to kinematics and Newtonian mechanics.[8] In economics, derivatives quantify marginal quantities, such as the rate of change in cost or revenue functions, aiding decision-making in production and pricing.[9] They also support techniques such as related rates for solving problems involving varying quantities and Newton's method for numerically finding roots of equations.[10]
Definition
As a limit
In calculus, the derivative of a function f at a point a in its domain is formally defined as the limit

f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h},

provided this limit exists.[2] This definition applies to functions f defined on an open interval containing a, where the limit represents the instantaneous rate of change of f at a.[11]

The expression \frac{f(a + h) - f(a)}{h} is known as the difference quotient, which measures the average rate of change of f over the small interval [a, a + h]. Geometrically, this quotient equals the slope of the secant line connecting the points (a, f(a)) and (a + h, f(a + h)) on the graph of f. As h approaches 0, the secant line approaches the tangent line to the curve at (a, f(a)), so the derivative f'(a) gives the slope of this tangent line.[12][13]

For the limit to exist at an interior point a, the two one-sided limits must agree: the right-hand derivative \lim_{h \to 0^+} \frac{f(a + h) - f(a)}{h} and the left-hand derivative \lim_{h \to 0^-} \frac{f(a + h) - f(a)}{h} must both exist and be equal. At an endpoint of the domain, such as the left endpoint of a closed interval, differentiability is defined using the appropriate one-sided limit, typically the right-hand derivative.[14]

Consider the example of f(x) = x^2 at a = 1. The derivative is

f'(1) = \lim_{h \to 0} \frac{(1 + h)^2 - 1^2}{h}.

Expanding the numerator gives (1 + h)^2 - 1 = 1 + 2h + h^2 - 1 = 2h + h^2. Then

f'(1) = \lim_{h \to 0} \frac{2h + h^2}{h} = \lim_{h \to 0} \frac{h(2 + h)}{h} = \lim_{h \to 0} (2 + h) = 2,

since h \neq 0 in the intermediate step, and the limit exists.[15]
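The convergence of the difference quotient can be observed numerically. The following short Python sketch, added here as an illustration (not drawn from the cited sources), evaluates the quotient for f(x) = x^2 at a = 1 for shrinking values of h:

```python
def f(x):
    return x ** 2

a = 1.0
for h in (0.1, 0.01, 0.001, 1e-6):
    quotient = (f(a + h) - f(a)) / h   # slope of the secant line over [a, a + h]
    print(h, quotient)                  # 2.1, 2.01, 2.001, ... -> approaches f'(1) = 2
```

As h shrinks, the secant slopes approach the tangent slope 2, matching the limit computed above.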
Using infinitesimals
Gottfried Wilhelm Leibniz introduced the concept of infinitesimals in the late 17th century as a foundational tool for his development of calculus, viewing them as quantities smaller than any finite number but nonzero, which allowed for the representation of instantaneous rates of change in continuous motion.[16] In his 1684 paper "Nova Methodus pro Maximis et Minimis," Leibniz employed infinitesimals to derive tangents and areas, treating differentials like dx as infinitesimal increments that model the "momentary endeavor" of force and variation in functions.[17] Although criticized for lacking rigor, this approach provided an intuitive framework for calculus that influenced early practitioners until the 19th-century shift to limit-based definitions resolved foundational issues.[16]

An intuitive definition of the derivative using infinitesimals expresses it as the ratio of infinitesimal changes: for a function f, the derivative at x is f'(x) = \frac{f(x + dx) - f(x)}{dx}, where dx is an infinitesimal quantity treated as nonzero in computations yet negligible compared with any finite quantity.[18] This heuristic avoids explicit limits by directly manipulating small increments, aligning with Leibniz's original vision of calculus as a method for handling continuous change through such "fictions."[16]

The modern rigorous revival of infinitesimals occurred in the 1960s through Abraham Robinson's non-standard analysis, which constructs the hyperreal numbers {}^*\mathbb{R} as an extension of the reals incorporating genuine infinitesimals and infinite numbers via ultrapowers of real sequences. The hyperreals form a totally ordered field in which the finite elements are those bounded by standard reals, and the infinitesimals are the nonzero hyperreals smaller in absolute value than any positive real.[19] Central to this framework is the transfer principle, which states that a first-order sentence holds in the reals if and only if its non-standard counterpart holds in the hyperreals, enabling the rigorous importation of standard theorems into the extended system.[19]

This approach offers advantages in intuition by permitting direct computations with infinitesimals, bypassing the complexities of epsilon-delta limits while preserving logical equivalence to standard analysis, thus aiding pedagogical clarity and simplifying proofs of calculus rules.[19] For example, the derivative of \sin(x) at a real number x can be found by considering an infinitesimal \epsilon \in {}^*\mathbb{R} with \epsilon \approx 0 but \epsilon \neq 0:

\frac{\sin(x + \epsilon) - \sin(x)}{\epsilon} = \frac{\sin(x)\cos(\epsilon) + \cos(x)\sin(\epsilon) - \sin(x)}{\epsilon} = \sin(x) \cdot \frac{\cos(\epsilon) - 1}{\epsilon} + \cos(x) \cdot \frac{\sin(\epsilon)}{\epsilon}.

By the transfer principle, \cos(\epsilon) \approx 1 and \sin(\epsilon) \approx \epsilon, so the first term is infinitesimally close to zero and the second is infinitesimally close to \cos(x); taking the standard part yields \sin'(x) = \cos(x).[20]
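The decomposition above can be mimicked numerically by using a very small (but nonzero) increment as a stand-in for the infinitesimal \epsilon. A minimal Python sketch, added here purely for illustration:

```python
import math

x = 0.7
for eps in (1e-2, 1e-4, 1e-6):
    term1 = math.sin(x) * (math.cos(eps) - 1) / eps   # mirrors sin(x)*(cos(eps)-1)/eps, nearly 0
    term2 = math.cos(x) * math.sin(eps) / eps         # mirrors cos(x)*sin(eps)/eps, nearly cos(x)
    print(eps, term1, term2, term1 + term2)           # the sum tends to cos(0.7) ≈ 0.764842
```

The first term shrinks toward zero while the second settles at \cos(0.7), echoing the standard-part argument above.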
Notation and Representation
Standard notations
In mathematical analysis, the derivative of a function is expressed using several standard notations, each suited to different contexts in calculus and its applications. The two primary notations are the Lagrange notation and the Leibniz notation, which are widely adopted in textbooks and research for denoting the rate of change of a function.[21]

The Lagrange notation, introduced by Joseph-Louis Lagrange, denotes the first derivative of a function f at a point x as f'(x), where the prime symbol indicates differentiation with respect to the independent variable.[22] For higher-order derivatives, this extends to multiple primes, such as f''(x) for the second derivative, or more generally f^{(n)}(x) for the nth derivative, providing a compact way to represent successive differentiations.[21] This notation is particularly convenient for single-variable functions, as it treats the derivative as an operation on the function itself without explicitly referencing the variable of differentiation.[23]

In contrast, the Leibniz notation, developed by Gottfried Wilhelm Leibniz, expresses the derivative of y = f(x) as \frac{dy}{dx} or \frac{df}{dx}, emphasizing the ratio of infinitesimal changes in the dependent and independent variables.[22] For higher orders, it uses \frac{d^n y}{dx^n} or \frac{d^n f}{dx^n}.[21] This form is especially useful in contexts like related rates problems, where rates of change with respect to different variables (such as time) must be related, as it naturally accommodates implicit differentiation and chain rule applications.[24]

For derivatives with respect to time, particularly in physics and engineering, Newton's dot notation is standard, denoting \dot{f}(t) = \frac{df}{dt} for the first derivative and \ddot{f}(t) for the second, among higher orders.[21] In multivariable calculus, partial derivatives are conventionally written as \frac{\partial f}{\partial x}, indicating differentiation with respect to one variable while holding the others constant.[25]

The choice of notation often depends on the problem: Lagrange notation excels for abstract function analysis in single-variable calculus, while Leibniz notation facilitates problems involving interrelated variables, such as in differential equations or optimization.[22][24]
Historical notations
The development of notation for the derivative began in the late 17th century with Isaac Newton's introduction of fluxion notation, in which a dot over the variable, such as \dot{x}, denotes the rate of change or "fluxion" of a quantity x. Newton conceived this notation around 1666 during his early work on what he called the "method of fluxions," though it was not published until 1693 in his work on quadratures and only fully in 1736.[26] This notation emphasized the temporal or geometric flow of quantities, aligning with Newton's physical and geometric perspective on calculus, but it gradually gave way to more algebraic and analytic approaches.

Independently, Gottfried Wilhelm Leibniz developed a differential notation in 1675, using symbols like \frac{dy}{dx} or d/dx to represent the derivative as a ratio of infinitesimals, which profoundly influenced the analytic framework of calculus.[27] Although conceived in a 1675 manuscript, this notation first appeared in print in Leibniz's 1684 paper "Nova methodus pro maximis et minimis" in Acta Eruditorum, where the lowercase d signified an infinitesimal difference. Leibniz's system, with its operator-like d/dx, facilitated manipulations such as the chain rule and became the dominant notation because of its clarity in expressing differentials and its adaptability to integration and series expansions.[17]

In the 18th century, Leonhard Euler employed variations, including an increment-based approach with small quantities in expressions for differences, building on Newtonian and Leibnizian ideas within his analytic works, alongside the use of the prime symbol f'(x) and the operator Df(x) for the derivative. These notations appeared in Euler's texts such as his 1755 Institutiones calculi differentialis and highlighted infinitesimal methods, though they were eventually refined for greater conciseness in higher-order derivatives compared with emerging functional notations.[27]

A significant shift came with Joseph-Louis Lagrange's introduction of the prime notation f' in his 1797 treatise Théorie des fonctions analytiques, where he treated the derivative as a "derived function" to emphasize the calculus of functions without relying on infinitesimals or limits.[27] This notation, which used successive primes for higher derivatives like f'', gained adoption in mathematical analysis for its simplicity and direct association with functions, influencing modern textbooks and theoretical work. These historical notations evolved into the standard modern forms, Leibniz's \frac{dy}{dx} and Lagrange's f', which remain prevalent today.
Differentiability
Conditions for differentiability
A function f: D \to \mathbb{R}, where D \subseteq \mathbb{R} is an interval, is differentiable at a point c \in D if the limit

\lim_{h \to 0} \frac{f(c + h) - f(c)}{h}

exists and is finite; this limit is denoted f'(c).[28] This condition is equivalent to the difference quotient approaching the same value along every sequence (h_n) in \mathbb{R} with h_n \neq 0 and h_n \to 0.[29]

A function is differentiable on an interval I if it is differentiable at every point of I, with the understanding that for interior points this requires the two-sided limit to exist, while at endpoints of a closed interval one-sided limits may be used if specified.[28]

Derivatives possess the intermediate value property even when they are discontinuous: if f is differentiable on an interval I and f'(a) < \lambda < f'(b) for a, b \in I with a < b, then there exists c \in (a, b) such that f'(c) = \lambda. This result, known as Darboux's theorem, follows from the mean value theorem applied to auxiliary functions.[30]

Sufficient conditions for differentiability include membership in the class C^1(I), meaning f is differentiable on I with continuous derivative f'; such functions are in particular differentiable at every point of I.[28] A weaker condition is the Lipschitz condition: if |f(x) - f(y)| \leq K |x - y| for some constant K > 0 and all x, y \in I, then f is differentiable almost everywhere on I with respect to Lebesgue measure, by Rademacher's theorem.[31]

An example of a function differentiable everywhere on \mathbb{R} but with a discontinuous derivative is

f(x) =
\begin{cases}
x^2 \sin(1/x) & \text{if } x \neq 0, \\
0 & \text{if } x = 0.
\end{cases}

Here, f'(x) = 2x \sin(1/x) - \cos(1/x) for x \neq 0 and f'(0) = 0, but \lim_{x \to 0} f'(x) does not exist because of the oscillation of the -\cos(1/x) term.[28]
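A quick numerical look at f'(x) = 2x \sin(1/x) - \cos(1/x) for small x, added here as an illustration, shows why the limit fails:

```python
import numpy as np

def f_prime(x):
    # derivative of x^2 * sin(1/x) away from the origin
    return 2 * x * np.sin(1 / x) - np.cos(1 / x)

for x in (1e-2, 1e-3, 1e-4, 1e-5):
    print(x, f_prime(x))   # the -cos(1/x) part keeps swinging within [-1, 1]; no limit as x -> 0
```

The 2x \sin(1/x) term vanishes as x \to 0, but the -\cos(1/x) term keeps oscillating between -1 and 1, so the printed values never settle.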
A fundamental result in calculus establishes that differentiability at a point implies continuity at that same point. Specifically, if a function f is differentiable at a point a in its domain, then f is continuous at a.[32]

To prove this theorem, write the increment of f as the difference quotient times h: for h \neq 0,

f(a + h) - f(a) = \frac{f(a + h) - f(a)}{h} \cdot h.

Since the difference quotient tends to f'(a) and h tends to 0, the product of the limits gives

\lim_{h \to 0} [f(a + h) - f(a)] = f'(a) \cdot 0 = 0.

This shows that \lim_{h \to 0} f(a + h) = f(a), which is precisely the definition of continuity at a.[32]

The converse of this theorem does not hold: a function can be continuous at a point without being differentiable there. For example, the absolute value function f(x) = |x| is continuous at x = 0 because \lim_{x \to 0} |x| = 0 = f(0), but it is not differentiable at x = 0, since the left-hand derivative is -1 and the right-hand derivative is 1, so the two-sided limit does not exist.[33]

This one-way implication has significant consequences in analysis: all differentiable functions are continuous, but continuity alone does not guarantee differentiability, highlighting that differentiability is the stricter condition. It plays a crucial role in theorems such as the Mean Value Theorem, which requires a function to be continuous on a closed interval [a, b] and differentiable on the open interval (a, b); the implication ensures the continuity condition is automatically satisfied at the interior points where differentiability holds.[34] Moreover, differentiability is a strictly local property: it only requires the existence of the derivative (and thus continuity) at the specific point a, without implications for behavior elsewhere in the domain.[35]
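The failure of differentiability of |x| at 0 mentioned above can be observed directly from one-sided difference quotients; the following short sketch is an added illustration:

```python
f = abs
a = 0.0
for h in (1e-3, -1e-3, 1e-6, -1e-6):
    print(h, (f(a + h) - f(a)) / h)   # +1.0 for h > 0, -1.0 for h < 0: the one-sided limits disagree
```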
Computation of Derivatives
Derivatives of basic functions
The derivative of a constant function f(x) = c, where c is a constant, is zero. This follows from the limit definition of the derivative:

f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{c - c}{h} = \lim_{h \to 0} \frac{0}{h} = 0.

This constant rule holds for any real constant c.[36]

The power rule states that for a function f(x) = x^n, where n is a positive integer, the derivative is f'(x) = n x^{n-1}. To derive this using the limit definition, substitute into the definition:

f'(x) = \lim_{h \to 0} \frac{(x + h)^n - x^n}{h}.

Expand (x + h)^n using the binomial theorem:

(x + h)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} h^k = x^n + n x^{n-1} h + \frac{n(n-1)}{2} x^{n-2} h^2 + \cdots + h^n.

Subtract x^n and divide by h:

\frac{(x + h)^n - x^n}{h} = n x^{n-1} + \frac{n(n-1)}{2} x^{n-2} h + \cdots + h^{n-1}.

As h \to 0, all terms containing h vanish, yielding n x^{n-1}. The rule extends to rational exponents via roots and powers, and to real exponents through limits and continuity arguments, maintaining the form f'(x) = n x^{n-1}.[36]

The derivative of the sine function is \frac{d}{dx} \sin x = \cos x. Using the limit definition:

(\sin x)' = \lim_{h \to 0} \frac{\sin(x + h) - \sin x}{h}.

Apply the angle addition formula \sin(x + h) = \sin x \cos h + \cos x \sin h and substitute:

\frac{\sin x \cos h + \cos x \sin h - \sin x}{h} = \sin x \cdot \frac{\cos h - 1}{h} + \cos x \cdot \frac{\sin h}{h}.

As h \to 0, \lim_{h \to 0} \frac{\cos h - 1}{h} = 0 and \lim_{h \to 0} \frac{\sin h}{h} = 1, so the limit simplifies to \cos x \cdot 1 = \cos x. Similarly, \frac{d}{dx} \cos x = -\sin x, derived analogously using \cos(x + h) = \cos x \cos h - \sin x \sin h, which yields \lim_{h \to 0} \frac{\cos(x + h) - \cos x}{h} = -\sin x. Both derivations rely on the standard limits \lim_{h \to 0} \frac{\sin h}{h} = 1 and \lim_{h \to 0} \frac{\cos h - 1}{h} = 0.[37]

The derivative of the exponential function f(x) = e^x is f'(x) = e^x. From the limit definition:

(e^x)' = \lim_{h \to 0} \frac{e^{x + h} - e^x}{h} = e^x \lim_{h \to 0} \frac{e^h - 1}{h}.

The key limit \lim_{h \to 0} \frac{e^h - 1}{h} = 1, which characterizes the base e, confirms the result. This property distinguishes the natural exponential from exponentials with other bases.[38]

The derivative of the natural logarithm f(x) = \ln x for x > 0 is f'(x) = \frac{1}{x}. Using the limit definition:

(\ln x)' = \lim_{h \to 0^+} \frac{\ln(x + h) - \ln x}{h} = \lim_{h \to 0^+} \frac{\ln\left(1 + \frac{h}{x}\right)}{h} = \frac{1}{x} \lim_{h \to 0^+} \frac{\ln\left(1 + \frac{h}{x}\right)}{\frac{h}{x}}.

Let k = \frac{h}{x}, so as h \to 0^+, k \to 0^+, and the limit becomes \frac{1}{x} \lim_{k \to 0^+} \frac{\ln(1 + k)}{k} = \frac{1}{x} \cdot 1, since \lim_{k \to 0} \frac{\ln(1 + k)}{k} = 1 follows from the definition of the derivative of \ln at 1 or the series expansion of \ln(1 + k).[38]
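These basic formulas can also be confirmed symbolically. A minimal sketch using the SymPy library (assuming SymPy is installed; the specific functions differentiated below are chosen for illustration):

```python
import sympy as sp

x = sp.symbols('x', positive=True)   # positive=True keeps log(x) real-valued

for f in (sp.Integer(7), x**5, sp.sin(x), sp.cos(x), sp.exp(x), sp.log(x)):
    print(f, '->', sp.diff(f, x))
# expected output: 0, 5*x**4, cos(x), -sin(x), exp(x), 1/x
```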
Derivatives of combined functions
In calculus, derivatives of combined functions are computed using specific rules that extend the differentiation of basic functions to products, quotients, compositions, and other forms. These rules, developed in the late 17th century, enable the analysis of more complex expressions without expanding them fully, preserving efficiency in calculations.

The product rule applies to the derivative of a product of two differentiable functions u(x) and v(x). It states that

(u v)'(x) = u'(x) v(x) + u(x) v'(x),

where the prime denotes differentiation with respect to x. This formula, first articulated by Gottfried Wilhelm Leibniz in his 1684 paper Nova Methodus pro Maximis et Minimis, accounts for the rate of change of each factor while holding the other constant.

The quotient rule handles the derivative of a quotient of two differentiable functions u(x) and v(x), where v(x) \neq 0. It is given by

\left( \frac{u}{v} \right)'(x) = \frac{u'(x) v(x) - u(x) v'(x)}{[v(x)]^2}.

Originating in the foundational work of Leibniz and Johann Bernoulli on infinitesimal calculus, this rule can be derived by applying the product rule to u(x) \cdot [v(x)]^{-1}.[39]

For compositions of functions, the chain rule computes the derivative of f(g(x)), where f and g are differentiable. The rule states

(f \circ g)'(x) = f'(g(x)) \cdot g'(x).

Leibniz introduced an early form of this rule in a 1676 memoir. A proof sketch proceeds from the definition: the derivative is

\lim_{h \to 0} \frac{f(g(x + h)) - f(g(x))}{h} = \lim_{h \to 0} \left[ \frac{f(g(x + h)) - f(g(x))}{g(x + h) - g(x)} \cdot \frac{g(x + h) - g(x)}{h} \right].

Letting k = g(x + h) - g(x), as h \to 0 we have k \to 0 by the continuity of g (which follows from its differentiability), so the limit becomes f'(g(x)) \cdot g'(x); a fully rigorous proof must also handle the case in which g(x + h) - g(x) vanishes for arbitrarily small h.[40][41]

Implicit differentiation finds \frac{dy}{dx} when y is defined implicitly by an equation F(x, y) = 0, assuming y is a differentiable function of x. Differentiating both sides with respect to x yields

\frac{dy}{dx} = -\frac{\frac{\partial F}{\partial x}}{\frac{\partial F}{\partial y}},

provided \frac{\partial F}{\partial y} \neq 0. This technique, rooted in Leibniz's notation for relating differentials, treats y as a function of x and applies the chain rule to terms involving y.

Logarithmic differentiation simplifies derivatives of products, quotients, or powers by taking the natural logarithm. For a function y = u(x)^{v(x)} or a product y = u(x) v(x), compute \ln y = v(x) \ln u(x) or \ln y = \ln u(x) + \ln v(x), differentiate implicitly to obtain \frac{1}{y} y' equal to the derivative of the right-hand side, and then multiply by y. This method leverages the chain rule and the properties of logarithms, and is particularly useful for expressions with variable exponents or multiple factors.[40]
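Each rule can be checked symbolically on concrete functions; the sketch below is an added illustration using SymPy, with u = x^3 and v = \sin x chosen arbitrarily:

```python
import sympy as sp

x = sp.symbols('x')
u, v = x**3, sp.sin(x)

# product rule: (uv)' = u'v + uv'
print(sp.simplify(sp.diff(u * v, x) - (sp.diff(u, x) * v + u * sp.diff(v, x))))          # 0

# quotient rule: (u/v)' = (u'v - uv') / v^2
print(sp.simplify(sp.diff(u / v, x) - (sp.diff(u, x) * v - u * sp.diff(v, x)) / v**2))   # 0

# chain rule with f = sin and g(x) = x^3: (f∘g)'(x) = cos(x^3) * 3x^2
print(sp.simplify(sp.diff(sp.sin(x**3), x) - sp.cos(x**3) * sp.diff(x**3, x)))           # 0
```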
Computation examples
To illustrate the practical computation of derivatives, the following examples apply the product rule, quotient rule, and chain rule to specific functions, followed by a related rates application involving the volume of a sphere. These computations demonstrate step-by-step differentiation and simplification where appropriate.

Consider the function y = x^3 \sin(x). To find \frac{dy}{dx}, apply the product rule, which states that if y = u(x) v(x), then \frac{dy}{dx} = u'(x) v(x) + u(x) v'(x), with u(x) = x^3 and v(x) = \sin(x). The derivative of u(x) is u'(x) = 3x^2 by the power rule, and the derivative of v(x) is v'(x) = \cos(x). Substituting yields:

\frac{dy}{dx} = 3x^2 \sin(x) + x^3 \cos(x).

This can be factored as x^2 (3 \sin(x) + x \cos(x)) for simplification.[42]

Next, differentiate y = \frac{x^2 + 1}{x - 1} using the quotient rule: if y = \frac{u(x)}{v(x)}, then \frac{dy}{dx} = \frac{u'(x) v(x) - u(x) v'(x)}{[v(x)]^2}, with u(x) = x^2 + 1 and v(x) = x - 1. Here, u'(x) = 2x and v'(x) = 1. Substituting gives:

\frac{dy}{dx} = \frac{2x (x - 1) - (x^2 + 1) \cdot 1}{(x - 1)^2} = \frac{2x^2 - 2x - x^2 - 1}{(x - 1)^2} = \frac{x^2 - 2x - 1}{(x - 1)^2}.

This simplified form highlights the algebraic reduction after applying the rule.[43]

For the chain rule, consider y = \sin(x^2). Identify the outer function as f(u) = \sin(u) with inner function u = x^2. The chain rule states \frac{dy}{dx} = f'(u) \cdot \frac{du}{dx}, so f'(u) = \cos(u) = \cos(x^2) and \frac{du}{dx} = 2x. Thus:

\frac{dy}{dx} = \cos(x^2) \cdot 2x = 2x \cos(x^2).

This example underscores the need to differentiate the inner function separately.[44]

In related rates problems, derivatives relate rates of change over time. For an inflating spherical balloon with volume V = \frac{4}{3} \pi r^3, where r is the radius, differentiate implicitly with respect to time t: \frac{dV}{dt} = 4 \pi r^2 \frac{dr}{dt}. Suppose the volume increases at \frac{dV}{dt} = 100 \pi cubic units per second when r = 2 units. Then

100 \pi = 4 \pi (2)^2 \frac{dr}{dt} \implies 100 \pi = 16 \pi \frac{dr}{dt} \implies \frac{dr}{dt} = \frac{100}{16} = 6.25 \text{ units per second}.

This computes the radius growth rate from the known volume rate.[45]

Derivatives can be verified numerically by approximating the slope via finite differences, such as f'(x) \approx \frac{f(x + h) - f(x)}{h} for small h, or by plotting the function and its derivative to check tangency. For instance, for y = x^3 \sin(x) at x = \frac{\pi}{2}, the exact derivative is 3(\pi/2)^2 \approx 7.40 (the x^3 \cos x term vanishes there), while a numerical approximation with h = 0.001 yields about 7.40, confirming closeness; plotting shows the derivative curve matching the function's slope visually.[46]
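The finite-difference check just described can be carried out directly; the following sketch, added for illustration, compares the product-rule result for y = x^3 \sin x with a forward difference at x = \pi/2:

```python
import numpy as np

f = lambda x: x**3 * np.sin(x)
f_prime = lambda x: 3 * x**2 * np.sin(x) + x**3 * np.cos(x)   # product-rule derivative

x0 = np.pi / 2
h = 0.001
finite_diff = (f(x0 + h) - f(x0)) / h
print(f_prime(x0))    # exact value 3*(pi/2)^2 ≈ 7.4022
print(finite_diff)    # ≈ 7.405, agreeing to roughly three significant figures
```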
Antiderivatives
Definition of antiderivative
In calculus, an antiderivative of a function f, denoted F, is a differentiable function such that its derivative equals f, that is, F'(x) = f(x) for all x in the domain of f.[47] This relationship positions antidifferentiation as the inverse operation to differentiation.

The general form of an antiderivative incorporates an arbitrary constant C, yielding F(x) = \int f(x) \, dx + C, where the indefinite integral notation \int f(x) \, dx represents the family of all such antiderivatives.[48] This notation emphasizes that antiderivatives are unique only up to an additive constant; if F and G are two antiderivatives of f on an interval, then F(x) - G(x) = C for some constant C.[49]

For basic power functions, the antiderivative of f(x) = x^n with n \neq -1 is given by

\int x^n \, dx = \frac{x^{n+1}}{n+1} + C.[50]

Differentiating this antiderivative returns the original function x^n, illustrating how differentiation reverses antidifferentiation while discarding the constant.[51]
The Fundamental Theorem of Calculus (FTC) establishes the profound connection between differentiation and definite integration, demonstrating that these two core operations in calculus are inverses under appropriate conditions. It consists of two parts that together justify the use of antiderivatives to evaluate definite integrals and reveal the derivative of an accumulation integral.[52][53]

The first part states that if f is continuous on the closed interval [a, b], then the function defined by

F(x) = \int_a^x f(t) \, dt

is differentiable on (a, b) (and continuous on [a, b]) with derivative F'(x) = f(x) for all x \in (a, b).[52][53] This result shows that the definite integral from a fixed lower limit to a variable upper limit yields an antiderivative of f, interpreting integration as the accumulation of the rate of change given by f.[54]

A standard proof sketch for the first part relies on the Mean Value Theorem for Integrals. Consider the difference quotient for F'(x):

F'(x) = \lim_{h \to 0} \frac{F(x+h) - F(x)}{h} = \lim_{h \to 0} \frac{1}{h} \int_x^{x+h} f(t) \, dt.

Since f is continuous on [x, x+h], the Mean Value Theorem for Integrals guarantees a point c_h \in [x, x+h] such that \int_x^{x+h} f(t) \, dt = f(c_h) \cdot h, so the quotient simplifies to f(c_h). As h \to 0, continuity of f implies c_h \to x and thus f(c_h) \to f(x), yielding F'(x) = f(x).[52][53]

The second part, known as the evaluation theorem, states that if f is continuous on [a, b] and F is any antiderivative of f (so F'(x) = f(x) on [a, b]), then

\int_a^b f(x) \, dx = F(b) - F(a).

This allows definite integrals, representing net accumulation over [a, b], to be computed directly from the values of an antiderivative at the endpoints, bypassing explicit summation or approximation.[55][54]

The continuity assumption on f ensures Riemann integrability over [a, b] and the existence of the derivative F'(x) = f(x) for all x \in (a, b). Under weaker conditions, such as f being Riemann integrable on [a, b] (for example, bounded with discontinuities on a set of measure zero), one obtains versions in which F is differentiable almost everywhere with F'(x) = f(x) almost everywhere on (a, b).[52][53] The FTC's implications extend to numerical methods, where it underpins algorithms for approximating integrals by estimating antiderivatives, and it conceptually frames derivatives as instantaneous rates within the total change captured by integrals.[53][54]
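The evaluation theorem can be illustrated numerically by comparing a quadrature estimate of a definite integral with the difference of antiderivative values at the endpoints. The sketch below is an added example using SciPy's quad routine with f = \cos and F = \sin chosen for illustration:

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: np.cos(x)    # continuous integrand
F = lambda x: np.sin(x)    # an antiderivative of f

a, b = 0.0, 2.0
numeric, _ = quad(f, a, b)     # numerical approximation of the definite integral
evaluation = F(b) - F(a)       # FTC part two: value from the antiderivative
print(numeric, evaluation)     # both ≈ 0.909297
```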
Higher-Order Derivatives
Second and higher derivatives
The second derivative of a function f, denoted f''(x), is obtained by differentiating the first derivative f'(x) with respect to x, providing insight into the function's curvature or concavity. For a twice-differentiable function, if f''(x) > 0, the graph is concave up at x (resembling a U-shape), indicating that the slope is increasing; conversely, if f''(x) < 0, it is concave down (resembling an inverted U), indicating a decreasing slope. This interpretation extends the first derivative's role as the slope to the second derivative's role as the rate of change of the slope, a concept formalized in classical calculus texts.

Higher-order derivatives generalize this process: the n-th derivative f^{(n)}(x) is the result of differentiating f n times successively, capturing increasingly refined aspects of the function's behavior, such as jerk (the third derivative) in kinematics or higher-order terms in approximation theory. These derivatives exist when the function is sufficiently smooth, typically in the class C^n of n-times continuously differentiable functions. In physics, for position s(t), the first derivative is velocity v(t) = s'(t), the second is acceleration a(t) = s''(t), and higher ones describe changes in acceleration, essential for modeling oscillatory or projectile motion. Applications include identifying inflection points, where f''(x) changes sign, marking transitions between concave up and concave down.

A key application of higher derivatives is in local approximations via Taylor's theorem, which expands a function around a point a as

f(x) \approx f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n + R_n(x),

where R_n(x) is the remainder term, allowing precise estimation of function values near a using derivative information up to order n. This theorem, attributed to Brook Taylor in 1715, underpins series expansions and numerical methods in analysis.

For example, consider f(x) = x^4. The first derivative is f'(x) = 4x^3, the second is f''(x) = 12x^2 (always non-negative, so the graph is concave up everywhere), and the third is f'''(x) = 24x, which changes sign at x = 0.
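Taylor's theorem can be illustrated with a short script showing how the polynomial approximation improves with the order n; the example below is an added illustration using e^x, all of whose derivatives at 0 equal 1:

```python
import math

def taylor_exp(x, n):
    # degree-n Taylor polynomial of exp about a = 0: sum of x^k / k! for k = 0..n
    return sum(x**k / math.factorial(k) for k in range(n + 1))

x = 0.5
for n in (1, 2, 4, 8):
    print(n, taylor_exp(x, n), math.exp(x))   # the approximation error shrinks rapidly with n
```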
Notation for higher derivatives
For the second derivative of a function f(x), the prime notation extends the first derivative symbol by adding a second prime, denoted f''(x). This convention, part of Lagrange's notation, indicates successive differentiation with respect to the independent variable x. Similarly, the third derivative is written f'''(x), with an additional prime for each higher order.[21][56]

To denote the nth derivative in a more general form, the prime notation uses a superscript in parentheses, f^{(n)}(x). This avoids excessive primes for large n and clearly specifies the order of differentiation. An alternative in some contexts is the operator notation D^n f(x), where D represents the differentiation operator applied n times.[21][57]

In Leibniz's notation, higher derivatives generalize the first derivative form \frac{dy}{dx} to \frac{d^n y}{dx^n} for the nth order, reflecting n successive applications of the operator \frac{d}{dx}. This is particularly convenient when the function is expressed as y = f(x), as in \frac{d^2 y}{dx^2} for the second derivative. Evaluation of higher derivatives at a specific point a follows standard function notation, such as f''(a) or f^{(n)}(a).[21][56]

In the context of differential equations, especially those involving time as the independent variable, Newton's dot notation is commonly employed for higher orders: the first derivative is \dot{y}, the second is \ddot{y}, and higher orders use additional dots, such as \dddot{y} for the third. This contrasts with the prime notation y'', which is standard in ordinary differential equations for second derivatives when the independent variable is not time or is left unspecified.[22][58]

For functions of several variables, higher-order partial derivatives use analogous notations with partial symbols, such as \frac{\partial^n f}{\partial x^n} or f_{xx\dots x} (with n subscripts), distinguishing them from total derivatives; these are addressed in multivariable contexts.[25]
Derivatives in Several Variables
Partial derivatives
In multivariable calculus, the partial derivative of a function f: \mathbb{R}^n \to \mathbb{R} with respect to one of its variables, say x_i, at a point (a_1, \dots, a_n) is defined as the limit

\frac{\partial f}{\partial x_i}(a_1, \dots, a_n) = \lim_{h \to 0} \frac{f(a_1, \dots, a_i + h, \dots, a_n) - f(a_1, \dots, a_n)}{h},

provided the limit exists; this measures the rate of change of f in the direction of the x_i-axis while holding all other variables constant.[59] The definition extends the single-variable derivative by fixing the other inputs, applying the ordinary limit-based derivative to a univariate slice of the function.[60]

Common notation for the partial derivative of f with respect to x in a function of two variables f(x, y) includes the Leibniz symbol \frac{\partial f}{\partial x} and the subscript form f_x; subscripts are extended for higher orders, such as f_{xy} for a second-order mixed partial.[61] To compute a partial derivative, treat all variables except the one of interest as constants and apply the rules of single-variable differentiation, such as the power rule or chain rule.[62]

For example, consider f(x, y) = x^2 y + \sin y. The partial derivative with respect to x is \frac{\partial f}{\partial x} = 2 x y, obtained by differentiating x^2 y as if y were constant and treating \sin y as a constant, while the partial derivative with respect to y is \frac{\partial f}{\partial y} = x^2 + \cos y, obtained by differentiating x^2 y with x^2 held constant and \sin y directly.[59]

Higher-order partial derivatives are obtained by successive partial differentiation; for a second-order mixed partial such as \frac{\partial^2 f}{\partial x \partial y}, first compute \frac{\partial f}{\partial y} and then take its partial derivative with respect to x.[60] Under suitable conditions, Clairaut's theorem states that if the mixed partial derivatives \frac{\partial^2 f}{\partial x \partial y} and \frac{\partial^2 f}{\partial y \partial x} both exist and are continuous in a neighborhood of a point, then they are equal at that point.[63] This equality holds for the smooth functions typically encountered in applications.[64]
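The worked example and Clairaut's theorem can both be checked symbolically; the brief SymPy sketch below is an added illustration:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + sp.sin(y)

print(sp.diff(f, x))   # 2*x*y
print(sp.diff(f, y))   # x**2 + cos(y)

# Clairaut's theorem: the mixed partials of this smooth function coincide
print(sp.simplify(sp.diff(f, x, y) - sp.diff(f, y, x)))   # 0
```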
Directional derivatives
The directional derivative of a scalar-valued function f: \mathbb{R}^n \to \mathbb{R} at a point \mathbf{a} in the direction of a unit vector \mathbf{u} is defined as

D_{\mathbf{u}} f(\mathbf{a}) = \lim_{h \to 0} \frac{f(\mathbf{a} + h \mathbf{u}) - f(\mathbf{a})}{h},

provided the limit exists.[65] This measures the instantaneous rate of change of f at \mathbf{a} as one moves along the line through \mathbf{a} in the direction \mathbf{u}.[66]

Geometrically, the directional derivative D_{\mathbf{u}} f(\mathbf{a}) represents the slope of the tangent line to the curve obtained by restricting f to the line passing through \mathbf{a} in the direction \mathbf{u}.[67] Partial derivatives are special cases of directional derivatives, corresponding to directions along the coordinate axes.[68]

If f is differentiable at \mathbf{a}, then the directional derivative exists in every direction \mathbf{u} and equals the dot product of the gradient vector \nabla f(\mathbf{a}) with \mathbf{u}.[65] However, the existence of partial derivatives at \mathbf{a} does not ensure that directional derivatives exist in all directions; full differentiability of f at \mathbf{a} is required for directional derivatives to exist universally.[69]

For instance, consider f(x,y) = xy at the point (1,1) in the direction \mathbf{u} = \left( \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right). Substituting into the definition yields

D_{\mathbf{u}} f(1,1) = \lim_{h \to 0} \frac{\left(1 + \frac{h}{\sqrt{2}}\right)^2 - 1}{h} = \lim_{h \to 0} \frac{1}{h}\left( \frac{2h}{\sqrt{2}} + \frac{h^2}{2} \right) = \sqrt{2}.

This value indicates the rate of change of f along the line y = x at (1,1).[65]
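The example can be reproduced numerically by comparing the difference quotient along \mathbf{u} with the gradient formula; the small sketch below is an added illustration:

```python
import numpy as np

f = lambda p: p[0] * p[1]                    # f(x, y) = xy
grad_f = lambda p: np.array([p[1], p[0]])    # gradient (y, x)

a = np.array([1.0, 1.0])
u = np.array([1.0, 1.0]) / np.sqrt(2.0)      # unit direction along y = x

h = 1e-6
limit_form = (f(a + h * u) - f(a)) / h       # difference quotient from the definition
dot_form = grad_f(a) @ u                     # ∇f(a) · u
print(limit_form, dot_form, np.sqrt(2.0))    # all ≈ 1.414214
```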
Total derivative
The total derivative provides the best linear approximation to a multivariable function near a given point, capturing how the function changes when all input variables vary simultaneously. For a function f: \mathbb{R}^n \to \mathbb{R}^m defined on an open set containing a \in \mathbb{R}^n, the total derivative at a is the linear transformation Df(a): \mathbb{R}^n \to \mathbb{R}^m satisfying

Df(a)(h) = \lim_{t \to 0} \frac{f(a + t h) - f(a)}{t}

for all h \in \mathbb{R}^n, provided the limit exists uniformly in h on bounded sets.[70] Equivalently, f is differentiable at a if there exists such a linear map with

\lim_{h \to 0} \frac{\|f(a + h) - f(a) - Df(a)(h)\|}{\|h\|} = 0,

where the norms are the Euclidean norms on \mathbb{R}^n and \mathbb{R}^m.[71] This linear map is unique if it exists, and its existence implies that f is continuous at a.[71]

In coordinates, for a scalar-valued function f: \mathbb{R}^n \to \mathbb{R}, the total derivative manifests as the total differential

df(a) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(a) \, dx_i,

where the dx_i represent infinitesimal changes in the variables x_i and the partial derivatives \partial f / \partial x_i are evaluated at a. The total derivative exists at a if all partial derivatives exist in some neighborhood of a and are continuous at a, in which case Df(a) is the linear map whose components are these partials.[71] For vector-valued functions, the definition extends componentwise, with each component of Df(a)(h) following the scalar case.

As an illustrative example, consider f: \mathbb{R}^2 \to \mathbb{R} with f(x, y) = x^2 + y^2. The total derivative at (1, 1) is the linear map Df(1,1): \mathbb{R}^2 \to \mathbb{R} given by Df(1,1)(h, k) = 2h + 2k, since the partials are \partial f / \partial x = 2x and \partial f / \partial y = 2y, yielding the total differential df = 2x \, dx + 2y \, dy at (1,1).[70] This approximates the change: f(1 + h, 1 + k) \approx f(1,1) + 2h + 2k = 2 + 2h + 2k for small h, k. The directional derivative of f at a in the direction of a unit vector u is then simply Df(a)(u), extracting a scalar projection of the total change.

The total derivative in \mathbb{R}^n is the finite-dimensional instance of the Fréchet derivative, which generalizes differentiability to mappings between normed vector spaces via the same limit condition with general norms.[71]
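The approximation property of the total derivative is easy to observe numerically; the sketch below, an added illustration for f(x, y) = x^2 + y^2 at (1, 1), compares the exact change with the linear prediction:

```python
f = lambda x, y: x**2 + y**2

def linear_approx(h, k):
    # f(1,1) + Df(1,1)(h, k), where Df(1,1)(h, k) = 2h + 2k
    return f(1.0, 1.0) + 2.0 * h + 2.0 * k

h, k = 0.01, -0.02
exact = f(1.0 + h, 1.0 + k)
approx = linear_approx(h, k)
print(exact, approx, abs(exact - approx))   # error 5e-4 = h^2 + k^2, quadratically small
```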
Jacobian matrix
The Jacobian matrix of a differentiable function f: \mathbb{R}^n \to \mathbb{R}^m at a point a \in \mathbb{R}^n is the m \times n matrix whose entries are the partial derivatives of the component functions of f, specifically J_f(a)_{ij} = \frac{\partial f_i}{\partial x_j}(a) for i = 1, \dots, m and j = 1, \dots, n.[72][73] This matrix provides the concrete matrix representation of the total derivative of f at a, which is the linear map Df(a): \mathbb{R}^n \to \mathbb{R}^m given by matrix-vector multiplication with J_f(a).[74]

When m = 1, so that f: \mathbb{R}^n \to \mathbb{R} is a scalar-valued function, the Jacobian matrix J_f(a) reduces to the 1 \times n row vector of partial derivatives \left( \frac{\partial f}{\partial x_1}(a), \dots, \frac{\partial f}{\partial x_n}(a) \right), which is precisely the gradient vector \nabla f(a).[72][73]

A key property of the Jacobian matrix is its behavior under composition of functions. If g: \mathbb{R}^k \to \mathbb{R}^n and f: \mathbb{R}^n \to \mathbb{R}^m are differentiable at points b \in \mathbb{R}^k and a = g(b) \in \mathbb{R}^n, respectively, then the chain rule states that the Jacobian of the composition f \circ g at b is the matrix product J_{f \circ g}(b) = J_f(a) \, J_g(b).[74][73]

For example, consider the function f: \mathbb{R}^2 \to \mathbb{R}^2 defined by f(x, y) = (xy, x + y). The Jacobian matrix at a point (x, y) is

J_f(x, y) = \begin{pmatrix} y & x \\ 1 & 1 \end{pmatrix}.

This follows directly from computing the partial derivatives of each component.[72]

The Jacobian matrix plays a central role in the inverse function theorem for functions between Euclidean spaces of the same dimension. Specifically, if f: \mathbb{R}^n \to \mathbb{R}^n is continuously differentiable near a point a and \det J_f(a) \neq 0, then f is locally invertible near a, the inverse is also continuously differentiable, and the Jacobian of the inverse at f(a) is the inverse matrix (J_f(a))^{-1}.[74][73]
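Both the example Jacobian and the chain-rule product can be verified symbolically; the SymPy sketch below is an added illustration, with the inner map g(t) = (t^2, \sin t) chosen arbitrarily:

```python
import sympy as sp

x, y, t = sp.symbols('x y t')
f = sp.Matrix([x * y, x + y])
J_f = f.jacobian([x, y])
print(J_f)   # Matrix([[y, x], [1, 1]])

g = sp.Matrix([t**2, sp.sin(t)])
J_g = g.jacobian([t])

# chain rule: the Jacobian of f∘g equals J_f(g(t)) * J_g(t)
J_composite = f.subs({x: t**2, y: sp.sin(t)}).jacobian([t])
print(sp.simplify(J_composite - J_f.subs({x: t**2, y: sp.sin(t)}) * J_g))   # zero matrix
```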
Vector-valued functions
A vector-valued function, often denoted \mathbf{r}(t) = (x_1(t), x_2(t), \dots, x_n(t)), maps a scalar parameter t to a point in \mathbb{R}^n. Its derivative is defined componentwise as

\mathbf{r}'(t) = \lim_{h \to 0} \frac{\mathbf{r}(t+h) - \mathbf{r}(t)}{h} = (x_1'(t), x_2'(t), \dots, x_n'(t)),

provided the limit exists for each component. This derivative vector represents the instantaneous rate of change of the position and points in the direction of the tangent to the curve traced by \mathbf{r}(t) at t. The magnitude \|\mathbf{r}'(t)\| gives the speed of the parametrization along the curve.[75][76]

The tangent vector \mathbf{r}'(t) indicates the direction of motion at each point on the curve, and its normalization \mathbf{T}(t) = \frac{\mathbf{r}'(t)}{\|\mathbf{r}'(t)\|} gives the unit tangent vector, which describes the orientation of the curve without regard to speed. In an arc length parametrization, the parameter s is chosen so that the speed is constant and equal to 1, i.e., \|\mathbf{r}'(s)\| = 1, ensuring that increments in s correspond directly to distances traveled along the curve; this is achieved by reparametrizing via the arc length function s(t) = \int_a^t \|\mathbf{r}'(u)\| \, du. Such parametrizations simplify calculations in differential geometry, such as that of curvature.

A classic example is the helix parametrized by \mathbf{r}(t) = (\cos t, \sin t, t), whose derivative is \mathbf{r}'(t) = (-\sin t, \cos t, 1), with constant speed \|\mathbf{r}'(t)\| = \sqrt{2}. To obtain an arc length parametrization, rescale the parameter by s = \sqrt{2}\, t, yielding \tilde{\mathbf{r}}(s) = \mathbf{r}(s/\sqrt{2}) = (\cos(s/\sqrt{2}), \sin(s/\sqrt{2}), s/\sqrt{2}), which satisfies \|\tilde{\mathbf{r}}'(s)\| = 1. For compositions involving vector-valued functions, the multivariable chain rule applies: if \mathbf{f}(\mathbf{u}(t)) with \mathbf{u}(t) vector-valued, the derivative is the Jacobian matrix of \mathbf{f} evaluated at \mathbf{u}(t) multiplied by \mathbf{u}'(t).[76][75]
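The helix computation can be checked with a componentwise finite difference; the short sketch below is an added illustration:

```python
import numpy as np

r = lambda t: np.array([np.cos(t), np.sin(t), t])
r_prime = lambda t: np.array([-np.sin(t), np.cos(t), 1.0])

t, h = 1.3, 1e-6
numeric_tangent = (r(t + h) - r(t)) / h        # componentwise difference quotient
print(numeric_tangent)                          # ≈ (-sin 1.3, cos 1.3, 1)
print(r_prime(t))
print(np.linalg.norm(r_prime(t)))               # constant speed sqrt(2) ≈ 1.414214
```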
Generalizations
Derivatives in normed spaces
In normed vector spaces, the classical notion of the derivative extends to functions f: X \to Y, where X and Y are normed vector spaces over the real or complex numbers and the domain of f is an open subset of X. The Fréchet derivative of f at a point a \in X is defined as a bounded linear map L: X \to Y satisfying

\lim_{h \to 0} \frac{\|f(a + h) - f(a) - L(h)\|_Y}{\|h\|_X} = 0.

This condition ensures that L provides the best uniform linear approximation to f in a neighborhood of a, generalizing the first-order Taylor expansion to infinite-dimensional settings. The map L is unique if it exists and belongs to the space of bounded linear operators from X to Y.[77]

When X and Y are complete (i.e., Banach spaces), the Fréchet derivative takes values in the Banach space \mathcal{L}(X, Y) of bounded linear operators equipped with the operator norm \|L\| = \sup_{\|h\|_X \leq 1} \|L(h)\|_Y. This framework yields powerful results analogous to finite-dimensional calculus, such as the chain rule and the inverse function theorem under appropriate conditions. The total derivative for functions between finite-dimensional normed spaces is a special case of this construction.

A related but weaker concept is the Gâteaux derivative, introduced by René Gâteaux, which at a point a assigns to each direction h \in X the directional limit

L(h) = \lim_{t \to 0} \frac{f(a + th) - f(a)}{t},

provided the limit exists for all h. Unlike the Fréchet derivative, the Gâteaux derivative need not be uniform over directions and does not imply continuity of f at a; however, if f is Gâteaux differentiable in a neighborhood of a and the Gâteaux derivative is continuous there, then f is Fréchet differentiable at a.

A simple example is a bounded linear map f: X \to Y between normed spaces, which is Fréchet differentiable everywhere with derivative equal to f itself, since f(a + h) - f(a) - f(h) = 0 for all a, h. In Hilbert spaces, the Riesz representation theorem further specifies that every bounded linear functional (the special case where Y is the scalar field) on a Hilbert space H has the form L(h) = \langle h, g \rangle for some fixed g \in H, where \langle \cdot, \cdot \rangle is the inner product. This representation aids in explicitly computing derivatives of quadratic forms and other inner-product-based functions.
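As a concrete finite-dimensional stand-in, the directional limit of a quadratic functional f(x) = \langle x, Ax \rangle can be compared with the inner product against its Riesz representer (A + A^T)x. The sketch below is an added illustration with an arbitrary matrix A, not a construction taken from the cited source:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
f = lambda x: x @ A @ x            # quadratic functional on R^4

x = rng.standard_normal(4)
h = rng.standard_normal(4)
g = (A + A.T) @ x                  # Riesz representer: Df(x)(h) = <h, g>

t = 1e-6
gateaux = (f(x + t * h) - f(x)) / t    # directional (Gateaux-style) limit
print(gateaux, h @ g)                  # the two values agree to about 1e-5
```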
Distributional derivatives
In the theory of distributions, introduced by Laurent Schwartz, a distribution is a continuous linear functional on the space of test functions \mathcal{D}(\Omega), consisting of infinitely differentiable functions with compact support in an open set \Omega \subseteq \mathbb{R}^n.[78] This framework extends the notion of function to generalized objects that can handle singularities and discontinuities, allowing differentiation even when classical derivatives do not exist.[79]

The distributional derivative of a distribution T is uniquely defined by the relation

\langle T', \phi \rangle = -\langle T, \phi' \rangle

for every test function \phi \in \mathcal{D}(\Omega), where \langle \cdot, \cdot \rangle denotes the action of the distribution on the test function.[79] The operation is linear, and multiplication by a smooth function obeys the usual Leibniz product rule in the distributional sense; for sufficiently smooth functions f, the distributional derivative coincides with the classical derivative via integration by parts, with no boundary terms because of the compact support of \phi.[80] Higher-order distributional derivatives are obtained by iterating this operator, preserving linearity and continuity.[78]

A prominent example is the Heaviside step function H(x), defined as H(x) = 0 for x < 0 and H(x) = 1 for x \geq 0, which is not classically differentiable at x = 0. Its distributional derivative is the Dirac delta distribution \delta, satisfying

\langle H', \phi \rangle = -\int_0^\infty \phi'(x) \, dx = \phi(0) = \langle \delta, \phi \rangle

for all test functions \phi.[81] This illustrates how distributional derivatives capture impulsive behavior at discontinuities, with \delta acting as a "point mass" that maps each test function to its value at the origin.[82]

Distributional derivatives underpin the definition of Sobolev spaces W^{k,p}(\Omega), which comprise functions u \in L^p(\Omega) such that all weak (or distributional) derivatives up to order k belong to L^p(\Omega), equipped with a norm incorporating these derivatives.[83] The weak derivative D^\alpha u of order \alpha (with |\alpha| \leq k) satisfies

\int_\Omega u \, D^\alpha \phi \, dx = (-1)^{|\alpha|} \int_\Omega (D^\alpha u) \phi \, dx

for all \phi \in \mathcal{D}(\Omega), generalizing integration by parts to functions lacking classical smoothness.[84] These spaces enable the study of functions with controlled irregularity, such as those in H^k(\Omega) = W^{k,2}(\Omega), which form Hilbert spaces useful for variational formulations.[85]

In applications to partial differential equations (PDEs), distributional derivatives are essential for defining weak solutions where classical derivatives fail, such as in hyperbolic conservation laws exhibiting shock waves or discontinuities.[86] For instance, Burgers' equation u_t + (u^2/2)_x = 0 admits entropy solutions in the distributional sense, allowing existence and uniqueness proofs via mollification and passage to limits, even across shocks where pointwise derivatives diverge.[87] This approach also facilitates the analysis of fundamental solutions and Green's functions for elliptic and hyperbolic PDEs, bridging generalized functions with physical phenomena such as wave propagation.[88]
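The defining identity \langle H', \phi \rangle = \phi(0) can be checked numerically for a specific smooth, compactly supported test function. The sketch below is an added illustration using the standard bump function supported on (-1, 1); it compares -\int_0^\infty \phi'(x)\,dx with \phi(0):

```python
import numpy as np
from scipy.integrate import quad

def phi(x):
    # smooth bump function supported on (-1, 1)
    return np.exp(-1.0 / (1.0 - x**2)) if abs(x) < 1 else 0.0

def phi_prime(x):
    return phi(x) * (-2.0 * x / (1.0 - x**2)**2) if abs(x) < 1 else 0.0

lhs, _ = quad(phi_prime, 0.0, 1.0)   # integral of phi' over the part of the support in [0, ∞)
print(-lhs, phi(0.0))                # both ≈ e^{-1} ≈ 0.367879, matching <delta, phi> = phi(0)
```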