
Derivative

In mathematics, particularly in calculus, the derivative of a function measures the instantaneous rate of change of the function with respect to one of its variables, equivalent to the slope of the tangent line to the function's graph at a given point. Formally, for a function f, the derivative f'(x) at a point x is defined as the limit
f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h},
provided this limit exists, representing the sensitivity of the function's output to changes in its input. This concept underpins much of differential calculus and extends to higher-order derivatives, such as the second derivative f''(x), which describes concavity or acceleration-like behavior.
The development of the derivative traces back to the late 17th century, when Isaac Newton and Gottfried Wilhelm Leibniz independently formulated the foundations of calculus, introducing notation like \frac{dy}{dx} for the derivative and applying it to problems in physics and geometry. Earlier precursors, including work by mathematicians like Pierre de Fermat on tangents and the Persian scholar Sharaf al-Dīn al-Ṭūsī on cubic polynomials in the 12th century, anticipated aspects of instantaneous rates of change, but Newton and Leibniz's systematic calculus revolutionized mathematics. Augustin-Louis Cauchy later provided a more precise limit-based definition in the 19th century, solidifying the derivative's role in analysis.

Derivatives have broad applications across physics, engineering, and economics, enabling the modeling of dynamic systems and optimization. In physics, the first derivative of position with respect to time yields velocity, while the second derivative gives acceleration, fundamental to kinematics and Newtonian mechanics. In economics, derivatives quantify marginal quantities, such as the rate of change in cost or revenue functions, aiding in production and pricing decisions. They also support techniques like related rates for solving real-world problems involving varying quantities and Newton's method for numerical root-finding of equations.

Definition

As a limit

In calculus, the derivative of a function f at a point a in its domain is formally defined as the limit f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}, provided this limit exists. This definition applies to functions f defined on an open interval containing a, where the limit represents the instantaneous rate of change of f at a. The expression \frac{f(a + h) - f(a)}{h} is known as the difference quotient, which measures the average rate of change of f over the small interval [a, a + h]. Geometrically, this quotient equals the slope of the secant line connecting the points (a, f(a)) and (a + h, f(a + h)) on the graph of f. As h approaches 0, the secant line approaches the tangent line to the curve at (a, f(a)), so the derivative f'(a) gives the slope of this tangent line. For the limit to exist at an interior point a, the two one-sided limits must agree: the right-hand derivative \lim_{h \to 0^+} \frac{f(a + h) - f(a)}{h} and the left-hand derivative \lim_{h \to 0^-} \frac{f(a + h) - f(a)}{h} must both exist and be equal. At an endpoint of the domain, such as the left endpoint of a closed interval, differentiability is defined using the appropriate one-sided limit, typically the right-hand derivative.

Consider the example of f(x) = x^2 at a = 1. The derivative is f'(1) = \lim_{h \to 0} \frac{(1 + h)^2 - 1^2}{h}. First, expand the numerator: (1 + h)^2 - 1 = 1 + 2h + h^2 - 1 = 2h + h^2. Then, f'(1) = \lim_{h \to 0} \frac{2h + h^2}{h} = \lim_{h \to 0} \frac{h(2 + h)}{h} = \lim_{h \to 0} (2 + h) = 2, since h \neq 0 in the intermediate step, and the limit exists.
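
The limit computation above can also be checked numerically. The following Python sketch (the helper name difference_quotient is illustrative, not from any particular library) evaluates the difference quotient of f(x) = x^2 at a = 1 for shrinking values of h and shows it approaching the exact value 2.

```python
# Numerical illustration of the limit definition of the derivative.
def difference_quotient(f, a, h):
    """Forward difference quotient (f(a + h) - f(a)) / h."""
    return (f(a + h) - f(a)) / h

f = lambda x: x ** 2
for h in [0.1, 0.01, 0.001, 1e-6]:
    print(h, difference_quotient(f, 1.0, h))   # approaches 2 as h -> 0
```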

Using infinitesimals

Gottfried Wilhelm Leibniz introduced the concept of infinitesimals in the late 17th century as a foundational tool for his development of calculus, viewing them as quantities smaller than any finite number but nonzero, which allowed for the representation of instantaneous rates of change in continuous motion. In his 1684 paper "Nova Methodus pro Maximis et Minimis," Leibniz employed infinitesimals to derive tangents and areas, treating differentials like dx as infinitesimal increments that model the "momentary endeavor" of motion and variation in functions. Although criticized for lacking rigor, this approach provided an intuitive framework for calculus that influenced early practitioners until the 19th-century shift to limit-based definitions resolved foundational issues.

An intuitive definition of the derivative using infinitesimals expresses it as the ratio of infinitesimal changes: for a function f, the derivative at x is f'(x) = \frac{f(x + dx) - f(x)}{dx}, where dx is an infinitesimal quantity approaching zero but treated as nonzero in computations. This heuristic avoids explicit limits by directly manipulating small increments, aligning with Leibniz's original vision of calculus as a method for handling continuous change through such "fictions."

The modern rigorous revival of infinitesimals occurred in the 1960s through Abraham Robinson's non-standard analysis, which constructs the hyperreal numbers *\mathbb{R} as an extension of the reals incorporating genuine infinitesimals and infinite numbers via ultrapowers of real sequences. Hyperreals form a totally ordered field where finite elements are those bounded by standard reals, and infinitesimals are nonzero hyperreals smaller in absolute value than any positive real. Central to this framework is the transfer principle, which states that a first-order sentence holds in the reals if and only if its non-standard counterpart holds in the hyperreals, enabling the rigorous importation of standard theorems into the extended system. This approach offers advantages in intuition by permitting direct computations with infinitesimals, bypassing the complexities of epsilon-delta limits while preserving logical equivalence to standard analysis, thus aiding pedagogical clarity and simplifying proofs of calculus rules.

For example, the derivative of \sin(x) at a real number x can be found by considering an infinitesimal \epsilon \in {}^*\mathbb{R} with \epsilon \approx 0 but \epsilon \neq 0: \frac{\sin(x + \epsilon) - \sin(x)}{\epsilon} = \frac{\sin(x)\cos(\epsilon) + \cos(x)\sin(\epsilon) - \sin(x)}{\epsilon} = \sin(x) \cdot \frac{\cos(\epsilon) - 1}{\epsilon} + \cos(x) \cdot \frac{\sin(\epsilon)}{\epsilon}. By the transfer principle, \cos(\epsilon) \approx 1 and \sin(\epsilon) \approx \epsilon, so the first term is infinitesimally close to zero and the second is infinitesimally close to \cos(x), yielding \sin'(x) = \cos(x).
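
The infinitesimal-style argument can be mimicked numerically by using a tiny but nonzero increment in place of \epsilon; the Python sketch below is a heuristic illustration rather than genuine non-standard analysis, and shows the quotient for \sin settling near \cos(x).

```python
import math

# Heuristic check that sin'(x) = cos(x): a tiny but nonzero increment eps
# plays the role of the infinitesimal in the argument above.
x = 0.7
for eps in [1e-4, 1e-6, 1e-8]:
    ratio = (math.sin(x + eps) - math.sin(x)) / eps
    print(eps, ratio, math.cos(x))   # ratio is approximately cos(x)
```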

Notation and Representation

Standard notations

In mathematical analysis, the derivative of a function is expressed using several standard notations, each suited to different contexts in calculus and its applications. The two primary notations are the Lagrange notation and the Leibniz notation, which are widely adopted in textbooks and research for denoting the rate of change of a function.

The Lagrange notation, introduced by Joseph-Louis Lagrange, denotes the first derivative of a function f at a point x as f'(x), where the prime symbol indicates differentiation with respect to the independent variable. For higher-order derivatives, this extends to multiple primes, such as f''(x) for the second derivative, or more generally f^{(n)}(x) for the nth derivative, providing a compact way to represent successive differentiations. This notation is particularly convenient for single-variable functions, as it treats the derivative as an operation on the function itself without explicitly referencing the variable of differentiation.

In contrast, the Leibniz notation, developed by Gottfried Wilhelm Leibniz, expresses the derivative of y = f(x) as \frac{dy}{dx} or \frac{df}{dx}, emphasizing the ratio of infinitesimal changes in the dependent and independent variables. For higher orders, it uses \frac{d^n y}{dx^n} or \frac{d^n f}{dx^n}. This form is especially useful in contexts like related rates problems, where rates of change with respect to different variables (such as time) must be related, as it naturally accommodates implicit differentiation and substitution of variables.

For derivatives with respect to time, particularly in physics and engineering, Newton's dot notation is standard, denoting \dot{f}(t) = \frac{df}{dt} for the first derivative and \ddot{f}(t) for the second, among higher orders. In multivariable calculus, partial derivatives are conventionally written as \frac{\partial f}{\partial x}, indicating differentiation with respect to one variable while holding the others constant. The choice of notation often depends on the problem: Lagrange notation excels for abstract function analysis in single-variable calculus, while Leibniz notation facilitates problems involving interrelated variables, such as in differential equations or optimization.

Historical notations

The development of notation for the derivative began in the late 17th century with Isaac Newton's introduction of fluxion notation, where he used a dot over the variable, such as \dot{x}, to denote the rate of change or "fluxion" of a quantity x. Newton conceived this notation around 1666 during his early work on what he called the "method of fluxions," though it was not published until 1693 in his work on quadratures and later fully in 1736. This notation emphasized the temporal or geometric flow of quantities, aligning with Newton's physical and geometric perspective on calculus, but it gradually fell out of favor in favor of more algebraic and analytic approaches.

Independently, Gottfried Wilhelm Leibniz developed a differential notation in 1675, using symbols like \frac{dy}{dx} or d/dx to represent the derivative as a ratio of infinitesimals, which profoundly influenced the analytic framework of calculus. Although conceived in a 1675 manuscript, this notation first appeared in print in Leibniz's 1684 "Nova methodus pro maximis et minimis" in Acta Eruditorum, where the lowercase d signified an infinitesimal difference. Leibniz's system, with its operator-like d/dx, facilitated manipulations such as the chain rule and became the dominant notation due to its clarity in expressing differentials and its adaptability to integration and series expansions.

In the 18th century, Leonhard Euler employed variations including an increment-based approach with small quantities in expressions for differences, building on Newtonian and Leibnizian ideas but integrated into his analytic works, alongside uses of the prime symbol f'(x) and the operator Df(x) for the derivative. These notations appeared in Euler's texts like his 1755 Institutiones calculi differentialis and highlighted infinitesimal methods, though they were eventually refined for greater conciseness in higher-order derivatives compared to emerging functional notations.

A significant shift occurred with Joseph-Louis Lagrange's introduction of the prime notation f' in his 1797 treatise Théorie des fonctions analytiques, where he treated the derivative as a "derived function" to emphasize the algebraic treatment of functions without relying on infinitesimals or limits. This notation, which used successive primes for higher derivatives like f'', gained adoption for its simplicity and direct association with the function being differentiated, influencing modern textbooks and theoretical work. These historical notations evolved into the standard modern forms like Leibniz's \frac{dy}{dx} and Lagrange's f', which remain prevalent today.

Differentiability

Conditions for differentiability

A function f: D \to \mathbb{R}, where D \subseteq \mathbb{R} is an interval, is differentiable at a point c \in D if the limit \lim_{h \to 0} \frac{f(c + h) - f(c)}{h} exists and is finite; this limit is denoted f'(c). This condition is equivalent to the difference quotient approaching the same value along every sequence (h_n) in \mathbb{R} with h_n \neq 0 and h_n \to 0. A function is differentiable on an interval I if it is differentiable at every point in I, with the understanding that for interior points this requires the two-sided limit to exist, while for endpoints of a closed interval one-sided limits may be used if specified.

Derivatives possess the intermediate value property, even if they are discontinuous: if f is differentiable on an interval I and f'(a) < \lambda < f'(b) for a, b \in I with a < b, then there exists c \in (a, b) such that f'(c) = \lambda. This result, known as Darboux's theorem, follows from applying the extreme value theorem to the auxiliary function g(x) = f(x) - \lambda x.

Sufficient conditions for differentiability include membership in the class C^1(I), meaning f is differentiable on I with continuous derivative f'; this implies differentiability everywhere on I. A weaker condition is the Lipschitz condition: if |f(x) - f(y)| \leq K |x - y| for some constant K > 0 and all x, y \in I, then f is differentiable almost everywhere on I with respect to Lebesgue measure, by Rademacher's theorem.

An example of a function differentiable everywhere on \mathbb{R} but with a discontinuous derivative is f(x) = \begin{cases} x^2 \sin(1/x) & \text{if } x \neq 0, \\ 0 & \text{if } x = 0. \end{cases} Here, f'(x) = 2x \sin(1/x) - \cos(1/x) for x \neq 0 and f'(0) = 0, but \lim_{x \to 0} f'(x) does not exist due to the oscillation of -\cos(1/x).
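
The example above can be explored numerically. In the Python sketch below (function names are illustrative), the difference quotients of f at 0 tend to 0, while values of f'(x) for small x keep oscillating, reflecting the discontinuity of the derivative at 0.

```python
import math

# f(x) = x^2 sin(1/x) with f(0) = 0 is differentiable at 0 with f'(0) = 0,
# yet f'(x) = 2x sin(1/x) - cos(1/x) has no limit as x -> 0.
def f(x):
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

def fprime(x):
    return 2 * x * math.sin(1.0 / x) - math.cos(1.0 / x)  # valid for x != 0

# Difference quotients at 0 tend to 0 ...
print([(f(h) - f(0)) / h for h in [1e-2, 1e-4, 1e-6]])
# ... while f'(x) keeps oscillating between roughly -1 and 1 near 0.
print([fprime(x) for x in [1e-2, 1e-3, 1e-4, 1e-5]])
```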

Relation to continuity

A fundamental result in real analysis establishes that differentiability at a point implies continuity at that same point. Specifically, if a function f is differentiable at a point a in its domain, then f is continuous at a. To prove this theorem, consider the definition of the derivative: f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}. Since this limit exists and equals f'(a), write f(a + h) - f(a) = \frac{f(a + h) - f(a)}{h} \cdot h for h \neq 0 and take the limit (noting that \lim_{h \to 0} h = 0): \lim_{h \to 0} [f(a + h) - f(a)] = f'(a) \cdot \lim_{h \to 0} h = f'(a) \cdot 0 = 0. This shows that \lim_{h \to 0} f(a + h) = f(a), which is precisely the definition of continuity at a.

The converse of this implication does not hold: a function can be continuous at a point without being differentiable there. For example, the function f(x) = |x| is continuous at x = 0 because \lim_{x \to 0} |x| = 0 = f(0), but it is not differentiable at x = 0 since the left-hand derivative is -1 and the right-hand derivative is 1, so the two-sided limit does not exist.

This one-way implication has significant consequences in analysis: all differentiable functions are continuous, but continuity alone does not guarantee differentiability, highlighting that differentiability is a stricter condition. It plays a crucial role in theorems like the mean value theorem, which requires a function to be continuous on a closed interval [a, b] and differentiable on the open interval (a, b); the implication ensures the continuity condition is satisfied on the interior points where differentiability holds. Moreover, differentiability is a strictly local property: it only requires the existence of the derivative (and thus continuity) at the specific point a, without implications for behavior elsewhere in the domain.
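
A quick numerical illustration of the |x| example: the sketch below compares the right-hand and left-hand difference quotients at 0, which settle at +1 and -1 respectively, so no two-sided derivative exists even though the function is continuous there.

```python
# |x| is continuous at 0 but not differentiable there: the one-sided
# difference quotients disagree.
f = abs
for h in [0.1, 0.001, 1e-6]:
    right = (f(0 + h) - f(0)) / h        # -> +1
    left = (f(0 - h) - f(0)) / (-h)      # -> -1
    print(h, right, left)
```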

Computation of Derivatives

Derivatives of basic functions

The derivative of a constant function f(x) = c, where c is a constant, is zero. This follows from the limit definition of the derivative: f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{c - c}{h} = \lim_{h \to 0} \frac{0}{h} = 0. This constant rule holds for any real constant c.

The power rule states that for a function f(x) = x^n, where n is a positive integer, the derivative is f'(x) = n x^{n-1}. To derive this using the limit definition, substitute into the definition: f'(x) = \lim_{h \to 0} \frac{(x + h)^n - x^n}{h}. Expand (x + h)^n using the binomial theorem: (x + h)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} h^k = x^n + n x^{n-1} h + \frac{n(n-1)}{2} x^{n-2} h^2 + \cdots + h^n. Subtract x^n and divide by h: \frac{(x + h)^n - x^n}{h} = n x^{n-1} + \frac{n(n-1)}{2} x^{n-2} h + \cdots + h^{n-1}. As h \to 0, all terms with h vanish, yielding n x^{n-1}. This derivation extends to rational exponents via roots and powers, and to real exponents through limits and continuity arguments, maintaining the form f'(x) = n x^{n-1}.

The derivative of the sine function is \frac{d}{dx} \sin x = \cos x. Using the limit definition: (\sin x)' = \lim_{h \to 0} \frac{\sin(x + h) - \sin x}{h}. Apply the angle addition formula: \sin(x + h) = \sin x \cos h + \cos x \sin h. Substitute: \frac{\sin x \cos h + \cos x \sin h - \sin x}{h} = \sin x \cdot \frac{\cos h - 1}{h} + \cos x \cdot \frac{\sin h}{h}. As h \to 0, \lim_{h \to 0} \frac{\cos h - 1}{h} = 0 and \lim_{h \to 0} \frac{\sin h}{h} = 1, so the limit simplifies to \cos x \cdot 1 = \cos x. Similarly, for cosine, \frac{d}{dx} \cos x = -\sin x, derived analogously using \cos(x + h) = \cos x \cos h - \sin x \sin h, yielding \lim_{h \to 0} \frac{\cos(x + h) - \cos x}{h} = -\sin x. These rely on the standard limits \lim_{h \to 0} \frac{\sin h}{h} = 1 and \lim_{h \to 0} \frac{\cos h - 1}{h} = 0.

The derivative of the exponential function f(x) = e^x is f'(x) = e^x. From the limit definition: (e^x)' = \lim_{h \to 0} \frac{e^{x + h} - e^x}{h} = e^x \lim_{h \to 0} \frac{e^h - 1}{h}. The key limit \lim_{h \to 0} \frac{e^h - 1}{h} = 1 characterizes the base e, confirming the result. This property distinguishes the natural exponential from exponentials with other bases.

The derivative of the natural logarithm f(x) = \ln x for x > 0 is f'(x) = \frac{1}{x}. Using the limit definition: (\ln x)' = \lim_{h \to 0^+} \frac{\ln(x + h) - \ln x}{h} = \lim_{h \to 0^+} \frac{\ln\left(1 + \frac{h}{x}\right)}{h} = \frac{1}{x} \lim_{h \to 0^+} \frac{\ln\left(1 + \frac{h}{x}\right)}{\frac{h}{x}}. Let k = \frac{h}{x}, so as h \to 0^+, k \to 0^+, and the limit becomes \frac{1}{x} \lim_{k \to 0^+} \frac{\ln(1 + k)}{k} = \frac{1}{x} \cdot 1, since \lim_{k \to 0} \frac{\ln(1 + k)}{k} = 1 follows from the definition of the derivative of \ln at 1 or the series expansion of \ln(1 + k).
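
These basic derivatives can be confirmed with a computer algebra system; the following sketch assumes SymPy is available and simply differentiates each function symbolically.

```python
import sympy as sp

# Symbolic check of the basic derivatives derived above.
x = sp.symbols('x')
print(sp.diff(x**5, x))        # 5*x**4, an instance of the power rule
print(sp.diff(sp.sin(x), x))   # cos(x)
print(sp.diff(sp.cos(x), x))   # -sin(x)
print(sp.diff(sp.exp(x), x))   # exp(x)
print(sp.diff(sp.log(x), x))   # 1/x
```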

Derivatives of combined functions

In calculus, derivatives of combined functions are computed using specific rules that extend the differentiation of basic functions to products, quotients, compositions, and other forms. These rules, developed in the late 17th century, enable the analysis of more complex expressions without expanding them fully, preserving efficiency in calculations.

The product rule applies to the derivative of a product of two differentiable functions u(x) and v(x). It states that (u v)'(x) = u'(x) v(x) + u(x) v'(x), where the prime denotes differentiation with respect to x. This formula, first articulated by Leibniz in his 1684 paper Nova Methodus pro Maximis et Minimis, accounts for the rate of change of each factor while holding the other constant.

The quotient rule handles the derivative of a quotient of two differentiable functions u(x) and v(x), where v(x) \neq 0. It is given by \left( \frac{u}{v} \right)'(x) = \frac{u'(x) v(x) - u(x) v'(x)}{[v(x)]^2}. Originating from the foundational work of Leibniz and Johann Bernoulli in the development of infinitesimal calculus, this rule derives from applying the product rule to u(x) \cdot [v(x)]^{-1}.

For compositions of functions, the chain rule computes the derivative of f(g(x)), where f and g are differentiable. The rule states (f \circ g)'(x) = f'(g(x)) \cdot g'(x). Leibniz introduced an early form of this rule in a 1676 manuscript, and it was later formalized in terms of limits. A proof sketch proceeds by definition: the derivative is \lim_{h \to 0} \frac{f(g(x + h)) - f(g(x))}{h} = \lim_{h \to 0} \left[ \frac{f(g(x + h)) - f(g(x))}{g(x + h) - g(x)} \cdot \frac{g(x + h) - g(x)}{h} \right], valid when g(x + h) \neq g(x) for small h. Letting k = g(x + h) - g(x), as h \to 0, k \to 0 by the continuity of g (which follows from its differentiability), so the limit becomes f'(g(x)) \cdot g'(x).

Implicit differentiation finds \frac{dy}{dx} when y is defined implicitly by an equation F(x, y) = 0, assuming y is differentiable with respect to x. Differentiating both sides with respect to x yields \frac{dy}{dx} = -\frac{\frac{\partial F}{\partial x}}{\frac{\partial F}{\partial y}}, provided \frac{\partial F}{\partial y} \neq 0. This technique, rooted in Leibniz's calculus of differentials, treats y as a function of x and applies the chain rule to terms involving y.

Logarithmic differentiation simplifies derivatives of products, quotients, or powers by taking the natural logarithm. For a function y = u(x)^{v(x)} or a product y = u(x) v(x), compute \ln y = v(x) \ln u(x) or \ln y = \ln u(x) + \ln v(x), differentiate implicitly to obtain \frac{1}{y} y' equal to the derivative of the right-hand side, then multiply by y. This method leverages the chain rule and properties of logarithms, and is particularly useful for expressions with variable exponents or multiple factors.
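
The product, quotient, and chain rules can likewise be verified symbolically; the sketch below (assuming SymPy) checks each identity by simplifying the difference between the two sides.

```python
import sympy as sp

# Symbolic verification of the combination rules.
x = sp.symbols('x')
u, v = x**3, sp.sin(x)

# Product rule: (u v)' = u' v + u v'
lhs = sp.diff(u * v, x)
rhs = sp.diff(u, x) * v + u * sp.diff(v, x)
print(sp.simplify(lhs - rhs))            # 0

# Quotient rule: (u/v)' = (u' v - u v') / v^2
lhs = sp.diff(u / v, x)
rhs = (sp.diff(u, x) * v - u * sp.diff(v, x)) / v**2
print(sp.simplify(lhs - rhs))            # 0

# Chain rule: d/dx sin(x^2) = cos(x^2) * 2x
print(sp.diff(sp.sin(x**2), x))          # 2*x*cos(x**2)
```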

Computation examples

To illustrate the practical computation of derivatives, the following examples apply the product rule, quotient rule, and chain rule to specific functions, followed by a related rates application involving the inflation of a spherical balloon. These computations demonstrate step-by-step differentiation and simplification where appropriate.

Consider the function y = x^3 \sin(x). To find \frac{dy}{dx}, apply the product rule, which states that if y = u(x) v(x), then \frac{dy}{dx} = u'(x) v(x) + u(x) v'(x), where u(x) = x^3 and v(x) = \sin(x). The derivative of u(x) is u'(x) = 3x^2 by the power rule, and the derivative of v(x) is v'(x) = \cos(x). Substituting yields: \frac{dy}{dx} = 3x^2 \sin(x) + x^3 \cos(x). This can be factored as x^2 (3 \sin(x) + x \cos(x)) for simplification.

Next, differentiate y = \frac{x^2 + 1}{x - 1} using the quotient rule: if y = \frac{u(x)}{v(x)}, then \frac{dy}{dx} = \frac{u'(x) v(x) - u(x) v'(x)}{[v(x)]^2}, with u(x) = x^2 + 1 and v(x) = x - 1. Here, u'(x) = 2x and v'(x) = 1. Substituting gives: \frac{dy}{dx} = \frac{2x (x - 1) - (x^2 + 1) \cdot 1}{(x - 1)^2} = \frac{2x^2 - 2x - x^2 - 1}{(x - 1)^2} = \frac{x^2 - 2x - 1}{(x - 1)^2}. This simplified form highlights the algebraic reduction after applying the rule.

For the chain rule, consider y = \sin(x^2). Identify the outer function as f(u) = \sin(u) where u = x^2 is the inner function. The chain rule states \frac{dy}{dx} = f'(u) \cdot \frac{du}{dx}, so f'(u) = \cos(u) = \cos(x^2) and \frac{du}{dx} = 2x. Thus: \frac{dy}{dx} = \cos(x^2) \cdot 2x = 2x \cos(x^2). This example underscores the need to differentiate the inner function separately.

In related rates problems, derivatives relate rates of change over time. For an inflating spherical balloon with volume V = \frac{4}{3} \pi r^3, where r is the radius, differentiate implicitly with respect to time t: \frac{dV}{dt} = 4 \pi r^2 \frac{dr}{dt}. Suppose the volume increases at \frac{dV}{dt} = 100 \pi cubic units per second when r = 2 units. Then: 100 \pi = 4 \pi (2)^2 \frac{dr}{dt} \implies 100 \pi = 16 \pi \frac{dr}{dt} \implies \frac{dr}{dt} = \frac{100}{16} = 6.25 \text{ units per second}. This computes the radius growth rate from the known volume rate.

Derivatives can be verified numerically by approximating the derivative via finite differences, such as f'(x) \approx \frac{f(x + h) - f(x)}{h} for small h, or by plotting the function and its derivative to compare slopes. For instance, for y = x^3 \sin(x) at x = \frac{\pi}{2}, the exact derivative is approximately 7.40, while a numerical approximation with h = 0.001 yields about 7.40, confirming closeness; plotting shows the derivative curve matching the function's slopes visually.
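
The worked examples above can be spot-checked numerically; the following Python sketch reproduces the value near 7.40 for the product-rule example at x = \pi/2 and the 6.25 units per second from the related rates computation.

```python
import math

# Numerical spot-checks for the worked examples (values are approximate).
f = lambda x: x**3 * math.sin(x)
fprime = lambda x: 3 * x**2 * math.sin(x) + x**3 * math.cos(x)

a, h = math.pi / 2, 1e-3
print(fprime(a))                      # exact formula: about 7.40
print((f(a + h) - f(a)) / h)          # finite difference: also about 7.40

# Related rates: dV/dt = 4*pi*r^2 * dr/dt, with dV/dt = 100*pi at r = 2.
dV_dt, r = 100 * math.pi, 2.0
dr_dt = dV_dt / (4 * math.pi * r**2)
print(dr_dt)                          # 6.25 units per second
```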

Antiderivatives

Definition of antiderivative

In calculus, an antiderivative of a function f, denoted F, is a function such that its derivative equals f, that is, F'(x) = f(x) for all x in the domain of f. This positions antiderivation as the inverse operation to differentiation. The general form of an antiderivative incorporates an arbitrary constant C, yielding F(x) = \int f(x) \, dx + C, where the indefinite integral notation \int f(x) \, dx represents the family of all such antiderivatives. This notation emphasizes that antiderivatives are unique only up to an additive constant; if F and G are two antiderivatives of f, then F(x) - G(x) = C for some constant C. For basic power functions, the antiderivative of f(x) = x^n where n \neq -1 is given by \int x^n \, dx = \frac{x^{n+1}}{n+1} + C. Differentiating this returns the original function x^n, illustrating how differentiation reverses the antiderivation process while discarding the constant.
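
A brief symbolic check (assuming SymPy, which omits the arbitrary constant C in indefinite integrals): integrating a power function and differentiating the result recovers the original integrand.

```python
import sympy as sp

# Antiderivative computed symbolically, then differentiated back.
x = sp.symbols('x')
F = sp.integrate(x**3, x)      # x**4/4
print(F, sp.diff(F, x))        # differentiating recovers x**3
```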

Fundamental theorem of calculus

The fundamental theorem of calculus (FTC) establishes the profound connection between differentiation and definite integration, demonstrating that these two core operations in calculus are inverses under appropriate conditions. It consists of two parts that together justify the use of antiderivatives to evaluate definite integrals and reveal the derivative of an accumulated integral.

The first part states that if f is continuous on the closed interval [a, b], then the function defined by F(x) = \int_a^x f(t) \, dt is differentiable on (a, b) (and continuous on [a, b]) with derivative F'(x) = f(x) for all x \in (a, b). This result shows that the definite integral from a fixed lower limit to a variable upper limit yields an antiderivative of f, interpreting integration as the accumulation of the rate of change given by f.

A standard proof sketch for the first part relies on the Mean Value Theorem for Integrals. Consider the difference quotient for F'(x): F'(x) = \lim_{h \to 0} \frac{F(x+h) - F(x)}{h} = \lim_{h \to 0} \frac{1}{h} \int_x^{x+h} f(t) \, dt. Since f is continuous on [x, x+h], the Mean Value Theorem for Integrals guarantees a point c_h \in [x, x+h] such that \int_x^{x+h} f(t) \, dt = f(c_h) \cdot h, so the quotient simplifies to f(c_h). As h \to 0, c_h \to x, and continuity of f implies f(c_h) \to f(x), yielding F'(x) = f(x).

The second part, known as the evaluation theorem, states that if f is continuous on [a, b] and F is any antiderivative of f (so F'(x) = f(x) on [a, b]), then \int_a^b f(x) \, dx = F(b) - F(a). This allows definite integrals, representing net accumulation over [a, b], to be computed directly from the values of an antiderivative at the endpoints, bypassing explicit Riemann sums or numerical approximation.

The continuity assumption on f ensures Riemann integrability over [a, b] and the existence of the derivative F'(x) = f(x) for all x \in (a, b). Weaker conditions, such as f being Riemann integrable on [a, b] (e.g., bounded with discontinuities on a set of measure zero), yield versions where F is differentiable with F'(x) = f(x) at every point of (a, b) where f is continuous. The FTC's implications extend to numerical methods, where it underpins algorithms for approximating integrals by estimating antiderivatives, and conceptually frames derivatives as instantaneous rates within the total change captured by integrals.
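
The evaluation theorem can be illustrated numerically; the sketch below (assuming SciPy for the quadrature routine) compares a numerical integral of \cos on [0, \pi/2] with F(b) - F(a) for the antiderivative F(x) = \sin x.

```python
import math
from scipy.integrate import quad

# Numerical illustration of the evaluation theorem (FTC part 2) for
# f(x) = cos(x) on [0, pi/2], whose antiderivative is F(x) = sin(x).
value, error_estimate = quad(math.cos, 0, math.pi / 2)
print(value)                                # approximately 1.0
print(math.sin(math.pi / 2) - math.sin(0))  # F(b) - F(a) = 1.0
```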

Higher-Order Derivatives

Second and higher derivatives

The second derivative of a function f, denoted f''(x), is obtained by differentiating the first derivative f'(x) with respect to x, providing insight into the function's concavity or curvature. For a twice-differentiable function, if f''(x) > 0, the graph is concave up at x (resembling a U-shape), indicating that the function is accelerating upward; conversely, if f''(x) < 0, it is concave down (resembling an inverted U), showing downward concavity. This interpretation extends from the first derivative's role in measuring slope to the second's role in measuring the rate of change of the slope, a concept formalized in classical texts.

Higher-order derivatives generalize this process: the n-th derivative f^{(n)}(x) is the result of differentiating f n times successively, capturing increasingly refined aspects of the function's behavior, such as jerk (the rate of change of acceleration) in mechanics or higher moments in approximation theory. These derivatives exist if the function is sufficiently smooth, typically in the class C^n of n-times continuously differentiable functions. In physics, for position s(t), the first derivative is velocity v(t) = s'(t), the second is acceleration a(t) = s''(t), and higher ones describe changes in acceleration, essential for modeling oscillatory or non-uniform motion. Applications include identifying inflection points, where f''(x) changes sign, marking transitions between concave up and concave down, which signal changes in the function's bending and growth rate.

A key application of higher derivatives is in local approximations via Taylor's theorem, which expands a function around a point a as f(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n + R_n(x), where R_n(x) is the remainder term, allowing precise estimation of function values near a using derivative information up to order n. This theorem, attributed to Brook Taylor in 1715, underpins series expansions and numerical methods in analysis. For example, consider f(x) = x^4. The first derivative is f'(x) = 4x^3, the second is f''(x) = 12x^2 (always non-negative, indicating the graph is concave up everywhere), and the third is f'''(x) = 24x, which changes sign at x = 0.
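
The successive derivatives of x^4 and a Taylor polynomial can be generated symbolically; the sketch below assumes SymPy and compares a fourth-order Taylor expansion of \sin x about 0 with the exact value near the expansion point.

```python
import sympy as sp

# Successive derivatives of f(x) = x^4 and a 4th-order Taylor polynomial of
# sin(x) about 0 (sp.series includes the O(x**5) remainder term, removed here).
x = sp.symbols('x')
f = x**4
print([sp.diff(f, x, k) for k in range(1, 5)])   # [4*x**3, 12*x**2, 24*x, 24]

taylor = sp.series(sp.sin(x), x, 0, 5).removeO()
print(taylor)                                    # x - x**3/6
print(taylor.subs(x, 0.5), sp.sin(0.5).evalf())  # close agreement near 0
```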

Notation for higher derivatives

For the second derivative of a function f(x), the prime notation extends the first derivative symbol with a double prime, denoted as f''(x). This convention, part of Lagrange's notation, indicates successive differentiation with respect to the independent variable x. Similarly, the third derivative is written as f'''(x), with additional primes for each higher order. To denote the nth derivative in a more general form, the prime notation uses superscript parentheses, expressed as f^{(n)}(x). This avoids excessive primes for large n and clearly specifies the order of differentiation. An alternative in some contexts is the notation D^n f(x), where D represents the differentiation operator applied n times.

In Leibniz notation, higher-order derivatives generalize the first derivative form \frac{dy}{dx} to \frac{d^n y}{dx^n} for the nth derivative, reflecting n successive applications of the operator \frac{d}{dx}. This is particularly useful when the function is expressed as y = f(x), as in \frac{d^2 y}{dx^2} for the second derivative. Evaluation of higher-order derivatives at a specific point a follows standard notation, such as f''(a) or f^{(n)}(a).

In the context of differential equations, especially those involving time as the independent variable, Newton's dot notation is commonly employed for higher orders. The first derivative is \dot{y}, the second is \ddot{y}, and higher orders use multiple dots, such as \dddot{y} for the third. This contrasts with the prime notation y'' often used for the same second-order derivative in non-time-dependent equations. Additionally, the prime notation y'' is standard in ordinary differential equations to denote second derivatives without specifying the variable explicitly. For functions of several variables, higher-order partial derivatives use analogous notations but with partial symbols, such as \frac{\partial^n f}{\partial x^n} or f_{xx\dots x} (with n subscripts), distinguishing them from total derivatives; these are addressed in multivariable contexts.

Derivatives in Several Variables

Partial derivatives

In multivariable calculus, the partial derivative of a function f: \mathbb{R}^n \to \mathbb{R} with respect to one of its variables, say x_i, at a point (a_1, \dots, a_n) is defined as the limit \frac{\partial f}{\partial x_i}(a_1, \dots, a_n) = \lim_{h \to 0} \frac{f(a_1, \dots, a_i + h, \dots, a_n) - f(a_1, \dots, a_n)}{h}, provided the limit exists; this measures the rate of change of f in the direction of the x_i-axis while holding all other variables constant. The definition extends the single-variable derivative by fixing the other inputs, analogous to the ordinary limit-based derivative but applied to a univariate slice of the function. Common notation for the partial derivative of f with respect to x in a function of two variables f(x, y) includes the Leibniz symbol \frac{\partial f}{\partial x} or the subscript form f_x; subscripts are extended for higher orders, such as f_{xy} for a second-order mixed partial.

To compute a partial derivative, treat all variables except the one of interest as constants and apply the rules of single-variable differentiation, such as the power rule or chain rule. For example, consider f(x, y) = x^2 y + \sin y; the partial with respect to x is \frac{\partial f}{\partial x} = 2xy, obtained by differentiating x^2 y as if y were constant and treating \sin y as a constant, while the partial with respect to y is \frac{\partial f}{\partial y} = x^2 + \cos y, obtained by differentiating x^2 y with x^2 treated as a constant coefficient and differentiating \sin y directly.

Higher-order partial derivatives are obtained by successive partial differentiation; for a second-order mixed partial, such as \frac{\partial^2 f}{\partial x \partial y}, first compute \frac{\partial f}{\partial y} and then take its partial with respect to x. Under suitable conditions, Clairaut's theorem states that if the mixed partial derivatives \frac{\partial^2 f}{\partial x \partial y} and \frac{\partial^2 f}{\partial y \partial x} both exist and are continuous in a neighborhood of a point, then they are equal at that point. This equality holds for all twice continuously differentiable functions, which covers most functions encountered in applications.
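
The partial derivatives in this example can be verified symbolically; the sketch below (assuming SymPy) also confirms the equality of the mixed partials asserted by Clairaut's theorem.

```python
import sympy as sp

# Partial derivatives of f(x, y) = x^2 y + sin(y), matching the example above.
x, y = sp.symbols('x y')
f = x**2 * y + sp.sin(y)
print(sp.diff(f, x))                         # 2*x*y
print(sp.diff(f, y))                         # x**2 + cos(y)
print(sp.diff(f, x, y), sp.diff(f, y, x))    # equal mixed partials: 2*x, 2*x
```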

Directional derivatives

The directional derivative of a scalar-valued function f: \mathbb{R}^n \to \mathbb{R} at a point \mathbf{a} in the direction of a unit vector \mathbf{u} is defined as D_{\mathbf{u}} f(\mathbf{a}) = \lim_{h \to 0} \frac{f(\mathbf{a} + h \mathbf{u}) - f(\mathbf{a})}{h}, provided the limit exists. This measures the instantaneous rate of change of f at \mathbf{a} as one moves along the line through \mathbf{a} in the direction \mathbf{u}. Geometrically, the directional derivative D_{\mathbf{u}} f(\mathbf{a}) represents the slope of the tangent line to the curve obtained by restricting f to the line passing through \mathbf{a} in the direction \mathbf{u}. Partial derivatives are special cases of directional derivatives, corresponding to directions along the coordinate axes.

If f is differentiable at \mathbf{a}, then the directional derivative exists in every direction \mathbf{u} and equals the dot product of the gradient vector \nabla f(\mathbf{a}) with \mathbf{u}. However, the existence of partial derivatives at \mathbf{a} does not ensure that directional derivatives exist in all directions; full differentiability of f at \mathbf{a} guarantees that directional derivatives exist in every direction.

For instance, consider f(x,y) = xy at the point (1,1) in the direction \mathbf{u} = \left( \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right). Substituting into the definition yields D_{\mathbf{u}} f(1,1) = \lim_{h \to 0} \frac{(1 + h/\sqrt{2})(1 + h/\sqrt{2}) - 1}{h} = \lim_{h \to 0} \frac{1}{h} \left( \frac{2h}{\sqrt{2}} + \frac{h^2}{2} \right) = \sqrt{2}. This value indicates the rate of change of f along the line y = x at (1,1).
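
The following Python sketch compares the limit definition with the gradient formula \nabla f(\mathbf{a}) \cdot \mathbf{u} for this example; both values are approximately \sqrt{2}.

```python
import math

# Directional derivative of f(x, y) = x*y at (1, 1) in direction u = (1/√2, 1/√2).
f = lambda x, y: x * y
ux, uy = 1 / math.sqrt(2), 1 / math.sqrt(2)

h = 1e-6
limit_value = (f(1 + h * ux, 1 + h * uy) - f(1, 1)) / h
grad_dot_u = 1 * ux + 1 * uy            # gradient at (1,1) is (y, x) = (1, 1)
print(limit_value, grad_dot_u, math.sqrt(2))
```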

Total derivative

The total derivative provides the best linear approximation to a multivariable function near a given point, capturing how the function changes when all input variables vary simultaneously. For a function f: \mathbb{R}^n \to \mathbb{R}^m defined on an open set containing a \in \mathbb{R}^n, the total derivative at a is the linear map Df(a): \mathbb{R}^n \to \mathbb{R}^m satisfying Df(a)(h) = \lim_{t \to 0} \frac{f(a + t h) - f(a)}{t} for all h \in \mathbb{R}^n, provided the limit exists uniformly in the direction h. Equivalently, f is differentiable at a if there exists such a linear map where \lim_{h \to 0} \frac{\|f(a + h) - f(a) - Df(a)(h)\|}{\|h\|} = 0, with the norms denoting the Euclidean norms on \mathbb{R}^n and \mathbb{R}^m. This linear map is unique if it exists, and its existence implies that f is continuous at a.

In coordinates, for a scalar-valued function f: \mathbb{R}^n \to \mathbb{R}, the total derivative manifests as the total differential df(a) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(a) \, dx_i, where the dx_i represent increments in the variables x_i, and the partial derivatives \partial f / \partial x_i are evaluated at a. The total derivative exists at a if all partial derivatives exist in some neighborhood of a and are continuous at a, in which case Df(a) is given by the row vector (the gradient) whose components are these partials. For vector-valued functions, the definition extends componentwise, with each component of Df(a)(h) following the scalar case.

As an illustrative example, consider f(x, y) = x^2 + y^2: \mathbb{R}^2 \to \mathbb{R}. The total derivative at (1, 1) is the linear map Df(1,1): \mathbb{R}^2 \to \mathbb{R} such that Df(1,1)(h, k) = 2h + 2k, since the partials are \partial f / \partial x = 2x and \partial f / \partial y = 2y, yielding the total differential df = 2x \, dx + 2y \, dy at (1,1). This approximates the change: f(1 + h, 1 + k) \approx f(1,1) + 2h + 2k = 2 + 2h + 2k for small h, k. The directional derivative of f at a in the direction of a unit vector u is then simply Df(a)(u), extracting a scalar measure of the total change. The total derivative in \mathbb{R}^n serves as the finite-dimensional instance of the Fréchet derivative, which generalizes differentiability to mappings between normed vector spaces via the same limit condition using general norms.
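
The linear approximation provided by the total derivative can be checked numerically; in the sketch below, the approximation error at (1 + h, 1 + k) shrinks much faster than \|(h, k)\|, as the definition requires.

```python
import math

# Linear approximation of f(x, y) = x^2 + y^2 near (1, 1) via its total
# derivative Df(1,1)(h, k) = 2h + 2k; the error is h^2 + k^2 = o(||(h, k)||).
f = lambda x, y: x**2 + y**2
for h, k in [(0.1, 0.1), (0.01, -0.02), (1e-3, 1e-3)]:
    exact = f(1 + h, 1 + k)
    approx = f(1, 1) + 2 * h + 2 * k
    print(exact - approx, math.hypot(h, k))   # error is much smaller than ||(h, k)||
```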

Jacobian matrix

The Jacobian matrix of a differentiable function f: \mathbb{R}^n \to \mathbb{R}^m at a point a \in \mathbb{R}^n is the m \times n matrix whose entries are the partial derivatives of the component functions of f, specifically J_f(a)_{ij} = \frac{\partial f_i}{\partial x_j}(a) for i = 1, \dots, m and j = 1, \dots, n. This matrix provides the concrete matrix representation of the total derivative of f at a, which is the linear map Df(a): \mathbb{R}^n \to \mathbb{R}^m given by matrix-vector multiplication with J_f(a). When m = 1, so f: \mathbb{R}^n \to \mathbb{R} is a scalar-valued function, the Jacobian matrix J_f(a) reduces to a 1 \times n row vector consisting of the partial derivatives \left( \frac{\partial f}{\partial x_1}(a), \dots, \frac{\partial f}{\partial x_n}(a) \right), which is precisely the gradient \nabla f(a) written as a row.

A key property of the Jacobian matrix is its behavior under composition of functions. If g: \mathbb{R}^k \to \mathbb{R}^n and f: \mathbb{R}^n \to \mathbb{R}^m are differentiable at points b \in \mathbb{R}^k and a = g(b) \in \mathbb{R}^n, respectively, then the chain rule states that the Jacobian of the composition f \circ g at b is the matrix product J_{f \circ g}(b) = J_f(a) \, J_g(b).

For example, consider the function f: \mathbb{R}^2 \to \mathbb{R}^2 defined by f(x, y) = (xy, x + y). The Jacobian matrix at a point (x, y) is J_f(x, y) = \begin{pmatrix} y & x \\ 1 & 1 \end{pmatrix}. This follows directly from computing the partial derivatives of each component.

The Jacobian matrix plays a central role in the inverse function theorem for functions between spaces of the same dimension. Specifically, if f: \mathbb{R}^n \to \mathbb{R}^n is continuously differentiable near a point a and \det J_f(a) \neq 0, then f is locally invertible near a, with the inverse also being continuously differentiable, and the Jacobian of the inverse at f(a) is the inverse matrix (J_f(a))^{-1}.
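
The Jacobian in this example can be computed symbolically; the sketch below (assuming SymPy) also evaluates its determinant, y - x, which by the inverse function theorem indicates local invertibility wherever y \neq x.

```python
import sympy as sp

# Jacobian of f(x, y) = (x*y, x + y) computed symbolically.
x, y = sp.symbols('x y')
F = sp.Matrix([x * y, x + y])
J = F.jacobian(sp.Matrix([x, y]))
print(J)                      # Matrix([[y, x], [1, 1]])
print(J.det())                # y - x; f is locally invertible where y != x
```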

Vector-valued functions

A vector-valued function, often denoted as \mathbf{r}(t) = (x_1(t), x_2(t), \dots, x_n(t)), maps a scalar t to a point in \mathbb{R}^n. Its derivative is defined componentwise as \mathbf{r}'(t) = \lim_{h \to 0} \frac{\mathbf{r}(t+h) - \mathbf{r}(t)}{h} = (x_1'(t), x_2'(t), \dots, x_n'(t)), provided the limit exists for each component. This derivative represents the instantaneous rate of change of the position and points in the direction of the tangent to the curve traced by \mathbf{r}(t) at t. The magnitude \|\mathbf{r}'(t)\| gives the speed of the parametrization along the curve.

The velocity vector \mathbf{r}'(t) indicates the direction of motion at each point on the curve, and its normalization \mathbf{T}(t) = \frac{\mathbf{r}'(t)}{\|\mathbf{r}'(t)\|} provides the unit tangent vector, which is useful for describing the curve's geometry without regard to speed. For arc length parameterization, the parameter s is chosen such that the speed is constant and equal to 1, i.e., \|\mathbf{r}'(s)\| = 1, ensuring that increments in s correspond directly to distances traveled along the curve; this is achieved by reparametrizing via the arc length function s(t) = \int_a^t \|\mathbf{r}'(u)\| \, du. Such parametrizations simplify calculations in differential geometry, like curvature.

A classic example is the helix parametrized by \mathbf{r}(t) = (\cos t, \sin t, t), where the derivative is \mathbf{r}'(t) = (-\sin t, \cos t, 1), with constant speed \|\mathbf{r}'(t)\| = \sqrt{2}. To obtain an arc length parametrization, rescale the parameter by s = \sqrt{2}\, t, yielding \mathbf{r}(s/\sqrt{2}) = (\cos(s/\sqrt{2}), \sin(s/\sqrt{2}), s/\sqrt{2}), whose derivative with respect to s has unit norm. For compositions involving vector-valued functions, the multivariable chain rule applies: if \mathbf{f}(\mathbf{u}(t)) where \mathbf{u}(t) is vector-valued, the derivative is the Jacobian matrix of \mathbf{f} evaluated at \mathbf{u}(t) multiplied by \mathbf{u}'(t).
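
A short numerical sketch (assuming NumPy) of the helix example: it confirms the constant speed \sqrt{2} and that the rescaled parametrization has unit speed.

```python
import numpy as np

# The helix r(t) = (cos t, sin t, t): velocity, speed, and the unit-speed
# reparametrization by s = sqrt(2) * t.
def r(t):
    return np.array([np.cos(t), np.sin(t), t])

def r_prime(t):
    return np.array([-np.sin(t), np.cos(t), 1.0])

t = 1.3
print(np.linalg.norm(r_prime(t)))            # sqrt(2), independent of t

def r_arclength(s):                          # reparametrized curve
    return r(s / np.sqrt(2))

h = 1e-6
speed = np.linalg.norm((r_arclength(0.5 + h) - r_arclength(0.5)) / h)
print(speed)                                 # approximately 1
```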

Generalizations

Derivatives in normed spaces

In normed spaces, the classical notion of the derivative is extended to functions f: X \to Y, where X and Y are normed spaces over the real or complex numbers, and the domain of f is an open subset of X. The Fréchet derivative of f at a point a \in X is defined as a bounded linear operator L: X \to Y satisfying \lim_{h \to 0} \frac{\|f(a + h) - f(a) - L(h)\|_Y}{\|h\|_X} = 0. This condition ensures that L provides the best linear approximation to f in a neighborhood of a, generalizing the first-order Taylor expansion to infinite-dimensional settings. The map L is unique if it exists and belongs to the space of bounded linear operators from X to Y.

When X and Y are complete (i.e., Banach spaces), the Fréchet derivative takes values in the space \mathcal{L}(X, Y) of bounded linear operators equipped with the operator norm \|L\| = \sup_{\|h\|_X \leq 1} \|L(h)\|_Y. This framework allows for powerful results analogous to finite-dimensional calculus, such as the chain rule and the inverse and implicit function theorems under appropriate conditions. The total derivative for functions between finite-dimensional normed spaces is a special case of this construction.

A related but weaker concept is the Gâteaux derivative, introduced by René Gâteaux, which at a point a assigns to each h \in X the directional limit L(h) = \lim_{t \to 0} \frac{f(a + th) - f(a)}{t}, provided the limit exists for all h. Unlike the Fréchet derivative, the Gâteaux derivative need not be uniform over directions and does not imply continuity of f at a, though if f is Gâteaux differentiable in a neighborhood of a and the Gâteaux derivative is continuous there, then f is Fréchet differentiable at a.

A simple example occurs for bounded linear maps f: X \to Y between normed spaces, which are Fréchet differentiable everywhere with derivative equal to f itself, since f(a + h) - f(a) - f(h) = 0 for all a, h. In Hilbert spaces, the Riesz representation theorem further specifies that every bounded linear functional (a special case where Y is the scalar field) on a Hilbert space H is of the form L(h) = \langle h, g \rangle for some fixed g \in H, where \langle \cdot, \cdot \rangle is the inner product. This representation aids in explicitly computing derivatives for quadratic forms and other inner product-based functions.
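
As a concrete finite-dimensional sketch of the Fréchet condition, take f(x) = \|x\|^2 on \mathbb{R}^3, whose derivative at a is the bounded linear map L(h) = 2\langle a, h \rangle; the code below (assuming NumPy) checks that the remainder divided by \|h\| tends to 0.

```python
import numpy as np

# For f(x) = ||x||^2 the Fréchet derivative at a is L(h) = 2<a, h>, and the
# remainder ||f(a+h) - f(a) - L(h)|| = ||h||^2 is o(||h||).
a = np.array([1.0, -2.0, 0.5])
f = lambda x: float(np.dot(x, x))
L = lambda h: 2.0 * float(np.dot(a, h))

rng = np.random.default_rng(0)
for scale in [1e-1, 1e-3, 1e-5]:
    h = scale * rng.standard_normal(3)
    remainder = abs(f(a + h) - f(a) - L(h))
    print(remainder / np.linalg.norm(h))     # tends to 0 as ||h|| -> 0
```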

Distributional derivatives

In the theory of distributions, introduced by Laurent Schwartz, a distribution is defined as a continuous linear functional on the space of test functions \mathcal{D}(\Omega), consisting of infinitely differentiable functions with compact support in an open set \Omega \subseteq \mathbb{R}^n. This framework extends the notion of functions to generalized objects that can handle singularities and discontinuities, allowing differentiation even when classical derivatives do not exist. The distributional derivative of a distribution T is uniquely defined by the relation \langle T', \phi \rangle = -\langle T, \phi' \rangle for every test function \phi \in \mathcal{D}(\Omega), where \langle \cdot, \cdot \rangle denotes the action of the distribution on the test function. This operation is linear and continuous on distributions, and for sufficiently smooth functions f, the distributional derivative coincides with the classical derivative via integration by parts, without boundary terms due to the compact support of \phi. Higher-order distributional derivatives are obtained by iterated application of this operator, preserving linearity and continuity.

A prominent example is the Heaviside step function H(x), defined as H(x) = 0 for x < 0 and H(x) = 1 for x \geq 0, which is not classically differentiable at x = 0. Its distributional derivative is the Dirac delta distribution \delta, satisfying \langle H', \phi \rangle = -\int_0^\infty \phi'(x) \, dx = \phi(0) = \langle \delta, \phi \rangle for all test functions \phi. This illustrates how distributional derivatives capture impulsive behavior at discontinuities, with \delta acting as a "point mass" that integrates test functions to their value at the origin.

Distributional derivatives underpin the definition of Sobolev spaces W^{k,p}(\Omega), which comprise functions u \in L^p(\Omega) such that all weak (or distributional) derivatives up to order k belong to L^p(\Omega), equipped with a norm incorporating these derivatives. The weak derivative D^\alpha u of order \alpha (with |\alpha| \leq k) satisfies \int_\Omega u \, D^\alpha \phi \, dx = (-1)^{|\alpha|} \int_\Omega (D^\alpha u) \phi \, dx for all \phi \in \mathcal{D}(\Omega), generalizing differentiation to functions lacking classical smoothness. These spaces enable the study of functions with controlled irregularity, such as those in H^k(\Omega) = W^{k,2}(\Omega), which form Hilbert spaces useful for variational formulations.

In applications to partial differential equations (PDEs), distributional derivatives are essential for defining weak solutions where classical derivatives fail, such as in hyperbolic conservation laws exhibiting shock waves or discontinuities. For instance, the inviscid Burgers' equation u_t + (u^2/2)_x = 0 admits solutions in the distributional sense, allowing existence arguments via mollification and passage to limits, even across shocks where derivatives diverge. This approach facilitates the analysis of fundamental solutions and Green's functions for elliptic and hyperbolic PDEs, bridging generalized functions with physical phenomena like wave propagation.
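
The action of H' on a test function can be approximated numerically; the sketch below (assuming SciPy, and using a standard smooth bump function as the test function) evaluates -\int_0^\infty \phi'(x)\,dx and compares it with \phi(0).

```python
import math
from scipy.integrate import quad

# Numerical sketch of <H', phi> = -<H, phi'> = -∫_0^∞ phi'(x) dx = phi(0),
# with the bump test function phi(x) = exp(-1/(1 - x^2)) supported on (-1, 1).
def phi(x):
    if 1.0 - x * x <= 0.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def phi_prime(x):
    if 1.0 - x * x <= 0.0:
        return 0.0
    return phi(x) * (-2.0 * x) / (1.0 - x * x) ** 2   # chain rule on (-1, 1)

action_of_H_prime, _ = quad(lambda x: -phi_prime(x), 0.0, 1.0)
print(action_of_H_prime, phi(0.0))     # both approximately e^{-1} ≈ 0.368
```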