
Subderivative

In convex analysis, the subderivative generalizes the notion of the derivative to functions that may not be differentiable at certain points, providing a set of slopes at a given point x that bound the function from below. For a function f: \mathbb{R}^n \to \mathbb{R}, a vector g \in \mathbb{R}^n is a subderivative (also called a subgradient) of f at x if it satisfies the inequality f(z) \geq f(x) + g^T (z - x) for all z in the domain of f. The collection of all such subderivatives at x forms the subdifferential \partial f(x), which is a nonempty, closed, and convex set when f is convex and x lies in the relative interior of the domain. When f is differentiable at x, the subdifferential reduces to the singleton set \{\nabla f(x)\}, aligning precisely with the standard gradient. A classic example is the function f(x) = |x|, where the subdifferential at x = 0 is the interval [-1, 1], reflecting the range of possible supporting line slopes at the nondifferentiable kink. In convex optimization, the subderivative plays a crucial role: a point x^* minimizes f if and only if 0 \in \partial f(x^*), enabling algorithms like subgradient descent to handle nonsmooth objectives. This framework extends classical differential calculus to broader classes of functions, supporting applications in optimization, machine learning, and variational analysis.

Fundamentals

Definition in One Dimension

In the one-dimensional case, consider a convex function f: I \to \mathbb{R}, where I is an open interval in \mathbb{R}. A real number c is a subderivative of f at a point x_0 \in I if it satisfies the inequality f(x) \geq f(x_0) + c(x - x_0) for all x \in I. The set of all such subderivatives at x_0, known as the subdifferential \partial f(x_0), forms a closed interval [d^-_f(x_0), d^+_f(x_0)], where d^-_f(x_0) = \lim_{x \to x_0^-} \frac{f(x) - f(x_0)}{x - x_0} and d^+_f(x_0) = \lim_{x \to x_0^+} \frac{f(x) - f(x_0)}{x - x_0} are the left- and right-hand derivatives. For convex functions, these one-sided limits exist and are finite at every x_0 \in I, ensuring the subdifferential is nonempty there. When f is differentiable at x_0, the subdifferential reduces to the singleton \{f'(x_0)\}, as the left and right derivatives coincide with the standard derivative. Geometrically, each subderivative c \in \partial f(x_0) corresponds to the slope of a supporting line to the graph of f at the point (x_0, f(x_0)), lying below the graph and touching it at that point.
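As an illustration, the one-sided difference quotients that delimit this interval can be approximated numerically. The following sketch (the helper name one_sided_derivatives is a hypothetical choice, not a standard library function) estimates the left and right derivatives of a convex function by shrinking the step size, recovering approximately [-1, 1] at the kink of |x| and a singleton elsewhere.

```python
def one_sided_derivatives(f, x0, h0=1e-2, steps=20):
    """Approximate the left and right derivatives of a convex f at x0
    by evaluating difference quotients at geometrically shrinking step h."""
    left, right = None, None
    h = h0
    for _ in range(steps):
        left = (f(x0 - h) - f(x0)) / (-h)   # slope of a secant from the left
        right = (f(x0 + h) - f(x0)) / h     # slope of a secant from the right
        h /= 2.0
    return left, right

# The subdifferential of f at x0 is approximately the interval [left, right].
f = abs
print(one_sided_derivatives(f, 0.0))   # roughly (-1.0, 1.0): the interval [-1, 1]
print(one_sided_derivatives(f, 2.0))   # roughly (1.0, 1.0): the singleton {1}
```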

Prerequisites: Convex Functions

A convex function f: I \to \mathbb{R}, where I \subseteq \mathbb{R} is an interval, satisfies the inequality f(\lambda x + (1-\lambda)y) \leq \lambda f(x) + (1-\lambda) f(y) for all x, y \in I and all \lambda \in [0,1]. This condition captures the intuitive notion that the graph of f never rises above the straight line joining any two points on it. Equivalent characterizations of convexity include the graph of f lying below its chords (secant lines), meaning that for x < z < y in I the secant slopes satisfy \frac{f(z) - f(x)}{z - x} \leq \frac{f(y) - f(x)}{y - x}. Another key formulation is Jensen's inequality, which extends the definition to convex combinations: for points x_1, \dots, x_n \in I and weights \lambda_1, \dots, \lambda_n \geq 0 with \sum \lambda_i = 1, f\left( \sum \lambda_i x_i \right) \leq \sum \lambda_i f(x_i). Convex functions exhibit several important regularity properties. On the interior of their domain (an open interval), they are continuous. Furthermore, they are locally Lipschitz continuous there, meaning that around any interior point there exists a neighborhood on which |f(x) - f(y)| \leq L |x - y| for some constant L > 0. At every interior point, the one-sided derivatives exist: the right-hand derivative f'_+(x) = \lim_{h \to 0^+} \frac{f(x+h) - f(x)}{h} and the left-hand derivative f'_-(x) = \lim_{h \to 0^-} \frac{f(x+h) - f(x)}{h}, and both are nondecreasing functions of x. These structural properties of convex functions, particularly the convexity of the epigraph, guarantee the existence of supporting hyperplanes (in one dimension, supporting lines) at every point in the relative interior of the domain, providing the foundational mechanism by which subderivatives characterize behavior at nondifferentiable points.
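The defining inequality lends itself to a simple numerical spot check. The sketch below, assuming a hypothetical helper is_convex_on_samples, randomly tests the chord inequality on an interval; passing such a test does not prove convexity, but a single violation refutes it.

```python
import random

def is_convex_on_samples(f, lo, hi, trials=10_000, tol=1e-9):
    """Randomly test f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y) on [lo, hi]."""
    for _ in range(trials):
        x, y = random.uniform(lo, hi), random.uniform(lo, hi)
        lam = random.random()
        lhs = f(lam * x + (1 - lam) * y)
        rhs = lam * f(x) + (1 - lam) * f(y)
        if lhs > rhs + tol:
            return False  # found a violation of the chord inequality
    return True

print(is_convex_on_samples(abs, -5, 5))             # True: |x| is convex
print(is_convex_on_samples(lambda x: x**3, -5, 5))  # False: x^3 is not convex on [-5, 5]
```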

Examples and Illustrations

Absolute Value Function

The function f(x) = |x|, defined on \mathbb{R}, serves as a fundamental example in convex analysis, illustrating a function that is convex yet nondifferentiable at the origin. This nondifferentiability arises from the sharp corner at x = 0, where the left-hand derivative equals -1 and the right-hand derivative equals 1, precluding a unique derivative. In the context of subderivatives, the subdifferential generalizes the derivative to capture this behavior through a set of supporting slopes. The subdifferential of f at a point x_0 is computed as follows: for x_0 > 0, \partial f(x_0) = \{1\}; for x_0 < 0, \partial f(x_0) = \{-1\}; and at x_0 = 0, \partial f(0) = [-1, 1]. These singletons away from zero align with the function's linear segments, where the slope is constant, while the interval at zero reflects the range of possible subgradients accommodating the kink. Verification of these computations relies on the defining inequality of the subdifferential: a number c belongs to \partial f(0) if and only if |x| \geq c x for all x \in \mathbb{R}. This holds exactly when c \in [-1, 1], as the cases x > 0 (yielding c \leq 1) and x < 0 (yielding c \geq -1) bound c precisely within the interval, with the supporting lines y = c x touching the graph at x = 0. Similarly, for x_0 > 0, c = 1 satisfies |x| \geq x_0 + 1 \cdot (x - x_0) = x for all x, confirming the singleton subdifferential. Geometrically, the subdifferential at zero represents all possible slopes of lines through the origin that lie below the graph of f and support it there, underscoring how subderivatives extend differentiability to nonsmooth points while preserving convexity. This interval nature highlights the multiplicity of linear underapproximators at nondifferentiable vertices, a key feature distinguishing subderivatives from classical derivatives.
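These computations can be checked directly against the defining inequality. The following sketch (the helper is_subgradient is a hypothetical name) tests candidate slopes for |x| on a grid of sample points; this is a necessary-condition spot check rather than a proof.

```python
import numpy as np

def is_subgradient(f, c, x0, xs):
    """Check the defining inequality f(x) >= f(x0) + c*(x - x0) on sample points xs."""
    return all(f(x) >= f(x0) + c * (x - x0) - 1e-12 for x in xs)

xs = np.linspace(-10, 10, 2001)

# At x0 = 0 every slope c in [-1, 1] is a subgradient of |x|; slopes outside are not.
print(is_subgradient(abs, 0.5, 0.0, xs))   # True
print(is_subgradient(abs, -1.0, 0.0, xs))  # True
print(is_subgradient(abs, 1.2, 0.0, xs))   # False (violated e.g. at x = 1)

# Away from zero the subdifferential is a singleton: only the local slope works.
print(is_subgradient(abs, 1.0, 3.0, xs))   # True
print(is_subgradient(abs, 0.9, 3.0, xs))   # False (violated e.g. at x = 2)
```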

Piecewise Linear Functions

Piecewise linear convex functions provide a natural class of examples for illustrating subderivatives beyond simple cases like the absolute value, as they feature multiple points of nondifferentiability where the function transitions between linear segments. In one dimension, a piecewise linear convex function is a continuous function f: \mathbb{R} \to \mathbb{R} that is affine (linear plus constant) on each of a finite number of contiguous subintervals partitioning the domain, with the slopes of the affine pieces non-decreasing so as to ensure convexity. Such functions can equivalently be expressed as the pointwise maximum of a finite collection of affine functions, inheriting convexity from this representation. At points interior to a linear segment, the subderivative coincides with the usual derivative, which is the constant slope of that segment, yielding a singleton subdifferential. However, at the kink points where two adjacent segments meet, the subdifferential forms a closed interval comprising all values between the left-hand derivative (the slope of the left segment) and the right-hand derivative (the slope of the right segment). For instance, consider the function f(x) = \max\{0, x - 1\}, which is zero for x \leq 1 (slope 0) and equals x - 1 for x > 1 (slope 1). At the kink x = 1, the left derivative is 0 and the right derivative is 1, so the subdifferential is \partial f(1) = [0, 1]. This captures all supporting lines at the kink that lie below the graph, consistent with the subderivative definition for convex functions. In general, for a piecewise linear convex function with ordered breakpoints x_1 < x_2 < \cdots < x_k and corresponding non-decreasing slopes m_1 \leq m_2 \leq \cdots \leq m_{k+1} on the intervals (-\infty, x_1], (x_1, x_2], \ldots, (x_k, \infty), the subdifferential at a breakpoint x_i is the interval [m_i, m_{i+1}], reflecting the range of possible slopes from the adjacent pieces. This computation highlights how subderivatives handle multiple nondifferentiable points by localizing the analysis to neighboring segments. Visually, the subderivatives at any point correspond to the slopes of the linear pieces immediately adjacent to it, providing a geometric interpretation of the supporting hyperplanes (lines in one dimension) that touch the graph at that point while lying on or below it elsewhere.
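Since a piecewise linear convex function can be written as a pointwise maximum of affine pieces, its subdifferential at a point is the interval spanned by the slopes of the pieces active (attaining the maximum) there. The sketch below implements this observation for f(x) = \max\{0, x - 1\}; the function name and tolerance are illustrative choices.

```python
def subdifferential_max_affine(pieces, x, tol=1e-9):
    """pieces: list of (slope m, intercept c) defining f(x) = max_i (m_i*x + c_i).
    Returns the subdifferential of f at x as an interval (lo, hi): the smallest
    and largest slope among the pieces attaining the maximum at x."""
    values = [m * x + c for (m, c) in pieces]
    fx = max(values)
    active_slopes = [m for (m, c), v in zip(pieces, values) if v >= fx - tol]
    return min(active_slopes), max(active_slopes)

# f(x) = max{0, x - 1}: a slope-0 piece and a slope-1 piece.
pieces = [(0.0, 0.0), (1.0, -1.0)]
print(subdifferential_max_affine(pieces, 0.0))  # (0.0, 0.0): singleton {0}
print(subdifferential_max_affine(pieces, 1.0))  # (0.0, 1.0): the kink interval [0, 1]
print(subdifferential_max_affine(pieces, 2.0))  # (1.0, 1.0): singleton {1}
```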

Properties

Basic Properties

For a convex function f: \mathbb{R} \to \mathbb{R} that is finite at a point x_0, the subdifferential \partial f(x_0) is the nonempty closed interval consisting of all subderivatives (subgradients) at x_0. This interval is nonempty because every finite convex function on \mathbb{R} admits at least one subgradient at every point. Furthermore, \partial f(x_0) is closed, since it is the intersection over all x of the closed sets \{c : f(x) \geq f(x_0) + c(x - x_0)\}, each expressing one supporting-line condition on the epigraph at (x_0, f(x_0)). A key characterization links the subdifferential to classical differentiability: if \partial f(x_0) is a singleton \{c\}, then f is differentiable at x_0 with f'(x_0) = c, and conversely, differentiability at x_0 implies \partial f(x_0) = \{f'(x_0)\}. This equivalence shows how subderivatives generalize derivatives for nondifferentiable convex functions. The subdifferential also provides a necessary and sufficient condition for global minima in convex optimization: x_0 is a global minimizer of f if and only if 0 \in \partial f(x_0). For instance, for the absolute value function f(x) = |x|, the subdifferential at the minimizer x_0 = 0 is [-1, 1], which contains 0. Subdifferentials are additive under summation of convex functions: for convex functions f and g finite on \mathbb{R}, the subdifferential of their sum satisfies \partial (f + g)(x_0) = \partial f(x_0) + \partial g(x_0), where the sum of intervals is the set of all pairwise sums. This property, which holds without additional qualification conditions for finite convex functions on \mathbb{R}, facilitates the analysis of composite convex models.
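The optimality condition and the one-dimensional sum rule combine into a small interval computation. The sketch below, with hypothetical helper names, locates the minimizer of h(x) = |x| + (x - 1)^2 by testing where 0 lies in \partial h(x) = \partial |x| + \{2(x - 1)\}.

```python
def subdiff_abs(x):
    """Subdifferential of |x| as an interval (lo, hi)."""
    if x > 0:
        return (1.0, 1.0)
    if x < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)

def shift_interval(interval, scalar_grad):
    """Sum rule in one dimension: add the singleton {scalar_grad} to an interval."""
    lo, hi = interval
    return (lo + scalar_grad, hi + scalar_grad)

def contains_zero(interval):
    lo, hi = interval
    return lo <= 0.0 <= hi

# h(x) = |x| + (x - 1)^2, so the subdifferential is d|x| + {2(x - 1)}.
for x in (0.0, 0.25, 0.5, 1.0):
    interval = shift_interval(subdiff_abs(x), 2.0 * (x - 1.0))
    print(x, interval, "<- minimizer" if contains_zero(interval) else "")
# Only x = 0.5 satisfies 0 in dh(x), so it is the global minimizer of h.
```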

Monotonicity and Boundedness

In one dimension, the subdifferential of a convex function f: I \to \mathbb{R}, where I is an interval, takes the form of a closed interval \partial f(x) = [f'_-(x), f'_+(x)], with the left and right derivatives f'_-(x) and f'_+(x) being nondecreasing functions of x (Rockafellar 1970). This monotonicity implies that for x_1 < x_2, \sup \partial f(x_1) = f'_+(x_1) \leq f'_-(x_2) = \inf \partial f(x_2), so the subdifferential intervals are ordered in a nondecreasing manner along the domain (Rockafellar 1970). Such properties highlight how subderivatives preserve the order-preserving behavior inherent to convex functions, distinguishing them from derivatives of nonconvex functions, which need not exhibit this consistency. For locally Lipschitz continuous convex functions, the subdifferential is locally bounded: on any compact subset of the interior of the domain there exists a constant M > 0 such that |g| \leq M for all g \in \partial f(x) and all x in that subset (Rockafellar 1970). This boundedness relates directly to the Lipschitz constant of f, as any subgradient g \in \partial f(x) satisfies |g| \leq L whenever L is a local Lipschitz constant bounding the rate of change of f near x (Rockafellar and Wets 1998). Consequently, the subderivatives provide a quantitative measure of the function's local variation, with the supremum of subgradient magnitudes over a set equal to the minimal Lipschitz constant of f on that set (Rockafellar and Wets 1998). Under additional regularity, such as when f is differentiable (so that \partial f(x) = \{f'(x)\} with f' continuous), the subdifferential mapping x \mapsto \partial f(x) is Hausdorff continuous, meaning small perturbations in x lead to small Hausdorff distances between the corresponding subdifferentials (Thibault 2010). More generally, for proper convex functions, the mapping is upper Hausdorff semicontinuous on the interior of the domain and lower semicontinuous at points of differentiability, ensuring stability in variational problems (Rockafellar 1970).
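A quick numerical check of this ordering for f(x) = |x| is sketched below; the helper subdiff_abs simply encodes the known closed-form subdifferential, and the assertion verifies that the intervals never overlap out of order.

```python
def subdiff_abs(x):
    """Subdifferential of |x| as the interval [left derivative, right derivative]."""
    if x > 0:
        return (1.0, 1.0)
    if x < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)

# Monotonicity: for x1 < x2, sup of the subdifferential at x1 is at most
# the inf of the subdifferential at x2.
points = [-2.0, -0.5, 0.0, 0.3, 1.7]
for x1, x2 in zip(points, points[1:]):
    g1_hi = subdiff_abs(x1)[1]   # sup of the subdifferential at x1
    g2_lo = subdiff_abs(x2)[0]   # inf of the subdifferential at x2
    assert g1_hi <= g2_lo, (x1, x2)
print("ordered subdifferential intervals confirmed for |x|")
```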

Multivariable Generalization

Subgradients in Higher Dimensions

In the multivariable setting, the concept of a subgradient extends naturally to convex functions f: \mathbb{R}^n \to \mathbb{R}. A vector v \in \mathbb{R}^n is a subgradient of f at a point x_0 \in \dom f if it satisfies the inequality f(x) \geq f(x_0) + v \cdot (x - x_0) for all x \in \dom f. This generalizes the one-dimensional subderivative, where the inner product reduces to ordinary multiplication. Geometrically, a subgradient v at x_0 corresponds to a nonvertical supporting hyperplane to the epigraph of f at the point (x_0, f(x_0)), with normal vector (v, -1). The epigraph, defined as \epi f = \{ (x, \alpha) \in \mathbb{R}^n \times \mathbb{R} \mid \alpha \geq f(x) \}, is a convex set, and the hyperplane with equation \alpha = f(x_0) + v \cdot (x - x_0) lies below the epigraph while touching it at that point. This interpretation underscores the role of subgradients in separating the graph of f from points above it. Subgradients also relate to directional derivatives for convex functions. The directional derivative of f at x_0 in a direction d \in \mathbb{R}^n is given by f'(x_0; d) = \lim_{t \to 0^+} \frac{f(x_0 + t d) - f(x_0)}{t}, which exists and is finite for finite convex f. Any subgradient v satisfies v \cdot d \leq f'(x_0; d), providing a lower bound on the rate of change in that direction. For convex functions, subgradients are guaranteed to exist at relative interior points of the domain: if x_0 lies in the relative interior of \dom f, then there exists at least one subgradient at x_0, as ensured by the supporting hyperplane theorem applied to the epigraph. This existence holds even when f is not differentiable at x_0, distinguishing subgradients from classical gradients.
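The subgradient inequality can again be spot-checked numerically in higher dimensions. The sketch below (helper names are illustrative) tests candidate subgradients of f(x) = \|x\|_1 at a point with a zero coordinate, where the subdifferential contains many vectors.

```python
import numpy as np

def is_subgradient(f, v, x0, samples):
    """Check f(x) >= f(x0) + v.(x - x0) for a candidate subgradient v at x0
    over a finite set of sample points (a necessary-condition spot check)."""
    fx0 = f(x0)
    return all(f(x) >= fx0 + v @ (x - x0) - 1e-12 for x in samples)

f = lambda x: np.sum(np.abs(x))        # f(x) = ||x||_1, convex and nonsmooth
rng = np.random.default_rng(0)
samples = rng.uniform(-5, 5, size=(5000, 3))

x0 = np.array([2.0, -1.0, 0.0])
v_good = np.array([1.0, -1.0, 0.3])    # sign(x0_i) where x0_i != 0, any value in [-1, 1] at 0
v_bad = np.array([1.0, -1.0, 1.5])     # third component falls outside [-1, 1]

print(is_subgradient(f, v_good, x0, samples))  # True
print(is_subgradient(f, v_bad, x0, samples))   # False: a violating sample is found
```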

The Subdifferential Set

In the multivariable setting, the subdifferential of a convex function f: \mathbb{R}^n \to \mathbb{R} at a point x_0 is the set \partial f(x_0) = \{ v \in \mathbb{R}^n \mid f(x) \geq f(x_0) + v \cdot (x - x_0) \ \forall x \in \dom f \}. This set collects all subgradients v that satisfy the subgradient inequality at x_0, generalizing the one-dimensional case to higher dimensions where multiple subgradients may exist simultaneously. For proper convex functions, \partial f(x_0) is convex and closed, it is nonempty when x_0 lies in the relative interior of \dom f, and it is compact when x_0 lies in the interior of \dom f (in finite dimensions). A key characterization of membership links the subdifferential to the directional derivative: v \in \partial f(x_0) if and only if f'(x_0; d) \geq v \cdot d for all d \in \mathbb{R}^n. Equivalently, the directional derivative is the support function of the subdifferential, f'(x_0; d) = \sup \{ v \cdot d \mid v \in \partial f(x_0) \}, highlighting how the subdifferential encodes the first-order behavior of f in all directions. This property underscores the subdifferential's role as a set-valued first-order object that bounds the rates of change of f. When f is differentiable at x_0, the subdifferential reduces to a singleton containing the gradient: \partial f(x_0) = \{ \nabla f(x_0) \}. In this case, the compact convex set collapses to a single point, recovering the classical gradient as the unique subgradient. The subdifferential is additive under sums of convex functions: for convex functions f and g (under a mild qualification condition, automatically satisfied when both are finite on \mathbb{R}^n), \partial (f + g)(x_0) = \partial f(x_0) + \partial g(x_0), where the right-hand side denotes the Minkowski sum of the sets. This rule facilitates the analysis of composite functions by decomposing their subdifferentials into sums of simpler components, preserving convexity and compactness in the result.
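The support-function identity can be verified numerically for f(x) = \|x\|_1, whose subdifferential at x_0 is the product of \{\operatorname{sign}(x_{0,i})\} over nonzero coordinates and [-1, 1] over zero coordinates. The sketch below compares a finite-difference directional derivative with the closed-form support function; the helper names are illustrative, and the two printed values should agree up to small numerical error.

```python
import numpy as np

def dir_deriv(f, x0, d, t=1e-7):
    """One-sided directional derivative f'(x0; d) via a small forward step."""
    return (f(x0 + t * d) - f(x0)) / t

def support_of_subdiff_l1(x0, d):
    """Support function of the subdifferential of ||.||_1 at x0 in direction d:
    sum of sign(x0_i)*d_i over nonzero coordinates plus sum of |d_i| over zeros."""
    nonzero = x0 != 0
    return np.sign(x0[nonzero]) @ d[nonzero] + np.sum(np.abs(d[~nonzero]))

f = lambda x: np.sum(np.abs(x))
x0 = np.array([2.0, 0.0, -3.0])
for d in (np.array([1.0, 1.0, 1.0]),
          np.array([-1.0, -2.0, 0.5]),
          np.array([0.0, 1.0, -1.0])):
    print(dir_deriv(f, x0, d), support_of_subdiff_l1(x0, d))  # the two values agree
```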

Applications

In Convex Optimization

In convex optimization, the subdifferential plays a central role in characterizing optimality for nondifferentiable convex functions. For a convex function f: \mathbb{R}^n \to \mathbb{R}, a point x^* minimizes f if and only if 0 \in \partial f(x^*), where \partial f(x^*) is the subdifferential at x^*. This condition generalizes the first-order optimality criterion for smooth convex functions, where the gradient vanishes at the minimizer, to cases where f may not be differentiable, such as piecewise linear or norm-based objectives. Subgradient methods provide iterative algorithms for minimizing such nondifferentiable functions by leveraging elements of the subdifferential. At each iteration k, a subgradient g_k \in \partial f(x_k) is selected and the update proceeds as x_{k+1} = x_k - \alpha_k g_k, where \alpha_k > 0 is a step size. Unlike the negative gradient in smooth optimization, -g_k need not be a descent direction; instead, the subgradient inequality yields \|x_{k+1} - x^*\|^2 \leq \|x_k - x^*\|^2 - 2\alpha_k (f(x_k) - f^*) + \alpha_k^2 \|g_k\|^2 for any minimizer x^*, showing that sufficiently small steps bring the iterates closer to the solution set, which is the basis of the convergence analysis. These methods are particularly useful for large-scale problems where computing the full subdifferential is infeasible, as any single subgradient suffices for progress. Under appropriate step size rules, the subgradient method converges for convex functions with bounded subgradients: if the step sizes satisfy \sum_{k=1}^\infty \alpha_k = \infty and \sum_{k=1}^\infty \alpha_k^2 < \infty (square-summable but not summable), the best function value among the iterates converges to the optimal value f^*, while a constant step size tuned to a fixed horizon K gives f(\bar{x}_K) - f^* = O(1/\sqrt{K}) for the running average \bar{x}_K = \frac{1}{K} \sum_{k=1}^K x_k, a sublinear rate to the global minimum. A representative application is minimizing the \ell_1 residual \|Ax - b\|_1 = \sum_{i=1}^m |(Ax - b)_i| of a linear system Ax \approx b, a nonsmooth objective that is robust to outliers and tends to produce sparse residuals, with uses in robust regression and signal processing. Here, the subdifferential of |t| at t = 0 is the interval [-1, 1], so zero residual components contribute the segment \{t a_i : t \in [-1, 1]\} (with a_i the i-th row of A) to the subdifferential; a valid subgradient is A^\top s with s_i = \operatorname{sign}((Ax - b)_i) for nonzero residuals and any s_i \in [-1, 1] (for example 0) where the residual vanishes, enabling iterative descent toward a minimizer.
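A minimal sketch of this approach is given below: a subgradient method for minimizing \|Ax - b\|_1 with normalized diminishing steps, applied to a small synthetic, consistent system (so the optimal value is 0). The data, step-size constant, and iteration count are illustrative choices rather than prescriptions. Tracking the best iterate matters because individual subgradient steps can increase the objective.

```python
import numpy as np

def l1_residual(A, b, x):
    return np.sum(np.abs(A @ x - b))

def subgradient_method(A, b, x0, iters=5000):
    """Basic subgradient method for min_x ||Ax - b||_1 with normalized
    diminishing steps alpha_k = 1/sqrt(k); returns the best iterate found."""
    x = x0.copy()
    best_x, best_val = x.copy(), l1_residual(A, b, x)
    for k in range(1, iters + 1):
        r = A @ x - b
        g = A.T @ np.sign(r)              # a subgradient of ||Ax - b||_1 at x
        gn = np.linalg.norm(g)
        if gn == 0:                        # 0 is a subgradient: x is optimal
            break
        x = x - (1.0 / np.sqrt(k)) * g / gn
        val = l1_residual(A, b, x)
        if val < best_val:
            best_x, best_val = x.copy(), val
    return best_x, best_val

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 5))
x_true = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
b = A @ x_true                             # consistent system: optimal value is 0
x_hat, val = subgradient_method(A, b, np.zeros(5))
print(l1_residual(A, b, np.zeros(5)), "->", val)   # large initial value -> near 0
```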

In Nonsmooth Analysis

In nonsmooth analysis, subderivatives, or subgradients, extend classical calculus rules to nondifferentiable functions, enabling the study of compositions and other operations without smoothness assumptions. A key result is the chain rule for the subdifferential of a composite f \circ g, which states that under suitable qualification conditions, such as the inner value g(x) lying in the relative interior of the domain of f, the inclusion \partial (f \circ g)(x) \subseteq \partial f(g(x)) \cdot \partial g(x) holds, where \cdot denotes the set product. This rule, applicable to convex functions and, via limiting or approximate subdifferentials, to more general settings, allows differentiation of nonsmooth compositions while accounting for potential set-valuedness. Subderivatives also furnish first-order approximations of nonsmooth functions, particularly in the study of variational inequalities, where the subgradient inequality f(z) \geq f(x) + \langle g, z - x \rangle for g \in \partial f(x) serves as a linear underestimator of the function's behavior. In this context, such approximations characterize solutions to variational inequalities involving set-valued operators derived from subdifferentials, providing necessary and sufficient conditions for solutions of nonsmooth systems. This approach is fundamental for analyzing inclusions and generalized equations without requiring differentiability. Subderivatives connect to smoothing techniques via proximal operators and the Moreau envelope, which regularizes a nonsmooth function f as e_\lambda f(x) = \inf_y \left\{ f(y) + \frac{1}{2\lambda} \|y - x\|^2 \right\}, yielding a differentiable surrogate whose gradient relates to the resolvent of the subdifferential: \nabla e_\lambda f(x) = \frac{x - \operatorname{prox}_{\lambda f}(x)}{\lambda}, with \operatorname{prox}_{\lambda f}(x) = \operatorname{argmin}_y \left\{ f(y) + \frac{1}{2\lambda} \|y - x\|^2 \right\}. This linkage facilitates the computation of approximate subgradients through the proximal mapping, bridging exact nonsmooth calculus with smooth approximations for theoretical and numerical purposes. In applications, subderivatives model nonsmoothness in optimal control, such as hybrid systems or dynamics with nonsmooth right-hand sides, where generalized necessary conditions use Clarke or limiting subdifferentials to handle trajectories involving jumps or switches. Similarly, in economics, subderivatives capture kinked preferences in utility functions, representing non-differentiable indifference curves at threshold levels, enabling variational analysis of choice and equilibrium under incomplete differentiability.
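For f(x) = |x| the proximal operator has the closed form of soft-thresholding, which makes the envelope-gradient relation easy to verify numerically. The sketch below (with an illustrative choice \lambda = 0.5) shows that \nabla e_\lambda f(x) always lands in \partial f at the proximal point: equal to \operatorname{sign} of the prox when it is nonzero, and inside [-1, 1] when the prox is zero.

```python
import numpy as np

lam = 0.5

def prox_abs(x, lam):
    """Proximal operator of f(x) = |x|: soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def moreau_env_grad(x, lam):
    """Gradient of the Moreau envelope of |.| at x, via the prox formula."""
    return (x - prox_abs(x, lam)) / lam

for x in (-2.0, -0.3, 0.0, 0.2, 1.5):
    p = prox_abs(x, lam)
    g = moreau_env_grad(x, lam)
    # g lies in the subdifferential of |.| at p:
    #   {sign(p)} if p != 0, and the interval [-1, 1] if p == 0.
    print(f"x={x:5.2f}  prox={p:5.2f}  grad_envelope={g:5.2f}")
```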

Historical Development

Origins in Convex Analysis

The concept of subderivatives, more commonly known as subgradients in the context of convex analysis, traces its early roots to the work of Werner Fenchel in the 1940s and 1950s, where he laid foundational ideas through the development of conjugate functions and the theory of supporting hyperplanes for convex sets. Fenchel's 1949 paper introduced the conjugate (Legendre-Fenchel) transformation, which establishes a duality between a convex function and its conjugate, providing a framework for understanding the supporting properties of epigraphs via hyperplanes. This duality implicitly underpins the notion of subgradients, as the points where the primal and conjugate functions achieve equality correspond to supporting hyperplanes, a connection later formalized in subdifferential theory. Fenchel's contributions, built on earlier geometric studies of convex bodies, emphasized the role of these hyperplanes in characterizing convexity without relying on differentiability. The formal introduction of subgradients for convex functions occurred in the early 1960s, primarily through the independent efforts of Jean-Jacques Moreau and R. Tyrrell Rockafellar, who defined the subdifferential as the set of vectors providing linear underestimators of the function at a given point. Moreau's 1962 paper introduced dual convex functions and proximal points in Hilbert spaces, providing insights into the geometry of convex functionals and optimality conditions in variational problems, which laid important groundwork for later developments in subdifferential theory. Rockafellar extended this in his seminal 1970 monograph Convex Analysis, where he rigorously defined subgradients and the subdifferential operator for proper convex functions, establishing their role as a generalization of gradients for nondifferentiable cases. These definitions enabled the treatment of problems where classical derivatives fail, such as those involving indicator functions or norms. This development was motivated by the growing need in operations research during the mid-20th century to solve nondifferentiable programming problems arising in network flows, economic modeling, and related areas, where traditional gradient-based methods were inadequate. In the 1960s, as mathematical programming expanded beyond linear programming to nonlinear models, researchers sought tools to handle piecewise linear or polyhedral objective functions common in practical applications such as transportation. Subgradients provided a way to compute descent directions and certify optimality in these settings, bridging convex-analytic theory with computational needs in the field. Rockafellar's Convex Analysis stands as the key publication that solidified subdifferential calculus for convex functions, offering a comprehensive treatment of chain rules, sum rules, and conjugate-based representations that form the basis for subsequent optimization algorithms. The book synthesized Moreau's proximal insights with Fenchel's duality, proving essential properties such as the cyclic monotonicity and maximal monotonicity of the subdifferential mapping, which ensure its utility in proving convergence of subgradient methods. This work not only formalized the calculus but also highlighted its applicability to finite-dimensional convex programs prevalent in operations research.

Key Extensions and Contributors

In the 1970s and 1980s, Frank H. Clarke extended the concept of subderivatives beyond convex functions to locally Lipschitz continuous functions, introducing the generalized gradient as the convex hull of limits of gradients taken at nearby points of differentiability. This framework, detailed in Clarke's seminal 1975 paper, enabled the analysis of nonsmooth functions without requiring convexity, providing tools for optimization and optimal control applications. Building on this, R. Tyrrell Rockafellar advanced variational analysis in the 1980s, incorporating subderivatives into broader structures such as coderivatives, which dualize derivatives for set-valued mappings and support stability analysis in optimization. His work, culminating in the 1998 book Variational Analysis co-authored with Roger J.-B. Wets, unified subdifferential calculus with geometric concepts, influencing nonsmooth optimization techniques. More recent contributions include those of A. D. Ioffe and Lionel Thibault, who from the 1980s onward refined tangent and normal cones for nonsmooth sets, extending subderivative-based calculus to handle metric regularity and constraint qualifications without smoothness assumptions. Ioffe's 1981 paper on nonsmooth analysis and the differential calculus of nondifferentiable mappings, along with Thibault's papers on subdifferentials of set-valued maps, established foundational results for these cones, enabling robust error bounds in variational problems. These extensions facilitated the integration of subderivatives into machine learning by the 2000s, particularly for optimizing nonsmooth losses such as the hinge loss in support vector machines (SVMs). Algorithms such as Pegasos, introduced in 2007, leverage stochastic subgradient descent to scale SVM training efficiently to large datasets.
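As a rough illustration of this line of work, the sketch below implements a Pegasos-style stochastic subgradient update for the regularized hinge loss on synthetic data; the data, the omission of the optional projection and averaging steps, and all parameter values are assumptions made for the example, not a faithful reproduction of the published algorithm.

```python
import numpy as np

def pegasos_style_svm(X, y, lam=0.1, iters=10_000, seed=0):
    """Sketch of a Pegasos-style stochastic subgradient method for the objective
    (lam/2)*||w||^2 + mean_i max(0, 1 - y_i * <w, x_i>).  Labels must be +1 or -1."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, iters + 1):
        i = rng.integers(n)
        eta = 1.0 / (lam * t)                  # decreasing step size schedule
        margin = y[i] * (X[i] @ w)
        w *= (1.0 - eta * lam)                 # subgradient step for the regularizer
        if margin < 1.0:                       # hinge term is active: add its subgradient
            w += eta * y[i] * X[i]
    return w

# Toy linearly separable data: class +1 around (2, 2), class -1 around (-2, -2).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2, 1, (50, 2)), rng.normal(-2, 1, (50, 2))])
y = np.array([1.0] * 50 + [-1.0] * 50)
w = pegasos_style_svm(X, y)
print(np.mean(np.sign(X @ w) == y))            # training accuracy, typically near 1.0
```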