In mathematics, the subderivative generalizes the notion of the derivative to convex functions that may not be differentiable at certain points, providing, at a given point x, the slopes of supporting hyperplanes that bound the function from below.[1] For a convex function f: \mathbb{R}^n \to \mathbb{R}, a vector g \in \mathbb{R}^n is a subderivative (also called a subgradient) of f at x if it satisfies the inequality f(z) \geq f(x) + g^T (z - x) for all z in the domain of f.[2] The collection of all such subderivatives at x forms the subdifferential \partial f(x), which is a nonempty, closed, and convex set when f is convex and x lies in the relative interior of the domain.[3] When f is differentiable at x, the subdifferential reduces to the singleton set \{\nabla f(x)\}, coinciding with the standard gradient.[2] A classic example is the absolute value function f(x) = |x|, where the subdifferential at x = 0 is the interval [-1, 1], reflecting the range of possible supporting line slopes at the nondifferentiable kink.[1] In convex optimization, the subderivative plays a crucial role: a point x^* minimizes f if and only if 0 \in \partial f(x^*), enabling algorithms like subgradient descent to handle nonsmooth objectives.[4] This framework extends classical calculus to broader classes of functions, supporting applications in machine learning, operations research, and variational analysis.[2]
Fundamentals
Definition in One Dimension
In the one-dimensional case, consider a convex function f: I \to \mathbb{R}, where I is an open interval in \mathbb{R}. A real number c is a subderivative of f at a point x_0 \in I if it satisfies the inequality f(x) \geq f(x_0) + c(x - x_0) for all x \in I.[5]

The set of all such subderivatives at x_0, known as the subdifferential \partial f(x_0), forms a closed interval [d^-_f(x_0), d^+_f(x_0)], where d^-_f(x_0) = \lim_{x \to x_0^-} \frac{f(x) - f(x_0)}{x - x_0} and d^+_f(x_0) = \lim_{x \to x_0^+} \frac{f(x) - f(x_0)}{x - x_0} are the left-hand and right-hand derivatives of f at x_0. For convex functions, these one-sided limits exist and are finite at every interior point, ensuring the subdifferential is nonempty at every x_0 \in I.[5]

When f is differentiable at x_0, the subdifferential reduces to the singleton \{f'(x_0)\}, as the left and right derivatives coincide with the standard derivative.[5]

Geometrically, each subderivative c \in \partial f(x_0) corresponds to the slope of a supporting line to the graph of f at the point (x_0, f(x_0)), a line that lies below the graph and touches it at that point.[5]
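The following short Python sketch (an illustrative aid, not part of the cited sources) approximates the endpoints of this interval by one-sided difference quotients; the example function and the step size h are arbitrary choices:

```python
# Approximate the subdifferential interval [d^-_f(x0), d^+_f(x0)] of a
# one-dimensional convex function by one-sided difference quotients.
def one_sided_derivatives(f, x0, h=1e-6):
    left = (f(x0) - f(x0 - h)) / h    # approximates the left derivative d^-_f(x0)
    right = (f(x0 + h) - f(x0)) / h   # approximates the right derivative d^+_f(x0)
    return left, right

# Example: f(x) = max(x, 2x), convex with a kink at 0 (slopes 1 and 2).
f = lambda x: max(x, 2 * x)
print(one_sided_derivatives(f, 0.0))  # roughly (1.0, 2.0), so the subdifferential at 0 is [1, 2]
```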
Prerequisites: Convex Functions
A convex function f: I \to \mathbb{R}, where I \subseteq \mathbb{R} is an interval, satisfies the inequality f(\lambda x + (1-\lambda)y) \leq \lambda f(x) + (1-\lambda) f(y) for all x, y \in I and all \lambda \in [0,1].[6] This condition captures the intuitive notion that the graph of f lies on or below the straight line segment joining any two of its points.[7]

Equivalent characterizations of convexity include the chord (secant-line) condition: for x < y in I and any z \in (x, y), the secant slopes satisfy \frac{f(z) - f(x)}{z - x} \leq \frac{f(y) - f(x)}{y - x}.[7] Another key formulation is Jensen's inequality, which extends the definition to convex combinations: for points x_1, \dots, x_n \in I and weights \lambda_1, \dots, \lambda_n \geq 0 with \sum \lambda_i = 1, f\left( \sum \lambda_i x_i \right) \leq \sum \lambda_i f(x_i).[8]

Convex functions exhibit several important regularity properties. On the interior of their domain they are continuous,[7] and in fact locally Lipschitz continuous: around any interior point there exist a neighborhood and a constant L > 0 such that |f(x) - f(y)| \leq L |x - y| on that neighborhood.[9] Moreover, the one-sided derivatives f'_+(x) = \lim_{h \to 0^+} \frac{f(x+h) - f(x)}{h} and f'_-(x) = \lim_{h \to 0^-} \frac{f(x+h) - f(x)}{h} exist at every interior point and are nondecreasing functions of x.[7]

These structural properties, particularly the convexity of the epigraph, guarantee the existence of supporting hyperplanes (in one dimension, supporting lines) at every point in the relative interior of the domain, providing the foundational mechanism by which subderivatives characterize the behavior of convex functions at nondifferentiable points.
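As an illustration of the defining inequality, the following Python sketch spot-checks it on random samples; the test functions, interval, and tolerance are illustrative choices, and a finite sample can only suggest, not prove, convexity:

```python
# Spot-check the convexity inequality f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y)
# on random samples drawn from [a, b].
import random

def looks_convex(f, a, b, trials=10_000, tol=1e-12):
    for _ in range(trials):
        x, y = random.uniform(a, b), random.uniform(a, b)
        lam = random.random()
        z = lam * x + (1 - lam) * y
        if f(z) > lam * f(x) + (1 - lam) * f(y) + tol:
            return False          # found a violation of the inequality
    return True

print(looks_convex(abs, -5, 5))             # True: |x| is convex
print(looks_convex(lambda x: x**3, -5, 5))  # almost surely False: x^3 is not convex on [-5, 5]
```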
Examples and Illustrations
Absolute Value Function
The absolute value function f(x) = |x|, defined on \mathbb{R}, serves as a fundamental example in convex analysis, illustrating a function that is convex yet nondifferentiable at the origin.[10] This nondifferentiability arises from the sharp corner at x = 0, where the left-hand derivative equals -1 and the right-hand derivative equals 1, precluding a unique derivative.[2] In this setting, the subdifferential generalizes the derivative by capturing the full set of supporting slopes.[11]

The subdifferential of f at a point x_0 is computed as follows: for x_0 > 0, \partial f(x_0) = \{1\}; for x_0 < 0, \partial f(x_0) = \{-1\}; and at x_0 = 0, \partial f(0) = [-1, 1].[10][2] The singletons away from zero correspond to the function's linear segments, where the slope is constant, while the interval at zero reflects the range of subgradients accommodating the kink.[11]

Verification of these computations relies on the defining inequality of the subdifferential: a number c belongs to \partial f(0) if and only if |x| \geq c x for all x \in \mathbb{R}. For c \in [-1, 1] this holds, as the cases x > 0 (yielding c \leq 1) and x < 0 (yielding c \geq -1) bound c precisely within the interval, with equality along the supporting lines y = c x at x = 0.[10] Similarly, for x_0 > 0, c = 1 satisfies |x| \geq x_0 + 1 \cdot (x - x_0) = x for all x, confirming the singleton.[2]

Geometrically, the subdifferential at zero collects the slopes of all lines through the origin that lie on or below the graph of f, underscoring how subderivatives extend differentiability to nonsmooth points while preserving convexity.[11] This interval nature highlights the multiplicity of linear underestimators at nondifferentiable vertices, a key feature distinguishing subderivatives from classical derivatives.[10]
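The verification above can be mirrored numerically; in the following Python sketch the grid of test points and the candidate values of c are arbitrary illustrative choices:

```python
# Check the subgradient inequality |x| >= c*x at the origin on a grid of test
# points: it holds exactly when c lies in [-1, 1].
import numpy as np

xs = np.linspace(-10, 10, 2001)

def is_subgradient_at_zero(c):
    return bool(np.all(np.abs(xs) >= c * xs - 1e-12))

for c in [-1.5, -1.0, 0.0, 0.7, 1.0, 1.5]:
    print(c, is_subgradient_at_zero(c))
# Only c in [-1, 1] passes, matching \partial f(0) = [-1, 1].
```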
Piecewise Linear Functions
Piecewise linear convex functions provide a natural class of examples for illustrating subderivatives beyond simple cases like the absolute value, as they feature multiple points of nondifferentiability where the function transitions between linear segments. In one dimension, a piecewise linear convex function is a continuous function f: \mathbb{R} \to \mathbb{R} that is affine (linear plus a constant) on each of a finite number of contiguous subintervals partitioning the domain, with the slopes of these affine pieces nondecreasing from left to right to ensure convexity.[12] Such functions can equivalently be expressed as the pointwise maximum of a finite collection of affine functions, inheriting convexity from this representation.[13]

At points interior to a linear segment, the subderivative coincides with the usual derivative, namely the constant slope of that segment, yielding a singleton subdifferential. At the kink points where two adjacent segments meet, however, the subdifferential is the closed interval of all values between the left-hand derivative (slope of the left segment) and the right-hand derivative (slope of the right segment). For instance, consider f(x) = \max\{0, x - 1\}, which equals zero for x \leq 1 (slope 0) and x - 1 for x > 1 (slope 1). At the kink x = 1, the left derivative is 0 and the right derivative is 1, so the subdifferential is \partial f(1) = [0, 1].[13] This interval captures all supporting lines at the kink that lie below the graph, consistent with the subderivative definition for convex functions.

In general, for a piecewise linear convex function with ordered breakpoints x_1 < x_2 < \cdots < x_k and nondecreasing slopes m_1 \leq m_2 \leq \cdots \leq m_{k+1} on the k+1 intervals (-\infty, x_1], [x_1, x_2], \dots, [x_k, \infty), the subdifferential at a breakpoint x_i is the interval [m_i, m_{i+1}], reflecting the range of slopes of the two adjacent pieces. This computation shows how subderivatives handle multiple nondifferentiable points by localizing the analysis to neighboring segments. Visually, the subderivatives at any point correspond to the slopes of the linear pieces immediately adjacent to it, providing a geometric interpretation of the supporting hyperplanes (lines in one dimension) that touch the function at that point without crossing above it.[12]
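The max-of-affine representation suggests a simple computation: the subdifferential at x is the interval spanned by the slopes of the pieces attaining the maximum there. The Python sketch below implements this idea for the example f(x) = \max\{0, x - 1\}; the tolerance and test points are illustrative choices:

```python
# For f(x) = max_i (m_i * x + b_i), the subdifferential at x is the interval
# [min, max] of the slopes of the pieces attaining the maximum at x.
def subdifferential(pieces, x, tol=1e-9):
    values = [m * x + b for m, b in pieces]
    fmax = max(values)
    active = [m for (m, b), v in zip(pieces, values) if v >= fmax - tol]
    return min(active), max(active)

# f(x) = max{0, x - 1}: pieces given as (slope, intercept) = (0, 0) and (1, -1).
pieces = [(0.0, 0.0), (1.0, -1.0)]
print(subdifferential(pieces, 0.5))  # (0.0, 0.0): interior of the flat segment
print(subdifferential(pieces, 1.0))  # (0.0, 1.0): the kink, so the subdifferential is [0, 1]
print(subdifferential(pieces, 2.0))  # (1.0, 1.0): interior of the sloped segment
```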
Properties
Basic Properties
For a convex function f: \mathbb{R} \to \mathbb{R} that is finite at a point x_0, the subdifferential \partial f(x_0) is defined as the set of all subderivatives (subgradients) at x_0 and forms a nonempty closed interval. It is nonempty because a convex function that is finite on a neighborhood of x_0 admits at least one subgradient there, and it is closed as the intersection of closed half-spaces supporting the epigraph of f at (x_0, f(x_0)).

A key characterization links the subdifferential to classical differentiability: if \partial f(x_0) is a singleton \{c\}, then f is differentiable at x_0 with f'(x_0) = c. The converse also holds: differentiability at x_0 implies the subdifferential is the singleton \{f'(x_0)\}. This equivalence highlights how subderivatives generalize derivatives for nondifferentiable convex functions.

The subdifferential provides a necessary and sufficient condition for global minima in convex optimization. Specifically, x_0 is a global minimizer of f if and only if 0 \in \partial f(x_0). For instance, for the absolute value function f(x) = |x|, the subdifferential at the minimizer x_0 = 0 is [-1, 1], which contains 0.

Subdifferentials are additive under summation of convex functions. For convex functions f and g both finite at x_0, the subdifferential of their sum satisfies \partial (f + g)(x_0) = \partial f(x_0) + \partial g(x_0), where the sum of intervals is the set of all pairwise sums (Minkowski sum). This property, which holds without additional qualification conditions for real-valued convex functions in one dimension, facilitates the analysis of composite convex models.
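The optimality condition and the sum rule can be combined in a small worked example; the following Python sketch (with an illustrative composite function, not taken from the cited sources) represents each one-dimensional subdifferential as an interval of endpoints:

```python
# Illustrate the sum rule and the optimality test 0 in \partial f(x0) for
# f(x) = |x| + max(0, x - 1), using interval endpoints for subdifferentials.
def sub_abs(x):                          # subdifferential of |.| at x
    if x > 0:  return (1.0, 1.0)
    if x < 0:  return (-1.0, -1.0)
    return (-1.0, 1.0)

def sub_hinge(x):                        # subdifferential of max(0, x - 1) at x
    if x > 1:  return (1.0, 1.0)
    if x < 1:  return (0.0, 0.0)
    return (0.0, 1.0)

def sub_sum(x):                          # sum rule: add interval endpoints
    (a1, b1), (a2, b2) = sub_abs(x), sub_hinge(x)
    return (a1 + a2, b1 + b2)

def is_minimizer(x):                     # optimality: 0 lies in the interval
    lo, hi = sub_sum(x)
    return lo <= 0.0 <= hi

print(sub_sum(0.0), is_minimizer(0.0))   # (-1.0, 1.0) True  -> x = 0 minimizes f
print(sub_sum(1.0), is_minimizer(1.0))   # (1.0, 2.0) False
```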
Monotonicity and Boundedness
In one dimension, the subdifferential of a convex function f: I \to \mathbb{R}, where I is an interval, takes the form of a closed interval \partial f(x) = [f'_-(x), f'_+(x)], with the left and right derivatives f'_-(x) and f'_+(x) being nondecreasing functions of x (Rockafellar 1970). This monotonicity implies that for x_1 < x_2, \sup \partial f(x_1) = f'_+(x_1) \leq f'_-(x_2) = \inf \partial f(x_2), so the subdifferential intervals are ordered in a nondecreasing manner along the domain (Rockafellar 1970). In this sense the subdifferential mapping is monotone, a property inherited from convexity that derivatives of nonconvex functions need not share.

For locally Lipschitz continuous convex functions, the subdifferential is locally bounded: on any compact subset of the interior of the domain there exists a constant M > 0 such that |g| \leq M for all g \in \partial f(x) and all x in that subset (Rockafellar 1970). This boundedness is directly tied to the local Lipschitz constant L of f, since any subgradient g \in \partial f(x) satisfies |g| \leq L; conversely, the supremum of subgradient magnitudes over a set equals the minimal Lipschitz constant of f on that set (Rockafellar and Wets 1998). The subderivatives thus provide a quantitative measure of the function's local rate of change.

Under additional regularity, such as twice differentiability or uniform convexity, the subdifferential mapping x \mapsto \partial f(x) is Hausdorff continuous, meaning small perturbations in x lead to small Hausdorff distances between the corresponding subdifferentials (Thibault 2010). More generally, for proper convex functions the mapping is upper semicontinuous on the interior of the domain and lower semicontinuous at points of differentiability, ensuring stability in variational problems (Rockafellar 1970).
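The ordering \sup \partial f(x_1) \leq \inf \partial f(x_2) can be observed numerically; in the Python sketch below, the convex test function, sample points, and finite-difference step are illustrative choices:

```python
# Spot-check sup \partial f(x1) <= inf \partial f(x2) for x1 < x2 by comparing
# one-sided difference quotients of the convex example f(x) = |x| + x**2.
def left_right(f, x, h=1e-6):
    return (f(x) - f(x - h)) / h, (f(x + h) - f(x)) / h

f = lambda x: abs(x) + x**2
points = [-2.0, -0.5, 0.0, 0.5, 2.0]
for x1, x2 in zip(points, points[1:]):
    f_plus_x1 = left_right(f, x1)[1]     # approximates sup \partial f(x1) = f'_+(x1)
    f_minus_x2 = left_right(f, x2)[0]    # approximates inf \partial f(x2) = f'_-(x2)
    print(x1, x2, f_plus_x1 <= f_minus_x2 + 1e-6)   # True for every consecutive pair
```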
Multivariable Generalization
Subgradients in Higher Dimensions
In the multivariable setting, the concept of a subgradient extends naturally to convex functions f: \mathbb{R}^n \to \mathbb{R}. A vector v \in \mathbb{R}^n is a subgradient of f at a point x_0 \in \dom f if it satisfies the inequality f(x) \geq f(x_0) + v \cdot (x - x_0) for all x \in \dom f. This generalizes the one-dimensional subderivative, with the dot product replacing scalar multiplication.

Geometrically, a subgradient v at x_0 determines a supporting hyperplane to the epigraph of f at the point (x_0, f(x_0)); the vector (v, -1) is normal to this hyperplane. The epigraph, defined as \epi f = \{ (x, \alpha) \in \mathbb{R}^n \times \mathbb{R} \mid \alpha \geq f(x) \}, is a convex set, and the hyperplane with equation \alpha = f(x_0) + v \cdot (x - x_0) lies below the epigraph while touching it at that point. This supporting-hyperplane interpretation underscores the role of subgradients in separating the graph of f from points above it.

Subgradients also relate to directional derivatives of convex functions. The directional derivative of f at x_0 in direction d \in \mathbb{R}^n is f'(x_0; d) = \lim_{t \to 0^+} \frac{f(x_0 + t d) - f(x_0)}{t}, which exists and is finite for convex f at interior points. Any subgradient v satisfies v \cdot d \leq f'(x_0; d), providing a lower bound on the rate of change in each direction.

For convex functions, subgradients are guaranteed to exist at relative interior points of the domain. Specifically, if x_0 lies in the relative interior of \dom f, then there exists at least one subgradient at x_0, as ensured by the supporting hyperplane theorem applied to the epigraph. This existence holds even when f is not differentiable at x_0, distinguishing subgradients from classical gradients.
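A quick numerical check of the defining inequality for a standard nonsmooth example, f(x) = \|x\|_1, is given below; the point x_0, the choice of 0 as the subgradient component at zero coordinates, and the random test points are illustrative:

```python
# Verify the subgradient inequality f(z) >= f(x0) + v.(z - x0) for the
# l1 norm, where v has components sign(x0_i) and any value in [-1, 1]
# (here 0, as returned by np.sign) at zero components.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sum(np.abs(x))

x0 = np.array([1.5, 0.0, -2.0])
v = np.sign(x0)                         # a valid subgradient of the l1 norm at x0

ok = all(f(z) >= f(x0) + v @ (z - x0) - 1e-9
         for z in rng.normal(size=(1000, 3)))
print(ok)                               # True on every sampled z
```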
The Subdifferential Set
In the multivariable setting, the subdifferential of a convex function f: \mathbb{R}^n \to \mathbb{R} at a point x_0 is the set \partial f(x_0) = \{ v \in \mathbb{R}^n \mid f(x) \geq f(x_0) + v \cdot (x - x_0) \ \forall x \in \dom f \}.[14] This set collects all subgradients v satisfying the subgradient inequality at x_0, generalizing the one-dimensional case to higher dimensions, where many subgradients may exist simultaneously.[14] For proper convex functions, \partial f(x_0) is convex and closed; it is nonempty when x_0 lies in the relative interior of \dom f, and it is moreover compact when x_0 lies in the interior of \dom f, in particular everywhere for finite-valued convex functions on \mathbb{R}^n.[14]

A key characterization of membership links the subdifferential to the directional derivative: v \in \partial f(x_0) if and only if f'(x_0; d) \geq v \cdot d for all directions d \in \mathbb{R}^n.[14] Equivalently, the directional derivative is the support function of the subdifferential, f'(x_0; d) = \sup \{ v \cdot d \mid v \in \partial f(x_0) \}, highlighting how the subdifferential encodes the first-order behavior of f in all directions.[14] This property underscores the subdifferential's role as a convex set that bounds the rates of change of f.

When f is differentiable at x_0, the subdifferential reduces to a singleton containing the gradient: \partial f(x_0) = \{ \nabla f(x_0) \}.[14] In this case, the compact convex set collapses to a single point, recovering the classical derivative as the unique subgradient.

The subdifferential is additive under summation: for convex functions f and g finite on \mathbb{R}^n, \partial (f + g)(x_0) = \partial f(x_0) + \partial g(x_0), where the right-hand side denotes the Minkowski sum of the sets.[14] This rule facilitates the analysis of composite convex functions by decomposing their subdifferentials into sums of simpler components, preserving convexity and compactness in the result.[14]
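For a pointwise maximum of affine functions, the subdifferential is the convex hull of the gradients of the active pieces, so the support-function identity can be checked directly; the data in the following Python sketch are illustrative:

```python
# For f(x) = max_i (a_i.x + b_i), the subdifferential at x0 is the convex hull
# of the rows a_i of the active pieces, and f'(x0; d) equals the maximum of
# a_i.d over those active pieces (the support function of the subdifferential).
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([0.0, 0.0, 0.0])
f = lambda x: np.max(A @ x + b)

x0 = np.zeros(2)                                  # all three pieces are active here
active = np.isclose(A @ x0 + b, f(x0))

d = np.array([0.3, -0.4])
t = 1e-7
dir_deriv = (f(x0 + t * d) - f(x0)) / t           # numerical directional derivative
support = np.max(A[active] @ d)                   # sup of v.d over the subdifferential
print(dir_deriv, support)                         # both approximately 0.3
```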
Applications
In Convex Optimization
In convex optimization, the subdifferential plays a central role in characterizing optimality for nondifferentiable convex functions. For a convex function f: \mathbb{R}^n \to \mathbb{R}, a point x^* minimizes f if and only if 0 \in \partial f(x^*), where \partial f(x^*) is the subdifferential at x^*. This condition generalizes the first-order optimality criterion for smooth convex functions, where the gradient vanishes at the minimizer, to cases where f is not differentiable, such as piecewise linear or norm-based objectives.

Subgradient methods provide iterative algorithms for minimizing such nondifferentiable convex functions by using individual elements of the subdifferential. At each iteration k, a subgradient g_k \in \partial f(x_k) is selected and the update is x_{k+1} = x_k - \alpha_k g_k, where \alpha_k > 0 is a step size. Unlike gradient descent, this is not a descent method, as the objective value may increase at some iterations; convergence is instead analyzed through the distance to a minimizer x^*, since the subgradient inequality yields \|x_{k+1} - x^*\|^2 \leq \|x_k - x^*\|^2 - 2\alpha_k \left( f(x_k) - f(x^*) \right) + \alpha_k^2 \|g_k\|^2.[15] These methods are particularly useful for large-scale problems where computing the full subdifferential is infeasible, as any single subgradient suffices for progress.[15]

Under appropriate step size rules, subgradient methods converge to the minimum for convex functions with bounded subgradients. If the step sizes satisfy \sum_{k=1}^\infty \alpha_k = \infty and \sum_{k=1}^\infty \alpha_k^2 < \infty (diminishing step sizes), the best objective value found converges to the optimal value f^*; with step sizes chosen on the order of 1/\sqrt{K} for a fixed horizon K, the optimality gap is O(1/\sqrt{K}).[15] Constant step sizes yield similar behavior over a fixed horizon, up to an additive error proportional to the step size, making the method robust when subgradients are bounded.[15]

A representative application is minimizing the \ell_1-norm residual \|Ax - b\|_1 = \sum_{i=1}^m |(Ax - b)_i| for an underdetermined system Ax = b, which promotes sparse solutions in compressed sensing and signal processing. Here the subdifferential of each absolute value term at zero is the interval [-1, 1], so subgradient methods can handle the nondifferentiability at points where residuals vanish; for instance, A^\top \operatorname{sign}(Ax - b) is a valid subgradient (with any value in [-1, 1] chosen for zero components), enabling iterative progress toward sparse minimizers.
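A minimal subgradient method for the \ell_1 residual problem might look as follows; the problem data, step-size constant, and iteration count are illustrative choices rather than recommendations from the cited sources:

```python
# Minimal subgradient method for minimizing f(x) = ||Ax - b||_1, using the
# subgradient A^T sign(Ax - b); np.sign returns 0 at zero residuals, which is
# a valid choice within [-1, 1]. Step sizes alpha_k = 0.01 / sqrt(k).
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 10))
b = rng.normal(size=30)

f = lambda x: np.sum(np.abs(A @ x - b))

x = np.zeros(10)
best = f(x)
for k in range(1, 5001):
    g = A.T @ np.sign(A @ x - b)        # a subgradient of f at the current x
    x = x - (0.01 / np.sqrt(k)) * g     # diminishing step size
    best = min(best, f(x))              # not a descent method, so track the best value
print(best)
```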
In Nonsmooth Analysis
In nonsmooth analysis, subderivatives, or subgradients, extend classical calculus rules to nondifferentiable functions, enabling the study of compositions and other operations without smoothness assumptions. A key result is the chain rule for the subdifferential of a composite function f \circ g: when g is smooth and f is locally Lipschitz, under suitable qualification conditions the inclusion \partial (f \circ g)(x) \subseteq \nabla g(x)^\top \partial f(g(x)) holds, with equality in important cases, for instance when f is convex and g is affine. Such rules, formulated for convex functions and extended to more general settings via limiting or approximate subdifferentials, allow differentiation of nonsmooth mappings while accounting for their set-valued nature.[10]

Subderivatives also furnish first-order approximations for nonsmooth functions, particularly in the formulation of variational inequalities, where the subgradient inequality f(z) \geq f(x) + \langle g, z - x \rangle for g \in \partial f(x) supplies a linear underestimator of the function. In this context, such approximations characterize solutions of variational inequalities involving set-valued operators derived from subdifferentials, providing necessary and sufficient conditions for equilibrium points in nonsmooth systems. This approach is fundamental for analyzing monotone inclusions and generalized equations without requiring differentiability.[16]

Subderivatives connect to smoothing techniques via proximal operators and the Moreau envelope, which regularizes a nonsmooth function f as e_\lambda f(x) = \inf_y \left\{ f(y) + \frac{1}{2\lambda} \|y - x\|^2 \right\}, yielding a differentiable surrogate whose gradient is expressed through the proximal mapping, the resolvent of the subdifferential: \nabla e_\lambda f(x) = \frac{x - \operatorname{prox}_{\lambda f}(x)}{\lambda}, with \operatorname{prox}_{\lambda f}(x) = \operatorname{argmin}_y \left\{ f(y) + \frac{1}{2\lambda} \|y - x\|^2 \right\}. This linkage allows approximate subgradients to be computed by evaluating the proximal mapping, bridging exact nonsmooth calculus with smooth approximations for theoretical and numerical purposes.[16]

In applications, subderivatives model discontinuities in control theory, such as in hybrid systems or optimal control with nonsmooth dynamics, where the generalized maximum principle uses Clarke or limiting subdifferentials to derive necessary conditions for trajectories involving jumps or switches. Similarly, in economics, subderivatives capture kinked preferences, representing utility functions or indifference curves that fail to be differentiable at thresholds such as income levels, enabling variational analysis of consumer choice and equilibrium under incomplete differentiability.[17]
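For f(x) = |x|, the proximal mapping is soft-thresholding and the Moreau envelope is the Huber function, so the gradient identity can be verified directly; the following Python sketch uses an illustrative \lambda, evaluation point, and brute-force grid minimization:

```python
# Prox and Moreau envelope of f(x) = |x|: prox_{lam f}(x) is soft-thresholding,
# and the gradient identity grad e_lam f(x) = (x - prox_{lam f}(x)) / lam is
# compared with a numerical gradient of the envelope computed on a grid.
import numpy as np

lam = 0.5
prox = lambda x: np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)   # soft-threshold

ys = np.linspace(-10, 10, 200001)
envelope = lambda x: np.min(np.abs(ys) + (ys - x) ** 2 / (2 * lam))

x = 1.3
grad_formula = (x - prox(x)) / lam
h = 1e-4
grad_numeric = (envelope(x + h) - envelope(x - h)) / (2 * h)
print(grad_formula, grad_numeric)    # both approximately 1.0, since |x| > lam here
```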
Historical Development
Origins in Convex Analysis
The concept of subderivatives, more commonly known as subgradients in the context of convex functions, traces its early roots to the work of Werner Fenchel in the 1940s, who laid foundational ideas through the development of conjugate functions and the geometry of supporting hyperplanes for convex sets.[18] Fenchel's 1949 paper introduced the convex conjugate transformation, which establishes a duality between a convex function and its conjugate and provides a framework for understanding the supporting properties of convex epigraphs via hyperplanes.[18] This duality implicitly underpins the notion of subgradients, since the points where the primal and conjugate functions achieve equality correspond to supporting hyperplanes, a connection later formalized in subdifferential theory.[19] Fenchel's contributions, built on earlier geometric studies of convex bodies, emphasized the role of these hyperplanes in characterizing convexity without relying on differentiability.[20]

The formal introduction of subgradients for convex functions occurred in the early 1960s, primarily through the independent efforts of Jean-Jacques Moreau and R. Tyrrell Rockafellar, who defined the subdifferential as the set of vectors providing linear underestimators of the function at a given point.[21] Moreau's 1962 paper introduced dual convex functions and proximal points in Hilbert spaces, providing insights into the geometry of convex functionals and optimality conditions in variational problems that laid important groundwork for later developments in subdifferential theory.[21] Rockafellar extended this in his 1970 monograph Convex Analysis, which rigorously developed subgradients and the subdifferential operator for proper convex functions on \mathbb{R}^n, establishing their role as a generalization of gradients for nondifferentiable cases.[22] These definitions enabled the treatment of convex optimization problems where classical derivatives fail, such as those involving indicator functions or norms.

This development was motivated by the growing need in operations research during the mid-20th century to solve nondifferentiable convex programming problems arising in resource allocation, network flows, and economic modeling, where traditional gradient-based methods were inadequate.[23] In the 1960s, as operations research expanded beyond linear programming to nonlinear convex models, researchers sought tools to handle the piecewise linear or polyhedral objective functions common in practical applications such as transportation and production planning.[24] Subgradients provided a way to compute descent directions and certify optimality in these settings, bridging theoretical convex analysis with computational needs in the field.[23]

Rockafellar's Convex Analysis stands as the key publication that solidified subdifferential calculus for convex functions, offering a comprehensive treatment of chain rules, sum rules, and conjugate-based representations that form the basis for subsequent optimization algorithms.[22] The book synthesized Moreau's proximal insights with Fenchel's duality, proving essential properties such as the cyclic monotonicity and maximal monotonicity of the subdifferential mapping, which underpin the convergence of subgradient methods.[25] This work not only formalized the calculus but also highlighted its applicability to the finite-dimensional convex programs prevalent in operations research.[22]
Key Extensions and Contributors
In the 1970s and 1980s, Frank H. Clarke extended the concept of subderivatives beyond convex functions to locally Lipschitz continuous functions, introducing the generalized gradient as the convex hull of limits of gradients taken at nearby points of differentiability. This framework, detailed in Clarke's seminal 1975 paper, enabled the analysis of nonsmooth functions without requiring convexity, providing tools for applications in optimization and control theory.

Building on this, R. Tyrrell Rockafellar advanced variational analysis in the 1980s, incorporating subderivatives into broader structures such as coderivatives, which dualize derivatives for set-valued mappings and support stability analysis in optimization. His work, culminating in the 1998 book Variational Analysis co-authored with Roger J.-B. Wets, unified subdifferential calculus with geometric concepts, influencing nonsmooth optimization techniques.[16]

More recent contributions include those of A. D. Ioffe and Lionel Thibault, who in the 1980s and 1990s refined tangent and normal cones for nonsmooth sets, extending subderivative-based calculus to handle metric regularity and constraint qualifications without smoothness assumptions. Ioffe's 1981 paper on nonsmooth analysis and the differential calculus of nondifferentiable mappings, along with Thibault's papers on subdifferentials of set-valued maps, established foundational results for these cones, enabling robust error bounds in variational problems.[26]

These extensions facilitated the integration of subderivatives into machine learning by the 2000s, particularly for optimizing nonsmooth losses such as the hinge loss in support vector machines (SVMs). Algorithms such as Pegasos, introduced in 2007, apply stochastic subgradient steps to the hinge loss to scale SVM training efficiently to large datasets.[27]
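As a rough indication of how such stochastic subgradient training proceeds, the following Python sketch performs Pegasos-style updates on synthetic data; it is a simplified illustration (omitting, for example, the optional projection step of the published algorithm), and all data and parameters are invented for the example:

```python
# Pegasos-style stochastic subgradient step for the regularized SVM objective
# (lam/2)||w||^2 + max(0, 1 - y_i w.x_i); the hinge loss is nonsmooth at margin 1,
# where any element of the subdifferential may be used.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = np.sign(X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=200))

lam = 0.1
w = np.zeros(5)
for t in range(1, 2001):
    i = rng.integers(len(y))                       # pick one training example
    eta = 1.0 / (lam * t)                          # step size 1 / (lam * t)
    margin_violated = y[i] * (w @ X[i]) < 1.0
    g = lam * w - (y[i] * X[i] if margin_violated else 0.0)   # a subgradient
    w = w - eta * g

print(np.mean(np.sign(X @ w) == y))                # training accuracy of the sketch
```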