Convex conjugate

In convex analysis, the convex conjugate (also known as the Fenchel conjugate or Legendre–Fenchel transform) of a function f: \mathbb{R}^n \to (-\infty, \infty] is defined as f^*(y) = \sup_{x \in \mathbb{R}^n} (y^\top x - f(x)) for all y \in \mathbb{R}^n. This construction generalizes the classical Legendre transformation, originally developed for differentiable convex functions, to arbitrary extended real-valued functions that may be non-convex, non-differentiable, or improper. Regardless of the properties of f, the convex conjugate f^* is always a closed convex function, since its epigraph is an intersection of closed half-spaces determined by the affine functions y \mapsto y^\top x - f(x). The convex conjugate satisfies Fenchel's inequality, which states that f(x) + f^*(y) \geq y^\top x for all x, y \in \mathbb{R}^n, with equality holding if and only if y belongs to the subdifferential of f at x. A key result, the Fenchel–Moreau theorem, asserts that if f is proper, closed, and convex, then the biconjugate f^{**}(x) = \sup_{y \in \mathbb{R}^n} (y^\top x - f^*(y)) recovers f exactly, meaning f^{**} = f; for non-convex f, f^{**} yields the closed convex envelope of f.

These properties make the convex conjugate indispensable in convex optimization, where it underpins duality theory by linking primal and dual problems, supports the derivation of optimality conditions via subgradients, and enables robust formulations in areas such as risk measurement and machine learning. Notable examples illustrate its utility: for the power function f(x) = \frac{1}{p} \|x\|_p^p with p > 1, the conjugate is f^*(y) = \frac{1}{q} \|y\|_q^q where \frac{1}{p} + \frac{1}{q} = 1, highlighting its role in Hölder duality; for the indicator function of a convex set, the conjugate is the support function of the set. In broader applications, it connects to norms (for a norm \| \cdot \|, the conjugate is the indicator of the unit ball in the dual norm) and facilitates transformations in entropy-based models; for example, the negative entropy f(x) = \sum_i x_i \log x_i has conjugate f^*(y) = \sum_i e^{y_i - 1}.

Fundamentals

Definition

In convex analysis, the convex conjugate, also known as the Fenchel conjugate or Legendre–Fenchel transform, provides a fundamental duality operation for functions defined on vector spaces. Consider a function f: X \to \mathbb{R} \cup \{+\infty\}, where X is a vector space over the reals, typically equipped with a topology that allows for a dual space X^* consisting of continuous linear functionals on X. The convex conjugate f^*: X^* \to \mathbb{R} \cup \{+\infty\} is defined by f^*(x^*) = \sup_{x \in X} \left( \langle x^*, x \rangle - f(x) \right), where \langle \cdot, \cdot \rangle denotes the duality pairing between X^* and X. The supremum is taken over all x \in X, though in practice it can be restricted to the effective domain of f, where f(x) < +\infty. The effective domain of f^* consists of those x^* \in X^* for which the supremum is finite, and f^* takes values in the extended reals \mathbb{R} \cup \{+\infty\} to accommodate cases where the supremum diverges. The extended real-valued framework ensures that the conjugate is well defined even for functions that take the value +\infty, such as indicator functions of convex sets, which equal 0 on the set and +\infty elsewhere.

This setup presupposes familiarity with convex functions, namely those satisfying f(\theta x + (1-\theta)y) \leq \theta f(x) + (1-\theta) f(y) for \theta \in [0,1], and with the canonical duality pairing in topological vector spaces, without requiring explicit derivations of these concepts. As a generalization of the classical Legendre transform, which applies to differentiable strictly convex functions via their gradients, the convex conjugate extends the notion to arbitrary convex (possibly non-differentiable) functions by replacing the pointwise derivative with a global supremum over supporting hyperplanes. This broader applicability makes it a cornerstone for duality in optimization and variational problems.
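
As a quick numerical illustration of the definition, the supremum can be approximated by a maximum over a fine grid. The sketch below is a minimal check rather than part of the formal theory: it assumes f(x) = e^x, uses an arbitrary grid range and spacing, and compares the grid value with the known closed form f^*(y) = y \log y - y for y > 0.

```python
import numpy as np

def conjugate_on_grid(f, xs):
    """Approximate f*(y) = sup_x (y*x - f(x)) by a maximum over the finite grid xs."""
    fx = f(xs)
    return lambda y: np.max(y * xs - fx)

# Example: f(x) = exp(x); the exact conjugate is y*log(y) - y for y > 0.
xs = np.linspace(-10.0, 10.0, 200_001)   # grid standing in for the sup over all x
f_star = conjugate_on_grid(np.exp, xs)

for y in (0.5, 1.0, 2.0, 5.0):
    exact = y * np.log(y) - y
    print(f"y={y:3.1f}  grid sup = {f_star(y): .6f}   exact = {exact: .6f}")
```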

Fenchel's Inequality

Fenchel's inequality provides a fundamental bound relating a proper convex function f: X \to (-\infty, +\infty] defined on a Banach space X and its convex conjugate f^*: X^* \to (-\infty, +\infty], where X^* is the dual space. Specifically, for all x \in X and x^* \in X^*, f(x) + f^*(x^*) \geq \langle x, x^* \rangle, with equality precisely when x attains the supremum defining f^*(x^*), that is, when x^* \in \partial f(x) (and symmetrically for f^{**}(x)). The inequality follows directly from the definition of the convex conjugate: by construction, f^*(x^*) = \sup_{y \in X} \left( \langle y, x^* \rangle - f(y) \right), so for any fixed x \in X, \langle x, x^* \rangle - f(x) \leq f^*(x^*), which rearranges to the stated form. In the context of convex optimization, Fenchel's inequality establishes weak duality between primal and dual problems. For a primal problem minimizing f(x) + g(Ax) and its Fenchel dual maximizing -f^*(A^* u) - g^*(-u) over dual variables u, the inequality implies that the primal optimal value is at least the dual optimal value. The inequality is named after Werner Fenchel, who formalized it for conjugate convex functions in 1949, though it generalizes earlier results such as Young's inequality for products from 1912.
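
A minimal numerical check of the inequality, assuming the self-conjugate choice f(x) = \tfrac{1}{2}x^2 on \mathbb{R} (so f^* = f): the gap f(x) + f^*(y) - xy equals \tfrac{1}{2}(x - y)^2, which is nonnegative and vanishes exactly when y = x, i.e., when y \in \partial f(x).

```python
import numpy as np

rng = np.random.default_rng(0)

f      = lambda x: 0.5 * x**2   # f(x) = x^2/2 is its own conjugate
f_star = lambda y: 0.5 * y**2

# Fenchel's inequality: f(x) + f*(y) >= x*y, with equality iff y is in ∂f(x) = {x}.
for _ in range(5):
    x, y = rng.normal(size=2)
    gap = f(x) + f_star(y) - x * y       # equals (x - y)^2 / 2 here
    print(f"x={x:+.3f}  y={y:+.3f}  gap={gap:.6f}")

x = 1.7
print("gap at y = x:", f(x) + f_star(x) - x * x)   # 0.0, the equality case
```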

Examples

Conjugates of Common Functions

The convex conjugate provides a duality between functions that reveals their geometric and analytical properties, as illustrated by explicit computations for several archetypal convex functions. These examples demonstrate how the supremum operation in the definition transforms familiar forms into their dual counterparts, often yielding indicator functions, dual norms, or entropy-like expressions. Such pairings highlight the self-duality of certain functions and underpin applications in optimization and variational analysis.

Consider the affine function f(x) = \langle a, x \rangle + b defined on \mathbb{R}^n, where a \in \mathbb{R}^n and b \in \mathbb{R}; this function is both convex and concave. To compute its conjugate, evaluate f^*(y) = \sup_{x \in \mathbb{R}^n} \left( \langle y, x \rangle - \langle a, x \rangle - b \right) = \sup_{x \in \mathbb{R}^n} \langle y - a, x \rangle - b. The supremum is finite only if y - a = 0, in which case it equals -b; otherwise, it diverges to +\infty. Thus, f^*(y) = \delta_{\{a\}}(y) - b, where \delta_{\{a\}}(y) is the indicator function that equals 0 if y = a and +\infty otherwise. This result shows that the conjugate of an affine function is essentially a shifted indicator function concentrated at the slope a, emphasizing the pointwise nature of linear duality.

For power functions, take f(x) = \frac{|x|^p}{p} on \mathbb{R}, where p > 1. The conjugate is found by solving f^*(y) = \sup_{x \in \mathbb{R}} \left( y x - \frac{|x|^p}{p} \right). Assuming x \geq 0 and y \geq 0 (the remaining cases follow by symmetry), the critical point occurs where the derivative vanishes: y - x^{p-1} = 0, so x = y^{1/(p-1)}. Substituting yields f^*(y) = y \cdot y^{1/(p-1)} - \frac{1}{p} y^{p/(p-1)} = \frac{|y|^q}{q}, where \frac{1}{p} + \frac{1}{q} = 1. This duality pairs p-th-power functions with their Hölder conjugates, illustrating how the conjugate exchanges the exponent while preserving convexity. In higher dimensions, this extends to f(x) = \frac{\|x\|_p^p}{p}, with f^*(y) = \frac{\|y\|_q^q}{q}.

The absolute value function f(x) = |x| on \mathbb{R} serves as a simple norm example. Its conjugate is f^*(y) = \sup_{x \in \mathbb{R}} \left( y x - |x| \right). If |y| \leq 1, the expression y x - |x| is nonpositive for every x and equals 0 at x = 0, so f^*(y) = 0; if |y| > 1, the expression grows without bound as x increases in the direction of \operatorname{sign}(y), so f^*(y) = +\infty. Thus, f^*(y) = \delta_{[-1,1]}(y), the indicator function of the interval [-1,1]. This computation reveals the conjugate of a norm as the characteristic (indicator) function of the unit ball of the dual norm, underscoring the role of conjugates in norm duality.

For the exponential function f(x) = e^x on \mathbb{R}, compute f^*(y) = \sup_{x \in \mathbb{R}} \left( y x - e^x \right). For y > 0 the maximum is attained at x = \ln y, where the derivative y - e^x vanishes, giving f^*(y) = y \ln y - y. For y = 0 the supremum equals 0 (approached as x \to -\infty), and for y < 0 the supremum diverges, so f^*(y) = +\infty. The expression y \ln y - y is a form of the negative entropy; this pairing of exponential growth with an entropy-type function is central to information theory and to large deviation principles in probability.

Indicator functions and norms provide further illustrations of conjugate pairs involving extended real-valued functions. The indicator function of a closed convex set C \subseteq \mathbb{R}^n, defined as \delta_C(x) = 0 if x \in C and +\infty otherwise, has conjugate \delta_C^*(y) = \sup_{x \in C} \langle y, x \rangle = \sigma_C(y), the support function of C, which measures the maximum projection of C onto the direction y.
Conversely, the conjugate of a norm \| \cdot \| on \mathbb{R}^n is the indicator of the unit ball in the dual norm \| \cdot \|_*, defined by \|y\|_* = \sup_{\|x\| \leq 1} \langle y, x \rangle: specifically, \| \cdot \|^*(y) = \begin{cases} 0 & \text{if } \|y\|_* \leq 1, \\ +\infty & \text{otherwise}. \end{cases} These relations highlight how conjugates transform set indicators into directional maxima and norms into constraint enforcers, which is foundational for duality in constrained optimization problems.
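
Two of the pairs above can be verified numerically with a grid approximation of the supremum; this is only a sketch, and the grid bounds are arbitrary. For f(x) = |x| and |y| > 1 the true supremum is +\infty, so the grid value merely grows with the grid radius, an artifact of the truncation.

```python
import numpy as np

xs = np.linspace(-50.0, 50.0, 400_001)

def conj(f_vals, y):
    # sup_x (y*x - f(x)) approximated over the grid xs
    return np.max(y * xs - f_vals)

# Pair 1: f(x) = |x|  ->  f*(y) = 0 if |y| <= 1, +inf otherwise.
abs_vals = np.abs(xs)
print("f=|x|, y in {0, 0.7, 1}:", [round(conj(abs_vals, y), 4) for y in (0.0, 0.7, 1.0)])
print("f=|x|, y=1.5 (truncated sup, grows with the grid radius):",
      round(conj(abs_vals, 1.5), 1))

# Pair 2: f(x) = |x|^p / p  ->  f*(y) = |y|^q / q with 1/p + 1/q = 1.
p = 3.0
q = p / (p - 1.0)
pow_vals = np.abs(xs) ** p / p
for y in (0.5, 1.0, 2.0):
    print(f"p={p}, y={y}:  grid = {conj(pow_vals, y):.4f}   exact = {abs(y)**q / q:.4f}")
```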

Connection with Expected Shortfall

The expected shortfall (ES) at confidence level \alpha \in (0,1), also known as average value at risk (AVaR) or conditional value at risk (CVaR), for a loss random variable X is defined as \mathrm{ES}_\alpha(X) = \frac{1}{1-\alpha} \int_\alpha^1 \mathrm{VaR}_u(X) \, du, where \mathrm{VaR}_u(X) denotes the u-quantile of the distribution of X. This formulation captures the average loss exceeding the value-at-risk threshold, providing a coherent measure of tail risk that addresses limitations of VaR by incorporating severity beyond the quantile.

The convex conjugate establishes a fundamental link between ES and quantile-based measures through the dual representation of shortfall functions. Specifically, the optimization form of ES derives from minimizing over thresholds \zeta, \mathrm{ES}_\alpha(X) = \min_{\zeta \in \mathbb{R}} \left\{ \zeta + \frac{1}{1-\alpha} \mathbb{E}\left[ (X - \zeta)^+ \right] \right\}, where (t)^+ = \max\{t, 0\}. This structure reveals how the conjugate of the shortfall function, which encodes expected excesses over a threshold, directly yields quantile interpretations, as \mathrm{VaR}_\alpha(X) is a minimizing choice of \zeta.

In the context of loss distributions, the convex conjugate also yields a robust dual representation of ES for use in optimization problems. Applying the Fenchel–Moreau theorem to the proper lower semicontinuous risk functional gives its biconjugate form, \mathrm{ES}_\alpha(X) = \sup_Q \left\{ \mathbb{E}_Q[X] \ \middle|\ Q \ll P,\ \frac{dQ}{dP} \leq \frac{1}{1-\alpha}\ \mathrm{a.s.} \right\}, where the supremum is over probability measures Q absolutely continuous with respect to the reference probability P. This duality transforms ES minimization into tractable linear programs in portfolio selection and hedging, leveraging the conjugate to bound worst-case expectations under constraints. Expected shortfall is distinguished among law-invariant coherent risk measures by its conjugate (dual) set, which consists precisely of the measures whose Radon–Nikodym derivatives are bounded by \frac{1}{1-\alpha}; in the Kusuoka representation, general law-invariant coherent risk measures are built from mixtures, and suprema of mixtures, of such ES functionals.
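
The minimization formula above lends itself to a direct Monte Carlo sketch. Assuming, purely for illustration, that the loss is standard normal and \alpha = 0.95, the code below minimizes the Rockafellar–Uryasev objective over a grid of thresholds and compares the result with the empirical tail average beyond the empirical VaR; sample size and grid are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.95
losses = rng.normal(size=200_000)        # illustrative loss sample X

def objective(zeta):
    # Rockafellar-Uryasev objective: zeta + E[(X - zeta)^+] / (1 - alpha)
    return zeta + np.mean(np.maximum(losses - zeta, 0.0)) / (1.0 - alpha)

zetas = np.linspace(0.0, 3.0, 3001)
vals = np.array([objective(z) for z in zetas])
i = np.argmin(vals)

var_emp = np.quantile(losses, alpha)              # empirical VaR_alpha
es_tail = losses[losses >= var_emp].mean()        # direct average of tail losses

print(f"minimizing zeta ~ {zetas[i]:.3f}    empirical VaR ~ {var_emp:.3f}")
print(f"min objective  ~ {vals[i]:.3f}    tail average  ~ {es_tail:.3f}")
```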

Convex Ordering

The conjugate ordering defines a partial order on proper lower semicontinuous convex functions, induced by their conjugates. Specifically, for functions f and g, f \preceq g if f^*(y) \geq g^*(y) for all y in the dual space, where f^* and g^* are the conjugates of f and g, respectively; by the order reversal of conjugation, this expresses that f is smaller than g in the pointwise sense. This ordering is useful in the theory of stochastic orderings, where it relates to the increasing convex order on probability distributions. In particular, if two distributions \mu and \nu satisfy \mu \preceq_{\text{icx}} \nu, then the expectations of increasing convex functions under \mu are less than or equal to those under \nu, and the conjugate ordering on associated potential functions (such as transport potentials) captures this dominance. For example, when considering convex functions derived from cumulative distribution functions, such as the stop-loss transform \int_x^{\infty} \bar{F}(t) \, dt = \mathbb{E}[(X - x)^+], where \bar{F} is the survival function, the conjugate ordering implies stochastic comparisons, including second-order stochastic dominance between the distributions.

The ordering exhibits preservation under certain operations. For instance, if f \preceq g, then \alpha f \preceq \alpha g for \alpha > 0, since (\alpha f)^*(y) = \alpha f^*(y/\alpha) and the pointwise inequality between conjugates survives this rescaling. Similarly, the order is preserved under infimal convolution, since the conjugate of the infimal convolution of f and g is the sum of the conjugates f^* + g^*.

Properties

Convexity

The convex conjugate f^* of any extended real-valued function f: X \to \overline{\mathbb{R}}, where X is a vector space equipped with a duality pairing \langle \cdot, \cdot \rangle, is always a convex function, regardless of whether f itself is convex. To prove this, consider points y_1, y_2 in the domain of f^* and \lambda \in [0,1]. Let z = (1-\lambda) y_1 + \lambda y_2. By definition, f^*(z) = \sup_{x \in X} \left( \langle z, x \rangle - f(x) \right) = \sup_{x \in X} \left( (1-\lambda) \langle y_1, x \rangle + \lambda \langle y_2, x \rangle - f(x) \right). The expression inside the supremum equals (1-\lambda) \left( \langle y_1, x \rangle - f(x) \right) + \lambda \left( \langle y_2, x \rangle - f(x) \right), so f^*(z) = \sup_{x} \left[ (1-\lambda) \left( \langle y_1, x \rangle - f(x) \right) + \lambda \left( \langle y_2, x \rangle - f(x) \right) \right] \leq (1-\lambda) \sup_{x} \left( \langle y_1, x \rangle - f(x) \right) + \lambda \sup_{x} \left( \langle y_2, x \rangle - f(x) \right) = (1-\lambda) f^*(y_1) + \lambda f^*(y_2), where the inequality follows because the supremum of a sum is at most the sum of the suprema.

An equivalent perspective arises from the epigraph of f^*, defined as \operatorname{epi} f^* = \{ (y, \alpha) \in X^* \times \mathbb{R} \mid f^*(y) \leq \alpha \}. For each fixed x \in X, the function s \mapsto \langle s, x \rangle - f(x) is affine in s, and its epigraph is a closed halfspace. The epigraph of f^* is the intersection over all x of these halfspaces, which is convex as an intersection of convex sets. Thus, f^* is convex, and moreover it is closed (hence lower semicontinuous) because the halfspaces are closed. This inherent convexity ensures that conjugation maps any function to a lower semicontinuous convex function, providing a canonical way to obtain the closed convex hull of f via further operations like the biconjugate.

Biconjugate

The biconjugate of a function f: X \to (-\infty, +\infty], where X is a Banach space, is obtained by applying the convex conjugate operation twice and is given by f^{**}(x) = (f^*)^*(x) = \sup_{x^* \in X^*} \bigl( \langle x^*, x \rangle - f^*(x^*) \bigr) for all x \in X, where X^* denotes the topological dual of X. The Fenchel–Moreau theorem provides a characterization of when the biconjugate recovers the original function: if f is proper, convex, and lower semicontinuous, then f^{**} = f. In the more general case where f is merely convex and proper, the biconjugate f^{**} equals the lower semicontinuous hull (closure) of f, defined as the supremum of all lower semicontinuous functions that are less than or equal to f; it is the largest lower semicontinuous minorant of f. For non-convex f, the biconjugate is instead the closed convex envelope of f.

A proof sketch of the Fenchel–Moreau theorem begins with Fenchel's inequality, which establishes that f(x) + f^*(x^*) \geq \langle x^*, x \rangle for all x \in X and x^* \in X^*, implying f^{**}(x) \leq f(x) for every x. Equality in Fenchel's inequality holds precisely when x^* \in \partial f(x), where \partial f(x) is the subdifferential of f at x. Under the additional assumptions of properness, convexity, and lower semicontinuity, a separating-hyperplane argument supplies the reverse inequality f^{**}(x) \geq f(x); at points where \partial f(x) is nonempty, the supremum defining f^{**}(x) is attained at any x^* \in \partial f(x), yielding f^{**}(x) = f(x) directly. The Fenchel–Moreau theorem applies to proper convex functions defined on Banach spaces, where properness means f never takes the value -\infty and is finite at some point. This setting ensures the conjugate operations are well-defined and the topological properties support the necessary continuity arguments.
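
The envelope statement can be illustrated numerically by conjugating twice on a grid. The sketch below assumes the nonconvex, W-shaped function f(x) = \min((x-1)^2, (x+1)^2), whose closed convex envelope is \max(|x|-1, 0)^2; grid ranges are arbitrary but chosen large enough that the relevant maximizers stay inside them.

```python
import numpy as np

xs = np.linspace(-6.0, 6.0, 6001)
ys = np.linspace(-8.0, 8.0, 8001)

f = np.minimum((xs - 1.0) ** 2, (xs + 1.0) ** 2)   # nonconvex, W-shaped

# First conjugate on the y-grid, then conjugate again back on the x-grid.
f_star   = np.array([np.max(y * xs - f) for y in ys])
f_biconj = np.array([np.max(x * ys - f_star) for x in xs])

envelope = np.maximum(np.abs(xs) - 1.0, 0.0) ** 2  # closed convex envelope of f

for xq in (-2.0, -0.5, 0.0, 0.5, 2.0):
    i = np.argmin(np.abs(xs - xq))
    print(f"x={xq:+.1f}   f={f[i]:.3f}   f**={f_biconj[i]:.3f}   envelope={envelope[i]:.3f}")
```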

Order Reversing

The conjugate operation exhibits an order-reversing property with respect to inequalities between functions. Specifically, for proper lower semicontinuous functions f and g defined on a vector space paired with its dual, if f \geq g, then f^* \leq g^*, where f^* and g^* denote the respective convex conjugates. Equivalently, if f \leq g, the inequality reverses to f^* \geq g^*. This monotonicity holds more generally for extended real-valued functions, so that conjugation carries pointwise inequalities between functions to pointwise inequalities between their conjugates, but in the opposite direction.

The proof follows directly from the definition of the convex conjugate. Recall that the conjugate of a function h is given by h^*(y) = \sup_{x \in E} \left( \langle x, y \rangle - h(x) \right), where E is the underlying space and \langle \cdot, \cdot \rangle is the duality pairing. If f \geq g, then for every x \in E and every y in the dual space, \langle x, y \rangle - f(x) \leq \langle x, y \rangle - g(x), because the larger function is being subtracted on the left. Taking the supremum over x preserves this inequality, yielding f^*(y) \leq g^*(y) for all y. This order-reversing behavior extends to related operations in convex analysis, notably the infimal convolution. For convex functions f and g, the infimal convolution is defined as (f \square g)(x) = \inf_{z \in E} \left( f(z) + g(x - z) \right), and its conjugate satisfies (f \square g)^* = f^* + g^*, which inherits the order-reversing property through the addition of conjugates. Similar extensions apply to other duality-preserving operations, such as those involving subgradients or convex processes, maintaining the reversal of pointwise orders.

In optimization, the order-reversing property underpins the relationship between primal and dual problems via Fenchel duality. The Fenchel–Young inequality states that for a function h and its conjugate h^*, h(x) + h^*(\phi) \geq \langle \phi, x \rangle for all x and \phi, with equality if \phi \in \partial h(x), the subdifferential. This provides weak duality, where the dual objective (built from conjugates) lower-bounds the primal value, and the order reversal ensures that pointwise modifications of the primal objective translate predictably into modifications of the dual bound, facilitating strong duality under suitable conditions like Slater's constraint qualification.
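
A small sanity check of the order reversal, using a pair with known conjugates: f(x) = x^2 dominates g(x) = x^2/2 pointwise, and their conjugates f^*(y) = y^2/4 and g^*(y) = y^2/2 satisfy the reversed inequality everywhere.

```python
import numpy as np

# f(x) = x^2 >= g(x) = x^2/2 pointwise, so the conjugates must satisfy f* <= g*.
f_star = lambda y: y**2 / 4.0    # conjugate of x^2
g_star = lambda y: y**2 / 2.0    # conjugate of x^2/2

ys = np.linspace(-5.0, 5.0, 101)
print("f* <= g* everywhere on the test grid:", bool(np.all(f_star(ys) <= g_star(ys))))
```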

Infimal Convolution

The infimal convolution of two proper functions f and g defined on a vector space is given by (f \square g)(x) = \inf_{y} \left\{ f(y) + g(x - y) \right\}, where the infimum is taken over all y in the space; the value is +\infty when the infimand is identically +\infty and may be -\infty when it is unbounded below. This operation generalizes the Minkowski sum of sets (the infimal convolution of two indicator functions is the indicator of the Minkowski sum) and preserves convexity when f and g are convex.

A key property relating infimal convolution to conjugation states that, for proper functions f and g, the conjugate of their infimal convolution equals the pointwise sum of their conjugates:

$$(f \square g)^* = f^* + g^*.$$

This identity holds for arbitrary proper functions and is established in classical convex analysis; the result extends to finite collections, where the infimal convolution of f_1, \dots, f_m has conjugate f_1^* + \dots + f_m^*.[](https://arxiv.org/abs/1907.08318) To see this property, expand the definition of the conjugate:

$$(f \square g)^*(z) = \sup_x \left\{ \langle z, x \rangle - (f \square g)(x) \right\} = \sup_x \left\{ \langle z, x \rangle - \inf_y \left( f(y) + g(x - y) \right) \right\} = \sup_{x, y} \left\{ \langle z, x \rangle - f(y) - g(x - y) \right\},$$

since the negative of an infimum is the supremum of the negatives.[](http://www.seas.ucla.edu/~vandenbe/236C/lectures/conj.pdf) Writing \langle z, x \rangle = \langle z, y \rangle + \langle z, x - y \rangle and substituting w = x - y decouples the two suprema:

$$(f \square g)^*(z) = \sup_y \left\{ \langle z, y \rangle - f(y) \right\} + \sup_w \left\{ \langle z, w \rangle - g(w) \right\} = f^*(z) + g^*(z).$$

This manipulation highlights the dual nature of the operation: infimal convolution in the primal domain corresponds to addition of conjugates in the dual domain.[](http://www.seas.ucla.edu/~vandenbe/236C/lectures/conj.pdf)

In optimization, this property facilitates the decomposition of complex problems into simpler subproblems whose conjugates are easier to compute or analyze. For instance, infimal convolution enables regularization techniques, such as the Moreau–Yosida approximation, which smooths nonsmooth convex functions while preserving key optimality conditions, aiding convergence in iterative algorithms.[](https://eudml.org/doc/271746) It also supports the study of convex-composite functions, where the structure allows for efficient duality-based solvers in large-scale optimization tasks like machine learning and signal processing.[](https://arxiv.org/abs/1907.08318)
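
The identity (f \square g)^* = f^* + g^* can be checked on a grid. The sketch below assumes f(x) = \tfrac{1}{2}x^2 and g(x) = |x|, whose infimal convolution is the Huber function; the sum of conjugates is \tfrac{1}{2}y^2 plus the indicator of [-1, 1], so the comparison is made at a few points inside that interval.

```python
import numpy as np

xs = np.linspace(-20.0, 20.0, 8001)

f = 0.5 * xs ** 2      # f(x) = x^2/2,  with  f*(y) = y^2/2
g = np.abs(xs)         # g(x) = |x|,    with  g*(y) = indicator of [-1, 1]

# Infimal convolution on the grid: (f □ g)(x) = inf_z { f(z) + g(x - z) }  (Huber function).
inf_conv = np.array([np.min(f + np.abs(x - xs)) for x in xs])

def conj(vals, y):
    return np.max(y * xs - vals)

# (f □ g)* should agree with f* + g* = y^2/2 for |y| <= 1.
for y in (-0.8, 0.0, 0.5, 0.9):
    print(f"y={y:+.1f}   (f□g)* = {conj(inf_conv, y):.4f}   f*+g* = {0.5 * y * y:.4f}")
```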

Maximizing Argument

In the definition of the convex conjugate f^*(y) = \sup_{x} \left( \langle y, x \rangle - f(x) \right), the maximizing argument x is the point at which the supremum is attained, provided it exists. This maximizer satisfies the subdifferential inclusion y \in \partial f(x), where \partial f(x) is the subdifferential of the convex function f at x. This characterization links the optimization problem inherent in the conjugate directly to the geometry of convex functions via subgradients. The properties of this maximizing argument depend on the convexity of f. If f is strictly convex, the function x \mapsto \langle y, x \rangle - f(x) is strictly concave, so the maximizer, when it exists, is unique. In the general convex case, however, the set of maximizers may contain more than one point, reflecting the set-valued nature of the subdifferential mapping. Furthermore, for proper, closed, convex f the symmetric relation holds: x \in \partial f^*(y) if and only if y \in \partial f(x). This reciprocity underscores the duality between f and its conjugate in terms of their supporting hyperplanes. The condition y \in \partial f(x) also marks the equality case in Fenchel's inequality, where f(x) + f^*(y) = \langle y, x \rangle. Computationally, identifying the maximizing argument reduces to solving the inclusion y \in \partial f(x), which can be tackled using algorithms that leverage proximal operators of f, such as those in splitting methods for monotone inclusions.

Scaling Properties

The convex conjugate exhibits a fundamental scaling property: for a convex function f: \mathbb{R}^n \to (-\infty, +\infty] and any \lambda > 0, the conjugate of the scaled function \lambda f satisfies

$$(\lambda f)^*(y) = \lambda\, f^*\!\left( \frac{y}{\lambda} \right)$$

for all y \in \mathbb{R}^n. To verify this, substitute into the definition of the conjugate:

$$(\lambda f)^*(y) = \sup_{x} \left\{ \langle y, x \rangle - \lambda f(x) \right\} = \lambda \sup_{x} \left\{ \left\langle \frac{y}{\lambda}, x \right\rangle - f(x) \right\} = \lambda\, f^*\!\left( \frac{y}{\lambda} \right),$$

where the middle equality simply factors \lambda out of the expression being maximized. A related property concerns positively homogeneous convex functions. If f is positively homogeneous of degree p > 1, meaning f(\lambda x) = \lambda^p f(x) for all \lambda > 0 and x in the domain of f, then the conjugate f^* is positively homogeneous of degree q = p/(p-1), so that f^*(\lambda y) = \lambda^q f^*(y) for all \lambda > 0 and y in the domain of f^*. This follows from direct substitution into the supremum definition, leveraging the homogeneity assumption to rescale the optimizing argument x. This degree transformation has key implications for functions like powered norms. For example, the function f(x) = \frac{1}{p} \|x\|_p^p (homogeneous of degree p) has conjugate \frac{1}{q} \|y\|_q^q (homogeneous of degree q), where 1/p + 1/q = 1, illustrating the duality in common examples without altering the core scaling structure.[](http://www.seas.ucla.edu/~vandenbe/cvxbook/bv_cvxbook.pdf)
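
A quick grid check of the scaling rule, assuming f(x) = e^x (so that f^*(y) = y \log y - y for y > 0) and an arbitrary \lambda:

```python
import numpy as np

xs = np.linspace(-15.0, 15.0, 600_001)

def conj(fvals, y):
    return np.max(y * xs - fvals)        # sup approximated over the grid

lam = 2.5
f_vals = np.exp(xs)                      # f(x) = e^x, with f*(y) = y*log(y) - y for y > 0
f_star = lambda y: y * np.log(y) - y

for y in (0.5, 1.0, 4.0):
    lhs = conj(lam * f_vals, y)          # (lam*f)*(y) straight from the definition
    rhs = lam * f_star(y / lam)          # lam * f*(y / lam)
    print(f"y={y:3.1f}   (lam f)* = {lhs:.5f}   lam f*(y/lam) = {rhs:.5f}")
```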

Behavior under Linear Transformations

The convex conjugate exhibits specific transformation properties when composed with linear maps, reflecting the interplay between the original function, the linear operator, and its adjoint. Consider a linear operator A: X \to Y between normed vector spaces, with adjoint A^*: Y^* \to X^*, and a proper convex lower semicontinuous function f: Y \to (-\infty, +\infty]. Under a suitable constraint qualification (for instance, when f is finite and continuous at some point of the range of A), the conjugate of the composition f \circ A: X \to (-\infty, +\infty] is given by

$$(f \circ A)^*(x^*) = \min_{y^* \in Y^*} \left\{ f^*(y^*) \;\middle|\; A^* y^* = x^* \right\},$$

the infimal image of f^* under A^*, with the minimum attained. In particular, (f \circ A)^*(x^*) is finite only when x^* can be written as A^* y^* for some y^* in the domain of f^*, reflecting how composition with A restricts the admissible dual variables.

In the special case of a change of variables where g(x) = f(Ax) and A is an invertible linear operator, the formula simplifies significantly. Here, the conjugate is

$$g^*(z) = f^*(A^{-*} z),$$

with A^{-*} denoting the inverse of the adjoint A^*. This follows directly from the definition, as the supremum defining g^*(z) transforms via the substitution u = Ax:

$$g^*(z) = \sup_{x} \left\{ \langle z, x \rangle - f(Ax) \right\} = \sup_{u} \left\{ \langle A^{-*} z, u \rangle - f(u) \right\} = f^*(A^{-*} z).$$

The proof of the invertible case relies on the invariance of the bilinear pairing under adjoints: \langle y^*, Ax \rangle = \langle A^* y^*, x \rangle. Substituting into the conjugate definition preserves the supremum structure, mapping the dual variable accordingly. In the general non-invertible case, the minimization over preimages under A^* replaces this direct substitution: several dual variables y^* may map to the same A^* y^*, and the conjugate selects the one with the smallest value of f^*, consistent with the domain restrictions of the composition. These transformation properties have important implications in convex optimization, particularly for dimension reduction. By composing a high-dimensional objective with a linear map of reduced rank, the conjugate facilitates dual formulations in lower-dimensional spaces, enabling efficient solving of constrained problems such as those involving projections or subspace restrictions.
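
For the invertible change of variables, the identity g^*(z) = f^*(A^{-*} z) can be verified numerically. The sketch assumes f(x) = \tfrac{1}{2}\|x\|_2^2 (which is its own conjugate), an arbitrary invertible 2 by 2 matrix A, and test points z whose maximizers lie well inside the search box.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.5, 3.0]])                       # arbitrary invertible 2x2 matrix
A_inv_T = np.linalg.inv(A).T                     # A^{-*} in matrix form

f      = lambda x: 0.5 * np.sum(x * x, axis=-1)  # f(x) = ||x||^2 / 2, self-conjugate
f_star = f

# g(x) = f(Ax); the claim is g*(z) = f*(A^{-T} z).
grid = np.linspace(-3.0, 3.0, 1201)
X1, X2 = np.meshgrid(grid, grid)
X = np.stack([X1.ravel(), X2.ravel()], axis=1)   # candidate x's for the supremum
gX = f(X @ A.T)                                  # g(x) = ||A x||^2 / 2 on the grid

for z in (np.array([1.0, -0.5]), np.array([0.3, 2.0])):
    g_star_grid = np.max(X @ z - gX)             # sup_x <z, x> - g(x) over the grid
    g_star_form = f_star(A_inv_T @ z)            # f*(A^{-T} z)
    print(f"z={z}   grid = {g_star_grid:.4f}   formula = {g_star_form:.4f}")
```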

Broader Context

Relation to Legendre Transform

The Legendre transform, introduced by Adrien-Marie Legendre in 1787 for applications in the calculus of variations and mechanics, applies to smooth and strictly convex functions f: \mathbb{R}^n \to \mathbb{R}. It is defined as

$$f^L(p) = \sup_{x \in \mathbb{R}^n} \left( \langle p, x \rangle - f(x) \right),$$

where the supremum is attained at a unique point x satisfying p = \nabla f(x), and the inverse is recovered via x = \nabla f^L(p). This formulation relies on differentiability to ensure the bijection between primal and dual variables through gradients.[](https://ise.ncsu.edu/wp-content/uploads/sites/9/2015/07/or706-LF-transform-1.pdf)[](https://sites.math.washington.edu/~rtr/papers/rtr014-LegendreTransform.pdf)

The convex conjugate extends the Legendre transform to arbitrary proper convex functions, potentially nonsmooth, by retaining the same supremum expression without assuming differentiability. The two transforms coincide precisely when f is essentially smooth (steep at the boundary of its effective domain) and strictly convex, ensuring the subdifferential is single-valued and the gradient map is a bijection, so the argmax in the supremum aligns with the gradient condition. In such cases, the convex conjugate inherits the involutive property of the Legendre transform, where applying it twice recovers the original function. Otherwise, the convex conjugate provides a more general duality without the invertibility guarantees of the smooth setting.[](https://sites.math.washington.edu/~rtr/papers/rtr014-LegendreTransform.pdf) Werner Fenchel generalized the concept in the 1940s, culminating in his 1949 work on conjugate convex functions, which formalized the transform for nonsmooth convex analysis and established key duality properties like biconjugation equaling the convex closure. This extension was crucial for broader applications in optimization and functional analysis.[](https://www.cs.cmu.edu/~suvrit/teach/papers/1949_fenchel_conjugate_convex_functions.pdf)

A key limitation of the classical Legendre transform is its inapplicability to nondifferentiable functions, such as the \ell_p-norm f(x) = \|x\|_p for 1 \leq p < \infty, which fails to be differentiable at the origin (and, for p = 1, along coordinate hyperplanes). In contrast, the convex conjugate of the norm yields the indicator function of the dual norm's unit ball, f^*(y) = 0 if \|y\|_q \leq 1 and +\infty otherwise (with 1/p + 1/q = 1), demonstrating the conjugate's ability to handle such cases robustly.
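
The agreement of the two recipes on a smooth, strictly convex function can be seen numerically. The sketch assumes f(x) = x^4/4, for which the gradient equation p = x^3 can be inverted in closed form, and compares the gradient-based Legendre value p\,x(p) - f(x(p)) with the supremum-based conjugate approximated on a grid.

```python
import numpy as np

# Smooth, strictly convex f(x) = x^4 / 4 with gradient f'(x) = x^3.
f = lambda x: x ** 4 / 4.0

xs = np.linspace(-8.0, 8.0, 800_001)

def conjugate(p):
    return np.max(p * xs - f(xs))                   # Legendre-Fenchel: sup over a grid

def legendre(p):
    x = np.sign(p) * np.abs(p) ** (1.0 / 3.0)       # solve p = f'(x) = x^3 for x
    return p * x - f(x)                             # classical gradient-based transform

for p in (-3.0, 0.5, 2.0):
    print(f"p={p:+.1f}   sup form = {conjugate(p):.5f}   gradient form = {legendre(p):.5f}")
```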

Applications in Convex Optimization

The convex conjugate plays a pivotal role in Lagrangian duality for convex optimization problems. Consider the standard form \min_{x} f(x) subject to Ax \leq b, where f is convex. The Lagrangian is L(x, y) = f(x) + y^T (Ax - b) with y \geq 0, and the dual function is g(y) = \inf_x L(x, y) = -f^*(-A^T y) - b^T y, where f^* denotes the convex conjugate of f.[](https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf) This expression reveals how the conjugate encapsulates the infimum over the primal variable, transforming the constrained problem into the concave maximization \max_{y \geq 0} g(y), which provides a lower bound on the primal optimal value.[](https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf) Strong duality holds under conditions such as Slater's, which requires the existence of a strictly feasible point x satisfying Ax < b. In this case, the dual optimal value equals the primal optimal value, \sup_{y \geq 0} g(y) = \inf \{ f(x) \mid Ax \leq b \}.[](https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf) This equality is guaranteed by strong duality theorems in the convex setting, with the Fenchel–Moreau theorem ensuring that for a proper convex lower semicontinuous function f, the biconjugate f^{**} = f, which supports the alignment of primal and dual objectives.[](https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf)

In proximal algorithms for minimizing nonsmooth convex functions, the convex conjugate facilitates efficient computation via the Moreau envelope, defined as

$$f^\lambda(x) = \inf_y \left( f(y) + \frac{1}{2\lambda} \|x - y\|^2 \right),$$

which smooths f while preserving its minimizers. The conjugate of the Moreau envelope relates directly to the conjugate of f, given by

$$(f^\lambda)^*(z) = f^*(z) + \frac{\lambda}{2} \|z\|^2,$$

enabling dual-based implementations in methods like the proximal gradient algorithm.[](https://web.stanford.edu/~boyd/papers/pdf/prox_algs.pdf) Entropic regularization, often used in mirror descent variants, leverages the conjugate of the negative entropy function, which yields the log-sum-exp form, to handle probability simplex constraints in large-scale convex problems.[](https://arxiv.org/pdf/1906.01333) In machine learning, the convex conjugate appears in the dual formulation of support vector machines (SVMs), where the hinge loss \max(0, 1 - y_i (w^T x_i + b)) has a conjugate that shapes the dual quadratic program, facilitating kernel methods and regularization analysis.[](https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf) This duality underpins the SVM's maximum-margin optimization, ensuring convex solvability and strong duality under Slater's condition for the soft-margin variant.[](https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf)
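
A minimal worked instance of the conjugate-based dual function, assuming a single linear constraint and the self-conjugate objective f(x) = \tfrac{1}{2}\|x\|_2^2; the vector a and scalar b below are arbitrary choices with b < 0 so that the constraint is active, and strong duality makes the two optimal values coincide.

```python
import numpy as np

# Tiny instance of  min_x ||x||^2/2  subject to  a^T x <= b,  with b < 0 (active constraint).
# Since f*(u) = ||u||^2/2, the dual function is g(y) = -f*(-a*y) - b*y = -y^2 ||a||^2 / 2 - b*y.
a = np.array([1.0, 2.0])
b = -3.0

# Primal optimum: project the origin onto the halfspace, x* = (b / ||a||^2) a.
x_star = (b / np.dot(a, a)) * a
p_star = 0.5 * np.dot(x_star, x_star)

# Dual: maximize g over y >= 0 on a grid.
ys = np.linspace(0.0, 5.0, 50_001)
g = -0.5 * ys**2 * np.dot(a, a) - b * ys
d_star = g.max()

print(f"primal optimal value {p_star:.6f}")
print(f"dual   optimal value {d_star:.6f}   (equal, by strong duality)")
```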

Selected Conjugates

Tabular Listing

The following table provides a quick reference for selected convex conjugate pairs, compiled from standard examples in convex analysis. Each pair consists of a proper convex function f and its convex conjugate f^*, with domains specified where relevant.

| $ f(x) $ | Domain of $ f $ | $ f^*(y) $ | Domain of $ f^* $ |
|---|---|---|---|
| Indicator function of a convex set $ C $: $ \delta_C(x) = 0 $ if $ x \in C $, $ +\infty $ otherwise | $ C \subseteq \mathbb{R}^n $ | Support function of $ C $: $ \sigma_C(y) = \sup_{x \in C} \langle y, x \rangle $ | $ \mathbb{R}^n $ |
| $ \frac{1}{2} \|x\|_2^2 $ | $ \mathbb{R}^n $ | $ \frac{1}{2} \|y\|_2^2 $ | $ \mathbb{R}^n $ |
| $ \|x\|_p $ for $ 1 < p < \infty $ | $ \mathbb{R}^n $ | Indicator of the dual unit ball: $ 0 $ if $ \|y\|_q \leq 1 $, $ +\infty $ otherwise, where $ \frac{1}{p} + \frac{1}{q} = 1 $ | $ \{ y \in \mathbb{R}^n : \|y\|_q \leq 1 \} $ |
| $ \frac{1}{p} \|x\|_p^p $ for $ 1 < p < \infty $ | $ \mathbb{R}^n $ | $ \frac{1}{q} \|y\|_q^q $ where $ \frac{1}{p} + \frac{1}{q} = 1 $ | $ \mathbb{R}^n $ |
| Negative entropy on the simplex: $ \sum_i x_i \log x_i $ | $ x \geq 0 $, $ \sum_i x_i = 1 $ | Log-sum-exp: $ \log \left( \sum_i e^{y_i} \right) $ | $ \mathbb{R}^n $ [](https://tisp.indigits.com/convex_sets/conjugate_functions) |
| Log-sum-exp: $ \log \left( \sum_i e^{x_i} \right) $ | $ \mathbb{R}^n $ | Negative entropy: $ \sum_i y_i \log y_i $ | $ y \geq 0 $, $ \sum_i y_i = 1 $ |
| Negative log: $ -\log x $ (scalar) | $ x > 0 $ | $ -1 - \log(-y) $ | $ y < 0 $ |
| Exponential: $ e^x $ (scalar) | $ \mathbb{R} $ | $ y \log y - y $ for $ y > 0 $, $ 0 $ at $ y = 0 $ | $ y \geq 0 $ |
| Fermi–Dirac entropy: $ x \log x + (1 - x) \log (1 - x) $ (scalar) | $ 0 < x < 1 $ | $ \log (1 + e^y) $ | $ \mathbb{R} $ |
| Matrix negative log-det: $ -\log \det X $ | Symmetric positive definite matrices $ \mathbb{S}_{++}^n $ | $ -\log \det (-Y) - n $ | $ -\mathbb{S}_{++}^n $ |
| Quadratic: $ \frac{1}{2} x^T Q x $ with $ Q \succ 0 $ | $ \mathbb{R}^n $ | $ \frac{1}{2} y^T Q^{-1} y $ | $ \mathbb{R}^n $ |
| Norm: $ \|x\| $ | $ \mathbb{R}^n $ | Indicator of the dual unit ball: $ 0 $ if $ \|y\|_* \leq 1 $, $ +\infty $ otherwise | $ \{ y : \|y\|_* \leq 1 \} $ |
| Unnormalized negative entropy: $ \sum_i x_i \log x_i $ | $ x > 0 $ | $ \sum_i e^{y_i - 1} $ | $ \mathbb{R}^n $ |

These pairs illustrate common functions in optimization and analysis, drawn from established references on convex duality.

Patterns and Dual Pairs

One prominent pattern in convex conjugates arises from Hölder's inequality, which underpins the duality between \ell_p and \ell_q norms for p, q > 1 satisfying \frac{1}{p} + \frac{1}{q} = 1. Specifically, the convex conjugate of the function f(x) = \frac{1}{p} \|x\|_p^p is f^*(y) = \frac{1}{q} \|y\|_q^q, reflecting the reciprocal relationship that ensures the dual norm captures the supporting hyperplane structure in the unit ball of the original norm.[](https://math.stackexchange.com/questions/2450874/fenchel-conjugate-of-ell-pp) This p-q pairing exemplifies how conjugates preserve positive homogeneity while exchanging the exponents p and q, a structural symmetry rooted in the inequality's bound on inner products.[](https://math.stackexchange.com/questions/2450874/fenchel-conjugate-of-ell-pp)

Self-dual functions, where f^* = f (and hence f^{**} = f), provide another recurring pattern, highlighting intrinsic symmetries in the conjugate operation. A canonical example is the quadratic f(x) = \frac{1}{2} \|x\|_2^2, whose conjugate is itself, f^*(y) = \frac{1}{2} \|y\|_2^2, due to the self-duality of the Euclidean norm.[](https://www.stat.cmu.edu/~ryantibs/convexopt-F18/lectures/dual-corres.pdf) This property facilitates closed-form dual representations in optimization problems involving squared norms.

In information theory, dual pairs emerge between the negative entropy function and measures of divergence. The negative entropy \sum_i p_i \log p_i (convex on the probability simplex) has the log-sum-exp function as its convex conjugate, and the associated Bregman divergence is the Kullback–Leibler divergence D_{\text{KL}}(p \| q) = \sum_i p_i \log \frac{p_i}{q_i}, quantifying distributional asymmetry through Fenchel–Young inequality gaps.[](https://tisp.indigits.com/convex_sets/conjugate_functions) This pairing underscores the role of conjugates in bounding relative entropies and enabling variational approximations.

Geometrically, convex conjugation acts as a polarity on the space of proper convex functions, mirroring how the polar of a convex set C is \{ y \mid \langle x, y \rangle \leq 1 \ \forall x \in C \}. For the indicator function of a convex set C, the conjugate is the support function of C, while the conjugate of the gauge (Minkowski functional) of C is the indicator of the polar set C^\circ; in both cases epigraphs are transformed into dual descriptions built from supporting hyperplanes.[](http://www.seas.ucla.edu/~vandenbe/236C/lectures/conj.pdf) These patterns generalize beyond Euclidean spaces to Orlicz spaces, where complementary Young functions \Phi and \Psi = \Phi^* (satisfying \Phi(t) + \Psi(s) \geq ts) define dual Banach spaces via modular norms, extending p-q duality to variable exponent growth.[](https://arxiv.org/abs/1711.09121)