
Maximum and minimum

In set theory, the maximum and minimum of a set are defined as the greatest and least elements in the set, respectively, provided such elements exist. For a function, maxima and minima, collectively known as extrema, refer to the largest and smallest values that the function attains over its domain, occurring at the points where it reaches these peak or valley values. These concepts are fundamental in various mathematical fields, including calculus, optimization, and order theory. In the context of real-valued functions of one or more variables, extrema are classified as absolute (or global), which represent the overall highest or lowest values across the entire domain, or local (or relative), which are the highest or lowest values in a neighborhood around a specific point. Local maxima and minima often occur at critical points, where the derivative of the function is zero or undefined, allowing potential peaks and valleys to be identified through techniques such as the first and second derivative tests.

The study of maxima and minima plays a crucial role in applied mathematics, particularly in optimization problems, where one seeks to maximize a benefit, minimize a cost, or characterize equilibrium states in physical systems. In science and engineering, for instance, these principles underpin linear and nonlinear optimization algorithms. In more advanced settings, such as multivariable calculus, partial derivatives help locate extrema on surfaces, with the Hessian matrix providing insight into their nature (maximum, minimum, or saddle point). Not all functions possess maxima or minima; unbounded functions, such as linear ones extending to infinity, may lack them, highlighting the importance of domain and boundedness considerations.

Fundamental Definitions

Maxima and Minima for Functions

In mathematics, a real-valued function f defined on a domain D attains a maximum value at a point x_0 \in D if f(x_0) \geq f(x) for all x \in D, making f(x_0) the largest value of the function over its entire domain; similarly, it attains a minimum value at x_1 \in D if f(x_1) \leq f(x) for all x \in D, making f(x_1) the smallest value. These are known as absolute or global maxima and minima, as they represent the extremal values across the full domain. In contrast, a local maximum occurs at x_0 \in D if there exists an open interval I containing x_0 such that f(x_0) \geq f(x) for all x \in I \cap D, and a local minimum satisfies f(x_1) \leq f(x) for all x \in I \cap D; these relative extrema hold only in a neighborhood, not necessarily globally.

The existence of global maxima and minima depends on the domain and the properties of the function. For a continuous function f on a closed and bounded interval [a, b], the extreme value theorem guarantees that f attains both an absolute maximum and an absolute minimum on [a, b]. This theorem requires continuity on the compact set [a, b], ensuring that the function's image is also compact and thus achieves its supremum and infimum. On unbounded domains, such as the real line, continuous functions may not attain extrema; for instance, f(x) = x has no global maximum or minimum. Global extrema are denoted by \max_{x \in D} f(x) for the maximum value and \min_{x \in D} f(x) for the minimum value, with the points of attainment specified as \arg\max or \arg\min when needed. For a constant function f(x) = c on any domain D, every point in D is both a global maximum and a global minimum, since f(x) = c for all x \in D. Local extrema detection often involves derivatives, though full methods are covered elsewhere. In unbounded cases where extrema are not attained, the supremum and infimum provide least upper and greatest lower bounds, respectively, as discussed in ordered sets. The study of maxima and minima originated in early calculus, with Pierre de Fermat developing foundational methods around the 1630s through correspondence, using adequality to identify points where function differences vanish near extrema.
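To make the attainment guaranteed by the extreme value theorem concrete, the following minimal Python sketch (an illustration added here, not part of the original discussion; the helper approximate_extrema and the sample function are hypothetical choices) approximates the global maximum and minimum of a continuous function on a closed interval by evaluating it on a dense grid and taking the largest and smallest sampled values.

```python
import numpy as np

# Illustrative sketch with hypothetical names: approximate the global extrema
# of a continuous f on a compact interval [a, b] by dense sampling.
def approximate_extrema(f, a, b, n=100_001):
    xs = np.linspace(a, b, n)                     # fine grid over the closed interval
    ys = f(xs)
    i_max, i_min = np.argmax(ys), np.argmin(ys)   # first indices attaining max / min
    return (xs[i_max], ys[i_max]), (xs[i_min], ys[i_min])

# Example: f(x) = x^2 e^{-x} on [0, 5] has minimum 0 at x = 0 and
# maximum 4 e^{-2} (about 0.541) at x = 2.
(x_max, f_max), (x_min, f_min) = approximate_extrema(lambda x: x**2 * np.exp(-x), 0.0, 5.0)
print(x_max, f_max, x_min, f_min)
```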

Local and Global Extrema

In the context of a function f defined on a domain D \subseteq \mathbb{R}, a point c \in D is a local maximum if there exists an open neighborhood N around c such that f(c) \geq f(x) for all x \in N \cap D. Similarly, c is a local minimum if f(c) \leq f(x) for all x \in N \cap D. These local extrema represent relative peaks or valleys in the function's graph within a restricted vicinity, without regard to the behavior elsewhere in the domain. A global maximum (or absolute maximum) occurs at a point c \in D where f(c) \geq f(x) for every x \in D, making it the highest value attained over the entire domain; a global minimum is defined analogously with the inequality reversed. A global extremum may coincide with one or more local extrema, but uniqueness is not guaranteed: a function can have multiple global maxima, for instance if it is constant on some subset of its domain. On domains with boundaries, such as closed intervals [a, b], global extrema can occur at the endpoints (boundary points) even if they are not local extrema in the interior. For example, the function f(x) = x on [0, 1] has its global maximum at the endpoint x = 1 and its global minimum at x = 0.

The existence of global extrema is guaranteed under certain conditions by the extreme value theorem: if f is continuous on a compact set K \subseteq \mathbb{R} (i.e., closed and bounded), then f attains both a global maximum and a global minimum on K. These extrema occur either at critical points in the interior or at boundary points of K. A proof sketch proceeds as follows: since K is compact and f is continuous, the image f(K) is also compact; by the Heine-Borel theorem, f(K) is closed and bounded in \mathbb{R}. Boundedness implies f(K) has a supremum M and an infimum m; closedness ensures M, m \in f(K), so there exist points in K where these values are attained. The intermediate value theorem further shows that, on an interval, the continuous image contains every value between m and M. Critical points, where the derivative is zero or undefined, are relevant for locating interior extrema, as detailed in analytical techniques for single-variable functions.

Global extrema may fail to exist if the domain is not compact or if f is discontinuous. For instance, on the open interval (0, 1], the continuous function f(x) = x has no global minimum, as values approach 0 (the infimum) near x = 0 but never attain it within the interval, despite having a global maximum of 1 at x = 1. Discontinuous functions on compact sets also lose the guarantee of extrema; for example, the function defined by f(x) = x for 0 \leq x < 1 and f(1) = 0 on [0, 1] has no maximum, since its supremum of 1 is not attained (though the infimum of 0 is attained at x = 0 and x = 1). In cases without a maximum, the supremum serves as the least upper bound, a concept explored further in ordered sets.

Methods for Finding Extrema

Analytical Techniques for Single-Variable Functions

Analytical techniques for identifying maxima and minima in single-variable functions rely on the properties of derivatives, assuming the function is differentiable. These methods locate critical points where the derivative is zero or undefined and classify them as local extrema based on the behavior of the function nearby. For functions defined on closed intervals, additional evaluation at boundaries determines global extrema.

Fermat's theorem states that if a function f has a local extremum at an interior point c where f is differentiable, then the first derivative f'(c) = 0. This theorem identifies potential locations for local maxima or minima, known as critical points, but does not distinguish between them. The first derivative test classifies critical points by examining the sign of f' in intervals around c. If f' changes from negative to positive at c, then f(c) is a local minimum; if from positive to negative, then f(c) is a local maximum. No sign change indicates neither.

The second derivative test provides a quicker classification using the concavity at c. Compute the second derivative, defined as f''(x) = \frac{d^2 f}{dx^2}. If f'(c) = 0 and f''(c) > 0, then f(c) is a local minimum; if f''(c) < 0, then a local maximum. If f''(c) = 0, the test is inconclusive. For inconclusive cases where f''(c) = 0, higher-order derivative tests extend the analysis. Consider the first non-zero higher derivative at c: if it is the nth derivative with f^{(n)}(c) > 0 and n even, then c is a local minimum; if f^{(n)}(c) < 0 and n even, a local maximum; odd n indicates neither. For example, the third derivative is examined when the second derivative vanishes.

To find global extrema on a closed interval [a, b], evaluate f at critical points in the interior and at the endpoints a and b, then compare values; the largest is the global maximum, the smallest the global minimum. This follows from the extreme value theorem for continuous functions on compact sets.

Consider the example f(x) = x^3 - 3x on [-2, 2]. First, find critical points: f'(x) = 3x^2 - 3 = 0 implies x^2 = 1, so x = \pm 1. These are interior points. Apply the first derivative test: f'(x) < 0 for |x| < 1, and f'(x) > 0 for |x| > 1, so x = -1 is a local maximum and x = 1 is a local minimum. Confirm with the second derivative: f''(x) = 6x, so f''(-1) = -6 < 0 (maximum) and f''(1) = 6 > 0 (minimum). Evaluate at the endpoints and critical points: f(-2) = -2, f(-1) = 2, f(1) = -2, f(2) = 2. Thus, the global maximum is 2, attained at x = -1 and x = 2, and the global minimum is -2, attained at x = 1 and x = -2.
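As a brief illustration, the worked example above can be reproduced symbolically; the sketch below assumes the sympy library is available and simply automates the same steps (finding critical points of f', applying the second derivative test, and comparing with the endpoint values).

```python
import sympy as sp

# Sketch of the closed-interval method applied to f(x) = x^3 - 3x on [-2, 2].
x = sp.symbols('x', real=True)
f = x**3 - 3*x
a, b = -2, 2

f1, f2 = sp.diff(f, x), sp.diff(f, x, 2)

# Critical points: interior solutions of f'(x) = 0.
critical = [c for c in sp.solveset(sp.Eq(f1, 0), x, sp.Interval(a, b)) if a < c < b]

# Second derivative test at each interior critical point.
for c in critical:
    curvature = f2.subs(x, c)
    kind = "local min" if curvature > 0 else "local max" if curvature < 0 else "inconclusive"
    print(f"x = {c}: f = {f.subs(x, c)}, f'' = {curvature} -> {kind}")

# Global extrema: compare f at critical points and endpoints.
candidates = critical + [sp.Integer(a), sp.Integer(b)]
values = [f.subs(x, c) for c in candidates]
print("global max:", max(values), "global min:", min(values))   # 2 and -2
```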

Numerical and Search Methods

Numerical and search methods provide iterative approaches to approximating maxima and minima of functions, particularly when analytical solutions are unavailable or computationally infeasible because of the function's complexity or the lack of a closed-form expression. These techniques rely on repeated evaluations of the function (and possibly its derivatives) to refine estimates within a search interval, often assuming properties such as unimodality to ensure progress toward an extremum. They are essential for practical optimization in applied fields where exact methods fail for non-polynomial or high-dimensional problems.

The bisection method, adapted for finding extrema of differentiable unimodal functions, operates by successively halving an initial interval [a, b] containing the extremum based on the sign of the derivative at the midpoint. For such an f with an interior maximum, evaluate f' at the midpoint m = (a + b)/2: if f'(m) > 0, the maximum lies in [m, b] (set a = m); if f'(m) < 0, it lies in [a, m] (set b = m). This process reduces the interval length by half each iteration until a tolerance is met, guaranteeing convergence to the global extremum within the interval under unimodality.

Ternary search achieves a similar interval reduction for unimodal functions without requiring derivatives, dividing the interval into thirds rather than halves to discard unpromising regions. Given [l, r], compute the points m_1 = l + (r - l)/3 and m_2 = r - (r - l)/3 and evaluate f(m_1) and f(m_2). For a maximum, if f(m_1) < f(m_2), discard [l, m_1] (set l = m_1); otherwise, discard [m_2, r] (set r = m_2). Repeat until the interval is sufficiently small; each iteration shrinks the interval by a factor of 2/3 (about 0.667). This method is particularly useful for black-box functions where derivative computation is expensive or impossible.

Newton's method for optimization iteratively refines an estimate x_n of a local extremum using second-order information, updating via the formula x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}, which approximates the function quadratically near a root of f'(x) = 0. It requires the function to be twice continuously differentiable, with f''(x) nonzero and of consistent sign near the extremum (positive for minima, negative for maxima), to ensure the update moves toward the stationary point. Local quadratic convergence, where the error satisfies e_{n+1} \approx C e_n^2 for some constant C, holds if the initial guess is sufficiently close and the second derivative is Lipschitz continuous, making the method highly efficient for smooth problems once near the solution. However, it may diverge if started far from the extremum or if f''(x_n) changes sign.

In one dimension, gradient ascent (for maxima) or descent (for minima) updates the estimate in the direction of steepest increase or decrease, given by x_{n+1} = x_n + \alpha f'(x_n) for ascent, where \alpha > 0 is a step size. This method assumes differentiability and iterates until |f'(x_n)| < \epsilon, converging linearly under Lipschitz continuity of f' but potentially slowly if the function is ill-conditioned. While foundational, it is less common in one dimension than in multivariable settings, where it generalizes to higher-dimensional gradients.

Global optimization poses challenges when functions exhibit multiple local extrema, trapping local methods in suboptimal points; stochastic approaches such as simulated annealing address this by mimicking the physical annealing process to explore the search space probabilistically. Starting from an initial solution, the algorithm iteratively perturbs the current state to a neighbor, accepting the move if it improves the objective or, with probability \exp(-\Delta E / T) (where \Delta E is the change in function value and T is a decreasing "temperature" parameter), even if it worsens it, allowing escape from local minima early on while favoring improvements as T cools. This seminal algorithm converges to the global optimum in probability under suitable cooling schedules, though it requires tuning and may be computationally intensive.

Error analysis in these methods focuses on convergence rates and stopping criteria to balance accuracy and efficiency. Linear convergence, as in bisection or ternary search, reduces the error by a constant factor r < 1 per iteration (e_{n+1} \leq r e_n), while quadratic convergence, as in Newton's method, roughly squares the error near the solution. Common stopping criteria include a tolerance \epsilon on the interval length (|b - a| < \epsilon) for bracketing methods, on the derivative magnitude (|f'(x_n)| < \epsilon) for derivative-based approaches, or on the change in iterates (|x_{n+1} - x_n| < \epsilon), ensuring the approximation meets a predefined precision without excessive computation. These criteria must account for function scaling and noise to avoid terminating prematurely or failing to terminate.
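As one concrete sketch of these bracketing ideas, the short Python function below implements ternary search for the maximum of a unimodal function; the example function and tolerance are illustrative choices rather than anything prescribed by the methods above.

```python
# Ternary search for the maximum of a unimodal function on [l, r] (illustrative sketch).
def ternary_search_max(f, l, r, tol=1e-9):
    while r - l > tol:
        m1 = l + (r - l) / 3
        m2 = r - (r - l) / 3
        if f(m1) < f(m2):
            l = m1          # the maximum cannot lie in [l, m1]
        else:
            r = m2          # the maximum cannot lie in [m2, r]
    return (l + r) / 2      # approximate location of the maximum

# Example: f(x) = -(x - 2)^2 + 1 is unimodal with its maximum at x = 2.
x_star = ternary_search_max(lambda x: -(x - 2)**2 + 1, 0.0, 5.0)
print(round(x_star, 6))     # approximately 2.0
```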

Extrema in Multivariable Settings

Functions of Several Variables

In the context of functions of several variables, a function f: \mathbb{R}^n \to \mathbb{R} has a local maximum at a point c \in \mathbb{R}^n if there exists a neighborhood U around c such that f(c) \geq f(x) for all x \in U; similarly, c is a local minimum if f(c) \leq f(x) for all x \in U. A global maximum (or absolute maximum) occurs if f(c) \geq f(x) for all x in the domain of f, and analogously for a global minimum. These definitions extend the one-variable case by considering neighborhoods in the higher-dimensional Euclidean space \mathbb{R}^n, where the neighborhood is typically an open ball centered at c.

To identify potential local extrema, critical points are defined as points c where the gradient \nabla f(c) = 0, with \nabla f being the vector of partial derivatives \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right). At such points, the tangent hyperplane to the graph of f is horizontal, analogous to the first derivative test in one variable, though critical points may also include locations where the gradient is undefined. Solving \nabla f(c) = 0 typically involves setting each partial derivative to zero and solving the resulting system of equations.

To classify these critical points, the second derivative test employs the Hessian matrix H, whose entries are the second partial derivatives: H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j} for i, j = 1, \dots, n. The definiteness of H at a critical point c is determined by its eigenvalues: if all eigenvalues are positive (positive definite), then c is a local minimum; if all are negative (negative definite), a local maximum; and if the eigenvalues have mixed signs (indefinite), a saddle point. For functions of two variables, definiteness can also be checked via the determinant and trace of H, but eigenvalues provide a general approach for n > 2.

The extreme value theorem for multivariable functions states that if f: K \to \mathbb{R} is continuous and K \subset \mathbb{R}^n is compact (closed and bounded), then f attains its global maximum and minimum on K. This guarantees the existence of extrema on such sets, with candidates found among critical points in the interior and values on the boundary.

Consider the function f(x,y) = x^2 + y^2. The gradient is \nabla f = (2x, 2y), so the only critical point is at (0,0), where \nabla f = 0. The Hessian is H = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}, with eigenvalues 2 and 2, both positive, confirming a local (and global) minimum at (0,0), where f(0,0) = 0.
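The eigenvalue-based classification can be sketched in a few lines; the helper below (a hypothetical name, assuming NumPy is available) inspects the eigenvalues of a given Hessian and is applied to the constant Hessian of f(x, y) = x^2 + y^2 at its critical point (0, 0).

```python
import numpy as np

# Illustrative sketch: classify a critical point from the eigenvalues of its Hessian.
def classify_critical_point(hessian):
    eigenvalues = np.linalg.eigvalsh(hessian)   # symmetric Hessian -> real eigenvalues
    if np.all(eigenvalues > 0):
        return "local minimum"
    if np.all(eigenvalues < 0):
        return "local maximum"
    if np.any(eigenvalues > 0) and np.any(eigenvalues < 0):
        return "saddle point"
    return "inconclusive (some eigenvalue is zero)"

H = np.array([[2.0, 0.0],
              [0.0, 2.0]])                      # Hessian of f(x, y) = x^2 + y^2
print(classify_critical_point(H))               # local minimum
```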

Constrained Optimization Problems

Constrained optimization involves finding the maximum or minimum of an objective function subject to one or more constraints that define a feasible region, often visualized geometrically as extrema occurring on manifolds defined by equality constraints or within bounded regions for inequalities. This setup contrasts with unconstrained problems by requiring the gradient of the objective to align with the gradients of the constraints at the optimum, ensuring the search direction respects the boundary.

For equality constraints of the form g(\mathbf{x}) = 0, where f(\mathbf{x}) is the objective function to extremize, the method of Lagrange multipliers introduces a scalar multiplier \lambda such that the gradients satisfy \nabla f(\mathbf{x}) = \lambda \nabla g(\mathbf{x}). This condition arises from forming the Lagrangian \mathcal{L}(\mathbf{x}, \lambda) = f(\mathbf{x}) - \lambda g(\mathbf{x}) and setting its partial derivatives to zero, yielding the system: \nabla f(\mathbf{x}) - \lambda \nabla g(\mathbf{x}) = 0, \quad g(\mathbf{x}) = 0. The multiplier \lambda measures the sensitivity of the objective to perturbations in the constraint, representing the rate of change of the extremum value with respect to the constraint constant. This approach, pioneered by Joseph-Louis Lagrange in his 1788 work Mécanique Analytique, transforms the constrained problem into solving an unconstrained system in the extended variables.

To classify these critical points, second-order conditions employ the bordered Hessian of the Lagrangian, which augments the Hessian of \mathcal{L} with rows and columns formed from the constraint gradient. For a single constraint in two variables, the bordered Hessian determinant is H_b = \begin{vmatrix} 0 & \frac{\partial g}{\partial x} & \frac{\partial g}{\partial y} \\ \frac{\partial g}{\partial x} & \frac{\partial^2 \mathcal{L}}{\partial x^2} & \frac{\partial^2 \mathcal{L}}{\partial x \partial y} \\ \frac{\partial g}{\partial y} & \frac{\partial^2 \mathcal{L}}{\partial y \partial x} & \frac{\partial^2 \mathcal{L}}{\partial y^2} \end{vmatrix}. In this two-variable, single-constraint case, a negative determinant at the critical point indicates a constrained local minimum, while a positive determinant indicates a constrained local maximum; in higher dimensions the classification is based on the signs of the leading principal minors of the bordered Hessian.

A representative example is minimizing f(x,y) = x^2 + y^2 subject to x + y = 1, which finds the point on the line closest to the origin. With g(x,y) = x + y - 1, the Lagrangian is \mathcal{L}(x,y,\lambda) = x^2 + y^2 + \lambda (1 - x - y), leading to the equations 2x - \lambda = 0, 2y - \lambda = 0, and x + y = 1. Solving yields x = y = 0.5 and \lambda = 1, with minimum value 0.5.

For inequality constraints g_i(\mathbf{x}) \leq 0, the Karush-Kuhn-Tucker (KKT) conditions generalize the Lagrange multiplier method; they were formulated by William Karush in 1939 and independently by Harold Kuhn and Albert Tucker in 1951. At a local optimum \mathbf{x}^*, there exist multipliers \lambda_i \geq 0 such that stationarity holds: \nabla f(\mathbf{x}^*) + \sum_i \lambda_i \nabla g_i(\mathbf{x}^*) = 0; primal feasibility: g_i(\mathbf{x}^*) \leq 0; dual feasibility: \lambda_i \geq 0; and complementary slackness: \lambda_i g_i(\mathbf{x}^*) = 0. Active constraints (those with g_i(\mathbf{x}^*) = 0) may receive positive multipliers, while inactive ones have \lambda_i = 0, effectively reducing the problem to the equality case for the binding set.
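For a quick numerical check of the worked Lagrange-multiplier example, one can hand the same problem to a constrained solver; the sketch below assumes SciPy is available and uses its SLSQP method, which handles the equality constraint directly.

```python
from scipy.optimize import minimize

# Numerical check of the example above: minimize x^2 + y^2 subject to x + y = 1.
objective = lambda v: v[0]**2 + v[1]**2
constraint = {"type": "eq", "fun": lambda v: v[0] + v[1] - 1}

result = minimize(objective, x0=[0.0, 0.0], method="SLSQP", constraints=[constraint])
print(result.x, result.fun)   # approximately [0.5, 0.5] with minimum value 0.5
```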

Extrema in Abstract and Advanced Contexts

Ordered Sets and Supremum/Infimum

In a partially ordered set (poset), which is a set equipped with a reflexive, antisymmetric, and transitive relation ≤, the maximum of a nonempty subset S is an element m ∈ S such that m ≥ s for all s ∈ S. This m dominates every other member of S under the order relation. If such an m exists, it is unique, as the antisymmetry of ≤ ensures no two distinct maxima can coexist.

The supremum of S, often denoted \sup S, generalizes the notion of a maximum by allowing it to lie outside S. It is defined as the least upper bound: an element u (in the ambient poset) such that u ≥ s for all s ∈ S, and for any other upper bound v, u ≤ v. Dually, the infimum \inf S is the greatest lower bound: an element l such that l ≤ s for all s ∈ S, and for any other lower bound w, w ≤ l. These concepts extend maxima and minima to settings where no element of S may achieve the bound, but a tight bound exists in the larger structure. The supremum coincides with the maximum precisely when \sup S ∈ S; in this case, \sup S serves as both the least upper bound and the largest element within S. However, a supremum may exist without being attainable in S. For instance, consider the poset of real numbers ℝ with the standard order ≤. The set S = (0, 1) = { x \in \mathbb{R} \mid 0 < x < 1 } has \sup S = 1, since 1 is an upper bound and the smallest such, yet 1 \notin S. Similarly, \inf S = 0 \notin S. This distinction highlights how suprema provide bounds even for "open" or unbounded subsets.

A foundational property ensuring the existence of suprema in ℝ is its completeness: every nonempty subset of ℝ that is bounded above has a least upper bound in ℝ. This least upper bound axiom underpins the construction of ℝ from the rationals and guarantees that limits, integrals, and other analytic concepts behave consistently without "gaps." Dually, every nonempty subset bounded below has a greatest lower bound. In contrast, the rational numbers ℚ lack this completeness, as the set { q \in \mathbb{Q} \mid q^2 < 2 } is bounded above but has no least upper bound in ℚ.

In lattice theory, a lattice is a poset in which every finite nonempty subset has both a supremum (join) and an infimum (meet). A complete lattice extends this to arbitrary subsets: every subset, finite or infinite, possesses a supremum and an infimum within the poset. Complete lattices form a rich structure for abstract algebra and order theory, with the empty set's supremum being the bottom element (least element of the poset) and its infimum the top element (greatest element). Examples include the power set of any set under inclusion, where suprema are unions and infima are intersections.

Zorn's lemma provides a tool for establishing the existence of maximal elements in certain posets without explicit construction. It asserts that if every chain (a totally ordered subset) in a nonempty poset has an upper bound within the poset, then the poset contains at least one maximal element, defined as an element m such that no element strictly exceeds m. This lemma, equivalent to the axiom of choice, is particularly useful for proving existence in infinite posets, such as bases in vector spaces or algebraic closures, by reducing to chain conditions rather than direct enumeration.
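As a small concrete illustration of the power-set example, the snippet below (an added sketch; the family of subsets is an arbitrary choice) computes the join and meet of a family of subsets as their union and intersection.

```python
from functools import reduce

# Power-set lattice sketch: suprema are unions, infima are intersections.
family = [frozenset({1, 2}), frozenset({2, 3}), frozenset({2, 4})]

supremum = reduce(frozenset.union, family)          # join: {1, 2, 3, 4}
infimum = reduce(frozenset.intersection, family)    # meet: {2}
print(sorted(supremum), sorted(infimum))
```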

Functionals and Variational Calculus

In the calculus of variations, the study of maxima and minima extends to functionals, which are scalar-valued mappings defined on infinite-dimensional spaces of functions. A fundamental example is the integral functional J = \int_a^b L(x, y(x), y'(x)) \, dx, where y: [a, b] \to \mathbb{R} belongs to an appropriate function space (such as the continuously differentiable functions), and L denotes the Lagrangian, a given smooth function of its arguments. The goal is to identify functions y that extremize J, analogous to finding points that extremize finite-dimensional functions.

Stationary functions, potential candidates for extrema, satisfy the Euler-Lagrange equation, a necessary condition derived by setting the first variation \delta J = 0: \frac{d}{dx} \left( \frac{\partial L}{\partial y'} \right) - \frac{\partial L}{\partial y} = 0. This ordinary differential equation, first formulated by Leonhard Euler in his 1744 treatise Methodus inveniendi lineas curvas maximi minimive proprietate gaudentes, governs the extremal curves or paths. Joseph-Louis Lagrange later provided a more general and systematic derivation in his 1760 essay Essai d'une nouvelle méthode pour déterminer les maxima et les minima des formules intégrales indéfinies, emphasizing variational principles. Solutions to the Euler-Lagrange equation yield extremals, but further analysis is required to classify them as maxima or minima.

To distinguish local minima from maxima or saddle points, the second variation \delta^2 J of the functional is analyzed. For a variation \eta with \eta(a) = \eta(b) = 0, \delta^2 J[\eta] = \int_a^b \left( \frac{\partial^2 L}{\partial y^2} \eta^2 + 2 \frac{\partial^2 L}{\partial y \partial y'} \eta \eta' + \frac{\partial^2 L}{\partial (y')^2} (\eta')^2 \right) dx. If \delta^2 J[\eta] > 0 for all admissible nonzero \eta, the extremal corresponds to a local minimum; if \delta^2 J[\eta] < 0 for all such \eta, it corresponds to a maximum. This test parallels the Hessian criterion of multivariable calculus and in general requires solving an associated Jacobi equation to check for conjugate points.

Boundary conditions specify the domain of admissible functions and affect the conditions imposed at the endpoints. In problems with fixed endpoints, y(a) = y_a and y(b) = y_b are prescribed, ensuring variations vanish there. For free endpoints, the natural (transversality) boundary condition \frac{\partial L}{\partial y'} \big|_{x=a \text{ or } b} = 0 holds, allowing the functional to be extremized without endpoint constraints. These conditions ensure well-posedness and were integral to the foundational developments by Euler and Lagrange.

A seminal application is the brachistochrone problem, posed by Johann Bernoulli in 1696, which minimizes the descent time of a particle sliding under gravity from the origin (0, 0) to a lower point (x_f, y_f), with the coordinate y measured downward and y_f > 0. The time functional is J = \int_0^{x_f} \frac{\sqrt{1 + (y'(x))^2}}{\sqrt{2 g y(x)}} \, dx, where g is the gravitational acceleration and L(x, y, y') = \frac{\sqrt{1 + (y')^2}}{\sqrt{2 g y}} does not depend explicitly on x. Because of this independence, the Euler-Lagrange equation simplifies via the Beltrami identity to L - y' \frac{\partial L}{\partial y'} = C, yielding y' = \sqrt{\frac{k - y}{y}} for a constant k. Integrating gives the parametric solution x = r(\theta - \sin \theta), y = r(1 - \cos \theta), a cycloid whose parameter r (together with the final value of \theta) is chosen so that the curve passes through the endpoint (x_f, y_f); this curve outperforms the straight line, illustrating a true minimum. The cycloid obtained here is also the tautochrone curve, and the problem was later subsumed into Euler's general variational methods.

The field originated with Euler's 1744 work, which introduced the Euler-Lagrange equation for general variational problems, building on earlier isoperimetric problems studied by the Bernoullis and others. Lagrange's 1760 contribution formalized the δ-method and extended the theory to a broader class of problems, establishing the calculus of variations as a framework for optimization in continuous systems.
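To illustrate how the Euler-Lagrange equation is applied in practice, the sketch below assumes SymPy is available and derives the equation symbolically for the arc-length Lagrangian L = \sqrt{1 + (y')^2} (a simpler functional than the brachistochrone's), whose extremals are straight lines.

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

# Illustrative sketch: Euler-Lagrange equation for the arc-length functional
# J = integral of sqrt(1 + y'(x)^2) dx. The resulting equation has a numerator
# proportional to y''(x), so the extremals satisfy y'' = 0 (straight lines).
x = sp.symbols('x')
y = sp.Function('y')

L = sp.sqrt(1 + y(x).diff(x)**2)
eq = euler_equations(L, y(x), x)[0]
print(eq)   # an equation equivalent to y''(x) = 0
```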

Specialized Concepts and Applications

Argument of the Maximum

In mathematics and optimization, the argument of the maximum, denoted \arg\max, refers to the set of points in the domain where a function attains its global maximum value. For a real-valued function f defined on a set S, this is formally defined as \arg\max_{x \in S} f(x) = \{ x \in S : f(x) = \max_{y \in S} f(y) \}, which collects all elements of S that achieve the maximum of f over S. Similarly, the argument of the minimum, or \arg\min, is defined as \arg\min_{x \in S} f(x) = \{ x \in S : f(x) = \min_{y \in S} f(y) \}, capturing the points where the global minimum is reached. The notation \arg\max_x f(x) is commonly used to emphasize the variable over which the maximization occurs, and it may yield a singleton set if the maximum is attained uniquely or a larger set otherwise. For instance, consider the quadratic function f(x) = -x^2 on the real numbers; here, \arg\max_x (-x^2) = \{0\}, as the maximum value of 0 is attained solely at x = 0. In cases of non-uniqueness, such as a constant function f(x) = c for all x in the domain S, the argmax is the entire set S, since every point achieves the maximum value c.

Uniqueness of the argmax holds under certain conditions on the function, such as strict concavity over a convex domain. If f is strictly concave on a convex set D, then at most one point can satisfy the first-order necessary conditions for maximization, and that point is the unique global maximizer. This property ensures the argmax is a singleton, facilitating precise identification in optimization problems. In optimization and decision theory, the argmax plays a central role in selecting optimal inputs that maximize an objective, such as expected utility. A rational agent chooses an action a via a = \arg\max_a \mathbb{E}[U(a \mid e)], where \mathbb{E}[U(a \mid e)] is the expected utility of action a given the evidence e, thereby identifying the decision that yields the highest anticipated benefit.
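A minimal computational sketch of the argmax (assuming NumPy; the grid and function are illustrative) recovers the maximizer of f(x) = -x^2 from sampled values; note that np.argmax returns only the first index at which the maximum occurs, so a set-valued argmax would require collecting all such indices.

```python
import numpy as np

# Illustrative sketch: approximate arg max of f(x) = -x^2 over a sampled domain.
xs = np.linspace(-2.0, 2.0, 4001)
values = -xs**2

best_index = np.argmax(values)              # first index attaining the maximum
print(xs[best_index], values[best_index])   # approximately 0.0 and 0.0
```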

Illustrative Examples

In single-variable calculus, the sine function provides a classic illustration of maxima and minima on a closed interval. Consider f(x) = \sin(x) over [0, 2\pi]. This function achieves a global maximum of 1 at x = \pi/2 and a global minimum of -1 at x = 3\pi/2, with these points identified as critical points where the derivative vanishes.

For multivariable functions, extrema often occur on the boundary of the domain. The function f(x,y) = xy on the closed unit disk x^2 + y^2 \leq 1 has its maximum value of 1/2 at boundary points such as (\sqrt{2}/2, \sqrt{2}/2), found using Lagrange multipliers on the constraint x^2 + y^2 = 1, while the interior critical point at (0,0) yields the value 0.

In constrained optimization, linear programming exemplifies how maxima arise at geometric vertices. To maximize c \cdot x subject to Ax \leq b and x \geq 0, the simplex method evaluates the objective at the vertices of the feasible region, as the maximum of a linear function over a polyhedron occurs at a corner point.

Set theory highlights distinctions between maxima/minima and their bounds. The set \{1/n \mid n \in \mathbb{N}, n \geq 1\} has a maximum of 1 (achieved at n=1) but no minimum, with its infimum 0 not attained; similarly, the interval (0,1] has a maximum of 1 but no minimum, its infimum 0 being unattained.

Variational calculus applies minima to functionals, such as finding the shortest path between two points as the minimizer of the arc-length functional, which yields a straight line. In computational settings, the maximum of a finite collection, such as an array of numbers, is the largest element, computable by sequential comparison, and can approximate continuous maxima in sampled data. In modern applications like neural networks, training seeks minima of a loss function measuring prediction error, where local minima represent suboptimal parameter sets but global minima yield effective models.
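For the linear-programming example, a short sketch using SciPy's linprog (the data are an arbitrary illustrative instance, and linprog minimizes, so the objective is negated) shows the optimum landing at a vertex of the feasible polyhedron.

```python
from scipy.optimize import linprog

# Illustrative LP: maximize 3x + 5y subject to x <= 4, 2y <= 12, 3x + 2y <= 18, x, y >= 0.
c = [-3, -5]                 # negate to turn maximization into minimization
A = [[1, 0], [0, 2], [3, 2]]
b = [4, 12, 18]

result = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)], method="highs")
print(result.x, -result.fun)  # optimum (2, 6) with value 36, at a vertex of the polyhedron
```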
