
Mathematical optimization

Mathematical optimization, also known as mathematical programming, is a subfield of applied mathematics dedicated to selecting the best element from a set of available alternatives by systematically finding values of decision variables that minimize or maximize a real-valued objective function while satisfying a set of constraints. These problems can be formulated as \min f_0(x) subject to inequality constraints f_i(x) \leq b_i, where x \in \mathbb{R}^n represents the variables, f_0 is the objective function, and the f_i define the constraints. The field encompasses diverse problem classes, including linear programming (where the objective and constraints are linear), nonlinear programming (with nonlinear functions), integer programming (requiring integer variables), and convex optimization (where the objective and feasible set are convex, enabling efficient global solutions). Historical development traces back to the 1700s with theories for unconstrained optimization by Fermat, Newton, and Euler, followed by Lagrange's 1797 work on equality-constrained problems; modern milestones include Dantzig's 1947 simplex method for linear programming and Karmarkar's 1984 interior-point algorithm, which spurred polynomial-time solutions for various classes. Applications of mathematical optimization span numerous domains, including engineering design (e.g., minimizing material weight in structures), economics (e.g., equilibrium models), finance (e.g., portfolio selection via Markowitz's mean-variance framework), transportation (e.g., minimizing shipping costs), machine learning (e.g., training support vector machines), and operations research (e.g., scheduling and resource allocation). These techniques leverage algorithms like gradient descent for unconstrained problems and specialized solvers for constrained ones, often implemented in modeling software for complex scenarios.

Fundamental Concepts

Optimization Problems

Mathematical optimization seeks to determine the values of decision variables that either minimize or maximize a given function while satisfying a set of constraints. The general formulation of such a problem is to minimize (or equivalently, maximize) an objective function f: \mathbb{R}^n \to \mathbb{R} over decision variables x \in \mathbb{R}^n, subject to inequality constraints g_i(x) \leq 0 for i = 1, \dots, m and equality constraints h_j(x) = 0 for j = 1, \dots, p. Here, the decision variables x represent the unknowns to be determined, and the constraints define the allowable values for x.

Optimization problems are classified based on the nature of the decision variables. In continuous optimization, the variables x take real values, allowing for solutions in \mathbb{R}^n. Discrete optimization, in contrast, requires some or all variables to belong to a discrete set, such as the integers, leading to a finite (though potentially large) number of possible solutions. Mixed-integer problems combine both continuous and discrete variables, blending the characteristics of the two categories.

Standard forms of optimization problems vary by the types of constraints imposed. An unconstrained problem simply minimizes f(x) without any restrictions on x, such as \min_x x^2, where the objective is a convex quadratic. Equality-constrained problems include only equality constraints, like \min_x f(x) subject to h(x) = 0. Inequality-constrained problems incorporate inequality constraints, for example, \min_x x subject to x \geq 1, which restricts the decision variable to values of at least 1. These forms provide a structured way to express diverse applications in fields such as engineering and economics.

The feasible set, also known as the feasible region, consists of all decision variables x that satisfy the given constraints, forming the domain over which the objective function is evaluated. In continuous problems, this set is often a polyhedron or other geometric region in \mathbb{R}^n; in discrete cases, it is a finite collection of points. The optimal solution, if it exists, lies within this feasible set and may be local or global, depending on the problem's structure.
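
As a concrete illustration, the following minimal sketch (assuming SciPy is installed) solves the inequality-constrained example \min_x x subject to x \geq 1 numerically; the variable names are chosen for clarity and are not part of any standard API beyond scipy.optimize.minimize itself.

```python
# Minimal sketch: solving min_x x subject to x >= 1 with SciPy.
from scipy.optimize import minimize

objective = lambda x: x[0]                                   # f(x) = x
constraint = {"type": "ineq", "fun": lambda x: x[0] - 1.0}   # x - 1 >= 0

result = minimize(objective, x0=[5.0], constraints=[constraint])
print(result.x)    # approximately [1.0]: the constraint is active at the optimum
print(result.fun)  # approximately 1.0
```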

Notation and Terminology

Mathematical optimization problems are typically formulated in a standard minimization form, given by \begin{align*} \min_{x \in X} \quad & f(x) \\ \text{subject to} \quad & g_i(x) \leq 0, \quad i = 1, \dots, m, \\ & h_j(x) = 0, \quad j = 1, \dots, p, \end{align*} where x \in \mathbb{R}^n represents the decision variables, f: \mathbb{R}^n \to \mathbb{R} is the objective function to be minimized, g_i: \mathbb{R}^n \to \mathbb{R} are inequality constraint functions, and h_j: \mathbb{R}^n \to \mathbb{R} are equality constraint functions. The set X \subseteq \mathbb{R}^n denotes the domain, often \mathbb{R}^n itself. Maximization problems are equivalently handled by minimizing the negative of the objective, i.e., \min_x -f(x) for \max_x f(x). The feasible set, denoted \Omega = \{ x \in X \mid g_i(x) \leq 0, \, i=1,\dots,m; \, h_j(x) = 0, \, j=1,\dots,p \}, consists of all points satisfying the constraints. Standard notation for optimal points and values includes \arg\min_x f(x), the set of points x^* achieving the minimum; \inf f(x), the greatest lower bound of f over the domain (equal to the minimum if attained); and \sup f(x), the least upper bound (used analogously for maximization). If \inf f(x) = -\infty over \Omega, the problem is unbounded below; otherwise, it is bounded. Key terms include convexity: a function f is convex if f(\lambda x + (1-\lambda) y) \leq \lambda f(x) + (1-\lambda) f(y) for \lambda \in [0,1], ensuring any local minimum is global in convex problems; concave functions satisfy the reverse inequality and are relevant for maximization objectives. For constrained convex problems, Slater's condition provides a constraint qualification ensuring strong duality: there exists a strictly feasible point \tilde{x} such that g_i(\tilde{x}) < 0 for all i with convex g_i, and the equality constraints h_j(\tilde{x}) = 0 hold.

Local and Global Optima

In mathematical optimization, a global minimum of an objective function f: \Omega \to \mathbb{R} over a feasible set \Omega is a point x^* \in \Omega such that f(x^*) \leq f(x) for all x \in \Omega. Similarly, a global maximum satisfies f(x^*) \geq f(x) for all x \in \Omega. These represent the overall best and worst values of f across the entire domain. In contrast, a local minimum is a point x^* \in \Omega for which there exists a neighborhood N around x^* such that f(x^*) \leq f(x) for all x \in N \cap \Omega. Local maxima are defined analogously. Local optima may not coincide with global ones, particularly in nonconvex problems where multiple basins of attraction exist.

Optima are further classified as strict or non-strict. A strict local minimum satisfies f(x^*) < f(x) for all x \in N \cap \Omega \setminus \{x^*\}, implying uniqueness within the neighborhood. Non-strict optima allow equality with nearby points, such as along a flat region. Saddle points are critical points that are neither local minima nor maxima, where the function increases in some directions and decreases in others.

Critical points form the foundation for identifying potential optima. In unconstrained optimization over an open set, a critical point x^* satisfies \nabla f(x^*) = 0, where \nabla f is the gradient. For constrained problems, critical points are those satisfying the Karush-Kuhn-Tucker (KKT) conditions, which generalize stationarity to include Lagrange multipliers for equality and inequality constraints. These conditions, introduced by Karush and independently by Kuhn and Tucker, require primal feasibility, dual feasibility, complementary slackness, and stationarity of the Lagrangian.

Consider the quadratic function f(x) = x^2 over \mathbb{R}. Here, x^* = 0 is both a strict local and global minimum, with f(0) = 0 and f(x) > 0 for x \neq 0. For a multimodal example, f(x) = \sin(x) + \frac{x^2}{10} over \mathbb{R} has two local minima where \cos(x) + \frac{x}{5} = 0, near x \approx -1.31 and x \approx 3.84; the global minimum occurs at the leftmost of these, approximately x \approx -1.31, because the quadratic term penalizes larger values of |x|.

Optima may be attainable or unattainable. An attainable optimum is achieved at some x^* \in \Omega, so the minimum value equals the infimum \inf_{x \in \Omega} f(x). Unattainable optima occur when the infimum is not achieved, such as for f(x) = e^x over \mathbb{R}, where \inf f(x) = 0 but no x realizes it. This distinction arises in open or unbounded feasible sets, impacting convergence and problem well-posedness.
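
The multimodal example can be checked numerically. The sketch below (assuming NumPy and SciPy) runs a local minimizer from two starting points; each run typically converges to the local minimum of its own basin, so the global minimum must be identified by comparing the resulting values.

```python
# Local searches on f(x) = sin(x) + x^2/10 from two different starts.
import numpy as np
from scipy.optimize import minimize

f = lambda v: np.sin(v[0]) + v[0] ** 2 / 10.0

left = minimize(f, x0=[-1.0])   # typically converges near x ≈ -1.31
right = minimize(f, x0=[4.0])   # typically converges near x ≈ 3.84

print(left.x, left.fun)    # ≈ [-1.306], f ≈ -0.796  (global minimum)
print(right.x, right.fun)  # ≈ [3.837],  f ≈  0.832  (local, non-global)
```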

Historical Development

Early History

The origins of mathematical optimization trace back to ancient civilizations, where geometric and practical problems implicitly involved finding extrema or efficient allocations. In ancient Greece, Euclid's Elements (circa 300 BCE) addressed early forms of optimization through geometric constructions that maximized or minimized quantities. For instance, Book III, Proposition 15, demonstrates that among straight lines in a circle, the diameter is the longest, and chords nearer the center are greater than those farther away, establishing a foundational result on maximizing line segments within a bounded region. Similarly, in ancient China, The Nine Chapters on the Mathematical Art (circa 200 BCE) included problems on resource allocation, such as fair division of land and labor in Chapter 6 ("Fair Taxes") and solving systems of equations in Chapter 7 ("Excess and Deficit") to optimize distributions under constraints.

In the 17th century, optimization ideas emerged in the context of physics and the nascent calculus. Pierre de Fermat's principle of least time, proposed in 1662, stated that light travels between two points along the path that minimizes travel time, unifying laws of reflection and refraction under a variational principle and foreshadowing modern optimization principles. Concurrently, Isaac Newton developed his method for root-finding in the 1670s, an iterative technique to approximate solutions to equations, which later found applications in optimization by solving for critical points where derivatives vanish.

The 18th century marked a pivotal advancement with the formalization of the calculus of variations by Leonhard Euler and Joseph-Louis Lagrange. Euler's 1744 treatise Methodus inveniendi lineas curvas maximi minimive proprietate gaudentes introduced systematic methods to find curves extremizing functionals, including solutions to isoperimetric problems that maximize area for a fixed perimeter. Lagrange extended this in his 1788 Mécanique Analytique, developing the Euler-Lagrange equations for variational problems and applying them to mechanics, thereby laying the groundwork for constrained optimization.

By the early 19th century, precursors to linear programming appeared in Joseph Fourier's 1823 work on solving systems of linear inequalities, providing an elimination method to determine feasible regions and optimize linear objectives over polyhedra. Additionally, the method of least squares, independently developed by Adrien-Marie Legendre in 1805 and Carl Friedrich Gauss (who claimed prior invention around 1795), minimized the sum of squared residuals for data fitting in astronomy, establishing a cornerstone of regression analysis and parameter estimation in optimization.

Modern Foundations

The modern foundations of mathematical optimization were laid in the 1930s and 1940s through pioneering work on linear programming for resource allocation problems. Leonid Kantorovich developed the foundational ideas of linear programming in his 1939 monograph, formulating methods to optimize production under linear constraints, which anticipated key concepts like shadow prices. Independently, Tjalling Koopmans advanced similar ideas around 1942 in a memorandum on activity analysis, later elaborated in his 1951 edited volume, applying linear models to economic allocation during wartime logistics efforts. These contributions established linear programming as a tool for efficient resource use, earning Kantorovich and Koopmans the 1975 Nobel Memorial Prize in Economic Sciences. George Dantzig then introduced the simplex method in 1947, providing an efficient algorithmic framework for solving linear programs by traversing vertices of the feasible polytope, which became the cornerstone of practical implementation.

Following World War II, the field experienced a surge through the operations research (OR) movement, driven by military applications in logistics and planning. This era saw the integration of optimization with emerging computational tools, fostering interdisciplinary growth. John von Neumann's 1928 minimax theorem, which guarantees the existence of optimal mixed strategies in zero-sum games, found direct applications in the postwar years to economic and strategic decision-making, notably through his 1944 collaboration with Oskar Morgenstern on Theory of Games and Economic Behavior, linking adversarial optimization to broader economic models.

The 1950s marked the emergence of nonlinear programming, extending optimization to nonlinear objectives and constraints. The Karush-Kuhn-Tucker (KKT) conditions, first derived by William Karush in his 1939 master's thesis and independently published by Harold Kuhn and Albert Tucker in 1951, provide necessary optimality criteria for constrained nonlinear problems under constraint qualifications, generalizing Lagrange multipliers to inequalities. This period also saw Richard Bellman's introduction of dynamic programming in 1957, a recursive approach for solving multistage decision problems by breaking them into subproblems, revolutionizing sequential optimization in control theory and operations research. A pivotal event was the inaugural International Symposium on Mathematical Programming in 1949, which unified researchers and solidified the field as a distinct discipline. Further advancements included Anthony Fiacco and Garth McCormick's 1968 development of sequential unconstrained minimization techniques (SUMT), which transform constrained nonlinear programs into sequences of unconstrained problems via penalty functions, laying groundwork for modern interior-point and penalty methods.

Contemporary Advances

In the 1980s and 1990s, interior-point methods marked a pivotal advancement in linear programming, with Narendra Karmarkar's 1984 algorithm introducing a polynomial-time approach that efficiently navigated the interior of the feasible region, surpassing the limitations of the simplex method for large-scale problems. This innovation spurred widespread adoption in operations research and engineering, enabling solutions to industrial-scale linear programs with improved scalability and theoretical guarantees. By the 1990s, these methods were extended to convex quadratic and semidefinite programming, facilitating applications in control, finance, and engineering design.

The 2010s witnessed the surge of first-order techniques in machine learning, driven by the need to handle massive datasets in deep learning. Stochastic gradient descent variants, such as the Adam optimizer introduced in 2014, combined adaptive learning rates with momentum to accelerate convergence in nonconvex landscapes, becoming a standard for training neural networks on tasks like image recognition and natural language processing. Concurrently, automatic differentiation emerged as a cornerstone for solving optimization challenges, enabling efficient computation of gradients through complex computational graphs via frameworks like Theano (2010) and TensorFlow (2015), which automated reverse-mode differentiation for scalable training. These tools addressed the computational bottlenecks in training deep architectures, achieving state-of-the-art performance on benchmarks such as ImageNet with reduced engineering effort.

Integration with artificial intelligence further propelled optimization in the late 2010s, exemplified by neural architecture search (NAS) methods starting around 2016, which automated the design of network topologies using reinforcement learning or evolutionary strategies to optimize performance metrics like accuracy and efficiency. NAS frameworks, such as those employing recurrent networks to explore architectural spaces, yielded models outperforming hand-crafted designs on datasets like CIFAR-10, with applications spanning computer vision and beyond.

Robust optimization, initially formalized by Dimitris Bertsimas and colleagues in the early 2000s for handling parameter uncertainty, matured in the 2020s through data-driven enhancements that incorporated machine learning for constructing uncertainty sets, improving decision-making under ambiguity in fields like supply chains and energy systems. These advancements, including two-stage sample robust formulations, provided tractable approximations with probabilistic guarantees, demonstrating superior out-of-sample performance compared to classical stochastic methods in high-dimensional settings.

In the 2020s, quantum optimization gained traction, with the quantum approximate optimization algorithm (QAOA), proposed in 2014, seeing practical implementations on noisy intermediate-scale quantum devices for combinatorial problems like MaxCut, often extending principles from Grover's search algorithm to achieve speedups in unstructured optimization. Recent extensions have demonstrated speedups over classical methods on small-scale problems with modest qubit counts, such as in robotic manipulator tasks, leveraging hybrid quantum-classical solvers. Parallel to this, distributed optimization techniques addressed scalability challenges, with algorithms like doubly accelerated methods enabling parallel local computations across clusters to minimize communication overhead, achieving convergence rates competitive with centralized approaches on terabyte-scale datasets in distributed machine learning.

Contemporary applications increasingly emphasize sustainability, particularly in climate modeling, where optimization integrates with machine learning to calibrate parameters in Earth system simulations, such as deriving data-driven closure models for atmospheric dynamics that enhance predictive accuracy for climate scenarios. These efforts, including multi-objective formulations for emission reduction pathways, have supported policy decisions by optimizing trade-offs in renewable energy deployment and carbon capture under uncertain projections. Heuristic methods have evolved alongside these developments, with metaheuristic hybrids providing robust approximations for nonconvex problems where exact solutions remain intractable.

Theoretical Foundations

Feasibility and Existence

In mathematical optimization, the feasible set, denoted as \Omega, consists of all points x in the domain that satisfy the problem's constraints, such as equalities h_i(x) = 0 and inequalities g_j(x) \leq 0. If \Omega is empty, the problem is infeasible, meaning no solution exists that meets all constraints simultaneously. For instance, consider the linear program \min x subject to x \geq 1 and x \leq 0; the constraints contradict each other, resulting in an empty feasible set.

Constraint qualifications, such as the Mangasarian-Fromovitz constraint qualification (MFCQ), ensure that the feasible set possesses desirable geometric properties, like a non-empty interior relative to the active constraints at certain points, which is crucial for the well-posedness of the problem. Introduced by Mangasarian and Fromovitz, MFCQ requires that the gradients of the equality constraints are linearly independent and that there exists a direction d with \nabla h_i(x)^T d = 0 for all equality constraints and \nabla g_j(x)^T d < 0 for all active inequality constraints. This qualification helps in analyzing the structure of \Omega without singularities that could complicate feasibility assessments.

Even when \Omega is non-empty, an optimization problem may lack a solution if the objective function f does not attain its infimum over \Omega. The Weierstrass extreme value theorem guarantees existence under suitable conditions: if f: \mathbb{R}^n \to \mathbb{R} is continuous and \Omega is compact (closed and bounded), then f attains both its minimum and maximum on \Omega, so that \min_{x \in \Omega} f(x) = f(x^*) \quad \text{for some } x^* \in \Omega, where the infimum is achieved. For unbounded feasible sets, such as \Omega = \mathbb{R}^n, compactness fails, but coercivity of f—defined as \lim_{\|x\| \to \infty} f(x) = +\infty—ensures a global minimizer exists, as minimizing sequences remain bounded and thus have convergent subsequences yielding the minimum.

Problems can also be unbounded below, where \inf_{x \in \Omega} f(x) = -\infty, precluding a minimum despite a non-empty \Omega. For example, the linear program \min -x subject to x \geq 0 has feasible points but no minimum, as f(x) \to -\infty along the ray x \to +\infty. In such cases, the infimum is not attained, highlighting the distinction between the infimum value and an actual minimizer in the feasible set.
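
Both pathologies above can be detected by standard LP solvers. The following hedged sketch (assuming SciPy) feeds the two example programs to scipy.optimize.linprog and reads off its status codes.

```python
# Detecting infeasibility and unboundedness with SciPy's linprog.
from scipy.optimize import linprog

# Infeasible: min x subject to x >= 1 and x <= 0.
# In linprog's A_ub @ x <= b_ub form: -x <= -1 and x <= 0.
infeasible = linprog(c=[1.0], A_ub=[[-1.0], [1.0]], b_ub=[-1.0, 0.0],
                     bounds=[(None, None)])
print(infeasible.status)  # 2: problem is infeasible (empty feasible set)

# Unbounded below: min -x subject to x >= 0.
unbounded = linprog(c=[-1.0], bounds=[(0.0, None)])
print(unbounded.status)   # 3: the objective decreases without bound
```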

Optimality Conditions

Optimality conditions provide necessary and sufficient criteria for identifying local and global optima in mathematical optimization problems, serving as foundational tools for theoretical analysis and algorithmic verification. For unconstrained optimization problems, where the objective is to minimize a differentiable function f: \mathbb{R}^n \to \mathbb{R}, the first-order necessary condition for a point x^* to be a local minimum is that the gradient vanishes at x^*, i.e., \nabla f(x^*) = 0. This arises from the requirement that the directional derivative in any feasible direction must be non-negative at a local optimum. Under twice differentiability assumptions, the second-order necessary condition further requires that the Hessian matrix H(x^*) = \nabla^2 f(x^*) is positive semi-definite, meaning d^T H(x^*) d \geq 0 for all directions d \in \mathbb{R}^n. Conversely, if \nabla f(x^*) = 0 and H(x^*) is positive definite (d^T H(x^*) d > 0 for all d \neq 0), then x^* is a strict local minimum, providing a second-order sufficient condition.

In the presence of equality constraints, consider the problem of minimizing f(x) subject to g_i(x) = 0 for i = 1, \dots, m, where g_i: \mathbb{R}^n \to \mathbb{R} are differentiable. The method of Lagrange multipliers introduces the Lagrangian \mathcal{L}(x, \lambda) = f(x) + \sum_{i=1}^m \lambda_i g_i(x), and the first-order necessary conditions require the existence of multipliers \lambda \in \mathbb{R}^m such that \nabla_x \mathcal{L}(x^*, \lambda) = 0 and \nabla_\lambda \mathcal{L}(x^*, \lambda) = 0, or equivalently, \nabla f(x^*) + \sum_{i=1}^m \lambda_i \nabla g_i(x^*) = 0 and g_i(x^*) = 0 for all i. These conditions ensure that at the optimum, the objective gradient lies in the span of the constraint gradients, aligning the stationarity requirement with the constraint manifold. Second-order conditions extend this by requiring the Hessian of the Lagrangian, projected onto the tangent space of the constraints, to be positive semi-definite for necessity and positive definite for sufficiency.

For problems involving inequality constraints, minimize f(x) subject to g_i(x) \leq 0 for i = 1, \dots, m and h_j(x) = 0 for j = 1, \dots, p, the Karush-Kuhn-Tucker (KKT) conditions generalize the Lagrange framework to handle both equalities and inequalities under suitable constraint qualifications, such as Slater's condition. The KKT conditions consist of stationarity, primal feasibility, dual feasibility, and complementary slackness, stated as follows for a local minimum x^* with associated multipliers \lambda \in \mathbb{R}^m_{\geq 0} and \mu \in \mathbb{R}^p: \begin{align*} \nabla f(x^*) + \sum_{i=1}^m \lambda_i \nabla g_i(x^*) + \sum_{j=1}^p \mu_j \nabla h_j(x^*) &= 0, \\ g_i(x^*) &\leq 0, \quad i=1,\dots,m, \\ h_j(x^*) &= 0, \quad j=1,\dots,p, \\ \lambda_i &\geq 0, \quad i=1,\dots,m, \\ \lambda_i g_i(x^*) &= 0, \quad i=1,\dots,m. \end{align*} Here, the multipliers \lambda_i for inequalities are non-negative, reflecting the directional nature of the inequality constraints, and complementary slackness enforces that \lambda_i > 0 only for active inequalities (g_i(x^*) = 0). These conditions are necessary for optimality under constraint qualifications and sufficient for global optimality in convex problems.

In convex optimization, where the objective f and feasible set are convex, any point satisfying the first-order (KKT) conditions is a global optimum, as local minima coincide with global ones due to the absence of local traps in the convex landscape. This property, established through the convexity of sublevel sets and epigraphs, ensures that verification via optimality conditions yields the overall solution without exhaustive search.
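
As a minimal numeric illustration (assuming NumPy), the sketch below checks the stationarity and feasibility conditions at the known optimum of \min x_1^2 + x_2^2 subject to h(x) = x_1 + x_2 - 1 = 0, where x^* = (1/2, 1/2) with multiplier \mu = -1.

```python
# Verifying Lagrange/KKT conditions at a candidate optimum.
import numpy as np

x_star = np.array([0.5, 0.5])     # candidate optimum
mu = -1.0                         # equality multiplier solving stationarity

grad_f = 2 * x_star               # ∇f(x) = 2x for f(x) = x1^2 + x2^2
grad_h = np.array([1.0, 1.0])     # ∇h(x) = (1, 1)

stationarity = grad_f + mu * grad_h
print(np.allclose(stationarity, 0))     # True: ∇f + μ∇h = 0
print(np.isclose(x_star.sum() - 1, 0))  # True: primal feasibility h(x*) = 0
```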

Duality and Sensitivity

In mathematical optimization, duality provides a framework for reformulating a primal optimization problem into a dual problem, offering insights into bounds, optimality, and economic interpretations. The Lagrangian dual is constructed for a primal problem of minimizing an objective function subject to inequality and equality constraints. Consider the primal problem: \begin{align*} \min_{x} \quad & f(x) \\ \text{subject to} \quad & g_i(x) \leq 0, \quad i = 1, \dots, m, \\ & h_j(x) = 0, \quad j = 1, \dots, p, \end{align*} where f: \mathbb{R}^n \to \mathbb{R} is the objective, g_i: \mathbb{R}^n \to \mathbb{R} are inequality constraint functions, and h_j: \mathbb{R}^n \to \mathbb{R} are equality constraint functions. The Lagrangian is defined as L(x, \lambda, \mu) = f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{j=1}^p \mu_j h_j(x), with \lambda \geq 0. The dual function is g(\lambda, \mu) = \inf_x L(x, \lambda, \mu), and the dual problem is \max_{\lambda, \mu} g(\lambda, \mu) subject to \lambda \geq 0.

Weak duality states that the primal optimal value is at least as large as the dual optimal value, i.e., f^* \geq g^*, for any feasible primal and dual points. This holds generally, providing a lower bound on the optimum via dual solutions. Strong duality, where f^* = g^* and the duality gap is zero, requires additional conditions; for convex problems satisfying Slater's condition—a strictly feasible point exists for the inequalities—the gap vanishes, enabling exact recovery of the optimum from the dual.

Sensitivity analysis examines how the optimal value and solutions change with perturbations in problem data, such as objective coefficients or right-hand sides of constraints. Shadow prices, given by the optimal dual variables, quantify these changes; for a constraint Ax \geq b, the shadow price y^* satisfies \frac{\partial f^*}{\partial b_k} = y_k^* under strong duality, indicating the marginal improvement in the objective per unit increase in b_k. This follows from the envelope theorem, which states that the derivative of the optimal value function with respect to a parameter equals the partial derivative of the Lagrangian with respect to that parameter, evaluated at the optimum, ignoring indirect effects through the optimizer.

A canonical example is linear programming duality. The primal is \min c^\top x subject to Ax \geq b, x \geq 0, with optimal value f^*. The dual is \max b^\top y subject to A^\top y \leq c, y \geq 0, with optimal value g^*. Strong duality holds (f^* = g^*) without further conditions due to the problem's linear structure, and the dual variables y^* serve as shadow prices for resource constraints in b. For instance, in resource allocation, y_k^* represents the marginal value of an additional unit of resource k.

Under suitable regularity conditions, such as differentiability of the objective and constraints, the optimal value function is continuous with respect to perturbations, and the set of optimal solutions exhibits stability. For convex problems with strong duality, the optimal solution set is nonempty and compact for nearby perturbations, ensuring well-posedness. These properties are central to perturbation analysis, linking duality to robustness in applications like economics and engineering.
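
The LP duality relationship can be verified directly. The sketch below (assuming SciPy; the small data matrices are illustrative) solves a primal and its dual and confirms the zero duality gap and the shadow-price interpretation of the dual variables.

```python
# Verifying strong duality for a small LP.
# Primal: min c^T x  s.t.  A x >= b, x >= 0.
# Dual:   max b^T y  s.t.  A^T y <= c, y >= 0.
import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 3.0])
A = np.array([[1.0, 1.0], [1.0, 2.0]])
b = np.array([3.0, 4.0])

# Primal in linprog's A_ub @ x <= b_ub form (flip signs of A x >= b).
primal = linprog(c, A_ub=-A, b_ub=-b)
# Dual: maximize b^T y -> minimize -b^T y.
dual = linprog(-b, A_ub=A.T, b_ub=c)

print(primal.fun)  # f* = 7.0
print(-dual.fun)   # g* = 7.0: zero duality gap, as LP strong duality predicts
print(dual.x)      # y* = [1, 1]: shadow prices of the two constraints
```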

Major Subfields

Linear and Convex Optimization

Linear programming is a fundamental subfield of mathematical optimization that involves minimizing or maximizing a linear objective function subject to linear equality and inequality constraints. The standard form of a linear program is given by \begin{align*} \min &\quad \mathbf{c}^\top \mathbf{x} \\ \text{s.t.} &\quad A\mathbf{x} = \mathbf{b}, \\ &\quad \mathbf{x} \geq \mathbf{0}, \end{align*} where \mathbf{c} \in \mathbb{R}^n, A \in \mathbb{R}^{m \times n}, \mathbf{b} \in \mathbb{R}^m, and \mathbf{x} \in \mathbb{R}^n. This formulation assumes non-negativity constraints, with inequalities convertible to equalities via slack variables. The simplex method, developed by George Dantzig in 1947, solves linear programs by traversing vertices of the polytope defined by the constraints, starting from an initial basic feasible solution and pivoting to improve the objective until optimality. Interior-point methods, which navigate the interior of the feasible region using barrier functions to approach the optimum, emerged as alternatives offering better practical performance for large-scale problems. A landmark advancement was Narendra Karmarkar's 1984 algorithm, a polynomial-time method for linear programming that, unlike the earlier ellipsoid method, was also efficient in practice, running in O(n^{3.5} L) time where n is the number of variables and L the bit length of the input, sparking widespread development of efficient solvers.

Convex optimization generalizes linear programming to problems where the objective and constraints are convex, ensuring that local optima are global and enabling efficient solvability. A set C \subseteq \mathbb{R}^n is convex if for any \mathbf{x}, \mathbf{y} \in C and \theta \in [0,1], the point \theta \mathbf{x} + (1-\theta) \mathbf{y} \in C. A function f: \mathbb{R}^n \to \mathbb{R} is convex if its epigraph \{(\mathbf{x}, t) \mid f(\mathbf{x}) \leq t\} is a convex set, or equivalently, if f(\theta \mathbf{x} + (1-\theta) \mathbf{y}) \leq \theta f(\mathbf{x}) + (1-\theta) f(\mathbf{y}) for all \mathbf{x}, \mathbf{y} in the domain and \theta \in [0,1]. The epigraph formulation transforms a problem \min f_0(\mathbf{x}) subject to f_i(\mathbf{x}) \leq 0 into an equivalent form \min t subject to (\mathbf{x}, t) \in \text{epi } f_0 and affine constraints, facilitating analysis and solution via linear approximations. Jensen's inequality states that for a convex function f and weights \lambda_i \geq 0 summing to 1, f(\sum_i \lambda_i \mathbf{x}_i) \leq \sum_i \lambda_i f(\mathbf{x}_i), with equality if f is affine on the convex hull of the \mathbf{x}_i. This property underpins many convexity proofs and applications in optimization.

Key properties of convex optimization include solution uniqueness under strict convexity of the objective, where any local minimum is the unique global minimum, and separability, where the objective decomposes as f(\mathbf{x}) = \sum_k f_k(\mathbf{x}_k) across blocks \mathbf{x}_k, allowing decomposition into subproblems for parallel or distributed solving. Semidefinite programming extends convex optimization by optimizing linear functions over spectrahedra, the feasible sets defined by linear matrix inequalities F(\mathbf{x}) \succeq 0 where F(\mathbf{x}) is affine in \mathbf{x} and \succeq 0 denotes positive semidefiniteness, capturing problems like maximum cut approximations beyond linear constraints.

A classic example is the diet problem, formulated as minimizing the cost \sum_j c_j x_j where x_j is the amount of food j, subject to \sum_j a_{ij} x_j \geq b_i for nutrients i with requirements b_i, and x_j \geq 0; George Stigler posed this in 1945, and it is solvable via linear programming to find minimal-cost nutrient-adequate diets.
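
A tiny diet-problem instance can be solved directly; the sketch below (assuming SciPy) uses made-up costs and nutrient contents purely for illustration.

```python
# Illustrative diet problem: minimize food cost subject to minimum
# nutrient requirements (all numbers are hypothetical).
import numpy as np
from scipy.optimize import linprog

cost = np.array([0.5, 0.8, 0.3])          # $ per unit of foods 1..3
nutrients = np.array([[2.0, 1.0, 0.5],    # protein per unit of each food
                      [1.0, 3.0, 1.5]])   # vitamins per unit of each food
requirements = np.array([10.0, 12.0])     # minimum protein and vitamins

# linprog uses A_ub @ x <= b_ub, so flip the >= nutrient requirements.
res = linprog(cost, A_ub=-nutrients, b_ub=-requirements)
print(res.x, res.fun)  # minimal-cost food amounts and the total cost
```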

Nonlinear and Nonconvex Optimization

Nonlinear optimization encompasses problems where the objective function or constraints are nonlinear, often leading to complex landscapes with multiple local optima rather than the unique global solutions typical of linear or convex cases. These problems arise in diverse fields such as engineering design, chemical process modeling, and economic equilibrium analysis, where the nonlinearity captures realistic phenomena like saturation effects or interaction terms. Unlike convex optimization, which guarantees global optimality under mild conditions, nonlinear problems generally admit convergence only to local optima, necessitating careful algorithm selection to navigate saddle points and ensure practical convergence.

In unconstrained nonlinear optimization, the goal is to minimize a smooth function f: \mathbb{R}^n \to \mathbb{R} without restrictions on the variables. Newton's method is a foundational second-order approach that approximates the function quadratically using the gradient \nabla f(x) and Hessian H(x) = \nabla^2 f(x), yielding the update direction d_k = -H(x_k)^{-1} \nabla f(x_k). This step leverages curvature information for rapid quadratic convergence near a local minimum satisfying second-order optimality conditions, though it can fail far from solutions due to indefinite Hessians or the high computational cost of inverting large matrices. To mitigate these issues, trust-region methods restrict the step to a neighborhood where the model remains trustworthy, solving a subproblem \min_d m_k(d) = f(x_k) + \nabla f(x_k)^T d + \frac{1}{2} d^T H(x_k) d subject to \|d\| \leq \Delta_k, with trust radius \Delta_k adjusted based on the ratio of actual to predicted decrease. These methods ensure convergence for sufficiently small steps and handle nonconvexity better than pure Newton iterations by incorporating line searches or regularization.

For problems with nonlinear constraints, such as \min f(x) subject to c(x) = 0 (equality-constrained), sequential quadratic programming (SQP) iteratively solves quadratic approximations of the Lagrangian, forming subproblems \min_d \nabla f(x_k)^T d + \frac{1}{2} d^T H_k d subject to \nabla c(x_k)^T d + c(x_k) = 0, where H_k approximates the Hessian of the Lagrangian. This approach exploits the structure of quadratic programs for efficiency and achieves superlinear convergence under constraint qualifications, making it suitable for medium-scale nonlinear programs in aerospace trajectory optimization. Inequality-constrained nonlinear optimization, \min f(x) subject to g(x) \leq 0, often employs barrier methods, which transform the problem into a sequence of unconstrained ones by adding a logarithmic barrier term -\mu \sum \log(-g_i(x)) to the objective, with barrier parameter \mu > 0 decreasing toward zero. This interior-point technique keeps iterates strictly feasible, converging to the optimum while avoiding boundary violations, though it requires careful handling of ill-conditioning near degeneracy.

Nonconvex nonlinear optimization introduces significant challenges due to the proliferation of local minima, saddle points, and flat regions, where algorithms may converge to suboptimal solutions depending on initialization. For instance, verifying global optimality is often NP-hard, as seen in the traveling salesman problem (TSP), a classic task requiring the shortest tour visiting each city once, whose continuous relaxations exhibit multiple local minima that trap gradient-based methods. These issues underscore the need for hybrid strategies combining local search with randomization, though deterministic guarantees remain elusive without additional problem structure.
A representative benchmark for testing nonlinear optimization algorithms is the Rosenbrock function, defined as f(x_1, x_2) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2, which features a narrow, curved valley leading to the global minimum at (1, 1). This nonconvex test case challenges methods with slow convergence along the valley and sensitivity to step sizes, highlighting the trade-offs between unconstrained solvers like Newton's method and more robust trust-region variants.
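
The benchmark is directly runnable; the sketch below (assuming SciPy, whose optimize module ships the Rosenbrock function and its derivatives) compares a quasi-Newton method with a trust-region Newton method from the standard hard starting point.

```python
# Minimizing the Rosenbrock function two ways.
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess

x0 = np.array([-1.2, 1.0])  # classic start inside the curved valley

bfgs = minimize(rosen, x0, jac=rosen_der, method="BFGS")
trust = minimize(rosen, x0, jac=rosen_der, hess=rosen_hess,
                 method="trust-exact")

print(bfgs.x)   # ≈ [1, 1], the global minimum
print(trust.x)  # ≈ [1, 1], typically in fewer iterations using curvature
```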

Stochastic and Robust Optimization

Stochastic optimization addresses decision-making problems where the objective function or constraints involve random variables, typically formulated as minimizing the expected value of a cost function over uncertain parameters. In this framework, the goal is to solve problems of the form \min_x \mathbb{E}_{\xi}[f(x, \xi)], where x is the decision variable, \xi represents the random uncertainty, and the expectation is taken with respect to the distribution of \xi. This approach originated in the mid-20th century with early works on linear programming under uncertainty, such as Dantzig's 1955 paper introducing recourse actions in linear programs. Stochastic programming has since become essential in fields like finance and supply chain management, where exact distributions may be known or assumed, allowing for risk-neutral decisions based on averages over possible outcomes.

A key computational technique in stochastic optimization is the sample average approximation (SAA) method, which replaces the intractable expectation with an empirical mean over a finite set of scenarios drawn from the distribution of \xi. This yields an approximate problem \min_x \frac{1}{N} \sum_{i=1}^N f(x, \xi_i), where N is the number of samples, transforming the stochastic problem into a deterministic one that can be solved using standard optimization solvers. Under suitable conditions, such as convexity and finite moments, SAA solutions converge to the true optimum as N increases, with error bounds established through large deviation principles. This method is particularly effective for two-stage stochastic programs, where first-stage decisions are made before uncertainty is revealed, followed by second-stage recourse actions.

In contrast, robust optimization focuses on worst-case performance over an uncertainty set \mathcal{U}, formulated as \min_x \max_{\xi \in \mathcal{U}} f(x, \xi), ensuring solutions remain feasible and near-optimal even under adversarial realizations of the uncertainty. This paradigm, developed prominently in the late 1990s, provides tractable reformulations for common uncertainty sets like ellipsoids or polyhedra, converting the min-max problem into a convex program solvable in polynomial time. Robust methods are conservative by design, prioritizing protection against the most detrimental outcomes rather than probabilistic averages, and have been extended to conic and semidefinite programs for broader applicability.

Common techniques in these areas include scenario-based approaches, which generate a finite set of plausible uncertainty realizations to approximate either expectations or worst cases, and chance-constrained formulations that enforce probabilistic guarantees on constraints. Scenario methods, as in the scenario approach, draw independent samples to construct convex programs with explicit violation probability bounds, scaling well for high-dimensional problems. Chance constraints, such as P(g(x, \xi) \leq 0) \geq 1 - \alpha, limit the risk of constraint violation to at most \alpha, with tractable approximations available for joint or individual probabilities under log-concave distributions. These techniques bridge stochastic and robust paradigms, often combining sampling with uncertainty sets for practical implementation.

A classic application is portfolio optimization under uncertain asset returns, extending Markowitz's 1952 mean-variance model to handle estimation errors in expected returns and covariances. In stochastic versions, portfolios minimize expected risk over return distributions, while robust extensions consider worst-case returns within ellipsoidal uncertainty sets around nominal estimates, yielding more stable allocations with controlled downside exposure.
Such formulations have demonstrated improved out-of-sample performance compared to classical models, particularly in volatile markets. In the 2010s, distributionally robust optimization (DRO) emerged as a unifying framework, optimizing over ambiguity sets of possible distributions rather than a single nominal one, such as moment-based sets containing all distributions with bounded mean and variance. This addresses distributional ambiguity in data-driven settings, with tractable reformulations via duality yielding conservative yet computationally efficient solutions. Seminal work showed that for linear problems, these lead to tightened robust counterparts with explicit confidence levels. DRO has gained traction in machine learning and operations research, offering a balance between stochastic flexibility and robust protection.
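
The SAA idea is easy to demonstrate on a one-dimensional newsvendor-style problem. The sketch below (assuming NumPy; the cost, price, and demand distribution are made up) replaces the expectation with an average over sampled scenarios and optimizes the resulting deterministic surrogate by a crude grid search.

```python
# Sample average approximation for min_x E[f(x, xi)], with the
# newsvendor-style cost f(x, xi) = c*x - p*min(x, xi).
import numpy as np

rng = np.random.default_rng(0)
c, p = 1.0, 3.0                                            # illustrative unit cost/price
demand = rng.lognormal(mean=3.0, sigma=0.5, size=10_000)   # scenarios xi_i

def saa_objective(x):
    # Empirical mean of the per-scenario cost: the SAA surrogate.
    return c * x - p * np.minimum(x, demand).mean()

grid = np.linspace(0, 100, 1001)
values = [saa_objective(x) for x in grid]
x_star = grid[int(np.argmin(values))]
print(x_star)  # ≈ the (p - c)/p = 2/3 quantile of demand, as theory predicts
```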

Multi-objective and Global Optimization

Multi-objective optimization addresses problems where multiple conflicting objectives must be optimized simultaneously, such as minimizing cost while maximizing performance in engineering designs. Unlike single-objective problems, solutions are rarely optimal for all objectives at once; instead, the goal is to identify trade-offs among them. A key concept is Pareto optimality, where a feasible solution x is Pareto optimal if no other feasible solution x' exists such that f_i(x') \leq f_i(x) for all objectives i = 1, \dots, k and f_j(x') < f_j(x) for at least one j. This definition ensures that improving one objective cannot occur without degrading another, forming the basis for non-dominated solutions in vector optimization. The set of all Pareto-optimal solutions in the objective space traces the efficient frontier, a curve or surface representing the best possible trade-offs, often visualized for two objectives to highlight compromises like risk versus return in portfolio selection.

To generate points on this frontier, scalarization techniques convert the multi-objective problem into a single-objective one, as sketched in the example after this section. A prominent method is the weighted sum scalarization, which minimizes \sum_{i=1}^k w_i f_i(x) subject to constraints, where weights w_i > 0 reflect objective priorities and sum to 1; this yields properly efficient solutions under convexity assumptions. In engineering design, this approach optimizes trade-offs such as minimizing production cost while maximizing structural performance in component design, where Pareto solutions balance material expenses against load-bearing capacity.

Global optimization seeks the absolute best solution in problems with multimodal landscapes, particularly nonconvex ones where local minima abound, extending beyond local nonlinear solvers by guaranteeing convergence to the global optimum. Branch-and-bound methods systematically partition the feasible region and use lower bounds to prune suboptimal branches, ensuring finite termination for twice-differentiable nonconvex problems. The αBB algorithm exemplifies this by constructing convex underestimators via diagonal perturbation of the Hessian, enabling rigorous global minimization. Spatial partitioning complements this by dividing the search space into subregions, such as via recursive bisection, to refine bounds and focus computation on promising areas in low-dimensional bound-constrained problems.

From the 1990s onward, evolutionary algorithms have advanced global search in multi-objective settings by maintaining diverse populations to approximate the entire Pareto front without gradient reliance. The Non-dominated Sorting Genetic Algorithm (NSGA), introduced in the mid-1990s, ranks solutions by dominance levels, and its successor NSGA-II adds a crowding-distance measure for diversity, effectively handling nonconvex Pareto fronts in applications like aerodynamic design. These methods provide robust approximations in complex, multimodal landscapes where exact global techniques are computationally intensive.
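
The following minimal sketch (assuming NumPy and SciPy) traces a Pareto front by weighted-sum scalarization for two simple conflicting objectives, f_1(x) = x^2 and f_2(x) = (x - 2)^2, chosen here purely for illustration.

```python
# Weighted-sum scalarization: each weight w yields one Pareto-optimal point.
import numpy as np
from scipy.optimize import minimize_scalar

def scalarized(w):
    # Minimize w*f1 + (1-w)*f2; for convex objectives every w in (0, 1)
    # produces a Pareto-optimal solution.
    res = minimize_scalar(lambda x: w * x**2 + (1 - w) * (x - 2) ** 2)
    return res.x

for w in np.linspace(0.1, 0.9, 5):
    x = scalarized(w)
    print(f"w={w:.1f}  x={x:.3f}  f1={x**2:.3f}  f2={(x - 2)**2:.3f}")
# As w sweeps from 0 to 1, x moves from 2 toward 0, tracing the trade-off curve.
```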

Optimization Algorithms

Exact Methods

Exact methods in mathematical optimization are algorithms designed to find globally optimal solutions with finite termination guarantees, typically for problems with linear, convex, or structured discrete objectives and constraints. These methods are particularly effective for linear programming (LP) and certain integer programming (IP) instances where the problem size permits exhaustive or polynomial-time exploration. Unlike approximation techniques, exact methods ensure optimality by systematically evaluating feasible regions or subproblems, often leveraging duality or relaxation principles to prune search spaces. They form the backbone of solvers for operations research and logistics applications, where precision is paramount over computational speed for moderate-scale instances.

The simplex method, introduced by George B. Dantzig in 1947, addresses linear programs by iteratively pivoting from one basic feasible solution to an adjacent one along the edges of the feasible polytope, improving the objective value until optimality is reached. This tableau-based approach represents the problem in canonical form, selecting entering and leaving variables based on reduced costs and ratio tests to maintain feasibility. Despite its empirical efficiency on sparse problems, the method's worst-case running time is exponential, as pathological instances force it to visit a vast number of vertices; degenerate pivoting can even cycle, though anti-cycling rules like Bland's rule prevent this in practice.

Interior-point methods, pioneered by Narendra Karmarkar in 1984, solve linear and convex programs by traversing paths through the interior of the feasible region rather than its boundaries. These algorithms employ barrier functions, such as the logarithmic barrier for inequality constraints, to transform the constrained problem into a sequence of unconstrained minimizations solved via Newton's method. Primal-dual variants simultaneously optimize primal and dual objectives, following central paths that approximate the analytic center and converge to the optimum. Karmarkar's projective algorithm achieves polynomial-time complexity, specifically O(n^{3.5} L) arithmetic operations for an n-variable problem with L-bit input, marking a theoretical breakthrough over the simplex method's exponential worst case.

Branch-and-bound, formalized by Ailsa Land and Alison Doig in 1960, extends exact solving to integer programming by partitioning the feasible space into subproblems via branching on fractional variables from relaxations. Each node in the search tree is bounded using continuous relaxations or duals; branches whose bounds cannot improve on the current best solution are fathomed to prune the tree. This enumerative strategy guarantees optimality by exhaustively exploring only promising regions, with enhancements like cutting planes strengthening bounds. For mixed-integer linear programs, the method's worst-case complexity is exponential, reflecting the NP-hard nature of integer programming, though average performance benefits from tight relaxations in structured cases.

Dynamic programming, developed by Richard Bellman in the 1950s, provides an exact recursive framework for sequential decision problems, decomposing them into overlapping subproblems solved via backward or forward induction. The core Bellman recursion defines the value function for stage k as V_k(x) = \min_a \left[ c(x,a) + V_{k-1}(f(x,a)) \right], where x is the state, a the action, c(x,a) the stage cost, and f(x,a) the transition function, with V_0(x) = 0 for the terminal stage. This principle of optimality ensures that optimal subpolicy solutions compose to a global optimum, applicable to finite-horizon problems like inventory control or shortest paths. Computationally, it requires storing value functions across states, leading to exponential space and time in the state dimension unless sparsity or decomposition is exploited.
In terms of computational complexity, exact methods for linear programming achieve polynomial time via interior-point approaches, contrasting with simplex's exponential worst case, while integer programming via branch-and-bound remains exponential in the worst case due to the combinatorial explosion of discrete choices. These guarantees underscore their suitability for verifiable optima in the linear and convex subfields, though scalability limits their use to problems with up to thousands of variables.
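
The Bellman recursion is concrete enough to sketch directly. The example below (pure Python; the layered graph and its costs are made up) computes optimal costs-to-go by memoized backward induction, an exact dynamic-programming solve of a tiny shortest-path problem.

```python
# Bellman recursion V(s) = min_a [ cost(s, a) + V(next(s, a)) ], V(T) = 0,
# on a small illustrative graph with terminal node "T".
from functools import lru_cache

graph = {
    "A": {"B": 2, "C": 5},
    "B": {"D": 4, "E": 1},
    "C": {"D": 1, "E": 3},
    "D": {"T": 3},
    "E": {"T": 2},
    "T": {},
}

@lru_cache(maxsize=None)
def cost_to_go(state):
    # Optimal remaining cost from `state`; memoization caches subproblems.
    if state == "T":
        return 0.0
    return min(c + cost_to_go(nxt) for nxt, c in graph[state].items())

print(cost_to_go("A"))  # 5.0, achieved by A -> B -> E -> T (2 + 1 + 2)
```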

Iterative and Gradient-Based Methods

Iterative and gradient-based methods form a cornerstone of continuous optimization, relying on derivatives to iteratively refine solutions to minimization problems. These algorithms generate a sequence of points approaching a local optimum by following the negative gradient direction, which points toward steepest descent for differentiable objectives. Unlike exact methods that terminate finitely for structured problems, iterative approaches are designed for large-scale, general nonlinear functions, often achieving approximate solutions efficiently through repeated updates. Their effectiveness stems from exploiting first- or second-order information, with convergence guaranteed under conditions like Lipschitz continuity of gradients.

Gradient descent, the simplest such method, updates the iterate as x_{k+1} = x_k - \alpha_k \nabla f(x_k), where \alpha_k > 0 is a step size and \nabla f is the gradient of f. Introduced by Cauchy in 1847 for solving systems via minimization, it ensures descent when \alpha_k satisfies sufficient decrease conditions. The step size \alpha_k is typically chosen via line search, such as the Armijo rule, which backtracks from an initial guess until f(x_k - \alpha_k \nabla f(x_k)) \leq f(x_k) - c \alpha_k \|\nabla f(x_k)\|^2 for a parameter 0 < c < 1 and a contraction factor. This rule, proposed by Armijo in 1966, promotes global convergence for smooth functions with Lipschitz gradients.

Newton's method enhances this by incorporating curvature via the Hessian matrix H(x_k) = \nabla^2 f(x_k), yielding x_{k+1} = x_k - H(x_k)^{-1} \nabla f(x_k). Originating from root-finding but adapted for optimization, it approximates the quadratic model of f locally, leading to faster progress near stationary points where second-order conditions hold, such as positive definiteness of the Hessian indicating a local minimum. For twice-differentiable functions with Lipschitz continuous Hessians, local quadratic convergence occurs: the error satisfies \|x_{k+1} - x^*\| \leq C \|x_k - x^*\|^2 for some constant C > 0 near the optimum x^*.

The conjugate gradient method addresses quadratic objectives f(x) = \frac{1}{2} x^T A x - b^T x with positive definite A, avoiding explicit inversion by generating conjugate search directions. Developed by Hestenes and Stiefel in 1952, it solves the equivalent linear system A x = b in at most n iterations for n-dimensional problems, with updates x_{k+1} = x_k + \alpha_k d_k where directions d_k satisfy d_k^T A d_j = 0 for j < k, and \alpha_k = \frac{\nabla f(x_k)^T \nabla f(x_k)}{d_k^T A d_k}. For non-quadratic cases, it approximates second-order behavior with only first-order oracles, exhibiting superlinear convergence under suitable conditions.

Convergence properties vary by method and assumptions. Gradient descent achieves linear convergence—error reduces by a factor \rho < 1 per iteration, \|x_{k+1} - x^*\| \leq \rho \|x_k - x^*\|—for strongly convex f with condition number \kappa, at rate 1 - \frac{1}{\kappa}, improvable to O(1/k^2) via acceleration as in Nesterov's 1983 method. Newton's method offers quadratic rates locally, while conjugate gradient minimizes quadratics exactly, with an error bound governed by \sqrt{\kappa} for convex quadratics. Step size rules like Armijo ensure descent and global convergence to stationary points for nonconvex smooth f.

For constrained optimization, projected gradient descent adapts the update to feasible sets: x_{k+1} = P_\mathcal{C}(x_k - \alpha_k \nabla f(x_k)), where P_\mathcal{C} projects onto the convex set \mathcal{C}. Pioneered by Goldstein in 1964 for Hilbert spaces, it converges linearly for strongly convex problems with tractable projections.
The augmented Lagrangian method handles general nonlinear constraints by minimizing an augmented objective \mathcal{L}_\rho(x, \lambda) = f(x) + \lambda^T g(x) + \frac{\rho}{2} \|g(x)\|^2, with g(x) = 0 the constraints, updating multipliers via \lambda_{k+1} = \lambda_k + \rho g(x_{k+1}). Introduced by Hestenes and Powell in 1969, it combines penalty and multiplier ideas to achieve dual feasibility and global convergence under constraint qualifications.
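
The basic loop of gradient descent with Armijo backtracking is short enough to show in full. The sketch below (assuming NumPy; the quadratic test problem is made up) applies it to a strongly convex quadratic, where the iterate can be checked against the closed-form minimizer A^{-1} b.

```python
# Gradient descent with Armijo backtracking on f(x) = 0.5 x^T A x - b^T x.
import numpy as np

A = np.array([[3.0, 0.5], [0.5, 1.0]])   # positive definite
b = np.array([1.0, -2.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

x = np.zeros(2)
for _ in range(200):
    g = grad(x)
    if np.linalg.norm(g) < 1e-8:      # first-order stationarity reached
        break
    alpha, c, tau = 1.0, 1e-4, 0.5    # Armijo parameters
    # Backtrack until sufficient decrease: f(x - a g) <= f(x) - c a ||g||^2.
    while f(x - alpha * g) > f(x) - c * alpha * (g @ g):
        alpha *= tau
    x = x - alpha * g

print(x, np.linalg.solve(A, b))  # iterate matches the exact minimizer A^{-1} b
```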

Heuristic and Approximation Methods

Heuristic and approximation methods provide practical solutions to optimization problems that are computationally intractable for exact algorithms, particularly in global optimization where local search may trap solutions in suboptimal regions. These methods trade optimality guarantees for efficiency, often yielding high-quality approximations in reasonable time. Metaheuristics, a prominent class, draw inspiration from natural processes to explore large search spaces stochastically, while approximation algorithms offer provable bounds on solution quality relative to the optimum.

Genetic algorithms (GAs) emulate Darwinian evolution to optimize over populations of candidate solutions. An initial population of encoded individuals (e.g., binary strings representing decision variables) is iteratively evolved through selection, crossover, and mutation operators. Selection favors fitter individuals based on an objective function, crossover combines parent solutions to produce offspring, and mutation introduces random variations to maintain diversity and escape local optima. This process, formalized by John Holland in the 1970s, enables GAs to converge toward global optima in complex, multimodal landscapes.

Simulated annealing (SA) mimics the physical process of annealing in metallurgy to probabilistically explore solution spaces. Starting from an initial solution, SA generates neighboring candidates and accepts worse ones with probability P = \exp(-\Delta E / T), where \Delta E is the change in objective value and T is a temperature parameter that decreases over time. This allows escape from local minima early on, with acceptance becoming more deterministic as T cools. Introduced by Kirkpatrick, Gelatt, and Vecchi, SA has proven effective for combinatorial problems like the traveling salesman; a small sketch follows this section.

Particle swarm optimization (PSO) models social behavior in flocks or swarms, where particles adjust positions in the search space based on personal and collective experiences. Each particle i updates its velocity according to \mathbf{v}_i^{t+1} = w \mathbf{v}_i^t + c_1 r_1 (\mathbf{pbest}_i - \mathbf{x}_i^t) + c_2 r_2 (\mathbf{gbest} - \mathbf{x}_i^t), followed by the position update \mathbf{x}_i^{t+1} = \mathbf{x}_i^t + \mathbf{v}_i^{t+1}, with inertia weight w, cognitive coefficient c_1, social coefficient c_2, random values r_1, r_2 \in [0,1], personal best \mathbf{pbest}_i, and global best \mathbf{gbest}. Developed by Kennedy and Eberhart, PSO excels in continuous, high-dimensional problems due to its simplicity and few parameters.

Approximation methods provide rigorous guarantees for specific problems, such as the 0-1 knapsack problem, which maximizes value under a weight constraint and admits a polynomial-time approximation scheme (PTAS). A fully polynomial-time approximation scheme (FPTAS) achieves a (1 - \varepsilon)-approximation in time polynomial in the input size and 1/\varepsilon, using scaled dynamic programming to reduce the state space while bounding the error. The original FPTAS, by Ibarra and Kim, scales item values and solves a relaxed instance exactly.

In recent developments, reinforcement learning (RL) has emerged for automating hyperparameter tuning in optimization algorithms, treating tuning as a sequential decision process. Population-based training, as in Jaderberg et al. (2017), evolves hyperparameters online across a population of models, rewarding performance improvements and enabling adaptation to diverse tasks. This approach enhances efficiency in AutoML pipelines for nonconvex and stochastic settings.
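
The sketch below (assuming NumPy; the cooling schedule and proposal scale are illustrative, not tuned) applies simulated annealing to the multimodal function f(x) = \sin(x) + x^2/10 used earlier, starting inside the non-global basin.

```python
# Simulated annealing with geometric cooling on f(x) = sin(x) + x^2/10.
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(x) + x**2 / 10.0

x = 4.0                         # start near the non-global minimum x ≈ 3.84
T, cooling = 2.0, 0.995
best = x
for _ in range(5000):
    candidate = x + rng.normal(scale=0.5)        # random neighbor
    delta = f(candidate) - f(x)
    # Accept improvements always; accept uphill moves with prob exp(-delta/T).
    if delta < 0 or rng.random() < np.exp(-delta / T):
        x = candidate
    if f(x) < f(best):
        best = x
    T *= cooling                                  # lower the temperature

print(best)  # typically near the global minimum x ≈ -1.31
```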

Applications

Engineering and Physics

Mathematical optimization plays a pivotal role in engineering and physics, enabling the design and analysis of physical systems under constraints such as material limits, energy efficiency, and performance requirements. In structural mechanics, topology optimization seeks to minimize weight while maintaining structural integrity, often formulated as a compliance minimization problem subject to volume constraints. A seminal approach, the Solid Isotropic Material with Penalization (SIMP) method, introduced in the late 1980s, models material distribution by assigning a density variable to each finite element, penalizing intermediate densities to promote binary (solid-void) designs; the effective stiffness is interpolated as E(\rho) = \rho^p E_0, where \rho is the density (0 to 1), p > 1 is the penalization factor, and E_0 is the stiffness of the solid material. This method builds on homogenization techniques for optimal topologies and has been widely adopted for lightweight structures in aerospace and automotive applications, achieving significant weight reductions in benchmark cantilever beam problems compared to uniform designs.

In electrical engineering, optimization addresses circuit design by tuning component values to meet specifications like gain, bandwidth, and power consumption, often minimizing a least-squares error between desired and simulated responses. For instance, gradient-based methods solve for transistor sizes and bias points in analog amplifiers, reducing design iterations from weeks to hours in automated workflows. Similarly, antenna array optimization uses nonlinear programming to shape radiation patterns, adjusting element positions and excitations to minimize sidelobe levels while maximizing directivity; a classic example is the synthesis of linear arrays, where the optimizer navigates the nonconvex landscape to achieve sidelobe suppression below -20 dB in Dolph-Chebyshev configurations. These applications leverage nonlinear methods to handle the inherent nonconvexity of electromagnetic objectives.

Control systems in engineering rely on optimization for stabilizing dynamic processes, with the linear-quadratic regulator (LQR) providing an optimal feedback law for linear systems. The LQR problem minimizes the infinite-horizon quadratic cost functional: J = \int_0^\infty \left( \mathbf{x}^T Q \mathbf{x} + \mathbf{u}^T R \mathbf{u} \right) dt subject to \dot{\mathbf{x}} = A \mathbf{x} + B \mathbf{u}, where \mathbf{x} is the state vector, \mathbf{u} the control input, Q \geq 0 penalizes state deviations, and R > 0 penalizes control effort. Solved via the algebraic Riccati equation, LQR yields feedback gains that balance stability and performance, commonly applied to improve settling times over proportional-integral controllers, as sketched below. This framework, originating from early optimal control theory, remains foundational for applications in aerospace and automotive suspension.

In civil engineering, optimization enhances structural design by sizing members and adjusting geometries to minimize material use under load constraints, often using mathematical programming for stress-limited problems. Seminal work on truss topology traces to Michell's 1904 criteria for statically determinate structures, extended numerically to yield layouts using less steel than conventional designs in bridge girders. Network design in civil infrastructure, such as water distribution or transportation systems, employs mixed-integer programming to optimize pipe diameters or lane capacities, minimizing costs while ensuring flow demands; for example, genetic algorithms level construction resources across project phases, helping to reduce peak usage and delays in large-scale builds.
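
A compact LQR computation is sketched below (assuming NumPy and SciPy; the double-integrator plant and weight matrices are illustrative choices, not taken from the text above).

```python
# LQR gains for a double-integrator plant x_dot = A x + B u, via the
# continuous algebraic Riccati equation.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # position/velocity dynamics
B = np.array([[0.0], [1.0]])
Q = np.eye(2)                            # state deviation penalty
R = np.array([[1.0]])                    # control effort penalty

P = solve_continuous_are(A, B, Q, R)     # stabilizing Riccati solution
K = np.linalg.inv(R) @ B.T @ P           # optimal feedback u = -K x

print(K)                                 # gain matrix
print(np.linalg.eigvals(A - B @ K))      # closed-loop eigenvalues, Re < 0
```
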
Geophysics applies least-squares optimization in seismic inversion to estimate subsurface properties from wave data, formulating the problem as minimizing the misfit \| d - G m \|^2_2, where d are observed seismograms, G the forward modeling operator, and m the model parameters such as velocity profiles. This damped least-squares approach, regularized with Tikhonov terms, resolves ambiguities in ill-posed inverse problems, enabling accurate imaging of reservoirs with significantly improved resolution over ray-based methods in synthetic benchmarks like the Marmousi model. Originating from advances in inverse theory, it underpins full-waveform inversion for hazard assessment and resource exploration.

Economics and Operations Research

In economics, mathematical optimization plays a central role in modeling consumer behavior through the utility maximization problem, where an individual seeks to maximize a utility function u(x) subject to a budget constraint p \cdot x \leq w, with x representing quantities of goods, p their prices, and w the available wealth. This formulation underpins general equilibrium theory, ensuring that individual optimizations lead to market equilibria under assumptions of convexity and completeness. The problem is typically solved using Lagrange multipliers or Kuhn-Tucker conditions for interior or boundary solutions, highlighting trade-offs in consumption.

In finance, optimization techniques are applied to portfolio selection, notably through the mean-variance framework introduced by Harry Markowitz, which minimizes risk for a given expected return or maximizes return for a given risk level. The model formulates the problem as minimizing the variance \sigma_p^2 = w^T \Sigma w subject to an expected return constraint \mu^T w = r and 1^T w = 1, where w are asset weights, \Sigma the covariance matrix, and \mu the vector of expected returns. This approach revolutionized investment management by quantifying diversification benefits.

Operations research employs optimization for scheduling and inventory management. Job-shop scheduling, a classic problem, assigns jobs to machines to minimize makespan or tardiness, often modeled as a mixed-integer linear program (MILP) with binary variables for sequencing and continuous variables for start times, subject to precedence and capacity constraints. Seminal formulations demonstrated that such MILP models can yield optimal schedules for small instances, though NP-hardness necessitates heuristics for larger scales. Inventory models, such as the economic order quantity (EOQ) model, optimize ordering policies to balance holding and setup costs, deriving the optimal order size Q = \sqrt{\frac{2DS}{H}}, where D is the demand rate, S the setup cost, and H the holding cost per unit; a worked example follows this section. This deterministic model provides a foundational benchmark for extensions in supply chain planning.

Transportation optimization focuses on the minimum-cost flow problem, which minimizes the total cost of shipping goods from sources to sinks in a network while satisfying flow constraints. Formulated as \min \sum_{(i,j)} c_{ij} f_{ij} subject to flow balance \sum_j f_{ij} - \sum_k f_{ki} = b_i and capacity bounds, it generalizes the transportation problem and is solvable via techniques like the network simplex method. This approach is pivotal for logistics efficiency, reducing costs in distribution networks.

Auction design leverages optimization through mechanism design, exemplified by the Vickrey-Clarke-Groves (VCG) mechanism, which allocates items to maximize social welfare while incentivizing truthful bidding. Developed across key contributions by Vickrey (1961), Clarke (1971), and Groves (1973), VCG sets payments as the externality imposed on others, ensuring incentive compatibility in quasi-linear settings for multi-item auctions. Mathematical programming serves as a key tool across these applications, enabling scalable solutions for decision-making in economic and operational contexts.
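
The EOQ formula lends itself to a short worked example (pure Python; the demand, setup, and holding figures are made up). At Q^*, the annual ordering and holding costs are equal, which the computation confirms.

```python
# Economic order quantity: Q* = sqrt(2 D S / H).
import math

D = 1200.0   # annual demand (units/year)
S = 100.0    # fixed cost per order
H = 2.0      # holding cost per unit per year

Q_star = math.sqrt(2 * D * S / H)
ordering_cost = D / Q_star * S       # orders per year times cost per order
holding_cost = Q_star / 2 * H        # average inventory times holding rate

print(Q_star)                        # ≈ 346.4 units per order
print(ordering_cost, holding_cost)   # ≈ 346.4 each: equal at the optimum
```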

Machine Learning and Data Science

In machine learning, optimization plays a central role in training models to minimize empirical risk, defined as the average loss over a finite training dataset. The empirical risk minimization (ERM) principle seeks parameters that solve \hat{\theta} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^n L(y_i, f(x_i; \theta)), where L is the loss function, (x_i, y_i) are the training examples, and f is the model. This approach, foundational to statistical learning theory, balances fitting the observed data while aiming for generalization to unseen data.

Training large-scale models, particularly deep neural networks, relies on stochastic gradient descent (SGD) and its variants to efficiently approximate gradients from mini-batches, addressing the computational demands of high-dimensional parameter spaces. SGD updates parameters iteratively as \theta_{t+1} = \theta_t - \eta \nabla L(\theta_t; b_t), where \eta is the learning rate and b_t is a random mini-batch, enabling scalable optimization for datasets with millions of examples. Variants like momentum-accelerated SGD and adaptive methods such as Adam incorporate second-moment estimates to accelerate convergence and handle noisy gradients, achieving state-of-the-art performance in tasks like image classification.

Hyperparameter tuning treats the selection of values like learning rates or network architectures as a black-box optimization problem, often solved using Bayesian optimization to minimize validation error with fewer evaluations than grid search. This method models the objective with a probabilistic surrogate and uses an acquisition function, such as expected improvement, to balance exploration and exploitation in the hyperparameter space. Bayesian optimization has demonstrated superior efficiency over grid and random search, outperforming human experts in tuning algorithms like support vector machines and deep networks on benchmarks such as UCI datasets.

In data science applications, optimization underpins unsupervised tasks such as clustering and feature selection. K-means clustering formulates the problem as a nonconvex optimization of assigning data points to k centroids to minimize the sum of squared distances, solved iteratively via Lloyd's algorithm, which alternates between assignment and centroid updates until convergence. This method, while sensitive to initialization due to local minima, remains widely used for its simplicity and effectiveness in partitioning high-dimensional data, as in customer segmentation. Feature selection optimizes a combinatorial objective to identify a subset of relevant variables that maximizes predictive performance while reducing dimensionality, often using wrapper methods that evaluate subsets via cross-validation; filter methods like mutual information ranking provide scalable alternatives by scoring individual features against the target.

Recent advances address distributed optimization in privacy-sensitive settings through federated learning, which minimizes a weighted sum of local losses \min_{\theta} \sum_i w_i f_i(\theta) across decentralized devices without sharing raw data. This framework aggregates model updates from clients, as in the FedAvg algorithm, reducing communication costs by up to 100x compared to centralized training while preserving data privacy. Optimization challenges in transformer models, which scale to billions of parameters for natural language processing, have been mitigated by mixed-precision training, which combines 16-bit floating-point computations with 32-bit accumulation for stability, accelerating training by 2-3x on GPUs without accuracy loss in large language models from 2018 onward.
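To make the ERM and SGD machinery described above concrete, here is a minimal NumPy sketch that fits a linear model by mini-batch SGD on synthetic data; the learning rate, batch size, and data are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)
n, d = 1000, 5
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + 0.1 * rng.normal(size=n)

theta = np.zeros(d)
eta, batch_size = 0.05, 32

for step in range(2000):
    idx = rng.choice(n, size=batch_size, replace=False)  # random mini-batch b_t
    Xb, yb = X[idx], y[idx]
    grad = 2.0 / batch_size * Xb.T @ (Xb @ theta - yb)   # gradient of mean squared loss
    theta -= eta * grad                                  # theta_{t+1} = theta_t - eta * grad

print("parameter error:", np.linalg.norm(theta - theta_true))
```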

Software Tools and Solvers

Open-Source Libraries

Open-source libraries have democratized access to mathematical optimization tools, enabling researchers, students, and practitioners to implement and solve complex problems without licensing costs. These libraries span general-purpose solvers for nonlinear and linear programming, domain-specific frameworks for convex and mixed-integer problems, and specialized tools for machine learning and combinatorial optimization. Built primarily in Python for ecosystem compatibility, they leverage community contributions and integrate with numerical computing environments like NumPy.

The SciPy library, part of the broader scientific Python ecosystem, provides the optimize module for a wide range of optimization tasks, including nonlinear minimization, least-squares fitting, and root finding. It implements algorithms such as sequential least squares programming (SLSQP) for nonlinear programs with bounds and constraints, and supports global optimization methods like differential evolution. For linear programming, SciPy's linprog function solves standard LP problems using interior-point or revised simplex methods, making it suitable for medium-scale applications in scientific computing. These tools are tightly integrated with NumPy arrays for efficient handling of vectorized data.

CVXPY serves as a modeling language embedded in Python for convex optimization problems, allowing users to express objectives and constraints in a natural, mathematical syntax that adheres to disciplined convex programming (DCP) rules. It supports semidefinite programming (SDP) through atomic functions for matrix inequalities and can handle mixed-integer programming (MIP) by interfacing with external solvers. CVXPY automatically transforms models into standard forms and dispatches them to open-source solvers like SCS or OSQP for quadratic and conic programs, facilitating rapid prototyping in fields like finance and machine learning.

For mixed-integer linear programming (MILP), open-source interfaces connect to robust solvers with free access options. The COIN-OR Branch and Cut (CBC) solver, an open-source C++ library, excels at solving MILP problems via branch-and-cut techniques and integrates seamlessly with Python through wrappers like PuLP or Pyomo, supporting large-scale combinatorial models without restrictions. Similarly, Gurobi's Python interface (gurobipy) offers a free academic license for full-featured MILP solving, including advanced heuristics and parallel processing, though it requires registration for non-commercial use. Google's OR-Tools provides a comprehensive suite for combinatorial optimization, incorporating linear solvers like Glop and the CP-SAT constraint solver for routing, scheduling, and assignment problems.

In machine learning, libraries like PyTorch and TensorFlow emphasize optimization through automatic differentiation (autodiff) for gradient-based methods. PyTorch's torch.autograd engine computes exact gradients for arbitrary computational graphs, powering optimizers in torch.optim such as Adam and stochastic gradient descent (SGD) for training deep neural networks on large datasets. TensorFlow uses tf.GradientTape for dynamic gradient computation, enabling efficient backpropagation in models with millions of parameters and supporting distributed optimization for scalable ML workflows. These autodiff capabilities underpin empirical risk minimization in data-driven optimization.
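Returning to the SciPy interface described above, this short sketch minimizes a simple quadratic objective under a linear inequality constraint with SLSQP; the objective, bounds, and constraint are made up for the demonstration:

```python
import numpy as np
from scipy.optimize import minimize

# Toy objective: shifted quadratic
def f(x):
    return (x[0] - 1.0) ** 2 + (x[1] - 2.5) ** 2

# Inequality constraints use SciPy's convention g(x) >= 0
constraints = [{"type": "ineq", "fun": lambda x: x[0] - 2 * x[1] + 2}]
bounds = [(0, None), (0, None)]  # x >= 0 componentwise

res = minimize(f, x0=np.array([2.0, 0.0]), method="SLSQP",
               bounds=bounds, constraints=constraints)
print("minimizer:", res.x, "objective:", res.fun)
```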

Commercial Software

Commercial software for mathematical optimization provides robust, scalable solvers tailored for industrial applications, offering advanced features such as high-performance algorithms, parallel computing, and dedicated support for enterprise integration. These tools are designed to handle large-scale problems in industries such as logistics, finance, energy, and manufacturing, often with premium licensing models that include maintenance and customization services.

Gurobi Optimizer is renowned for its high-performance capabilities in solving mixed-integer programming problems, leveraging advanced barrier methods for continuous relaxations within its branch-and-bound framework. It supports LP, QP, and MIP formulations, employing interior-point algorithms for large-scale continuous models, which makes it suitable for computationally intensive industrial scenarios. As of 2025, Gurobi version 12.0 introduces global optimality guarantees for mixed-integer nonlinear programs (MINLP) along with further performance gains, and the company's 2025 State of Mathematical Optimization report notes increasing integration of optimization with machine learning technologies across industries.

IBM CPLEX Optimization Studio excels in solving linear programming, mixed-integer programming, and constraint programming problems, incorporating distributed parallel algorithms to accelerate computations on multi-core systems. It handles quadratically constrained programs (QCP) and supports both primal and dual simplex methods alongside barrier solvers for LP and MIP. CPLEX's parallel solving features enable efficient scaling for enterprise-level optimization tasks, such as supply chain planning and resource allocation.

The MATLAB Optimization Toolbox integrates seamlessly with the MATLAB environment for solving nonlinear optimization problems, including constrained and unconstrained formulations using algorithms such as interior-point and sequential quadratic programming. It also supports global optimization techniques, such as genetic algorithms for multimodal nonlinear problems, allowing users to minimize functions with integer or binary constraints. This toolbox facilitates rapid prototyping and integration with simulation workflows in engineering and finance.

MOSEK Optimization Suite specializes in conic optimization, extending linear and quadratic solvers to handle semidefinite, power, and exponential cones through its interior-point algorithm. It solves conic quadratic problems efficiently, supporting affine expressions over Cartesian products of conic domains, which is particularly useful for robust and chance-constrained formulations. MOSEK's focus on numerical stability and sparsity exploitation makes it ideal for financial and engineering applications requiring precise conic modeling.

In the 2020s, commercial solvers like Gurobi have seen enhanced integration with enterprise systems, exemplified by Gurobi's partnership with SAP to embed optimization capabilities into supply chain planning products. These updates enable seamless deployment in production environments, supporting multi-objective extensions for decision-making.
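To give a flavor of the commercial-solver APIs discussed here, the following gurobipy sketch solves a tiny 0/1 knapsack MILP; the item values, weights, and capacity are invented, and running it requires a Gurobi license:

```python
import gurobipy as gp
from gurobipy import GRB

values = [10, 13, 7, 5]   # invented item values
weights = [4, 6, 3, 2]    # invented item weights
capacity = 9

m = gp.Model("knapsack")
x = m.addVars(len(values), vtype=GRB.BINARY, name="x")  # one binary per item

# Maximize total value subject to the capacity constraint
m.setObjective(gp.quicksum(values[i] * x[i] for i in range(len(values))),
               GRB.MAXIMIZE)
m.addConstr(gp.quicksum(weights[i] * x[i] for i in range(len(values))) <= capacity,
            "capacity")

m.optimize()
print("chosen items:", [i for i in range(len(values)) if x[i].X > 0.5])
```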

Specialized Frameworks

Specialized frameworks in mathematical optimization extend general-purpose tools to address challenges in niche domains, such as quantum computing, hyperparameter tuning, and biological modeling, by integrating domain-specific constraints and algorithms. These frameworks often leverage hybrid approaches, combining classical optimization with emerging paradigms like quantum annealing or machine learning, to tackle problems intractable for standard solvers.

In quantum optimization, Qiskit Optimization, developed by IBM in the early 2020s, provides a suite for formulating and solving combinatorial problems using the Quantum Approximate Optimization Algorithm (QAOA). QAOA employs a parameterized quantum circuit to approximate solutions to NP-hard problems, iteratively optimizing circuit parameters via classical methods to minimize an objective function encoded in a quantum Hamiltonian. This framework supports applications like graph partitioning and has been implemented for execution on IBM's quantum hardware, demonstrating scalability for problems up to hundreds of qubits. Complementing gate-based approaches, D-Wave's quantum annealing systems, commercially available since 2011, have matured by 2025 into hybrid solvers capable of addressing large-scale optimization in domains such as logistics and scheduling, with recent systems offering over 5,000 qubits and demonstrating quantum advantage in specific instances.

For hyperparameter optimization, Optuna offers an automated framework that uses tree-structured Parzen estimators to efficiently search vast configuration spaces, outperforming grid search by up to 10x in convergence speed for machine learning models. Ray Tune, part of the Ray project, enables distributed hyperparameter tuning across clusters, supporting algorithms like HyperBand and population-based training to scale experiments to thousands of trials, reducing training time for neural networks by orders of magnitude on GPU clusters. These tools are particularly vital in ML applications, where optimizing hyperparameters can directly impact model performance in tasks like image classification.

In systems biology, the COBRA Toolbox facilitates constraint-based reconstruction and analysis of metabolic networks, enabling flux balance analysis (FBA) to predict steady-state fluxes in genome-scale models via linear programming formulations. By incorporating constraints derived from genomic and physiological data, it supports simulations of microbial growth and drug targeting, with extensions for multi-omics integration that have been applied to over 1,000 reconstructions across organisms. JuMP, a modeling language embedded in Julia, specializes in expressive formulation of optimization problems, interfacing seamlessly with solvers for mixed-integer, nonlinear, and conic programs while allowing custom extensions via Julia's ecosystem. It excels in rapid prototyping for domain-specific models, such as energy systems or supply chains, by abstracting solver details and enabling automatic differentiation for gradient-based methods.
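As a small example of the hyperparameter-search workflow described above, this Optuna sketch tunes a single hypothetical hyperparameter against a toy objective; in practice the objective would train a model and return its validation error:

```python
import optuna

def objective(trial):
    # Hypothetical hyperparameter: a learning rate searched on a log scale
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    # Toy stand-in for validation loss; replace with real training/evaluation
    return (lr - 0.01) ** 2

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print("best hyperparameters:", study.best_params)
```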
