
Mathematical optimization

Mathematical optimization, also known as mathematical programming, is a subfield of applied mathematics dedicated to selecting the best element from a set of available alternatives by systematically finding values of decision variables that minimize or maximize a real-valued objective function while satisfying a set of constraints. These problems can be formulated as \min f_0(x) subject to inequality constraints f_i(x) \leq b_i, where x \in \mathbb{R}^n represents the variables, f_0 is the objective function, and the f_i define the constraints. The field encompasses diverse problem classes, including linear programming (where the objective and constraints are linear), nonlinear programming (with nonlinear functions), integer programming (requiring integer variables), and convex optimization (where the objective and feasible set are convex, enabling efficient global solutions). Historical development traces back to the 1700s with theories for unconstrained optimization by Fermat, Newton, and Euler, followed by Lagrange's 1797 work on equality-constrained problems; modern milestones include Dantzig's 1947 simplex method for linear programming and Karmarkar's 1984 interior-point algorithm, which spurred polynomial-time solutions for various classes. Applications of mathematical optimization span numerous domains, including engineering design (e.g., minimizing material weight in structures), economics (e.g., equilibrium models), finance (e.g., portfolio selection via Markowitz's mean-variance framework), transportation (e.g., minimizing shipping costs), machine learning (e.g., training support vector machines), and operations research (e.g., scheduling and resource allocation). These techniques leverage algorithms like gradient descent for unconstrained problems and specialized solvers for constrained ones, often implemented in modeling software for complex scenarios.

Fundamental Concepts

Optimization Problems

Mathematical optimization seeks to determine the values of decision variables that either minimize or maximize a given function while satisfying a set of constraints. The general formulation of such a problem is to minimize (or equivalently, maximize) an objective function f: \mathbb{R}^n \to \mathbb{R} over decision variables x \in \mathbb{R}^n, subject to inequality constraints g_i(x) \leq 0 for i = 1, \dots, m and equality constraints h_j(x) = 0 for j = 1, \dots, p. Here, the decision variables x represent the unknowns to be determined, and the constraints define the allowable values for x.

Optimization problems are classified based on the nature of the decision variables. In continuous optimization, the variables x take real values, allowing for solutions in \mathbb{R}^n. Discrete optimization, in contrast, requires some or all variables to belong to a discrete set, such as the integers, leading to a finite (though potentially large) number of possible solutions. Mixed-integer problems combine both continuous and discrete variables, blending the characteristics of the two categories.

Standard forms of optimization problems vary by the types of constraints imposed. An unconstrained problem simply minimizes f(x) without any restrictions on x, such as \min_x x^2, where the objective is a convex quadratic. Equality-constrained problems include only equality constraints, like \min_x f(x) subject to h(x) = 0. Inequality-constrained problems incorporate inequality constraints, for example, \min_x x subject to x \geq 1, which restricts the decision variable to values of at least 1. These forms provide a structured way to express diverse applications in fields such as engineering and economics.

The feasible set, also known as the feasible region, consists of all decision variables x that satisfy the given constraints, forming the domain over which the objective function is evaluated. In continuous problems, this set is often a polyhedron or other geometric region in \mathbb{R}^n; in discrete cases, it is a finite collection of points. The optimal solution, if it exists, lies within this feasible set and may be local or global, depending on the problem's structure.
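
As a concrete illustration, the following minimal sketch (assuming SciPy is installed) solves the inequality-constrained example \min_x x subject to x \geq 1 numerically; the variable names are chosen for clarity and are not part of any standard API beyond scipy.optimize.minimize itself.

```python
# Minimal sketch: solving min_x x subject to x >= 1 with SciPy.
from scipy.optimize import minimize

objective = lambda x: x[0]                                   # f(x) = x
constraint = {"type": "ineq", "fun": lambda x: x[0] - 1.0}   # x - 1 >= 0

result = minimize(objective, x0=[5.0], constraints=[constraint])
print(result.x)    # approximately [1.0]: the constraint is active at the optimum
print(result.fun)  # approximately 1.0
```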

Notation and Terminology

Mathematical optimization problems are typically formulated in a standard minimization form, given by \begin{align*} \min_{x \in X} \quad & f(x) \\ \text{subject to} \quad & g_i(x) \leq 0, \quad i = 1, \dots, m, \\ & h_j(x) = 0, \quad j = 1, \dots, p, \end{align*} where x \in \mathbb{R}^n represents the decision variables, f: \mathbb{R}^n \to \mathbb{R} is the objective function to be minimized, g_i: \mathbb{R}^n \to \mathbb{R} are inequality constraint functions, and h_j: \mathbb{R}^n \to \mathbb{R} are equality constraint functions. The set X \subseteq \mathbb{R}^n denotes the domain, often \mathbb{R}^n itself. Maximization problems are equivalently handled by minimizing the negative of the objective, i.e., \min_x -f(x) for \max_x f(x). The feasible set, denoted \Omega = \{ x \in X \mid g_i(x) \leq 0, \, i=1,\dots,m; \, h_j(x) = 0, \, j=1,\dots,p \}, consists of all points satisfying the constraints. Standard notation for optimal points and values includes \arg\min_x f(x), the set of points x^* achieving the minimum; \inf f(x), the greatest lower bound of f over the domain (equal to the minimum if attained); and \sup f(x), the least upper bound (used analogously for maximization). If \inf f(x) = -\infty over \Omega, the problem is unbounded below; otherwise, it is bounded. Key terms include convexity: a function f is convex if f(\lambda x + (1-\lambda) y) \leq \lambda f(x) + (1-\lambda) f(y) for \lambda \in [0,1], ensuring any local minimum is global in convex problems; concave functions satisfy the reverse inequality and are relevant for maximization objectives. For constrained convex problems, Slater's condition provides a constraint qualification ensuring strong duality: there exists a strictly feasible point \tilde{x} such that g_i(\tilde{x}) < 0 for all i with convex g_i, and the equality constraints h_j(\tilde{x}) = 0 hold.

Local and Global Optima

In mathematical optimization, a global minimum of an objective function f: \Omega \to \mathbb{R} over a feasible set \Omega is a point x^* \in \Omega such that f(x^*) \leq f(x) for all x \in \Omega. Similarly, a global maximum satisfies f(x^*) \geq f(x) for all x \in \Omega. These represent the overall best and worst values of f across the entire domain. In contrast, a local minimum is a point x^* \in \Omega for which there exists a neighborhood N around x^* such that f(x^*) \leq f(x) for all x \in N \cap \Omega. Local maxima are defined analogously. Local optima may not coincide with global ones, particularly in nonconvex problems where multiple basins of attraction exist.

Optima are further classified as strict or non-strict. A strict local minimum satisfies f(x^*) < f(x) for all x \in N \cap \Omega \setminus \{x^*\}, implying uniqueness within the neighborhood. Non-strict optima allow equality with nearby points, such as along a flat region. Saddle points are critical points that are neither local minima nor maxima, where the function increases in some directions and decreases in others.

Critical points form the foundation for identifying potential optima. In unconstrained optimization over an open set, a critical point x^* satisfies \nabla f(x^*) = 0, where \nabla f is the gradient. For constrained problems, critical points are those satisfying the Karush-Kuhn-Tucker (KKT) conditions, which generalize stationarity to include Lagrange multipliers for equality and inequality constraints. These conditions, introduced by Karush and independently by Kuhn and Tucker, require primal feasibility, dual feasibility, complementary slackness, and stationarity of the Lagrangian.

Consider the quadratic function f(x) = x^2 over \mathbb{R}. Here, x^* = 0 is both a strict local and global minimum, with f(0) = 0 and f(x) > 0 for x \neq 0. For a multimodal example, f(x) = \sin(x) + \frac{x^2}{10} over \mathbb{R} has two local minima where \cos(x) + \frac{x}{5} = 0, near x \approx -1.31 and x \approx 3.84; the global minimum occurs at the leftmost of these, approximately x \approx -1.31, because the quadratic term penalizes larger values of |x|.

Optima may be attainable or unattainable. An attainable optimum is achieved at some x^* \in \Omega, so the minimum value equals the infimum \inf_{x \in \Omega} f(x). Unattainable optima occur when the infimum is not achieved, such as for f(x) = e^x over \mathbb{R}, where \inf f(x) = 0 but no x realizes it. This distinction arises in open or unbounded feasible sets, impacting convergence and problem well-posedness.
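
The multimodal example can be checked numerically. The sketch below (assuming NumPy and SciPy) runs a local minimizer from two starting points; each run typically converges to the local minimum of its own basin, so the global minimum must be identified by comparing the resulting values.

```python
# Local searches on f(x) = sin(x) + x^2/10 from two different starts.
import numpy as np
from scipy.optimize import minimize

f = lambda v: np.sin(v[0]) + v[0] ** 2 / 10.0

left = minimize(f, x0=[-1.0])   # typically converges near x ≈ -1.31
right = minimize(f, x0=[4.0])   # typically converges near x ≈ 3.84

print(left.x, left.fun)    # ≈ [-1.306], f ≈ -0.796  (global minimum)
print(right.x, right.fun)  # ≈ [3.837],  f ≈  0.832  (local, non-global)
```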

Historical Development

Early History

The origins of mathematical optimization trace back to ancient civilizations, where geometric and practical problems implicitly involved finding extrema or efficient allocations. In ancient Greece, Euclid's Elements (circa 300 BCE) addressed early forms of optimization through geometric constructions that maximized or minimized quantities. For instance, Book III, Proposition 15, demonstrates that among straight lines in a circle, the diameter is the longest, and chords nearer the center are greater than those farther away, establishing a foundational result on maximizing line segments within a bounded region. Similarly, in ancient China, The Nine Chapters on the Mathematical Art (circa 200 BCE) included problems on resource allocation, such as fair division of land and labor in Chapter 6 ("Fair Taxes") and solving systems of equations in Chapter 7 ("Excess and Deficit") to optimize distributions under constraints.

In the 17th century, optimization ideas emerged in the context of physics and the nascent calculus. Pierre de Fermat's principle of least time, proposed in 1662, stated that light travels between two points along the path that minimizes travel time, unifying laws of reflection and refraction under a variational principle and foreshadowing modern optimization principles. Concurrently, Isaac Newton developed his method for root-finding in the 1670s, an iterative technique to approximate solutions to equations, which later found applications in optimization by solving for critical points where derivatives vanish.

The 18th century marked a pivotal advancement with the formalization of the calculus of variations by Leonhard Euler and Joseph-Louis Lagrange. Euler's 1744 treatise Methodus inveniendi lineas curvas maximi minimive proprietate gaudentes introduced systematic methods to find curves extremizing functionals, including solutions to isoperimetric problems that maximize area for a fixed perimeter. Lagrange extended this in his 1788 Mécanique Analytique, developing the Euler-Lagrange equations for variational problems and applying them to mechanics, thereby laying the groundwork for constrained optimization.

By the early 19th century, precursors to linear programming appeared in Joseph Fourier's 1823 work on solving systems of linear inequalities, providing an elimination method to determine feasible regions and optimize linear objectives over polyhedra. Additionally, the method of least squares, independently developed by Adrien-Marie Legendre in 1805 and Carl Friedrich Gauss (who claimed prior invention around 1795), minimized the sum of squared residuals for data fitting in astronomy, establishing a cornerstone of regression analysis and parameter estimation in optimization.

Modern Foundations

The modern foundations of mathematical optimization were laid in the 1930s and 1940s through pioneering work on linear programming for resource allocation problems. Leonid Kantorovich developed the foundational ideas of linear programming in his 1939 monograph, formulating methods to optimize production under linear constraints, which anticipated key concepts like shadow prices. Independently, Tjalling Koopmans advanced similar ideas around 1942 in a memorandum on activity analysis, later elaborated in his 1951 edited volume, applying linear models to economic allocation during wartime logistics efforts. These contributions established linear programming as a tool for efficient resource use, earning Kantorovich and Koopmans the 1975 Nobel Memorial Prize in Economic Sciences. George Dantzig then introduced the simplex method in 1947, providing an efficient algorithmic framework for solving linear programs by traversing vertices of the feasible polytope, which became the cornerstone of practical implementation.

Following World War II, the field experienced a surge through the operations research (OR) movement, driven by military applications in logistics and planning. This era saw the integration of optimization with emerging computational tools, fostering interdisciplinary growth. John von Neumann's 1928 minimax theorem, which guarantees the existence of optimal mixed strategies in zero-sum games, found direct applications in the postwar years to economic and strategic decision-making, notably through his 1944 collaboration with Oskar Morgenstern on Theory of Games and Economic Behavior, linking adversarial optimization to broader economic models.

The 1950s marked the emergence of nonlinear programming, extending optimization to nonlinear objectives and constraints. The Karush-Kuhn-Tucker (KKT) conditions, first derived by William Karush in his 1939 master's thesis and independently published by Harold Kuhn and Albert Tucker in 1951, provide necessary optimality criteria for constrained nonlinear problems under constraint qualifications, generalizing Lagrange multipliers to inequalities. This period also saw Richard Bellman's introduction of dynamic programming in 1957, a recursive approach for solving multistage decision problems by breaking them into subproblems, revolutionizing sequential optimization in control theory and operations research. A pivotal event was the inaugural International Symposium on Mathematical Programming in 1949, which unified researchers and solidified the field as a distinct discipline. Further advancements included Anthony Fiacco and Garth McCormick's 1968 development of sequential unconstrained minimization techniques (SUMT), which transform constrained nonlinear programs into sequences of unconstrained problems via penalty functions, laying groundwork for modern interior-point and penalty methods.

Contemporary Advances

In the 1980s and 1990s, interior-point methods marked a pivotal advancement in linear programming, with Narendra Karmarkar's 1984 algorithm introducing a polynomial-time approach that efficiently navigated the interior of the feasible region, surpassing the limitations of the simplex method for large-scale problems. This innovation spurred widespread adoption in operations research and engineering, enabling solutions to industrial-scale linear programs with improved scalability and theoretical guarantees. By the 1990s, these methods were extended to convex quadratic and semidefinite programming, facilitating applications in control, finance, and engineering design.

The 2010s witnessed the surge of first-order techniques in machine learning, driven by the need to handle massive datasets in deep learning. Stochastic gradient descent variants, such as the Adam optimizer introduced in 2014, combined adaptive learning rates with momentum to accelerate convergence in nonconvex landscapes, becoming a standard for training neural networks on tasks like image recognition and natural language processing. Concurrently, automatic differentiation emerged as a cornerstone for solving optimization challenges, enabling efficient computation of gradients through complex computational graphs via frameworks like Theano (2010) and TensorFlow (2015), which automated reverse-mode differentiation for scalable training. These tools addressed the computational bottlenecks in training deep architectures, achieving state-of-the-art performance on benchmarks such as ImageNet with reduced engineering effort.

Integration with artificial intelligence further propelled optimization in the late 2010s, exemplified by neural architecture search (NAS) methods starting around 2016, which automated the design of network topologies using reinforcement learning or evolutionary strategies to optimize performance metrics like accuracy and efficiency. NAS frameworks, such as those employing recurrent networks to explore architectural spaces, yielded models outperforming hand-crafted designs on datasets like CIFAR-10, with applications spanning computer vision and beyond.

Robust optimization, initially formalized by Dimitris Bertsimas and colleagues in the early 2000s for handling parameter uncertainty, matured in the 2020s through data-driven enhancements that incorporated machine learning for constructing uncertainty sets, improving decision-making under ambiguity in fields like supply chains and energy systems. These advancements, including two-stage sample robust formulations, provided tractable approximations with probabilistic guarantees, demonstrating superior out-of-sample performance compared to classical stochastic methods in high-dimensional settings.

In the 2020s, quantum optimization gained traction, with the quantum approximate optimization algorithm (QAOA), proposed in 2014, seeing practical implementations on noisy intermediate-scale quantum devices for combinatorial problems like MaxCut, often extending principles from Grover's search algorithm to achieve speedups in unstructured optimization. Recent extensions have demonstrated speedups over classical methods on small-scale problems with modest qubit counts, such as in robotic manipulator tasks, leveraging hybrid quantum-classical solvers. Parallel to this, distributed optimization techniques addressed scalability challenges, with algorithms like doubly accelerated methods enabling parallel local computations across clusters to minimize communication overhead, achieving convergence rates competitive with centralized approaches on terabyte-scale datasets in distributed machine learning.

Contemporary applications increasingly emphasize sustainability, particularly in climate modeling, where optimization integrates with machine learning to calibrate parameters in Earth system simulations, such as deriving data-driven closure models for atmospheric dynamics that enhance predictive accuracy for climate scenarios. These efforts, including multi-objective formulations for emission reduction pathways, have supported policy decisions by optimizing trade-offs in renewable energy deployment and carbon capture under uncertain projections. Heuristic methods have evolved alongside these developments, with metaheuristic hybrids providing robust approximations for nonconvex problems where exact solutions remain intractable.

Theoretical Foundations

Feasibility and Existence

In mathematical optimization, the feasible set, denoted as \Omega, consists of all points x in the domain that satisfy the problem's constraints, such as equalities h_i(x) = 0 and inequalities g_j(x) \leq 0. If \Omega is empty, the problem is infeasible, meaning no solution exists that meets all constraints simultaneously. For instance, consider the linear program \min x subject to x \geq 1 and x \leq 0; the constraints contradict each other, resulting in an empty feasible set.

Constraint qualifications, such as the Mangasarian-Fromovitz constraint qualification (MFCQ), ensure that the feasible set possesses desirable geometric properties, like a non-empty interior relative to the active constraints at certain points, which is crucial for the well-posedness of the problem. Introduced by Mangasarian and Fromovitz, MFCQ requires that the gradients of the equality constraints are linearly independent and that there exists a direction d with \nabla h_i(x)^T d = 0 for all equality constraints and \nabla g_j(x)^T d < 0 for all active inequality constraints. This qualification helps in analyzing the structure of \Omega without singularities that could complicate feasibility assessments.

Even when \Omega is non-empty, an optimization problem may lack a solution if the objective function f does not attain its infimum over \Omega. The Weierstrass extreme value theorem guarantees existence under suitable conditions: if f: \mathbb{R}^n \to \mathbb{R} is continuous and \Omega is compact (closed and bounded), then f attains both its minimum and maximum on \Omega, so that \min_{x \in \Omega} f(x) = f(x^*) \quad \text{for some } x^* \in \Omega, where the infimum is achieved. For unbounded feasible sets, such as \Omega = \mathbb{R}^n, compactness fails, but coercivity of f—defined as \lim_{\|x\| \to \infty} f(x) = +\infty—ensures a global minimizer exists, as minimizing sequences remain bounded and thus have convergent subsequences yielding the minimum.

Problems can also be unbounded below, where \inf_{x \in \Omega} f(x) = -\infty, precluding a minimum despite a non-empty \Omega. For example, the linear program \min -x subject to x \geq 0 has feasible points but no minimum, as f(x) \to -\infty along the ray x \to +\infty. In such cases, the infimum is not attained, highlighting the distinction between the infimum value and an actual minimizer in the feasible set.
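
Both pathologies above can be detected by standard LP solvers. The following hedged sketch (assuming SciPy) feeds the two example programs to scipy.optimize.linprog and reads off its status codes.

```python
# Detecting infeasibility and unboundedness with SciPy's linprog.
from scipy.optimize import linprog

# Infeasible: min x subject to x >= 1 and x <= 0.
# In linprog's A_ub @ x <= b_ub form: -x <= -1 and x <= 0.
infeasible = linprog(c=[1.0], A_ub=[[-1.0], [1.0]], b_ub=[-1.0, 0.0],
                     bounds=[(None, None)])
print(infeasible.status)  # 2: problem is infeasible (empty feasible set)

# Unbounded below: min -x subject to x >= 0.
unbounded = linprog(c=[-1.0], bounds=[(0.0, None)])
print(unbounded.status)   # 3: the objective decreases without bound
```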

Optimality Conditions

Optimality conditions provide necessary and sufficient criteria for identifying local and global optima in mathematical optimization problems, serving as foundational tools for theoretical analysis and algorithmic verification. For unconstrained optimization problems, where the objective is to minimize a differentiable function f: \mathbb{R}^n \to \mathbb{R}, the first-order necessary condition for a point x^* to be a local minimum is that the gradient vanishes at x^*, i.e., \nabla f(x^*) = 0. This arises from the requirement that the directional derivative in any feasible direction must be non-negative at a local optimum. Under twice differentiability assumptions, the second-order necessary condition further requires that the Hessian matrix H(x^*) = \nabla^2 f(x^*) is positive semi-definite, meaning d^T H(x^*) d \geq 0 for all directions d \in \mathbb{R}^n. Conversely, if \nabla f(x^*) = 0 and H(x^*) is positive definite (d^T H(x^*) d > 0 for all d \neq 0), then x^* is a strict local minimum, providing a second-order sufficient condition.

In the presence of equality constraints, consider the problem of minimizing f(x) subject to g_i(x) = 0 for i = 1, \dots, m, where g_i: \mathbb{R}^n \to \mathbb{R} are differentiable. The method of Lagrange multipliers introduces the Lagrangian \mathcal{L}(x, \lambda) = f(x) + \sum_{i=1}^m \lambda_i g_i(x), and the first-order necessary conditions require the existence of multipliers \lambda \in \mathbb{R}^m such that \nabla_x \mathcal{L}(x^*, \lambda) = 0 and \nabla_\lambda \mathcal{L}(x^*, \lambda) = 0, or equivalently, \nabla f(x^*) + \sum_{i=1}^m \lambda_i \nabla g_i(x^*) = 0 and g_i(x^*) = 0 for all i. These conditions ensure that at the optimum, the objective gradient lies in the span of the constraint gradients, aligning the stationarity requirement with the constraint manifold. Second-order conditions extend this by requiring the Hessian of the Lagrangian, projected onto the tangent space of the constraints, to be positive semi-definite for necessity and positive definite for sufficiency.

For problems involving inequality constraints, minimize f(x) subject to g_i(x) \leq 0 for i = 1, \dots, m and h_j(x) = 0 for j = 1, \dots, p, the Karush-Kuhn-Tucker (KKT) conditions generalize the Lagrange framework to handle both equalities and inequalities under suitable constraint qualifications, such as Slater's condition. The KKT conditions consist of stationarity, primal feasibility, dual feasibility, and complementary slackness, stated as follows for a local minimum x^* with associated multipliers \lambda \in \mathbb{R}^m_{\geq 0} and \mu \in \mathbb{R}^p: \begin{align*} \nabla f(x^*) + \sum_{i=1}^m \lambda_i \nabla g_i(x^*) + \sum_{j=1}^p \mu_j \nabla h_j(x^*) &= 0, \\ g_i(x^*) &\leq 0, \quad i=1,\dots,m, \\ h_j(x^*) &= 0, \quad j=1,\dots,p, \\ \lambda_i &\geq 0, \quad i=1,\dots,m, \\ \lambda_i g_i(x^*) &= 0, \quad i=1,\dots,m. \end{align*} Here, the multipliers \lambda_i for inequalities are non-negative, reflecting the directional nature of the inequality constraints, and complementary slackness enforces that \lambda_i > 0 only for active inequalities (g_i(x^*) = 0). These conditions are necessary for optimality under constraint qualifications and sufficient for global optimality in convex problems.

In convex optimization, where the objective f and feasible set are convex, any point satisfying the first-order (KKT) conditions is a global optimum, as local minima coincide with global ones due to the absence of local traps in the convex landscape. This property, established through the convexity of sublevel sets and epigraphs, ensures that verification via optimality conditions yields the overall solution without exhaustive search.
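
As a minimal numeric illustration (assuming NumPy), the sketch below checks the stationarity and feasibility conditions at the known optimum of \min x_1^2 + x_2^2 subject to h(x) = x_1 + x_2 - 1 = 0, where x^* = (1/2, 1/2) with multiplier \mu = -1.

```python
# Verifying Lagrange/KKT conditions at a candidate optimum.
import numpy as np

x_star = np.array([0.5, 0.5])     # candidate optimum
mu = -1.0                         # equality multiplier solving stationarity

grad_f = 2 * x_star               # ∇f(x) = 2x for f(x) = x1^2 + x2^2
grad_h = np.array([1.0, 1.0])     # ∇h(x) = (1, 1)

stationarity = grad_f + mu * grad_h
print(np.allclose(stationarity, 0))     # True: ∇f + μ∇h = 0
print(np.isclose(x_star.sum() - 1, 0))  # True: primal feasibility h(x*) = 0
```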

Duality and Sensitivity

In mathematical optimization, duality provides a framework for reformulating a primal optimization problem into a dual problem, offering insights into bounds, optimality, and economic interpretations. The Lagrangian dual is constructed for a primal problem of minimizing an objective function subject to inequality and equality constraints. Consider the primal problem: \begin{align*} \min_{x} \quad & f(x) \\ \text{subject to} \quad & g_i(x) \leq 0, \quad i = 1, \dots, m, \\ & h_j(x) = 0, \quad j = 1, \dots, p, \end{align*} where f: \mathbb{R}^n \to \mathbb{R} is the objective, g_i: \mathbb{R}^n \to \mathbb{R} are inequality constraint functions, and h_j: \mathbb{R}^n \to \mathbb{R} are equality constraint functions. The Lagrangian is defined as L(x, \lambda, \mu) = f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{j=1}^p \mu_j h_j(x), with \lambda \geq 0. The dual function is g(\lambda, \mu) = \inf_x L(x, \lambda, \mu), and the dual problem is \max_{\lambda, \mu} g(\lambda, \mu) subject to \lambda \geq 0.

Weak duality states that the primal optimal value is at least as large as the dual optimal value, i.e., f^* \geq g^*, for any feasible primal and dual points. This holds generally, providing a lower bound on the optimum via dual solutions. Strong duality, where f^* = g^* and the duality gap is zero, requires additional conditions; for convex problems satisfying Slater's condition—a strictly feasible point exists for the inequalities—the gap vanishes, enabling exact recovery of the optimum from the dual.

Sensitivity analysis examines how the optimal value and solutions change with perturbations in problem data, such as objective coefficients or right-hand sides of constraints. Shadow prices, given by the optimal dual variables, quantify these changes; for a constraint Ax \geq b, the shadow price y^* satisfies \frac{\partial f^*}{\partial b_k} = y_k^* under strong duality, indicating the marginal improvement in the objective per unit increase in b_k. This follows from the envelope theorem, which states that the derivative of the optimal value function with respect to a parameter equals the partial derivative of the Lagrangian with respect to that parameter, evaluated at the optimum, ignoring indirect effects through the optimizer.

A canonical example is linear programming duality. The primal is \min c^\top x subject to Ax \geq b, x \geq 0, with optimal value f^*. The dual is \max b^\top y subject to A^\top y \leq c, y \geq 0, with optimal value g^*. Strong duality holds (f^* = g^*) without further conditions due to the problem's linear structure, and the dual variables y^* serve as shadow prices for resource constraints in b. For instance, in resource allocation, y_k^* represents the marginal value of an additional unit of resource k.

Under suitable regularity conditions, such as differentiability of the objective and constraints, the optimal value function is continuous with respect to perturbations, and the set of optimal solutions exhibits stability. For convex problems with strong duality, the optimal solution set is nonempty and compact for nearby perturbations, ensuring well-posedness. These properties are central to perturbation analysis, linking duality to robustness in applications like economics and engineering.
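
The LP duality relationship can be verified directly. The sketch below (assuming SciPy; the small data matrices are illustrative) solves a primal and its dual and confirms the zero duality gap and the shadow-price interpretation of the dual variables.

```python
# Verifying strong duality for a small LP.
# Primal: min c^T x  s.t.  A x >= b, x >= 0.
# Dual:   max b^T y  s.t.  A^T y <= c, y >= 0.
import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 3.0])
A = np.array([[1.0, 1.0], [1.0, 2.0]])
b = np.array([3.0, 4.0])

# Primal in linprog's A_ub @ x <= b_ub form (flip signs of A x >= b).
primal = linprog(c, A_ub=-A, b_ub=-b)
# Dual: maximize b^T y -> minimize -b^T y.
dual = linprog(-b, A_ub=A.T, b_ub=c)

print(primal.fun)  # f* = 7.0
print(-dual.fun)   # g* = 7.0: zero duality gap, as LP strong duality predicts
print(dual.x)      # y* = [1, 1]: shadow prices of the two constraints
```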

Major Subfields

Linear and Convex Optimization

Linear programming is a fundamental subfield of mathematical optimization that involves minimizing or maximizing a linear objective function subject to linear equality and inequality constraints. The standard form of a linear program is given by \begin{align*} \min &\quad \mathbf{c}^\top \mathbf{x} \\ \text{s.t.} &\quad A\mathbf{x} = \mathbf{b}, \\ &\quad \mathbf{x} \geq \mathbf{0}, \end{align*} where \mathbf{c} \in \mathbb{R}^n, A \in \mathbb{R}^{m \times n}, \mathbf{b} \in \mathbb{R}^m, and \mathbf{x} \in \mathbb{R}^n. This formulation assumes non-negativity constraints, with inequalities convertible to equalities via slack variables. The simplex method, developed by George Dantzig in 1947, solves linear programs by traversing vertices of the polytope defined by the constraints, starting from an initial basic feasible solution and pivoting to improve the objective until optimality. Interior-point methods, which navigate the interior of the feasible region using barrier functions to approach the optimum, emerged as alternatives offering better practical performance for large-scale problems. A landmark advancement was Narendra Karmarkar's 1984 algorithm, a polynomial-time method for linear programming that, unlike the earlier ellipsoid method, was also efficient in practice, running in O(n^{3.5} L) time where n is the number of variables and L the bit length of the input, sparking widespread development of efficient solvers.

Convex optimization generalizes linear programming to problems where the objective and constraints are convex, ensuring that local optima are global and enabling efficient solvability. A set C \subseteq \mathbb{R}^n is convex if for any \mathbf{x}, \mathbf{y} \in C and \theta \in [0,1], the point \theta \mathbf{x} + (1-\theta) \mathbf{y} \in C. A function f: \mathbb{R}^n \to \mathbb{R} is convex if its epigraph \{(\mathbf{x}, t) \mid f(\mathbf{x}) \leq t\} is a convex set, or equivalently, if f(\theta \mathbf{x} + (1-\theta) \mathbf{y}) \leq \theta f(\mathbf{x}) + (1-\theta) f(\mathbf{y}) for all \mathbf{x}, \mathbf{y} in the domain and \theta \in [0,1]. The epigraph formulation transforms a problem \min f_0(\mathbf{x}) subject to f_i(\mathbf{x}) \leq 0 into an equivalent form \min t subject to (\mathbf{x}, t) \in \text{epi } f_0 and affine constraints, facilitating analysis and solution via linear approximations. Jensen's inequality states that for a convex function f and weights \lambda_i \geq 0 summing to 1, f(\sum_i \lambda_i \mathbf{x}_i) \leq \sum_i \lambda_i f(\mathbf{x}_i), with equality if f is affine on the convex hull of the \mathbf{x}_i. This property underpins many convexity proofs and applications in optimization.

Key properties of convex optimization include solution uniqueness under strict convexity of the objective, where any local minimum is the unique global minimum, and separability, where the objective decomposes as f(\mathbf{x}) = \sum_k f_k(\mathbf{x}_k) across blocks \mathbf{x}_k, allowing decomposition into subproblems for parallel or distributed solving. Semidefinite programming extends convex optimization by optimizing linear functions over spectrahedra, the feasible sets defined by linear matrix inequalities F(\mathbf{x}) \succeq 0 where F(\mathbf{x}) is affine in \mathbf{x} and \succeq 0 denotes positive semidefiniteness, capturing problems like maximum cut approximations beyond linear constraints.

A classic example is the diet problem, formulated as minimizing the cost \sum_j c_j x_j where x_j is the amount of food j, subject to \sum_j a_{ij} x_j \geq b_i for nutrients i with requirements b_i, and x_j \geq 0; George Stigler posed this in 1945, and it is solvable via linear programming to find minimal-cost nutrient-adequate diets.
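
A tiny diet-problem instance can be solved directly; the sketch below (assuming SciPy) uses made-up costs and nutrient contents purely for illustration.

```python
# Illustrative diet problem: minimize food cost subject to minimum
# nutrient requirements (all numbers are hypothetical).
import numpy as np
from scipy.optimize import linprog

cost = np.array([0.5, 0.8, 0.3])          # $ per unit of foods 1..3
nutrients = np.array([[2.0, 1.0, 0.5],    # protein per unit of each food
                      [1.0, 3.0, 1.5]])   # vitamins per unit of each food
requirements = np.array([10.0, 12.0])     # minimum protein and vitamins

# linprog uses A_ub @ x <= b_ub, so flip the >= nutrient requirements.
res = linprog(cost, A_ub=-nutrients, b_ub=-requirements)
print(res.x, res.fun)  # minimal-cost food amounts and the total cost
```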

Nonlinear and Nonconvex Optimization

Nonlinear optimization encompasses problems where the objective function or constraints are nonlinear, often leading to complex landscapes with multiple local optima rather than the unique global solutions typical of linear or convex cases. These problems arise in diverse fields such as engineering design, chemical process modeling, and economic equilibrium analysis, where the nonlinearity captures realistic phenomena like saturation effects or interaction terms. Unlike convex optimization, which guarantees global optimality under mild conditions, nonlinear problems generally admit convergence only to local optima, necessitating careful algorithm selection to navigate saddle points and ensure practical convergence.

In unconstrained nonlinear optimization, the goal is to minimize a smooth function f: \mathbb{R}^n \to \mathbb{R} without restrictions on the variables. Newton's method is a foundational second-order approach that approximates the function quadratically using the gradient \nabla f(x) and Hessian H(x) = \nabla^2 f(x), yielding the update direction d_k = -H(x_k)^{-1} \nabla f(x_k). This step leverages curvature information for rapid quadratic convergence near a local minimum satisfying second-order optimality conditions, though it can fail far from solutions due to indefinite Hessians or the high computational cost of inverting large matrices. To mitigate these issues, trust-region methods restrict the step to a neighborhood where the model remains trustworthy, solving a subproblem \min_d m_k(d) = f(x_k) + \nabla f(x_k)^T d + \frac{1}{2} d^T H(x_k) d subject to \|d\| \leq \Delta_k, with trust radius \Delta_k adjusted based on the ratio of actual to predicted decrease. These methods ensure convergence for sufficiently small steps and handle nonconvexity better than pure Newton iterations by incorporating line searches or regularization.

For problems with nonlinear constraints, such as \min f(x) subject to c(x) = 0 (equality-constrained), sequential quadratic programming (SQP) iteratively solves quadratic approximations of the Lagrangian, forming subproblems \min_d \nabla f(x_k)^T d + \frac{1}{2} d^T H_k d subject to \nabla c(x_k)^T d + c(x_k) = 0, where H_k approximates the Hessian of the Lagrangian. This approach exploits the structure of quadratic programs for efficiency and achieves superlinear convergence under constraint qualifications, making it suitable for medium-scale nonlinear programs in aerospace trajectory optimization. Inequality-constrained nonlinear optimization, \min f(x) subject to g(x) \leq 0, often employs barrier methods, which transform the problem into a sequence of unconstrained ones by adding a logarithmic barrier term -\mu \sum \log(-g_i(x)) to the objective, with barrier parameter \mu > 0 decreasing toward zero. This interior-point technique keeps iterates strictly feasible, converging to the optimum while avoiding boundary violations, though it requires careful handling of ill-conditioning near degeneracy.

Nonconvex nonlinear optimization introduces significant challenges due to the proliferation of local minima, saddle points, and flat regions, where algorithms may converge to suboptimal solutions depending on initialization. For instance, verifying global optimality is often NP-hard, as seen in the traveling salesman problem (TSP), a classic task requiring the shortest tour visiting each city once, whose continuous relaxations exhibit multiple local minima that trap gradient-based methods. These issues underscore the need for hybrid strategies combining local search with randomization, though deterministic guarantees remain elusive without additional problem structure.
A representative benchmark for testing nonlinear optimization algorithms is the Rosenbrock function, defined as f(x_1, x_2) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2, which features a narrow, curved valley leading to the global minimum at (1, 1). This nonconvex test case challenges methods with slow convergence along the valley and sensitivity to step sizes, highlighting the trade-offs between unconstrained solvers like Newton's method and more robust trust-region variants.
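
The benchmark is directly runnable; the sketch below (assuming SciPy, whose optimize module ships the Rosenbrock function and its derivatives) compares a quasi-Newton method with a trust-region Newton method from the standard hard starting point.

```python
# Minimizing the Rosenbrock function two ways.
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess

x0 = np.array([-1.2, 1.0])  # classic start inside the curved valley

bfgs = minimize(rosen, x0, jac=rosen_der, method="BFGS")
trust = minimize(rosen, x0, jac=rosen_der, hess=rosen_hess,
                 method="trust-exact")

print(bfgs.x)   # ≈ [1, 1], the global minimum
print(trust.x)  # ≈ [1, 1], typically in fewer iterations using curvature
```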

Stochastic and Robust Optimization

Stochastic optimization addresses decision-making problems where the objective function or constraints involve random variables, typically formulated as minimizing the expected value of a cost function over uncertain parameters. In this framework, the goal is to solve problems of the form \min_x \mathbb{E}_{\xi}[f(x, \xi)], where x is the decision variable, \xi represents the random uncertainty, and the expectation is taken with respect to the distribution of \xi. This approach originated in the mid-20th century with early works on linear programming under uncertainty, such as Dantzig's 1955 paper introducing recourse actions in linear programs. Stochastic programming has since become essential in fields like finance and supply chain management, where exact distributions may be known or assumed, allowing for risk-neutral decisions based on averages over possible outcomes.

A key computational technique in stochastic optimization is the sample average approximation (SAA) method, which replaces the intractable expectation with an empirical mean over a finite set of scenarios drawn from the distribution of \xi. This yields an approximate problem \min_x \frac{1}{N} \sum_{i=1}^N f(x, \xi_i), where N is the number of samples, transforming the stochastic problem into a deterministic one that can be solved using standard optimization solvers. Under suitable conditions, such as convexity and finite moments, SAA solutions converge to the true optimum as N increases, with error bounds established through large deviation principles. This method is particularly effective for two-stage stochastic programs, where first-stage decisions are made before uncertainty is revealed, followed by second-stage recourse actions.

In contrast, robust optimization focuses on worst-case performance over an uncertainty set \mathcal{U}, formulated as \min_x \max_{\xi \in \mathcal{U}} f(x, \xi), ensuring solutions remain feasible and near-optimal even under adversarial realizations of the uncertainty. This paradigm, developed prominently in the late 1990s, provides tractable reformulations for common uncertainty sets like ellipsoids or polyhedra, converting the min-max problem into a convex program solvable in polynomial time. Robust methods are conservative by design, prioritizing protection against the most detrimental outcomes rather than probabilistic averages, and have been extended to conic and semidefinite programs for broader applicability.

Common techniques in these areas include scenario-based approaches, which generate a finite set of plausible uncertainty realizations to approximate either expectations or worst cases, and chance-constrained formulations that enforce probabilistic guarantees on constraints. Scenario methods, as in the scenario approach, draw independent samples to construct convex programs with explicit violation probability bounds, scaling well for high-dimensional problems. Chance constraints, such as P(g(x, \xi) \leq 0) \geq 1 - \alpha, limit the risk of constraint violation to at most \alpha, with tractable approximations available for joint or individual probabilities under log-concave distributions. These techniques bridge stochastic and robust paradigms, often combining sampling with uncertainty sets for practical implementation.

A classic application is portfolio optimization under uncertain asset returns, extending Markowitz's 1952 mean-variance model to handle estimation errors in expected returns and covariances. In stochastic versions, portfolios minimize expected risk over return distributions, while robust extensions consider worst-case returns within ellipsoidal uncertainty sets around nominal estimates, yielding more stable allocations with controlled downside exposure.
Such formulations have demonstrated improved out-of-sample performance compared to classical models, particularly in volatile markets. In the 2010s, distributionally robust optimization (DRO) emerged as a unifying framework, optimizing over ambiguity sets of possible distributions rather than a single nominal one, such as moment-based sets containing all distributions with bounded mean and variance. This addresses distributional ambiguity in data-driven settings, with tractable reformulations via duality yielding conservative yet computationally efficient solutions. Seminal work showed that for linear problems, these lead to tightened robust counterparts with explicit confidence levels. DRO has gained traction in machine learning and operations research, offering a balance between stochastic flexibility and robust protection.
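
The SAA idea is easy to demonstrate on a one-dimensional newsvendor-style problem. The sketch below (assuming NumPy; the cost, price, and demand distribution are made up) replaces the expectation with an average over sampled scenarios and optimizes the resulting deterministic surrogate by a crude grid search.

```python
# Sample average approximation for min_x E[f(x, xi)], with the
# newsvendor-style cost f(x, xi) = c*x - p*min(x, xi).
import numpy as np

rng = np.random.default_rng(0)
c, p = 1.0, 3.0                                            # illustrative unit cost/price
demand = rng.lognormal(mean=3.0, sigma=0.5, size=10_000)   # scenarios xi_i

def saa_objective(x):
    # Empirical mean of the per-scenario cost: the SAA surrogate.
    return c * x - p * np.minimum(x, demand).mean()

grid = np.linspace(0, 100, 1001)
values = [saa_objective(x) for x in grid]
x_star = grid[int(np.argmin(values))]
print(x_star)  # ≈ the (p - c)/p = 2/3 quantile of demand, as theory predicts
```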

Multi-objective and Global Optimization

Multi-objective optimization addresses problems where multiple conflicting objectives must be optimized simultaneously, such as minimizing cost while maximizing performance in engineering designs. Unlike single-objective problems, solutions are rarely optimal for all objectives at once; instead, the goal is to identify trade-offs among them. A key concept is Pareto optimality, where a feasible solution x is Pareto optimal if no other feasible solution x' exists such that f_i(x') \leq f_i(x) for all objectives i = 1, \dots, k and f_j(x') < f_j(x) for at least one j. This definition ensures that improving one objective cannot occur without degrading another, forming the basis for non-dominated solutions in vector optimization. The set of all Pareto-optimal solutions in the objective space traces the efficient frontier, a curve or surface representing the best possible trade-offs, often visualized for two objectives to highlight compromises like risk versus return in portfolio selection.

To generate points on this frontier, scalarization techniques convert the multi-objective problem into a single-objective one, as sketched in the example after this section. A prominent method is the weighted sum scalarization, which minimizes \sum_{i=1}^k w_i f_i(x) subject to constraints, where weights w_i > 0 reflect objective priorities and sum to 1; this yields properly efficient solutions under convexity assumptions. In engineering design, this approach optimizes trade-offs such as minimizing production cost while maximizing structural performance in component design, where Pareto solutions balance material expenses against load-bearing capacity.

Global optimization seeks the absolute best solution in problems with multimodal landscapes, particularly nonconvex ones where local minima abound, extending beyond local nonlinear solvers by guaranteeing convergence to the global optimum. Branch-and-bound methods systematically partition the feasible region and use lower bounds to prune suboptimal branches, ensuring finite termination for twice-differentiable nonconvex problems. The αBB algorithm exemplifies this by constructing convex underestimators via diagonal perturbation of the Hessian, enabling rigorous global minimization. Spatial partitioning complements this by dividing the search space into subregions, such as via recursive bisection, to refine bounds and focus computation on promising areas in low-dimensional bound-constrained problems.

From the 1990s onward, evolutionary algorithms have advanced global search in multi-objective settings by maintaining diverse populations to approximate the entire Pareto front without gradient reliance. The Non-dominated Sorting Genetic Algorithm (NSGA), introduced in the mid-1990s, ranks solutions by dominance levels, and its successor NSGA-II adds a crowding-distance measure for diversity, effectively handling nonconvex Pareto fronts in applications like aerodynamic design. These methods provide robust approximations in complex, multimodal landscapes where exact global techniques are computationally intensive.
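
The following minimal sketch (assuming NumPy and SciPy) traces a Pareto front by weighted-sum scalarization for two simple conflicting objectives, f_1(x) = x^2 and f_2(x) = (x - 2)^2, chosen here purely for illustration.

```python
# Weighted-sum scalarization: each weight w yields one Pareto-optimal point.
import numpy as np
from scipy.optimize import minimize_scalar

def scalarized(w):
    # Minimize w*f1 + (1-w)*f2; for convex objectives every w in (0, 1)
    # produces a Pareto-optimal solution.
    res = minimize_scalar(lambda x: w * x**2 + (1 - w) * (x - 2) ** 2)
    return res.x

for w in np.linspace(0.1, 0.9, 5):
    x = scalarized(w)
    print(f"w={w:.1f}  x={x:.3f}  f1={x**2:.3f}  f2={(x - 2)**2:.3f}")
# As w sweeps from 0 to 1, x moves from 2 toward 0, tracing the trade-off curve.
```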

Optimization Algorithms

Exact Methods

Exact methods in mathematical optimization are algorithms designed to find globally optimal solutions with finite termination guarantees, typically for problems with linear, convex, or structured discrete objectives and constraints. These methods are particularly effective for linear programming (LP) and certain integer programming (IP) instances where the problem size permits exhaustive or polynomial-time exploration. Unlike approximation techniques, exact methods ensure optimality by systematically evaluating feasible regions or subproblems, often leveraging duality or relaxation principles to prune search spaces. They form the backbone of solvers for operations research and logistics applications, where precision is paramount over computational speed for moderate-scale instances.

The simplex method, introduced by George B. Dantzig in 1947, addresses linear programs by iteratively pivoting from one basic feasible solution to an adjacent one along the edges of the feasible polytope, improving the objective value until optimality is reached. This tableau-based approach represents the problem in canonical form, selecting entering and leaving variables based on reduced costs and ratio tests to maintain feasibility. Despite its empirical efficiency on sparse problems, the method's worst-case running time is exponential, as pathological instances force it to visit a vast number of vertices; degenerate pivoting can even cycle, though anti-cycling rules like Bland's rule prevent this in practice.

Interior-point methods, pioneered by Narendra Karmarkar in 1984, solve linear and convex programs by traversing paths through the interior of the feasible region rather than its boundaries. These algorithms employ barrier functions, such as the logarithmic barrier for inequality constraints, to transform the constrained problem into a sequence of unconstrained minimizations solved via Newton's method. Primal-dual variants simultaneously optimize primal and dual objectives, following central paths that approximate the analytic center and converge to the optimum. Karmarkar's projective algorithm achieves polynomial-time complexity, specifically O(n^{3.5} L) arithmetic operations for an n-variable problem with L-bit input, marking a theoretical breakthrough over the simplex method's exponential worst case.

Branch-and-bound, formalized by Ailsa Land and Alison Doig in 1960, extends exact solving to integer programming by partitioning the feasible space into subproblems via branching on fractional variables from relaxations. Each node in the search tree is bounded using continuous relaxations or duals; branches whose bounds cannot improve on the current best solution are fathomed to prune the tree. This enumerative strategy guarantees optimality by exhaustively exploring only promising regions, with enhancements like cutting planes strengthening bounds. For mixed-integer linear programs, the method's worst-case complexity is exponential, reflecting the NP-hard nature of integer programming, though average performance benefits from tight relaxations in structured cases.

Dynamic programming, developed by Richard Bellman in the 1950s, provides an exact recursive framework for sequential decision problems, decomposing them into overlapping subproblems solved via backward or forward induction. The core Bellman recursion defines the value function for stage k as V_k(x) = \min_a \left[ c(x,a) + V_{k-1}(f(x,a)) \right], where x is the state, a the action, c(x,a) the stage cost, and f(x,a) the transition function, with V_0(x) = 0 for the terminal stage. This principle of optimality ensures that optimal subpolicy solutions compose to a global optimum, applicable to finite-horizon problems like inventory control or shortest paths. Computationally, it requires storing value functions across states, leading to exponential space and time in the state dimension unless sparsity or decomposition is exploited.
In terms of computational complexity, exact methods for linear programming achieve polynomial time via interior-point approaches, contrasting with simplex's exponential worst case, while integer programming via branch-and-bound remains exponential in the worst case due to the combinatorial explosion of discrete choices. These guarantees underscore their suitability for verifiable optima in the linear and convex subfields, though scalability limits their use to problems with up to thousands of variables.
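
The Bellman recursion is concrete enough to sketch directly. The example below (pure Python; the layered graph and its costs are made up) computes optimal costs-to-go by memoized backward induction, an exact dynamic-programming solve of a tiny shortest-path problem.

```python
# Bellman recursion V(s) = min_a [ cost(s, a) + V(next(s, a)) ], V(T) = 0,
# on a small illustrative graph with terminal node "T".
from functools import lru_cache

graph = {
    "A": {"B": 2, "C": 5},
    "B": {"D": 4, "E": 1},
    "C": {"D": 1, "E": 3},
    "D": {"T": 3},
    "E": {"T": 2},
    "T": {},
}

@lru_cache(maxsize=None)
def cost_to_go(state):
    # Optimal remaining cost from `state`; memoization caches subproblems.
    if state == "T":
        return 0.0
    return min(c + cost_to_go(nxt) for nxt, c in graph[state].items())

print(cost_to_go("A"))  # 5.0, achieved by A -> B -> E -> T (2 + 1 + 2)
```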

Iterative and Gradient-Based Methods

Iterative and gradient-based methods form a cornerstone of continuous optimization, relying on derivatives to iteratively refine solutions to minimization problems. These algorithms generate a sequence of points approaching a local optimum by following the negative gradient direction, which points toward steepest descent for differentiable objectives. Unlike exact methods that terminate finitely for structured problems, iterative approaches are designed for large-scale, general nonlinear functions, often achieving approximate solutions efficiently through repeated updates. Their effectiveness stems from exploiting first- or second-order information, with convergence guaranteed under conditions like Lipschitz continuity of gradients.

Gradient descent, the simplest such method, updates the iterate as x_{k+1} = x_k - \alpha_k \nabla f(x_k), where \alpha_k > 0 is a step size and \nabla f is the gradient of f. Introduced by Cauchy in 1847 for solving systems via minimization, it ensures descent when \alpha_k satisfies sufficient decrease conditions. The step size \alpha_k is typically chosen via line search, such as the Armijo rule, which backtracks from an initial guess until f(x_k - \alpha_k \nabla f(x_k)) \leq f(x_k) - c \alpha_k \|\nabla f(x_k)\|^2 for a parameter 0 < c < 1 and a contraction factor. This rule, proposed by Armijo in 1966, promotes global convergence for smooth functions with Lipschitz gradients.

Newton's method enhances this by incorporating curvature via the Hessian matrix H(x_k) = \nabla^2 f(x_k), yielding x_{k+1} = x_k - H(x_k)^{-1} \nabla f(x_k). Originating from root-finding but adapted for optimization, it approximates the quadratic model of f locally, leading to faster progress near stationary points where second-order conditions hold, such as positive definiteness of the Hessian indicating a local minimum. For twice-differentiable functions with Lipschitz continuous Hessians, local quadratic convergence occurs: the error satisfies \|x_{k+1} - x^*\| \leq C \|x_k - x^*\|^2 for some constant C > 0 near the optimum x^*.

The conjugate gradient method addresses quadratic objectives f(x) = \frac{1}{2} x^T A x - b^T x with positive definite A, avoiding explicit inversion by generating conjugate search directions. Developed by Hestenes and Stiefel in 1952, it solves the equivalent linear system A x = b in at most n iterations for n-dimensional problems, with updates x_{k+1} = x_k + \alpha_k d_k where directions d_k satisfy d_k^T A d_j = 0 for j < k, and \alpha_k = \frac{\nabla f(x_k)^T \nabla f(x_k)}{d_k^T A d_k}. For non-quadratic cases, it approximates second-order behavior with only first-order oracles, exhibiting superlinear convergence under suitable conditions.

Convergence properties vary by method and assumptions. Gradient descent achieves linear convergence—error reduces by a factor \rho < 1 per iteration, \|x_{k+1} - x^*\| \leq \rho \|x_k - x^*\|—for strongly convex f with condition number \kappa, at rate 1 - \frac{1}{\kappa}, improvable to O(1/k^2) via acceleration as in Nesterov's 1983 method. Newton's method offers quadratic rates locally, while conjugate gradient minimizes quadratics exactly, with an error bound governed by \sqrt{\kappa} for convex quadratics. Step size rules like Armijo ensure descent and global convergence to stationary points for nonconvex smooth f.

For constrained optimization, projected gradient descent adapts the update to feasible sets: x_{k+1} = P_\mathcal{C}(x_k - \alpha_k \nabla f(x_k)), where P_\mathcal{C} projects onto the convex set \mathcal{C}. Pioneered by Goldstein in 1964 for Hilbert spaces, it converges linearly for strongly convex problems with tractable projections.
The augmented Lagrangian method handles general nonlinear constraints by minimizing an augmented objective \mathcal{L}_\rho(x, \lambda) = f(x) + \lambda^T g(x) + \frac{\rho}{2} \|g(x)\|^2, with g(x) = 0 the constraints, updating multipliers via \lambda_{k+1} = \lambda_k + \rho g(x_{k+1}). Introduced by Hestenes and Powell in 1969, it combines penalty and multiplier ideas to achieve dual feasibility and global convergence under constraint qualifications.
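
The basic loop of gradient descent with Armijo backtracking is short enough to show in full. The sketch below (assuming NumPy; the quadratic test problem is made up) applies it to a strongly convex quadratic, where the iterate can be checked against the closed-form minimizer A^{-1} b.

```python
# Gradient descent with Armijo backtracking on f(x) = 0.5 x^T A x - b^T x.
import numpy as np

A = np.array([[3.0, 0.5], [0.5, 1.0]])   # positive definite
b = np.array([1.0, -2.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

x = np.zeros(2)
for _ in range(200):
    g = grad(x)
    if np.linalg.norm(g) < 1e-8:      # first-order stationarity reached
        break
    alpha, c, tau = 1.0, 1e-4, 0.5    # Armijo parameters
    # Backtrack until sufficient decrease: f(x - a g) <= f(x) - c a ||g||^2.
    while f(x - alpha * g) > f(x) - c * alpha * (g @ g):
        alpha *= tau
    x = x - alpha * g

print(x, np.linalg.solve(A, b))  # iterate matches the exact minimizer A^{-1} b
```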

Heuristic and Approximation Methods

Heuristic and approximation methods provide practical solutions to optimization problems that are computationally intractable for exact algorithms, particularly in global optimization where local search may trap solutions in suboptimal regions. These methods trade optimality guarantees for efficiency, often yielding high-quality approximations in reasonable time. Metaheuristics, a prominent class, draw inspiration from natural processes to explore large search spaces stochastically, while approximation algorithms offer provable bounds on solution quality relative to the optimum.

Genetic algorithms (GAs) emulate Darwinian evolution to optimize over populations of candidate solutions. An initial population of encoded individuals (e.g., binary strings representing decision variables) is iteratively evolved through selection, crossover, and mutation operators. Selection favors fitter individuals based on an objective function, crossover combines parent solutions to produce offspring, and mutation introduces random variations to maintain diversity and escape local optima. This process, formalized by John Holland in the 1970s, enables GAs to converge toward global optima in complex, multimodal landscapes.

Simulated annealing (SA) mimics the physical process of annealing in metallurgy to probabilistically explore solution spaces. Starting from an initial solution, SA generates neighboring candidates and accepts worse ones with probability P = \exp(-\Delta E / T), where \Delta E is the change in objective value and T is a temperature parameter that decreases over time. This allows escape from local minima early on, with acceptance becoming more deterministic as T cools. Introduced by Kirkpatrick, Gelatt, and Vecchi, SA has proven effective for combinatorial problems like the traveling salesman; a small sketch follows this section.

Particle swarm optimization (PSO) models social behavior in flocks or swarms, where particles adjust positions in the search space based on personal and collective experiences. Each particle i updates its velocity according to \mathbf{v}_i^{t+1} = w \mathbf{v}_i^t + c_1 r_1 (\mathbf{pbest}_i - \mathbf{x}_i^t) + c_2 r_2 (\mathbf{gbest} - \mathbf{x}_i^t), followed by the position update \mathbf{x}_i^{t+1} = \mathbf{x}_i^t + \mathbf{v}_i^{t+1}, with inertia weight w, cognitive coefficient c_1, social coefficient c_2, random values r_1, r_2 \in [0,1], personal best \mathbf{pbest}_i, and global best \mathbf{gbest}. Developed by Kennedy and Eberhart, PSO excels in continuous, high-dimensional problems due to its simplicity and few parameters.

Approximation methods provide rigorous guarantees for specific problems, such as the 0-1 knapsack problem, which maximizes value under a weight constraint and admits a polynomial-time approximation scheme (PTAS). A fully polynomial-time approximation scheme (FPTAS) achieves a (1 - \varepsilon)-approximation in time polynomial in the input size and 1/\varepsilon, using scaled dynamic programming to reduce the state space while bounding the error. The original FPTAS, by Ibarra and Kim, scales item values and solves a relaxed instance exactly.

In recent developments, reinforcement learning (RL) has emerged for automating hyperparameter tuning in optimization algorithms, treating tuning as a sequential decision process. Population-based training, as in Jaderberg et al. (2017), evolves hyperparameters online across a population of models, rewarding performance improvements and enabling adaptation to diverse tasks. This approach enhances efficiency in AutoML pipelines for nonconvex and stochastic settings.
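
The sketch below (assuming NumPy; the cooling schedule and proposal scale are illustrative, not tuned) applies simulated annealing to the multimodal function f(x) = \sin(x) + x^2/10 used earlier, starting inside the non-global basin.

```python
# Simulated annealing with geometric cooling on f(x) = sin(x) + x^2/10.
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(x) + x**2 / 10.0

x = 4.0                         # start near the non-global minimum x ≈ 3.84
T, cooling = 2.0, 0.995
best = x
for _ in range(5000):
    candidate = x + rng.normal(scale=0.5)        # random neighbor
    delta = f(candidate) - f(x)
    # Accept improvements always; accept uphill moves with prob exp(-delta/T).
    if delta < 0 or rng.random() < np.exp(-delta / T):
        x = candidate
    if f(x) < f(best):
        best = x
    T *= cooling                                  # lower the temperature

print(best)  # typically near the global minimum x ≈ -1.31
```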

Applications

Engineering and Physics

Mathematical optimization plays a pivotal role in engineering and physics, enabling the design and analysis of physical systems under constraints such as material limits, energy efficiency, and performance requirements. In structural mechanics, topology optimization seeks to minimize weight while maintaining structural integrity, often formulated as a compliance minimization problem subject to volume constraints. A seminal approach, the Solid Isotropic Material with Penalization (SIMP) method, introduced in the late 1980s, models material distribution by assigning a density variable to each finite element, penalizing intermediate densities to promote binary (solid-void) designs; the effective stiffness is interpolated as E(\rho) = \rho^p E_0, where \rho is the density (0 to 1), p > 1 is the penalization factor, and E_0 is the stiffness of the solid material. This method builds on homogenization techniques for optimal topologies and has been widely adopted for lightweight structures in aerospace and automotive applications, achieving significant weight reductions in benchmark cantilever beam problems compared to uniform designs.

In electrical engineering, optimization addresses circuit design by tuning component values to meet specifications like gain, bandwidth, and power consumption, often minimizing a least-squares error between desired and simulated responses. For instance, gradient-based methods solve for transistor sizes and bias points in analog amplifiers, reducing design iterations from weeks to hours in automated workflows. Similarly, antenna array optimization uses nonlinear programming to shape radiation patterns, adjusting element positions and excitations to minimize sidelobe levels while maximizing directivity; a classic example is the synthesis of linear arrays, where the optimizer navigates the nonconvex landscape to achieve sidelobe suppression below -20 dB in Dolph-Chebyshev configurations. These applications leverage nonlinear methods to handle the inherent nonconvexity of electromagnetic objectives.

Control systems in engineering rely on optimization for stabilizing dynamic processes, with the linear-quadratic regulator (LQR) providing an optimal feedback law for linear systems. The LQR problem minimizes the infinite-horizon quadratic cost functional: J = \int_0^\infty \left( \mathbf{x}^T Q \mathbf{x} + \mathbf{u}^T R \mathbf{u} \right) dt subject to \dot{\mathbf{x}} = A \mathbf{x} + B \mathbf{u}, where \mathbf{x} is the state vector, \mathbf{u} the control input, Q \geq 0 penalizes state deviations, and R > 0 penalizes control effort. Solved via the algebraic Riccati equation, LQR yields feedback gains that balance stability and performance, commonly applied to improve settling times over proportional-integral controllers, as sketched below. This framework, originating from early optimal control theory, remains foundational for applications in aerospace and automotive suspension.

In civil engineering, optimization enhances structural design by sizing members and adjusting geometries to minimize material use under load constraints, often using mathematical programming for stress-limited problems. Seminal work on truss topology traces to Michell's 1904 criteria for statically determinate structures, extended numerically to yield layouts using less steel than conventional designs in bridge girders. Network design in civil infrastructure, such as water distribution or transportation systems, employs mixed-integer programming to optimize pipe diameters or lane capacities, minimizing costs while ensuring flow demands; for example, genetic algorithms level construction resources across project phases, helping to reduce peak usage and delays in large-scale builds.
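
A compact LQR computation is sketched below (assuming NumPy and SciPy; the double-integrator plant and weight matrices are illustrative choices, not taken from the text above).

```python
# LQR gains for a double-integrator plant x_dot = A x + B u, via the
# continuous algebraic Riccati equation.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # position/velocity dynamics
B = np.array([[0.0], [1.0]])
Q = np.eye(2)                            # state deviation penalty
R = np.array([[1.0]])                    # control effort penalty

P = solve_continuous_are(A, B, Q, R)     # stabilizing Riccati solution
K = np.linalg.inv(R) @ B.T @ P           # optimal feedback u = -K x

print(K)                                 # gain matrix
print(np.linalg.eigvals(A - B @ K))      # closed-loop eigenvalues, Re < 0
```
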
Geophysics applies least-squares optimization in seismic inversion to estimate subsurface properties from wave data, formulating the problem as minimizing the misfit \| d - G m \|^2_2, where d are observed seismograms, G the forward modeling operator, and m the model parameters such as velocity profiles. This damped least-squares approach, regularized with Tikhonov terms, resolves ambiguities in ill-posed inverse problems, enabling accurate imaging of reservoirs with significantly improved resolution over ray-based methods in synthetic benchmarks like the Marmousi model. Originating from advances in inverse theory, it underpins full-waveform inversion for hazard assessment and resource exploration.

Economics and Operations Research

In economics, mathematical optimization plays a central role in modeling consumer behavior through the utility maximization problem, where an individual seeks to maximize a utility function u(x) subject to a budget constraint p \cdot x \leq w, with x representing quantities of goods, p their prices, and w the available wealth. This formulation underpins general equilibrium theory, ensuring that individual optimizations lead to market equilibria under assumptions of convexity and completeness. The problem is typically solved using Lagrange multipliers or Kuhn-Tucker conditions for interior or boundary solutions, highlighting trade-offs in consumption.

In finance, optimization techniques are applied to portfolio selection, notably through the mean-variance framework introduced by Harry Markowitz, which minimizes risk for a given expected return or maximizes return for a given risk level. The model formulates the problem as minimizing the variance \sigma_p^2 = w^T \Sigma w subject to an expected return constraint \mu^T w = r and 1^T w = 1, where w are asset weights, \Sigma the covariance matrix, and \mu the vector of expected returns. This approach revolutionized investment management by quantifying diversification benefits.

Operations research employs optimization for scheduling and inventory management. Job-shop scheduling, a classic problem, assigns jobs to machines to minimize makespan or tardiness, often modeled as a mixed-integer linear program (MILP) with binary variables for sequencing and continuous variables for start times, subject to precedence and capacity constraints. Seminal formulations demonstrated that such MILP models can yield optimal schedules for small instances, though NP-hardness necessitates heuristics for larger scales. Inventory models, such as the economic order quantity (EOQ) model, optimize ordering policies to balance holding and setup costs, deriving the optimal order size Q = \sqrt{\frac{2DS}{H}}, where D is the demand rate, S the setup cost, and H the holding cost per unit; a worked example follows this section. This deterministic model provides a foundational benchmark for extensions in supply chain planning.

Transportation optimization focuses on the minimum-cost flow problem, which minimizes the total cost of shipping goods from sources to sinks in a network while satisfying flow constraints. Formulated as \min \sum_{(i,j)} c_{ij} f_{ij} subject to flow balance \sum_j f_{ij} - \sum_k f_{ki} = b_i and capacity bounds, it generalizes the transportation problem and is solvable via techniques like the network simplex method. This approach is pivotal for logistics efficiency, reducing costs in distribution networks.

Auction design leverages optimization through mechanism design, exemplified by the Vickrey-Clarke-Groves (VCG) mechanism, which allocates items to maximize social welfare while incentivizing truthful bidding. Developed across key contributions by Vickrey (1961), Clarke (1971), and Groves (1973), VCG sets payments as the externality imposed on others, ensuring incentive compatibility in quasi-linear settings for multi-item auctions. Mathematical programming serves as a key tool across these applications, enabling scalable solutions for decision-making in economic and operational contexts.
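
The EOQ formula lends itself to a short worked example (pure Python; the demand, setup, and holding figures are made up). At Q^*, the annual ordering and holding costs are equal, which the computation confirms.

```python
# Economic order quantity: Q* = sqrt(2 D S / H).
import math

D = 1200.0   # annual demand (units/year)
S = 100.0    # fixed cost per order
H = 2.0      # holding cost per unit per year

Q_star = math.sqrt(2 * D * S / H)
ordering_cost = D / Q_star * S       # orders per year times cost per order
holding_cost = Q_star / 2 * H        # average inventory times holding rate

print(Q_star)                        # ≈ 346.4 units per order
print(ordering_cost, holding_cost)   # ≈ 346.4 each: equal at the optimum
```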

Machine Learning and Data Science

In machine learning, optimization plays a central role in training models to minimize empirical risk, defined as the average loss over a finite training dataset. The empirical risk minimization (ERM) principle seeks parameters that solve \hat{\theta} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^n L(y_i, f(x_i; \theta)), where L is the loss function, (x_i, y_i) are the training examples, and f is the model. This approach, foundational to statistical learning theory, balances fitting the observed data while aiming for generalization to unseen data.

Training large-scale models, particularly deep neural networks, relies on stochastic gradient descent (SGD) and its variants to efficiently approximate gradients from mini-batches, addressing the computational demands of high-dimensional parameter spaces. SGD updates parameters iteratively as \theta_{t+1} = \theta_t - \eta \nabla L(\theta_t; b_t), where \eta is the learning rate and b_t is a random mini-batch, enabling scalable optimization for datasets with millions of examples. Variants like momentum-accelerated SGD and adaptive methods such as Adam incorporate second-moment estimates to accelerate convergence and handle noisy gradients, achieving state-of-the-art performance in tasks like image classification.

Hyperparameter tuning treats the selection of values like learning rates or network architectures as a black-box optimization problem, often solved using Bayesian optimization to minimize validation error with fewer evaluations than grid search. This method models the objective with a probabilistic surrogate and uses an acquisition function, such as expected improvement, to balance exploration and exploitation in the hyperparameter space. Bayesian optimization has demonstrated superior efficiency over grid and random search, outperforming human experts in tuning algorithms like support vector machines and deep networks on benchmarks such as UCI datasets.

In data science applications, optimization underpins unsupervised tasks such as clustering and feature selection. K-means clustering formulates the problem as a nonconvex optimization of assigning data points to k centroids to minimize the sum of squared distances, solved iteratively via Lloyd's algorithm, which alternates between assignment and centroid updates until convergence. This method, while sensitive to initialization due to local minima, remains widely used for its simplicity and effectiveness in partitioning high-dimensional data, as in customer segmentation. Feature selection optimizes a combinatorial objective to identify a subset of relevant variables that maximizes predictive performance while reducing dimensionality, often using wrapper methods that evaluate subsets via cross-validation; filter methods like mutual information ranking provide scalable alternatives by scoring individual features against the target.

Recent advances address distributed optimization in privacy-sensitive settings through federated learning, which minimizes a weighted sum of local losses \min_{\theta} \sum_i w_i f_i(\theta) across decentralized devices without sharing raw data. This framework aggregates model updates from clients, as in the FedAvg algorithm, reducing communication costs by up to 100x compared to centralized training while preserving data privacy. Optimization challenges in transformer models, which scale to billions of parameters for natural language processing, have been mitigated by mixed-precision training, which combines 16-bit floating-point computations with 32-bit accumulation for stability, accelerating training by 2-3x on GPUs without accuracy loss in large language models from 2018 onward.
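To make the ERM and SGD machinery described above concrete, here is a minimal NumPy sketch that fits a linear model by mini-batch SGD on synthetic data; the learning rate, batch size, and data are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)
n, d = 1000, 5
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + 0.1 * rng.normal(size=n)

theta = np.zeros(d)
eta, batch_size = 0.05, 32

for step in range(2000):
    idx = rng.choice(n, size=batch_size, replace=False)  # random mini-batch b_t
    Xb, yb = X[idx], y[idx]
    grad = 2.0 / batch_size * Xb.T @ (Xb @ theta - yb)   # gradient of mean squared loss
    theta -= eta * grad                                  # theta_{t+1} = theta_t - eta * grad

print("parameter error:", np.linalg.norm(theta - theta_true))
```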

Software Tools and Solvers

Open-Source Libraries

Open-source libraries have democratized access to mathematical optimization tools, enabling researchers, students, and practitioners to implement and solve complex problems without licensing costs. These libraries span general-purpose solvers for nonlinear and linear programming, domain-specific frameworks for convex and mixed-integer problems, and specialized tools for machine learning and combinatorial optimization. Built primarily in Python for ecosystem compatibility, they leverage community contributions and integrate with numerical computing environments like NumPy.

The SciPy library, part of the broader scientific Python ecosystem, provides the optimize module for a wide range of optimization tasks, including nonlinear minimization, least-squares fitting, and root finding. It implements algorithms such as sequential least squares programming (SLSQP) for nonlinear programs with bounds and constraints, and supports global optimization methods like differential evolution. For linear programming, SciPy's linprog function solves standard LP problems using interior-point or revised simplex methods, making it suitable for medium-scale applications in scientific computing. These tools are tightly integrated with NumPy arrays for efficient handling of vectorized data.

CVXPY serves as a modeling language embedded in Python for convex optimization problems, allowing users to express objectives and constraints in a natural, mathematical syntax that adheres to disciplined convex programming (DCP) rules. It supports semidefinite programming (SDP) through atomic functions for matrix inequalities and can handle mixed-integer programming (MIP) by interfacing with external solvers. CVXPY automatically transforms models into standard forms and dispatches them to open-source solvers like SCS or OSQP for quadratic and conic programs, facilitating rapid prototyping in fields like finance and machine learning.

For mixed-integer linear programming (MILP), open-source interfaces connect to robust solvers with free access options. The COIN-OR Branch and Cut (CBC) solver, an open-source C++ library, excels at solving MILP problems via branch-and-cut techniques and integrates seamlessly with Python through wrappers like PuLP or Pyomo, supporting large-scale combinatorial models without restrictions. Similarly, Gurobi's Python interface (gurobipy) offers a free academic license for full-featured MILP solving, including advanced heuristics and parallel processing, though it requires registration for non-commercial use. Google's OR-Tools provides a comprehensive suite for combinatorial optimization, incorporating linear solvers like Glop and the CP-SAT constraint solver for routing, scheduling, and assignment problems.

In machine learning, libraries like PyTorch and TensorFlow emphasize optimization through automatic differentiation (autodiff) for gradient-based methods. PyTorch's torch.autograd engine computes exact gradients for arbitrary computational graphs, powering optimizers in torch.optim such as Adam and stochastic gradient descent (SGD) for training deep neural networks on large datasets. TensorFlow uses tf.GradientTape for dynamic gradient computation, enabling efficient backpropagation in models with millions of parameters and supporting distributed optimization for scalable ML workflows. These autodiff capabilities underpin empirical risk minimization in data-driven optimization.
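Returning to the SciPy interface described above, this short sketch minimizes a simple quadratic objective under a linear inequality constraint with SLSQP; the objective, bounds, and constraint are made up for the demonstration:

```python
import numpy as np
from scipy.optimize import minimize

# Toy objective: shifted quadratic
def f(x):
    return (x[0] - 1.0) ** 2 + (x[1] - 2.5) ** 2

# Inequality constraints use SciPy's convention g(x) >= 0
constraints = [{"type": "ineq", "fun": lambda x: x[0] - 2 * x[1] + 2}]
bounds = [(0, None), (0, None)]  # x >= 0 componentwise

res = minimize(f, x0=np.array([2.0, 0.0]), method="SLSQP",
               bounds=bounds, constraints=constraints)
print("minimizer:", res.x, "objective:", res.fun)
```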

Commercial Software

Commercial software for mathematical optimization provides robust, scalable solvers tailored for industrial applications, offering advanced features such as high-performance algorithms, parallel computing, and dedicated support for enterprise integration. These tools are designed to handle large-scale problems in industries such as logistics, finance, energy, and manufacturing, often with premium licensing models that include maintenance and customization services.

Gurobi Optimizer is renowned for its high-performance capabilities in solving mixed-integer programming problems, leveraging advanced barrier methods for continuous relaxations within its branch-and-bound framework. It supports LP, QP, and MIP formulations, employing interior-point algorithms for large-scale continuous models, which makes it suitable for computationally intensive industrial scenarios. As of 2025, Gurobi version 12.0 introduces global optimality guarantees for mixed-integer nonlinear programs (MINLP) along with further performance gains, and the company's 2025 State of Mathematical Optimization report notes increasing integration of optimization with machine learning technologies across industries.

IBM CPLEX Optimization Studio excels in solving linear programming, mixed-integer programming, and constraint programming problems, incorporating distributed parallel algorithms to accelerate computations on multi-core systems. It handles quadratically constrained programs (QCP) and supports both primal and dual simplex methods alongside barrier solvers for LP and MIP. CPLEX's parallel solving features enable efficient scaling for enterprise-level optimization tasks, such as supply chain planning and resource allocation.

The MATLAB Optimization Toolbox integrates seamlessly with the MATLAB environment for solving nonlinear optimization problems, including constrained and unconstrained formulations using algorithms such as interior-point and sequential quadratic programming. It also supports global optimization techniques, such as genetic algorithms for multimodal nonlinear problems, allowing users to minimize functions with integer or binary constraints. This toolbox facilitates rapid prototyping and integration with simulation workflows in engineering and finance.

MOSEK Optimization Suite specializes in conic optimization, extending linear and quadratic solvers to handle semidefinite, power, and exponential cones through its interior-point algorithm. It solves conic quadratic problems efficiently, supporting affine expressions over Cartesian products of conic domains, which is particularly useful for robust and chance-constrained formulations. MOSEK's focus on numerical stability and sparsity exploitation makes it ideal for financial and engineering applications requiring precise conic modeling.

In the 2020s, commercial solvers like Gurobi have seen enhanced integration with enterprise systems, exemplified by Gurobi's partnership with SAP to embed optimization capabilities into supply chain planning products. These updates enable seamless deployment in production environments, supporting multi-objective extensions for decision-making.
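To give a flavor of the commercial-solver APIs discussed here, the following gurobipy sketch solves a tiny 0/1 knapsack MILP; the item values, weights, and capacity are invented, and running it requires a Gurobi license:

```python
import gurobipy as gp
from gurobipy import GRB

values = [10, 13, 7, 5]   # invented item values
weights = [4, 6, 3, 2]    # invented item weights
capacity = 9

m = gp.Model("knapsack")
x = m.addVars(len(values), vtype=GRB.BINARY, name="x")  # one binary per item

# Maximize total value subject to the capacity constraint
m.setObjective(gp.quicksum(values[i] * x[i] for i in range(len(values))),
               GRB.MAXIMIZE)
m.addConstr(gp.quicksum(weights[i] * x[i] for i in range(len(values))) <= capacity,
            "capacity")

m.optimize()
print("chosen items:", [i for i in range(len(values)) if x[i].X > 0.5])
```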

Specialized Frameworks

Specialized frameworks in mathematical optimization extend general-purpose tools to address challenges in niche domains, such as quantum computing, hyperparameter tuning, and biological modeling, by integrating domain-specific constraints and algorithms. These frameworks often leverage hybrid approaches, combining classical optimization with emerging paradigms like quantum annealing or machine learning, to tackle problems intractable for standard solvers.

In quantum optimization, Qiskit Optimization, developed by IBM in the early 2020s, provides a suite for formulating and solving combinatorial problems using the Quantum Approximate Optimization Algorithm (QAOA). QAOA employs a parameterized quantum circuit to approximate solutions to NP-hard problems, iteratively optimizing circuit parameters via classical methods to minimize an objective function encoded in a quantum Hamiltonian. This framework supports applications like graph partitioning and has been implemented for execution on IBM's quantum hardware, demonstrating scalability for problems up to hundreds of qubits. Complementing gate-based approaches, D-Wave's quantum annealing systems, commercially available since 2011, have matured by 2025 into hybrid solvers capable of addressing large-scale optimization in domains such as logistics and scheduling, with recent systems offering over 5,000 qubits and demonstrating quantum advantage in specific instances.

For hyperparameter optimization, Optuna offers an automated framework that uses tree-structured Parzen estimators to efficiently search vast configuration spaces, outperforming grid search by up to 10x in convergence speed for machine learning models. Ray Tune, part of the Ray project, enables distributed hyperparameter tuning across clusters, supporting algorithms like HyperBand and population-based training to scale experiments to thousands of trials, reducing training time for neural networks by orders of magnitude on GPU clusters. These tools are particularly vital in ML applications, where optimizing hyperparameters can directly impact model performance in tasks like image classification.

In systems biology, the COBRA Toolbox facilitates constraint-based reconstruction and analysis of metabolic networks, enabling flux balance analysis (FBA) to predict steady-state fluxes in genome-scale models via linear programming formulations. By incorporating constraints derived from genomic and physiological data, it supports simulations of microbial growth and drug targeting, with extensions for multi-omics integration that have been applied to over 1,000 reconstructions across organisms. JuMP, a modeling language embedded in Julia, specializes in expressive formulation of optimization problems, interfacing seamlessly with solvers for mixed-integer, nonlinear, and conic programs while allowing custom extensions via Julia's ecosystem. It excels in rapid prototyping for domain-specific models, such as energy systems or supply chains, by abstracting solver details and enabling automatic differentiation for gradient-based methods.
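As a small example of the hyperparameter-search workflow described above, this Optuna sketch tunes a single hypothetical hyperparameter against a toy objective; in practice the objective would train a model and return its validation error:

```python
import optuna

def objective(trial):
    # Hypothetical hyperparameter: a learning rate searched on a log scale
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    # Toy stand-in for validation loss; replace with real training/evaluation
    return (lr - 0.01) ** 2

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print("best hyperparameters:", study.best_params)
```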
