Fact-checked by Grok 2 weeks ago

Derivative-free optimization

Derivative-free optimization (DFO), also known as black-box optimization, refers to a family of numerical optimization techniques designed to minimize or maximize an objective function using only evaluations of the function itself, without access to or computation of its derivatives such as gradients or Hessians.^[1] These methods are particularly valuable in scenarios where derivative information is unavailable, unreliable, computationally prohibitive, or conceptually inapplicable, such as in simulation-based engineering design, black-box models in machine learning, or problems involving noisy or non-smooth functions.^[2] DFO algorithms can be broadly classified into two main categories: direct-search methods, which explore the search space through systematic sampling or pattern-based moves without building explicit models (e.g., the Nelder-Mead simplex algorithm or mesh adaptive direct search), and model-based methods, which construct surrogate approximations of the objective function—often using linear, quadratic, or radial basis function models—and optimize these within frameworks like trust regions to guide iterations.^[1] The field has roots in mid-20th-century computational efforts, such as early pattern search techniques from the 1950s, and has evolved significantly with theoretical advancements in convergence guarantees and worst-case complexity analysis, including bounds like O(\epsilon^{-2}) function evaluations for first-order criticality in smooth unconstrained problems.^[2] Recent developments emphasize handling stochastic noise, constrained formulations, and large-scale applications, often integrating randomization for robustness or exploiting problem structure like convexity for improved efficiency.^[2] Applications of DFO span diverse domains, including aerodynamic shape optimization in aerospace engineering, hyperparameter tuning in artificial intelligence systems (e.g., Google's AutoML efforts), and calibration of complex physical simulations where derivatives cannot be derived analytically.^[2] While DFO methods typically focus on local optimization of continuous, single-objective problems, extensions address multi-objective, discrete, or global search challenges, though they often require more function evaluations than gradient-based counterparts for high-dimensional problems.^[1] The maturity of the field is evidenced by influential software implementations like NOMAD for direct search and BOBYQA for model-based approaches, enabling practical deployment in industrial and research settings.^[2]

Introduction

Definition and scope

Derivative-free optimization (DFO) addresses the problem of minimizing an objective function f: \mathbb{R}^n \to \mathbb{R} over a domain A \subset \mathbb{R}^n, where access to derivatives such as gradients or Hessians is unavailable, unreliable, or prohibitively expensive.^[1] In this setting, f is treated as a black-box, meaning the optimizer interacts with it solely through direct evaluations at queried points, without exploiting any explicit analytical form or internal structure. The mathematical formulation is thus to solve \min_{x \in A} f(x) using only these function values, often under constraints on the total number of evaluations due to their computational cost.^[2] The scope of DFO encompasses a broad class of problems, including bound-constrained, nonlinear, and nonconvex optimizations, as well as those with noisy or stochastic evaluations arising in simulations and engineering applications.^[1] It explicitly distinguishes itself from derivative-based methods, which rely on first- or second-order information via gradients or Hessians, and from approaches that approximate derivatives using zeroth-order techniques like finite differences. Central to DFO is the concept of black-box evaluation, where each query to f provides an objective value but no additional derivative data, and the evaluation budget serves as the primary resource constraint, typically limiting the method to hundreds or thousands of calls depending on the problem's expense.^[2] A representative example is the unconstrained minimization of a smooth but derivative-inaccessible function, such as \min_{x \in \mathbb{R}^n} f(x), where the optimizer probes points like x_k to obtain f(x_k) and iteratively refines its search based on these values alone, without approximating directional derivatives.^[1] This formulation highlights DFO's reliance on sampling strategies to infer progress toward the optimum within the given budget.

Motivations and challenges

Derivative-free optimization (DFO) arises primarily from practical scenarios where derivative information is unavailable or unreliable, such as in black-box problems involving legacy codes, proprietary software, or complex simulations without explicit functional forms.^[1] In these cases, objective functions are evaluated through oracles or simulators, like finite element models in engineering that can require hours per evaluation, making traditional derivative approximation via finite differences prohibitively expensive or infeasible.^[3] Additionally, DFO is essential for non-differentiable, discontinuous, or multimodal functions where gradients do not exist or cannot be reliably computed, enabling optimization in domains like aerodynamic design or chemical process modeling.^[1] Noise in function evaluations further motivates DFO, as it often stems from measurement errors, stochastic simulations, or approximations in real-world applications, rendering gradient-based methods ineffective due to inconsistent derivative estimates.^[1] For instance, in environmental engineering or financial modeling, noisy black-box evaluations preclude smooth Taylor expansions central to derivative-based approaches like gradient descent, which fail on non-smooth landscapes by getting stuck in local minima or producing erroneous directions. DFO trades the precision of derivative information for broader applicability, allowing robust progress through direct function queries in such "expensive black-box" settings.^[2] Key challenges in DFO include managing limited evaluation budgets, typically ranging from 10^2 to 10^4 function calls, constrained by the high cost of each assessment in simulation-heavy problems.^[1] Algorithms must efficiently sample the space to approximate models or search directions without exceeding these budgets, as seen in benchmarks where even moderate problems demand careful allocation of queries.^[4] Handling multimodality and discontinuity exacerbates this, requiring methods to explore rugged landscapes without gradient guidance, often leading to slower convergence compared to smooth cases.^[1] High dimensionality poses another significant hurdle, with effective DFO limited to around 100 variables on standard hardware, as constructing reliable models or search sets scales poorly—requiring over 5,000 points for a 100-dimensional problem.^[1] In noisy or constrained environments, ensuring poised sample sets for interpolation or regression becomes critical to avoid degeneracy, yet noise can degrade model accuracy and local convergence. These issues highlight DFO's trade-offs: while versatile for real-world irregularities, it demands sophisticated strategies to balance exploration, exploitation, and computational economy.^[1]

Historical development

Early contributions

The origins of derivative-free optimization can be traced to the mid-20th century, with early applications in computational problems during the 1950s, such as Fermi and Metropolis's use of coordinate search on the MANIAC computer for nonlinear least-squares fitting.^[5] Significant advancements occurred in the 1960s amid growing computational capabilities, as methods emerged from operations research and numerical analysis to tackle problems where derivatives were impractical or unavailable, such as in early computational chemistry simulations.^[5] The Hooke-Jeeves pattern search method, introduced in 1961, employed exploratory and pattern moves based solely on function values to solve unconstrained numerical and statistical problems, marking a foundational "direct search" strategy.^[6] In 1964, Michael J. D. Powell developed a conjugate direction method that iteratively adjusted search directions using quadratic models built from function evaluations, enhancing efficiency for multidimensional minimization without derivatives.^[7] The simplex method originated with Spendley, Hext, and Himsworth in 1962 for sequential optimization, and was refined in 1965 by John A. Nelder and Roger Mead into the Nelder-Mead algorithm, which iteratively transforms a simplex through reflection, expansion, contraction, and shrinkage operations on function values at vertices, proving effective for practical engineering tasks.^[8]^[9] Prior to the 1980s, these techniques emphasized deterministic, local search heuristics without guarantees of global optimality, focusing on reliable convergence to nearby minima in noisy or black-box settings typical of early simulation-based applications.^[5] This era's contributions established derivative-free optimization as a vital alternative to gradient-based methods, particularly in domains like chemical process modeling where experimental evaluations substituted for analytical information.^[5]

Key advancements since 2000

Since 2000, derivative-free optimization (DFO) has evolved from heuristic-dominated techniques toward algorithms with rigorous convergence analyses and improved scalability, driven by theoretical advancements and practical demands in complex simulations. This shift is evident in the classification of methods into direct search and model-based categories, as surveyed by Xi et al., who emphasize the development of provably efficient frameworks that address limitations of earlier heuristic methods like Nelder-Mead.^[10] Seminal publications include the 2009 book Introduction to Derivative-Free Optimization by Conn, Scheinberg, and Vicente, which systematically covers sampling strategies and model construction for DFO, serving as a cornerstone for subsequent research.^[1] Complementing this, Audet and Dennis's 2006 framework for mesh adaptive direct search (MADS) algorithms extended pattern search methods to constrained problems, providing convergence guarantees under mild assumptions on function evaluations.^[11] The 2000s marked the rise of trust-region model-based methods, such as Powell's NEWUOA algorithm, which constructs quadratic interpolation models iteratively within trust regions to minimize unconstrained functions without derivatives.^[12] In the 2010s, DFO integrated with machine learning through Gaussian processes, notably in Bayesian optimization frameworks that model objective uncertainties for efficient black-box searches in hyperparameter tuning. The COCO platform, established in the 2010s, enabled standardized black-box benchmarking of continuous optimizers, promoting reproducible evaluations across diverse problem instances.^[13] Post-2020 developments reflect growing emphasis on high-dimensional and stochastic DFO, spurred by AI applications like reinforcement learning where derivative information is unavailable or noisy. In 2021, full-low evaluation methods were introduced, alternating between iterations using full model evaluations and low-cost approximations to achieve robust performance on noisy or nonsmooth functions.^[14] By 2025, model-and-search hybrids emerged, combining quadratic model steps with direct search polls to enhance convergence for bound-constrained local optimization.^[15]

Theoretical foundations

Convergence properties

Derivative-free optimization (DFO) algorithms aim to identify first-order stationary points, where the gradient norm satisfies \|\nabla f(x)\| \leq \epsilon for a small decrease parameter \epsilon > 0, ensuring no significant descent direction exists.^[1] This notion of \epsilon-stationarity replaces exact criticality when derivatives are unavailable, relying on function evaluations to approximate descent properties. In trust-region frameworks, a core mechanism is the sufficient decrease condition, which requires the step d to satisfy f(x + d) \leq f(x) + \eta \nabla f(x)^T d for some \eta \in (0, 1), guaranteeing progress toward stationarity.^[1] A fundamental property of deterministic DFO methods is their convergence to first-order stationary points under the assumption that the objective function f has Lipschitz continuous gradients, ensuring bounded variations in function behavior.^[1] This allows algorithms to generate sequences approaching points where no further decrease is possible within the Lipschitz constant. For stochastic variants, convergence relies on probabilistic models that approximate the function with high probability, enabling guarantees for noisy or black-box settings where exact evaluations are infeasible.^[16] In direct search methods, such as the mesh adaptive direct search (MADS) framework, convergence to \epsilon-stationarity is achieved in O(\epsilon^{-2}) iterations, with function evaluations scaling as O(n \epsilon^{-2}) due to polling multiple directions per iteration, by systematically refining a mesh of trial points to detect descent directions.^[17] Model-based methods, by contrast, construct local quadratic approximations to exploit curvature, yielding faster local convergence rates near stationary points compared to linear or sampling-based approaches.^[1] The Cauchy decrease condition underpins many DFO trust-region steps, stating that a step d along an approximate steepest descent direction provides

f(x + d) \leq f(x) + \nabla f(x)^T d,

which is approximated using finite differences from function values without explicit gradient computation, ensuring at least a minimal reduction proportional to the gradient norm.^[1]

Complexity bounds

In derivative-free optimization (DFO), the evaluation complexity refers to the worst-case number of function evaluations required to achieve a specified accuracy, typically measured by first-order criticality \|\nabla f(x)\| \leq \epsilon for smooth functions or analogous measures for nonsmooth cases. Zeroth-order oracles, which provide only function values, generally exhibit higher complexity than first-order oracles that supply gradients, but for smooth nonconvex minimization, certain DFO methods match the \mathcal{O}(\epsilon^{-2}) bound of gradient-based algorithms in terms of iterations, though function evaluations often scale with problem dimension n.^[18] For smooth functions, model-based DFO methods achieve \epsilon-first-order accuracy in \mathcal{O}(\epsilon^{-2}) iterations, with function evaluations scaling as \mathcal{O}(n^2 \epsilon^{-2}) in trust-region and regularization frameworks that build local models without derivatives.^[18] In direct search methods, which approximate gradients via finite differences along coordinate directions, the worst-case complexity is \mathcal{O}(n \epsilon^{-2}) evaluations, reflecting the \mathcal{O}(n) cost per gradient estimate across \mathcal{O}(\epsilon^{-2}) iterations.^[18] For nonsmooth functions, particularly those with Lipschitz continuity, the complexity worsens to up to \mathcal{O}(\epsilon^{-2} n^2) evaluations in trust-region DFO, due to the need for denser sampling to handle discontinuities and ensure progress.^[19] High-order models in DFO, which approximate derivatives up to order p, improve complexity for higher-order stationarity; specifically, the iteration complexity to achieve \epsilon-approximate p-th-order criticality is \mathcal{O}(\epsilon^{-(p+1)/p}), with evaluation complexity further scaling with the dimension n based on model construction costs, extending classical regularization results to the derivative-free setting. In Bayesian optimization, a probabilistic DFO approach for expensive black-box functions, regret bounds quantify cumulative suboptimality; for example, using expected improvement as an acquisition function, the expected simple regret is bounded by \mathcal{O}(T^{-1/2} \gamma_T^{1/2}) after T evaluations, where \gamma_T is the maximum information gain, assuming Gaussian process priors. Recent advancements in adaptive sampling for noisy functions have lowered complexity bounds in stochastic DFO settings. For instance, adaptive finite-difference schemes combined with trust-region steps achieve expected evaluation complexity of \mathcal{O}(\epsilon^{-2}) for nonconvex noisy minimization, outperforming uniform sampling by dynamically adjusting perturbation sizes to mitigate noise variance.^[20] Similar results hold for strongly convex cases, with bounds reduced to \mathcal{O}(\log(1/\epsilon)) under appropriate noise assumptions.^[21]

Methods

Direct search methods

Direct search methods constitute a foundational class of derivative-free optimization algorithms that systematically explore the search space by generating trial points along predefined directions from the current iterate, with decisions to accept or reject these points based exclusively on objective function value comparisons. Unlike model-based approaches, these methods impose no assumptions on the smoothness or differentiability of the objective function and avoid constructing explicit approximations, relying instead on direct evaluations to drive progress toward local minima. This simplicity makes them particularly appealing for black-box problems where derivative information is unavailable or unreliable.^[1] The core mechanism of direct search methods typically involves an optional search step, where a broader exploration may identify promising directions without strict requirements, followed by a mandatory poll step that generates a finite set of trial points in a structured manner to ensure sufficient coverage of the space. In the poll step, directions are chosen from a positive spanning set, a collection of vectors that positively span \mathbb{R}^n—meaning any direction in the space can be expressed as a positive linear combination of them—to guarantee that feasible descent directions are not overlooked if they exist. If a successful poll point yields a better function value, it becomes the new iterate, and step sizes may expand; otherwise, step sizes contract to refine the search. A key update in the poll step for pattern search variants is given by

x_{k+1} = x_k + \alpha_k d,

where d is a direction from the positive spanning set, and \alpha_k > 0 is a step length scaled to achieve a sufficient decrease in the objective function, often enforced via an Armijo-like condition adapted for derivative-free settings.^[1] Prominent examples include the Nelder-Mead algorithm, which operates on a non-degenerate simplex of n+1 points in n dimensions and iteratively sorts vertices by function values to perform operations such as reflection across the opposite face, expansion along the reflected direction, contraction toward the best point, or complete shrinkage of the simplex. Introduced in 1965, this method excels in low-dimensional problems due to its heuristic efficiency but lacks guaranteed convergence in general cases without modifications. Pattern search methods, revitalized in modern form through the use of positive spanning sets, generate poll directions that form such sets at each iteration, with step sizes decreasing geometrically to ensure convergence to stationary points under mild conditions like sufficient decrease. These were formalized in foundational works emphasizing their robustness for unconstrained and constrained problems. Directional direct search extends this framework by incorporating adaptive meshes that refine locally, as in the mesh adaptive direct search (MADS) algorithms, which poll along directions from frames that positively span the space while allowing variable exploration scales to handle constraints effectively. Direct search methods are well-suited to low-to-moderate dimensional problems (typically n < 20), as the computational cost per iteration scales linearly with dimension due to the fixed number of poll directions required for positive spanning, making high-dimensional applications inefficient without parallelization. Their design inherently confers robustness to noisy function evaluations, since evaluations are used only for ordinal comparisons rather than for fitting sensitive models or approximating gradients that noise could distort. Furthermore, by eschewing any explicit modeling of the objective, these methods avoid assumptions about its form, enabling application to nonsmooth, discontinuous, or multimodal functions where other derivative-free techniques might falter.^[1]

Model-based methods

Model-based methods in derivative-free optimization construct local surrogate models of the objective function using function evaluations at carefully selected sample points, typically through interpolation or regression techniques, to approximate the function's behavior without requiring derivatives. These approximations guide the search by solving an optimization subproblem on the model, often confined to a trust region around the current iterate to ensure the model's reliability in a local neighborhood. The trust region radius is dynamically adjusted based on how well the model's predictions match actual function values at trial points, promoting efficient progress toward the optimum. This approach contrasts with direct search methods by leveraging explicit function approximations to reduce the number of evaluations needed, especially in noisy or expensive black-box settings. Common model forms include linear and quadratic polynomials. For linear interpolation, the model at iteration k can be expressed as

m_k(x) = f(y) + \nabla m_k(y)^T (x - y),

where y is a nearby sample point and \nabla m_k(y) is determined by interpolation conditions on a set of points. Quadratic models extend this to include second-order terms, such as

m_k(x) = f(x_k) + \nabla m_k(x_k)^T (x - x_k) + \frac{1}{2} (x - x_k)^T B_k (x - x_k),

with B_k approximating the Hessian. The trust-region subproblem is then solved as

\min_x m_k(x) \quad \text{subject to} \quad \|x - x_k\| \leq \Delta_k,

where \Delta_k > 0 defines the region of trust, and the solution suggests the next iterate or trial step. Prominent algorithms include COBYLA, introduced by Powell in 1994, which builds linear models via simplex-based interpolation to address constrained nonlinear optimization, iteratively refining approximations while handling inequality and equality constraints through a sequence of feasible points. Another foundational method is the trust-region interpolation framework by Conn, Scheinberg, and Vicente (2009), which uses underdetermined or fully determined quadratic interpolation sets to ensure well-poised sampling geometries, enabling global convergence guarantees under mild conditions; this was further detailed in their 2009 monograph on derivative-free methods.^[1] Surrogate-based variants employ flexible nonlinear models, such as radial basis functions (RBFs), which interpolate via sums of radially symmetric kernels centered at sample points, offering smooth approximations well-suited for multimodal objectives. Kriging, or Gaussian process regression, serves as another key surrogate, modeling the objective as a probabilistic draw from a Gaussian process to quantify prediction uncertainty alongside the mean. In Bayesian optimization, a probabilistic extension of model-based methods, this surrogate informs an acquisition function like expected improvement, which selects the next sample by maximizing the anticipated gain over the current best value, thus balancing local exploitation and global exploration in high-cost settings. These techniques excel in moderate-dimensional problems (typically n = 10 to $50), where constructing and optimizing low-order models remains computationally tractable, and they naturally accommodate constraints via penalized models or feasibility enforcement in the trust region.

Applications

Engineering and design

Derivative-free optimization (DFO) plays a pivotal role in engineering disciplines where objective functions are derived from computationally expensive simulations, such as computational fluid dynamics (CFD) or finite element analysis (FEA), without requiring gradient information. These methods are particularly suited to black-box models where analytical derivatives are unavailable or impractical to compute, enabling iterative design improvements in physical systems. In engineering contexts, DFO facilitates the exploration of complex design spaces involving continuous and discrete variables, often integrating with simulation tools to minimize performance metrics like drag, stress, or weight while respecting constraints on manufacturability and safety. A prominent application is aerodynamic shape optimization, exemplified by NASA's use of pattern search—a derivative-free direct search method—in supersonic airfoil design. In one study, researchers applied a multifidelity pattern search algorithm to minimize drag for a supersonic business jet airfoil at Mach 1.5, using low-fidelity panel methods to guide high-fidelity CFD evaluations with Cart3D, achieving convergence with approximately 68 high-fidelity evaluations compared to 314 for gradient-based sequential quadratic programming (SQP). This approach optimized 11 design variables, including spline points for airfoil thickness and camber, while enforcing geometric constraints like a minimum thickness-to-chord ratio of 0.05, demonstrating DFO's efficiency in reducing computational costs by 78-96% over adjoint-based alternatives.^[22]^[22] In structural optimization, DFO addresses problems coupled with FEA for evaluating mechanical responses under loads, where each simulation can take hours due to mesh complexity. For instance, model-based DFO methods like radial basis function (RBF) interpolation have been employed to minimize vertical displacement in arc-shaped space frames, using Autodesk Robot Structural Analysis for black-box evaluations; RBF-linear models outperformed genetic algorithms and direct search methods like DIRECT, highlighting DFO's suitability for expensive, nonlinear structural simulations in architectural engineering. These applications often involve hundreds to thousands of FEA runs, underscoring DFO's value in handling high-dimensional problems without derivative approximations.^[23]^[23] DFO finds extensive use in automotive engineering for crash simulations, where finite element models predict energy absorption and intrusion during impacts. Evolutionary strategies like covariance matrix adaptation evolution strategy (CMA-ES) have been applied to surrogate models representative of full-vehicle FE simulations across 20 crash scenarios (e.g., side, frontal, roof), to balance crashworthiness, weight, and durability; this derivative-free approach enables robust exploration of multimodal landscapes, with surrogate models reducing the need for repeated expensive evaluations.^[24] In aerospace, DFO supports wing design, as seen in Boeing's adoption of filter pattern search in the early 2000s for multidisciplinary planform optimization of large and small aircraft wings. Implemented via the FOCUS software, this method optimized mission range under constraints without penalties or derivatives, integrating surrogate models to accelerate convergence and yielding economically viable designs that became standard for Boeing's planform processes.^[24]^[25] A key advantage of DFO in CFD-heavy engineering tasks is the elimination of adjoint sensitivity computations, providing robustness against local minima and flexibility with unstructured grids.^[26] Furthermore, DFO seamlessly integrates with frameworks like Sandia's DAKOTA toolkit, which embeds methods such as pattern search and DIRECT for engineering simulations; for example, DAKOTA has optimized compliant microelectromechanical systems (MEMS) switches using 13 design variables to enhance reliability under uncertainty, coupling DFO with Gaussian processes for efficient global searches.^[26]^[27] DFO also accommodates mixed-integer variables common in engineering, such as discrete material choices or topology selections alongside continuous geometries, through hybrid algorithms that branch on integers within continuous searches. These capabilities position DFO as indispensable for simulation-driven engineering, from initial concept to detailed design.

Machine learning and data science

Derivative-free optimization (DFO) plays a pivotal role in machine learning and data science, particularly for tasks involving expensive, noisy, or black-box objective functions where gradients are unavailable or impractical to compute. In hyperparameter optimization, DFO methods such as Sequential Model-based Algorithm Configuration (SMAC), which employs random forest surrogates to guide the search, and Hyperopt, utilizing tree-structured Parzen estimators, have been widely adopted for tuning neural networks and other models.^[28]^[29] These approaches efficiently explore high-dimensional spaces by sequentially selecting promising configurations based on surrogate models, outperforming grid search in scenarios with limited evaluation budgets. Bayesian optimization, a prominent DFO technique, has emerged as the dominant method for hyperparameter tuning in production environments, as exemplified by Google Vizier, a service that integrates Gaussian processes and other surrogates to optimize black-box functions with noisy evaluations like cross-validation scores. This method excels in handling the stochasticity inherent in machine learning evaluations, such as k-fold cross-validation, by modeling uncertainty and focusing on acquisition functions that balance exploration and exploitation. From 2020 to 2025, DFO integration in automated machine learning (AutoML) frameworks has seen significant growth, with libraries like Optuna enabling scalable hyperparameter optimization through diverse samplers including Gaussian processes and covariance matrix adaptation evolution strategy.^[30] In neural architecture search (NAS), DFO facilitates the exploration of discrete architectural spaces under constrained evaluation budgets, often using Bayesian optimization or evolutionary algorithms to select promising network topologies without relying on differentiable relaxations. Similarly, in reinforcement learning, DFO methods are applied to policy tuning, optimizing hyperparameters like learning rates or entropy coefficients in actor-critic algorithms, where direct policy optimization via zeroth-order techniques avoids gradient instabilities in high-dimensional action spaces. Key advantages of DFO in these contexts include scalability to moderately high dimensions through Gaussian process surrogates with inducing points or random Fourier features, and regret minimization frameworks that ensure cumulative performance improvements in sequential decision-making settings. For instance, tuning the regularization parameter C and kernel coefficient \gamma for a support vector machine (SVM) via Bayesian optimization involves evaluating the objective—typically the negative cross-validation accuracy—across k-fold splits to account for noise, often requiring fewer than 100 trials to approach expert-level performance on datasets like MNIST.^[29]

Evaluation and benchmarks

Test function suites

Test function suites provide standardized collections of benchmark problems for evaluating derivative-free optimization algorithms, ensuring reproducible comparisons across methods by simulating black-box scenarios where only function evaluations are available.^[31] These suites typically include a diverse set of functions that capture common challenges in optimization, such as multimodality, ill-conditioning, separability, and noise, often scalable to various dimensions to assess performance in low- to high-dimensional spaces.^[32] The black-box optimization benchmarking (BBOB) suite, introduced in 2009 and maintained through ongoing workshops, exemplifies this approach with 24 noiseless, single-objective functions defined in dimensions from 2 to 40, including variants for separable, ill-conditioned, and multimodal landscapes.^[33] BBOB also features a noisy counterpart (bbob-noisy) with 30 functions incorporating additive Gaussian or search-centric noise to test robustness. The COCO (COmparing Continuous Optimizers) platform, developed in the early 2010s, automates experimentation on these suites by providing tools for observer-based evaluation, where performance is measured via the number of function evaluations needed to achieve predefined target precisions, independent of problem dimension.^[31] This setup facilitates fair comparisons by normalizing for scaling and focusing on anytime performance profiles. Recent extensions include the bbob-largescale suite (introduced 2019) for dimensions 20, 40, 80, 160, 320, and 640, addressing high-dimensional challenges, and constrained variants like bbob-constrained (released 2022) for problems with nonlinear inequalities.^[34] The BBOB workshop series continues annually; for example, the 2024 edition at GECCO introduced the many-affine BBOB (MA-BBOB) suite, extending the classic functions for testing anytime algorithms on composed affine problems.^[35] For broader testing beyond black-box single-objective cases, the CUTEst (Constrained and Unconstrained Testing Environment) suite offers over 1300 problems, including both constrained and unconstrained nonlinear functions drawn from real-world applications, suitable for derivative-free methods that can handle gradients if available but evaluated in black-box mode.^[36] CUTEst supports dynamic memory allocation and thread-safe implementations, enabling large-scale testing on diverse problem classes like bound-constrained and equality-constrained optimizations.^[37] Multi-objective extensions are covered by the bbob-biobj suite, released in 2016, which comprises 55 bi-objective functions derived by linearly combining pairs from the original BBOB functions, scalable in dimensions 2 to 40 and evaluated on the COCO platform to assess Pareto front approximation quality.^[38] A classic example within these suites is the Rosenbrock function, a multimodal quadratic test problem adapted for black-box use in BBOB (as function f3), where the objective is to minimize a banana-shaped valley without relying on its known gradient form, highlighting challenges in escaping local minima.^[32]

Performance metrics and comparisons

Performance in derivative-free optimization is typically assessed using metrics that quantify efficiency in terms of function evaluations, as these represent the primary computational cost in black-box settings. A key measure is the expected runtime, defined as the number of function evaluations required to achieve ε-accuracy, where ε is a predefined tolerance for the objective function value or optimality condition.^[39] This metric emphasizes the solver's ability to converge within a limited evaluation budget, which is critical for expensive simulations. Another fundamental metric is the success rate, computed over multiple bootstrap runs to account for stochasticity in noisy or randomized algorithms, indicating the proportion of trials that reach the target accuracy.^[40] Anytime performance profiles, introduced by Dolan and Moré, provide a visual and quantitative comparison across a suite of problems by plotting the cumulative distribution of performance ratios. The performance ratio for an algorithm f on problem p is given by

\tau_f(p) = \frac{\# \text{ evaluations for algorithm } f \text{ to target}}{\# \text{ evaluations for the best algorithm on } p},

where values below 1 indicate superior performance relative to the best solver.^[41] These profiles highlight scalability with evaluation budgets and are widely used in benchmarking, often revealing trade-offs in robustness versus speed. Data profiles extend this by focusing on success rates under budget constraints, particularly useful for derivative-free solvers.^[39] Empirical comparisons on benchmarks like the Black-Box Optimization Benchmarking (BBOB) suite demonstrate distinct strengths between direct search and model-based methods. Model-based approaches, such as trust-region interpolation methods, generally outperform direct search methods like pattern search on smooth, low-dimensional problems by leveraging surrogate models for faster convergence, as evidenced in evaluations on CUTEr test problems where model-based solvers required fewer evaluations.^[39] For instance, in noiseless BBOB settings, algorithms like NEWUOA (model-based) achieve higher success rates on unimodal functions compared to direct search variants like DMS.^[42] Conversely, direct search methods exhibit better noise tolerance, maintaining reliability in stochastic environments where model accuracy degrades.^[40] Dimensionality scaling poses challenges for both classes, with performance degrading rapidly beyond 10-20 variables due to the curse of dimensionality, though model-based methods scale better on separable smooth functions.^[10] Recent surveys from 2019-2020 confirm these trends, noting model-based superiority in smooth cases across comprehensive tests.^[40]^[10] Software libraries facilitate such comparisons; for example, NLopt implements both COBYLA (model-based) and DIRECT (direct search), with benchmarks showing COBYLA's edge on smooth objectives in a review of 22 algorithms on 502 problems.^[43] PyMOO, focused on multi-objective extensions, integrates similar DFO solvers and reports consistent patterns in empirical profiles.^[44]

References

[1]
Introduction to Derivative-Free Optimization
This book is the first contemporary comprehensive treatment of optimization without derivatives, and it covers most of the relevant classes of algorithms.
[2]
[PDF] Derivative-free optimization methods - UC Davis Math
We provide a review and perspectives on developments in these methods, with an emphasis on high- lighting recent developments and on unifying treatment of such ...
[3]
[PDF] DERIVATIVE-FREE OPTIMIZATION ALGORITHMS FOR ...
Our motivation for developing ORBIT is optimization of problems in Environmental. Engineering relying on complex numerical simulations of physical phenomena ...
[4]
Model-Based Derivative-Free Optimization Methods and Software
Oct 21, 2022 · These methods are motivated by optimization problems for which it is impossible or prohibitively expensive to access the first-order information ...
[5]
[PDF] DERIVATIVE-FREE OPTIMIZATION OF NOISY FUNCTIONS VIA ...
Mar 27, 2018 · by any solver within a given budget of 100×n function evaluations. ... 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000. Number of ...
[6]
https://doi.org/10.1093/comjnl/5.1.9
[7]
https://doi.org/10.1093/comjnl/7.2.155
[8]
https://doi.org/10.1093/comjnl/7.4.308
[9]
https://doi.org/10.1080/00401706.1962.10490033
[10]
Survey of derivative-free optimization
In this survey paper we present an overview of derivative-free optimization, including basic concepts, theories, derivative-free methods and some ...
[11]
Mesh Adaptive Direct Search Algorithms for Constrained Optimization
We introduce the mesh adaptive direct search (MADS) class of algorithms which extends the generalized pattern search (GPS) class by allowing local exploration.
[12]
The NEWUOA software for unconstrained optimization without ...
The NEWUOA software seeks the least value of a function F(x), x∈Rn, when F(x) can be calculated for any vector of variables x. The algorithm is iterative, ...
[13]
COCO: COmparing Continuous Optimizers – COmparing ...
The COCO platform has been used for the Black-Box-Optimization-Benchmarking (BBOB) workshops that took place during the GECCO conference in 2009, 2010, 2012 ...
[14]
Full-low evaluation methods for derivative-free optimization - arXiv
We propose a new class of rigorous methods for derivative-free optimization with the aim of delivering efficient and robust numerical ...
[15]
Model-and-search: a derivative-free local optimization algorithm
May 11, 2025 · Model-and-Search (MAS) is a local-search derivative-free optimization (DFO) algorithm that addresses a bound-constrained optimization problem.
[16]
Convergence of Trust-Region Methods Based on Probabilistic Models
In this paper we prove global convergence for first- and second-order stationary points of a class of derivative-free trust-region methods for unconstrained ...
[17]
On the Oracle Complexity of First-Order and Derivative-Free ...
C. Cartis, N. I. M. Gould, and Ph. L. Toint (2010), On the complexity of steepest descent, Newton's and regularized Newton's methods for nonconvex unconstrained ...
[18]
Trust-Region Methods Without Using Derivatives: Worst Case ...
In this paper, we start by analyzing the worst case complexity of general trust-region derivative-free methods for smooth functions. For the nonsmooth case, we ...
[19]
Derivative-Free Optimization via Adaptive Sampling Strategies - arXiv
Apr 18, 2024 · In this paper, we present a novel derivative-free optimization framework for solving unconstrained stochastic optimization problems.Missing: noisy bounds 2023 2025
[20]
https://arxiv.org/abs/2404.11893
[21]
[PDF] Multifidelity Analysis and Optimization for Supersonic Design
SMF is a derivative-free pattern-search method augmented with a pre- diction of a locally optimal design from a surrogate model. The underlying pattern-search.
[22]
[PDF] Derivative-free Methods for Structural Optimization - CumInCAD
These types of tools can simulate a building's struc- tural behavior given a specific load using finite ele- ment analysis. Similarly, they can compute the use-.
[23]
Generating Cheap Representative Functions for Expensive ...
Jun 8, 2024 · Solving real-world engineering optimization problems, such as automotive crashworthiness optimization, is extremely challenging, ...
[24]
[PDF] Optimization Tools for Engineering Design Using Surrogate Functions
Nov 30, 2000 · "Derivative-free optimization for general constrained nonlinear prob- lems". ... About a year ago, we provided Boeing with the design ...Missing: multidisciplinary | Show results with:multidisciplinary
[25]
[PDF] Comparison between Gradient-free and Adjoint Based Aerodynamic ...
This paper describes investigations of planform and shape optimizations for a flying wing transport aircraft with an Euler continuous adjoint method.
[26]
[PDF] Derivative-free Optimization Methods in DAKOTA with Applications
Unified software infrastructure: reuse tools and common interfaces; integrate commercial, open-source, and research algorithms.
[27]
[PDF] Derivative-free Methods for Mixed-Integer Constrained Optimization ...
In this paper, we propose derivative-free algorithms for solving continuously differentiable Mixed Integer NonLinear Programming problems with general nonlinear ...
[28]
Surrogate-Based Aerodynamic Shape Optimization by Variable ...
The pattern-search algorithm, a derivative-free optimization method [32], is used in the surrogate model optimization. The design variables are the airfoil ...
[29]
[PDF] Sequential Model-Based Optimization for General Algorithm ...
SMBO framework: Sequential Model-based Algorithm Configuration (SMAC). SMAC can be understood as an extension of ROAR that selects configurations based on a ...
[30]
Practical Bayesian Optimization of Machine Learning Algorithms
This paper uses Bayesian optimization with a Gaussian process to automatically tune machine learning algorithms, achieving results exceeding expert-level ...
[31]
Hyperparameter optimization: Foundations, algorithms, best ...
Jan 16, 2023 · Our focus is on providing a general overview of HPO without a special focus on concrete ML model classes. However, since the ML field has many ...Abstract · HYPERPARAMETER... · PIPELINING... · PRACTICAL ASPECTS OF HPO
[32]
COCO: A Platform for Comparing Continuous Optimizers in a Black ...
Mar 29, 2016 · COCO is an open-source platform for comparing continuous optimizers in a black-box setting, aiming to automate benchmarking of numerical ...
[33]
bbob - COCO Home
The blackbox optimization benchmarking (bbob) test suite is COCO's standard test suite with 24 noiseless, single-objective and scalable test functions.
[34]
Black-box optimization benchmarking - ACM Digital Library
Black-box optimization benchmarking: results for the BayEDAcG algorithm on the noisy function testbed. Author: Marcus R. Gallagher.
[35]
The Large Scale Black-Box Optimization Benchmarking (bbob ...
Mar 15, 2019 · The bbob-largescale test suite contains 24 single-objective functions in continuous domain, extending the bbob test suite to large dimensions.
[36]
[PDF] CUTEst: a Constrained and Unconstrained Testing Environment ...
May 5, 2013 · CUTEst is a constrained and unconstrained testing environment with dynamic memory allocation, thread-safe Fortran, and a modern thread-safe ...
[37]
ralna/CUTEst: The Constrained and Unconstrained Testing ... - GitHub
CUTEst. The Constrained and Unconstrained Testing Environment with safe threads (CUTEst) for optimization software. See the wiki for download and installation ...
[38]
[PDF] The Bi-objective Black Box Optimization Benchmarking (bbob-biobj ...
Apr 1, 2016 · The bbob-biobj test suite contains 55 bi-objective functions in continuous domain which are derived from combining functions of the ...
[39]
Benchmarking Derivative-Free Optimization Algorithms
We propose data profiles as a tool for analyzing the performance of derivative-free optimization solvers when there are constraints on the computational budget.
[40]
[1904.11585] Derivative-free optimization methods - arXiv
Apr 25, 2019 · Derivative-free optimization methods are used when objective functions lack derivative information, often from a black-box or simulation oracle.Missing: 2020-2025 | Show results with:2020-2025
[41]
Benchmarking optimization software with performance profiles
distribution functions for a performance metric — as a tool for benchmarking and comparing optimization software.
[42]
DMS and MultiGLODS: black-box optimization benchmarking of two ...
We propose a novel multiobjective derivative-free methodology, calling it direct ... This paper presents results of the BBOB-2009 benchmarking of 31 search ...
[43]
Derivative-free optimization: a review of algorithms and comparison ...
Jul 12, 2012 · The paper presents a review of derivative-free algorithms, followed by a systematic comparison of 22 related implementations using a test set of 502 problems.
[44]
pymoo: Multi-objective Optimization in Python
Our framework offers state of the art single- and multi-objective optimization algorithms and many more features related to multi-objective optimization.Algorithms · Getting Started · NSGA-II: Non-dominated... · ProblemsMissing: NLopt comparisons derivative- free