Surrogate model

A surrogate model, also known as a metamodel or emulator, is a computationally inexpensive mathematical model designed to replicate the input-output behavior of a more complex and resource-intensive model, often constructed from a limited set of data samples or simulations. These models are particularly valuable in fields where direct evaluations of the original model—such as high-fidelity simulations in engineering or scientific computing—are prohibitively slow or costly, enabling faster analysis without significant loss of accuracy. Surrogate models find widespread application in optimization, uncertainty quantification, sensitivity analysis, and real-time decision-making across disciplines including hydrology, water resources, and aerospace design. For instance, in groundwater modeling, they approximate distributed flow simulations to facilitate model calibration and scenario testing that would otherwise require extensive computational resources. In optimization contexts, surrogate-based approaches integrate these approximations into iterative algorithms to efficiently search for optimal designs or parameters, often achieving speedups of orders of magnitude while maintaining near-optimal solutions. Common types of surrogate models include data-driven methods, which build empirical relationships from input-output pairs using techniques such as polynomial response surfaces, Gaussian process regression (Kriging), neural networks, and support vector machines; projection-based methods, which reduce model dimensionality by projecting governing equations onto lower-dimensional subspaces via approaches like proper orthogonal decomposition; and multifidelity methods, which leverage models of varying resolution or simplified physics to balance accuracy and efficiency. The choice of method depends on the problem's characteristics, with data-driven surrogates being prevalent due to their flexibility but potentially limited by sampling requirements, while physics-informed variants incorporate governing physical laws for improved generalization. Key advantages of surrogate models encompass drastic reductions in computational time—sometimes by factors of 10,000 or more—enhanced tractability of design exploration and uncertainty analysis, and the ability to handle high-dimensional spaces that challenge full models. However, limitations include potential inaccuracies in unsampled regions of the input space and the need for careful validation to ensure fidelity to the original model. Ongoing advancements, such as hybrid techniques combining machine learning with physical principles, continue to expand their utility in complex, data-scarce environments.

Fundamentals

Definition

A surrogate model is a computationally efficient approximation of a high-fidelity, expensive-to-evaluate system or simulation model, designed to mimic its input-output behavior using limited data from the original model. These models replace complex black-box functions, such as those arising in engineering simulations, with simpler representations that enable rapid predictions while preserving essential characteristics of the true system. Key characteristics of surrogate models include their significantly lower computational cost compared to the original model, often requiring only tens of evaluations rather than thousands to construct and use. They can be either statistical, incorporating uncertainty estimates, or deterministic approximations, and are typically built from sparse sampled data points obtained through design-of-experiments techniques. This construction assumes underlying continuity and smoothness in the true model's response, allowing the surrogate to interpolate or regress effectively across the input domain. For instance, in aerospace engineering, the true model might be a full computational fluid dynamics (CFD) simulation of airflow over an airfoil, which demands extensive resources, whereas the surrogate could be a simplified polynomial model fitted to a handful of CFD outputs to approximate drag coefficients. Such approximations facilitate iterative processes like optimization by filtering noise and providing quick evaluations, though their accuracy depends on the quality and quantity of the training data from the high-fidelity model. This approach presupposes familiarity with foundational modeling concepts in engineering and scientific disciplines, where systems are often represented as functions mapping inputs to outputs.

Goals

Surrogate models are primarily employed to reduce the computational expense associated with repeated evaluations of complex, high-fidelity simulations in engineering and scientific workflows. By providing approximate representations of these expensive models, surrogates enable faster iterations during design exploration and optimization processes, where direct evaluations of the true model—such as those involving computational fluid dynamics or finite element analysis—would otherwise be prohibitive due to time and resource constraints. This approximation allows practitioners to perform numerous assessments efficiently, facilitating the identification of promising designs without exhaustive simulations. Key benefits include accelerating simulations for real-time applications, approximating black-box functions whose internal mechanisms are inaccessible, and managing high-dimensional input spaces that exacerbate computational demands. For instance, in optimization tasks, surrogate models can predict performance metrics across vast design spaces, enabling global search strategies that converge on near-optimal solutions using only tens of evaluations rather than thousands. They also support stochastic simulations by offering rapid approximations that incorporate uncertainty, thus aiding in reliability assessments and decision-making under variability. These advantages stem from the surrogate's role as a data-driven "curve fit," which filters noise and emulates underlying relationships with sufficient fidelity for practical use. However, deploying surrogate models involves inherent trade-offs, particularly in balancing accuracy against efficiency. While they achieve substantial speedups—often orders of magnitude faster than the original model—they may introduce approximation errors that require careful validation to ensure reliability in critical applications. Additionally, some surrogate approaches, such as neural network-based models, can sacrifice physical interpretability, obscuring insights into causal relationships that more transparent methods like response surfaces preserve. Surrogates are thus most suitable for scenarios like expensive finite element analyses in structural design or stochastic simulations under uncertainty, where the gains in computational efficiency outweigh potential losses in precision or explainability, provided the model is tuned appropriately.

Historical Development

The origins of surrogate modeling trace back to the development of response surface methodology (RSM) in statistics and design of experiments, introduced by George E. P. Box and K. B. Wilson in their seminal 1951 paper, which proposed polynomial approximations to model and optimize responses from physical experiments. This approach laid the groundwork for using simplified mathematical representations to approximate complex systems, initially focused on industrial and chemical process applications. Box's subsequent contributions, including collaborations on sequential experimentation, further solidified RSM as a foundational technique for empirical modeling. A significant advancement occurred in the 1960s with the introduction of Kriging in geostatistics by Georges Matheron, who formalized it as an optimal spatial interpolation method based on spatial correlations, honoring the earlier work of D. G. Krige. Kriging extended surrogate concepts to handle uncertainty in spatial data, particularly in mining and earth sciences. By the late 1980s, these ideas were adapted to deterministic computer experiments through the work of Sacks et al. in 1989, who applied Kriging predictors to model outputs from simulation codes, emphasizing space-filling designs for efficient approximation. The 1990s marked the expansion of surrogate models into engineering optimization, driven by increasing computational capabilities that enabled their integration with multidisciplinary design problems, such as those in aerospace and structural engineering. In the 2000s, surrogates were increasingly coupled with evolutionary algorithms to address expensive black-box optimizations, as surveyed by Jin in 2005, enhancing global search efficiency in fields like mechanical design. The influential textbook Engineering Design via Surrogate Modelling by Forrester et al. in 2008 synthesized these developments, providing practical guidance on surrogate construction for engineering applications and highlighting Kriging and response surfaces as core methods. Post-2010, surrogate modeling has seen a surge in machine learning-based approaches, including neural networks and Gaussian processes, to handle high-dimensional data in simulations. Recent trends up to 2025 emphasize hybrid models that incorporate physical laws into learning frameworks, such as the physics-informed neural networks introduced by Raissi et al. in 2019, improving accuracy and generalization in scientific computing. AI-driven construction techniques, like adaptive sampling via Bayesian optimization, have further automated surrogate building for real-time applications in engineering and environmental modeling.

Mathematical Framework

Approximation Principles

Surrogate models approximate complex, computationally expensive functions f(\mathbf{x}) by constructing a simpler model \hat{f}(\mathbf{x}) that mimics the input-output relationship of the original model based on a limited set of sampled points. This enables efficient evaluation for tasks such as optimization and uncertainty quantification, where direct evaluations of f(\mathbf{x}) are prohibitive. A fundamental distinction in surrogate modeling lies between interpolation and regression approaches. Interpolating surrogates, such as certain Kriging models, exactly pass through all training data points, assuming the data is noise-free and seeking perfect fidelity at observed locations. In contrast, regressing surrogates, like polynomial response surfaces, provide a smoothed fit that minimizes overall error across the data, accommodating noise or outliers by not requiring exact passage through points. The approximation is typically achieved through linear combinations of basis functions, expressed in the general form \hat{f}(\mathbf{x}) = \sum_{i=1}^{m} \beta_i \phi_i(\mathbf{x}), where \phi_i(\mathbf{x}) are the basis functions (e.g., polynomials, radial basis functions like Gaussians \phi_i(\mathbf{x}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{c}_i\|^2}{2\sigma^2}\right), or kernels) and \beta_i are coefficients determined by fitting to the data. Polynomials offer global approximations suitable for smooth functions, while radial basis functions provide local flexibility for handling discontinuities. In high-dimensional spaces, effective approximation requires careful selection of training data via space-filling designs of experiments, such as Latin hypercube sampling, which ensures uniform coverage of the input domain to capture the function's behavior comprehensively without clustering. This approach mitigates the curse of dimensionality by distributing points to maximize information gain for the surrogate construction.
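The basis-function form above can be made concrete in a few lines. The following minimal Python sketch (assuming a hypothetical two-dimensional test function standing in for an expensive model) builds an interpolating Gaussian radial-basis surrogate on a Latin hypercube design; the kernel width sigma is an illustrative choice, not a tuned value.

```python
import numpy as np
from scipy.stats import qmc

# Hypothetical cheap stand-in for the true, expensive model f(x).
def f(X):
    return np.sin(3 * X[:, 0]) + X[:, 1] ** 2

# Space-filling training data via Latin hypercube sampling in [0, 1]^2.
sampler = qmc.LatinHypercube(d=2, seed=0)
X_train = sampler.random(n=30)
y_train = f(X_train)

sigma = 0.3  # assumed Gaussian basis width (illustrative)

def phi(X, centers):
    # Gaussian radial basis functions centered at the training points.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

# Interpolating surrogate: solve Phi @ beta = y for the coefficients.
Phi = phi(X_train, X_train)
beta = np.linalg.solve(Phi, y_train)

def f_hat(X):
    return phi(X, X_train) @ beta  # sum_i beta_i * phi_i(x)

X_test = sampler.random(n=5)
print(np.c_[f(X_test), f_hat(X_test)])  # true vs. surrogate values
```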

Error Metrics and Validation

Assessing the accuracy and reliability of surrogate models is essential to ensure they faithfully approximate the underlying high-fidelity model while generalizing to unseen inputs. Common error metrics quantify the discrepancy between the surrogate predictions \hat{f}(x_i) and the true function values f(x_i) over a set of n evaluation points. The mean squared error (MSE) measures the average squared difference, defined as \text{MSE} = \frac{1}{n} \sum_{i=1}^n (f(x_i) - \hat{f}(x_i))^2, providing a scale-sensitive assessment that penalizes larger errors more heavily. The root mean squared error (RMSE), the square root of the MSE, offers an interpretable metric in the original units of the response, often used to compare surrogate performance across datasets. The maximum absolute error captures the worst-case deviation, \max_i |f(x_i) - \hat{f}(x_i)|, which is particularly valuable in applications where peak inaccuracies can lead to critical failures. Goodness-of-fit measures evaluate how well the surrogate explains the variability in the training data. The coefficient of determination, R^2 = 1 - \frac{\sum_{i=1}^n (f(x_i) - \hat{f}(x_i))^2}{\sum_{i=1}^n (f(x_i) - \bar{f})^2}, where \bar{f} is the mean of the observed values, indicates the proportion of variance captured by the model, with values closer to 1 signifying better fit; however, it can overestimate performance on training data alone. An adjusted R^2 variant accounts for model complexity by penalizing additional parameters, making it suitable for comparing surrogates of varying forms. Validation techniques estimate the surrogate's predictive performance beyond the training set, mitigating risks of poor generalization. Cross-validation partitions the data into k folds, training on k-1 folds and testing on the held-out fold, with the process repeated k times to compute an average error such as the predicted residual error sum of squares (PRESS); k=10 is common for balance between bias and variance. Leave-one-out (LOO) cross-validation, a special case with k=n, is computationally intensive but effective for small datasets typical in expensive simulations, yielding a nearly unbiased RMSE estimate. Hold-out validation splits data into distinct training and test sets (e.g., 70/30 ratio), providing a generalization estimate but sensitive to partition randomness. Bootstrap resampling generates multiple training sets by sampling with replacement, evaluating out-of-bag samples to approximate error distributions and confidence intervals, useful when data is limited. Challenges in validation arise particularly in high-dimensional spaces, where the curse of dimensionality exacerbates data sparsity, leading to unreliable error estimates. Overfitting is prevalent, as flexible surrogates like neural networks capture noise rather than underlying patterns, inflating training metrics like R^2 while degrading out-of-sample performance; regularization and cross-validation help detect this.
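As an illustration of these metrics and of leave-one-out cross-validation, the sketch below (with synthetic data and a simple linear least-squares surrogate standing in for an expensive model) computes MSE, RMSE, maximum absolute error, and R^2 exactly as defined above; the fit/predict callables are hypothetical placeholders for any surrogate type.

```python
import numpy as np

def error_metrics(y_true, y_pred):
    # MSE, RMSE, max absolute error, and R^2 as defined in the text.
    resid = y_true - y_pred
    mse = np.mean(resid ** 2)
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MSE": mse, "RMSE": np.sqrt(mse),
            "max|e|": np.max(np.abs(resid)), "R2": r2}

def loo_rmse(X, y, fit, predict):
    # Leave-one-out CV (k = n): refit n times, each holding out one point.
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        model = fit(X[mask], y[mask])
        preds[i] = predict(model, X[i:i + 1])[0]
    return np.sqrt(np.mean((y - preds) ** 2))

# Hypothetical data and a linear least-squares surrogate.
rng = np.random.default_rng(0)
X = rng.uniform(size=(20, 2))
y = X @ np.array([1.5, -2.0]) + 0.1 * rng.normal(size=20)
fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict = lambda beta, X: X @ beta

print(error_metrics(y, predict(fit(X, y), X)))  # in-sample fit quality
print("LOO RMSE:", loo_rmse(X, y, fit, predict))  # out-of-sample estimate
```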

Types of Surrogate Models

Response Surface Models

Response surface models, also known as response surface methodology (RSM) in the context of surrogate modeling, employ low-order polynomial functions to approximate the behavior of complex, computationally expensive systems. These models represent the response variable as a polynomial function of the input variables, typically limited to first- or second-order terms to balance accuracy and simplicity. Originating from design of experiments (DOE), RSM was pioneered by Box and Wilson in 1951 to efficiently attain optimal conditions in experimental processes by fitting polynomials to observed data. The core formulation of a response surface model involves regressing the polynomial approximation \hat{f}(\mathbf{x}) against training data points obtained from the high-fidelity model. A first-order variant, suitable for linear approximations, takes the form \hat{f}(\mathbf{x}) = \beta_0 + \sum_{i=1}^d \beta_i x_i, where \beta_0 is the intercept, \beta_i are the linear coefficients, and d is the number of input variables. Second-order models extend this to include quadratic and interaction terms for capturing nonlinearity: \hat{f}(\mathbf{x}) = \beta_0 + \sum_{i=1}^d \beta_i x_i + \sum_{i=1}^d \beta_{ii} x_i^2 + \sum_{i<j} \beta_{ij} x_i x_j, with coefficients \boldsymbol{\beta} estimated via least squares minimization on data from structured designs like central composite or Box-Behnken. These variants enable the model to represent main effects, curvatures, and cross-variable interactions, making them particularly useful in classical DOE for process optimization. Response surface models offer several advantages in surrogate applications, including their mathematical simplicity, which allows for straightforward interpretation of coefficients that quantify variable influences, and rapid evaluation times due to the low computational overhead of polynomial arithmetic. They are especially effective for unimodal, moderately nonlinear responses where the functional form is reasonably well-understood, facilitating efficient global optimization in engineering design. Despite these strengths, response surface models have notable limitations, particularly their ineffectiveness in approximating non-smooth functions or those exhibiting high nonlinearity, where higher-order terms may be needed but lead to instability. A key issue is Runge's phenomenon, where high-degree polynomials oscillate wildly near the boundaries of the input domain, degrading extrapolation accuracy. Additionally, they suffer from the curse of dimensionality, requiring an impractically large number of training points—on the order of (d+1)(d+2)/2 for second-order models—to avoid underfitting in high dimensions. These constraints often make them less suitable for complex, multimodal surrogate tasks compared to more flexible methods.
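A second-order response surface reduces to ordinary least squares on an expanded design matrix. The sketch below (using synthetic data in place of a high-fidelity model) assembles the (d+1)(d+2)/2 quadratic basis columns and estimates the coefficients \beta; it illustrates the formulation above rather than any particular RSM package.

```python
import numpy as np
from itertools import combinations

def quadratic_design_matrix(X):
    # Columns: 1, x_i, x_i^2, and x_i * x_j (i < j) — (d+1)(d+2)/2 terms.
    n, d = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(d)]
    cols += [X[:, i] ** 2 for i in range(d)]
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(d), 2)]
    return np.column_stack(cols)

# Hypothetical training data standing in for expensive model outputs.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(25, 3))
y = 2 + X[:, 0] - 0.5 * X[:, 1] ** 2 + X[:, 0] * X[:, 2]

# Least-squares estimate of the coefficient vector beta.
A = quadratic_design_matrix(X)
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

def rsm_predict(X_new):
    return quadratic_design_matrix(X_new) @ beta

print(rsm_predict(rng.uniform(-1, 1, size=(3, 3))))
```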

Kriging and Gaussian Processes

Kriging, also known as Gaussian process regression in a statistical context, serves as a geostatistical surrogate modeling technique that provides the best linear unbiased predictor (BLUP) for interpolating values at unobserved points based on observed data, assuming a Gaussian process prior. This approach models the underlying function as a random process with a mean function \mu(\mathbf{x}) and a covariance function k(\mathbf{x}, \mathbf{x}'), enabling probabilistic predictions that account for spatial correlations in the data. In surrogate modeling, Kriging is particularly valued for approximating expensive computer simulations by treating them as realizations of a stochastic process, as introduced in the seminal work on the design and analysis of computer experiments. The ordinary Kriging predictor at an unobserved point \mathbf{x} is given by \hat{f}(\mathbf{x}) = \sum_{i=1}^n \lambda_i f(\mathbf{x}_i), where f(\mathbf{x}_i) are the observed function values at training points \mathbf{x}_i, and the weights \lambda_i are determined by solving a system of linear equations derived from the covariance structure to ensure unbiasedness and minimum variance. Specifically, the weights satisfy \sum_{i=1}^n \lambda_i = 1 for unbiasedness under the assumption of a constant unknown mean, and they minimize the prediction variance through the covariance structure, often represented via a variogram or covariance function. Kernel functions, or covariance functions, define the correlation structure in Kriging and Gaussian processes, controlling the smoothness and flexibility of the surrogate. Common choices include the squared exponential kernel k(\mathbf{x}, \mathbf{x}') = \sigma^2 \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2\ell^2}\right), which yields infinitely differentiable smooth functions; the exponential kernel k(\mathbf{x}, \mathbf{x}') = \sigma^2 \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|}{\ell}\right), corresponding to rougher, non-differentiable paths; and the Matérn family k(\mathbf{x}, \mathbf{x}') = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\sqrt{2\nu} \frac{\|\mathbf{x} - \mathbf{x}'\|}{\ell}\right)^\nu K_\nu\left(\sqrt{2\nu} \frac{\|\mathbf{x} - \mathbf{x}'\|}{\ell}\right), where \nu > 0 tunes smoothness from non-differentiable (\nu = 1/2, exponential) to smooth (\nu \to \infty, squared exponential). These kernels allow adaptation to the expected regularity of the target function, with Matérn often preferred in surrogate modeling for its balance of flexibility and computational tractability. A key advantage of Kriging surrogates is their ability to quantify prediction uncertainty through the variance \sigma^2(\mathbf{x}) = k(\mathbf{x}, \mathbf{x}) - \mathbf{k}^T \mathbf{K}^{-1} \mathbf{k}, where \mathbf{k} is the covariance vector between \mathbf{x} and the training points, and \mathbf{K} is the training covariance matrix, enabling reliable error assessment in optimization and design tasks. Additionally, they effectively capture spatial or functional correlations, making them suitable for problems with structured data dependencies, unlike purely deterministic interpolants. Variants of Kriging address different assumptions about the mean structure. Simple Kriging assumes a known constant mean, simplifying the equations by fixing the mean and removing the unbiasedness constraint on the weights, which is ideal when prior knowledge of the mean is available.
Universal Kriging extends this by incorporating a trend or drift model, such as a polynomial in the covariates, to handle non-stationary means, solving an augmented system of equations that includes the trend basis to detrend the data before applying the covariance-based prediction. These adaptations enhance applicability in scenarios with underlying trends, maintaining the BLUP property under the respective assumptions.
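In practice, libraries such as scikit-learn expose this machinery directly. The following minimal sketch fits a Gaussian process surrogate with a Matérn kernel (\nu = 2.5, an illustrative choice) to a toy one-dimensional function and queries both the Kriging mean and the predictive standard deviation discussed above.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Toy stand-in for an expensive simulation.
def f(x):
    return np.sin(5 * x).ravel()

X_train = np.linspace(0, 1, 8).reshape(-1, 1)
y_train = f(X_train)

# Matérn kernel with nu = 2.5 (twice-differentiable sample paths).
gp = GaussianProcessRegressor(kernel=Matern(length_scale=0.2, nu=2.5),
                              normalize_y=True)
gp.fit(X_train, y_train)

# Predictions with the Kriging variance for uncertainty assessment.
X_test = np.linspace(0, 1, 5).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)
print(np.c_[mean, std])  # std shrinks toward 0 near training points
```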

Machine Learning-Based Surrogates

Machine learning-based surrogate models leverage data-driven techniques to approximate complex simulations, offering flexibility beyond traditional parametric methods. Feedforward neural networks (NNs) serve as a core approach for regression tasks in surrogate modeling, where multiple layers of interconnected nodes learn nonlinear mappings from input features to output responses, enabling accurate predictions in high-dimensional spaces. Support vector regression (SVR), particularly with radial basis function (RBF) kernels, provides another foundational method by constructing a regression function that minimizes structural risk, effectively handling sparse data and providing robust approximations for engineering optimization problems. Deep learning extensions enhance these models for specialized data structures. Convolutional neural networks (CNNs) excel in surrogates for spatial data, such as fluid flow simulations, by capturing local patterns through convolutional filters and pooling layers, reducing computational demands while maintaining fidelity to underlying physics. Recurrent neural networks (RNNs), including long short-term memory (LSTM) variants, are suited for time-series surrogates, modeling sequential dependencies in dynamic systems like structural vibrations or process evolutions. Post-2020 advances have integrated physical knowledge to improve efficiency and generalizability. Physics-informed neural networks (PINNs) embed governing partial differential equations directly into the loss function during training, allowing surrogates to respect physical constraints even with limited data, as demonstrated in solving forward and inverse problems in fluid dynamics and other physical systems. Transfer learning for multi-fidelity surrogates pre-trains models on abundant low-fidelity simulations and fine-tunes with sparse high-fidelity data, accelerating convergence in applications like aerodynamic design. These ML-based surrogates offer distinct advantages, including superior handling of complex nonlinearities and high-dimensional inputs compared to earlier probabilistic methods like Kriging, which assume Gaussian processes. They scale effectively with large datasets from modern simulations, achieving prediction accuracies often exceeding 95% in benchmark engineering tasks while reducing evaluation times by orders of magnitude. However, challenges persist, such as the black-box nature limiting interpretability and trust in critical applications and the substantial training data and computational resources required for convergence, potentially offsetting gains in resource-constrained scenarios.
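As a minimal illustration, the sketch below trains a feedforward neural network surrogate on samples from a hypothetical nonlinear simulator using scikit-learn; the architecture and sample budget are illustrative assumptions rather than recommendations.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical expensive simulator with a nonlinear response.
def simulator(X):
    return np.sin(X[:, 0]) * np.exp(-X[:, 1] ** 2) + 0.5 * X[:, 2]

rng = np.random.default_rng(0)
X_train = rng.uniform(-2, 2, size=(500, 3))
y_train = simulator(X_train)

# Feedforward NN surrogate; input scaling helps convergence.
surrogate = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
)
surrogate.fit(X_train, y_train)

X_test = rng.uniform(-2, 2, size=(5, 3))
print(np.c_[simulator(X_test), surrogate.predict(X_test)])
```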

Construction Methods

Sampling Strategies

Sampling strategies are essential for generating the input data points used to train surrogate models, ensuring efficient coverage of the design space while minimizing the number of expensive evaluations of the underlying simulation or model. These methods aim to produce a set of samples that allow the surrogate to approximate the true function accurately across its domain, particularly when computational resources limit the total number of simulations. Common approaches balance uniformity, randomness, and adaptability to the problem's structure. Latin Hypercube Sampling (LHS) is a widely adopted stratified random sampling technique that divides the range of each input variable into equally probable intervals and samples one point from each interval, then permutes the assignments across dimensions to ensure uniform coverage of the multidimensional design space. Introduced for analyzing computer code outputs, LHS provides better space-filling properties than simple random sampling by reducing variance in estimates and improving surrogate accuracy with fewer points. Other strategies include full factorial designs, which evaluate all combinations of discrete levels for each input and are suitable for low-dimensional problems (typically up to 3-5 variables) to exhaustively explore interactions without confounding. For higher dimensions, quasi-Monte Carlo methods using Sobol sequences generate low-discrepancy points that fill the space more evenly than pseudorandom numbers, leading to faster convergence in integration approximations and surrogate fitting. Adaptive sampling, in contrast, starts with an initial set of points and iteratively adds samples based on error estimates or uncertainty from the current surrogate, focusing on regions of high prediction variance or near optima. Space-filling criteria guide the selection of samples to maximize the minimum distance between points (maximin designs) or minimize correlations among inputs, promoting even distribution that enhances the robustness of surrogate models across the entire domain. These criteria are particularly useful in computer experiments where the goal is global approximation rather than local fitting. Key considerations in sampling include balancing exploration (broad coverage to avoid gaps) with exploitation (focusing on promising areas for refinement), especially in adaptive schemes, to optimize the trade-off between model accuracy and sampling cost. When constraints define feasible regions, strategies like constrained Latin hypercube sampling or boundary-respecting space-filling designs ensure samples remain within allowable bounds, preventing wasted evaluations on invalid points. Sample sizes depend on model complexity, dimensionality, and desired accuracy, with smaller sets often sufficing for low-dimensional or smooth functions and larger ones required for more challenging cases.
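The space-filling differences among these strategies can be checked numerically. The sketch below (dimension and budget are arbitrary illustrative choices) draws Latin hypercube, Sobol, and plain pseudorandom samples with SciPy's qmc module and compares them under the maximin criterion described above.

```python
import numpy as np
from scipy.stats import qmc
from scipy.spatial.distance import pdist

d, n = 4, 64  # dimension and sample budget (illustrative values)

# Latin hypercube: one point per stratum in each coordinate.
lhs = qmc.LatinHypercube(d=d, seed=0).random(n)

# Sobol low-discrepancy sequence (n a power of 2 preserves balance).
sobol = qmc.Sobol(d=d, seed=0).random(n)

# Plain pseudorandom sampling for comparison.
mc = np.random.default_rng(0).uniform(size=(n, d))

# Maximin space-filling criterion: a larger minimum pairwise
# distance indicates more even coverage of the unit hypercube.
for name, pts in [("LHS", lhs), ("Sobol", sobol), ("MC", mc)]:
    print(f"{name:6s} min pairwise distance: {pdist(pts).min():.4f}")
```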

Fitting and Optimization Techniques

Fitting surrogate models to sampled data is a critical step in their construction, involving the calibration of model parameters to approximate the underlying high-fidelity function as accurately as possible. This process typically employs optimization algorithms tailored to the surrogate's mathematical form, balancing computational efficiency with approximation quality. For parametric surrogates like polynomials, fitting often relies on closed-form solutions, while probabilistic models such as Gaussian processes require iterative maximization of likelihood functions. In more complex scenarios, especially with machine learning-based surrogates, advanced optimization techniques address non-convexity and high-dimensional parameter spaces. For polynomial-based surrogate models, such as response surface approximations, the least squares method is the standard fitting technique. This approach minimizes the sum of squared errors between the observed responses y_i and the model's predictions \hat{f}(x_i) at the sampled points x_i, expressed as \min_{\beta} \sum_{i=1}^n \left( y_i - \hat{f}(x_i; \beta) \right)^2, where \beta are the coefficients. The optimization is solved analytically using linear algebra: the design matrix \mathbf{X} (containing basis function evaluations) yields the normal equations \mathbf{X}^T \mathbf{X} \beta = \mathbf{X}^T \mathbf{y}, providing an exact solution \beta = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}. This method is computationally efficient for low-degree polynomials and assumes a linear relationship in the parameter space, making it suitable for global approximations in engineering design. Gaussian process (GP) surrogates, including Kriging variants, are fitted by optimizing hyperparameters \theta (e.g., length scales and noise variance) via maximum likelihood estimation. The log-marginal likelihood is maximized as \log L(\theta) = -\frac{1}{2} \mathbf{y}^T K(\theta)^{-1} \mathbf{y} - \frac{1}{2} \log |K(\theta)| - \frac{n}{2} \log 2\pi, where K(\theta) is the n \times n covariance matrix based on the kernel function evaluated at the input points, and n is the number of samples. This objective is typically non-convex, so numerical optimization is required, often starting from initial guesses and iterating until convergence. The resulting GP provides not only point predictions but also uncertainty estimates, essential for sequential decision-making in surrogate applications. In scenarios where the fitting objective is non-convex—common in higher-order polynomials or kernel hyperparameter selection—gradient-based optimization methods like the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm are employed to efficiently navigate the parameter landscape. BFGS approximates the Hessian using gradient information from the objective, enabling quasi-Newton updates for faster convergence compared to steepest descent. For problems lacking analytic gradients or requiring global exploration, derivative-free alternatives such as genetic algorithms are used; these population-based methods evolve candidate parameter sets through selection, crossover, and mutation to minimize the fitting loss. Such techniques are particularly valuable in surrogate-assisted optimization where repeated model refits occur. For machine learning-based surrogates, such as neural networks or support vector machines, hyperparameter tuning (e.g., architecture depth or regularization strength) often leverages Bayesian optimization.
This sequential model-based approach treats the validation error as a black-box function, using a lower-fidelity surrogate—typically a Gaussian process—to guide the search for optimal hyperparameters, balancing exploration of the parameter space with exploitation of promising regions. By iteratively updating the surrogate with new evaluations, Bayesian optimization minimizes the number of costly training runs needed, achieving efficient calibration even in high-dimensional hyperparameter spaces. Hybrid approaches enhance fitting robustness by combining multiple surrogate models into ensembles, where individual fits (e.g., a polynomial response surface and a Gaussian process) are weighted or averaged based on local performance. For instance, adaptive weighting schemes minimize a combined error metric across models, mitigating weaknesses like polynomial extrapolation errors or GP computational scaling issues. These ensemble methods improve overall predictive accuracy and robustness, especially for complex response surfaces, by leveraging the strengths of diverse fitting techniques.
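To make the likelihood-based fitting concrete, the following sketch implements the negative log-marginal likelihood from the formula above for a squared exponential kernel (with hypothetical data and a log-parameterization to keep hyperparameters positive) and minimizes it with SciPy's BFGS optimizer.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
X = rng.uniform(size=(30, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1]  # stand-in for expensive outputs

def neg_log_marginal_likelihood(log_theta):
    # Negative of log L(theta); theta = (length scale, noise variance).
    ell, noise = np.exp(log_theta)  # log-parameterize for positivity
    K = np.exp(-cdist(X, X, "sqeuclidean") / (2 * ell ** 2))
    K += (noise + 1e-10) * np.eye(len(X))  # jitter for stability
    L = np.linalg.cholesky(K)  # stable inverse and log-determinant
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (0.5 * y @ alpha + np.log(np.diag(L)).sum()
            + 0.5 * len(X) * np.log(2 * np.pi))

# Quasi-Newton (BFGS) search over the hyperparameters.
res = minimize(neg_log_marginal_likelihood,
               x0=np.log([0.5, 1e-2]), method="BFGS")
print("fitted length scale, noise variance:", np.exp(res.x))
```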

Properties and Analysis

Invariance Properties

Surrogate models are designed to approximate the behavior of complex systems while ideally preserving key invariance properties of the underlying true model, such as monotonicity and symmetry, to ensure reliable predictions across transformed inputs. Monotonicity invariance refers to the preservation of input-output order effects, where an increasing input should yield a non-decreasing output if the true model exhibits this property; this is particularly relevant in optimization and reliability analysis, where violating monotonicity can lead to erroneous decision-making. For instance, in simulation metamodeling, bootstrapped Kriging models achieve this by resampling outputs to select monotonic predictors, fitting an interpolating Kriging surface only to those samples that maintain positive gradients, thus ensuring the surrogate respects the true model's increasing nature without widening confidence intervals. Symmetry invariance, such as rotational invariance, ensures the surrogate's predictions remain unchanged under domain symmetries like rotations, which is crucial for physical systems exhibiting isotropic behavior. In Gaussian process regression (GPR) surrogates for atomistic properties, this is enforced through kernel functions constructed from complete sets of invariant scalars derived from input tensors, such as traces and determinants of strain tensors, providing a rotationally equivariant representation that reduces the dimensionality of the feature space while capturing tensorial symmetries. Similarly, for turbulence modeling, surrogates use invariant basis functions built from mean strain and rotation rate tensors to predict Reynolds stress anisotropy, exploiting Galilean and rotational symmetries to improve generalization with fewer training samples compared to non-invariant approaches. Scale and location invariance address robustness to units or shifts in input variables, often achieved through normalization techniques that standardize inputs to zero mean and unit variance, rendering the surrogate independent of absolute scales or translations. In aerodynamic surrogate modeling, modal representations normalize geometries for translation and scale invariance, allowing the model to generalize across varied shapes without retraining. The mathematical basis for these invariances lies in selecting basis functions or kernels that are equivariant or invariant under group transformations; for example, in GPR, kernels like the squared exponential can be augmented with invariant features to respect symmetries, while polynomial bases in response surface models require careful degree selection to approximate symmetric responses without introducing asymmetries. Challenges arise when approximations fail to maintain these properties, such as polynomial surrogates exhibiting oscillations (e.g., Runge's phenomenon) that break monotonicity or symmetry present in the true model, leading to non-physical predictions in high-dimensional spaces. Enforcing invariance also requires verifying that the true model actually possesses the assumed property, to avoid performance degradation from incorrect assumptions, and computational overhead increases with the need for symmetry-adapted features. Testing methods involve evaluating the surrogate under transformations, where inputs are subjected to monotonic mappings, rotations, or scalings, and the surrogate's outputs are validated against the true model using metrics like integrated mean squared error on transformed datasets to quantify preservation. For bootstrapped Kriging, macro-replication experiments assess coverage and monotonicity retention across bootstrap samples.
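Invariance testing of this kind is straightforward to script. The sketch below (with a hypothetical norm-dependent surrogate, so rotation invariance should hold exactly) applies random orthogonal rotations and small coordinate steps to empirically check rotational invariance and monotonicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical surrogate depending only on the input norm, so it
# should be exactly rotation invariant (but not globally monotone).
def surrogate(X):
    r2 = (X ** 2).sum(axis=1)
    return np.exp(-r2) + 0.1 * r2

def random_rotation(d, rng):
    # QR of a Gaussian matrix yields a random orthogonal matrix.
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return Q

# Rotation-invariance test: compare outputs before/after rotation.
X = rng.normal(size=(200, 3))
R = random_rotation(3, rng)
err = np.max(np.abs(surrogate(X) - surrogate(X @ R.T)))
print("max rotation-invariance violation:", err)

# Monotonicity spot check along coordinate 0 at random base points.
base = rng.normal(size=(50, 3))
step = np.zeros(3); step[0] = 1e-3
diffs = surrogate(base + step) - surrogate(base)
print("fraction of non-monotone steps:", np.mean(diffs < 0))
```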

Uncertainty Quantification

Uncertainty in surrogate models arises from two primary sources: aleatoric uncertainty, which stems from inherent noise or stochasticity in the underlying data-generating process and is generally irreducible, and epistemic uncertainty, which results from a lack of knowledge due to limited training data, model form choices, or parameter estimation errors and can be reduced with more information. In the context of surrogate modeling, aleatoric uncertainty often manifests as observation noise in simulation data, while epistemic uncertainty is prominent in sparsely sampled regions or high-dimensional spaces where the surrogate lacks sufficient samples. Gaussian processes provide a principled approach to quantifying predictive uncertainty through their inherent probabilistic formulation, where the predictive variance at a point x is given by \sigma^2(x) = k(x,x) - \mathbf{k}(x,X) (K + \sigma_n^2 I)^{-1} \mathbf{k}(X,x), with k(\cdot,\cdot) denoting the kernel function, X the training inputs, K the kernel matrix over X, and \sigma_n^2 the noise variance; this expression directly captures epistemic uncertainty via the distance from training data. For neural network-based surrogates, Monte Carlo dropout approximates Bayesian inference by performing multiple stochastic forward passes with dropout enabled at inference time, yielding an empirical distribution from which both aleatoric and epistemic uncertainties can be estimated as the variance of predictions. This method is particularly effective for deep surrogates in complex, non-linear systems, as it leverages the dropout mechanism to sample from an approximate posterior without requiring full Bayesian training. Polynomial chaos expansions (PCE) offer a global method for uncertainty propagation and sensitivity analysis in surrogate models by representing the output as a series of orthogonal polynomials in terms of random input variables, enabling efficient computation of variance-based sensitivity indices and propagation of input uncertainties to outputs. To ensure reliable probabilistic predictions, surrogate models are calibrated using proper scoring rules such as the Brier score, which measures the mean squared difference between predicted probabilities and observed outcomes, penalizing both miscalibration and poor sharpness. Calibration is crucial for surrogates in decision-making applications, as it aligns predicted confidence intervals with empirical coverage rates. Recent advances in the 2020s have focused on deep ensembles for uncertainty quantification in high-dimensional models, where multiple neural networks are trained independently and their predictive variances provide robust estimates of epistemic uncertainty, outperforming single-model approaches in scenarios with sparse or noisy data. As of 2025, further progress includes scalable Bayesian frameworks for surrogate modeling with thorough uncertainty quantification in high-dimensional applications, as reviewed in recent literature. These ensembles and extensions are particularly valuable for scalable uncertainty quantification in large simulations, as they capture model disagreement to quantify extrapolation risks without assuming a specific posterior form.
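A deep-ensemble-style estimate of epistemic uncertainty can be obtained with a few independently initialized networks. In the sketch below (synthetic one-dimensional data; ensemble size and architecture are illustrative), the spread across ensemble members serves as the uncertainty signal and typically grows outside the training range.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(4 * X[:, 0]) + 0.1 * rng.normal(size=200)  # aleatoric noise

# Ensemble of networks differing only in random initialization;
# their disagreement is read as epistemic uncertainty.
ensemble = [
    MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=3000,
                 random_state=s).fit(X, y)
    for s in range(5)
]

X_test = np.linspace(-2, 2, 9).reshape(-1, 1)  # extends past training range
preds = np.stack([m.predict(X_test) for m in ensemble])
mean, epistemic_std = preds.mean(axis=0), preds.std(axis=0)
for x, m_, s_ in zip(X_test.ravel(), mean, epistemic_std):
    print(f"x={x:+.2f}  mean={m_:+.3f}  ensemble std={s_:.3f}")
# The ensemble spread typically grows for |x| > 1, flagging extrapolation.
```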

Applications

Optimization and Design

Surrogate models play a pivotal role in optimization by serving as efficient proxies for expensive objective functions in gradient-free methods, such as pattern search algorithms, where they guide the exploration of the design space without requiring gradient information. In these approaches, the surrogate approximates the black-box objective based on a limited set of evaluations, allowing the optimizer to propose promising search directions or candidate points iteratively while minimizing the number of costly true function calls. This integration enhances convergence speed and robustness, particularly for high-dimensional or noisy problems where direct evaluations are prohibitive. In design of experiments (DOE), surrogate models enable sequential strategies that adaptively refine the sampling process by selecting new points based on the current model's uncertainty or expected improvement, iteratively improving the surrogate's accuracy over the region of interest. Unlike static DOE methods like Latin hypercube sampling, sequential approaches using surrogates focus evaluations on areas likely to yield valuable information, such as near predicted optima or high-variance regions, thus balancing global exploration and local exploitation. This adaptive refinement is especially beneficial in expensive simulations, where each additional sample significantly impacts overall computational cost. For multi-objective optimization, surrogate models facilitate Pareto front approximation by embedding within evolutionary algorithms like NSGA-II, where they evaluate candidate solutions to reduce the fitness assessment burden across conflicting objectives. In surrogate-assisted NSGA-II variants, individual surrogates or ensembles approximate each objective, enabling the algorithm to maintain diversity and convergence toward the non-dominated set with far fewer true evaluations. This approach is particularly effective for problems with computationally intensive multi-fidelity evaluations, yielding well-distributed Pareto solutions that balance trade-offs efficiently. A representative case study in aerodynamic shape optimization demonstrates the practical impact: traditional computational fluid dynamics (CFD)-based methods require hundreds of thousands of evaluations for even a two-dimensional airfoil, but multi-fidelity deep surrogates reduce high-fidelity evaluations to 20–120 points, supplemented by low-fidelity data, achieving comparable drag minimization and lift maximization. Such reductions transform optimization from exhaustive searches into targeted explorations, enabling design space navigation that would otherwise be infeasible. Overall, surrogate models in optimization yield gains of up to 90% in computational time for black-box functions, as evidenced in groundwater pumping scenarios where multiple surrogates cut evaluation costs while preserving solution quality.
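The expected-improvement infill criterion underlying many of these sequential strategies can be sketched compactly. The example below (a hypothetical one-dimensional objective, with a grid search over the acquisition for simplicity) alternates between fitting a Gaussian process surrogate and evaluating the true function at the EI maximizer.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expensive_objective(x):  # stand-in black-box function
    return (x - 0.65) ** 2 + 0.05 * np.sin(20 * x)

# Small initial design.
X = np.array([[0.05], [0.35], [0.95]])
y = expensive_objective(X).ravel()

def expected_improvement(Xc, gp, y_best):
    # EI for minimization: large where the mean is low or the std is high.
    mu, sd = gp.predict(Xc, return_std=True)
    z = (y_best - mu) / np.maximum(sd, 1e-12)
    return (y_best - mu) * norm.cdf(z) + sd * norm.pdf(z)

# Sequential infill: fit surrogate, maximize EI on a grid, evaluate truth.
grid = np.linspace(0, 1, 501).reshape(-1, 1)
for _ in range(8):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                  normalize_y=True).fit(X, y)
    x_new = grid[np.argmax(expected_improvement(grid, gp, y.min()))]
    X = np.vstack([X, x_new])
    y = np.append(y, expensive_objective(x_new))

print("best x, f(x):", X[y.argmin()].item(), y.min())
```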

Engineering Simulations

Surrogate models play a critical role in engineering simulations by approximating complex, computationally intensive processes such as finite element analysis (FEA) and computational fluid dynamics (CFD), enabling faster predictions while maintaining acceptable accuracy. In structural dynamics, reduced-order modeling via surrogates reduces the dimensionality of high-fidelity simulations, allowing for efficient prediction of responses and vibrations in complex systems. For instance, proper orthogonal decomposition (POD) combined with surrogate techniques decomposes dynamic fields into lower-dimensional bases, facilitating rapid evaluation of nonlinear responses in rotor systems. Similarly, in fluid flow predictions, machine learning-based surrogates trained on CFD data approximate velocity and pressure fields, achieving predictions in milliseconds to seconds compared to traditional solvers that require hours. These models are particularly valuable in built environments, where convolutional neural networks (CNNs) have been shown to replicate Reynolds-averaged Navier-Stokes (RANS) simulations with errors below 3%. Integration of surrogates into iterative solvers enhances simulation workflows, often through hybrid approaches like POD-based reduced-order models embedded in finite element frameworks. POD identifies dominant modes from snapshot data of full-order simulations, which are then interpolated using radial basis functions (RBFs) to construct the surrogate, reducing the need for repeated high-fidelity solves during parameter sweeps or optimization problems. This has been applied in nonlinear FEA for metal forming and elastoplasticity, where preprocessing techniques such as scaling per physical component minimize errors and improve accuracy in high-dimensional output fields. In geotechnical and similar simulations, such hybrids enable efficient process estimation while distinguishing truncation errors from basis reduction and interpolation uncertainties. Practical examples illustrate the impact in specific domains. In automotive crash simulations, surrogates combining physics-based spring-damper-mass models with neural networks predict deceleration pulses based on impact parameters, supporting integrated safety systems with real-time capability and good agreement with finite element results. For thermal management in power electronics, surrogate models such as artificial neural networks (ANNs) approximate transient temperature fields in power modules, significantly reducing computation time for predictions while capturing multi-scale responses. These approaches replace detailed simulations, enabling design iterations that would otherwise take hours. In multi-scale modeling of composites, ANN-based surrogates bridge microscale behaviors to macroscale FEA, modeling progressive damage and nonlinear constitutive relations with reusable material databases that cut online costs significantly, achieving good agreement with conventional homogenizations. As of 2025, surrogate models are increasingly integrated into digital twins for engineering monitoring, leveraging reduced-order techniques to create lightweight approximations of physical assets. In manufacturing and process industries, these enable parametric simulations within seconds—versus hours for full models—supporting predictive maintenance and process optimization in factory settings. For example, POD-enhanced surrogates in digital twins facilitate rapid response to varying loads or material properties, maintaining accuracy within trained parameter ranges and accelerating system-level analysis.
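The POD step at the heart of such reduced-order surrogates is a singular value decomposition of a snapshot matrix. The sketch below (with synthetic one-dimensional "fields" standing in for stored FEA/CFD solutions) extracts the dominant modes, truncates by an energy criterion, and shows the resulting compression.

```python
import numpy as np

# Synthetic snapshot matrix: each column is a full-order field (here a
# 1-D profile) at one parameter value — a stand-in for stored solutions.
x = np.linspace(0, 1, 400)
params = np.linspace(0.5, 2.0, 40)
snapshots = np.column_stack([np.sin(np.pi * p * x) * np.exp(-p * x)
                             for p in params])  # shape (400, 40)

# POD: left singular vectors of the snapshot matrix are the modes.
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
energy = np.cumsum(s ** 2) / np.sum(s ** 2)
r = int(np.searchsorted(energy, 0.9999)) + 1  # truncation rank
basis = U[:, :r]
print(f"{r} modes capture {energy[r - 1]:.6f} of the snapshot energy")

# Reduced representation: project any field onto the r-dimensional basis.
field = snapshots[:, 7]
coeffs = basis.T @ field  # r coefficients instead of 400 nodal values
recon_err = np.linalg.norm(field - basis @ coeffs) / np.linalg.norm(field)
print("relative reconstruction error:", recon_err)
```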

Uncertainty and Sensitivity Analysis

Surrogate models play a crucial role in uncertainty propagation by approximating expensive simulations, enabling efficient Monte Carlo methods to estimate statistical moments such as variance in complex systems. Traditional Monte Carlo simulation requires thousands of model evaluations to propagate input uncertainties through complex systems, often rendering it computationally prohibitive for high-fidelity models. By constructing surrogate approximations, such as Gaussian processes or polynomial chaos expansions, these evaluations can be reduced significantly while maintaining accuracy in variance estimation, as demonstrated in environmental modeling where surrogates facilitate uncertainty assessment in large-scale simulations. In sensitivity analysis, surrogate models enable the computation of variance-based measures like Sobol indices, which quantify the contribution of individual inputs to output uncertainty. The first-order Sobol index for input X_i is defined as S_i = \frac{\text{Var}(E[Y|X_i])}{\text{Var}(Y)}, where Y is the model output, and this can be estimated efficiently by evaluating the surrogate many times instead of the original model. This approach is particularly effective for stochastic models with intrinsic noise, allowing for generalized Sobol indices that account for both parametric and aleatoric uncertainties. Surrogates distinguish global sensitivity, which captures average effects across the input space via variance decomposition as in Sobol methods, from local sensitivity, which examines gradients at specific points; the former is better suited for non-linear, high-dimensional problems where surrogates decompose total variance into main and interaction effects. This variance-based decomposition, enabled by cheap surrogate evaluations, provides a comprehensive ranking of input influences without assuming model linearity. In applications, surrogate models support flood hazard assessment in climate models by propagating uncertainties in sea-level and storm parameters to predict flooding probabilities, as seen in frameworks emulating hydrodynamic simulations for coastal sites under future scenarios. Similarly, in sensitivity analysis within quantitative systems pharmacology models, surrogates aid parameter ranking through Sobol indices, identifying key drivers of drug response variability and accelerating virtual population generation. Advances in non-intrusive polynomial chaos surrogates enhance uncertainty quantification by projecting simulation data onto orthogonal polynomial bases, offering efficiency in dimensions exceeding 10 without modifying the underlying model code. These methods, such as regression- or projection-based approaches, require only tens to hundreds of evaluations for accurate statistical moments in structural or material simulations, outperforming traditional Monte Carlo sampling in high-dimensional settings.
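First-order Sobol indices become cheap to estimate once a surrogate is available. The sketch below uses a pick-freeze Monte Carlo estimator on an inexpensive analytic stand-in for a fitted surrogate (an Ishigami-type function, chosen because its indices are well known); sample sizes this large are feasible precisely because the surrogate is cheap.

```python
import numpy as np

# Cheap analytic stand-in for a fitted surrogate (Ishigami-type function).
def surrogate(X):
    return (np.sin(X[:, 0]) + 7 * np.sin(X[:, 1]) ** 2
            + 0.1 * X[:, 2] ** 4 * np.sin(X[:, 0]))

rng = np.random.default_rng(0)
n, d = 100_000, 3  # large n is affordable on the surrogate
A = rng.uniform(-np.pi, np.pi, size=(n, d))
B = rng.uniform(-np.pi, np.pi, size=(n, d))

yA = surrogate(A)
yB = surrogate(B)
var_y = yA.var()

# Pick-freeze estimator of S_i = Var(E[Y|X_i]) / Var(Y):
# copy sample B but freeze coordinate i to the values from sample A.
for i in range(d):
    ABi = B.copy()
    ABi[:, i] = A[:, i]
    S_i = np.mean(yA * (surrogate(ABi) - yB)) / var_y
    print(f"S_{i + 1} = {S_i:.3f}")
```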

Advanced Techniques

Surrogate-Assisted Evolutionary Algorithms

Surrogate-assisted evolutionary algorithms (SAEAs) integrate surrogate models to approximate fitness evaluations in population-based optimization methods, such as genetic algorithms and evolution strategies, thereby minimizing the number of computationally expensive calls to the true objective function. This approach is particularly valuable for black-box optimization problems where each evaluation requires significant resources, like simulations or experiments. By leveraging surrogates—such as Gaussian processes, neural networks, or support vector machines—SAEAs maintain the global search capabilities of evolutionary algorithms while accelerating convergence. Frameworks for SAEAs typically involve infilling strategies to select promising candidates for true evaluation and mechanisms for updating both the population and the surrogate. Infilling often employs criteria like expected improvement, which balances exploration and exploitation by prioritizing points likely to improve the current best solution based on surrogate predictions and uncertainty estimates. Hybrid evolution-surrogate updates alternate between evolutionary operators (e.g., mutation, crossover) and surrogate refinements, ensuring the model remains accurate as the search progresses. These frameworks allow for flexible integration, where surrogates guide individual selection or entire generations. Prominent algorithms in this domain include surrogate-assisted covariance matrix adaptation evolution strategy (saCMA-ES) and coevolutionary models. The saCMA-ES extends the standard CMA-ES by incorporating self-adaptive surrogate models for pre-selection or direct optimization of offspring, using techniques like aggregated surrogate models or ranking-based support vector machines to approximate fitness rankings efficiently. Variants such as IPOP-saCMA-ES and BIPOP-saCMA-ES further enhance restart mechanisms for better handling of multimodal landscapes. Coevolutionary models, meanwhile, evolve the surrogate alongside the solution population, often through reward-based parent selection, enabling mutual adaptation and improved robustness in dynamic or multi-objective settings. In terms of performance, SAEAs achieve substantial speedups of 10-100 times in high-dimensional problems (dimensions up to 100 or more) by reducing function evaluations from thousands to hundreds, as demonstrated on benchmarks like the BBOB suites. They also effectively manage noisy objectives through robust surrogates like Gaussian process models, which incorporate uncertainty to filter out perturbations and maintain reliable rankings even under moderate noise levels. For instance, saCMA-ES variants have shown significant improvements over standard CMA-ES, such as better rankings on BBOB benchmarks for 20-dimensional multimodal functions, scaling well to higher dimensions without proportional increases in computational cost. Key challenges in SAEAs revolve around managing surrogate error accumulation through individual-based versus generation-based updates. Individual-based updates, which refine the surrogate after each evaluation, offer rapid adaptation but risk overfitting or diversity loss if the model quality degrades. Generation-based updates, applied periodically across an entire population, promote stability and parallelism but may lead to divergence if the surrogate becomes outdated during long generations. Balancing these approaches requires careful tuning of parameters like surrogate lifespan to mitigate premature convergence in complex landscapes.
Recent advances as of 2025 include co-evolving large language models with configuration spaces in SAEAs for computationally expensive black-box problems, and probability-selection-based SAEAs that improve performance by adaptively choosing among candidate surrogates.
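A minimal SAEA loop with individual-based, pre-selection infill might look like the sketch below (hypothetical objective; population size, mutation scale, and per-generation true-evaluation budget are illustrative): the surrogate ranks a large pool of offspring so that only a handful receive expensive evaluations.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expensive_fitness(X):  # stand-in black-box objective (minimize)
    return np.sum((X - 0.3) ** 2, axis=1)

rng = np.random.default_rng(0)
d, pop_size = 5, 20

# Archive of all truly evaluated points; it trains the surrogate.
X_arch = rng.uniform(size=(pop_size, d))
y_arch = expensive_fitness(X_arch)
pop = X_arch.copy()

for _ in range(15):  # generations
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X_arch, y_arch)

    # Gaussian-mutation offspring, generated cheaply in large numbers.
    cand = np.clip(pop[rng.integers(pop_size, size=100)]
                   + 0.1 * rng.normal(size=(100, d)), 0, 1)

    # Pre-selection infill: the surrogate ranks candidates; only the
    # most promising few receive true (expensive) evaluations.
    X_new = cand[np.argsort(gp.predict(cand))[:5]]
    y_new = expensive_fitness(X_new)
    X_arch = np.vstack([X_arch, X_new])
    y_arch = np.append(y_arch, y_new)

    # Survivor selection uses true fitness values only.
    pop = X_arch[np.argsort(y_arch)[:pop_size]]

print("true evaluations:", len(y_arch), "best fitness:", y_arch.min())
```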

Multi-Fidelity and Adaptive Surrogates

Multi-fidelity surrogate models integrate data from simulations or models of varying accuracy levels to balance computational cost and predictive precision. Low-fidelity models, which are inexpensive but less accurate, provide broad coverage of the input space, while high-fidelity models offer detailed accuracy at higher expense. By fusing these, multi-fidelity approaches reduce the required number of high-fidelity evaluations, enabling efficient approximation of complex systems. A prominent method is variable-fidelity kriging, which models the high-fidelity function as a scaled and corrected version of the low-fidelity output. The surrogate is constructed as \hat{f}(x) = \rho f_{\text{low}}(x) + \delta(x), where \rho is a scaling factor (often a constant or a low-order function of x), f_{\text{low}}(x) is the low-fidelity prediction, and \delta(x) is a discrepancy term modeled via Kriging to capture differences between fidelities. This hierarchical structure leverages correlations between fidelity levels to enhance interpolation accuracy. Co-kriging extends this by jointly estimating parameters across multiple fidelity levels using a shared covariance structure, allowing flexible incorporation of non-hierarchical data sources. These techniques yield significant cost-accuracy trade-offs; for instance, multi-fidelity Kriging can achieve comparable accuracy to single-fidelity models with substantially fewer high-fidelity evaluations in aerodynamic design tasks, as demonstrated in optimization examples. In optimization contexts, such reductions translate to substantial computational savings depending on the fidelity ratio and problem dimensionality. Adaptive surrogates build on this by dynamically updating the model during construction or use, focusing on regions of high uncertainty to refine accuracy efficiently. In adaptive sampling frameworks, new data points are sequentially added where the surrogate's uncertainty—often quantified via the predictive variance in Gaussian processes—is maximized, such as through error-based or variance-driven criteria. Active learning loops implement this by iteratively evaluating the expensive model at selected points, balancing exploration of the input space and exploitation of uncertain areas to minimize overall evaluations. In recent developments from the 2020s, physics-informed neural networks (PINNs) have emerged for real-time adaptation in control systems, incorporating physical laws into the network to update surrogates dynamically as new data arrives. For example, Lyapunov-based PINNs enable adaptive control of uncertain Euler-Lagrange systems by online updating of network weights, ensuring stability while reducing simulation times for real-time applications like robotics. These methods extend multi-fidelity principles by adaptively weighting physics constraints and data fidelities, achieving faster convergence in time-critical scenarios. As of 2025, advancements include adaptive quality-based multi-fidelity frameworks that maximize low-fidelity data utilization for structural optimization, and physics-informed multi-fidelity surrogates for modeling fluid flow in porous media, enhancing predictions in complex simulations.
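The variable-fidelity construction \hat{f}(x) = \rho f_{\text{low}}(x) + \delta(x) can be sketched directly. The example below uses a common synthetic fidelity pair (an assumption, standing in for real solvers), estimates the scalar \rho by least squares at the high-fidelity points, and models the discrepancy \delta with a Gaussian process.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Hypothetical fidelity pair: f_low is a cheap, biased version of f_high.
def f_high(x): return (6 * x - 2) ** 2 * np.sin(12 * x - 4)
def f_low(x):  return 0.5 * f_high(x) + 10 * (x - 0.5) - 5

# Only a handful of expensive high-fidelity samples.
x_hi = np.array([0.0, 0.4, 0.6, 1.0]).reshape(-1, 1)
y_hi = f_high(x_hi).ravel()

# Least-squares estimate of the scalar scaling rho at the HF points.
lo_at_hi = f_low(x_hi).ravel()
rho = (lo_at_hi @ y_hi) / (lo_at_hi @ lo_at_hi)

# GP on the discrepancy delta(x) = y_hi - rho * f_low(x).
gp_delta = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp_delta.fit(x_hi, y_hi - rho * lo_at_hi)

def f_hat(x):
    # Variable-fidelity prediction: scaled LF model plus GP correction.
    return rho * f_low(x).ravel() + gp_delta.predict(x)

x_test = np.linspace(0, 1, 5).reshape(-1, 1)
print(np.c_[f_high(x_test).ravel(), f_hat(x_test)])
```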

Software and Tools

Open-Source Packages

Several open-source packages facilitate the development and application of surrogate models, providing tools for constructing approximations such as response surfaces, Kriging models, Gaussian processes, and support vector regressions. Dakota, developed by Sandia National Laboratories, is a comprehensive C++ framework with Python bindings for optimization, uncertainty quantification (UQ), and surrogate modeling, supporting methods like response surfaces, Gaussian processes, and multifidelity surrogates integrated with design of experiments (DOE). Released under the GNU Lesser General Public License, it enables iterative surrogate-based optimization and is widely used in engineering simulations for reducing computational costs. The Surrogate Modeling Toolbox (SMT), a Python package, specializes in response surface methodologies and kriging-based surrogates, including implementations for hierarchical and mixed inputs, allowing users to build and evaluate models from sampled data. It supports training on datasets for prediction and uncertainty estimation, making it suitable for engineering design tasks. scikit-learn, a foundational machine learning library in Python, provides surrogate modeling capabilities through classes like GaussianProcessRegressor for probabilistic Gaussian process regression and SVR for support vector regression, which approximate complex functions with tunable kernels and hyperparameters. These tools are often employed in surrogate-assisted workflows due to their ease of integration and scalability for high-dimensional problems. Specialized packages extend surrogate functionality for targeted applications. UQPy, an open-source Python toolbox from the Shields Uncertainty Research Group, focuses on uncertainty quantification and includes modules for surrogate construction, such as polynomial chaos expansions and Gaussian processes, alongside sampling methods for validation. SMAC3, developed by the AutoML group, implements sequential model-based optimization using random forest surrogates to guide hyperparameter tuning and black-box optimization, supporting parallel evaluations and intensification mechanisms for robust performance. These packages commonly feature seamless integration with Python and its scientific ecosystem, enabling scripting for model training and evaluation, while many incorporate adaptive sampling techniques to refine surrogates by iteratively selecting informative data points based on acquisition functions like expected improvement. Community-driven tools on GitHub further enrich the landscape, such as surrogate-modeling repositories with 2025 updates incorporating deep learning-based surrogates like physics-enhanced neural networks for enhanced accuracy in scientific simulations. In practice, these packages support example workflows for training and validation: users can generate Latin hypercube samples via SMT or UQPy, train a surrogate in scikit-learn or SMT, and validate predictions against high-fidelity simulations using cross-validation metrics like RMSE, ensuring reliable approximations for optimization loops.
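A typical end-to-end workflow with these packages follows the sample-train-validate pattern described above. The sketch below (with a mock high-fidelity function in place of a real simulation) uses SciPy for the Latin hypercube design and scikit-learn for the Gaussian process surrogate and k-fold validation.

```python
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import cross_val_score

# 1) Design of experiments: Latin hypercube samples in [0, 1]^3.
X = qmc.LatinHypercube(d=3, seed=0).random(n=60)

# 2) Run the (here: mock) high-fidelity model at the sample points.
def high_fidelity(X):
    return np.sin(2 * X[:, 0]) + X[:, 1] * X[:, 2]
y = high_fidelity(X)

# 3) Define a Gaussian process surrogate.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

# 4) Validate with k-fold cross-validation before trusting predictions.
rmse = np.sqrt(-cross_val_score(gp, X, y, cv=5,
                                scoring="neg_mean_squared_error"))
print("5-fold RMSE per fold:", np.round(rmse, 4))

gp.fit(X, y)  # final fit on all data for downstream optimization loops
```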

Commercial Solutions

Commercial surrogate modeling solutions are proprietary software suites designed for enterprise-level integration in engineering workflows, offering robust tools for constructing and deploying surrogate models to accelerate simulations in fields like optimization and design. These tools emphasize user-friendly interfaces, scalability for large-scale computations, and compliance with industry standards such as ISO certifications for aerospace and automotive sectors. modeFRONTIER, developed by ESTECO, is a leading platform for multidisciplinary design optimization that incorporates response surface (RS) models and AI-driven surrogates to approximate complex nonlinear systems, enabling efficient design space exploration with reduced computational demands. It features GUI-driven fitting tools for building metamodels and supports integration with CAD/CAE software for automated workflows. The 2025R2 release introduces new algorithms enhancing surrogate accuracy in multi-strategy optimization scenarios. Isight from Dassault Systèmes provides integrated surrogate capabilities within process automation frameworks, using approximation models to create reduced-order representations of simulation data for faster iteration in multidisciplinary design. Its components include built-in DOE tools and metamodel builders that facilitate construction from finite element analyses, with enterprise scalability for distributed environments. ANSYS incorporates surrogate builders through its optiSLang AI+ module, which leverages machine learning to generate metamodels for finite element analysis (FEA) and computational fluid dynamics (CFD), allowing rapid design evaluations without full simulations. These tools support GUI-based model fitting and are certified for industries requiring high reliability, such as automotive crash testing. COMSOL Multiphysics offers built-in surrogate model functionality, introduced in version 6.2, for exporting compact approximations of multiphysics simulations to accelerate app deployment and digital twins. In 2025 updates, it emphasizes high-accuracy surrogates derived from FEA/CFD data, with features for neural network-based fitting and integration into collaborative workflows. Siemens Simcenter, particularly through HEEDS and Reduced Order Modeling extensions, applies surrogates in automotive applications like gear stress and CFD optimization, using response surface models to predict performance from simulation data. The 2504 release consolidates surrogate tools in a unified environment for faster metamodel creation, supporting certifications for regulated industries. The MATLAB Global Optimization Toolbox includes surrogate optimization via the surrogateopt function, which builds radial basis function-based surrogate models to minimize expensive objective functions in black-box optimization tasks. It provides scalable options for enterprise users, integrating with Simulink for surrogate-assisted simulations. Post-2023 trends in commercial surrogate solutions highlight a shift toward cloud-based platforms for collaborative engineering, enabling distributed training of surrogates and real-time sharing of metamodels across global teams, as seen in growing adoption for multiphysics applications.

    Rating 4.8 (1,980) Mar 14, 2025 · The cloud-based simulation software market is experiencing robust growth, driven by the increasing need for faster, more cost-effective, ...