Surrogate model

A surrogate model, also known as a metamodel or emulator, is a computationally inexpensive mathematical model designed to replicate the input-output behavior of a more complex and resource-intensive model, often constructed from a limited set of data samples or simulations. These models are particularly valuable in fields where direct evaluations of the original model—such as high-fidelity simulations in engineering or scientific computing—are prohibitively slow or costly, enabling faster analysis without significant loss of accuracy. Surrogate models find widespread application in optimization, uncertainty quantification, sensitivity analysis, and real-time decision-making across disciplines including hydrology, water resources, and aerospace design. For instance, in groundwater modeling, they approximate distributed flow simulations to facilitate model calibration and scenario testing that would otherwise require extensive computational resources. In optimization contexts, surrogate-based approaches integrate these approximations into iterative algorithms to efficiently search for optimal designs or parameters, often achieving speedups of orders of magnitude while maintaining near-optimal solutions. Common types of surrogate models include data-driven methods, which build empirical relationships from input-output pairs using techniques such as polynomial response surfaces, Gaussian process regression (Kriging), neural networks, and support vector machines; projection-based methods, which reduce model dimensionality by projecting governing equations onto lower-dimensional subspaces via approaches like proper orthogonal decomposition; and multifidelity methods, which leverage models of varying resolution or simplified physics to balance accuracy and efficiency. The choice of method depends on the problem's characteristics, with data-driven surrogates being prevalent due to their flexibility but potentially limited by sampling requirements, while physics-informed variants incorporate governing physical laws for improved generalization. Key advantages of surrogate models encompass drastic reductions in computational time—sometimes by factors of 10,000 or more—enhanced tractability of design exploration and uncertainty analysis, and the ability to handle high-dimensional spaces that challenge full models. However, limitations include potential inaccuracies in unsampled regions of the input space and the need for careful validation to ensure fidelity to the original model. Ongoing advancements, such as hybrid techniques combining machine learning with physical principles, continue to expand their utility in complex, data-scarce environments.

Fundamentals

Definition

A surrogate model is a computationally efficient approximation of a high-fidelity, expensive-to-evaluate system or simulation model, designed to mimic its input-output behavior using limited data from the original model. These models replace complex black-box functions, such as those arising in engineering simulations, with simpler representations that enable rapid predictions while preserving essential characteristics of the true system. Key characteristics of surrogate models include their significantly lower computational cost compared to the original model, often requiring only tens of evaluations rather than thousands to construct and use. They can be either statistical, incorporating uncertainty estimates, or deterministic approximations, and are typically built from sparse sampled data points obtained through design-of-experiments techniques. This construction assumes underlying continuity and smoothness in the true model's response, allowing the surrogate to interpolate or regress effectively across the input domain. For instance, in aerospace engineering, the true model might be a full computational fluid dynamics (CFD) simulation of airflow over an airfoil, which demands extensive resources, whereas the surrogate could be a simplified polynomial model fitted to a handful of CFD outputs to approximate drag coefficients. Such approximations facilitate iterative processes like optimization by filtering noise and providing quick evaluations, though their accuracy depends on the quality and quantity of the training data from the high-fidelity model. This approach presupposes familiarity with foundational modeling concepts in engineering and scientific disciplines, where systems are often represented as functions mapping inputs to outputs.

Goals

Surrogate models are primarily employed to reduce the computational expense associated with repeated evaluations of complex, high-fidelity simulations in engineering and scientific workflows. By providing approximate representations of these expensive models, surrogates enable faster iterations during design exploration and optimization processes, where direct evaluations of the true model—such as those involving computational fluid dynamics or finite element analysis—would otherwise be prohibitive due to time and resource constraints. This approximation allows practitioners to perform numerous assessments efficiently, facilitating the identification of promising designs without exhaustive simulations. Key benefits include accelerating simulations for real-time applications, approximating black-box functions whose internal mechanisms are inaccessible, and managing high-dimensional input spaces that exacerbate computational demands. For instance, in optimization tasks, surrogate models can predict performance metrics across vast design spaces, enabling global search strategies that converge on near-optimal solutions using only tens of evaluations rather than thousands. They also support stochastic simulations by offering rapid approximations that incorporate uncertainty, thus aiding in reliability assessments and decision-making under variability. These advantages stem from the surrogate's role as a data-driven "curve fit," which filters noise and emulates underlying relationships with sufficient fidelity for practical use. However, deploying surrogate models involves inherent trade-offs, particularly in balancing accuracy against efficiency. While they achieve substantial speedups—often orders of magnitude faster than the original model—they may introduce approximation errors that require careful validation to ensure reliability in critical applications. Additionally, some surrogate approaches, such as neural network-based models, can sacrifice physical interpretability, obscuring insights into causal relationships that more transparent methods like response surfaces preserve. Surrogates are thus most suitable for scenarios like expensive finite element analyses in structural design or stochastic simulations under uncertainty, where the gains in computational efficiency outweigh potential losses in precision or explainability, provided the model is tuned appropriately.

Historical Development

The origins of surrogate modeling trace back to the development of response surface methodology (RSM) in statistics and design of experiments, introduced by George E. P. Box and K. B. Wilson in their seminal 1951 paper, which proposed polynomial approximations to model and optimize responses from physical experiments. This approach laid the groundwork for using simplified mathematical representations to approximate complex systems, initially focused on industrial and chemical process applications. Box's subsequent contributions, including collaborations on sequential experimentation, further solidified RSM as a foundational technique for empirical modeling. A significant advancement occurred in the 1960s with the introduction of Kriging in geostatistics by Georges Matheron, who formalized it as an optimal spatial interpolation method based on spatial correlations, honoring the earlier work of D. G. Krige. Kriging extended surrogate concepts to handle uncertainty in spatial data, particularly in mining and earth sciences. By the late 1980s, these ideas were adapted to deterministic computer experiments through the work of Sacks et al. in 1989, who applied Kriging predictors to model outputs from simulation codes, emphasizing space-filling designs for efficient approximation. The 1990s marked the expansion of surrogate models into engineering optimization, driven by increasing computational capabilities that enabled their integration with multidisciplinary design problems, such as those in aerospace and structural engineering. In the 2000s, surrogates were increasingly coupled with evolutionary algorithms to address expensive black-box optimizations, as surveyed by Jin in 2005, enhancing global search efficiency in fields like mechanical design. The influential textbook Engineering Design via Surrogate Modelling by Forrester et al. in 2008 synthesized these developments, providing practical guidance on surrogate construction for engineering applications and highlighting Kriging and response surfaces as core methods. Post-2010, surrogate modeling has seen a surge in machine learning-based approaches, including neural networks and Gaussian processes, to handle high-dimensional data in simulations. Recent trends up to 2025 emphasize hybrid models that incorporate physical laws into learning frameworks, such as the physics-informed neural networks introduced by Raissi et al. in 2019, improving accuracy and generalization in scientific computing. AI-driven construction techniques, like adaptive sampling via Bayesian optimization, have further automated surrogate building for real-time applications in engineering and environmental modeling.

Mathematical Framework

Approximation Principles

Surrogate models approximate complex, computationally expensive functions f(\mathbf{x}) by constructing a simpler model \hat{f}(\mathbf{x}) that mimics the input-output relationship of the original model based on a limited set of sampled points. This enables efficient evaluation for tasks such as optimization and uncertainty quantification, where direct evaluations of f(\mathbf{x}) are prohibitive. A fundamental distinction in surrogate modeling lies between interpolation and regression approaches. Interpolating surrogates, such as certain Kriging models, exactly pass through all training data points, assuming the data is noise-free and seeking perfect fidelity at observed locations. In contrast, regressing surrogates, like polynomial response surfaces, provide a smoothed fit that minimizes overall error across the data, accommodating noise or outliers by not requiring exact passage through points. The approximation is typically achieved through linear combinations of basis functions, expressed in the general form \hat{f}(\mathbf{x}) = \sum_{i=1}^{m} \beta_i \phi_i(\mathbf{x}), where \phi_i(\mathbf{x}) are the basis functions (e.g., polynomials, radial basis functions like Gaussians \phi_i(\mathbf{x}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{c}_i\|^2}{2\sigma^2}\right), or kernels) and \beta_i are coefficients determined by fitting to the data. Polynomials offer global approximations suitable for smooth functions, while radial basis functions provide local flexibility for handling discontinuities. In high-dimensional spaces, effective approximation requires careful selection of training data via space-filling designs of experiments, such as Latin hypercube sampling, which ensures uniform coverage of the input domain to capture the function's behavior comprehensively without clustering. This approach mitigates the curse of dimensionality by distributing points to maximize information gain for the surrogate construction.
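The basis-function form above can be made concrete in a few lines. The following minimal Python sketch (assuming a hypothetical two-dimensional test function standing in for an expensive model) builds an interpolating Gaussian radial-basis surrogate on a Latin hypercube design; the kernel width sigma is an illustrative choice, not a tuned value.

```python
import numpy as np
from scipy.stats import qmc

# Hypothetical cheap stand-in for the true, expensive model f(x).
def f(X):
    return np.sin(3 * X[:, 0]) + X[:, 1] ** 2

# Space-filling training data via Latin hypercube sampling in [0, 1]^2.
sampler = qmc.LatinHypercube(d=2, seed=0)
X_train = sampler.random(n=30)
y_train = f(X_train)

sigma = 0.3  # assumed Gaussian basis width (illustrative)

def phi(X, centers):
    # Gaussian radial basis functions centered at the training points.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

# Interpolating surrogate: solve Phi @ beta = y for the coefficients.
Phi = phi(X_train, X_train)
beta = np.linalg.solve(Phi, y_train)

def f_hat(X):
    return phi(X, X_train) @ beta  # sum_i beta_i * phi_i(x)

X_test = sampler.random(n=5)
print(np.c_[f(X_test), f_hat(X_test)])  # true vs. surrogate values
```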

Error Metrics and Validation

Assessing the accuracy and reliability of surrogate models is essential to ensure they faithfully approximate the underlying high-fidelity model while generalizing to unseen inputs. Common error metrics quantify the discrepancy between the surrogate predictions \hat{f}(x_i) and the true function values f(x_i) over a set of n evaluation points. The mean squared error (MSE) measures the average squared difference, defined as \text{MSE} = \frac{1}{n} \sum_{i=1}^n (f(x_i) - \hat{f}(x_i))^2, providing a scale-sensitive assessment that penalizes larger errors more heavily. The root mean squared error (RMSE), the square root of the MSE, offers an interpretable metric in the original units of the response, often used to compare surrogate performance across datasets. The maximum absolute error captures the worst-case deviation, \max_i |f(x_i) - \hat{f}(x_i)|, which is particularly valuable in applications where peak inaccuracies can lead to critical failures. Goodness-of-fit measures evaluate how well the surrogate explains the variability in the training data. The coefficient of determination, R^2 = 1 - \frac{\sum_{i=1}^n (f(x_i) - \hat{f}(x_i))^2}{\sum_{i=1}^n (f(x_i) - \bar{f})^2}, where \bar{f} is the mean of the observed values, indicates the proportion of variance captured by the model, with values closer to 1 signifying better fit; however, it can overestimate performance on training data alone. An adjusted R^2 variant accounts for model complexity by penalizing additional parameters, making it suitable for comparing surrogates of varying forms. Validation techniques estimate the surrogate's predictive performance beyond the training set, mitigating risks of poor generalization. Cross-validation partitions the data into k folds, training on k-1 folds and testing on the held-out fold, with the process repeated k times to compute an average error such as the predicted residual error sum of squares (PRESS); k=10 is common for balance between bias and variance. Leave-one-out (LOO) cross-validation, a special case with k=n, is computationally intensive but effective for small datasets typical in expensive simulations, yielding a nearly unbiased RMSE estimate. Hold-out validation splits data into distinct training and test sets (e.g., 70/30 ratio), providing a generalization estimate but sensitive to partition randomness. Bootstrap resampling generates multiple training sets by sampling with replacement, evaluating out-of-bag samples to approximate error distributions and confidence intervals, useful when data is limited. Challenges in validation arise particularly in high-dimensional spaces, where the curse of dimensionality exacerbates data sparsity, leading to unreliable error estimates. Overfitting is prevalent, as flexible surrogates like neural networks capture noise rather than underlying patterns, inflating training metrics like R^2 while degrading out-of-sample performance; regularization and cross-validation help detect this.
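As an illustration of these metrics and of leave-one-out cross-validation, the sketch below (with synthetic data and a simple linear least-squares surrogate standing in for an expensive model) computes MSE, RMSE, maximum absolute error, and R^2 exactly as defined above; the fit/predict callables are hypothetical placeholders for any surrogate type.

```python
import numpy as np

def error_metrics(y_true, y_pred):
    # MSE, RMSE, max absolute error, and R^2 as defined in the text.
    resid = y_true - y_pred
    mse = np.mean(resid ** 2)
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MSE": mse, "RMSE": np.sqrt(mse),
            "max|e|": np.max(np.abs(resid)), "R2": r2}

def loo_rmse(X, y, fit, predict):
    # Leave-one-out CV (k = n): refit n times, each holding out one point.
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        model = fit(X[mask], y[mask])
        preds[i] = predict(model, X[i:i + 1])[0]
    return np.sqrt(np.mean((y - preds) ** 2))

# Hypothetical data and a linear least-squares surrogate.
rng = np.random.default_rng(0)
X = rng.uniform(size=(20, 2))
y = X @ np.array([1.5, -2.0]) + 0.1 * rng.normal(size=20)
fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict = lambda beta, X: X @ beta

print(error_metrics(y, predict(fit(X, y), X)))  # in-sample fit quality
print("LOO RMSE:", loo_rmse(X, y, fit, predict))  # out-of-sample estimate
```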

Types of Surrogate Models

Response Surface Models

Response surface models, also known as response surface methodology (RSM) in the context of surrogate modeling, employ low-order polynomial functions to approximate the behavior of complex, computationally expensive systems. These models represent the response variable as a polynomial function of the input variables, typically limited to first- or second-order terms to balance accuracy and simplicity. Originating from design of experiments (DOE), RSM was pioneered by Box and Wilson in 1951 to efficiently attain optimal conditions in experimental processes by fitting polynomials to observed data. The core formulation of a response surface model involves regressing the polynomial approximation \hat{f}(\mathbf{x}) against training data points obtained from the high-fidelity model. A first-order variant, suitable for linear approximations, takes the form \hat{f}(\mathbf{x}) = \beta_0 + \sum_{i=1}^d \beta_i x_i, where \beta_0 is the intercept, \beta_i are the linear coefficients, and d is the number of input variables. Second-order models extend this to include quadratic and interaction terms for capturing nonlinearity: \hat{f}(\mathbf{x}) = \beta_0 + \sum_{i=1}^d \beta_i x_i + \sum_{i=1}^d \beta_{ii} x_i^2 + \sum_{i<j} \beta_{ij} x_i x_j, with coefficients \boldsymbol{\beta} estimated via least squares minimization on data from structured designs like central composite or Box-Behnken. These variants enable the model to represent main effects, curvatures, and cross-variable interactions, making them particularly useful in classical DOE for process optimization. Response surface models offer several advantages in surrogate applications, including their mathematical simplicity, which allows for straightforward interpretation of coefficients that quantify variable influences, and rapid evaluation times due to the low computational overhead of polynomial arithmetic. They are especially effective for unimodal, moderately nonlinear responses where the functional form is reasonably well-understood, facilitating efficient global optimization in engineering design. Despite these strengths, response surface models have notable limitations, particularly their ineffectiveness in approximating non-smooth functions or those exhibiting high nonlinearity, where higher-order terms may be needed but lead to instability. A key issue is Runge's phenomenon, where high-degree polynomials oscillate wildly near the boundaries of the input domain, degrading extrapolation accuracy. Additionally, they suffer from the curse of dimensionality, requiring an impractically large number of training points—on the order of (d+1)(d+2)/2 for second-order models—to avoid underfitting in high dimensions. These constraints often make them less suitable for complex, multimodal surrogate tasks compared to more flexible methods.
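A second-order response surface reduces to ordinary least squares on an expanded design matrix. The sketch below (using synthetic data in place of a high-fidelity model) assembles the (d+1)(d+2)/2 quadratic basis columns and estimates the coefficients \beta; it illustrates the formulation above rather than any particular RSM package.

```python
import numpy as np
from itertools import combinations

def quadratic_design_matrix(X):
    # Columns: 1, x_i, x_i^2, and x_i * x_j (i < j) — (d+1)(d+2)/2 terms.
    n, d = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(d)]
    cols += [X[:, i] ** 2 for i in range(d)]
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(d), 2)]
    return np.column_stack(cols)

# Hypothetical training data standing in for expensive model outputs.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(25, 3))
y = 2 + X[:, 0] - 0.5 * X[:, 1] ** 2 + X[:, 0] * X[:, 2]

# Least-squares estimate of the coefficient vector beta.
A = quadratic_design_matrix(X)
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

def rsm_predict(X_new):
    return quadratic_design_matrix(X_new) @ beta

print(rsm_predict(rng.uniform(-1, 1, size=(3, 3))))
```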

Kriging and Gaussian Processes

Kriging, also known as Gaussian process regression in a statistical context, serves as a geostatistical surrogate modeling technique that provides the best linear unbiased predictor (BLUP) for interpolating values at unobserved points based on observed data, assuming a Gaussian process prior. This approach models the underlying function as a random process with a mean function \mu(\mathbf{x}) and a covariance function k(\mathbf{x}, \mathbf{x}'), enabling probabilistic predictions that account for spatial correlations in the data. In surrogate modeling, Kriging is particularly valued for approximating expensive computer simulations by treating them as realizations of a stochastic process, as introduced in the seminal work on the design and analysis of computer experiments. The ordinary Kriging predictor at an unobserved point \mathbf{x} is given by \hat{f}(\mathbf{x}) = \sum_{i=1}^n \lambda_i f(\mathbf{x}_i), where f(\mathbf{x}_i) are the observed function values at training points \mathbf{x}_i, and the weights \lambda_i are determined by solving a system of linear equations derived from the covariance structure to ensure unbiasedness and minimum variance. Specifically, the weights satisfy \sum_{i=1}^n \lambda_i = 1 for unbiasedness under the assumption of a constant unknown mean, and they minimize the prediction variance through the covariance structure, often represented via a variogram or covariance function. Kernel functions, or covariance functions, define the correlation structure in Kriging and Gaussian processes, controlling the smoothness and flexibility of the surrogate. Common choices include the squared exponential kernel k(\mathbf{x}, \mathbf{x}') = \sigma^2 \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2\ell^2}\right), which yields infinitely differentiable smooth functions; the exponential kernel k(\mathbf{x}, \mathbf{x}') = \sigma^2 \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|}{\ell}\right), corresponding to rougher, non-differentiable paths; and the Matérn family k(\mathbf{x}, \mathbf{x}') = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\sqrt{2\nu} \frac{\|\mathbf{x} - \mathbf{x}'\|}{\ell}\right)^\nu K_\nu\left(\sqrt{2\nu} \frac{\|\mathbf{x} - \mathbf{x}'\|}{\ell}\right), where \nu > 0 tunes smoothness from non-differentiable (\nu = 1/2, exponential) to smooth (\nu \to \infty, squared exponential). These kernels allow adaptation to the expected regularity of the target function, with Matérn often preferred in surrogate modeling for its balance of flexibility and computational tractability. A key advantage of Kriging surrogates is their ability to quantify prediction uncertainty through the variance \sigma^2(\mathbf{x}) = k(\mathbf{x}, \mathbf{x}) - \mathbf{k}^T \mathbf{K}^{-1} \mathbf{k}, where \mathbf{k} is the covariance vector between \mathbf{x} and the training points, and \mathbf{K} is the training covariance matrix, enabling reliable error assessment in optimization and design tasks. Additionally, they effectively capture spatial or functional correlations, making them suitable for problems with structured data dependencies, unlike purely deterministic interpolants. Variants of Kriging address different assumptions about the mean structure. Simple Kriging assumes a known constant mean, simplifying the equations by fixing the mean and removing the unbiasedness constraint on the weights, which is ideal when prior knowledge of the mean is available.
Universal Kriging extends this by incorporating a trend or drift model, such as a polynomial in the covariates, to handle non-stationary means, solving an augmented system of equations that includes the trend basis to detrend the data before applying the covariance-based prediction. These adaptations enhance applicability in scenarios with underlying trends, maintaining the BLUP property under the respective assumptions.
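In practice, libraries such as scikit-learn expose this machinery directly. The following minimal sketch fits a Gaussian process surrogate with a Matérn kernel (\nu = 2.5, an illustrative choice) to a toy one-dimensional function and queries both the Kriging mean and the predictive standard deviation discussed above.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Toy stand-in for an expensive simulation.
def f(x):
    return np.sin(5 * x).ravel()

X_train = np.linspace(0, 1, 8).reshape(-1, 1)
y_train = f(X_train)

# Matérn kernel with nu = 2.5 (twice-differentiable sample paths).
gp = GaussianProcessRegressor(kernel=Matern(length_scale=0.2, nu=2.5),
                              normalize_y=True)
gp.fit(X_train, y_train)

# Predictions with the Kriging variance for uncertainty assessment.
X_test = np.linspace(0, 1, 5).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)
print(np.c_[mean, std])  # std shrinks toward 0 near training points
```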

Machine Learning-Based Surrogates

Machine learning-based surrogate models leverage data-driven techniques to approximate complex simulations, offering flexibility beyond traditional parametric methods. Feedforward neural networks (NNs) serve as a core approach for regression tasks in surrogate modeling, where multiple layers of interconnected nodes learn nonlinear mappings from input features to output responses, enabling accurate predictions in high-dimensional spaces. Support vector regression (SVR), particularly with radial basis function (RBF) kernels, provides another foundational method by constructing a regression function that minimizes structural risk, effectively handling sparse data and providing robust approximations for engineering optimization problems. Deep learning extensions enhance these models for specialized data structures. Convolutional neural networks (CNNs) excel in surrogates for spatial data, such as fluid flow simulations, by capturing local patterns through convolutional filters and pooling layers, reducing computational demands while maintaining fidelity to underlying physics. Recurrent neural networks (RNNs), including long short-term memory (LSTM) variants, are suited for time-series surrogates, modeling sequential dependencies in dynamic systems like structural vibrations or process evolutions. Post-2020 advances have integrated physical knowledge to improve efficiency and generalizability. Physics-informed neural networks (PINNs) embed governing partial differential equations directly into the loss function during training, allowing surrogates to respect physical constraints even with limited data, as demonstrated in solving forward and inverse problems in fluid dynamics and other physical systems. Transfer learning for multi-fidelity surrogates pre-trains models on abundant low-fidelity simulations and fine-tunes with sparse high-fidelity data, accelerating convergence in applications like aerodynamic design. These ML-based surrogates offer distinct advantages, including superior handling of complex nonlinearities and high-dimensional inputs compared to earlier probabilistic methods like Kriging, which assume Gaussian processes. They scale effectively with large datasets from modern simulations, achieving prediction accuracies often exceeding 95% in benchmark engineering tasks while reducing evaluation times by orders of magnitude. However, challenges persist, such as the black-box nature limiting interpretability and trust in critical applications and the substantial training data and computational resources required for convergence, potentially offsetting gains in resource-constrained scenarios.
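As a minimal illustration, the sketch below trains a feedforward neural network surrogate on samples from a hypothetical nonlinear simulator using scikit-learn; the architecture and sample budget are illustrative assumptions rather than recommendations.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical expensive simulator with a nonlinear response.
def simulator(X):
    return np.sin(X[:, 0]) * np.exp(-X[:, 1] ** 2) + 0.5 * X[:, 2]

rng = np.random.default_rng(0)
X_train = rng.uniform(-2, 2, size=(500, 3))
y_train = simulator(X_train)

# Feedforward NN surrogate; input scaling helps convergence.
surrogate = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
)
surrogate.fit(X_train, y_train)

X_test = rng.uniform(-2, 2, size=(5, 3))
print(np.c_[simulator(X_test), surrogate.predict(X_test)])
```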

Construction Methods

Sampling Strategies

Sampling strategies are essential for generating the input data points used to train surrogate models, ensuring efficient coverage of the design space while minimizing the number of expensive evaluations of the underlying simulation or model. These methods aim to produce a set of samples that allow the surrogate to approximate the true function accurately across its domain, particularly when computational resources limit the total number of simulations. Common approaches balance uniformity, randomness, and adaptability to the problem's structure. Latin Hypercube Sampling (LHS) is a widely adopted stratified random sampling technique that divides the range of each input variable into equally probable intervals and samples one point from each interval, then permutes the assignments across dimensions to ensure uniform coverage of the multidimensional design space. Introduced for analyzing computer code outputs, LHS provides better space-filling properties than simple random sampling by reducing variance in estimates and improving surrogate accuracy with fewer points. Other strategies include full factorial designs, which evaluate all combinations of discrete levels for each input and are suitable for low-dimensional problems (typically up to 3-5 variables) to exhaustively explore interactions without confounding. For higher dimensions, quasi-Monte Carlo methods using Sobol sequences generate low-discrepancy points that fill the space more evenly than pseudorandom numbers, leading to faster convergence in integration approximations and surrogate fitting. Adaptive sampling, in contrast, starts with an initial set of points and iteratively adds samples based on error estimates or uncertainty from the current surrogate, focusing on regions of high prediction variance or near optima. Space-filling criteria guide the selection of samples to maximize the minimum distance between points (maximin designs) or minimize correlations among inputs, promoting even distribution that enhances the robustness of surrogate models across the entire domain. These criteria are particularly useful in computer experiments where the goal is global approximation rather than local fitting. Key considerations in sampling include balancing exploration (broad coverage to avoid gaps) with exploitation (focusing on promising areas for refinement), especially in adaptive schemes, to optimize the trade-off between model accuracy and sampling cost. When constraints define feasible regions, strategies like constrained Latin hypercube sampling or boundary-respecting space-filling designs ensure samples remain within allowable bounds, preventing wasted evaluations on invalid points. Sample sizes depend on model complexity, dimensionality, and desired accuracy, with smaller sets often sufficing for low-dimensional or smooth functions and larger ones required for more challenging cases.
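The space-filling differences among these strategies can be checked numerically. The sketch below (dimension and budget are arbitrary illustrative choices) draws Latin hypercube, Sobol, and plain pseudorandom samples with SciPy's qmc module and compares them under the maximin criterion described above.

```python
import numpy as np
from scipy.stats import qmc
from scipy.spatial.distance import pdist

d, n = 4, 64  # dimension and sample budget (illustrative values)

# Latin hypercube: one point per stratum in each coordinate.
lhs = qmc.LatinHypercube(d=d, seed=0).random(n)

# Sobol low-discrepancy sequence (n a power of 2 preserves balance).
sobol = qmc.Sobol(d=d, seed=0).random(n)

# Plain pseudorandom sampling for comparison.
mc = np.random.default_rng(0).uniform(size=(n, d))

# Maximin space-filling criterion: a larger minimum pairwise
# distance indicates more even coverage of the unit hypercube.
for name, pts in [("LHS", lhs), ("Sobol", sobol), ("MC", mc)]:
    print(f"{name:6s} min pairwise distance: {pdist(pts).min():.4f}")
```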

Fitting and Optimization Techniques

Fitting surrogate models to sampled data is a critical step in their construction, involving the calibration of model parameters to approximate the underlying high-fidelity function as accurately as possible. This process typically employs optimization algorithms tailored to the surrogate's mathematical form, balancing computational efficiency with approximation quality. For parametric surrogates like polynomials, fitting often relies on closed-form solutions, while probabilistic models such as Gaussian processes require iterative maximization of likelihood functions. In more complex scenarios, especially with machine learning-based surrogates, advanced optimization techniques address non-convexity and high-dimensional parameter spaces. For polynomial-based surrogate models, such as response surface approximations, the least squares method is the standard fitting technique. This approach minimizes the sum of squared errors between the observed responses y_i and the model's predictions \hat{f}(x_i) at the sampled points x_i, expressed as \min_{\beta} \sum_{i=1}^n \left( y_i - \hat{f}(x_i; \beta) \right)^2, where \beta are the coefficients. The optimization is solved analytically using linear algebra: the design matrix \mathbf{X} (containing basis function evaluations) yields the normal equations \mathbf{X}^T \mathbf{X} \beta = \mathbf{X}^T \mathbf{y}, providing an exact solution \beta = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}. This method is computationally efficient for low-degree polynomials and assumes a linear relationship in the parameter space, making it suitable for global approximations in engineering design. Gaussian process (GP) surrogates, including Kriging variants, are fitted by optimizing hyperparameters \theta (e.g., length scales and noise variance) via maximum likelihood estimation. The log-marginal likelihood is maximized as \log L(\theta) = -\frac{1}{2} \mathbf{y}^T K(\theta)^{-1} \mathbf{y} - \frac{1}{2} \log |K(\theta)| - \frac{n}{2} \log 2\pi, where K(\theta) is the n \times n covariance matrix based on the kernel function evaluated at the input points, and n is the number of samples. This objective is typically non-convex, so numerical optimization is required, often starting from initial guesses and iterating until convergence. The resulting GP provides not only point predictions but also uncertainty estimates, essential for sequential decision-making in surrogate applications. In scenarios where the fitting objective is non-convex—common in higher-order polynomials or kernel hyperparameter selection—gradient-based optimization methods like the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm are employed to efficiently navigate the parameter landscape. BFGS approximates the Hessian using gradient information from the objective, enabling quasi-Newton updates for faster convergence compared to steepest descent. For problems lacking analytic gradients or requiring global exploration, derivative-free alternatives such as genetic algorithms are used; these population-based methods evolve candidate parameter sets through selection, crossover, and mutation to minimize the fitting loss. Such techniques are particularly valuable in surrogate-assisted optimization where repeated model refits occur. For machine learning-based surrogates, such as neural networks or support vector machines, hyperparameter tuning (e.g., architecture depth or regularization strength) often leverages Bayesian optimization.
This sequential model-based approach treats the validation error as a black-box function, using a lower-fidelity surrogate—typically a Gaussian process—to guide the search for optimal hyperparameters, balancing exploration of the parameter space with exploitation of promising regions. By iteratively updating the surrogate with new evaluations, Bayesian optimization minimizes the number of costly training runs needed, achieving efficient calibration even in high-dimensional hyperparameter spaces. Hybrid approaches enhance fitting robustness by combining multiple surrogate models into ensembles, where individual fits (e.g., a polynomial response surface and a Gaussian process) are weighted or averaged based on local performance. For instance, adaptive weighting schemes minimize a combined error metric across models, mitigating weaknesses like polynomial extrapolation errors or GP computational scaling issues. These ensemble methods improve overall predictive accuracy and robustness, especially for complex response surfaces, by leveraging the strengths of diverse fitting techniques.
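To make the likelihood-based fitting concrete, the following sketch implements the negative log-marginal likelihood from the formula above for a squared exponential kernel (with hypothetical data and a log-parameterization to keep hyperparameters positive) and minimizes it with SciPy's BFGS optimizer.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
X = rng.uniform(size=(30, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1]  # stand-in for expensive outputs

def neg_log_marginal_likelihood(log_theta):
    # Negative of log L(theta); theta = (length scale, noise variance).
    ell, noise = np.exp(log_theta)  # log-parameterize for positivity
    K = np.exp(-cdist(X, X, "sqeuclidean") / (2 * ell ** 2))
    K += (noise + 1e-10) * np.eye(len(X))  # jitter for stability
    L = np.linalg.cholesky(K)  # stable inverse and log-determinant
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (0.5 * y @ alpha + np.log(np.diag(L)).sum()
            + 0.5 * len(X) * np.log(2 * np.pi))

# Quasi-Newton (BFGS) search over the hyperparameters.
res = minimize(neg_log_marginal_likelihood,
               x0=np.log([0.5, 1e-2]), method="BFGS")
print("fitted length scale, noise variance:", np.exp(res.x))
```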

Properties and Analysis

Invariance Properties

Surrogate models are designed to approximate the behavior of complex systems while ideally preserving key invariance properties of the underlying true model, such as monotonicity and symmetry, to ensure reliable predictions across transformed inputs. Monotonicity invariance refers to the preservation of input-output order effects, where an increasing input should yield a non-decreasing output if the true model exhibits this property; this is particularly relevant in optimization and reliability analysis, where violating monotonicity can lead to erroneous decision-making. For instance, in simulation metamodeling, bootstrapped Kriging models achieve this by resampling outputs to select monotonic predictors, fitting an interpolating Kriging surface only to those samples that maintain positive gradients, thus ensuring the surrogate respects the true model's increasing nature without widening confidence intervals. Symmetry invariance, such as rotational invariance, ensures the surrogate's predictions remain unchanged under domain symmetries like rotations, which is crucial for physical systems exhibiting isotropic behavior. In Gaussian process regression (GPR) surrogates for atomistic properties, this is enforced through kernel functions constructed from complete sets of invariant scalars derived from input tensors, such as traces and determinants of strain tensors, providing a rotationally equivariant representation that reduces the dimensionality of the feature space while capturing tensorial symmetries. Similarly, for turbulence modeling, surrogates use invariant basis functions built from mean strain and rotation rate tensors to predict Reynolds stress anisotropy, exploiting Galilean and rotational symmetries to improve generalization with fewer training samples compared to non-invariant approaches. Scale and location invariance address robustness to units or shifts in input variables, often achieved through normalization techniques that standardize inputs to zero mean and unit variance, rendering the surrogate independent of absolute scales or translations. In aerodynamic surrogate modeling, modal representations normalize geometries for translation and scale invariance, allowing the model to generalize across varied shapes without retraining. The mathematical basis for these invariances lies in selecting basis functions or kernels that are equivariant or invariant under group transformations; for example, in GPR, kernels like the squared exponential can be augmented with invariant features to respect symmetries, while polynomial bases in response surface models require careful degree selection to approximate symmetric responses without introducing asymmetries. Challenges arise when approximations fail to maintain these properties, such as polynomial surrogates exhibiting oscillations (e.g., Runge's phenomenon) that break monotonicity or symmetry present in the true model, leading to non-physical predictions in high-dimensional spaces. Enforcing invariance also requires verifying that the true model actually possesses the assumed property, to avoid performance degradation from incorrect assumptions, and computational overhead increases with the need for symmetry-adapted features. Testing methods involve evaluating the surrogate under transformations, where inputs are subjected to monotonic mappings, rotations, or scalings, and the surrogate's outputs are validated against the true model using metrics like integrated mean squared error on transformed datasets to quantify preservation. For bootstrapped Kriging, macro-replication experiments assess coverage and monotonicity retention across bootstrap samples.
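Invariance testing of this kind is straightforward to script. The sketch below (with a hypothetical norm-dependent surrogate, so rotation invariance should hold exactly) applies random orthogonal rotations and small coordinate steps to empirically check rotational invariance and monotonicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical surrogate depending only on the input norm, so it
# should be exactly rotation invariant (but not globally monotone).
def surrogate(X):
    r2 = (X ** 2).sum(axis=1)
    return np.exp(-r2) + 0.1 * r2

def random_rotation(d, rng):
    # QR of a Gaussian matrix yields a random orthogonal matrix.
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return Q

# Rotation-invariance test: compare outputs before/after rotation.
X = rng.normal(size=(200, 3))
R = random_rotation(3, rng)
err = np.max(np.abs(surrogate(X) - surrogate(X @ R.T)))
print("max rotation-invariance violation:", err)

# Monotonicity spot check along coordinate 0 at random base points.
base = rng.normal(size=(50, 3))
step = np.zeros(3); step[0] = 1e-3
diffs = surrogate(base + step) - surrogate(base)
print("fraction of non-monotone steps:", np.mean(diffs < 0))
```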

Uncertainty Quantification

Uncertainty in surrogate models arises from two primary sources: aleatoric uncertainty, which stems from inherent noise or stochasticity in the underlying data-generating process and is generally irreducible, and epistemic uncertainty, which results from a lack of knowledge due to limited training data, model form choices, or parameter estimation errors and can be reduced with more information. In the context of surrogate modeling, aleatoric uncertainty often manifests as observation noise in simulation data, while epistemic uncertainty is prominent in sparsely sampled regions or high-dimensional spaces where the surrogate lacks sufficient samples. Gaussian processes provide a principled approach to quantifying predictive uncertainty through their inherent probabilistic formulation, where the predictive variance at a point x is given by \sigma^2(x) = k(x,x) - \mathbf{k}(x,X) (K + \sigma_n^2 I)^{-1} \mathbf{k}(X,x), with k(\cdot,\cdot) denoting the kernel function, X the training inputs, K the kernel matrix over X, and \sigma_n^2 the noise variance; this expression directly captures epistemic uncertainty via the distance from training data. For neural network-based surrogates, Monte Carlo dropout approximates Bayesian inference by performing multiple stochastic forward passes with dropout enabled at inference time, yielding an empirical distribution from which both aleatoric and epistemic uncertainties can be estimated as the variance of predictions. This method is particularly effective for deep surrogates in complex, non-linear systems, as it leverages the dropout mechanism to sample from an approximate posterior without requiring full Bayesian training. Polynomial chaos expansions (PCE) offer a global method for uncertainty propagation and sensitivity analysis in surrogate models by representing the output as a series of orthogonal polynomials in terms of random input variables, enabling efficient computation of variance-based sensitivity indices and propagation of input uncertainties to outputs. To ensure reliable probabilistic predictions, surrogate models are calibrated using proper scoring rules such as the Brier score, which measures the mean squared difference between predicted probabilities and observed outcomes, penalizing both miscalibration and poor sharpness. Calibration is crucial for surrogates in decision-making applications, as it aligns predicted confidence intervals with empirical coverage rates. Recent advances in the 2020s have focused on deep ensembles for uncertainty quantification in high-dimensional models, where multiple neural networks are trained independently and their predictive variances provide robust estimates of epistemic uncertainty, outperforming single-model approaches in scenarios with sparse or noisy data. As of 2025, further progress includes scalable Bayesian frameworks for surrogate modeling with thorough uncertainty quantification in high-dimensional applications, as reviewed in recent literature. These ensembles and extensions are particularly valuable for scalable uncertainty quantification in large simulations, as they capture model disagreement to quantify extrapolation risks without assuming a specific posterior form.
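A deep-ensemble-style estimate of epistemic uncertainty can be obtained with a few independently initialized networks. In the sketch below (synthetic one-dimensional data; ensemble size and architecture are illustrative), the spread across ensemble members serves as the uncertainty signal and typically grows outside the training range.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(4 * X[:, 0]) + 0.1 * rng.normal(size=200)  # aleatoric noise

# Ensemble of networks differing only in random initialization;
# their disagreement is read as epistemic uncertainty.
ensemble = [
    MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=3000,
                 random_state=s).fit(X, y)
    for s in range(5)
]

X_test = np.linspace(-2, 2, 9).reshape(-1, 1)  # extends past training range
preds = np.stack([m.predict(X_test) for m in ensemble])
mean, epistemic_std = preds.mean(axis=0), preds.std(axis=0)
for x, m_, s_ in zip(X_test.ravel(), mean, epistemic_std):
    print(f"x={x:+.2f}  mean={m_:+.3f}  ensemble std={s_:.3f}")
# The ensemble spread typically grows for |x| > 1, flagging extrapolation.
```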

Applications

Optimization and Design

Surrogate models play a pivotal role in optimization by serving as efficient proxies for expensive objective functions in gradient-free methods, such as pattern search algorithms, where they guide the exploration of the design space without requiring gradient information. In these approaches, the surrogate approximates the black-box objective based on a limited set of evaluations, allowing the optimizer to propose promising search directions or candidate points iteratively while minimizing the number of costly true function calls. This integration enhances convergence speed and robustness, particularly for high-dimensional or noisy problems where direct evaluations are prohibitive. In design of experiments (DOE), surrogate models enable sequential strategies that adaptively refine the sampling process by selecting new points based on the current model's uncertainty or expected improvement, iteratively improving the surrogate's accuracy over the region of interest. Unlike static DOE methods like Latin hypercube sampling, sequential approaches using surrogates focus evaluations on areas likely to yield valuable information, such as near predicted optima or high-variance regions, thus balancing global exploration and local exploitation. This adaptive refinement is especially beneficial in expensive simulations, where each additional sample significantly impacts overall computational cost. For multi-objective optimization, surrogate models facilitate Pareto front approximation by embedding within evolutionary algorithms like NSGA-II, where they evaluate candidate solutions to reduce the fitness assessment burden across conflicting objectives. In surrogate-assisted NSGA-II variants, individual surrogates or ensembles approximate each objective, enabling the algorithm to maintain diversity and convergence toward the non-dominated set with far fewer true evaluations. This approach is particularly effective for problems with computationally intensive multi-fidelity evaluations, yielding well-distributed Pareto solutions that balance trade-offs efficiently. A representative case study in aerodynamic shape optimization demonstrates the practical impact: traditional computational fluid dynamics (CFD)-based methods require hundreds of thousands of evaluations for even a two-dimensional airfoil, but multi-fidelity deep surrogates reduce high-fidelity evaluations to 20–120 points, supplemented by low-fidelity data, achieving comparable drag minimization and lift maximization. Such reductions transform optimization from exhaustive searches into targeted explorations, enabling design space navigation that would otherwise be infeasible. Overall, surrogate models in optimization yield gains of up to 90% in computational time for black-box functions, as evidenced in groundwater pumping scenarios where multiple surrogates cut evaluation costs while preserving solution quality.
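The expected-improvement infill criterion underlying many of these sequential strategies can be sketched compactly. The example below (a hypothetical one-dimensional objective, with a grid search over the acquisition for simplicity) alternates between fitting a Gaussian process surrogate and evaluating the true function at the EI maximizer.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expensive_objective(x):  # stand-in black-box function
    return (x - 0.65) ** 2 + 0.05 * np.sin(20 * x)

# Small initial design.
X = np.array([[0.05], [0.35], [0.95]])
y = expensive_objective(X).ravel()

def expected_improvement(Xc, gp, y_best):
    # EI for minimization: large where the mean is low or the std is high.
    mu, sd = gp.predict(Xc, return_std=True)
    z = (y_best - mu) / np.maximum(sd, 1e-12)
    return (y_best - mu) * norm.cdf(z) + sd * norm.pdf(z)

# Sequential infill: fit surrogate, maximize EI on a grid, evaluate truth.
grid = np.linspace(0, 1, 501).reshape(-1, 1)
for _ in range(8):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                  normalize_y=True).fit(X, y)
    x_new = grid[np.argmax(expected_improvement(grid, gp, y.min()))]
    X = np.vstack([X, x_new])
    y = np.append(y, expensive_objective(x_new))

print("best x, f(x):", X[y.argmin()].item(), y.min())
```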

Engineering Simulations

Surrogate models play a critical role in engineering simulations by approximating complex, computationally intensive processes such as finite element analysis (FEA) and computational fluid dynamics (CFD), enabling faster predictions while maintaining acceptable accuracy. In structural dynamics, reduced-order modeling via surrogates reduces the dimensionality of high-fidelity simulations, allowing for efficient prediction of responses and vibrations in complex systems. For instance, proper orthogonal decomposition (POD) combined with surrogate techniques decomposes dynamic fields into lower-dimensional bases, facilitating rapid evaluation of nonlinear responses in rotor systems. Similarly, in fluid flow predictions, machine learning-based surrogates trained on CFD data approximate velocity and pressure fields, achieving predictions in milliseconds to seconds compared to traditional solvers that require hours. These models are particularly valuable in built environments, where convolutional neural networks (CNNs) have been shown to replicate Reynolds-averaged Navier-Stokes (RANS) simulations with errors below 3%. Integration of surrogates into iterative solvers enhances simulation workflows, often through hybrid approaches like POD-based reduced-order models embedded in finite element frameworks. POD identifies dominant modes from snapshot data of full-order simulations, which are then interpolated using radial basis functions (RBFs) to construct the surrogate, reducing the need for repeated high-fidelity solves during parameter sweeps or optimization problems. This has been applied in nonlinear FEA for metal forming and elastoplasticity, where preprocessing techniques such as scaling per physical component minimize errors and improve accuracy in high-dimensional output fields. In geotechnical and similar simulations, such hybrids enable efficient process estimation while distinguishing truncation errors from basis reduction and interpolation uncertainties. Practical examples illustrate the impact in specific domains. In automotive crash simulations, surrogates combining physics-based spring-damper-mass models with neural networks predict deceleration pulses based on impact parameters, supporting integrated safety systems with real-time capability and good agreement with finite element results. For thermal management in power electronics, surrogate models such as artificial neural networks (ANNs) approximate transient temperature fields in power modules, significantly reducing computation time for predictions while capturing multi-scale responses. These approaches replace detailed simulations, enabling design iterations that would otherwise take hours. In multi-scale modeling of composites, ANN-based surrogates bridge microscale behaviors to macroscale FEA, modeling progressive damage and nonlinear constitutive relations with reusable material databases that cut online costs significantly, achieving good agreement with conventional homogenizations. As of 2025, surrogate models are increasingly integrated into digital twins for engineering monitoring, leveraging reduced-order techniques to create lightweight approximations of physical assets. In manufacturing and process industries, these enable parametric simulations within seconds—versus hours for full models—supporting predictive maintenance and process optimization in factory settings. For example, POD-enhanced surrogates in digital twins facilitate rapid response to varying loads or material properties, maintaining accuracy within trained parameter ranges and accelerating system-level analysis.
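The POD step at the heart of such reduced-order surrogates is a singular value decomposition of a snapshot matrix. The sketch below (with synthetic one-dimensional "fields" standing in for stored FEA/CFD solutions) extracts the dominant modes, truncates by an energy criterion, and shows the resulting compression.

```python
import numpy as np

# Synthetic snapshot matrix: each column is a full-order field (here a
# 1-D profile) at one parameter value — a stand-in for stored solutions.
x = np.linspace(0, 1, 400)
params = np.linspace(0.5, 2.0, 40)
snapshots = np.column_stack([np.sin(np.pi * p * x) * np.exp(-p * x)
                             for p in params])  # shape (400, 40)

# POD: left singular vectors of the snapshot matrix are the modes.
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
energy = np.cumsum(s ** 2) / np.sum(s ** 2)
r = int(np.searchsorted(energy, 0.9999)) + 1  # truncation rank
basis = U[:, :r]
print(f"{r} modes capture {energy[r - 1]:.6f} of the snapshot energy")

# Reduced representation: project any field onto the r-dimensional basis.
field = snapshots[:, 7]
coeffs = basis.T @ field  # r coefficients instead of 400 nodal values
recon_err = np.linalg.norm(field - basis @ coeffs) / np.linalg.norm(field)
print("relative reconstruction error:", recon_err)
```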

Uncertainty and Sensitivity Analysis

Surrogate models play a crucial role in uncertainty propagation by approximating expensive simulations, enabling efficient Monte Carlo methods to estimate statistical moments such as variance in complex systems. Traditional Monte Carlo simulation requires thousands of model evaluations to propagate input uncertainties through complex systems, often rendering it computationally prohibitive for high-fidelity models. By constructing surrogate approximations, such as Gaussian processes or polynomial chaos expansions, these evaluations can be reduced significantly while maintaining accuracy in variance estimation, as demonstrated in environmental modeling where surrogates facilitate uncertainty assessment in large-scale simulations. In sensitivity analysis, surrogate models enable the computation of variance-based measures like Sobol indices, which quantify the contribution of individual inputs to output uncertainty. The first-order Sobol index for input X_i is defined as S_i = \frac{\text{Var}(E[Y|X_i])}{\text{Var}(Y)}, where Y is the model output, and this can be estimated efficiently by evaluating the surrogate many times instead of the original model. This approach is particularly effective for stochastic models with intrinsic noise, allowing for generalized Sobol indices that account for both parametric and aleatoric uncertainties. Surrogates distinguish global sensitivity, which captures average effects across the input space via variance decomposition as in Sobol methods, from local sensitivity, which examines gradients at specific points; the former is better suited for non-linear, high-dimensional problems where surrogates decompose total variance into main and interaction effects. This variance-based decomposition, enabled by cheap surrogate evaluations, provides a comprehensive ranking of input influences without assuming model linearity. In applications, surrogate models support flood hazard assessment in climate models by propagating uncertainties in sea-level and storm parameters to predict flooding probabilities, as seen in frameworks emulating hydrodynamic simulations for coastal sites under future scenarios. Similarly, in sensitivity analysis within quantitative systems pharmacology models, surrogates aid parameter ranking through Sobol indices, identifying key drivers of drug response variability and accelerating virtual population generation. Advances in non-intrusive polynomial chaos surrogates enhance uncertainty quantification by projecting simulation data onto orthogonal polynomial bases, offering efficiency in dimensions exceeding 10 without modifying the underlying model code. These methods, such as regression- or projection-based approaches, require only tens to hundreds of evaluations for accurate statistical moments in structural or material simulations, outperforming traditional Monte Carlo sampling in high-dimensional settings.
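First-order Sobol indices become cheap to estimate once a surrogate is available. The sketch below uses a pick-freeze Monte Carlo estimator on an inexpensive analytic stand-in for a fitted surrogate (an Ishigami-type function, chosen because its indices are well known); sample sizes this large are feasible precisely because the surrogate is cheap.

```python
import numpy as np

# Cheap analytic stand-in for a fitted surrogate (Ishigami-type function).
def surrogate(X):
    return (np.sin(X[:, 0]) + 7 * np.sin(X[:, 1]) ** 2
            + 0.1 * X[:, 2] ** 4 * np.sin(X[:, 0]))

rng = np.random.default_rng(0)
n, d = 100_000, 3  # large n is affordable on the surrogate
A = rng.uniform(-np.pi, np.pi, size=(n, d))
B = rng.uniform(-np.pi, np.pi, size=(n, d))

yA = surrogate(A)
yB = surrogate(B)
var_y = yA.var()

# Pick-freeze estimator of S_i = Var(E[Y|X_i]) / Var(Y):
# copy sample B but freeze coordinate i to the values from sample A.
for i in range(d):
    ABi = B.copy()
    ABi[:, i] = A[:, i]
    S_i = np.mean(yA * (surrogate(ABi) - yB)) / var_y
    print(f"S_{i + 1} = {S_i:.3f}")
```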

Advanced Techniques

Surrogate-Assisted Evolutionary Algorithms

Surrogate-assisted evolutionary algorithms (SAEAs) integrate surrogate models to approximate fitness evaluations in population-based optimization methods, such as genetic algorithms and evolution strategies, thereby minimizing the number of computationally expensive calls to the true objective function. This approach is particularly valuable for black-box optimization problems where each evaluation requires significant resources, like simulations or experiments. By leveraging surrogates—such as Gaussian processes, neural networks, or support vector machines—SAEAs maintain the global search capabilities of evolutionary algorithms while accelerating convergence. Frameworks for SAEAs typically involve infilling strategies to select promising candidates for true evaluation and mechanisms for updating both the population and the surrogate. Infilling often employs criteria like expected improvement, which balances exploration and exploitation by prioritizing points likely to improve the current best solution based on surrogate predictions and uncertainty estimates. Hybrid evolution-surrogate updates alternate between evolutionary operators (e.g., mutation, crossover) and surrogate refinements, ensuring the model remains accurate as the search progresses. These frameworks allow for flexible integration, where surrogates guide individual selection or entire generations. Prominent algorithms in this domain include surrogate-assisted covariance matrix adaptation evolution strategy (saCMA-ES) and coevolutionary models. The saCMA-ES extends the standard CMA-ES by incorporating self-adaptive surrogate models for pre-selection or direct optimization of offspring, using techniques like aggregated surrogate models or ranking-based support vector machines to approximate fitness rankings efficiently. Variants such as IPOP-saCMA-ES and BIPOP-saCMA-ES further enhance restart mechanisms for better handling of multimodal landscapes. Coevolutionary models, meanwhile, evolve the surrogate alongside the solution population, often through reward-based parent selection, enabling mutual adaptation and improved robustness in dynamic or multi-objective settings. In terms of performance, SAEAs achieve substantial speedups of 10-100 times in high-dimensional problems (dimensions up to 100 or more) by reducing function evaluations from thousands to hundreds, as demonstrated on benchmarks like the BBOB suites. They also effectively manage noisy objectives through robust surrogates like Gaussian process models, which incorporate uncertainty to filter out perturbations and maintain reliable rankings even under moderate noise levels. For instance, saCMA-ES variants have shown significant improvements over standard CMA-ES, such as better rankings on BBOB benchmarks for 20-dimensional multimodal functions, scaling well to higher dimensions without proportional increases in computational cost. Key challenges in SAEAs revolve around managing surrogate error accumulation through individual-based versus generation-based updates. Individual-based updates, which refine the surrogate after each evaluation, offer rapid adaptation but risk overfitting or diversity loss if the model quality degrades. Generation-based updates, applied periodically across an entire population, promote stability and parallelism but may lead to divergence if the surrogate becomes outdated during long generations. Balancing these approaches requires careful tuning of parameters like surrogate lifespan to mitigate premature convergence in complex landscapes.
Recent advances as of 2025 include co-evolving large language models with configuration spaces in SAEAs for computationally expensive black-box problems, and probability-selection-based SAEAs that improve performance by adaptively choosing among candidate surrogates.
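A minimal SAEA loop with individual-based, pre-selection infill might look like the sketch below (hypothetical objective; population size, mutation scale, and per-generation true-evaluation budget are illustrative): the surrogate ranks a large pool of offspring so that only a handful receive expensive evaluations.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expensive_fitness(X):  # stand-in black-box objective (minimize)
    return np.sum((X - 0.3) ** 2, axis=1)

rng = np.random.default_rng(0)
d, pop_size = 5, 20

# Archive of all truly evaluated points; it trains the surrogate.
X_arch = rng.uniform(size=(pop_size, d))
y_arch = expensive_fitness(X_arch)
pop = X_arch.copy()

for _ in range(15):  # generations
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X_arch, y_arch)

    # Gaussian-mutation offspring, generated cheaply in large numbers.
    cand = np.clip(pop[rng.integers(pop_size, size=100)]
                   + 0.1 * rng.normal(size=(100, d)), 0, 1)

    # Pre-selection infill: the surrogate ranks candidates; only the
    # most promising few receive true (expensive) evaluations.
    X_new = cand[np.argsort(gp.predict(cand))[:5]]
    y_new = expensive_fitness(X_new)
    X_arch = np.vstack([X_arch, X_new])
    y_arch = np.append(y_arch, y_new)

    # Survivor selection uses true fitness values only.
    pop = X_arch[np.argsort(y_arch)[:pop_size]]

print("true evaluations:", len(y_arch), "best fitness:", y_arch.min())
```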

Multi-Fidelity and Adaptive Surrogates

Multi-fidelity surrogate models integrate data from simulations or models of varying accuracy levels to balance computational cost and predictive precision. Low-fidelity models, which are inexpensive but less accurate, provide broad coverage of the input space, while high-fidelity models offer detailed accuracy at higher expense. By fusing these, multi-fidelity approaches reduce the required number of high-fidelity evaluations, enabling efficient approximation of complex systems. A prominent method is variable-fidelity kriging, which models the high-fidelity function as a scaled and corrected version of the low-fidelity output. The surrogate is constructed as \hat{f}(x) = \rho f_{\text{low}}(x) + \delta(x), where \rho is a scaling factor (often a constant or a low-order function of x), f_{\text{low}}(x) is the low-fidelity prediction, and \delta(x) is a discrepancy term modeled via Kriging to capture differences between fidelities. This hierarchical structure leverages correlations between fidelity levels to enhance interpolation accuracy. Co-kriging extends this by jointly estimating parameters across multiple fidelity levels using a shared covariance structure, allowing flexible incorporation of non-hierarchical data sources. These techniques yield significant cost-accuracy trade-offs; for instance, multi-fidelity Kriging can achieve comparable accuracy to single-fidelity models with substantially fewer high-fidelity evaluations in aerodynamic design tasks, as demonstrated in optimization examples. In optimization contexts, such reductions translate to substantial computational savings depending on the fidelity ratio and problem dimensionality. Adaptive surrogates build on this by dynamically updating the model during construction or use, focusing on regions of high uncertainty to refine accuracy efficiently. In adaptive sampling frameworks, new data points are sequentially added where the surrogate's uncertainty—often quantified via the predictive variance in Gaussian processes—is maximized, such as through error-based or variance-driven criteria. Active learning loops implement this by iteratively evaluating the expensive model at selected points, balancing exploration of the input space and exploitation of uncertain areas to minimize overall evaluations. In recent developments from the 2020s, physics-informed neural networks (PINNs) have emerged for real-time adaptation in control systems, incorporating physical laws into the network to update surrogates dynamically as new data arrives. For example, Lyapunov-based PINNs enable adaptive control of uncertain Euler-Lagrange systems by online updating of network weights, ensuring stability while reducing simulation times for real-time applications like robotics. These methods extend multi-fidelity principles by adaptively weighting physics constraints and data fidelities, achieving faster convergence in time-critical scenarios. As of 2025, advancements include adaptive quality-based multi-fidelity frameworks that maximize low-fidelity data utilization for structural optimization, and physics-informed multi-fidelity surrogates for modeling fluid flow in porous media, enhancing predictions in complex simulations.
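The variable-fidelity construction \hat{f}(x) = \rho f_{\text{low}}(x) + \delta(x) can be sketched directly. The example below uses a common synthetic fidelity pair (an assumption, standing in for real solvers), estimates the scalar \rho by least squares at the high-fidelity points, and models the discrepancy \delta with a Gaussian process.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Hypothetical fidelity pair: f_low is a cheap, biased version of f_high.
def f_high(x): return (6 * x - 2) ** 2 * np.sin(12 * x - 4)
def f_low(x):  return 0.5 * f_high(x) + 10 * (x - 0.5) - 5

# Only a handful of expensive high-fidelity samples.
x_hi = np.array([0.0, 0.4, 0.6, 1.0]).reshape(-1, 1)
y_hi = f_high(x_hi).ravel()

# Least-squares estimate of the scalar scaling rho at the HF points.
lo_at_hi = f_low(x_hi).ravel()
rho = (lo_at_hi @ y_hi) / (lo_at_hi @ lo_at_hi)

# GP on the discrepancy delta(x) = y_hi - rho * f_low(x).
gp_delta = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp_delta.fit(x_hi, y_hi - rho * lo_at_hi)

def f_hat(x):
    # Variable-fidelity prediction: scaled LF model plus GP correction.
    return rho * f_low(x).ravel() + gp_delta.predict(x)

x_test = np.linspace(0, 1, 5).reshape(-1, 1)
print(np.c_[f_high(x_test).ravel(), f_hat(x_test)])
```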

Software and Tools

Open-Source Packages

Several open-source packages facilitate the development and application of surrogate models, providing tools for constructing approximations such as response surfaces, Kriging models, Gaussian processes, and support vector regressions. Dakota, developed by Sandia National Laboratories, is a comprehensive C++ framework with Python bindings for optimization, uncertainty quantification (UQ), and surrogate modeling, supporting methods like response surfaces, Gaussian processes, and multifidelity surrogates integrated with design of experiments (DOE). Released under the GNU Lesser General Public License, it enables iterative surrogate-based optimization and is widely used in engineering simulations for reducing computational costs. The Surrogate Modeling Toolbox (SMT), a Python package, specializes in response surface methodologies and kriging-based surrogates, including implementations for hierarchical and mixed inputs, allowing users to build and evaluate models from sampled data. It supports training on datasets for prediction and uncertainty estimation, making it suitable for engineering design tasks. scikit-learn, a foundational machine learning library in Python, provides surrogate modeling capabilities through classes like GaussianProcessRegressor for probabilistic Gaussian process regression and SVR for support vector regression, which approximate complex functions with tunable kernels and hyperparameters. These tools are often employed in surrogate-assisted workflows due to their ease of integration and scalability for high-dimensional problems. Specialized packages extend surrogate functionality for targeted applications. UQPy, an open-source Python toolbox from the Shields Uncertainty Research Group, focuses on uncertainty quantification and includes modules for surrogate construction, such as polynomial chaos expansions and Gaussian processes, alongside sampling methods for validation. SMAC3, developed by the AutoML group, implements sequential model-based optimization using random forest surrogates to guide hyperparameter tuning and black-box optimization, supporting parallel evaluations and intensification mechanisms for robust performance. These packages commonly feature seamless integration with Python and its scientific ecosystem, enabling scripting for model training and evaluation, while many incorporate adaptive sampling techniques to refine surrogates by iteratively selecting informative data points based on acquisition functions like expected improvement. Community-driven tools on GitHub further enrich the landscape, such as surrogate-modeling repositories with 2025 updates incorporating deep learning-based surrogates like physics-enhanced neural networks for enhanced accuracy in scientific simulations. In practice, these packages support example workflows for training and validation: users can generate Latin hypercube samples via SMT or UQPy, train a surrogate in scikit-learn or SMT, and validate predictions against high-fidelity simulations using cross-validation metrics like RMSE, ensuring reliable approximations for optimization loops.
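A typical end-to-end workflow with these packages follows the sample-train-validate pattern described above. The sketch below (with a mock high-fidelity function in place of a real simulation) uses SciPy for the Latin hypercube design and scikit-learn for the Gaussian process surrogate and k-fold validation.

```python
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import cross_val_score

# 1) Design of experiments: Latin hypercube samples in [0, 1]^3.
X = qmc.LatinHypercube(d=3, seed=0).random(n=60)

# 2) Run the (here: mock) high-fidelity model at the sample points.
def high_fidelity(X):
    return np.sin(2 * X[:, 0]) + X[:, 1] * X[:, 2]
y = high_fidelity(X)

# 3) Define a Gaussian process surrogate.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

# 4) Validate with k-fold cross-validation before trusting predictions.
rmse = np.sqrt(-cross_val_score(gp, X, y, cv=5,
                                scoring="neg_mean_squared_error"))
print("5-fold RMSE per fold:", np.round(rmse, 4))

gp.fit(X, y)  # final fit on all data for downstream optimization loops
```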

Commercial Solutions

Commercial surrogate modeling solutions are proprietary software suites designed for enterprise-level integration in engineering workflows, offering robust tools for constructing and deploying surrogate models to accelerate simulations in fields like optimization and design. These tools emphasize user-friendly interfaces, scalability for large-scale computations, and compliance with industry standards such as ISO certifications for aerospace and automotive sectors. modeFRONTIER, developed by ESTECO, is a leading platform for multidisciplinary design optimization that incorporates response surface (RS) models and AI-driven surrogates to approximate complex nonlinear systems, enabling efficient design space exploration with reduced computational demands. It features GUI-driven fitting tools for building metamodels and supports integration with CAD/CAE software for automated workflows. The 2025R2 release introduces new algorithms enhancing surrogate accuracy in multi-strategy optimization scenarios. Isight from Dassault Systèmes provides integrated surrogate capabilities within process automation frameworks, using approximation models to create reduced-order representations of simulation data for faster iteration in multidisciplinary design. Its components include built-in DOE tools and metamodel builders that facilitate construction from finite element analyses, with enterprise scalability for distributed environments. ANSYS incorporates surrogate builders through its optiSLang AI+ module, which leverages machine learning to generate metamodels for finite element analysis (FEA) and computational fluid dynamics (CFD), allowing rapid design evaluations without full simulations. These tools support GUI-based model fitting and are certified for industries requiring high reliability, such as automotive crash testing. COMSOL Multiphysics offers built-in surrogate model functionality, introduced in version 6.2, for exporting compact approximations of multiphysics simulations to accelerate app deployment and digital twins. In 2025 updates, it emphasizes high-accuracy surrogates derived from FEA/CFD data, with features for neural network-based fitting and integration into collaborative workflows. Siemens Simcenter, particularly through HEEDS and Reduced Order Modeling extensions, applies surrogates in automotive applications like gear stress and CFD optimization, using response surface models to predict performance from simulation data. The 2504 release consolidates surrogate tools in a unified environment for faster metamodel creation, supporting certifications for regulated industries. The MATLAB Global Optimization Toolbox includes surrogate optimization via the surrogateopt function, which builds radial basis function-based surrogate models to minimize expensive objective functions in black-box optimization tasks. It provides scalable options for enterprise users, integrating with Simulink for surrogate-assisted simulations. Post-2023 trends in commercial surrogate solutions highlight a shift toward cloud-based platforms for collaborative engineering, enabling distributed training of surrogates and real-time sharing of metamodels across global teams, as seen in growing adoption for multiphysics applications.

    Rating 4.8 (1,980) Mar 14, 2025 · The cloud-based simulation software market is experiencing robust growth, driven by the increasing need for faster, more cost-effective, ...