
Uncertainty analysis

Uncertainty analysis, closely associated with uncertainty quantification (UQ), is the systematic process of identifying, evaluating, and expressing the uncertainties associated with measurement results, model predictions, or simulations, characterizing the dispersion of values that could reasonably be attributed to the quantity being assessed. This involves quantifying the possible distribution of errors arising from various sources, such as instruments, environmental factors, or modeling assumptions, to provide a realistic estimate of reliability rather than a single deterministic value. In scientific and engineering contexts, uncertainty analysis is essential for ensuring the validity of experimental results and computational models, enabling informed decision-making by assessing how variations in inputs propagate to outputs. It addresses doubts in quantitative assessments, risk analyses, and epidemiological studies by distinguishing between aleatory (inherent randomness) and epistemic (knowledge-based) uncertainties, thereby improving the robustness of predictions in fields like engineering and environmental modeling. For instance, in finite element analysis for structural engineering, it evaluates variability in material properties and boundary conditions to predict structural performance under real-world conditions.

The core methodology follows standardized procedures, such as those outlined in the Guide to the Expression of Uncertainty in Measurement (GUM), which recommends a two-stage evaluation: Type A, based on statistical analysis of repeated observations to compute experimental standard deviations, and Type B, relying on scientific judgment, manufacturer specifications, or prior knowledge to estimate uncertainties from non-statistical sources. These component uncertainties are then combined using the law of propagation of uncertainty, typically through variance addition for independent inputs, accounting for sensitivities via partial derivatives in functional models: u_c^2(y) = \sum_{i=1}^N c_i^2 u^2(x_i), where c_i = \partial f / \partial x_i represents sensitivity coefficients. This framework promotes international comparability of results by requiring full disclosure of uncertainty sources and calculations. Beyond linear propagation, advanced techniques like Monte Carlo simulations and sensitivity analysis extend uncertainty propagation to complex, nonlinear systems, helping prioritize influential parameters and reduce overall uncertainty through targeted refinements. Recent advances as of 2025 have increasingly integrated UQ with machine learning and surrogate modeling techniques to enhance predictive reliability in complex systems.

In engineering design, such methods mitigate risks by quantifying forward uncertainty in outputs from input variabilities, supporting optimization and validation against experimental data. These approaches, grounded in probability and statistics, ensure that reported results include confidence intervals, fostering trust in applications ranging from computational simulations to civil infrastructure assessments.

Fundamentals

Definition and Scope

Uncertainty analysis is the process of identifying, characterizing, and quantifying uncertainties arising from inputs, models, or measurements to evaluate their effects on outputs or conclusions. This involves systematically assessing limitations in knowledge or data to provide a more complete understanding of results, often distinguishing between random and systematic components. The foundations of uncertainty analysis trace back to early statistical work on error propagation, notably Karl Pearson's 1898 contributions to calculating probable errors in frequency constants and their influence on variation and correlation. These ideas evolved through the 20th century, culminating in standardized frameworks such as the Guide to the Expression of Uncertainty in Measurement (GUM), first published in 1995 and updated with minor corrections in 2008 by the Joint Committee for Guides in Metrology (JCGM). As of 2024, an introductory part (ISO/IEC Guide 98-1) was published, and in July 2025, the JCGM proposed a new definition of measurement uncertainty under review for future editions, including potential updates to the International Vocabulary of Metrology (VIM). The GUM, developed in response to a 1977 request from the Comité International des Poids et Mesures (CIPM) for harmonized uncertainty reporting, provides general rules for evaluation and expression applicable across measurement domains. Uncertainty analysis finds broad application in engineering for predicting system performance, in physics for validating experimental results, in environmental science for modeling pollutant dispersion or climate impacts, and in decision-making under uncertainty for policy formulation. It plays a critical role in risk assessment by quantifying potential variabilities in hazard predictions and in reliability engineering by informing design margins and failure probabilities. A foundational tool in this process is the law of propagation of uncertainty for a measurand y = f(x_1, \dots, x_n) with n input quantities, where the combined standard uncertainty u_c(y) is given by u_c(y) = \sqrt{ \sum_{i=1}^n \left( \frac{\partial f}{\partial x_i} \right)^2 u^2(x_i) }, known as the root-sum-square method, which approximates the output uncertainty based on input standard uncertainties u(x_i) and sensitivity coefficients.
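The root-sum-square law lends itself to a direct numerical implementation when the sensitivity coefficients are approximated by finite differences. The sketch below is illustrative only: the function, input estimates, and standard uncertainties are assumed example values, not prescribed by the GUM.

```python
import numpy as np

def propagate_uncertainty(f, x, u, h=1e-6):
    """Combined standard uncertainty of y = f(x) via the GUM
    root-sum-square law, assuming independent inputs.

    f : callable taking a 1-D array of input estimates
    x : best estimates of the inputs
    u : standard uncertainties of the inputs
    h : relative step for central-difference sensitivity coefficients
    """
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    c = np.zeros_like(x)                     # sensitivity coefficients dy/dx_i
    for i in range(x.size):
        step = h * max(abs(x[i]), 1.0)
        xp, xm = x.copy(), x.copy()
        xp[i] += step
        xm[i] -= step
        c[i] = (f(xp) - f(xm)) / (2.0 * step)
    return np.sqrt(np.sum((c * u) ** 2))     # u_c(y)

# Example: resistance R = V / I from assumed voltage and current estimates
resistance = lambda q: q[0] / q[1]           # q = [V, I]
u_R = propagate_uncertainty(resistance, x=[12.0, 2.0], u=[0.05, 0.01])
print(f"R = {12.0/2.0:.3f} ohm, u_c(R) = {u_R:.4f} ohm")
```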

Types of Uncertainty

In uncertainty analysis, uncertainties are broadly classified into aleatoric and epistemic types, reflecting fundamental distinctions in their origins and reducibility. Aleatoric uncertainty, also known as irreducible or stochastic uncertainty, arises from inherent randomness or variability in the system being studied, such as quantum effects, natural fluctuations, or stochastic processes that cannot be eliminated regardless of additional information. This type is typically modeled using probability distributions, like the Gaussian distribution, to capture the frequentist variability observed in repeated experiments under identical conditions. For instance, in weather forecasting, aleatoric uncertainty manifests in the unpredictable fluctuations of atmospheric patterns driven by chaotic dynamics. In contrast, epistemic uncertainty stems from a lack of knowledge or incomplete information, including measurement errors, model approximations, or insufficient data, and is potentially reducible through further investigation or improved methods. It represents subjective beliefs that can be updated as new evidence becomes available, often quantified in Bayesian frameworks where prior distributions evolve with incoming data. An example is the uncertainty in readings from an uncalibrated sensor, which diminishes once calibration is performed to correct for systematic biases. This distinction is crucial: aleatoric uncertainty sets a fundamental limit on predictability, while epistemic uncertainty highlights opportunities for refinement in analysis or experimentation. Beyond these primary categories, uncertainty analysis recognizes other specific classifications that arise in practical applications, particularly in modeling and simulation contexts. Parameter uncertainty pertains to variability or imprecision in the input values or coefficients of a model, often due to estimation errors from limited observations. Model form uncertainty, a subset of epistemic uncertainty, arises from inadequacies in the structural representation of the system, such as missing physical processes or oversimplified assumptions in the mathematical formulation. For example, in fluid dynamics simulations, model form uncertainty may occur if turbulence effects are inadequately captured by the chosen equations. Scenario uncertainty, meanwhile, addresses unknowns related to future conditions or alternative pathways, such as varying environmental forcings or decision contexts, requiring the consideration of multiple plausible futures to bound potential outcomes. These classifications provide a framework for dissecting complex uncertainties, enabling targeted reduction strategies in fields like physical experiments, where both aleatoric variability in measurements and epistemic gaps in understanding can coexist.
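A minimal numerical sketch of the distinction, under an assumed Gaussian measurement model: taking more readings shrinks the standard error of the mean (an epistemic component that reflects limited data) but leaves the scatter of individual readings (the aleatoric component) essentially unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value, noise_sd = 10.0, 0.5   # assumed aleatoric scatter of a single reading

for n in (5, 50, 500):
    readings = true_value + noise_sd * rng.standard_normal(n)
    print(f"n={n:4d}  spread of readings (aleatoric) = {readings.std(ddof=1):.3f}  "
          f"uncertainty of the mean (epistemic) = {readings.std(ddof=1)/np.sqrt(n):.3f}")
```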

Uncertainty in Physical Experiments

Sources of Experimental Uncertainty

Experimental uncertainty arises primarily from two categories: systematic errors, which introduce consistent biases, and random errors, which cause variability in repeated measurements. Systematic errors often stem from imperfections in the experimental setup or procedure, such as instrument drift, where a scale's calibration shifts over time due to wear, leading to persistent over- or underestimation of values. Environmental factors, like temperature fluctuations affecting the expansion of measurement scales or sensors, can also induce systematic biases by altering the physical properties of equipment during the experiment. Operator inconsistencies, including biased reading techniques or procedural variations, further contribute to these errors by introducing human-induced offsets that repeat across trials. Random errors, in contrast, manifest as unpredictable fluctuations due to inherent noise in the system, such as electronic noise in detectors or unresolved aleatoric effects from physical processes. These are typically quantified through repeated measurements, where the spread in results reflects sampling variability or transient disturbances like vibrations in the environment. In experiments, aleatoric uncertainties represent irreducible randomness, such as quantum fluctuations in particle detection, while epistemic uncertainties arise from incomplete knowledge of experimental conditions. Beyond these broad categories, specific sources include the finite resolution limits of instruments, where the smallest detectable change exceeds the true variation, leading to quantization errors in readouts. Uncertainties can also propagate from auxiliary measurements, for instance, the voltage input in a resistance measurement introducing additional uncertainty through its own tolerances. Human factors, such as parallax errors from off-angle readings of analog dials, add further variability by depending on the observer's position and precision. A practical example occurs in tensile testing of materials, where grip slippage at the specimen-machine interface causes systematic underestimation of applied force, while hysteresis (residual deformation in the gauge after load cycles) introduces random variability in elongation measurements. To ensure reliability, experimental uncertainties must comply with ISO standards, particularly the Guide to the Expression of Uncertainty in Measurement (GUM; JCGM GUM-1:2023), which emphasizes traceability to international references for calibrating instruments and validating procedures.

Quantification and Propagation in Experiments

In experimental settings, uncertainty quantification begins with evaluating the standard uncertainty associated with individual measurements. Type A evaluation employs statistical methods based on repeated observations of the quantity under the same conditions. The experimental standard deviation of the mean from these observations provides the Type A standard uncertainty, calculated as u(X_i) = \frac{s(X_i)}{\sqrt{n}}, where s(X_i) = \sqrt{\frac{1}{n-1} \sum_{k=1}^n (X_{i,k} - \bar{X}_i)^2} is the experimental standard deviation, \bar{X}_i is the arithmetic mean, and n is the number of observations. This approach captures random variations, such as those from instrument noise or environmental fluctuations in a repeated measurement of voltage. Type B evaluation, in contrast, relies on non-statistical information, including prior knowledge, manufacturer specifications, calibration data, or assumed probability distributions for error bounds. For instance, if an instrument's specification provides a rectangular (uniform) distribution over an interval of width 2a, the standard uncertainty is u(x_i) = \frac{a}{\sqrt{3}}. This method addresses systematic effects, such as calibration drift, where repeated measurements are infeasible, and is often informed by datasheets or expert judgment in experimental protocols. Both Type A and Type B uncertainties contribute equally to the overall assessment, regardless of their origin. The GUM framework, updated as JCGM GUM-1:2023, standardizes this process, with ongoing discussions as of 2025 on refinements to definitions. Propagation of these uncertainties occurs when the measurand is a function y = f(x_1, x_2, \dots, x_N) of multiple input quantities. The law of propagation of uncertainty combines the standard uncertainties via quadrature, yielding the combined standard uncertainty u_c(y) = \sqrt{ \sum_{i=1}^N \left( \frac{\partial f}{\partial x_i} \right)^2 u^2(x_i) }, assuming uncorrelated inputs; covariances are included if dependencies exist. This partial differentiation approach linearizes the function around the best estimates, suitable for small uncertainties in physical experiments like determining electrical resistance from voltage and current measurements. In practice, the combined standard uncertainty informs error bars on experimental plots, representing the dispersion of the result at a one-standard-deviation level. For example, in a plot of temperature versus time, error bars derived from u_c visualize the propagated uncertainty from sensor readings and environmental controls. Under the GUM framework, the analysis culminates in the expanded uncertainty U = k \cdot u_c(y), where the coverage factor k (typically 2 for approximately 95% coverage assuming normality) defines a coverage interval around the result.
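The two evaluation types and their combination can be carried out in a few lines. The following sketch uses hypothetical voltage readings and an assumed meter resolution rather than data from any particular instrument.

```python
import numpy as np

# Type A: hypothetical repeated voltage readings (volts)
readings = np.array([5.012, 5.007, 5.011, 5.009, 5.013, 5.008])
n = readings.size
mean_v = readings.mean()
s = readings.std(ddof=1)                 # experimental standard deviation
u_typeA = s / np.sqrt(n)                 # standard uncertainty of the mean

# Type B: assumed meter resolution of 0.001 V -> half-width a = 0.0005 V,
# treated as a rectangular distribution
a = 0.0005
u_typeB = a / np.sqrt(3)

# Combined and expanded uncertainty (k = 2 for ~95 % coverage, normality assumed)
u_c = np.sqrt(u_typeA**2 + u_typeB**2)
U = 2 * u_c
print(f"V = {mean_v:.4f} V +/- {U:.4f} V  (k = 2)")
```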

Uncertainty in Mathematical Modeling

Sources of Modeling Uncertainty

Modeling uncertainty arises in mathematical models due to inherent limitations in representing complex real-world phenomena through theoretical constructs and approximations. These uncertainties differ from experimental ones by originating in the model's formulation rather than in measurement processes. Key sources include parameter uncertainty, model structure uncertainty, input data uncertainty, and discretization errors in numerical implementations. Parameter uncertainty stems from the variability in estimated coefficients or values used within the model, often resulting from limited observational data or reliance on assumed distributions for estimation. For instance, in dynamical systems models, parameters like reaction rates in chemical kinetics may be calibrated from sparse datasets, leading to broad posterior distributions that reflect this variability. This type of uncertainty is particularly pronounced in inverse problems where parameters are inferred indirectly, amplifying errors in predictions. Model structure uncertainty originates from simplifying assumptions made during model development, such as choosing linear approximations over nonlinear ones in differential equations, which can introduce systematic biases. These errors occur because no single model can fully capture all relevant physics or interactions without trade-offs in complexity and computational feasibility. For example, approximating turbulent flows with Reynolds-averaged Navier-Stokes equations ignores finer-scale fluctuations, leading to discrepancies in simulated outcomes. Model structure uncertainty represents a form of epistemic uncertainty tied to incomplete knowledge of the appropriate model form. Input data uncertainty enters mathematical models through boundary conditions, forcing functions, or initial states derived from experimental measurements, propagating empirical errors into theoretical predictions. Even high-quality data carries inherent variability that affects model reliability, such as fluctuating environmental inputs in simulations. This source bridges empirical and modeling domains but is distinctly modeled as an input within the theoretical framework. In numerical models, discretization errors arise from approximating continuous differential equations via finite difference, finite element, or other schemes, introducing errors that depend on grid resolution and time steps. For example, coarser spatial grids in computational fluid dynamics simulations can lead to deviations of up to several percent in key quantities like drag coefficients, with error magnitudes scaling as the grid size to a power determined by the method's order of accuracy. These errors are epistemic in nature, stemming from the approximation process rather than randomness. A prominent application of these uncertainties appears in climate models, where unresolved sub-grid processes, such as cloud formation or ocean mixing, are handled via parameterization schemes that introduce both structural and parameter uncertainties. For instance, convective parameterization in general circulation models can vary equilibrium climate sensitivity by 1–3°C, highlighting how simplifications in representing small-scale physics contribute to overall projection spreads. These schemes often rely on tunable parameters fitted to observations, compounding input data uncertainties from historical records.
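Discretization error and its scaling with resolution can be illustrated with a deliberately simple case. The sketch below applies the first-order forward Euler scheme to du/dt = -u (an assumed toy problem, not a production solver) and shows the error at t = 1 shrinking roughly in proportion to the step size.

```python
import numpy as np

def euler_error(h):
    """Global error at t = 1 of forward Euler applied to du/dt = -u, u(0) = 1."""
    steps = int(round(1.0 / h))
    u = 1.0
    for _ in range(steps):
        u += h * (-u)
    return abs(u - np.exp(-1.0))          # compare with the exact solution e^{-1}

# Halving h roughly halves the error, consistent with a first-order method
for h in (0.1, 0.05, 0.025, 0.0125):
    print(f"h = {h:<7} error = {euler_error(h):.5f}")
```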

Propagation Methods in Models

In mathematical modeling, uncertainty propagation refers to the process of quantifying how input uncertainties, such as those arising from model parameters or structural assumptions, affect the uncertainty in model outputs. This is essential for assessing the reliability of predictions in fields like engineering and environmental science, where models often involve nonlinear relationships between inputs and outputs. Propagation methods can be broadly classified into analytical and numerical approaches, each balancing computational efficiency with accuracy depending on the model's complexity. Analytical propagation methods rely on local approximations, typically using Taylor series expansions around nominal input values to estimate output uncertainty. For a model output y = f(\mathbf{x}), where \mathbf{x} is a vector of inputs with known means and variances/covariances, the first-order expansion yields the propagated variance as: \text{Var}(y) \approx \sum_{i=1}^n \left( \frac{\partial f}{\partial x_i} \right)^2 \text{Var}(x_i) + 2 \sum_{i=1}^n \sum_{j>i}^n \frac{\partial f}{\partial x_i} \frac{\partial f}{\partial x_j} \text{Cov}(x_i, x_j), evaluated at the mean values of \mathbf{x}. This law of propagation of uncertainty assumes small input uncertainties and local linearity, making it computationally inexpensive but less accurate for highly nonlinear models. Perturbation methods extend this by treating input deviations as small perturbations around a base state, using first-order sensitivities (partial derivatives) to approximate output changes; for instance, the output perturbation \delta y \approx \sum_i \frac{\partial f}{\partial x_i} \delta x_i. These techniques are particularly useful for quick assessments in linear or mildly nonlinear systems, such as preliminary design stages. Numerical methods, in contrast, handle nonlinearities more robustly by simulating the model under varied input conditions. Monte Carlo simulation involves drawing random samples from the input probability distributions, evaluating the model for each sample to generate an empirical distribution of outputs, and estimating moments like the mean and variance from the results. This approach provides unbiased estimates of the full output probability density function (PDF) without assuming linearity, though it requires many model evaluations for convergence, especially in high dimensions. To improve efficiency, Latin hypercube sampling (LHS) stratifies the input space into equally probable intervals and samples systematically across them, reducing the number of required simulations while maintaining low variance in estimates compared to simple random sampling. LHS was introduced as a stratified alternative for computer model uncertainty analysis, ensuring better coverage of the input domain. A key consideration in selecting methods is the trade-off between computational cost and accuracy: analytical and perturbation approaches are fast (often requiring only a few model evaluations for sensitivities) but approximate, suiting nearly linear cases, while Monte Carlo simulation excels in nonlinear scenarios at higher cost, reducible by stratified schemes such as LHS.
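As a concrete comparison, the following sketch propagates two assumed normal inputs through a small nonlinear model using plain Monte Carlo sampling and Latin hypercube sampling (via scipy.stats.qmc); the model and distributions are illustrative choices, not taken from any particular application.

```python
import numpy as np
from scipy.stats import norm, qmc

# Illustrative nonlinear model y = x1 * exp(x2) with independent normal inputs
model = lambda x: x[:, 0] * np.exp(x[:, 1])
means, sds = np.array([2.0, 0.5]), np.array([0.2, 0.1])
n = 10_000

# Plain Monte Carlo: independent random draws from the input distributions
rng = np.random.default_rng(1)
x_mc = rng.normal(means, sds, size=(n, 2))
y_mc = model(x_mc)

# Latin hypercube sampling: stratified uniform samples mapped through the
# inverse normal CDF for more even coverage of the input space
lhs = qmc.LatinHypercube(d=2, seed=1).random(n)
x_lhs = norm.ppf(lhs, loc=means, scale=sds)
y_lhs = model(x_lhs)

for label, y in (("Monte Carlo", y_mc), ("Latin hypercube", y_lhs)):
    print(f"{label:16s} mean = {y.mean():.4f}  std = {y.std(ddof=1):.4f}")
```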

Parameter Calibration and Output Analysis

Calibration Techniques

Calibration techniques in uncertainty analysis involve estimating model parameters to align predictions with observed data, thereby reducing epistemic uncertainty arising from incomplete knowledge of parameter values. These methods refine parameter values by minimizing discrepancies between model outputs and measurements, often incorporating statistical frameworks to quantify remaining uncertainties. Parameter uncertainty from modeling serves as the primary target, guiding the selection of calibration approaches that balance fit quality with robustness to data limitations. One foundational approach is least-squares optimization, which estimates parameters by minimizing the sum of squared residuals between observed data and model predictions. Formally, the parameter vector \theta is determined as \theta = \arg\min_{\theta} \sum_{i=1}^{n} (y_{\text{obs},i} - f(x_i; \theta))^2, where y_{\text{obs},i} are observations, f(x_i; \theta) is the model function, and n is the number of data points. This method is particularly effective for nonlinear models, providing efficient estimates with small datasets while enabling approximate confidence intervals for parameters. However, it requires iterative algorithms like Levenberg-Marquardt and can be sensitive to outliers or poor initial guesses. For probabilistic models, maximum likelihood estimation (MLE) extends this by maximizing the likelihood of observing the data given the parameters, explicitly accounting for measurement errors and noise distributions. In Gaussian process models, MLE calibrates kernel scale parameters to adapt to potential misspecifications, ensuring uncertainty estimates remain conservative rather than overly precise. This approach yields parameter estimates that incorporate error structures, improving reliability in settings with deterministic models but noisy observations. To capture full posterior distributions of parameters under Bayesian frameworks, Markov chain Monte Carlo (MCMC) methods sample from the joint posterior, integrating prior beliefs with likelihoods to propagate uncertainties. MCMC explores high-dimensional parameter spaces, generating chains that converge to the target distribution for robust inference, especially in complex computer models where traditional optimization may overlook multimodal posteriors. This technique enhances predictions by quantifying all sources of discrepancy, as demonstrated in applications like environmental simulations. A practical example is the calibration of the Soil and Water Assessment Tool (SWAT) hydrological model, where parameters governing soil hydrology are estimated using rainfall and streamflow data. In studies of watersheds like the Little Washita River basin, initial adjustments to the curve number (CN2) align surface runoff responses to observed rainfall events, followed by refinements to soil-related parameters such as groundwater delay (GW_DELAY) and baseflow recession (ALPHA_BF) to match hydrographs. This iterative process, often manual or automated, improves simulations of water balance components while highlighting parameter interactions. Parameter identifiability challenges, such as correlations leading to equifinality, where multiple parameter sets yield similar outputs, are addressed through regularization techniques or informative priors to constrain the search space. The Generalized Likelihood Uncertainty Estimation (GLUE) methodology, for instance, evaluates ensembles of parameter sets using likelihood measures updated with priors, rejecting implausible combinations to enhance identifiability without assuming unique optima.
However, GLUE has faced criticisms for lacking formal Bayesian coherence and potentially overestimating uncertainties, though its proponents argue it pragmatically handles model equifinality. Standards like ASTM E2935 support verification by providing equivalence testing protocols to confirm calibrated models align with reference processes within specified limits.
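A hedged sketch of least-squares calibration on synthetic data, assuming a simple exponential-decay model rather than any of the models discussed above; scipy's curve_fit returns both the fitted parameters and an approximate covariance matrix from which parameter standard errors can be read off.

```python
import numpy as np
from scipy.optimize import curve_fit

# Assumed exponential-decay model y = a * exp(-b * t) with synthetic observations
model = lambda t, a, b: a * np.exp(-b * t)

rng = np.random.default_rng(2)
t_obs = np.linspace(0, 5, 25)
y_obs = model(t_obs, 2.5, 0.8) + 0.05 * rng.standard_normal(t_obs.size)

# Nonlinear least-squares calibration; pcov approximates the parameter covariance
popt, pcov = curve_fit(model, t_obs, y_obs, p0=[1.0, 1.0])
perr = np.sqrt(np.diag(pcov))            # approximate standard errors

for name, val, err in zip(("a", "b"), popt, perr):
    print(f"{name} = {val:.3f} +/- {err:.3f}")
```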

Output Uncertainty Assessment

Output uncertainty assessment evaluates the range of possible values in the final results of calibrated models or experiments, providing a measure of reliability for predictions or decisions. After parameter calibration, uncertainty in outputs arises from propagated input variabilities and inherent model limitations, necessitating methods to quantify and interpret this spread. Calibrated parameters serve as fixed or probabilistic inputs to forward propagation techniques, enabling the generation of output distributions. Confidence intervals represent a primary tool for summarizing output uncertainty, derived from either analytical propagation of variances or empirical quantiles from simulations. In Monte Carlo approaches, samples from input distributions are propagated through the model to obtain an empirical distribution of outputs, from which a 95% interval is typically constructed as the interval between the 2.5th and 97.5th percentiles. This non-parametric method avoids assumptions about the output distribution and provides a direct probabilistic bound on the output variability. To attribute output uncertainty to specific inputs, variance-based global sensitivity analysis decomposes the total output variance into contributions from individual parameters and their interactions. The first-order Sobol index for input x_i, denoted S_i, quantifies the fraction of output variance attributable to x_i alone, calculated as: S_i = \frac{\mathrm{Var}\left( \mathbb{E}[y \mid x_i] \right)}{\mathrm{Var}(y)}, where y is the model output. Higher-order indices capture interactions, offering a complete decomposition that aids in identifying dominant sources post-calibration. Reporting output uncertainty emphasizes coverage probability, the likelihood that the true value falls within the assessed interval, and its implications for risk-based decisions. For instance, in risk analysis, uncertainty assessments inform decision thresholds by comparing predicted failure probabilities against acceptable risk levels, ensuring conservative margins where coverage is incomplete. Ensembles of models or simulations enhance robustness by averaging multiple realizations, reducing epistemic components and yielding tighter, more reliable uncertainty bounds. A key distinction in output assessment is between calibration uncertainty, stemming from parameter non-uniqueness during fitting, and prediction uncertainty, which encompasses additional sources like future inputs. Calibration uncertainty is isolated by propagating posterior parameter distributions, while total prediction uncertainty also includes aleatory variations; failing to separate them can overestimate reliability. In structural reliability, uncertainty bands around simulation outputs visualize propagated variabilities, such as in demand estimation models where fractile bands about the median highlight epistemic and aleatory effects on structural predictions. These bands guide design adjustments by quantifying the extent of potential risks under uncertainties.
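A minimal sketch of a Monte Carlo output assessment, assuming hypothetical posterior samples for two parameters of a simple exponential-decay model; the 95% interval is read directly from the empirical output quantiles.

```python
import numpy as np

# Forward propagation of calibrated parameters through a model, followed by a
# non-parametric 95 % interval from the empirical output distribution.
rng = np.random.default_rng(3)
model = lambda a, b: a * np.exp(-b * 2.0)     # prediction at t = 2 (illustrative)

# Hypothetical posterior samples for the calibrated parameters a and b
a_samples = rng.normal(2.5, 0.1, 5000)
b_samples = rng.normal(0.8, 0.05, 5000)
y = model(a_samples, b_samples)

lo, hi = np.percentile(y, [2.5, 97.5])
print(f"median = {np.median(y):.3f}, 95% interval = [{lo:.3f}, {hi:.3f}]")
```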

Advanced Methods

Sensitivity Analysis

Sensitivity analysis within uncertainty analysis evaluates how variations in input parameters affect model outputs, enabling the identification of influential factors to guide model refinement and uncertainty reduction. Local sensitivity analysis focuses on the linear response of outputs to small perturbations in inputs around a nominal point, typically computed via partial derivatives \frac{\partial y}{\partial x_i}, where y is the output and x_i an input parameter. This approach approximates the rate of change in output for small input changes, providing insights into immediate impacts but assuming local linearity and neglecting interactions or distributions beyond the local region. Global sensitivity analysis extends this by assessing input influences across their full ranges, accounting for nonlinearities, interactions, and input distributions. Sobol indices decompose the total output variance into contributions from individual inputs (first-order or main effects, S_i = \frac{V(E(y|x_i))}{V(y)}), groups of inputs (higher-order interactions, S_{ij} = \frac{V(E(y|x_i, x_j)) - V(E(y|x_i)) - V(E(y|x_j))}{V(y)}), and total effects (S_{T_i}, including all interactions involving x_i). The indices in the full variance decomposition sum to unity, allowing prioritization of parameters that explain most uncertainty in outputs. Common methods for global sensitivity include Morris screening, which uses elementary effects, the finite differences \frac{y(x_1, \dots, x_i + \Delta, \dots, x_k) - y(x)}{\Delta}, computed across multiple trajectories to rank parameters by mean absolute effect (\mu^*) and nonlinearity (\sigma), efficiently screening large parameter sets at low computational cost. The Fourier Amplitude Sensitivity Test (FAST) employs spectral analysis, transforming inputs into sinusoidal functions with unique frequencies and computing sensitivity indices via Fourier coefficients of the output, isolating main effects and interactions through variance contributions in the frequency domain. In epidemiological modeling, sensitivity analysis identifies key parameters influencing outbreak predictions; for instance, in a dengue model, the mosquito biting rate (B) and mosquito lifespan (\mu_m) exhibited the highest sensitivity indices (approximately +1 and -1.04, respectively) on the basic reproduction number \mathcal{R}_0, indicating that interventions targeting these factors could substantially reduce predicted infections. Sensitivity analysis integrates with uncertainty propagation by attributing output variance to specific inputs, prioritizing reducible epistemic uncertainties (e.g., those from parameter ignorance) for targeted data collection over irreducible aleatory ones; in this way it focuses on attributing the output uncertainty produced by propagation methods rather than merely quantifying it. However, high-dimensional problems pose computational challenges, as estimating indices like Sobol's requires O(kN) model evaluations (k inputs, N samples), often mitigated by grouping similar factors or using surrogate models to reduce effective dimensionality.
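A compact illustration of first-order Sobol index estimation with a pick-freeze (Saltelli-type) estimator, using the Ishigami function as a standard test case; the sample size and estimator variant are illustrative choices.

```python
import numpy as np

def ishigami(x, a=7.0, b=0.1):
    """Classic Sobol test function with inputs uniform on [-pi, pi]."""
    return np.sin(x[:, 0]) + a * np.sin(x[:, 1])**2 + b * x[:, 2]**4 * np.sin(x[:, 0])

def sobol_first_order(f, k, n, rng):
    """Pick-freeze (Saltelli-type) estimates of first-order Sobol indices."""
    A = rng.uniform(-np.pi, np.pi, size=(n, k))
    B = rng.uniform(-np.pi, np.pi, size=(n, k))
    fA, fB = f(A), f(B)
    var_y = np.var(np.concatenate([fA, fB]), ddof=1)
    S = np.empty(k)
    for i in range(k):
        ABi = A.copy()
        ABi[:, i] = B[:, i]               # replace column i of A with that of B
        S[i] = np.mean(fB * (f(ABi) - fA)) / var_y
    return S

rng = np.random.default_rng(4)
print(sobol_first_order(ishigami, k=3, n=50_000, rng=rng))
# The known analytical values for the Ishigami function are roughly [0.31, 0.44, 0.00]
```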

Bayesian Uncertainty Quantification

Bayesian uncertainty quantification provides a probabilistic framework for incorporating prior knowledge and updating beliefs with observed data to assess both parameter and predictive uncertainties in complex models. At its core, this approach relies on Bayes' theorem, which yields the posterior distribution of parameters \theta given data y as p(\theta \mid y) \propto p(y \mid \theta) p(\theta), where p(y \mid \theta) is the likelihood and p(\theta) is the prior distribution reflecting initial beliefs about the parameters. This posterior encapsulates parameter uncertainty by representing the updated probability distribution over \theta, while predictive uncertainty arises from integrating over this distribution to obtain the posterior predictive distribution p(\tilde{y} \mid y) = \int p(\tilde{y} \mid \theta) p(\theta \mid y) \, d\theta, which accounts for both aleatoric variability in new observations \tilde{y} and epistemic uncertainty in \theta. In settings with multi-level uncertainties, such as those involving spatial data, hierarchical Bayesian models extend this framework by structuring parameters across levels, allowing shared information to propagate through the hierarchy. For instance, in disease mapping, random effects can model unstructured heterogeneity and structured spatial correlations using pairwise difference priors, enabling "borrowing strength" from neighboring areas to stabilize estimates in regions with sparse data. This approach quantifies uncertainties at multiple scales, producing posterior distributions that capture both local variability and global patterns. To compute these posteriors, Markov chain Monte Carlo (MCMC) methods are commonly employed, but alternatives like variational inference, as well as specialized schemes such as Gibbs sampling, offer scalable options for high-dimensional problems. Variational inference approximates the posterior by optimizing a simpler distribution to minimize the Kullback-Leibler divergence, providing faster computation at the cost of some accuracy compared to MCMC, while Gibbs sampling iteratively samples from conditional distributions to explore the joint posterior. These techniques facilitate inference in large-scale applications by generating samples for estimating credible intervals. A practical example is the calibration of climate models, where Bayesian updating integrates observational data to refine parameters like equilibrium climate sensitivity (ECS). In the Pathfinder model, priors informed by CMIP6 and IPCC AR6 assessments are updated with historical CO₂ and global mean surface temperature data from 1750–2021, yielding posterior estimates such as ECS = 3.3 ± 0.7 K (1σ) and transient climate response (TCR) = 1.9 ± 0.3 K (1σ), quantifying remaining uncertainties in projections. Model uncertainty, arising from competing model structures, is addressed through Bayesian model averaging (BMA), which computes predictions as a weighted average over plausible models, with weights given by their posterior model probabilities. This method avoids overconfident inferences from relying on a single model by explicitly accounting for uncertainty across the model space, leading to improved out-of-sample performance. BMA is particularly advantageous over frequentist methods in small-sample scenarios, where it leverages priors to achieve narrower credible intervals and higher precision with fewer observations, often requiring only 2–3 times as many data points as parameters, compared to 4–5 for maximum likelihood approaches, thus enhancing reliability in data-limited settings.
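A small sketch of Bayesian updating with a random-walk Metropolis sampler, assuming synthetic normal data with known noise level and a normal prior on the mean; the posterior samples express epistemic uncertainty in the parameter, and adding observation noise yields posterior predictive draws.

```python
import numpy as np

# Minimal Metropolis sampler for the mean (theta) of normally distributed data
# with known noise standard deviation and a normal prior.
rng = np.random.default_rng(5)
data = rng.normal(1.5, 0.5, size=20)       # synthetic observations (sigma = 0.5)
sigma, prior_mu, prior_sd = 0.5, 0.0, 2.0

def log_post(theta):
    log_lik = -0.5 * np.sum((data - theta)**2) / sigma**2
    log_prior = -0.5 * (theta - prior_mu)**2 / prior_sd**2
    return log_lik + log_prior

samples, theta = [], 0.0
for _ in range(20_000):
    prop = theta + 0.2 * rng.standard_normal()          # random-walk proposal
    if np.log(rng.random()) < log_post(prop) - log_post(theta):
        theta = prop                                     # accept the proposal
    samples.append(theta)

post = np.array(samples[5_000:])                         # discard burn-in
pred = post + sigma * rng.standard_normal(post.size)     # posterior predictive draws
print(f"posterior mean = {post.mean():.3f} +/- {post.std(ddof=1):.3f}")
print(f"predictive 95% interval = {np.percentile(pred, [2.5, 97.5]).round(3)}")
```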