
Latin hypercube sampling

Latin hypercube sampling (LHS) is a technique used in statistics and computer experiments to generate a sample of parameter values from a multidimensional distribution, ensuring that each variable's range is fully represented by dividing it into equally probable intervals and selecting one value at random from each interval, with values then permuted across variables to form the sample. Developed in 1979 by Michael McKay, Richard Beckman, and William Conover to address uncertainty in computer model outputs, LHS provides more efficient coverage of the input space than simple random sampling, particularly when evaluating expensive simulations with limited runs. By stratifying each input variable independently and randomly pairing the strata, LHS reduces variance in estimates of means, variances, and distribution functions, especially for monotonic response functions, yielding unbiased estimators with superior precision in practice. In uncertainty quantification and sensitivity analysis of complex systems, LHS has become a standard method, enabling the propagation of input uncertainties through models to quantify output variability with fewer evaluations than traditional Monte Carlo methods. It supports extensions such as correlation control among inputs, replication for variance estimation, and integration with techniques like partial rank correlation coefficients or regression-based measures for identifying influential variables. Widely applied in fields including nuclear waste disposal (e.g., repository performance assessments), environmental modeling, and engineering risk analysis, LHS facilitates the estimation of cumulative distribution functions for key outputs such as release rates. LHS also exhibits good space-filling properties in high-dimensional spaces, and later variants, such as optimized or maximin designs, further improve these space-filling characteristics.

Introduction

Definition and Basic Concept

Latin hypercube sampling (LHS) is a technique designed to generate a sample of values from a multidimensional distribution, ensuring more uniform coverage of the input space than simple random sampling methods such as Monte Carlo simulation. In LHS, for each of the k input variables, the range (via its cumulative distribution) is divided into n equally probable intervals, and one value is randomly sampled from within each interval; these values are then combined across variables through random permutations to form an n by k sample matrix. This process guarantees that the resulting sample forms a Latin hypercube of order n, in which the projection of the points onto any single variable covers the full range of that variable with exactly one point per interval. The core advantage of LHS lies in its ability to avoid clustering of sample points, providing better exploration of the parameter space, particularly in high-dimensional settings where random sampling might leave regions undersampled. For instance, consider a simple two-dimensional case with n = 4 and variables x and y, each ranging from 0 to 1. The range for x is divided into four intervals: [0, 0.25), [0.25, 0.5), [0.5, 0.75), and [0.75, 1.0]; similarly for y. One value is sampled from each interval for x (e.g., 0.1, 0.3, 0.6, 0.9) and for y (e.g., 0.2, 0.4, 0.7, 0.8), and the values are then permuted to pair them (e.g., (0.1, 0.4), (0.3, 0.8), (0.6, 0.2), (0.9, 0.7)). This arrangement ensures that the four points are spread across the unit square, with no two points in the same horizontal or vertical band, unlike crude random sampling, where points might cluster in one corner.
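The pairing in this example can be reproduced in a few lines of code; the following minimal sketch (assuming only NumPy, with an arbitrary seed) builds a 4-point, two-dimensional Latin hypercube and verifies that no two points share a row or column bin:
import numpy as np

n = 4  # number of samples and strata per dimension
rng = np.random.default_rng(0)

# One random permutation of the strata {0, 1, 2, 3} per dimension
perm_x = rng.permutation(n)
perm_y = rng.permutation(n)

# Place each point uniformly at random inside its assigned stratum
x = (perm_x + rng.uniform(size=n)) / n
y = (perm_y + rng.uniform(size=n)) / n
points = np.column_stack([x, y])

# Latin hypercube property: each of the n bins per axis holds exactly one point
assert sorted(np.floor(x * n).astype(int)) == list(range(n))
assert sorted(np.floor(y * n).astype(int)) == list(range(n))
print(points)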

Historical Development

An equivalent technique to Latin hypercube sampling was independently proposed in 1977 by Peteris Audze and Vilnis Eglājs as a method for the design of experiments. Latin hypercube sampling (LHS) itself was first introduced in 1979 by Michael D. McKay, Richard J. Beckman, and William J. Conover in their seminal paper "A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code," published in Technometrics. The authors proposed LHS as an efficient technique to generate input values for computer models, aiming to better estimate output statistics such as means and variances compared to simple random sampling, particularly when evaluations are computationally expensive. The innovation drew on broader statistical traditions of stratified sampling and Latin square designs to ensure more uniform coverage of the input space. In the early 1980s, LHS saw rapid adoption at Sandia National Laboratories, where researchers implemented it in software tools for designing computer experiments and analyzing complex simulation models. This uptake addressed key challenges in uncertainty analysis for engineering and scientific applications, such as nuclear reactor safety assessments, by reducing the number of required model runs while maintaining reliable estimates. The development of dedicated LHS software at Sandia, documented in technical reports from the period, facilitated its practical use in high-dimensional problems. The following decades brought significant extensions of LHS, with advances in its theoretical analysis and in methods to control correlations among input variables. Michael Stein's 1987 analysis established large-sample properties of LHS estimators, demonstrating the method's benefits over independent random sampling in certain scenarios. Complementing this, Ronald L. Iman and William J. Conover's 1982 distribution-free approach enabled the induction of specified rank correlations in LHS samples, a critical enhancement for modeling realistic dependencies among inputs. By the 2000s, LHS was being integrated with optimization algorithms to create space-filling designs, improving uniformity and minimizing clustering in high-dimensional spaces. For instance, Rafał Stocki's 2005 method applied optimization to Latin hypercube designs for reliability analysis, using criteria such as maximin distance to balance space-filling and projection properties. These developments solidified LHS as a versatile tool in experimental design. LHS has been widely adopted, with over 28,000 publications referencing the method across academic databases as of 2024, reflecting its robustness in uncertainty propagation and sensitivity analysis.

Mathematical Formulation

Construction of Latin Hypercube

Latin hypercube sampling constructs a sample by stratifying the range of each input variable to ensure even coverage across its distribution. For a problem with k dimensions (input variables) and n desired samples, the process begins by dividing the cumulative distribution function (CDF) of each variable into n equal-probability intervals, each spanning a probability mass of 1/n. This creates n non-overlapping strata per dimension, guaranteeing that the marginal distribution of each variable is represented proportionally in the sample. Next, for each dimension j = 1, \dots, k, a random value is selected from within each of the n strata. To avoid overlap in the joint sample, the assignment of these values to the n sample points is randomized using a permutation. Specifically, a random permutation \pi_j of the integers \{1, \dots, n\} is generated for dimension j, which dictates the order in which the strata are assigned to the sample indices i = 1, \dots, n. This ensures that in the resulting k \times n matrix, each row (corresponding to a variable) contains exactly one value from each stratum, with no two samples sharing the same stratum in any dimension; the columns of the matrix then represent the n joint sample vectors across all k dimensions. For variables following a uniform distribution on [0, 1], the sample values can be explicitly computed as X_{j,i} = \frac{\pi_j(i) + U_{j,i} - 1}{n}, where U_{j,i} \sim \text{Uniform}(0,1) is a uniform random variate that places the point randomly within its assigned stratum. This formula positions each X_{j,i} within the interval [(\pi_j(i)-1)/n, \pi_j(i)/n]. For arbitrary continuous distributions, the uniform quantiles are transformed via the inverse CDF F_j^{-1} of the j-th variable: X_{j,i} = F_j^{-1}\left( \frac{\pi_j(i) + U_{j,i} - 1}{n} \right), ensuring the samples respect the target marginal distributions. The following pseudocode illustrates a basic implementation in a programming language like Python, assuming access to a random number generator and inverse CDF functions:
import numpy as np
from scipy.stats import uniform  # For example uniform; replace with target dist's ppf (inverse CDF)

def lhs_sample(k, n, dists=None):
    # dists: list of scipy.stats distributions or None for uniform [0,1]
    if dists is None:
        dists = [uniform(loc=0, scale=1)] * k
    
    # Initialize k x n matrix
    X = np.zeros((k, n))
    
    for j in range(k):
        # Generate random permutation pi_j of 1 to n
        pi = np.random.permutation(n) + 1  # pi in {1, ..., n}
        
        # Generate n uniform random variates
        U = np.random.uniform(0, 1, n)
        
        # Compute uniform quantiles
        quantiles = (pi + U - 1) / n
        
        # Transform to target distribution
        X[j, :] = dists[j].ppf(quantiles)
    
    return X.T  # Return n x k matrix for joint samples
This algorithm first generates independent stratified samples per dimension via permutations and then assembles them into the joint matrix, with the transpose yielding the conventional n \times k form where rows are sample vectors. The method draws from the combinatorial foundation of Latin squares, extended to hypercubes for multidimensional stratification.
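A brief usage sketch of the lhs_sample function above, mixing a normal and a uniform marginal (the distributions and seed are arbitrary choices for illustration):
import numpy as np
from scipy.stats import norm, uniform

np.random.seed(42)  # for reproducibility of this illustration
samples = lhs_sample(k=2, n=5, dists=[norm(loc=0, scale=1), uniform(loc=0, scale=1)])
print(samples.shape)  # (5, 2): 5 joint sample vectors in 2 dimensions
# The second (uniform) column occupies each of the 5 strata exactly once:
print(sorted(np.floor(samples[:, 1] * 5).astype(int)))  # [0, 1, 2, 3, 4]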

Probability Distribution and Sampling

Latin hypercube sampling (LHS) can be extended to sample from arbitrary continuous probability distributions by transforming the uniform random variables produced in the basic construction. Specifically, for a random variable X with cumulative distribution function (CDF) F, each uniform sample point U from the [0,1] interval is mapped to X = F^{-1}(U), where F^{-1} is the inverse CDF. This inverse transform sampling ensures that the resulting samples follow the target distribution F, as the equal-probability strata in the uniform space correspond directly to equal-probability quantiles in the target distribution. The marginal stratification for each dimension is preserved exactly in finite samples, with precisely one observation falling into each of the n equal-probability bins defined by the quantiles of F. This guarantees representativeness across the full support of the distribution, avoiding under- or over-sampling of tails or central regions. The target CDF evaluated at the i-th ordered sample lies in the interval ((i-1)/n, i/n], so it matches the stratum midpoints on average: F(x_{(i)}) \approx \frac{i - 0.5}{n} for i = 1, \dots, n, where x_{(i)} is the i-th ordered sample. This property supports unbiased estimation of the marginal CDF and enhances efficiency for quantile-based analyses compared to unstratified methods. In the joint distribution, LHS induces a structured dependence among dimensions due to the Latin hypercube constraint, which prevents any two samples from occupying the same stratum in any dimension. This dependence is not independence, but it allows for controlled correlations through the selection of permutations during sample generation; for instance, techniques such as the Iman-Conover method can adjust the sample to achieve specified rank correlations between -1 and 1 while preserving the marginals. The resulting samples are exchangeable—the joint distribution is invariant under permutation of the observations—but not independent, owing to the fixed one-per-stratum allocation. This exchangeability, combined with the negative dependence between certain pairs of coordinates, leads to variance reduction for estimators of means and other functionals, particularly when the integrand has an additive structure across dimensions.
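As a sketch of the inverse-transform step (assuming SciPy 1.7 or later, which provides the scipy.stats.qmc module), uniform LHS points can be pushed through a normal inverse CDF while retaining the one-sample-per-quantile-bin property:
import numpy as np
from scipy.stats import norm, qmc

n, d = 8, 2
sampler = qmc.LatinHypercube(d=d, seed=1)
u = sampler.random(n)   # uniform LHS points in [0, 1)^d
x = norm.ppf(u)         # inverse-CDF transform to standard normal marginals

# Each column still has exactly one observation per equal-probability bin
for j in range(d):
    bins = np.floor(norm.cdf(x[:, j]) * n).astype(int)
    assert sorted(bins) == list(range(n))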

Properties

Space-Filling Characteristics

Latin hypercube sampling (LHS) exhibits strong space-filling properties by design, ensuring a more even spread of points across the multidimensional input space compared to simple random sampling. In one-dimensional projections, LHS guarantees that each interval (stratum) of equal probability contains exactly one sample point, resulting in a fully stratified sample for every dimension. For two-dimensional projections, the points form a Latin-square-like structure, where no two points occupy the same row or column of the discretized grid, preventing repeated pairs and promoting even pairwise coverage. These projection properties help LHS minimize clustering in the joint space, as evidenced by lower discrepancy measures relative to random sampling. Specifically, the wrap-around L_2-discrepancy of LHS designs has a significantly lower mean and variance than that of corresponding random designs, indicating superior uniformity in the average case. This reduction in clustering helps avoid the large voids or overdense regions that are common in random samples, enhancing overall space exploration. To quantify space-filling uniformity, metrics such as the maximin criterion (which maximizes the minimum inter-point distance) and the minimum potential energy criterion (which minimizes the sum of inverse distances) are commonly applied; standard LHS performs well on these criteria on average, though optimized variants can further improve the scores. In k dimensions, LHS ensures coverage of all one-dimensional slices defined by the stratification, with exactly one point per stratum per dimension, thereby reducing voids by systematically partitioning and sampling the space. Visually, in two dimensions, LHS point clouds appear as a grid-like spread, with points evenly dispersed and no overlaps in the row-column bins, in contrast to random sampling's tendency to cluster in corners or along edges; this difference becomes more pronounced in three dimensions, where LHS maintains stratified coverage while random points may leave substantial unexplored volumes.
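A minimal sketch (assuming NumPy and SciPy, with scipy.stats.qmc available) comparing the maximin criterion—the smallest pairwise distance—for an LHS design and a plain random design of the same size:
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import qmc

n, d = 20, 2
rng = np.random.default_rng(0)

lhs_points = qmc.LatinHypercube(d=d, seed=0).random(n)
random_points = rng.uniform(size=(n, d))

# Maximin criterion: a larger minimum pairwise distance means less clustering
print("LHS    min distance:", pdist(lhs_points).min())
print("Random min distance:", pdist(random_points).min())
On typical runs the LHS design yields the larger minimum distance, though an unoptimized LHS is not guaranteed to do so in every realization.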

Statistical Properties

Latin hypercube sampling (LHS) provides probabilistic guarantees for the convergence of estimators derived from its samples, akin to those in simple random sampling. The sample mean computed from an LHS obeys a central limit theorem, converging in distribution to a normal distribution centered at the true mean, with an asymptotic variance no larger than that of independent and identically distributed (i.i.d.) sampling. A Berry-Esseen-type bound establishes that this convergence occurs at a rate of O(1/\sqrt{n}), where n is the sample size, identical to the Monte Carlo rate but with potentially superior constants, especially for smooth integrands. A primary advantage of LHS lies in its variance reduction for the estimator of the mean. For a function f with finite second moment, the variance of the sample mean \bar{Y} = n^{-1} \sum_{i=1}^n f(X_i) decomposes as \Var(\bar{Y}) = \frac{1}{n} \Var(f(X)) + \frac{1}{n^2} \sum_{i \neq j} \Cov(f(X_i), f(X_j)). The first term is the i.i.d. variance, while the covariance terms induced by LHS (which vanish under unstratified random sampling) are typically negative for functions that are monotone or additive in the inputs, resulting in an overall lower variance. The reduction is particularly effective for additive models, where f(x) = \sum_{k=1}^d g_k(x_k): because LHS stratifies each marginal distribution uniformly, the additive main effects are integrated nearly exactly, and their leading-order contribution to the variance vanishes as n grows. Seminal results by Stein and Owen demonstrate that LHS attains near-optimal variance reduction for broad classes of functions, including those with additive or low-effective-dimensional structure, outperforming simple random sampling even in high dimensions when n \gg d. Furthermore, as n grows large, the marginal uniformity of LHS ensures that the samples behave asymptotically like i.i.d. draws, facilitating reliable inference without complex dependence adjustments in large-sample analyses.
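The variance reduction for an additive integrand can be checked empirically; the following sketch (assuming NumPy and SciPy, with an arbitrary test function chosen for illustration) compares the spread of the mean estimator over repeated designs:
import numpy as np
from scipy.stats import qmc

def f(x):
    # Additive test function on [0, 1]^d (illustrative choice); E[f(X)] = d/3
    return np.sum(x ** 2, axis=1)

n, d, reps = 50, 5, 200
rng = np.random.default_rng(0)

lhs_means = [f(qmc.LatinHypercube(d=d, seed=r).random(n)).mean() for r in range(reps)]
srs_means = [f(rng.uniform(size=(n, d))).mean() for _ in range(reps)]

# The LHS estimator of the mean shows a much smaller standard deviation
print("LHS estimator std:", np.std(lhs_means))
print("SRS estimator std:", np.std(srs_means))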

Advantages and Limitations

Benefits over Random Sampling

Latin hypercube sampling (LHS) offers improved coverage of the multidimensional input space compared to simple random sampling by stratifying the range of each input variable into equal-probability intervals and ensuring exactly one sample falls into each interval per variable. This systematic stratification prevents the clustering or undersampling of particular regions that frequently occurs in random sampling, where samples may by chance concentrate in limited portions of the space, leading to unreliable estimates. As a result, LHS reduces the number of samples required to obtain reliable approximations of model outputs, such as means or distributions, by promoting a more uniform representation across the entire parameter domain. In high-dimensional settings, LHS provides greater efficiency by enabling thorough exploration of the input space with far fewer evaluations than random sampling, which is essential for computationally intensive models where each run may take significant time or resources. Unlike random sampling, which can suffer from the curse of dimensionality—resulting in sparse coverage as dimensions increase—LHS maintains balanced stratification across all dimensions simultaneously, ensuring that the projection onto each individual variable remains fully stratified. This makes LHS particularly valuable for uncertainty analysis in complex simulations, where random sampling might require many more points to achieve comparable fidelity. Studies report that LHS can require 10-100 times fewer runs than simple random sampling to attain similar accuracy in uncertainty quantification tasks, depending on the model's structure and the statistic of interest; for instance, in empirical comparisons using computer codes, LHS with 16 samples yielded mean estimates with standard deviations approximately one-fourth those of random sampling, implying roughly a 16-fold efficiency gain in variance terms. A key example of LHS's variance reduction occurs with linear functions: stratification removes the between-interval component of variance in each marginal, and because each stratum contributes its correct share of the marginal mean, the resulting estimator of the population mean is unbiased with greatly reduced variance. For multivariate linear functions that are additive across independent inputs, this property extends, with the variance contribution from the linear components approaching zero for large n and the overall variance asymptotically lower than that of random sampling. LHS also exhibits greater robustness relative to random sampling, being less sensitive to "lucky" or "unlucky" draws that can produce misleading estimates in finite samples due to uneven coverage. By design, the variance of the LHS estimator never exceeds that of random sampling for functions monotonic in each input and remains controlled even for non-monotone cases, providing more consistent performance across multiple realizations.

Potential Drawbacks

One notable limitation of standard Latin hypercube sampling (LHS) is the potential introduction of spurious correlations between input variables due to the random permutation process used in construction. These induced dependencies can distort sensitivity analyses by creating artificial relationships that do not reflect the underlying model structure, particularly when variables are intended to be independent. Such correlations are not inherent to the method but arise from the lack of correlation control in basic implementations, necessitating post-sampling checks, such as scatter plots or correlation matrices, to verify independence. Generating and scaling LHS designs for large sample sizes incurs higher computational costs than simple random sampling, as the stratification, permutation, and any optimization steps become resource-intensive, especially in high-dimensional spaces. Standard LHS also lacks inherent scalability: increasing the sample size often requires regenerating the design from scratch, which discards prior evaluations and amplifies expense for computationally demanding models. This issue is exacerbated by the curse of dimensionality, where space-filling properties degrade when the number of dimensions k greatly exceeds n, leading to poorer coverage and reduced effectiveness in exploring the input space. In applications involving non-smooth or strongly nonlinear functions, unoptimized LHS can exhibit clustering of points, failing to achieve uniform joint coverage despite its stratified marginals, as critiques in the literature emphasize when assessing the assumptions required for reliable performance. While extensions and optimization techniques can mitigate these risks, basic LHS remains vulnerable to such challenges in complex scenarios.

Comparison to Other Sampling Methods

Versus Simple Random Sampling

Simple random sampling, often referred to as Monte Carlo sampling, generates points through independent and identically distributed uniform draws from the unit hypercube [0,1]^k. In high dimensions, this approach is prone to leaving large gaps in the sample space due to the curse of dimensionality, where the exponential growth in volume leads to poor coverage with a fixed number of points. Latin hypercube sampling (LHS) addresses these issues through stratification, dividing each dimension into n equal-probability intervals and placing one sample in each interval per dimension, which guarantees coverage of all marginal quantiles. This results in superior space-filling properties and lower variance when estimating integrals of smooth functions compared to simple random sampling. Quantitatively, the star discrepancy of LHS samples is bounded by O\left(\sqrt{d/n}\right) with high probability, similar to simple random sampling, but LHS often provides better empirical coverage in practice due to its stratified structure and induced negative dependence, particularly for moderate sample sizes and dimensions. For instance, in a 10-dimensional space, simple random sampling with a limited number of points may leave many marginal strata unsampled, whereas LHS ensures every one-dimensional stratum is represented, providing more uniform coverage. In their seminal work, McKay et al. (1979) demonstrated that LHS achieves significant variance reduction over random sampling in computer experiments; for the SOLA-PLOOP model, the standard deviation of the mean estimator under LHS was approximately one-fourth that under random sampling, corresponding to a variance reduction factor of about 16, with reductions of 2-5 times observed for other estimators and test functions.
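The marginal-coverage contrast can be illustrated directly; this sketch (assuming NumPy and SciPy's scipy.stats.qmc module) counts how many of the n one-dimensional strata per dimension are occupied under each scheme in a 10-dimensional space:
import numpy as np
from scipy.stats import qmc

n, d = 20, 10
rng = np.random.default_rng(0)

lhs_points = qmc.LatinHypercube(d=d, seed=0).random(n)
srs_points = rng.uniform(size=(n, d))

def strata_covered(points):
    # Average number of occupied equal-width strata per dimension (maximum is n)
    bins = np.floor(points * n).astype(int)
    return np.mean([len(np.unique(bins[:, j])) for j in range(d)])

print("LHS strata covered per dimension:", strata_covered(lhs_points))  # always n
print("SRS strata covered per dimension:", strata_covered(srs_points))  # typically around 64% of n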

Versus Other Stratified Methods

Latin hypercube sampling (LHS) differs from traditional stratified methods primarily in its approach to multidimensional stratification. Whereas classical stratified sampling partitions the joint input space into cells and draws from each cell—which quickly becomes infeasible as the number of dimensions grows—LHS stratifies each marginal distribution and randomly pairs the strata across dimensions, ensuring full representation in one-dimensional projections but not necessarily in joint projections. This makes LHS particularly practical for exploring high-dimensional spaces, where exhaustive joint stratification is impossible and purely random pairing may still leave clusters. Compared to orthogonal Latin hypercubes or quasi-Monte Carlo methods such as Sobol sequences, standard LHS offers greater simplicity in generation but may exhibit less uniformity in low-dimensional projections. Orthogonal variants of LHS reduce correlations between factors but can have poorer space-filling properties than standard LHS, and they are more computationally intensive to construct, especially for larger sample sizes. Sobol sequences, being deterministic low-discrepancy sequences, provide superior convergence rates for numerical integration in uniform settings owing to their provable discrepancy bounds, but adapting them to non-uniform or correlated distributions requires additional transformations that complicate implementation. LHS demonstrates greater flexibility for incorporating correlated inputs than grid-based stratification techniques, as evidenced by benchmarks from the early 2000s highlighting its ability to maintain prescribed rank correlation structures while preserving marginal distributions. In contrast to full factorial designs, which require an exponential number of points (l^k for k factors at l levels) and become infeasible in high dimensions (e.g., beyond 5-10 variables), LHS scales linearly with sample size, enabling efficient exploration without exhaustive enumeration. For instance, in sensitivity analysis, LHS outperforms crude Monte Carlo sampling by better accounting for variable interactions, leading to more reliable identification of influential parameters in nonlinear models.
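A rough comparison of uniformity for random, LHS, and Sobol point sets of equal size can be sketched with SciPy's scipy.stats.qmc module, whose discrepancy function computes the centered L2-discrepancy by default (a minimal illustration, not a full benchmark):
import numpy as np
from scipy.stats import qmc

n, d = 64, 5  # n chosen as a power of 2 for the Sobol sequence
rng = np.random.default_rng(0)

designs = {
    "random": rng.uniform(size=(n, d)),
    "LHS": qmc.LatinHypercube(d=d, seed=0).random(n),
    "Sobol": qmc.Sobol(d=d, scramble=True, seed=0).random(n),
}

# Lower centered L2-discrepancy indicates more uniform coverage
for name, pts in designs.items():
    print(name, qmc.discrepancy(pts))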

Algorithms for Generation

Standard Algorithm

The standard algorithm for generating Latin hypercube samples begins by selecting the number of samples n and the number of dimensions k, where the goal is to produce an n \times k matrix of points that stratify the input space. For each dimension j = 1, \dots, k, the cumulative distribution function (CDF) F_j of the target distribution is divided into n equal-probability intervals, each of probability 1/n, ensuring that the samples cover the full range of each variable. This guarantees that exactly one sample falls into each interval per dimension, promoting even coverage. Next, for each dimension j, a random permutation \pi_j of the integers \{1, 2, \dots, n\} is generated independently; this permutation assigns the strata randomly to the sample rows, preventing systematic alignment across dimensions. Then, for each row i = 1, \dots, n, an independent uniform random variable U_i \sim \text{Uniform}(0,1) is sampled to determine the position within the assigned stratum. The sample value in row i and column j is computed as X_{i j} = F_j^{-1}\left( \frac{\pi_j(i) + U_i - 1}{n} \right), where F_j^{-1} is the inverse CDF for dimension j. This places the sample at a random position within its stratum while respecting the target distribution. The algorithm's complexity is O(kn) when permutations are generated with a Fisher-Yates shuffle (or O(kn \log n) with sorting-based permutation generation). The output is an n \times k matrix in which each column marginally follows its specified distribution and the rows form a Latin hypercube design. Below is pseudocode for the algorithm in a Python-like format, assuming general inverse CDFs are available (e.g., via scipy.stats for common distributions). For a uniform distribution on [0,1], F_j^{-1}(p) = p; for the standard normal, F_j^{-1}(p) = \sqrt{2} \cdot \operatorname{erfinv}(2p - 1).
import numpy as np
from scipy.stats import norm  # Example distribution; any object with a .ppf (inverse CDF) method works

def latin_hypercube_sample(n, k, dist_params=None):
    """
    Generate n x k Latin hypercube samples.
    dist_params: list of k scipy.stats frozen distributions, or None for uniform [0,1]
    """
    if dist_params is None:
        dist_params = [None] * k  # Uniform [0,1] case

    # Step 1: Generate k independent random permutations of {0, 1, ..., n-1}
    perms = [np.random.permutation(n) for _ in range(k)]

    # Step 2: Generate n uniform [0,1] samples for intra-stratum positioning
    U = np.random.uniform(0, 1, n)

    # Step 3: Construct the sample matrix
    samples = np.zeros((n, k))
    for j in range(k):
        # Stratum indices (0-based)
        strata = perms[j]
        # Quantile positions: (strata + U)/n with 0-based strata is the same as
        # (pi_j(i) + U - 1)/n with the 1-based permutation pi_j
        quantiles = (strata + U) / n
        if dist_params[j] is None:
            # Uniform [0,1]: the inverse CDF is the identity
            samples[:, j] = quantiles
        else:
            # General case: apply the target distribution's inverse CDF (ppf),
            # e.g. norm(loc=0, scale=1) for a standard normal marginal
            samples[:, j] = dist_params[j].ppf(quantiles)

    return samples
This implementation handles the uniform case directly and otherwise delegates to each supplied distribution's ppf (inverse CDF) method, so any scipy.stats frozen distribution—such as norm(loc=0, scale=1)—can be used per dimension.
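A short check of the Latin hypercube property for the function above (a minimal sketch; the seed and parameters are arbitrary):
import numpy as np
from scipy.stats import norm

np.random.seed(7)
S = latin_hypercube_sample(n=10, k=3, dist_params=[None, None, norm(loc=0, scale=1)])

# Each uniform column occupies each of the 10 strata exactly once
for j in range(2):
    assert sorted(np.floor(S[:, j] * 10).astype(int)) == list(range(10))
# The normal column does the same in probability space
assert sorted(np.floor(norm.cdf(S[:, 2]) * 10).astype(int)) == list(range(10))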

Optimization Techniques

Optimization techniques for Latin hypercube sampling (LHS) aim to enhance the properties of standard designs by reducing inter-column correlations and improving space-filling characteristics, leading to more efficient computer experiments. These methods typically start from a random LHS and apply iterative algorithms to adjust the sample points while preserving the Latin hypercube structure. Key objectives include minimizing empirical correlations between variables to avoid spurious dependencies and maximizing the uniformity of the point distribution in the design space. One prominent approach for correlation reduction involves selecting permutations of the rows to minimize the correlation between columns. The Iman-Conover procedure, originally developed to induce desired rank correlations among input variables, can be adapted to achieve near-zero correlations by targeting a correlation matrix with off-diagonal elements close to zero. This procedure rearranges the sample values using rank-based permutations, ensuring that the marginal distributions remain unchanged while controlling pairwise dependencies. For instance, columnwise-pairwise algorithms iteratively swap elements within pairs of columns to reduce the maximum absolute correlation, providing an efficient way to decorrelate the design. Space-filling criteria further optimize LHS by promoting even coverage of the input space, which is crucial for surrogate modeling and sensitivity analysis. The maximin criterion seeks to maximize the minimum distance between any two points in the design, formulated as \max \min_{i \neq j} \| \mathbf{x}_i - \mathbf{x}_j \|, where \mathbf{x}_i and \mathbf{x}_j are distinct sample points. This approach, introduced in the context of distance-based designs, helps avoid clustering and ensures good projection properties in lower dimensions. Another criterion is the minimum potential energy, which minimizes the sum of inverse distances (or logarithms thereof) between points, analogous to repelling particles in a physical system: \min \sum_{i < j} \frac{1}{\| \mathbf{x}_i - \mathbf{x}_j \|^p} for some p > 0, often with p=1 or a logarithmic form to emphasize close pairs. These criteria are optimized using algorithms such as simulated annealing, which probabilistically accepts perturbations to escape local optima, or genetic algorithms that evolve populations of candidate designs toward better space-filling properties. Seminal work on orthogonal Latin hypercubes by Ye demonstrated that optimized designs can achieve near-orthogonality among columns, reducing estimation variance in linear models compared to random LHS. For example, these designs lower the variance of response predictors by ensuring uncorrelated projections, with empirical studies showing improvements in efficiency for computer experiments. Such optimizations are particularly valuable in high-dimensional settings, where random LHS may exhibit undesirable structure.
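The following sketch (assuming NumPy and SciPy; the greedy swap-acceptance rule is a simplified stand-in for full simulated annealing, and maximin_lhs is an illustrative helper, not a library function) shows how an initial LHS can be improved under the maximin criterion while preserving the Latin hypercube structure:
import numpy as np
from scipy.spatial.distance import pdist

def maximin_lhs(n, k, iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    # Start from a random LHS: one permutation of stratum centers per column
    design = np.column_stack([(rng.permutation(n) + 0.5) / n for _ in range(k)])
    best = pdist(design).min()
    for _ in range(iters):
        j = rng.integers(k)                           # pick a column
        a, b = rng.choice(n, size=2, replace=False)   # and two rows to swap in it
        design[[a, b], j] = design[[b, a], j]         # swapping within a column preserves the LHS property
        score = pdist(design).min()
        if score >= best:
            best = score                              # keep an improving (or equal) swap
        else:
            design[[a, b], j] = design[[b, a], j]     # undo a worsening swap
    return design, best

design, min_dist = maximin_lhs(n=12, k=3)
print("optimized minimum inter-point distance:", min_dist)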

Applications

In Computer Experiments

Latin hypercube sampling (LHS) plays a central role in the design and analysis of computer experiments (DACE), serving as a space-filling design that efficiently covers multidimensional input spaces with relatively few points. This approach is particularly valuable for constructing surrogate models, such as Gaussian processes, where the goal is to approximate expensive computational simulations. By stratifying each input dimension and ensuring even marginal coverage, LHS reduces the number of required model evaluations compared to full factorial designs or simple random sampling, while maintaining low discrepancy in the sample set. In uncertainty propagation, LHS is frequently combined with variance-based methods to estimate global sensitivity indices, including Sobol indices and those derived from the Fourier Amplitude Sensitivity Test (FAST). These techniques decompose the output variance into contributions from individual inputs and their interactions, enabling analysts to identify key uncertainty drivers in complex models. The stratified nature of LHS enhances the accuracy of these variance-based measures by providing more uniform sampling than pure random methods, especially when sample sizes are limited. A practical workflow for LHS in computer experiments begins with generating the sample points across the parameter ranges, followed by evaluating the deterministic model at these locations to obtain response values. Surrogate models are then fitted to these input-output pairs, allowing prediction and optimization over the entire input space without additional simulations. This process is exemplified in climate-related inventory work, where the 2006 IPCC Guidelines for National Greenhouse Gas Inventories describe LHS as a sampling option for Monte Carlo uncertainty analysis in emissions inventories, noting that it can produce smoother output distributions with sample sizes of only a few hundred. In automotive crash simulations, LHS has been applied to vary geometric parameters, such as part thicknesses, in finite element models of vehicle structures to assess behavior under parametric uncertainty, enabling the construction of metamodels that predict energy absorption and deformation at reduced computational cost.
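A compact sketch of this design-evaluate-fit workflow (assuming NumPy, SciPy, and scikit-learn are available; expensive_model is a hypothetical stand-in for a costly simulation):
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_model(X):
    # Placeholder for a costly deterministic simulation
    return np.sin(3 * X[:, 0]) + X[:, 1] ** 2

# 1. Space-filling design: 30 LHS points in a 2-D input space
X_train = qmc.LatinHypercube(d=2, seed=0).random(30)
# 2. Evaluate the model at the design points
y_train = expensive_model(X_train)
# 3. Fit a Gaussian process surrogate to the input-output pairs
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), normalize_y=True)
gp.fit(X_train, y_train)
# 4. Predict (with uncertainty) anywhere in the input space without new simulations
X_new = np.array([[0.2, 0.8], [0.9, 0.1]])
mean, std = gp.predict(X_new, return_std=True)
print(mean, std)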

In Engineering and Applied Sciences

In engineering, Latin hypercube sampling (LHS) is employed to enhance reliability and risk analysis by efficiently propagating uncertainties in complex systems, particularly in high-stakes domains such as nuclear safety. For instance, in reactor protection systems, LHS facilitates dynamic reliability analysis by generating stratified samples of failure probabilities and repair times, enabling more accurate quantification of system unavailability than traditional methods. LHS was also prominently used in the 1996 performance assessment for the Waste Isolation Pilot Plant (WIPP), a U.S. nuclear waste repository, to sample 57 uncertain input variables in fluid flow models, generating complementary cumulative distribution functions for radionuclide release over 10,000 years to support EPA compliance certification. Similarly, in risk assessment applications, LHS supports uncertainty analysis for fault trees in safety-critical systems, reducing computational demands while capturing rare-event tails in failure distributions. In environmental science, LHS addresses parameter uncertainty in groundwater modeling, allowing for robust predictions in hydrological studies amid variable aquifer properties. Research in the 2020s has used LHS to sample aquifer properties and recharge rates in transient models, improving calibration and uncertainty bounds for contaminant transport simulations in aquifers. This approach has been particularly valuable in assessing climate impacts on water resources, where LHS ensures even coverage of multidimensional parameter spaces to evaluate model sensitivity without exhaustive enumeration. Within pharmaceuticals, LHS samples drug response surfaces to optimize dosing regimens, minimizing the need for extensive clinical trials by exploring pharmacokinetic-pharmacodynamic interactions under uncertainty. For anti-malarial drugs such as piperaquine, LHS has been used in simulations to assess pharmacokinetic-pharmacodynamic responses by sampling thousands of parameter sets, aiding in the evaluation of dosing regimens under uncertainty in patient covariates such as body weight and clearance rates. The method's efficient coverage of the parameter space enhances the reliability of virtual trials, supporting regulatory decisions on dose adjustments. In biomedical applications, LHS quantifies uncertainty in epidemiological models, notably for infectious disease outbreaks such as COVID-19, by sampling transmission parameters such as contact rates and infectious periods. Studies have applied LHS combined with partial rank correlation coefficients to perform global sensitivity analysis on SEIR models, revealing key drivers of outbreak trajectories and informing intervention strategies. For COVID-19 dynamics, this has enabled efficient exploration of scenario spaces, assessing the impact of non-pharmaceutical measures on reproduction numbers. A representative example in petroleum engineering involves oil reservoir simulation, where LHS varies properties such as porosity and permeability to forecast production under geological uncertainty. In history matching workflows, iterative LHS generates reservoir models conditioned on production data, improving reserves estimation and enhancing recovery predictions in heterogeneous formations. This application leverages LHS's space-filling properties to reduce the number of simulation runs while maintaining statistical fidelity in uncertainty quantification.

Extensions and Variants

Optimized Latin Hypercube Sampling

Optimized Latin hypercube sampling refers to variants of the standard method in which the initial design is refined through optimization objectives to enhance space-filling properties, uniformity, or the incorporation of variable dependencies. These optimizations address limitations of basic LHS, such as clustering of points or poor coverage in projections, by applying criteria that promote even distribution across the hypercube. Common techniques include distance-based metrics and energy minimization, often implemented via iterative algorithms that adjust point positions while preserving the Latin hypercube structure. The maximin criterion is a key optimization approach that maximizes the minimum distance between any pair of points in the design, thereby improving overall uniformity and space-filling characteristics. Introduced by Morris and Mitchell in 1995, this approach uses simulated annealing to iteratively swap elements within the Latin hypercube structure until the criterion is optimized, resulting in designs that are asymptotically optimal under certain measures. For correlated Latin hypercube sampling, extensions allow control over dependencies between variables, such as rank correlations, by optimizing the design to match specified structures while maintaining marginal uniformity; this is particularly useful when variables exhibit natural relationships in the model. Another influential criterion is the Audze-Eglājs method, which minimizes the total potential energy of the point configuration by treating points as repelling charged particles in the design space, leading to highly uniform distributions. Originally formulated in 1977 for experimental design, it has been adapted for LHS through coordinate exchange or evolutionary optimization, ensuring low clustering and good projection properties. Post-2000 developments, including built-in optimizers in the R package lhs (such as geneticLHS and maximinLHS), leverage these criteria via genetic algorithms or simulated annealing to generate improved designs; for example, such optimizers can reduce L2-discrepancy measures by up to 30% compared to unoptimized LHS in low-dimensional cases. In practice, for a 5-dimensional design with 20 points, such optimizations demonstrably enhance one- and two-dimensional projections, showing reduced overlap and better coverage than standard LHS. In high-stakes modeling domains, optimized variants outperform their standard counterparts by providing more reliable parameter-space exploration, which improves the accuracy of uncertainty and sensitivity estimates in simulations.

Progressive and Adaptive LHS

Progressive Latin Hypercube Sampling (PLHS) is a sequential sampling strategy that begins with a small initial sample and incrementally adds points while maintaining the Latin hypercube properties across all subsets and their unions. This approach preserves stratification in one-dimensional projections as the sample grows, allowing analysts to start with a modest sample size, such as n=10, and scale up to n=100 or more without regenerating the entire sample, thereby saving computational resources in iterative simulations. The algorithm for PLHS generates an initial Latin hypercube slice and then appends subsequent slices of sample points through optimized permutations, preserving the stratified structure. An optimization step minimizes disruptions to the existing projections, enabling efficient sequential sampling with improved convergence and robustness over traditional one-shot Latin hypercube sampling. Adaptive variants of Latin hypercube sampling extend this by dynamically adjusting strata based on model outputs, for example adding points in regions of high output variability to refine surrogate models. For instance, integration with importance sampling shifts focus to failure-prone regions in reliability analysis, using the most probable point to guide sampling and reduce variance in estimates of failure probability. These methods iteratively evaluate outputs to prioritize informative samples, achieving target accuracies with fewer points than static designs, as demonstrated in surrogate-based optimization where initial sets are augmented until errors fall below 1%. In applications, PLHS supports uncertainty quantification in environmental simulations by allowing refinement without restarting analyses. Adaptive LHS finds use in industrial processes, such as chemical production optimization, and extends to scenarios requiring responsive sampling, such as uncertainty propagation in dynamic systems. Recent developments as of 2025 include quantization-based Latin hypercube sampling (QLHS) for handling dependent inputs via Voronoi cells, and the LHS-in-LHS expansion strategy for adding samples to existing LHS sets while preserving their properties.
