
Partial least squares regression

Partial least squares regression (PLSR), also known as projection onto latent structures, is a multivariate statistical technique that models the linear relationship between a predictor matrix X (containing independent variables) and a response matrix Y (containing dependent variables) by extracting latent variables that maximize the covariance between X and Y, thereby addressing issues like multicollinearity, noise, and high dimensionality in the data. Originating from econometric path modeling developed by Herman Wold in the 1960s, PLSR was adapted and popularized in chemometrics by Svante Wold and Harald Martens in the late 1970s and 1980s, evolving into a versatile tool for predictive modeling across various fields. PLSR generalizes both principal component analysis (PCA), which decomposes X to explain its variance, and multiple linear regression (MLR), which directly relates X to Y but struggles with collinear predictors; in contrast, PLSR constructs latent variables that prioritize predictive relevance for Y over mere explanation of X.

The method employs the nonlinear iterative partial least squares (NIPALS) algorithm, which iteratively deflates X and Y to extract successive orthogonal components, or "latent vectors," until the desired number of components is reached or model criteria are met. This process yields regression coefficients for prediction, variable importance scores (e.g., VIP), and biplots for interpretation, making it robust for scenarios with more variables than observations. One of the primary advantages of PLSR is its ability to handle ill-conditioned matrices common in high-dimensional settings, such as spectroscopy or genomics, where traditional MLR fails due to singularity or instability. It also supports multiple responses in Y, enabling simultaneous modeling of several outcomes, and provides diagnostic tools like cross-validation to assess model quality and avoid overparameterization.

Applications span chemometrics (e.g., quantitative structure-activity relationships in drug design), spectroscopy, bioinformatics, and neuroimaging, where PLSR variants like behavioral PLS are used to correlate brain activity patterns with cognitive tasks. Despite its strengths, PLSR assumes linearity and can be sensitive to outliers, often requiring preprocessing like centering and scaling of variables.

Introduction and Core Concepts

Definition and Motivation

Partial least squares regression (PLSR), also known as partial least squares (PLS), is a multivariate statistical technique that models the relationships between a matrix of predictor variables \mathbf{X} and a matrix of response variables \mathbf{Y} by extracting latent variables that maximize the covariance between projections of \mathbf{X} and \mathbf{Y}. This approach integrates features of principal component analysis, which focuses on variance in \mathbf{X}, and multiple regression, which predicts \mathbf{Y} from \mathbf{X}, to create components relevant to both sets of variables. Unlike traditional methods, PLSR constructs these latent variables iteratively, ensuring they capture the strongest linear relationships while reducing the impact of irrelevant variation.

The primary motivation for PLSR arises from the limitations of ordinary least squares (OLS) regression in handling complex datasets, particularly those exhibiting multicollinearity among predictors, high dimensionality where the number of variables greatly exceeds the number of observations (p \gg n), and substantial noise. In such cases, OLS often fails due to unstable coefficient estimates and overfitting, as the covariance matrix of \mathbf{X} becomes singular or ill-conditioned. PLSR addresses these issues by performing dimension reduction that prioritizes directions of maximum covariance with \mathbf{Y}, thereby providing robust predictions even in noisy environments. PLSR was introduced by Herman Wold in the 1960s as an econometric method and adapted for chemometrics in the 1970s as a solution to calibration challenges in analytical chemistry, where relating instrumental responses to chemical concentrations is complicated by correlated measurements.

A key application of PLSR lies in predictive modeling for domains with inherently collinear predictors, such as spectroscopy, where adjacent wavelengths in spectral data are highly correlated, leading to redundant information that confounds standard regression. By deflating the data through successive latent components, PLSR simplifies these high-dimensional inputs while preserving predictive power, making it ideal for quantitative analysis in fields like near-infrared spectroscopy. This method extends the exploratory capabilities of principal component analysis into a supervised framework suited for prediction tasks.

Core Idea

Partial least squares (PLS) regression operates on a geometric foundation that seeks to identify directions in the predictor space (X) and response space (Y) which capture the strongest linear relationships between the two datasets. At its core, PLS constructs successive pairs of weight vectors, denoted as \mathbf{w} for X and \mathbf{c} for Y, such that the resulting score vectors \mathbf{t} = X\mathbf{w} and \mathbf{u} = Y\mathbf{c} exhibit maximum covariance. This maximization occurs under the constraint that the weight vectors have unit length (\|\mathbf{w}\| = 1, \|\mathbf{c}\| = 1), ensuring that the projections represent directions of highest correlation rather than mere scaling. Geometrically, this process aligns the data by finding hyperplanes in the high-dimensional spaces of X and Y that are as close as possible, prioritizing the shared variance that links predictors to responses.

The objective of this search is formally expressed as maximizing the covariance between the scores: \max_{\mathbf{w}, \mathbf{c}} \operatorname{Cov}(\mathbf{t}, \mathbf{u}) = \max_{\mathbf{w}, \mathbf{c}} \mathbf{w}^T X^T Y \mathbf{c}, subject to the unit norm constraints. This formulation, rooted in the NIPALS algorithm developed by Herman Wold, transforms the original variables into latent components that successively explain the covariance structure, starting with the pair that captures the largest shared variation. Unlike scalar products in simpler regressions, this eigenvector-like problem identifies orthogonal directions iteratively, with each new pair deflating the residual matrices to remove the influence of prior components.

To extract these components, PLS proceeds through deflation steps: after computing the first score pair (\mathbf{t}_1, \mathbf{u}_1), the loadings are calculated as \mathbf{p}_1 = X^T \mathbf{t}_1 / (\mathbf{t}_1^T \mathbf{t}_1) and \mathbf{q}_1 = Y^T \mathbf{u}_1 / (\mathbf{u}_1^T \mathbf{u}_1), and the matrices X and Y are updated by subtracting the rank-one approximations \mathbf{t}_1 \mathbf{p}_1^T and \mathbf{u}_1 \mathbf{q}_1^T, respectively, yielding residuals that are orthogonal to the previous scores. This stepwise orthogonalization uncovers successively deeper layers of shared covariance, building a decomposition akin to a principal component analysis tailored to inter-block relationships rather than intra-block variance. The process continues until the desired number of components is reached or the residuals are negligible, providing a geometric ladder that descends from dominant to subtle covariances.

In contrast to principal component analysis (PCA), which identifies directions in X that maximize only the variance within X itself, PLS explicitly incorporates information from Y to guide the projection directions, ensuring that the latent variables are predictive and relevant to the responses. This Y-aware maximization addresses scenarios where X exhibits strong internal collinearity yet carries clear predictive signals toward Y, making PLS particularly suited for high-dimensional data with noisy or correlated predictors.
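This maximization has a closed-form solution for each component: the optimal weight pair corresponds to the leading singular vectors of the cross-product matrix X^T Y. The following NumPy sketch (synthetic, illustrative data) extracts that first weight pair and checks that it yields a larger score covariance than an arbitrary pair of unit-norm directions:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))                                   # illustrative predictor block
Y = X[:, :2] @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(50, 3))
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)                # column-center both blocks
# The pair (w, c) maximizing w^T X^T Y c under unit-norm constraints is given by
# the leading left/right singular vectors of the cross-product matrix X^T Y.
U, s, Vt = np.linalg.svd(Xc.T @ Yc)
w, c = U[:, 0], Vt[0, :]
t, u = Xc @ w, Yc @ c                                          # first pair of score vectors
w_rand = rng.normal(size=X.shape[1]); w_rand /= np.linalg.norm(w_rand)
c_rand = rng.normal(size=Y.shape[1]); c_rand /= np.linalg.norm(c_rand)
print("score covariance, optimal weights:", t @ u)             # proportional to cov(t, u)
print("score covariance, random weights: ", (Xc @ w_rand) @ (Yc @ c_rand))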

Mathematical Foundation

Underlying Model

Partial least squares (PLS) regression approximates a linear latent-variable model for relating predictor variables to response variables, particularly in scenarios with multicollinearity or high dimensionality. The model decomposes the centered data matrices X (of dimensions n \times p, where n is the number of observations and p the number of predictors) and Y (of dimensions n \times m, where m is the number of responses) into latent components that capture the shared structure between them. The underlying equations are: \begin{align*} X &= T P^T + E, \\ Y &= U Q^T + F, \end{align*} where T and U are the score matrices (both n \times A, with A denoting the number of latent components), P is the X-loading matrix (p \times A), Q is the Y-loading matrix (m \times A), and E and F are the residual matrices capturing unexplained variation in X and Y, respectively. The scores in T represent projections of the predictors onto the latent components, while those in U do the same for the responses.

The scores are related through an inner relation U \approx T B, where B is a diagonal matrix containing the regression coefficients between successive pairs of score vectors t_a and u_a for each component a = 1, \dots, A. This relationship links the latent structures of X and Y, allowing PLS to model predictive associations. The model assumes linear relationships between the original variables and the latent components, with the score vectors in T being mutually orthogonal (so that T^T T is diagonal) and those in U approximately so. After extracting each component, the residuals E and F are deflated by subtracting the contribution of the current latent variables, enabling sequential extraction of subsequent components. In this framework, the latent variables embodied in the scores T and U capture the covariance, or shared variance, between X and Y, facilitating dimension reduction while preserving predictive information, whereas the residuals E and F represent noise or structure orthogonal to the model.
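The decomposition can be inspected on a fitted model. The sketch below uses scikit-learn's PLSRegression on synthetic data (scale=False, so only column centering is applied) to check that X is reconstructed as T P^T up to the residual E and that the X-scores are mutually orthogonal; the data sizes and seed are arbitrary:

import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 10))
Y = X[:, :3] @ rng.normal(size=(3, 2)) + 0.1 * rng.normal(size=(60, 2))
pls = PLSRegression(n_components=3, scale=False).fit(X, Y)    # centering only
T, P = pls.x_scores_, pls.x_loadings_                         # T: n x A, P: p x A
Xc = X - X.mean(axis=0)
E = Xc - T @ P.T                                              # X-residual after A components
print("||E|| / ||Xc|| =", np.linalg.norm(E) / np.linalg.norm(Xc))
G = T.T @ T                                                   # T^T T should be (near-)diagonal
print("max off-diagonal of T^T T:", np.abs(G - np.diag(np.diag(G))).max())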

Objective Function

The objective function of partial least squares (PLS) regression seeks to extract successive latent components that capture the covariance structure between the predictor matrix X and the response matrix Y. For the a-th component, the goal is to maximize the covariance between the score vectors t_a = X w_a and u_a = Y c_a, which is given by \operatorname{cov}(t_a, u_a) = w_a^T X^T Y c_a / (n-1), where n is the number of observations, subject to the constraints \|w_a\| = 1 and \|c_a\| = 1. This criterion identifies weight vectors w_a and c_a that align directions in X and Y sharing the greatest covariance.

To extend to multiple components, PLS proceeds iteratively: after extracting the pair (t_a, u_a), the matrices are deflated to remove the contribution of this component, ensuring subsequent components capture only the remaining covariance structure. Specifically, the predictor matrix is updated as X \leftarrow X - t_a p_a^T, where the loading p_a = X^T t_a / (t_a^T t_a), and similarly for the response matrix, Y \leftarrow Y - u_a q_a^T, with q_a = Y^T u_a / (u_a^T u_a). This process enforces orthogonality among the score vectors \{t_a\} and \{u_a\} across components, preventing redundancy in the extracted latent variables.

The regression aspect of PLS combines the extracted components to form predictions. Let W = [w_1, \dots, w_A], P = [p_1, \dots, p_A], and Q = [q_1, \dots, q_A] denote the matrices of weights and loadings for A components; the coefficient matrix is then B = W (P^T W)^{-1} Q^T, yielding the predicted responses \hat{Y} = X B. The number of components A is typically selected via cross-validation to balance model complexity and predictive accuracy, minimizing error on held-out data.
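The coefficient formula can be verified numerically against a fitted scikit-learn model (again with scale=False so only column means are removed); the attribute names follow scikit-learn's API and the data are synthetic:

import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 6))
Y = X @ rng.normal(size=(6, 2)) + 0.1 * rng.normal(size=(40, 2))
pls = PLSRegression(n_components=3, scale=False).fit(X, Y)
W, P, Q = pls.x_weights_, pls.x_loadings_, pls.y_loadings_
B = W @ np.linalg.inv(P.T @ W) @ Q.T                  # B = W (P^T W)^{-1} Q^T, as in the text
Y_hat = (X - X.mean(axis=0)) @ B + Y.mean(axis=0)     # add back the response means
print(np.allclose(Y_hat, pls.predict(X)))             # should print True for this fit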

Computation

Algorithms Overview

Partial least squares (PLS) regression employs a variety of computational algorithms to extract latent variables that maximize the covariance between predictor and response matrices, addressing challenges like multicollinearity and high dimensionality. Iterative methods, such as the Nonlinear Iterative Partial Least Squares (NIPALS) algorithm, offer flexibility in handling diverse data structures by sequentially deflating the matrices after each component extraction, making them suitable for exploratory analyses. In contrast, direct methods based on singular value decomposition (SVD), like the SIMPLS algorithm, provide numerical stability and efficiency by solving the maximization problem through eigendecomposition of cross-product matrices, avoiding iterative approximations and reducing computational overhead for large datasets. These approaches differ in their balance of interpretability and speed, with iterative methods being more transparent but potentially slower, while direct methods prioritize precision in factor estimation.

PLS algorithms are categorized into modes based on the response structure: PLS1 for univariate responses (a single Y variable), which simplifies the deflation step by focusing on scalar projections, and PLS2 for multivariate responses (multiple Y variables), which extends the maximization to joint response modeling. Kernel algorithms, used in nonlinear extensions of PLS (e.g., kernel PLS), represent the data through kernel matrices for implicit feature mapping and nonlinear modeling, while linear variants like standard NIPALS and SIMPLS operate directly on the original matrices. This distinction allows kernel-based implementations to exploit the kernel trick for nonlinear extensions without explicit feature mapping, though both modes maintain the core PLS objective of predictive relevance.

Convergence in iterative algorithms like NIPALS is achieved by repeating projections until the scores and weights reach a fixed point, typically monitored via a small change threshold (e.g., 10^{-12}) or a maximum iteration limit (e.g., 200), ensuring consistent deflation of residuals across components. However, NIPALS can exhibit slow convergence on ill-conditioned data and non-uniqueness in weight vectors due to rotational and sign ambiguities, which may affect reproducibility without normalization constraints. Direct SVD-based methods circumvent these issues by providing unique decompositions, though they require more memory for matrix operations.

The number of components in PLS models is selected using criteria that balance fit and prediction: R^2 measures the proportion of explained variance in the response (or predictors), favoring models that capture cumulative variation without overfitting, while Q^2 assesses predictive relevance through cross-validation (e.g., via the predicted residual sum of squares, PRESS); values greater than 0 indicate predictive relevance, with higher values suggesting better prediction, often selected via methods like the van der Voet test. These metrics guide practical implementation by identifying the "knee" in validation plots, prioritizing generalization over mere goodness-of-fit.
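In practice the Q^2/PRESS criterion is often applied as a cross-validation sweep over the number of components, as in this illustrative scikit-learn sketch (synthetic data; Q^2 computed as 1 - PRESS/TSS):

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 20))
y = X[:, :4] @ rng.normal(size=4) + 0.5 * rng.normal(size=80)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
tss = np.sum((y - y.mean()) ** 2)                     # total sum of squares
for a in range(1, 11):
    y_cv = cross_val_predict(PLSRegression(n_components=a), X, y, cv=cv).ravel()
    press = np.sum((y - y_cv) ** 2)                   # predicted residual sum of squares
    print(f"A = {a:2d}  PRESS = {press:8.2f}  Q2 = {1 - press / tss:6.3f}")
# Pick the smallest A near the maximum Q^2 (the "knee" of the validation curve).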

NIPALS Algorithm

The Nonlinear Iterative Partial Least Squares (NIPALS) algorithm is the foundational procedure for computing PLS components, particularly suited for the general multivariate case (PLS2) where the response matrix Y has multiple columns. It extracts successive latent variables by maximizing the covariance between X-scores (t) and Y-scores (u) through iterative projections and deflations. The algorithm begins by centering and scaling the predictor matrix X (n × p) and response matrix Y (n × q). For each latent component h (up to A components), the process initializes the Y-scores u_h (often with a column of Y or a random vector), then iterates: X-weights w_h = X^T u_h / ||X^T u_h||_2 (normalized to unit norm); X-scores t_h = X w_h; Y-weights c_h = Y^T t_h / (t_h^T t_h); update u_h = Y c_h; repeat until convergence (e.g., change in t_h < 10^{-12}). Once converged, compute Y-loadings q_h = Y^T t_h / (t_h^T t_h) (or from c_h in some variants) and X-loadings p_h = X^T t_h / (t_h^T t_h). Deflate the residuals: X ← X - t_h p_h^T; Y ← Y - u_h q_h^T (or t_h c_h^T in simplified forms), ensuring orthogonality of successive components. After A components, the regression coefficients are B = W (P^T W)^{-1} Q^T, where W and P are the matrices of weights and X-loadings and Q the matrix of Y-loadings. Predictions are Ŷ = X B + B_0, with the intercept B_0 computed from the variable means. For a single component, the pseudocode is:
Initialize: Center and scale X and Y; select initial u (e.g., first column of Y)
Loop until convergence:
  w = (X^T u) / ||X^T u||_2
  t = X w
  c = (Y^T t) / (t^T t)
  u = Y c
q = (Y^T t) / (t^T t)
p = (X^T t) / (t^T t)
Deflate: X = X - t p^T; Y = Y - t c^T  (or u q^T)
Store: w, t, c, p, q
Repeat for multiple components on deflated data. This method's iterations ensure covariance maximization but can be computationally intensive for large matrices.
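A compact, runnable NumPy translation of this procedure is sketched below; it is a didactic implementation (centering only, regression-mode deflation of Y by t c^T) on synthetic data, not an optimized library routine:

import numpy as np

def nipals_pls2(X, Y, n_components, tol=1e-12, max_iter=200):
    # Didactic NIPALS for multivariate Y (PLS2). X: (n, p), Y: (n, m).
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xk, Yk = X - x_mean, Y - y_mean                    # work on centered copies
    n, p = Xk.shape; m = Yk.shape[1]
    W = np.zeros((p, n_components)); P = np.zeros((p, n_components))
    C = np.zeros((m, n_components)); T = np.zeros((n, n_components))
    for a in range(n_components):
        u = Yk[:, 0].copy()                            # initial Y-scores: first column of Y
        t_old = np.zeros(n)
        for _ in range(max_iter):
            w = Xk.T @ u; w /= np.linalg.norm(w)       # X-weights, unit norm
            t = Xk @ w                                 # X-scores
            c = Yk.T @ t / (t @ t)                     # Y-weights
            u = Yk @ c                                 # updated Y-scores
            if np.linalg.norm(t - t_old) < tol:
                break
            t_old = t
        p_a = Xk.T @ t / (t @ t)                       # X-loadings
        Xk = Xk - np.outer(t, p_a)                     # deflate X
        Yk = Yk - np.outer(t, c)                       # deflate Y (regression mode, t c^T)
        W[:, a], P[:, a], C[:, a], T[:, a] = w, p_a, c, t
    B = W @ np.linalg.inv(P.T @ W) @ C.T               # B = W (P^T W)^{-1} C^T
    b0 = y_mean - x_mean @ B                           # intercept from the variable means
    return B, b0, T

# usage sketch on synthetic data
rng = np.random.default_rng(4)
X = rng.normal(size=(50, 7))
Y = X[:, :2] @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(50, 3))
B, b0, T = nipals_pls2(X, Y, n_components=3)
print("training RMSE:", np.sqrt(np.mean((X @ B + b0 - Y) ** 2)))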

PLS1 Variant

The PLS1 variant of partial least squares regression is specifically designed for scenarios where the response variable Y is univariate, meaning it consists of a single response vector of length n (the number of observations). This adaptation simplifies the general PLS2 algorithm by eliminating the need for iterative updates to the Y-scores within each component extraction, as the covariance maximization between X and Y can be computed directly. Unlike the multivariate case, PLS1 treats Y as a single column, allowing for efficient deflation and loading computations tailored to univariate prediction tasks.

The algorithm begins by centering and scaling the predictor matrix X (n × p, where p is the number of variables) and the response vector y to ensure zero mean and unit variance. Initialization sets the Y-scores u to y itself. For each latent component h (up to a maximum of A components), the process extracts weights and scores as follows: the X-weights vector w_h is computed as the normalized projection w_h = X^T u / ||X^T u||_2, ensuring ||w_h||_2 = 1; the X-scores are then t_h = X w_h; the Y-loading is directly q_h = y^T t_h / (t_h^T t_h); and no further Y-weight iteration is required. This direct computation of q_h leverages the univariate nature of y, avoiding the matrix operations needed for multivariate Y. The X-loadings are p_h = X^T t_h / (t_h^T t_h). Deflation then updates the residuals: X ← X - t_h p_h^T and y ← y - t_h q_h, removing the contribution of the current component while preserving orthogonality.

After extracting A components, the regression coefficients are derived from the stored matrices: let W be the p × A matrix of weights [w_1, ..., w_A], P the p × A matrix of loadings [p_1, ..., p_A], and q the A × 1 vector of Y-loadings [q_1, ..., q_A]. The coefficient vector b (p × 1) is given by b = W (P^T W)^{-1} q, which provides a least-squares fit within the latent space. The predicted response is then ŷ = X b + b_0, where b_0 is the intercept computed from the means of X and y to account for centering. This formulation ensures that the model minimizes the reconstruction error for both X and y while maximizing their covariance. PLS1 is particularly common in calibration problems, such as spectroscopic analysis where a single property (e.g., concentration) is predicted from multivariate spectra, due to its computational efficiency and robustness to collinearity in high-dimensional X. For a single component (A=1), the pseudocode is:
Initialize: Center and scale X and y; u = y
w = (X^T u) / ||X^T u||_2
t = X w
q = (y^T t) / (t^T t)
p = (X^T t) / (t^T t)
Deflate: X = X - t p^T; y = y - t q
Store: w, t, p, q
To extend to multiple components, repeat the process A times on the deflated residuals, accumulating W, P, and q, then compute b as above. This iterative deflation ensures successive components capture orthogonal variance, with the number A often selected via cross-validation to balance fit and prediction error.
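A short NumPy sketch of this multi-component PLS1 procedure follows (centering only; synthetic, illustrative data):

import numpy as np

def pls1(X, y, n_components):
    # Didactic PLS1 for a single response vector y. X: (n, p), y: (n,).
    X, y = np.asarray(X, float), np.asarray(y, float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    p = Xk.shape[1]
    W = np.zeros((p, n_components)); P = np.zeros((p, n_components)); q = np.zeros(n_components)
    for a in range(n_components):
        w = Xk.T @ yk; w /= np.linalg.norm(w)   # weights directly from covariance with y
        t = Xk @ w                              # X-scores
        q[a] = yk @ t / (t @ t)                 # Y-loading (scalar, no inner iteration)
        p_a = Xk.T @ t / (t @ t)                # X-loadings
        Xk = Xk - np.outer(t, p_a)              # deflate X
        yk = yk - q[a] * t                      # deflate y
        W[:, a], P[:, a] = w, p_a
    b = W @ np.linalg.inv(P.T @ W) @ q          # b = W (P^T W)^{-1} q
    b0 = y_mean - x_mean @ b                    # intercept from the means
    return b, b0

rng = np.random.default_rng(5)
X = rng.normal(size=(40, 12))
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + 0.2 * rng.normal(size=40)
b, b0 = pls1(X, y, n_components=3)
print("training R^2:", 1 - np.sum((y - (X @ b + b0)) ** 2) / np.sum((y - y.mean()) ** 2))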

Variants and Extensions

Orthogonal PLS (OPLS)

Orthogonal partial least squares (OPLS) is a variant of PLS that partitions the systematic variation in the predictor matrix X into predictive variation (correlated with the response Y) and orthogonal variation (uncorrelated with Y). This separation enhances model interpretability by isolating noise and irrelevant systematic variation into orthogonal components, improving diagnostics such as the R² and Q² metrics without affecting predictive performance. OPLS is particularly useful in applications like metabolomics and spectroscopy, where distinguishing relevant signals from confounding factors is crucial.

The OPLS algorithm extends the standard PLS decomposition by incorporating orthogonal signal correction (OSC) or direct orthogonalization in the latent variable extraction. In the NIPALS-based implementation, after computing the predictive weights and scores as in PLS, an orthogonal filter removes the uncorrelated part from the X residuals before proceeding to the next component. This results in a model where each predictive component maximizes covariance with Y, while the orthogonal components capture structured variation in X that is unrelated to Y. The method was introduced by Trygg and Wold in 2002.
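A minimal sketch of the orthogonal filtering step for a single response, assuming one orthogonal component, is shown below; it follows the spirit of the procedure but is simplified (centering only, synthetic data) and is not a full OPLS implementation:

import numpy as np

def remove_one_orthogonal_component(X, y):
    # Remove one component of X that is orthogonal to the (single) response y.
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    w = Xc.T @ yc; w /= np.linalg.norm(w)          # predictive weight direction
    t = Xc @ w
    p = Xc.T @ t / (t @ t)                         # loading of the predictive component
    w_o = p - (w @ p) * w                          # part of the loading orthogonal to w
    w_o /= np.linalg.norm(w_o)
    t_o = Xc @ w_o                                 # Y-orthogonal scores
    p_o = Xc.T @ t_o / (t_o @ t_o)                 # Y-orthogonal loadings
    return Xc - np.outer(t_o, p_o), t_o            # filtered X and the removed scores

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 10))
y = X[:, 0] + 0.2 * rng.normal(size=50)
X_filt, t_o = remove_one_orthogonal_component(X, y)
print("covariance of orthogonal scores with y:", t_o @ (y - y.mean()))   # ~0 by construction
# An ordinary PLS model would then be fitted to X_filt instead of X.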

Regularized PLS (L-PLS)

Regularized partial least squares (L-PLS) extends the standard PLS framework by incorporating L1 (lasso-like) regularization to promote sparsity in the model coefficients, making it particularly suitable for high-dimensional datasets where variable selection is crucial. This regularization is applied to the weights or loadings within the PLS latent structure, addressing issues like multicollinearity and overfitting that are common in scenarios with many predictors relative to observations. The approach balances prediction accuracy with interpretability by shrinking irrelevant coefficients to zero, thereby identifying key features that drive the relationship between predictors X and responses Y.

The core formulation modifies the PLS objective to include an L1 penalty on the regression coefficients B, formulated as minimizing \|Y - X B\|^2 + \lambda \|B\|_1 within the latent framework of PLS, where \lambda > 0 controls the sparsity level. This penalty encourages many elements of B to be exactly zero, facilitating automatic variable selection while preserving the latent structure and iterative component extraction characteristic of PLS. Unlike standard PLS, which uses all variables in each latent component, the regularized variant enforces sparsity directly on the weight vectors, leading to sparser linear combinations of predictors.

The algorithm adapts the NIPALS procedure through iterative soft-thresholding applied to the weight updates during each component extraction step. Specifically, for the (a+1)-th component, the weight vector is computed as \mathbf{w}_{a+1} = \operatorname{soft\_threshold}\left( \frac{X^T \mathbf{u}_a}{\|X^T \mathbf{u}_a\|}, \lambda \right), where \mathbf{u}_a is the score vector from the previous step, and the soft-thresholding operator is defined as \operatorname{soft\_threshold}(z, \lambda) = \operatorname{sign}(z) \max(|z| - \lambda, 0) for each element z. This thresholding is performed after the initial covariance maximization but before the final normalization, ensuring sparsity while maintaining the deflation of residuals in subsequent iterations. The process repeats for a specified number of components, with \lambda typically tuned via cross-validation.

The induced sparsity in the latent components propagates to the final coefficients, making the method advantageous for variable selection in high-dimensional settings such as genomics, where thousands of genes must be sifted to identify those associated with phenotypes. By embedding regularization within the NIPALS-like iterations, it avoids the need for post-hoc thresholding and improves model stability and interpretability compared to unregularized PLS. This method was introduced in the late 2000s, with foundational work appearing around 2009.
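The thresholded weight update can be illustrated in a few lines of NumPy; the sketch below computes a single sparse weight vector for a univariate response, with an arbitrary, untuned \lambda chosen purely for illustration:

import numpy as np

def soft_threshold(z, lam):
    # Elementwise soft-thresholding: sign(z) * max(|z| - lam, 0).
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def sparse_pls_weight(X, y, lam):
    # One sparse weight vector: threshold the normalized covariance direction, then renormalize.
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    z = Xc.T @ yc
    z /= np.linalg.norm(z)                  # direction of maximum covariance with y
    w = soft_threshold(z, lam)              # shrink small entries exactly to zero
    nrm = np.linalg.norm(w)
    return w / nrm if nrm > 0 else w

rng = np.random.default_rng(7)
X = rng.normal(size=(60, 30))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + 0.3 * rng.normal(size=60)
w = sparse_pls_weight(X, y, lam=0.1)        # lam would normally be tuned by cross-validation
print("nonzero weights:", np.count_nonzero(w), "of", w.size)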

Advanced Extensions

One advanced extension of partial least squares (PLS) regression is the three-pass regression filter (3PRF), introduced in 2015 as a method for forecasting with a large number of predictors. The 3PRF operates as a constrained least-squares estimator that directly identifies relevant forecast factors from high-dimensional data, reducing to standard PLS as a special case. This approach enhances efficiency in multi-block or high-dimensional settings by avoiding the iterative deflation steps common in traditional PLS algorithms, making it suitable for forecasting applications involving very many predictors.

Another direct computational variant is PLS-SVD, which computes PLS components non-iteratively using singular value decomposition (SVD) of the cross-covariance matrix \mathbf{X}^T \mathbf{Y}. Specifically, the SVD yields [\mathbf{U}, \mathbf{S}, \mathbf{V}] = \mathrm{svd}(\mathbf{X}^T \mathbf{Y}), from which the first weight vectors are extracted as \mathbf{w} = \mathbf{U}(:,1) for the predictors and \mathbf{c} = \mathbf{V}(:,1) for the responses, followed by projection and deflation to obtain scores and loadings. This method is particularly advantageous for large-scale datasets, as it leverages efficient SVD implementations to bypass convergence issues in iterative algorithms like NIPALS, thereby improving computational speed without sacrificing the maximization of covariance between latent variables.

PLS correlation, also known as partial least squares correlation (PLSC), modifies the objective to maximize the correlation between latent variable scores rather than their covariance, assuming standardized input variables. This variant projects both predictor and response blocks onto common latent spaces where the correlation is optimized, making it robust to scale differences and suitable for exploratory analyses of symmetric relationships. In practice, PLSC extracts successive pairs of singular vectors from the cross-product matrix of standardized \mathbf{X} and \mathbf{Y}, providing a symmetric framework for identifying shared patterns without assuming a directional predictor-response distinction.

These extensions find niche applications in complex data environments; for instance, 3PRF has been applied in neuroimaging to handle multivariate brain imaging data with many features, enabling robust factor extraction for predictive modeling. Similarly, PLS-SVD facilitates efficient analysis of large-scale datasets, such as genomic or spectroscopic arrays, where iterative methods would be prohibitive due to computational demands.

History and Development

Origins and Key Contributors

Partial least squares (PLS) regression originated in the mid-1960s as a response to estimation challenges in econometric path models and principal component estimation. Herman O. A. Wold, an econometrician and statistician, introduced its key precursors through iterative procedures, initially termed nonlinear iterative least squares (NILES), which evolved into the nonlinear iterative partial least squares (NIPALS) algorithm. This foundational work appeared in Wold's 1966 chapter, where he described methods for estimating principal components and related models using iterative techniques, laying the groundwork for PLS in functional modeling contexts within econometrics.

By the mid-1970s, Wold had refined these ideas for more complex systems involving latent variables, transitioning toward what would become known as "functional PLS." In his 1975 publication, he presented the NIPALS approach for path models with latent variables, emphasizing soft modeling techniques that prioritized predictive utility over strict distributional assumptions, particularly to address multicollinearity in regression-based path analyses. This work marked a pivotal shift, introducing concepts such as Mode A estimation for functional relations in subsequent elaborations, and was published in a volume on quantitative sociology. Herman Wold's contributions established PLS as a flexible tool for econometric applications, with the NIPALS precursor enabling iterative deflation of covariance structures.

The evolution to "inverse PLS" occurred in the late 1970s and early 1980s, adapting the method for predictive modeling in the natural sciences, particularly chemistry. Svante Wold, Herman's son and a chemist, played a crucial role in this transition, applying and modifying PLS for handling high-dimensional, collinear data such as that arising in multivariate calibration. In 1983, Svante Wold, along with Harald Martens and Herman Wold, demonstrated PLS's practical utility in a seminal paper on the multivariate calibration problem in chemistry, effectively bridging theoretical econometrics to spectroscopic applications by inverting the modeling focus from path structures to direct prediction of responses from predictors.

Milestones and Evolution

During the 1990s, the Nonlinear Iterative Partial Least Squares (NIPALS) algorithm for PLS regression achieved greater standardization within the chemometrics community, facilitating its routine application in multivariate calibration through integration into established commercial software platforms, which enhanced accessibility for spectroscopic and industrial modeling tasks. This period marked a shift toward computational refinement rather than theoretical innovation, solidifying PLS as a practical tool for handling collinear data in industrial settings.

The 2000s brought key theoretical advancements, including the introduction of Orthogonal PLS (OPLS) in 2002 by Trygg and Wold, which partitioned variance into predictive and orthogonal components to improve model interpretability in complex datasets. Concurrently, kernel PLS emerged around 2001, extending the method to nonlinear relationships by mapping data into higher-dimensional reproducing kernel Hilbert spaces, thus bridging PLS with kernel-based techniques.

In the 2010s, focus turned to handling high-dimensional and noisy data, with sparse variants like sparse PLS (SPLS) proposed in 2010 by Chun and Keleş, incorporating L1 penalization for simultaneous dimension reduction and variable selection in genomics and beyond. Robust PLS methods also gained traction, addressing outliers through modifications like partial robust M-regression, as detailed in works from the mid-2000s onward. A notable refinement was the three-pass regression filter (3PRF) in 2015 by Kelly and Pruitt, which generalized PLS for forecasting with many predictors while maintaining constrained least-squares efficiency.

Post-2020 developments have integrated PLS with modern computational paradigms, such as neural PLS hybrids exemplified by deep partial least squares models in 2023, combining latent variable extraction with deep learning for instrumental variable regression. Bayesian PLS approaches have also advanced, with Urbas et al. (2024) introducing a probabilistic framework that quantifies parameter uncertainty in predictions, particularly useful for spectral calibration. In 2025, further extensions include PLS models accommodating skew-normal and skew-t distributions for handling asymmetric data, and hybrid approaches combining PLS with machine learning methods for enhanced predictive performance in complex datasets. Overall, PLS has evolved from its econometric origins, through its adaptation in chemometrics by Svante Wold and others, to a versatile component of modern data analysis pipelines, evidenced by exponential citation growth post-2000 and adoption across bioinformatics, neuroimaging, and predictive modeling.

Applications

In Chemometrics and Spectroscopy

Partial least squares regression (PLSR) finds primary application in chemometrics and spectroscopy for quantitative predictions from spectral data, particularly in near-infrared (NIR) spectroscopy, where it models complex relationships between absorbance spectra and chemical properties. A classic example is the prediction of gasoline octane number, where PLSR calibrates NIR spectra (typically in the 660-1215 nm range) against reference octane values, achieving accurate forecasts despite overlapping spectral features from mixtures. This approach has been instrumental in rapid, non-destructive quality assessment in fuel analysis.

In multivariate calibration of multicomponent mixtures, PLSR excels by calibrating predictor matrices X (spectral intensities across wavelengths) to response matrices Y (component concentrations), effectively resolving overlapping signals in techniques like Fourier-transform infrared (FTIR) spectroscopy. The method's strength lies in handling severe collinearity among wavelengths, which is common in spectroscopic data due to correlated absorbances, by projecting the data into a lower-dimensional latent space that maximizes covariance between X and Y, outperforming univariate calibration or multiple linear regression in noisy, collinear environments. Validation routinely employs the root mean square error of prediction (RMSEP) from cross-validation or independent test sets to assess predictive reliability, ensuring model robustness against spectral variations like instrument drift or sample heterogeneity.

A prominent case study involves pharmaceutical tablet composition analysis, where PLSR integrated with NIR spectroscopy has been applied since the 1980s to quantify active pharmaceutical ingredients (APIs) and excipients non-destructively. Early implementations in the late 1980s and 1990s focused on content uniformity testing for tablets, using PLSR to correlate diffuse reflectance spectra with API content, achieving RMSEP below 2% for low-dose formulations. This evolved into process analytical technology (PAT) frameworks, where PLSR models are combined with design of experiments (DOE) such as factorial or response surface designs to optimize calibration sets, enhancing model transferability across manufacturing batches and reducing validation time. For example, in continuous tablet manufacturing, DOE-guided PLSR controls pressing pressure via real-time feedback, maintaining API uniformity with prediction errors under 1.5%.

In Bioinformatics and Social Sciences

In bioinformatics, partial least squares (PLS) regression has been widely applied to high-dimensional gene expression data for disease classification tasks. A seminal approach uses PLS to reduce the dimensionality of thousands of gene expression features while predicting tumor types, such as distinguishing between cancer subtypes or classes, by extracting latent variables that maximize covariance between predictors and response outcomes. This method outperforms traditional classifiers in scenarios with multicollinear genes and small sample sizes, and has been applied to benchmark datasets like the Golub leukemia data. For instance, PLS variants have successfully classified multi-class tumors from colon and breast cancer expression profiles, highlighting its utility in identifying biologically relevant patterns for diagnosis.

In the social sciences, PLS regression, particularly through partial least squares structural equation modeling (PLS-SEM), serves as a robust tool for the estimation and analysis of survey data with latent structures. Unlike covariance-based SEM, PLS-SEM estimates composite models by iteratively minimizing residuals in the structural relationships, making it suitable for exploratory modeling of complex behavioral constructs like attitudes or satisfaction from questionnaire responses. It has been employed to model latent variables in psychometric scales, such as validating multi-item surveys for psychological traits, where it handles non-normal data and smaller samples effectively compared to maximum likelihood methods. This approach equates to factor-score regression in traditional psychometric analysis, enabling the assessment of measurement invariance and related model properties in applied behavioral research.

An illustrative application in finance involves using PLS for stock market forecasting from multivariate economic indicators. By projecting numerous economic indicators onto latent components correlated with future returns, PLS has demonstrated superior out-of-sample predictability over univariate models. This supervised dimension reduction captures dependencies among financial variables, aiding return forecasting.

Emerging uses of PLS in neuroimaging, such as functional magnetic resonance imaging (fMRI), leverage its ability to perform supervised dimension reduction on spatiotemporal brain data. PLS identifies task-related patterns by maximizing covariance between voxel activations and behavioral outcomes, enabling discrimination between healthy and diseased states like Alzheimer's, with reported cross-validated sensitivities up to 84.6%. Recent extensions, including oriented PLS, enhance this by partitioning variance into task-relevant components, outperforming principal component analysis (PCA) in supervised settings due to its focus on prediction rather than variance alone. These advances, building on early applications in event-related potentials, support PLS as a key method for uncovering brain-behavior relationships in high-dimensional studies, with ongoing developments as of 2023 integrating PLS with machine learning methods for improved accuracy in clinical and research applications.

Comparisons

To Principal Component Regression

Principal component regression (PCR) applies principal component analysis (PCA) solely to the predictor matrix X to derive orthogonal components that capture the maximum variance in X, after which the response Y is regressed onto these components (scores) using ordinary least squares, effectively ignoring Y during the decomposition process. In contrast, partial least squares (PLS) regression constructs latent components by simultaneously considering both X and Y, weighting the directions in X by their relevance to Y through maximization of the covariance between the component scores of X and Y. This key distinction arises because PCR prioritizes directions of highest variance within X, potentially including noise irrelevant to Y, whereas PLS emphasizes components that explain variation in Y, making it particularly advantageous for predictive modeling when the response drives the relevant structure in the predictors.

In scenarios involving multicollinearity among predictors, PLS typically matches or exceeds PCR in prediction accuracy, as demonstrated in simulation studies where PLS achieves low mean squared error (MSE) using fewer components than PCR. For instance, on datasets with high collinearity, such as near-infrared spectra, PLS often achieves comparable or better prediction performance with fewer components than PCR.

The regression coefficients highlight this integration of Y: in PCR, the coefficient vector is given by B = V (V^T X^T X V)^{-1} V^T X^T Y, where V comprises the loadings from PCA on X alone, reflecting only X's structure. By comparison, PLS coefficients incorporate Y-integrated weights W, derived to maximize the X-Y covariance, resulting in B = W (P^T W)^{-1} Q^T, where P and Q are the X- and Y-loadings, so that the components are attuned to Y.
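The practical effect of the two decompositions can be compared on synthetic collinear data with scikit-learn, building PCR as a PCA-plus-OLS pipeline; the data, seed, and outcome are illustrative and depend on how the response relates to the dominant variance directions:

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(8)
latent = rng.normal(size=(100, 3))
X = latent @ rng.normal(size=(3, 40)) + 0.05 * rng.normal(size=(100, 40))  # collinear predictors
y = latent[:, 0] + 0.1 * rng.normal(size=100)            # response driven by one latent direction
for a in (1, 2, 3):
    pcr = make_pipeline(PCA(n_components=a), LinearRegression())
    pls = PLSRegression(n_components=a)
    pcr_mse = -cross_val_score(pcr, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    pls_mse = -cross_val_score(pls, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    print(f"{a} components: PCR CV-MSE = {pcr_mse:.4f}, PLS CV-MSE = {pls_mse:.4f}")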

To Ridge Regression and Lasso

Ridge regression addresses multicollinearity in ordinary least squares (OLS) by incorporating an L2 penalty term, which shrinks the coefficients toward zero without setting any to exactly zero, thereby stabilizing estimates but retaining all predictors without explicit dimension reduction. In contrast, lasso regression employs an L1 penalty to induce sparsity by driving some coefficients to exactly zero, enabling automatic variable selection; however, it can be unstable in the presence of highly correlated predictors, often selecting only one from a group of correlated variables. Partial least squares (PLS) regression differs fundamentally by constructing latent variables that maximize covariance between predictors and the response, integrating dimension reduction with predictive modeling to handle both multicollinearity and high-dimensional settings (high p/n ratios), whereas ridge regression relies solely on penalization without reducing the number of variables. Unlike the lasso, PLS preserves information from correlated predictors by incorporating response-guided components, avoiding the instability of sparse selection in such scenarios. Simulation studies have shown that PLS can yield competitive predictive performance in multicollinear settings, often requiring fewer effective parameters for a similar MSE. In chemometric applications with correlated spectroscopic data, PLS has shown competitive MSE performance compared to both ridge regression and the lasso.
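A rough side-by-side comparison on synthetic data with groups of highly correlated predictors can be set up as below; the penalty strengths are arbitrary and untuned, and the cross-validation results will vary with the data, so the sketch only illustrates the workflow rather than a general ranking:

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(9)
base = rng.normal(size=(120, 5))
X = np.repeat(base, 6, axis=1) + 0.05 * rng.normal(size=(120, 30))   # correlated predictor groups
y = base @ np.array([1.0, -1.0, 0.5, 0.0, 0.0]) + 0.2 * rng.normal(size=120)
models = {"ridge": Ridge(alpha=1.0),
          "lasso": Lasso(alpha=0.05, max_iter=10000),
          "pls":   PLSRegression(n_components=3)}
for name, model in models.items():
    mse = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    print(f"{name:5s} CV-MSE = {mse:.4f}")
nonzero = np.count_nonzero(Lasso(alpha=0.05, max_iter=10000).fit(X, y).coef_)
print("lasso nonzero coefficients:", nonzero, "of", X.shape[1])      # lasso keeps few per group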

Implementations

Software Packages

Partial least squares (PLS) regression is implemented in various open-source and commercial software packages, enabling its application across statistical computing environments. In the R programming language, the pls package provides core functionality for PLS regression, utilizing the NIPALS algorithm as the default method while also supporting alternatives like kernel PLS and SIMPLS, with specific handling for univariate (PLS1) and multivariate (PLS2) response variables. Complementing this, the mixOmics package extends PLS to multi-block data integration and sparse variants, such as sparse PLS (sPLS) for variable selection via lasso penalization and multi-block PLS for analyzing multiple datasets measured on the same samples.

In Python, the scikit-learn library offers the PLSRegression class within its cross_decomposition module, implementing PLS regression with support for single or multiple targets (PLS1/PLS2), alongside the related PLSSVD transformer, which performs a singular value decomposition of the cross-covariance matrix X^T Y. For more specialized workflows, the pypls package implements PLS-DA and OPLS-DA, tailored for high-dimensional data in fields like metabolomics and mass spectrometry analysis.

Commercial tools include SIMCA from Sartorius (formerly Umetrics), which provides comprehensive PLS and orthogonal PLS (OPLS) modeling for multivariate data analysis, including tools for model building, validation, and prediction in process industries. Similarly, Unscrambler by CAMO Software focuses on spectroscopy applications, supporting PLS regression alongside PCA for calibration and predictive modeling in near-infrared and other spectral data contexts. For MATLAB, the Statistics and Machine Learning Toolbox includes the plsregress function for PLS regression, computing predictor and response loadings with options for cross-validation and handling multicollinear data. A prominent third-party extension is PLS_Toolbox by Eigenvector Research, offering an extensive suite of chemometric tools including advanced PLS variants (e.g., OPLS, SIMPLS) and integrations for multivariate analysis within MATLAB; as of November 2025, it supports MATLAB versions up to R2024b but is incompatible with R2025a due to Java deprecation.
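As a quick illustration of the scikit-learn interface, the sketch below fits a two-component PLS2 model on synthetic data and obtains predictions and latent scores for new samples:

import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(10)
X_train = rng.normal(size=(100, 15))
Y_train = X_train[:, :2] @ rng.normal(size=(2, 2)) + 0.1 * rng.normal(size=(100, 2))
X_new = rng.normal(size=(5, 15))
pls = PLSRegression(n_components=2)        # scale=True by default (autoscaling)
pls.fit(X_train, Y_train)                  # two response columns, i.e. PLS2
Y_pred = pls.predict(X_new)                # predicted responses for new samples
T_new = pls.transform(X_new)               # X-scores (latent variables) for new samples
print(Y_pred.shape, T_new.shape)           # (5, 2) and (5, 2)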

Practical Usage Guidelines

In partial least squares (PLS) regression, preprocessing the data is a foundational step to ensure model reliability and interpretability. Both the predictor matrix \mathbf{X} and response matrix \mathbf{Y} should be centered by subtracting their column means and scaled by dividing by their standard deviations, transforming them into z-scores; this equalizes the influence of variables with different units or scales, preventing dominance by high-variance predictors and aiding in the identification of relevant latent structures. Outliers must be addressed early, as they can distort component loadings and inflate prediction errors; techniques such as robust scalers (e.g., using medians and interquartile ranges) or diagnostic plots like Hotelling's T^2 and distance-to-model plots are recommended to detect and either remove or downweight them before fitting the model.

Selecting the optimal number of latent components A requires careful validation to balance explanatory power and generalizability. Cross-validation, particularly leave-one-out or k-fold variants, is the standard method: fit models with increasing A and choose the value that minimizes the predicted residual error sum of squares (PRESS), which quantifies prediction error on held-out data; this guards against overfitting, where excessive components capture noise rather than signal. As a rule, stop adding components once PRESS begins to increase or when the reduction in error plateaus, ensuring the model retains parsimony without sacrificing predictive accuracy.

Model validation emphasizes metrics that assess both fit and predictive utility. The cross-validated coefficient of determination, Q^2, measures out-of-sample performance, with values exceeding 0.05 indicating at least minimal predictive relevance, though Q^2 > 0.5 is desirable for robust models; a negative Q^2 signals poor generalization and warrants model revision. Variable importance in projection (VIP) scores provide insight into predictor relevance, calculated as a weighted sum of squared loadings across components; variables with VIP > 1 are deemed influential and retained, while those below 0.8 can often be excluded to simplify the model without substantial loss in performance.

Some practical tips enhance PLS application efficiency. For univariate responses (a single column in \mathbf{Y}), begin with PLS1, which optimizes a single set of weights for the response and often yields simpler, more stable models than PLS2 for multivariate \mathbf{Y}; PLS2 is preferable only when responses are highly correlated. For improved interpretability, especially in noisy datasets, orthogonal PLS (OPLS) variants separate predictive and orthogonal variation, clarifying which predictors directly relate to \mathbf{Y} versus unrelated noise. Recent guidelines from 2023 emphasize integrating PLS with deep learning for nonlinear extensions, such as deep PLS layers that learn hierarchical latent representations while preserving interpretability, particularly useful in high-dimensional settings like instrumental variable regression.
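VIP scores are not built into scikit-learn, but they can be computed from a fitted model with a small helper implementing the usual formula; the sketch below is illustrative (synthetic data, hand-rolled helper) rather than a library routine:

import numpy as np
from sklearn.cross_decomposition import PLSRegression

def vip_scores(pls):
    # VIP_j = sqrt( p * sum_a ssy_a (w_ja / ||w_a||)^2 / sum_a ssy_a )
    T, W, Q = pls.x_scores_, pls.x_weights_, pls.y_loadings_
    p = W.shape[0]
    ssy = np.sum(T ** 2, axis=0) * np.sum(Q ** 2, axis=0)    # Y-variance explained per component
    w_sq = (W / np.linalg.norm(W, axis=0)) ** 2              # squared, normalized weights
    return np.sqrt(p * (w_sq @ ssy) / ssy.sum())

rng = np.random.default_rng(11)
X = rng.normal(size=(80, 10))
y = 2 * X[:, 0] - X[:, 1] + 0.2 * rng.normal(size=80)
pls = PLSRegression(n_components=2).fit(X, y)
print("variables with VIP > 1:", np.where(vip_scores(pls) > 1)[0])   # the usual retention rule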
