
Random subspace method

The random subspace method, also known as attribute bagging or feature bagging, is an ensemble learning technique in machine learning that improves the accuracy and generalization of classifiers by training multiple base models on randomly selected subsets of the feature space, rather than on subsets of the training data itself. Introduced by Tin Kam Ho in 1998, it was originally proposed for constructing decision forests, where individual decision trees are built in pseudorandomly chosen feature subspaces to exploit high-dimensional data while mitigating overfitting. In the method, a subspace is formed by projecting the original feature space onto a randomly selected subset of features, typically around half the total dimensionality, after which a base classifier, such as a decision tree, is trained on this reduced representation. The predictions from these diverse classifiers are then aggregated, often through majority voting for classification tasks or averaging of posterior probabilities, to produce the final output.

This approach preserves near-perfect accuracy on the training set while enhancing generalization on unseen data, particularly in scenarios with many irrelevant or noisy features, as the randomness in feature selection promotes diversity among the ensemble members. Experimental evaluations on benchmark datasets, such as those from the Statlog repository, demonstrated that random subspace ensembles outperform single decision trees and compete favorably with other resampling methods like bagging and boosting, especially on high-dimensional problems like DNA sequence classification with 180 features. The random subspace method has been extended beyond decision trees to linear classifiers and other models, showing robustness in improving weak learners through feature randomization.

It forms a foundational component in more advanced ensembles, such as random forests, where feature subspace sampling is combined with bootstrap aggregation to further decorrelate trees. Applications span classification in face recognition and bioinformatics, forecasting in high-dimensional economics, and even fault detection systems, with ongoing research adapting it for modern challenges like sparse data and scalable computation.

Background

Definition

The random subspace method is an ensemble learning technique that constructs a classifier by combining multiple base learners, typically decision trees, where each base learner is trained on a randomly selected subset of the input features, referred to as a subspace, while using the full training set for every base learner. This approach aims to improve generalization by leveraging the diversity introduced through feature randomization. Also known as attribute bagging or feature bagging, the method emphasizes randomization in feature selection to generate diverse base classifiers, contrasting with techniques that rely on data sampling for diversity. The core idea is that by projecting the data into lower-dimensional random subspaces, the base learners capture different aspects of the feature space, enhancing overall ensemble robustness without reducing the training data size. For instance, in a dataset with d features, each base learner operates on a random subset of size k < d, such as selecting half the features in high-dimensional settings like d = 180. Feature subset selection is typically performed via uniform random sampling without replacement, as outlined in the following brief pseudocode:
function select_subspace(d, k):
    features = [1, 2, ..., d]
    shuffle features randomly
    return features[1:k]  // Select first k after shuffle (equivalent to without replacement)
Each subspace thus amounts to an axis-aligned random projection of the original feature space onto k of its coordinates; different base learners generally receive different subsets, which is the source of diversity in the ensemble.
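A minimal runnable equivalent of this pseudocode, sketched in Python with NumPy; the function name and the 180-feature example simply mirror the text and are illustrative rather than a reference implementation:

import numpy as np

def select_subspace(d, k, rng=None):
    # Sample k of d feature indices uniformly at random, without replacement.
    rng = np.random.default_rng() if rng is None else rng
    return rng.choice(d, size=k, replace=False)

# Example: draw three 90-dimensional subspaces from a 180-feature space.
rng = np.random.default_rng(0)
subspaces = [select_subspace(180, 90, rng) for _ in range(3)]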

History

The random subspace method was introduced by Tin Kam Ho in 1998 through his seminal paper "The Random Subspace Method for Constructing Decision Forests," published in IEEE Transactions on Pattern Analysis and Machine Intelligence. This work proposed training multiple decision trees on randomly selected subsets of features to address limitations in traditional decision tree methods, particularly in high-dimensional spaces where exhaustive searches for optimal splits become computationally infeasible. Ho's initial motivation stemmed from challenges in pattern recognition tasks involving high-dimensional data, such as optical character recognition and satellite image classification, where the method demonstrated improved accuracy over single decision trees by reducing overfitting and enhancing generalization. Early applications extended to computer vision domains, including face recognition systems that leveraged random subspaces to handle variability in facial features and lighting conditions, with notable implementations appearing in research by the early 2000s.

Subsequent developments in the 2000s integrated the random subspace method into broader ensemble frameworks, most prominently by Leo Breiman in his 2001 formulation of random forests, which combined feature subsampling with bootstrap aggregation to further diversify classifiers. Follow-up works around this period provided theoretical bounds on the method's performance, analyzing variance reduction and error rates in ensemble settings.

By the 2010s, the method gained practical adoption through open-source libraries, with random feature subsets supported in scikit-learn's BaggingClassifier since version 0.15 (released in 2014) for scalable machine learning workflows. Additionally, MATLAB's Statistics and Machine Learning Toolbox incorporated random subspace ensembles through its ensemble-fitting functions (fitensemble and, later, fitcensemble), facilitating their use in statistical and engineering applications.

Methodology

Core algorithm

The random subspace method constructs an ensemble of base learners by training each on a randomly selected subset of features from the full feature set, while using the entire training dataset for each learner. The core procedure begins by specifying the number of base learners T, the total number of features d, and the subset size k (where typically k < d). For each base learner t = 1 to T, a subset of k features is chosen uniformly at random without replacement from the d available features. A base classifier, such as a decision tree, is then trained on the complete training dataset but restricted to the selected k features. Finally, predictions from all T base learners are aggregated: for classification tasks, majority voting is used to determine the final class label; for regression, the predictions are averaged to obtain the output value. The following pseudocode outlines the full process for a classification task:
Algorithm Random Subspace Method (RSM)

Input: Training dataset D with n samples and d features, number of base learners T, subset size k, base learner type (e.g., decision tree)
Output: Ensemble classifier f

1. Initialize ensemble f = empty
2. For t = 1 to T:
   a. Randomly select a subset S_t of k features from the d features (without replacement)
   b. Train base learner h_t on D using only features in S_t
   c. Add h_t to f
3. For a new instance x:
   a. For each h_t in f, compute prediction p_t = h_t(x) using only features in S_t
   b. Compute final prediction: argmax_c (sum over t of I(p_t = c)) where I is the indicator function (majority vote)
Return f
This procedure ensures diversity among base learners by varying the feature subspaces, while maintaining access to all samples for robust training. In the aggregation step for classification, ties in the majority vote (e.g., when an even number of learners split evenly across classes) can be resolved by random selection among tied classes or by assigning priority based on class indexing, though the original formulation does not specify a default rule. Out-of-bag (OOB) estimates, commonly used in bagging-based ensembles for internal validation, are not inherent to pure RSM since every sample is used in every base learner; however, OOB-like validation can be approximated by combining RSM with bootstrap sampling or using cross-validation on the full ensemble. The choice of base learner in RSM is flexible but typically involves weak or moderately strong classifiers to promote ensemble strength through diversity; decision trees are the canonical example as proposed originally, though extensions include decision stumps, k-nearest neighbors, or linear models. The computational complexity of RSM depends on the base learner; for decision trees, training each tree on n samples and k features requires O(n k \log n) time, leading to an overall training complexity of O(T n k \log n), with prediction time O(T k) per instance assuming balanced trees.
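The procedure above translates directly into code. The following is a compact, runnable sketch in Python that assumes scikit-learn's DecisionTreeClassifier as the base learner; the class name RandomSubspaceClassifier, the defaults (T = 50, k = d/2), and the tie-breaking rule (lowest class index wins) are illustrative choices rather than part of the original formulation:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

class RandomSubspaceClassifier:
    def __init__(self, T=50, k=None, random_state=0):
        self.T, self.k, self.rng = T, k, np.random.default_rng(random_state)

    def fit(self, X, y):
        d = X.shape[1]
        k = self.k or max(1, d // 2)                        # default: roughly half the features
        self.classes_ = np.unique(y)
        self.subspaces_, self.learners_ = [], []
        for _ in range(self.T):
            S = self.rng.choice(d, size=k, replace=False)   # random feature subset S_t
            h = DecisionTreeClassifier().fit(X[:, S], y)    # full sample, restricted features
            self.subspaces_.append(S)
            self.learners_.append(h)
        return self

    def predict(self, X):
        # Majority vote over the T base learners; argmax breaks ties at the lowest class index.
        votes = np.stack([h.predict(X[:, S])
                          for h, S in zip(self.learners_, self.subspaces_)])   # shape (T, n)
        counts = np.stack([(votes == c).sum(axis=0) for c in self.classes_])   # (n_classes, n)
        return self.classes_[counts.argmax(axis=0)]

Fitting this class on a training split and scoring predict on a held-out split reproduces the basic workflow described above; passing k explicitly overrides the half-the-features default.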

Parameter choices

The number of base learners T in the random subspace method, often implemented with decision trees, is a key hyperparameter that influences the ensemble's stability and generalization. Typical values range from 10 to 500, with higher numbers generally improving performance by reducing variance, though diminishing returns occur beyond approximately 128 trees in many settings. Selection of T is commonly performed using cross-validation to balance bias and variance, ensuring the ensemble achieves low error without excessive computational cost.

The subspace dimension k, representing the number of features randomly selected for each base learner, is another critical parameter, particularly in high-dimensional datasets where d \gg n (many more features than samples). Common choices include k = \sqrt{d} for moderate dimensions or k = d/2 to retain sufficient information while promoting diversity; in Ho's original experiments, using roughly half of the features worked well for high-dimensional data, maintaining discriminative power without overfitting. These values are tuned via validation to optimize accuracy, as smaller k enhances diversity but may increase bias.

Feature selection within each subspace is typically uniform random sampling without replacement to ensure non-redundant subsets and avoid correlated base learners, which could degrade ensemble performance. Alternatives include weighted strategies, where features are sampled proportional to their relevance (e.g., based on information gain or statistical tests), as explored in extensions of the method; uniform sampling suffices for most cases, but weighted approaches can improve efficiency in very high dimensions by prioritizing informative features. Sampling without replacement is preferred to prevent repeated feature usage in a single subspace, fostering greater independence among learners.

For base learners such as decision trees, hyperparameters like maximum depth are adjusted to promote diversity; full-depth (unpruned) trees are standard in the original method to capture complex patterns, but limiting depth (e.g., to 5-10 levels) can prevent overfitting and enhance ensemble robustness in noisy data. Other tree-specific parameters, like minimum split size, are set conservatively to maintain variability across subspaces.

Validation of parameter choices can use out-of-bag (OOB) error estimation when random subspaces are combined with bootstrap sampling of instances, in which each sample is evaluated only by the learners that did not see it during training, providing a generalization estimate without a separate holdout set. For pure feature bagging, where every sample is used by every learner, cross-validation remains the standard approach to fine-tune T and k.
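In practice, T and k can be tuned jointly by cross-validation. The sketch below uses scikit-learn's BaggingClassifier configured as a pure random-subspace ensemble (all samples kept, random feature subsets drawn without replacement), which is the documented way to obtain random subspaces in that library; the dataset and the candidate grids for n_estimators (T) and max_features (k) are illustrative assumptions:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
d = X.shape[1]

rsm = BaggingClassifier(
    DecisionTreeClassifier(),
    bootstrap=False,           # keep every sample for every learner (pure RSM)
    max_samples=1.0,
    bootstrap_features=False,  # sample features without replacement
    random_state=0,
)
grid = GridSearchCV(
    rsm,
    param_grid={
        "n_estimators": [25, 50, 100],          # candidate values of T
        "max_features": [int(d**0.5), d // 2],  # candidate values of k: sqrt(d) or d/2
    },
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)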

Theoretical foundations

Motivation

The random subspace method was developed to tackle key challenges in machine learning, particularly in high-dimensional settings where the curse of dimensionality prevails. In such spaces, the exponential growth in volume relative to the number of data points results in sparse distributions, making it difficult for classifiers to capture meaningful patterns without overfitting to noise or irrelevant features. This often leads to poor generalization performance, as models trained on the full feature set struggle to distinguish signal from sparsity-induced artifacts.

A core motivation lies in the need for diversity within ensembles to boost predictive accuracy. Traditional ensembles trained on the complete feature space tend to produce highly correlated base learners, limiting their collective strength. By randomly selecting feature subspaces for each base classifier, the method decorrelates these learners, fostering independence that amplifies the ensemble's robustness and reduces variance, which is especially beneficial in high dimensions where full-space training exacerbates correlation.

The approach offers distinct advantages over single classifiers by enabling improved generalization through the aggregation of decisions from multiple low-dimensional projections, which collectively average out idiosyncrasies of individual subspaces and enhance resilience to high-dimensional noise. Originally intended to construct "decision forests" for reliable pattern recognition, it circumvents the computational burden of exhaustive feature subset searches while preserving near-perfect training accuracy and scaling effectively with dimensionality; indeed, unlike many methods hampered by the curse, it thrives in high dimensions by exploiting the abundance of available subspaces. Empirical studies in the 1990s further underscored the viability of random projections, demonstrating their ability to approximately preserve pairwise distances and geometric structures in high-dimensional data, which intuitively supports the subspace sampling strategy's effectiveness without requiring dimensionality reduction preprocessing. This observation aligns with the theoretical underpinnings of the Johnson–Lindenstrauss lemma, providing a high-level justification for why random subspaces maintain discriminative power akin to the original space.

Performance analysis

The random subspace method achieves performance improvements primarily through a bias-variance decomposition, where it significantly reduces variance by training base classifiers on weakly correlated random subspaces of the feature space, while introducing only a minimal increase in bias compared to a single full-dimensional classifier. This variance reduction arises because subspaces selected from high-dimensional data tend to capture diverse aspects of the feature distribution, leading to classifiers with low pairwise correlation. Theoretical analysis by Ho demonstrates that the method's error rates can converge toward the Bayes error rate under conditions of sufficient dimensionality and appropriate subspace sampling, as the ensemble aggregates classifiers that collectively approximate the optimal decision boundary in high-dimensional settings. Subsequent work has extended this with margin-based generalization bounds, showing that larger margins in the ensemble decision boundary improve generalization, with error rates upper-bounded by terms involving the margin distribution and subspace diversity.

Empirical studies on UCI repository datasets validate these theoretical insights, particularly in high-dimensional cases; for instance, on the ionosphere dataset (34 features, 351 samples), the method yields 5-10% accuracy gains over single decision trees, achieving test error rates around 6-8% versus 12-15% for individual trees. Similar improvements are observed across other datasets like sonar and satellite image classification, where ensemble sizes of 20-50 subspaces consistently outperform baselines by reducing overfitting in scenarios where dimensionality exceeds sample size (d \gg n).

Key factors influencing performance include the high-dimensional regime d \gg n, where subspace randomization effectively mitigates the curse of dimensionality. Heuristically, the expected ensemble error can be approximated as E[\text{err}] \approx \rho \cdot \text{err}_{\text{single}} + (1 - \rho) \cdot \text{err}_{\text{indep}}, with \rho denoting the average pairwise correlation between base classifiers, \text{err}_{\text{single}} the error rate of an individual classifier, and \text{err}_{\text{indep}} the majority-vote error that fully independent classifiers of the same accuracy would achieve. Lower \rho (achieved via diverse subspaces) shifts weight toward the independent-committee term and drives the ensemble error below that of a single classifier, whereas under perfect correlation the majority vote collapses to the behavior of a single classifier. However, these analyses assume approximate feature independence, and performance degrades with highly correlated inputs, as subspaces may fail to produce sufficiently diverse classifiers, leading to higher \rho and diminished variance reduction. In such cases, preprocessing for decorrelation or hybrid methods may be necessary to restore effectiveness.
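The role of the correlation \rho can be illustrated with a small simulation. The following sketch is not drawn from Ho's analysis; the construction (each voter copies a shared outcome with probability \sqrt{\rho}), the single-classifier accuracy p = 0.7, and the ensemble size T = 51 are assumptions chosen purely for illustration of how correlation erodes the benefit of majority voting:

import numpy as np

def majority_vote_error(p=0.7, rho=0.0, T=51, trials=20000, seed=0):
    # Each of T voters is individually correct with probability p; pairwise correlation
    # rho is induced by letting each voter copy a shared draw with probability sqrt(rho).
    rng = np.random.default_rng(seed)
    c = np.sqrt(rho)
    shared = rng.random(trials) < p                     # shared correct/incorrect outcome
    copy = rng.random((trials, T)) < c                  # which voters copy the shared outcome
    own = rng.random((trials, T)) < p                   # independent outcomes
    correct = np.where(copy, shared[:, None], own)      # each voter's correctness per trial
    return 1.0 - (correct.sum(axis=1) > T / 2).mean()   # probability the majority is wrong

for rho in (0.0, 0.2, 0.5, 1.0):
    print(f"rho = {rho:.1f}: ensemble error ~ {majority_vote_error(rho=rho):.3f}")

As \rho grows, the estimated ensemble error climbs from near zero toward the single-classifier error of 1 - p = 0.3, mirroring the qualitative claim above that diverse (low-\rho) subspaces are what make the aggregated vote effective.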

Applications

Classification tasks

The random subspace method is primarily employed for binary and multi-class classification tasks, where decision trees serve as the base learners to construct ensembles that enhance generalization performance in high-dimensional spaces. By training each tree on a randomly selected subset of features, the method reduces overfitting and improves accuracy through majority voting over the predictions of the ensemble members.

In early applications to face recognition, the random subspace method stabilizes subspace-based classifiers like Fisherface by generating multiple low-dimensional projections from random feature subsets, leading to more robust ensemble decisions under varying lighting and pose conditions. Similarly, in bioinformatics, it addresses gene expression datasets with thousands of features by selecting informative subspaces, enabling effective classification of cancer subtypes while mitigating the curse of dimensionality. A notable example from the method's foundational work involves the ionosphere dataset, comprising 351 radar return samples with 34 features for binary classification of ionospheric structure presence; here, ensembles of 50 decision trees achieved approximately 95% accuracy, surpassing single-tree performance of about 91%.

Extensions of the random subspace method include hybrids with probabilistic sampling to handle imbalanced classes, where subsampling is combined with diverse ensemble techniques to prioritize minority samples and improve minority-class detection without excessive majority-class dilution. Another variant integrates random subspaces with support vector machines (SVMs), training multiple SVMs on diverse feature subsets to boost classification accuracy in power quality event detection and other high-dimensional problems.

Software implementations facilitate practical use; for instance, scikit-learn's ExtraTreesClassifier incorporates random subspace selection via the max_features parameter alongside randomized splits for efficient ensemble classification. In MATLAB, the fitcensemble function supports random subspace ensembles through the 'Subspace' method option, allowing customizable feature subset sizes for discriminant analysis or k-nearest neighbor learners.
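In scikit-learn, a random-subspace classifier can also be assembled generically from BaggingClassifier by disabling instance bootstrapping and restricting max_features. The comparison below against a single tree on synthetic data with many irrelevant features is an illustrative sketch; the dataset sizes and parameter values are assumptions, not benchmark results:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic high-dimensional data: 100 features, only 10 of them informative.
X, y = make_classification(n_samples=400, n_features=100, n_informative=10,
                           n_redundant=10, random_state=0)

single_tree = DecisionTreeClassifier(random_state=0)
rsm = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                        bootstrap=False, max_samples=1.0, max_features=0.5,
                        random_state=0)

print("single tree :", cross_val_score(single_tree, X, y, cv=5).mean())
print("RSM ensemble:", cross_val_score(rsm, X, y, cv=5).mean())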

Regression and forecasting

The random subspace method adapts naturally to regression tasks by replacing majority voting with averaging of predictions from base regressors, such as regression trees or linear models, to produce a final continuous output. This aggregation reduces variance in high-dimensional settings where individual models may overfit, leading to more stable estimates. For instance, the method has been implemented in packages like regRSM for high-dimensional linear regression, where subspaces are randomly selected to compute variable importance and predictions via weighted averaging.

In forecasting applications, particularly for time-series data, random subspace methods excel in high-dimensional environments by generating diverse submodels and averaging their predictions to mitigate overfitting and improve accuracy. A 2019 study in the Journal of Econometrics by Boot and Nibbering applied these methods to U.S. macroeconomic time-series prediction, demonstrating improved mean squared error compared to benchmarks like principal component regression across quarterly and monthly datasets. The approach involves constructing forecasts from numerous submodels, enhancing reliability for economic indicators with many predictors. A specific variant incorporates Gaussian weighting of predictors within subspaces, where weights are drawn from a Gaussian distribution to softly emphasize or de-emphasize features rather than select them outright, which is particularly effective for macroeconomic forecasting by balancing bias and variance. This technique allows for smoother integration of the noisy or correlated variables common in macroeconomic data.

Practical examples include stock price prediction, where random subspace classifiers have been used to forecast returns by preprocessing high-dimensional predictor sets and averaging submodel outputs for directional accuracy. Similarly, in air quality forecasting with high-dimensional meteorological data, such as pollutant concentration and weather inputs, the method supports predictions through deep random subspace learning frameworks that handle spatial-temporal features.

A key challenge in applying random subspace methods to forecasting lies in managing temporal dependencies, often addressed by employing rolling windows over the full sample to simulate out-of-sample predictions while preserving sequential structure. This ensures submodels capture evolving patterns without lookahead bias, though it increases computational demands for large time series.
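A minimal sketch of the regression analogue, assuming scikit-learn's BaggingRegressor as the aggregator (it averages the base regressors' predictions) and TimeSeriesSplit as a stand-in for the rolling-window evaluation described above; the synthetic data and parameter values are illustrative:

from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic high-dimensional regression problem: 80 predictors, 15 informative.
X, y = make_regression(n_samples=300, n_features=80, n_informative=15,
                       noise=5.0, random_state=0)

rsm_reg = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100,
                           bootstrap=False, max_samples=1.0, max_features=0.5,
                           random_state=0)

# Expanding-window style evaluation that respects temporal ordering.
scores = cross_val_score(rsm_reg, X, y, cv=TimeSeriesSplit(n_splits=5),
                         scoring="neg_mean_squared_error")
print("mean MSE:", -scores.mean())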

Comparison to bagging

The random subspace method (RSM) and bagging (bootstrap aggregating) are both ensemble techniques that promote diversity among base learners to improve predictive performance, but they differ fundamentally in their sampling strategies. Bagging, introduced by Breiman in 1996, generates multiple predictors by drawing bootstrap samples, that is, random subsets of training instances drawn with replacement, while retaining all available features for each base model, thereby creating diversity through variations in the data to reduce variance. In contrast, RSM, proposed by Ho in 1998, employs the full training set for every base learner but randomly selects subsets of features (columns), focusing on feature randomization to decorrelate models and address challenges such as overfitting in feature-rich environments.

These differences lead to distinct performance profiles across data regimes. RSM particularly excels in high-dimensional settings with low sample sizes (high d, low n), such as text classification tasks with numerous features (e.g., bag-of-words representations), where bagging's instance sampling can yield highly correlated predictors due to limited variability. For example, Breiman's 2001 analysis of datasets with over 100 features demonstrated that feature subsampling often outperforms bagging by better exploiting feature subspaces to capture relevant patterns without redundancy. Bagging, however, performs more robustly in low-dimensional datasets with noisy observations, where its instance resampling effectively averages out instability in the base learners. Hybrids that integrate both instance and feature sampling (sometimes called "random patches") balance data and feature perturbations and often yield superior generalization in mixed-dimensionality problems.
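The contrast maps directly onto the sampling arguments of scikit-learn's BaggingClassifier; the three configurations below sketch the regimes discussed above, with the subset fractions chosen purely for illustration:

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier()

# Bagging: bootstrap the instances (rows), keep all features.
bagging = BaggingClassifier(tree, n_estimators=50, bootstrap=True,
                            max_samples=1.0, max_features=1.0, random_state=0)

# Random subspace method: keep all instances, sample the features (columns).
random_subspaces = BaggingClassifier(tree, n_estimators=50, bootstrap=False,
                                     max_samples=1.0, max_features=0.5, random_state=0)

# Hybrid ("random patches"): sample both instances and features.
random_patches = BaggingClassifier(tree, n_estimators=50, bootstrap=True,
                                   max_samples=0.8, max_features=0.5, random_state=0)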

Integration with random forests

The random subspace method (RSM) is integrated into random forests by combining it with bagging to enhance ensemble diversity and performance, particularly in high-dimensional settings. Introduced by Breiman in 2001, random forests construct an ensemble of decision trees where each tree is trained on a bootstrap sample of the data (bagging, or row sampling), and at each node of every tree a random subset of features is selected for split consideration (feature sampling in the spirit of RSM). This dual randomization, over instances and features, addresses limitations of standalone bagging by reducing correlation among trees while preserving individual tree strength.

The RSM's key contribution to random forests lies in its per-node feature subsampling, which mitigates overfitting and improves generalization in high-dimensional spaces. For classification tasks, Breiman recommended selecting m = \sqrt{d} features at each split, where d is the total number of features, though alternatives like m = \log_2(d) + 1 or fixed small values (e.g., 25 for very high d) can be tuned for better results. This approach draws directly from Ho's RSM, which demonstrated that building trees in random feature subspaces yields robust classifiers by avoiding reliance on any single feature set. In practice, this integration decorrelates trees more effectively than bagging alone, leading to lower variance and higher accuracy.

Extensions of this integration include extremely randomized trees (ExtraTrees), proposed by Geurts et al. in 2006, which further randomize split selection within the chosen feature subset. Unlike standard random forests, where the best split is optimized over the candidate features, ExtraTrees draw cut-points at random (and, in the extreme variant, pick the split attribute at random as well), then grow trees to full depth without pruning. This "extremely randomized" variant amplifies RSM's benefits, often resulting in faster training and comparable or superior performance on noisy or high-dimensional data, and it typically uses the full training sample rather than bootstrap replicates.

Random forests inherit RSM's robustness to high dimensionality, enabling effective handling of datasets where features outnumber samples, such as images or gene expression data. For instance, on the zipcode digit recognition dataset (7,291 training samples, 2,007 test samples, 256 features), random forests achieved a test error rate of 6.3% using random feature selection at each split, outperforming single trees (20.6% error) and demonstrating the value of subspace randomization in reducing ensemble correlation. This integration yields consistent accuracy gains over plain bagging, especially in such domains, by promoting tree independence.

Modern variants continue to leverage RSM within forest-like structures for specialized tasks. A 2024 extension adapts random subspace methods to local projections for estimating dynamic causal effects with high-dimensional controls, aggregating projections over random feature subsets to improve the reliability of impulse response estimates in econometric applications. This builds on RSM's foundational role in random forests, extending its utility to causal inference under dimensionality challenges.
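The per-node subspace parameter described above is exposed directly in common implementations. The sketch below compares scikit-learn's random forest and extra-trees estimators on a built-in digit dataset; the dataset choice and the number of trees are illustrative, while max_features controls the size of the random feature subset considered at each split:

from sklearn.datasets import load_digits
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)   # 8x8 digit images, 64 features

rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
et = ExtraTreesClassifier(n_estimators=200, max_features="sqrt", bootstrap=False,
                          random_state=0)   # random cut-points, full sample by default

print("random forest:", cross_val_score(rf, X, y, cv=5).mean())
print("extra trees  :", cross_val_score(et, X, y, cv=5).mean())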
