
Mean squared prediction error

The mean squared prediction error (MSPE) is a fundamental statistical metric used to evaluate the predictive accuracy of a model by quantifying the expected squared difference between actual outcomes and model predictions for new, unseen data. Formally, it is defined as \mathbb{E}[(Y - \hat{Y})^2], where Y represents the true value and \hat{Y} the predicted value, with the expectation taken over the joint distribution of inputs and outputs. This measure emphasizes out-of-sample performance, distinguishing it from in-sample error estimates that may overestimate accuracy due to overfitting. In practice, MSPE is estimated using a test dataset by averaging the squared residuals: \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, providing a direct assessment of how closely predictions align with reality on average. It serves as a cornerstone for model selection and validation in fields such as statistics, econometrics, and machine learning, where minimizing MSPE guides choices between simpler and more complex models. A related metric, the root mean squared prediction error (RMSPE), takes the square root of MSPE to express error in the original units of the target variable, aiding interpretability.

The MSPE decomposes into three components: squared bias (systematic prediction error), variance of the predictor (sensitivity to training data fluctuations), and irreducible noise, illustrating the inherent trade-off between underfitting and overfitting in predictive modeling. Under squared-error loss, the optimal predictor is the conditional expectation \mathbb{E}[Y \mid X], which minimizes the risk function R(g) = \mathbb{E}[(Y - g(X))^2]; ordinary least squares regression estimates this predictor within the class of linear functions. This decomposition underscores MSPE's role in balancing model flexibility with generalization, motivating techniques such as cross-validation for estimating it reliably.
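As a minimal numerical sketch of the estimator above (using made-up values rather than any real dataset), the sample MSPE and RMSPE can be computed directly from paired observations and predictions:
python
import numpy as np

# Hypothetical observed outcomes and model predictions for four new cases
y_true = np.array([3.0, 5.0, 7.5, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 10.0])

mspe = np.mean((y_true - y_pred) ** 2)  # average squared prediction error
rmspe = np.sqrt(mspe)                   # back in the original units of the target
print(f"MSPE:  {mspe:.4f}")   # 0.4375
print(f"RMSPE: {rmspe:.4f}")  # about 0.66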

Basic Concepts

Definition

The mean squared prediction error (MSPE) serves as a key measure of predictive accuracy in statistical modeling, representing the expected value of the squared difference between a model's predicted value and the actual outcome for a new or unseen observation. This metric quantifies how well a model generalizes beyond the data used to train it, capturing both bias and variance in predictions. In contrast to the mean squared error (MSE) of an estimator, which computes the expected squared deviation of an estimate from the true underlying parameter, MSPE specifically emphasizes the quality of forecasts in a prediction setting, where the focus is on future responses rather than fitted values from the training data. This distinction highlights MSPE's role in assessing out-of-sample performance, making it particularly valuable for model selection and validation in regression and machine learning tasks. For instance, consider a model predicting house prices based on variables such as square footage and neighborhood characteristics; here, MSPE measures the squared deviation between the model's price forecasts and actual prices for new listings, offering a direct gauge of the forecasts' reliability on a squared scale.
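The distinction between estimator MSE and MSPE can be made concrete with a small simulation; the sketch below assumes a simple linear data-generating process (all names and values are illustrative) and compares the squared error in recovering the slope parameter with the squared error in predicting new responses:
python
import numpy as np

rng = np.random.default_rng(0)
beta_true, sigma = 2.0, 1.0
n_train, n_new, n_sims = 50, 1000, 2000

beta_sq_err, pred_sq_err = [], []
for _ in range(n_sims):
    # Training sample from the assumed model y = beta * x + noise
    x = rng.normal(size=n_train)
    y = beta_true * x + rng.normal(scale=sigma, size=n_train)
    beta_hat = np.sum(x * y) / np.sum(x ** 2)        # least-squares slope (no intercept)
    beta_sq_err.append((beta_hat - beta_true) ** 2)  # error in estimating the parameter

    # Fresh, unseen data for measuring prediction error
    x_new = rng.normal(size=n_new)
    y_new = beta_true * x_new + rng.normal(scale=sigma, size=n_new)
    pred_sq_err.append(np.mean((y_new - beta_hat * x_new) ** 2))

print("MSE of the slope estimator:", round(np.mean(beta_sq_err), 4))  # shrinks as n_train grows
print("MSPE for new observations: ", round(np.mean(pred_sq_err), 4))  # stays near sigma^2 = 1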

Interpretation

The mean squared prediction error (MSPE) serves as a key metric for evaluating the inaccuracy in a model's predictions, representing the expected value of the squared differences between actual and predicted outcomes. In practical terms, it captures how closely a model's forecasts align with observed data on average, with smaller MSPE values signaling superior predictive accuracy and reliability for future observations. Because MSPE involves squaring the errors, it is reported in units that are the square of the target variable's units, rendering it inherently scale-dependent and challenging to interpret directly in the context of the original data scale. For instance, if the target is measured in dollars, MSPE would be in dollars squared, which may obscure intuitive understanding without additional normalization. Assessing whether an MSPE value is "good" remains highly context-dependent, varying by field, data scale, and baseline expectations; there is no universal threshold, but in domains like financial forecasting, an MSPE substantially below the unconditional variance of the target variable is often deemed acceptable, with relative reductions of 10-20% compared to simple benchmarks (such as random walks) highlighting meaningful improvements. A notable limitation of MSPE is its heightened sensitivity to outliers, as the squaring process disproportionately penalizes large errors relative to smaller ones, potentially skewing assessments in noisy datasets. Furthermore, its squared-unit nature limits direct interpretability, prompting frequent use of the root mean squared prediction error (RMSPE), the square root of MSPE, as a variant that restores the original scale for more accessible analysis.
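In practice, one way to contextualize a scale-dependent MSPE is to report it relative to a naive benchmark; the sketch below (with a simulated series and an assumed drift-aware forecaster, both purely illustrative) compares a model's MSPE to that of a random-walk forecast:
python
import numpy as np

rng = np.random.default_rng(1)
# Simulated target series: a random walk with drift
y = np.cumsum(0.3 + rng.normal(size=200))

actual = y[1:]
naive_pred = y[:-1]            # random-walk benchmark: predict the previous value
model_pred = y[:-1] + 0.3      # hypothetical model that also captures the drift

mspe_naive = np.mean((actual - naive_pred) ** 2)
mspe_model = np.mean((actual - model_pred) ** 2)
reduction = 1 - mspe_model / mspe_naive
print(f"Benchmark MSPE:              {mspe_naive:.3f}")
print(f"Model MSPE:                  {mspe_model:.3f}")
print(f"Relative reduction vs naive: {reduction:.1%}")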

Mathematical Formulation

Population MSPE

The population mean squared prediction error (MSPE) represents the theoretical average squared deviation between true outcomes and predictions across the entire population, assuming access to unlimited data from the underlying distribution. This metric serves as an ideal benchmark for model performance, capturing the minimal achievable error under perfect conditions. It is particularly useful for understanding the fundamental limits of prediction in statistical models. Formally, the population MSPE is given by \text{MSPE} = E[(Y - \hat{Y})^2], where Y denotes the true outcome variable, \hat{Y} is the predicted value (typically \hat{Y} = \hat{f}(X) for a predictor function \hat{f} and covariates X), and the expectation is over the population joint distribution of (Y, X). This formulation assumes a fixed true underlying model, where predictions are deterministic functions of the covariates, and the population distribution remains invariant over time or draws. These assumptions ensure that the MSPE reflects intrinsic model limitations rather than sampling variability.

The MSPE can be intuitively decomposed as \text{MSPE} = \text{Var}(Y \mid X) + [\text{Bias}(\hat{f}(X))]^2 + \text{Var}(\hat{f}(X)), where \text{Var}(Y \mid X) is the irreducible error arising from stochastic noise in Y conditional on X, \text{Bias}(\hat{f}(X)) = E[\hat{f}(X)] - E[Y \mid X] quantifies the average deviation of the predictor from the true conditional expectation, and \text{Var}(\hat{f}(X)) measures the predictor's variability across possible training realizations (which diminishes to zero in the infinite-data population limit for consistent estimators). This breakdown highlights how prediction error stems from inherent data noise, systematic model mismatch, and predictor instability, providing an intuitive entry point to error sources without exhaustive analysis.

In the context of linear regression over a population, suppose the true model is Y = \beta_0 + \boldsymbol{\beta}^T X + \epsilon, with E[\epsilon \mid X] = 0 and \text{Var}(\epsilon \mid X) = \sigma^2. Using the population least-squares coefficients (attainable with infinite data), the predictor \hat{f}(X) = \beta_0 + \boldsymbol{\beta}^T X incurs no bias or variance, yielding \text{MSPE} = \sigma^2, the irreducible error. If the linear form is misspecified relative to the true E[Y \mid X], the MSPE includes an additional bias term, manifesting as approximation error from the model's inability to capture nonlinearity, though variance remains negligible in this idealized setting.
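Because the population MSPE is an expectation over the full joint distribution, it can be approximated numerically by Monte Carlo simulation when the data-generating process is known. The sketch below (with an assumed nonlinear truth and a deliberately misspecified linear predictor, all illustrative) shows how the approximated MSPE exceeds the irreducible error \sigma^2 by a squared-bias term:
python
import numpy as np

rng = np.random.default_rng(2)
sigma = 0.5
N = 500_000  # very large sample standing in for the "population"

# Population draws from a nonlinear truth: E[Y | X] = sin(2*pi*X)
x = rng.uniform(size=N)
y = np.sin(2 * np.pi * x) + rng.normal(scale=sigma, size=N)

# Approximate population least-squares linear predictor (fit on near-infinite data)
X = np.column_stack([np.ones(N), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Estimate the population MSPE on independent fresh draws
x_new = rng.uniform(size=N)
y_new = np.sin(2 * np.pi * x_new) + rng.normal(scale=sigma, size=N)
y_hat = coef[0] + coef[1] * x_new
mspe = np.mean((y_new - y_hat) ** 2)

print(f"Irreducible error sigma^2:   {sigma ** 2:.3f}")
print(f"Approximate population MSPE: {mspe:.3f}  (excess is squared bias from misspecification)")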

Sample MSPE

The sample mean squared prediction error (MSPE) adapts the theoretical population MSPE to finite datasets, providing an empirical estimate of prediction accuracy based on observed data. Unlike the population version, which represents an expectation over infinitely many draws, the sample MSPE is calculated directly from a limited number of observations, making it susceptible to sampling variability and bias in small datasets. This empirical quantity is what practitioners actually compute in practice, with the population MSPE serving as its theoretical target.

The standard formula for sample MSPE is \text{MSPE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, where y_i denotes the observed values in the sample, \hat{y}_i the corresponding predictions from the model, and n the number of observations used in the evaluation. Here, the \hat{y}_i may be generated on the same data used for model fitting (in-sample) or on a separate held-out portion (out-of-sample test set). In sample contexts, a critical distinction arises between training and test sets: predictions on the training set often yield optimistically low MSPE values due to overfitting, whereas test set evaluations better reflect generalization to unseen data.

In small samples, the unadjusted sample MSPE can underestimate the true prediction error by failing to account for model complexity. To mitigate this, adjustments incorporating degrees of freedom are applied, such as dividing the sum of squared errors by n - p (where p is the number of estimated parameters) to obtain an unbiased estimate of the error variance, akin to the standard mean squared error in regression analysis. This correction helps prevent downward bias, particularly when n is close to p.

For illustration, consider computing sample MSPE in a forecasting model applied to a time series of 100 observations, such as data on economic indicators. The model might be fitted to the first 80 observations (training set) to generate predictions, with the remaining 20 held out as the test set. The sample MSPE is then the average of the squared differences between the 20 test observations and their one-step-ahead forecasts, yielding a scalar value that quantifies the model's predictive fidelity on this finite holdout sample.
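A sketch of this holdout calculation, using a simulated autoregressive series in place of real economic data (the AR(1) forecaster and all values are illustrative), might look as follows:
python
import numpy as np

rng = np.random.default_rng(3)
# Simulated "economic indicator": an AR(1) series of 100 observations
n = 100
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + rng.normal()

train, test = y[:80], y[80:]  # first 80 for fitting, last 20 held out

# Estimate the AR(1) coefficient on the training set by least squares
phi = np.sum(train[1:] * train[:-1]) / np.sum(train[:-1] ** 2)

# One-step-ahead forecasts of the 20 test observations from the observed previous values
forecasts = phi * y[79:99]
sample_mspe = np.mean((test - forecasts) ** 2)
print(f"Estimated AR(1) coefficient: {phi:.2f}")
print(f"Holdout sample MSPE (20 test points): {sample_mspe:.3f}")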

Computation Methods

Out-of-Sample Computation

Out-of-sample mean squared prediction error (MSPE) is computed by first partitioning the dataset into a training set, used for model fitting, and a separate test set reserved for evaluation. The model is trained solely on the training data to generate parameter estimates, after which predictions are produced for each observation in the test set based on its features. The MSPE is then obtained by averaging the squared residuals between the observed test values and these predictions, providing a direct measure of predictive accuracy on unseen data. This method delivers an unbiased assessment of the model's ability to generalize to new data, distinct from training performance, and is essential for identifying overfitting, where a model performs well on familiar data but poorly on novel instances.

For time-series data, out-of-sample computation requires careful handling to maintain temporal dependencies and prevent the use of future information in training. A common strategy involves a hold-out period, where the final portion of the series (e.g., the last 20% of observations) serves as the test set, with the model fitted to all preceding data. Alternatively, rolling windows are employed, in which the training window slides forward: for each step, the model is refitted on a contiguous block of past observations to forecast the next one or more periods, and prediction errors are aggregated across these steps to compute the overall MSPE. This respects the chronological order and simulates real-world forecasting scenarios.

The following Python code illustrates a basic implementation using scikit-learn for out-of-sample MSPE in a non-time-series context; for time-series, the split would use sequential indexing instead of random partitioning:
python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Assume X (features) and y (target) are defined
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
mspe = mean_squared_error(y_test, y_pred)
print(f"Out-of-sample MSPE: {mspe}")
This computes the sample MSPE as referenced earlier.
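For the rolling-window time-series procedure described above, a minimal sketch (assuming a simple AR(1) forecaster re-estimated at every step; the simulated series and window length are illustrative) could be:
python
import numpy as np

rng = np.random.default_rng(4)
n, window = 200, 100
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.6 * y[t - 1] + rng.normal()

sq_errors = []
for t in range(window, n):
    hist = y[t - window:t]                                        # rolling training window
    phi = np.sum(hist[1:] * hist[:-1]) / np.sum(hist[:-1] ** 2)   # refit AR(1) slope
    forecast = phi * y[t - 1]                                     # one-step-ahead forecast
    sq_errors.append((y[t] - forecast) ** 2)

print(f"Rolling-window out-of-sample MSPE: {np.mean(sq_errors):.3f}")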

In-Sample Computation

The in-sample mean squared prediction error (MSPE) is obtained by fitting a predictive model to the training dataset, generating predictions for those same training observations, and then averaging the squared residuals between the actual and predicted values. This procedure yields the training error, formally expressed as \text{Err}_{\text{tr}} = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{f}(x_i))^2, where n denotes the number of training samples, y_i the observed responses, and \hat{f}(x_i) the model's predictions on the training inputs x_i. In the context of ordinary least squares (OLS) regression, this in-sample MSPE simplifies to the residual sum of squares (RSS) divided by the sample size n, providing a direct measure of the model's fit to the data used for estimation.

Despite its simplicity, the in-sample MSPE systematically underestimates the true expected prediction error on unseen data due to overfitting, where the model captures noise in the training set rather than underlying patterns, leading to overly optimistic performance assessments. This downward bias, termed optimism, arises from the positive covariance between the fitted values and the observed responses and increases with model complexity, such as the number of parameters. To quantify and correct for this optimism, Mallows' C_p statistic offers a practical adjustment, estimating a scaled measure of prediction error as C_p = \frac{\text{RSS}_p}{\hat{\sigma}^2} - (n - 2p), where \text{RSS}_p is the RSS for a model with p parameters and \hat{\sigma}^2 is an unbiased estimate of the irreducible error variance, typically derived from the full model's residuals. Under a correctly specified model, E[C_p] \approx p, so candidate models with C_p far above p are flagged as carrying excess prediction error. In-sample MSPE computation serves as a convenient initial diagnostic for evaluating model adequacy on the training data prior to more reliable out-of-sample evaluation.
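The following sketch illustrates the C_p adjustment for a candidate submodel; the simulated design, the chosen subset, and the helper function are all illustrative assumptions rather than a fixed recipe:
python
import numpy as np

rng = np.random.default_rng(5)
n, sigma = 100, 1.0
X_full = rng.normal(size=(n, 5))
beta = np.array([2.0, -1.0, 0.0, 0.0, 0.0])  # only the first two predictors matter
y = X_full @ beta + rng.normal(scale=sigma, size=n)

def rss(X, y):
    """Residual sum of squares for an OLS fit with an intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return np.sum((y - Xd @ coef) ** 2)

# sigma^2 estimated from the full model (intercept + 5 slopes = 6 parameters)
p_full = X_full.shape[1] + 1
sigma2_hat = rss(X_full, y) / (n - p_full)

# Mallows' C_p for the submodel using only the first two predictors (p = 3 parameters)
p_sub = 3
cp = rss(X_full[:, :2], y) / sigma2_hat - (n - 2 * p_sub)
print(f"C_p for the 2-predictor submodel: {cp:.2f}  (compare with p = {p_sub})")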

Estimation Techniques

Population Estimation

Estimating the population mean squared prediction error (MSPE), defined as the expected squared difference between actual and predicted values over the entire population, requires methods that approximate this quantity from finite sample data while accounting for model complexity and sampling variability. These estimators typically rely on parametric assumptions about the underlying model to derive unbiased or asymptotically consistent approximations of the true MSPE. Common approaches include analytical formulas for specific model classes and resampling techniques that simulate population behavior.

In linear regression models, analytical estimators such as the adjusted R-squared provide a direct way to approximate the population MSPE. The adjusted R-squared, given by \bar{R}^2 = 1 - (1 - R^2) \frac{n-1}{n-p-1}, where R^2 is the coefficient of determination, n is the sample size, and p is the number of predictors, estimates the expected out-of-sample R^2, which relates to the MSPE via \text{MSPE} \approx \widehat{\mathrm{Var}}(Y)(1 - \bar{R}^2), with \widehat{\mathrm{Var}}(Y) the sample variance of Y (e.g., TSS / (n-1)) and the error variance estimated by the residual mean square. This adjustment penalizes for additional predictors, offering an unbiased estimate under the assumption of a correctly specified linear model. Similarly, the predicted residual error sum of squares (PRESS) statistic serves as another analytical estimator, computed as \text{PRESS} = \sum_{i=1}^n (y_i - \hat{y}_{(i)i})^2, where \hat{y}_{(i)i} is the predicted value for observation i using the model fitted without that observation. Dividing PRESS by n yields an estimate of the population MSPE, particularly useful for model comparison in linear settings. These methods assume the model is correctly specified, with errors that are independent and identically distributed (i.i.d.) with mean zero and constant variance, ensuring the estimators' consistency as sample size increases.

For more general or complex models where analytical forms are unavailable, asymptotic approximations via bootstrap methods simulate the population variability to estimate the MSPE. The parametric bootstrap, for instance, involves fitting the model to the sample, generating bootstrap replicates from the fitted distribution (e.g., assuming i.i.d. errors), refitting the model on each replicate, and computing the average squared prediction error across these simulations. This approach yields an estimate of the population MSPE by mimicking the sampling process, with theoretical guarantees under i.i.d. error assumptions and correct model specification. The method replaces cumbersome analytical derivations with computational resampling, providing reliable approximations even for moderate sample sizes.

An illustrative example is the estimation of MSPE in Gaussian process (GP) models, where analytical variance formulas directly quantify prediction uncertainty. In GP regression, the predictive distribution at a new point x^* is Gaussian with mean \bar{f}(x^*) = \mathbf{k}_*^T (\mathbf{K} + \sigma_n^2 \mathbf{I})^{-1} \mathbf{y} and variance V(Y^*) = k(x^*, x^*) - \mathbf{k}_*^T (\mathbf{K} + \sigma_n^2 \mathbf{I})^{-1} \mathbf{k}_* + \sigma_n^2, where \mathbf{K} is the kernel matrix on training inputs, \mathbf{k}_* is the kernel vector between training and test points, and \sigma_n^2 is the noise variance. Under the GP assumptions of i.i.d. noise and a correctly specified kernel, the population MSPE at x^* equals this predictive variance for the noisy observation Y^*, as the mean is the minimum-variance unbiased predictor. This closed-form expression allows precise estimation without resampling, highlighting the model's probabilistic nature.
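As a concrete sketch of the PRESS-based estimator for a simple linear model (simulated data; the closed-form leave-one-out identity e_i / (1 - h_{ii}) holds for OLS), the population MSPE can be approximated as PRESS / n:
python
import numpy as np

rng = np.random.default_rng(6)
n = 60
x = rng.uniform(0, 10, size=n)
y = 1.5 + 0.8 * x + rng.normal(scale=2.0, size=n)  # true noise variance = 4.0

X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat (projection) matrix
resid = y - H @ y                      # ordinary in-sample residuals
h = np.diag(H)                         # leverages h_ii

# PRESS via the closed-form leave-one-out residuals for OLS
press = np.sum((resid / (1 - h)) ** 2)
print(f"PRESS:                     {press:.2f}")
print(f"PRESS / n (MSPE estimate): {press / n:.3f}")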

Cross-Validation Approaches

Cross-validation approaches provide a robust framework for estimating the mean squared prediction error (MSPE) in scenarios with limited data, by systematically partitioning the dataset and reusing it to simulate multiple out-of-sample evaluations. These methods build on the principle of hold-out validation but enhance stability by averaging predictions across repeated splits, yielding a more reliable estimate of predictive performance without requiring additional data.

In k-fold cross-validation, the dataset is randomly divided into k equally sized subsets, or folds. For each iteration from 1 to k, the model is trained on the union of k-1 folds and tested on the remaining held-out fold to compute the mean squared error (MSE) for that fold. The overall k-fold CV estimate of MSPE, denoted \text{CV}_{(k)}, is then the average of these k fold-specific MSEs: \text{CV}_{(k)} = \frac{1}{k} \sum_{m=1}^{k} \frac{1}{n/k} \sum_{i \in C_m} (y_i - \hat{y}_{-m}(x_i))^2, where C_m is the m-th fold, n is the total number of observations, and \hat{y}_{-m}(x_i) is the prediction for observation i from the model trained excluding fold m. Common choices for k include 5 or 10, balancing computational cost and estimate precision.

Leave-one-out cross-validation (LOOCV) represents a special case of k-fold CV where k = n, the sample size, such that each fold consists of a single observation. The model is refitted n times, each excluding one observation, and the MSPE estimate is the average of the n squared prediction errors on the left-out points: \text{LOOCV} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_{(i)}(x_i))^2, with \hat{y}_{(i)}(x_i) denoting the prediction for the i-th observation from the model trained on the other n-1 observations. While LOOCV provides an approximately unbiased estimate of MSPE, particularly for linear models under ordinary least squares fitting, it is computationally intensive, often requiring on the order of n times more effort than a single full fit.

Variants of k-fold CV address specific data characteristics to improve MSPE estimation. Stratified k-fold CV ensures that each fold maintains the same proportion of class labels or response distributions as the full dataset, which is essential for imbalanced data to prevent skewed validation errors and to more accurately reflect overall predictive performance. For sequential or time-dependent data, time-series cross-validation modifies the folding to respect temporal order, using expanding or rolling windows where training sets grow chronologically and validation sets consist of subsequent observations, thus avoiding lookahead bias in MSPE calculations.

Cross-validation methods offer several advantages for MSPE estimation, including reduced variance in the error estimate compared to a single train-test split, as the averaging over multiple folds provides a more stable approximation of out-of-sample performance. However, they introduce higher computational demands, especially as k increases, and may incur slight bias toward overly optimistic estimates if folds are not sufficiently independent. In practice, k-fold CV with moderate k (e.g., 5–10) strikes a favorable balance between bias and variance for most applications.
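A short k-fold sketch using scikit-learn is shown below (synthetic data; scikit-learn's scoring convention returns negated MSE, so the sign is flipped to recover the MSPE estimate):
python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# 5-fold cross-validation estimate of the MSPE for a linear model
cv = KFold(n_splits=5, shuffle=True, random_state=0)
neg_mse = cross_val_score(LinearRegression(), X, y,
                          cv=cv, scoring="neg_mean_squared_error")
print("Per-fold MSE estimates:", np.round(-neg_mse, 1))
print(f"5-fold CV estimate of MSPE: {-neg_mse.mean():.1f}")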

Properties and Applications

Bias-Variance Decomposition

The mean squared prediction error (MSPE) can be decomposed into three additive components: the squared bias, the variance, and the irreducible error. This decomposition provides insight into the sources of prediction error in statistical models. Formally, for a predictor \hat{f}(x) estimating the true function f(x), where Y = f(x) + \epsilon and \epsilon is noise with E(\epsilon) = 0 and \text{Var}(\epsilon) = \sigma^2, the MSPE at a point x is given by \text{MSPE}(x) = E[(Y - \hat{f}(x))^2] = [\text{Bias}(\hat{f}(x))]^2 + \text{Var}(\hat{f}(x)) + \sigma^2, where the expectation is taken over the joint distribution of the training data and the new observation Y.

The bias term, \text{Bias}(\hat{f}(x)) = E[\hat{f}(x)] - f(x), measures the systematic deviation between the average prediction of the model and the true value, arising from assumptions in the model that fail to capture the underlying relationship. High bias typically occurs in overly simplistic models that underfit the data, leading to consistent under- or over-prediction across different training sets. The variance term, \text{Var}(\hat{f}(x)) = E[(\hat{f}(x) - E[\hat{f}(x)])^2], quantifies the sensitivity of the predictor to fluctuations in the training data. It reflects how much the model's predictions vary when trained on different samples from the same distribution, often increasing with model complexity as the estimator becomes more attuned to noise in specific datasets. The irreducible error \sigma^2 represents the inherent randomness in the data that no model can eliminate.

The bias-variance tradeoff highlights a fundamental tension: more complex models, such as those with higher-dimensional parameters, tend to reduce bias by better approximating f(x) but increase variance due to greater sensitivity to training data variations; conversely, simpler models exhibit lower variance but higher bias. Techniques like regularization are employed to balance these components, minimizing overall MSPE by penalizing excessive complexity. A classic illustration of this decomposition involves fitting models of increasing degree to simulated data where the true function is f(x) = \sin(12\pi x) with additive noise. As the degree increases from 1 to 9, the squared bias decreases sharply initially but stabilizes, while the variance rises monotonically, leading to an optimal degree (around 8 or 9) that minimizes MSPE before variance dominates. This behavior is depicted in plots showing the three components as functions of model degree, demonstrating how the total error first declines and then rises.
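The decomposition can be checked empirically by refitting a model on many simulated training sets and measuring the centering and spread of its predictions at a fixed point. The sketch below uses a simpler illustrative setup than the example above (a sin(2*pi*x) truth and polynomial fits of a few degrees); it is an assumption-laden demonstration rather than a reproduction of the cited simulation:
python
import numpy as np

rng = np.random.default_rng(7)
sigma, n_train, n_sims = 0.3, 40, 500
x0 = 0.25                                   # fixed evaluation point
f = lambda x: np.sin(2 * np.pi * x)         # assumed true function

for degree in (1, 3, 7):
    preds = np.empty(n_sims)
    for s in range(n_sims):
        x = rng.uniform(size=n_train)
        y = f(x) + rng.normal(scale=sigma, size=n_train)
        coefs = np.polyfit(x, y, degree)    # polynomial least-squares fit
        preds[s] = np.polyval(coefs, x0)    # prediction at the fixed point x0
    bias2 = (preds.mean() - f(x0)) ** 2
    var = preds.var()
    print(f"degree {degree}: bias^2={bias2:.4f}  variance={var:.4f}  "
          f"MSPE={bias2 + var + sigma ** 2:.4f}")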

Use in Regression and Machine Learning

In ordinary least squares (OLS) regression, the parameters are chosen to minimize the in-sample sum of squared errors, which under standard assumptions (e.g., linearity, no perfect multicollinearity, homoscedasticity) provides an unbiased estimate of the linear predictor that minimizes the population MSPE. This approach extends to regularized variants such as ridge regression, which adds an L2 penalty to the in-sample MSE minimization to reduce variance in high-dimensional settings, and lasso regression, which incorporates an L1 penalty to promote sparsity while still targeting MSPE reduction for improved predictive accuracy.

In machine learning, MSPE plays a central role in hyperparameter tuning, often evaluated through cross-validation techniques such as grid search, where candidate hyperparameter sets are selected based on the lowest average MSPE across validation folds to ensure robust generalization. For ensemble methods, random forests average predictions from multiple decision trees, yielding a lower overall MSPE than individual trees due to variance reduction, as the forest's generalization error bound depends on the strength of the individual trees and the correlation between them. In recent applications during the 2020s, MSPE, typically computed as the loss on held-out validation sets, guides model training and early stopping to prevent overfitting, with scalable approximations like mini-batch estimates enabling efficient computation on large datasets. These approximations maintain close fidelity to full-batch MSPE while reducing computational overhead in high-scale training regimes.

An illustrative comparison contrasts MSPE performance across domains: in stock return prediction, models like neural networks have achieved out-of-sample R² up to 0.172 (corresponding to a roughly 17% reduction in MSPE relative to historical averages) when incorporating multiple economic predictors, highlighting the metric's sensitivity to market volatility. In contrast, for predicting hospital patient outcomes such as length of stay, models yielded MSPE equivalents (via root mean squared error) of around 3-4 days, demonstrating MSPE's utility in clinical settings where absolute error scales with outcome variability but still supports actionable decisions.
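As a sketch of cross-validated hyperparameter tuning against MSPE (synthetic data and an arbitrary penalty grid, chosen only for illustration), ridge regression's L2 penalty can be selected by grid search over cross-validated squared error:
python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=150, n_features=30, n_informative=5,
                       noise=5.0, random_state=0)

# Choose the ridge penalty that minimizes the cross-validated MSPE
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}
search = GridSearchCV(Ridge(), param_grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)

print("Best alpha:", search.best_params_["alpha"])
print(f"Estimated MSPE at best alpha: {-search.best_score_:.2f}")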
