
Linear prediction

Linear prediction is a technique that estimates the current value of a discrete-time signal as a linear combination of its previous samples, minimizing the prediction error through least-squares optimization. This approach models signals, particularly random processes, using an autoregressive model in which the predicted sample is \hat{x}_n = \sum_{k=1}^p a_k x_{n-k}, with a_k as predictor coefficients and p as the model order, yielding a residual e_n = x_n - \hat{x}_n. Originating from early 20th-century work on time series analysis, such as Yule's 1927 application of autoregressive models to sunspot data and advancements by Wiener and Kolmogorov in the 1940s on the extrapolation of stationary processes, linear prediction has become foundational in fields like speech analysis and geophysical signal processing.

In speech processing, linear prediction underpins linear predictive coding (LPC), which represents the vocal tract as an all-pole filter driven by a periodic impulse train for voiced sounds or noise for unvoiced ones, enabling efficient estimation of formants, pitch, and spectral envelopes. The core principle involves solving the Yule-Walker equations, derived from the autocorrelation of the signal, to obtain the coefficients, often via efficient algorithms like the Levinson-Durbin recursion, which achieves a computational complexity of O(p^2). Two primary estimation methods dominate: the autocorrelation method, which assumes signal stationarity and applies windowing (e.g., Hamming) over the entire frame to ensure filter stability, and the covariance method, which handles non-stationarity by minimizing the error over finite intervals but risks instability without additional constraints. These techniques facilitate applications such as speech compression in codecs like CELP, synthesis for vocoders, speaker identification, and extensions like perceptual linear prediction for improved psychoacoustic modeling.

Beyond speech, linear prediction extends to diverse domains including electroencephalogram (EEG) analysis for brain signal modeling, seismic data processing in geophysics, and time series forecasting through autoregressive techniques. Its efficacy stems from the parsimonious all-pole model, which captures smooth spectral envelopes with fewer parameters than direct methods, though it assumes stationarity and may require pole-zero models for complex signals. Influential developments, such as Itakura and Saito's maximum likelihood formulation in 1968, have solidified its role in modern tools and standards.

Fundamentals

Definition and Basic Concept

Linear prediction is a fundamental technique in signal processing for estimating the value of a discrete-time signal at a given time instant based on a linear combination of its previous values. Formally, the predicted sample \hat{x}(n) is approximated as \hat{x}(n) = \sum_{k=1}^{p} a_k x(n-k), where p is the prediction order, a_k are the coefficients determined to minimize the prediction error, and x(n-k) denote the p past samples. The resulting prediction error is defined as e(n) = x(n) - \hat{x}(n), representing the difference between the actual and estimated values. This approach models the signal as the output of a linear filter driven by past observations, enabling efficient representation and analysis of correlated data.

The method operates under key assumptions that ensure its applicability and optimality. Primarily, it assumes the underlying signal process is wide-sense stationary, meaning its statistical properties, such as mean and autocorrelation, remain constant over time, which allows the use of time-invariant coefficients. Additionally, the predictor is strictly linear in the coefficients a_k, facilitating straightforward optimization without introducing nonlinear complexities. These assumptions underpin the technique's effectiveness in scenarios where signals exhibit temporal dependencies, such as in speech or geophysical data.

To minimize the mean squared error E[e^2(n)], the coefficients a_k are selected according to the orthogonality principle, which posits that the optimal prediction error e(n) must be uncorrelated with each of the past samples used in the prediction. Mathematically, this requires E[e(n) x(n-k)] = 0 for k = 1, 2, \dots, p, ensuring the error contains no component predictable from the observed window. This principle derives from the projection theorem in Hilbert space and guarantees the least-squares solution for linear estimators.

An illustrative example is one-step-ahead prediction (p=1) applied to a white noise signal, where consecutive samples are uncorrelated with zero mean and unit variance. Here, the optimal coefficient is a_1 = 0, yielding \hat{x}(n) = 0 and e(n) = x(n), so the prediction error variance equals the signal variance \sigma_x^2 = 1. Consequently, the prediction gain, defined as 10 \log_{10} (\sigma_x^2 / \sigma_e^2), is zero decibels, demonstrating that linear prediction offers no improvement for inherently unpredictable, uncorrelated processes.
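The contrast between correlated and uncorrelated signals can be checked numerically. The following minimal Python sketch (the function name and the AR(1) test signal are illustrative assumptions) estimates the optimal first-order coefficient a_1 = r(1)/r(0) from sample autocorrelations and reports the resulting prediction gain:

import numpy as np

rng = np.random.default_rng(0)
N = 100_000

def prediction_gain_p1(x):
    # first-order predictor: a1 = r(1)/r(0); gain = 10*log10(signal var / error var)
    x = x - x.mean()
    r0 = np.dot(x, x) / len(x)
    r1 = np.dot(x[1:], x[:-1]) / len(x)
    a1 = r1 / r0
    e = x[1:] - a1 * x[:-1]                  # prediction residual
    return 10 * np.log10(r0 / np.var(e))

white = rng.standard_normal(N)               # uncorrelated white noise
ar1 = np.zeros(N)                            # correlated AR(1) test signal
for i in range(1, N):
    ar1[i] = 0.9 * ar1[i - 1] + rng.standard_normal()

print(prediction_gain_p1(white))             # ~0 dB: no gain for white noise
print(prediction_gain_p1(ar1))               # ~7 dB: correlation enables prediction

As expected, the gain is essentially zero for white noise and substantial for the strongly correlated process.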

Historical Development

The roots of linear prediction trace back to 19th-century advancements in time series analysis, where statisticians such as Adolphe Quetelet, Francis Galton, and Karl Pearson developed foundational methods for modeling sequential data and correlation, laying the groundwork for predictive techniques in stochastic processes. A pivotal early contribution came in 1927, when George Udny Yule applied autoregressive models to analyze and predict sunspot data, introducing a method to investigate periodicities in disturbed series that marked the first explicit use of linear prediction in time series analysis. This work demonstrated how past values could linearly predict future ones, influencing subsequent statistical modeling of temporal dependencies. In 1938, Herman Wold advanced the theoretical underpinnings in his doctoral thesis, decomposing stationary processes into deterministic and purely nondeterministic components, which provided a rigorous link between stationary process theory and optimal linear prediction. Wold's decomposition emphasized the structure of stationary processes, establishing principles for representing random phenomena as sums of predictable and unpredictable parts.

Post-World War II, linear prediction gained prominence in speech processing, with significant developments at Bell Laboratories in the 1960s and 1970s. Bishnu S. Atal and colleagues explored linear predictive coding (LPC) for speech analysis and synthesis, building on earlier statistical approaches to model vocal tract resonances efficiently. Independently, Fumitada Itakura at NTT Laboratories in 1967 introduced maximum likelihood methods for LPC parameter estimation, adapting statistical techniques to spectral envelope estimation in speech signals. John Burg's 1967 presentation on maximum entropy spectral analysis further propelled the field by proposing an efficient method for estimating prediction error filters and reflection coefficients, which avoided assumptions about unobserved autocorrelation lags and improved resolution in short data records, influencing spectral estimation algorithms throughout the 1970s. A key milestone was John D. Markel's 1971 paper on LPC applications, which detailed formant trajectory estimation and synthesis techniques, enabling practical speech analysis-synthesis systems.

By the 1980s, linear prediction was integrated into telecommunication standards, notably the GSM full-rate speech codec adopted in 1990, which employed regular pulse excitation with long-term prediction (RPE-LTP) based on LPC principles to achieve efficient low-bitrate voice compression for mobile networks. This adoption marked linear prediction's transition from theoretical tool to widespread engineering application in digital communications.

Mathematical Formulation

Forward Linear Prediction Model

The forward linear prediction model estimates the current value of a discrete-time signal x(n) using a linear combination of its p previous samples, enabling causal prediction of future samples from past observations. This approach assumes the signal follows an autoregressive structure in which each sample is predictable from earlier ones, minimizing the mean-squared prediction error through optimal coefficient selection. The model is foundational in signal processing for applications requiring efficient representation of correlated data, such as speech and geophysical signals.

The predicted value \hat{x}(n) for a p-th order forward predictor is expressed as \hat{x}(n) = -\sum_{k=1}^{p} a_p(k) x(n-k), where a_p(k) denote the prediction coefficients that weight the past samples x(n-k). These coefficients are chosen to minimize the expected value of the squared prediction error, effectively capturing the signal's short-term correlations. The negative sign convention aligns with the autoregressive formulation, where the corresponding synthesis (inverse) filter is all-pole. The forward prediction error e_p(n) is the difference between the actual sample and its prediction: e_p(n) = x(n) - \hat{x}(n) = x(n) + \sum_{k=1}^{p} a_p(k) x(n-k). This error signal, often termed the residual or innovation, quantifies the unpredictable component of the signal and serves as input to synthesis filters in predictive coding schemes. For stationary signals, the error is uncorrelated with past samples when the coefficients are optimally estimated, ensuring the predictor extracts all linear dependencies.

Reflection coefficients, denoted k_i, arise in the recursive computation of predictor coefficients and represent the partial correlation between the current sample and the farthest past sample, conditioned on the intervening samples. In the forward model, they facilitate stability checking by requiring |k_i| < 1 for all i = 1, \dots, p, which guarantees that the associated all-pole synthesis filter has poles inside the unit circle, preventing instability in implementation. These coefficients are particularly useful in adaptive scenarios, as they allow incremental updates without recomputing the entire set of a_p(k). The lattice structure provides a modular block diagram realization of the forward predictor, consisting of a tapped delay line with multipliers by the reflection coefficients and adders for error computation. It recursively builds higher-order predictors from lower-order ones through stages where each level updates the forward error using the previous backward error, enabling efficient order recursion and numerical stability. This structure is depicted as a series of delay elements z^{-1}, reflection coefficient multipliers k_m, and summers, forming an all-zero (FIR) prediction-error filter equivalent to the direct-form realization.

For illustration, consider a second-order (p=2) forward predictor applied to a unit-amplitude sinusoidal signal x(n) = \cos(\omega_0 n), where \omega_0 is the normalized frequency. The optimal coefficients are a_2(1) = -2 \cos(\omega_0) and a_2(2) = 1, derived from the signal's autoregressive representation. Substituting these yields \hat{x}(n) = 2 \cos(\omega_0) x(n-1) - x(n-2), resulting in zero prediction error e_2(n) = 0 for the noiseless sinusoid, as the model perfectly reconstructs the periodic waveform. This demonstrates how increasing the predictor order reduces the error to negligible levels for signals with few dominant frequencies, highlighting the model's efficacy in capturing sinusoidal correlations.
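This sinusoidal case is easy to verify numerically. A minimal Python check (the frequency value and array names are illustrative assumptions), using the sign convention above:

import numpy as np

w0 = 0.3 * np.pi                      # assumed normalized frequency
n = np.arange(200)
x = np.cos(w0 * n)

a1, a2 = -2 * np.cos(w0), 1.0         # optimal second-order coefficients
# forward prediction: x_hat(n) = -(a1 * x(n-1) + a2 * x(n-2))
x_hat = -(a1 * x[1:-1] + a2 * x[:-2])
e = x[2:] - x_hat

print(np.max(np.abs(e)))              # on the order of 1e-15: the error vanishes for the noiseless sinusoid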

Backward Linear Prediction Model

The backward linear prediction model provides an anti-causal approach to estimating past samples of a signal from subsequent observations within a finite window, contrasting with the causal nature of forward prediction. For a signal x(n) and a prediction order m, the predicted value of the past sample is given by \hat{x}(n-m) = -\sum_{k=1}^{m} a_m(k) x(n-m+k), where a_m(k) are the prediction coefficients optimized to minimize the prediction error over the window. This formulation arises in the context of autoregressive modeling for stationary processes, where the backward predictor leverages later samples to reconstruct earlier ones, facilitating computations in recursive algorithms.

The backward prediction error, denoted \tilde{e}_m(n), quantifies the discrepancy between the actual and predicted past sample: \tilde{e}_m(n) = x(n-m) + \sum_{k=1}^{m} a_m(k) x(n-m+k). This error is minimized in a least-squares sense, yielding normal equations analogous to those in forward prediction for stationary signals. For such processes, the backward and forward models are related by the time-reversibility of the autocorrelation, with the backward coefficients being the reverse of the forward ones: the coefficient weighting the sample closest to the predicted one in backward prediction corresponds to the one weighting the farthest sample in forward prediction, i.e., a_m^b(k) = a_m^f(m + 1 - k).

In order-recursive algorithms like the Levinson-Durbin recursion, the backward prediction model plays a crucial role by enabling efficient step-up procedures from order m-1 to m. Specifically, the backward errors from lower-order predictors are used alongside forward errors to compute the next reflection coefficient, allowing the full set of coefficients to be updated with O(m) operations per step while exploiting the Toeplitz structure of the autocorrelation matrix. A key stability condition for the resulting inverse filter (the autoregressive model) requires the magnitude of each reflection coefficient to be less than 1, ensuring all poles lie inside the unit circle and preventing instability in the prediction process. Violation of this condition, such as |k_m| \geq 1, places poles on or outside the unit circle, so that a causal implementation of the synthesis filter diverges.
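The symmetry between the two error filters can be illustrated numerically. The sketch below (the AR(2) test signal and coefficient values are assumptions for illustration) forms the forward and backward residuals of a second-order predictor on a stationary signal and shows that their variances are essentially equal:

import numpy as np

rng = np.random.default_rng(1)
x = np.zeros(50_000)
for i in range(2, len(x)):                 # synthetic stationary AR(2) test signal
    x[i] = 1.2 * x[i - 1] - 0.5 * x[i - 2] + rng.standard_normal()

a = np.array([-1.2, 0.5])                  # second-order coefficients (error-filter convention)

# forward residual  e_2(n)  = x(n)   + a(1) x(n-1) + a(2) x(n-2)
e_f = x[2:] + a[0] * x[1:-1] + a[1] * x[:-2]
# backward residual          x(n-2) + a(1) x(n-1) + a(2) x(n)
e_b = x[:-2] + a[0] * x[1:-1] + a[1] * x[2:]

print(np.var(e_f), np.var(e_b))            # both close to the driving-noise variance of 1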

Parameter Estimation

Yule-Walker Equations

The Yule-Walker equations provide a set of linear normal equations for estimating the coefficients of a linear predictor by minimizing the mean squared prediction error under the orthogonality principle. For a forward linear prediction error e_p(n) = x(n) - \sum_{j=1}^p a_p(j) x(n-j), the principle requires that the error be orthogonal to the data used for prediction, yielding E[e_p(n) x(n-k)] = 0 for k = 1, 2, \dots, p. Substituting the error expression gives the system: r(k) - \sum_{j=1}^p a_p(j) r(k-j) = 0, \quad k = 1, 2, \dots, p, where r(k) = E[x(n) x(n-k)] is the autocorrelation function at lag k.

In matrix form, these equations are expressed as \mathbf{R} \mathbf{a}_p = \mathbf{r}_p, where \mathbf{a}_p = [a_p(1), a_p(2), \dots, a_p(p)]^T is the coefficient vector, \mathbf{r}_p = [r(1), r(2), \dots, r(p)]^T, and \mathbf{R} is the p \times p symmetric autocorrelation matrix with elements R_{i,j} = r(|i-j|). The Toeplitz structure arises from the stationarity assumption and facilitates specialized solution methods. The prediction error variance is then \sigma_p^2 = r(0) - \sum_{j=1}^p a_p(j) r(j).

For low prediction orders, explicit solutions highlight the method's simplicity. In first-order prediction (p=1), the equation simplifies to r(1) - a_1(1) r(0) = 0, so a_1(1) = r(1)/r(0). For second-order prediction (p=2), the system is: \begin{bmatrix} r(0) & r(1) \\ r(1) & r(0) \end{bmatrix} \begin{bmatrix} a_2(1) \\ a_2(2) \end{bmatrix} = \begin{bmatrix} r(1) \\ r(2) \end{bmatrix}, yielding a_2(1) = \frac{r(1)(r(0) - r(2))}{r(0)^2 - r(1)^2} and a_2(2) = \frac{r(0) r(2) - r(1)^2}{r(0)^2 - r(1)^2}. These ratios directly relate the coefficients to the autocorrelations, assuming the autocorrelation matrix is nonsingular (r(0)^2 > r(1)^2).

The derivation assumes the signal is wide-sense stationary, ensuring the autocorrelation r(k) exists and depends only on the lag k. For non-stationary signals, the equations can be applied approximately by windowing the data to create quasi-stationary frames, estimating r(k) within each window via time-averaged correlations. Direct solution of the matrix equation requires O(p^3) operations via matrix inversion or Gaussian elimination, which motivates efficient recursive algorithms like the Levinson-Durbin recursion for higher orders.
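These closed-form expressions can be cross-checked against a direct numerical solution of the normal equations. A short numpy sketch (the autocorrelation values are taken from the worked Levinson-Durbin example below and are otherwise arbitrary):

import numpy as np

r = np.array([1.0, 0.5, 0.2])            # example autocorrelations r(0), r(1), r(2)

R = np.array([[r[0], r[1]],              # 2x2 Toeplitz autocorrelation matrix
              [r[1], r[0]]])
rhs = np.array([r[1], r[2]])

a = np.linalg.solve(R, rhs)              # predictor coefficients a_2(1), a_2(2)
sigma2 = r[0] - a @ rhs                  # prediction error variance

print(a)       # [ 0.5333..., -0.0666...]
print(sigma2)  # 0.7466...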

Levinson-Durbin Recursion

The Levinson-Durbin recursion, also known as the Levinson algorithm, provides an efficient method for solving the Yule-Walker equations to obtain predictor coefficients from autocorrelation estimates, building on the work of Levinson for general Toeplitz systems and Durbin for autoregressive parameter estimation. It recursively computes the coefficients for increasing predictor orders up to p, leveraging the Toeplitz structure of the autocorrelation matrix to avoid direct matrix inversion.

The algorithm begins with initialization for orders 0 and 1. For order 0, the prediction error variance is E_0 = r(0), where r(k) denotes the autocorrelation at lag k. For order 1, the coefficient is a_1(1) = r(1)/r(0), and the error variance is E_1 = r(0) (1 - a_1(1)^2). For higher orders m = 2 to p, the recursion proceeds as follows: first, compute the reflection coefficient k_m = \frac{r(m) - \sum_{j=1}^{m-1} a_{m-1}(j) r(m-j)}{E_{m-1}}, where the sum runs over the order-(m-1) coefficients. Then, update the coefficients for j = 1 to m-1: a_m(j) = a_{m-1}(j) - k_m a_{m-1}(m-j), set a_m(m) = k_m, and update the error variance: E_m = E_{m-1} (1 - k_m^2). This process yields the p-th order coefficients a_p(j) upon completion.

The recursion offers several advantages, including a quadratic computational complexity of O(p^2) operations, which is significantly more efficient than the O(p^3) required for general matrix solution methods. Additionally, the condition |k_m| < 1 for all m serves as a stability check, ensuring the resulting all-pole filter is minimum-phase and the autocorrelation sequence is positive definite. The reflection coefficients k_m also correspond directly to the parameters of an equivalent lattice filter structure, facilitating implementations in adaptive filtering and speech processing. The following Python-style implementation outlines the procedure, taking autocorrelations r(0) through r(p) as input and returning the coefficients a(1) through a(p) together with the final error variance:
def levinson_durbin(r, p):
    # Input:  r[0..p] autocorrelation values; p desired predictor order
    # Output: list [a(1), ..., a(p)] and the final prediction error variance E
    a = [0.0] * (p + 1)            # a[j] holds a_m(j) for the current order m (a[0] unused)
    E = r[0]                       # order-0 prediction error variance
    if p >= 1:
        a[1] = r[1] / E
        E = E * (1 - a[1] ** 2)
    for m in range(2, p + 1):
        # reflection coefficient k_m from the order-(m-1) solution
        acc = sum(a[j] * r[m - j] for j in range(1, m))
        k = (r[m] - acc) / E
        # coefficient update: a_m(j) = a_{m-1}(j) - k_m * a_{m-1}(m-j)
        a_new = [a[j] - k * a[m - j] for j in range(1, m)]
        a[1:m] = a_new
        a[m] = k
        # error variance update: E_m = E_{m-1} (1 - k_m^2)
        E = E * (1 - k ** 2)
    return a[1:], E
(Note: In practice, arrays store coefficients up to the current order, with symmetric updates for efficiency.) As an illustrative example, consider computing 3rd-order coefficients for an autoregressive process with autocorrelations r(0) = 1, r(1) = 0.5, r(2) = 0.2, r(3) = 0.08.
  • Order 1: a_1(1) = 0.5, E_1 = 1 \times (1 - 0.25) = 0.75.
  • Order 2: k_2 = \frac{0.2 - 0.5 \times 0.5}{0.75} = \frac{0.2 - 0.25}{0.75} = -0.0667, then a_2(1) = 0.5 - (-0.0667) \times 0.5 \approx 0.5333, a_2(2) = -0.0667, E_2 = 0.75 \times (1 - 0.0044) \approx 0.7467.
  • Order 3: k_3 = \frac{0.08 - (0.5333 \times 0.2 + (-0.0667) \times 0.5)}{0.7467} \approx \frac{0.08 - 0.0733}{0.7467} \approx 0.0089, then a_3(1) = 0.5333 - 0.0089 \times (-0.0667) \approx 0.5339, a_3(2) = -0.0667 - 0.0089 \times 0.5333 \approx -0.0714, a_3(3) = 0.0089, E_3 \approx 0.7467 \times (1 - 0.00008) \approx 0.7466.
This step-wise evolution demonstrates how lower-order solutions inform higher ones, converging to the final coefficients [0.5339, -0.0714, 0.0089].
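Running the implementation above on these autocorrelations reproduces the hand calculation (assuming the levinson_durbin function sketched earlier):

r = [1.0, 0.5, 0.2, 0.08]
a, E = levinson_durbin(r, p=3)
print(a)   # approximately [0.5339, -0.0714, 0.0089]
print(E)   # approximately 0.7466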

Applications

Speech Coding and Analysis

In linear predictive coding (LPC), speech is modeled as the output of an all-pole vocal tract filter excited by a quasi-periodic impulse train for voiced segments or random noise for unvoiced segments. This approach captures the spectral envelope of the speech signal efficiently, enabling compact coding and resynthesis. The core LPC synthesis model is s(n) = \sum_{k=1}^{p} a_k s(n-k) + G u(n), where s(n) represents the speech sample at time n, a_k are the linear predictor coefficients, p is the model order, G is the gain factor scaling the excitation, and u(n) is the input excitation signal.

Parameter extraction in LPC involves framing the speech signal, applying pre-emphasis, and estimating the predictor coefficients via methods like the autocorrelation or covariance approach. Typically, predictor orders of about 10 to 16 are used for sampling rates between 8 and 16 kHz (roughly the sampling rate in kilohertz plus a small margin) to adequately model the vocal tract resonances up to the Nyquist frequency. A Hamming window is commonly applied to each frame to minimize spectral leakage and ensure the stationarity assumptions hold during coefficient computation.

LPC finds extensive use in formant estimation and pitch detection within speech analysis. Formants, which correspond to vocal tract resonances, are derived from the roots of the predictor polynomial A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k}, where the angles of the complex roots near the unit circle yield the formant frequencies. For pitch detection, inverse filtering with the LPC model removes the spectral envelope, flattening the signal and exposing the periodicity from which the pitch period can be estimated via autocorrelation or zero-crossing analysis.

LPC forms the foundation for several speech coding standards, including the code-excited linear prediction (CELP) algorithm in Federal Standard FS1016, which operates at 4.8 kbps by quantizing LPC parameters alongside excitation codebooks, and the GSM Full Rate codec, which employs Regular Pulse Excitation with Long-Term Prediction (RPE-LTP) based on LPC at 13 kbps. These standards achieve effective compression rates in the 4-8 kbps range for variants like enhanced CELP, balancing quality and bandwidth for telephony applications. Quantization of LPC parameters is evaluated using the spectral distortion (SD) measure, defined as the log spectral distance between the original and quantized envelopes; an average SD of approximately 1 dB is widely regarded as achieving perceptually transparent quality without excessive outliers above 4 dB.
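The root-to-formant mapping can be sketched in a few lines of Python. Here the predictor polynomial is synthesized from two assumed resonances (700 Hz and 1200 Hz at an 8 kHz sampling rate) purely for illustration; in practice A(z) would come from LPC analysis of a speech frame:

import numpy as np

fs = 8000.0                                   # assumed sampling rate in Hz

def pole_pair(freq_hz, bw_hz):
    # pole radius from bandwidth, angle from resonance frequency
    r = np.exp(-np.pi * bw_hz / fs)
    theta = 2 * np.pi * freq_hz / fs
    return r * np.exp(1j * theta)

poles = [pole_pair(700, 80), pole_pair(1200, 120)]
poles += [p.conjugate() for p in poles]
A = np.poly(poles).real                       # coefficients of the predictor polynomial A(z)

# formant estimation: complex roots of A(z) in the upper half plane
roots = np.roots(A)
roots = roots[np.imag(roots) > 0]
order = np.argsort(np.angle(roots))
freqs = np.angle(roots)[order] * fs / (2 * np.pi)
bws = -fs / np.pi * np.log(np.abs(roots))[order]

print(np.round(freqs))                        # ~[700., 1200.] Hz
print(np.round(bws))                          # ~[80., 120.] Hz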

Time Series Forecasting

Linear prediction plays a central role in time series forecasting through its equivalence to autoregressive (AR) models, where the prediction coefficients directly correspond to the AR parameters. In an AR(p) model, the value at time t is expressed as x_t = \sum_{k=1}^p a_k x_{t-k} + e_t, with e_t as white noise, mirroring the forward linear prediction framework. The one-step-ahead forecast is thus \hat{x}(t+1) = \sum_{k=1}^p a_k x_{t+1-k}, and for h steps ahead it extends recursively as \hat{x}(t+h) = \sum_{k=1}^{p} a_k \hat{x}(t+h-k), relying on previously computed forecasts for lags beyond the observed data.

Model order selection is crucial for effective AR forecasting, balancing goodness-of-fit with model parsimony to avoid overfitting. Common criteria include the Akaike Information Criterion (AIC), defined as \text{AIC} = -2 \log L + 2p, where L is the likelihood and p is the number of parameters, and the Bayesian Information Criterion (BIC), \text{BIC} = -2 \log L + p \log n with sample size n, both minimized to choose p. Additionally, the partial autocorrelation function (PACF) aids identification: for an AR(p) process, the PACF cuts off to zero after lag p, providing a graphical tool to estimate the order by inspecting significant spikes within the 95% confidence bounds of approximately \pm 2 / \sqrt{n}.

In multi-step prediction, forecast uncertainty grows due to error propagation, with the h-step-ahead forecast error variance given by \sigma^2 \sum_{j=0}^{h-1} \psi_j^2, where \psi_j are the coefficients of the model's infinite moving average representation, starting with \psi_0 = 1. This variance increases with h, reflecting the accumulating uncertainty from iterated predictions, and prediction intervals widen accordingly using the normal approximation \hat{x}(t+h) \pm z_{\alpha/2} \sqrt{\sigma_h^2}. Parameters are typically estimated via methods like the Yule-Walker equations, as detailed above.

AR models via linear prediction have been applied to forecast economic indicators and stock prices. However, AR models assume stationarity, requiring constant mean and variance, which many real series violate through trends or seasonality; non-stationary cases often necessitate differencing to achieve stationarity, leading to ARIMA extensions that integrate differencing with AR components for improved long-term forecasts.
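The recursive forecasting scheme is straightforward to implement. A compact Python sketch (the simulated AR(2) series and the horizon are assumptions for illustration) estimates the coefficients via the Yule-Walker equations and then iterates the forecasts:

import numpy as np

rng = np.random.default_rng(2)
x = np.zeros(5000)
for t in range(2, len(x)):                   # simulate an AR(2) series
    x[t] = 0.75 * x[t - 1] - 0.25 * x[t - 2] + rng.standard_normal()

# Yule-Walker estimate of the AR(2) coefficients from sample autocorrelations
r = np.array([x[: len(x) - k] @ x[k:] / len(x) for k in range(3)])
a = np.linalg.solve(np.array([[r[0], r[1]], [r[1], r[0]]]), r[1:])

h = 5                                        # forecast horizon
hist = list(x[-2:])                          # last p observed values
forecasts = []
for _ in range(h):
    f = a[0] * hist[-1] + a[1] * hist[-2]    # one-step recursion reusing prior forecasts
    forecasts.append(f)
    hist.append(f)

print(np.round(a, 3))                        # close to [0.75, -0.25]
print(np.round(forecasts, 3))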

Extensions

Multi-Channel Linear Prediction

Multi-channel linear prediction extends the scalar linear prediction framework to vector-valued signals, where the observation at each time step is a d-dimensional vector \mathbf{x}(n) \in \mathbb{R}^d, such as signals captured by multiple sensors or variables in a multivariate time series. The predicted vector is given by \hat{\mathbf{x}}(n) = \sum_{k=1}^{p} \mathbf{A}_k \mathbf{x}(n-k), where p is the prediction order and each \mathbf{A}_k is a d \times d coefficient matrix capturing linear dependencies across channels and lags. This model assumes stationarity and aims to minimize the prediction error covariance, generalizing the univariate case to account for inter-channel correlations.

Parameter estimation in multi-channel linear prediction relies on a generalization of the Yule-Walker equations, which relate the coefficient matrices to the autocorrelation matrices of the process. For each lag k = 1, \dots, p, the equations take the form \sum_{j=1}^{p} \mathbf{A}_j \mathbf{R}(k-j) = \mathbf{R}(k), where \mathbf{R}(k) = \mathbb{E}[\mathbf{x}(n) \mathbf{x}^T(n-k)] is the d \times d autocorrelation matrix at lag k, \mathbf{R}(-k) = \mathbf{R}(k)^T, and \mathbf{R}(0) is the covariance matrix. Solving this system of matrix equations yields the \mathbf{A}_k, often via least-squares methods or recursive algorithms adapted from the scalar Levinson-Durbin recursion, though the multivariate version requires matrix inversions at each step.

In applications, multi-channel linear prediction is widely used in array signal processing, where the coefficient matrices model spatial correlations to enhance signals from specific directions while suppressing noise and interference. For instance, in acoustic sensor networks it enables distributed speech enhancement by predicting and subtracting reverberant or noisy components across microphones, improving signal-to-noise ratios in reverberant environments. In econometrics, it underpins vector autoregressive (VAR) models, which forecast multiple interrelated time series, such as economic indicators, by capturing the cross-variable dynamics essential for forecasting and policy analysis.

The increased dimensionality introduces significant computational challenges, as estimating the coefficients involves solving linear systems of size up to p d \times p d, leading to a complexity of O(p^3 d^3) for direct methods such as Gaussian elimination of the full block system. To mitigate this, dimensionality-reduction techniques are often employed, projecting the high-dimensional channels onto a lower-dimensional subspace that preserves the predictive correlations while reducing the effective dimension d. A representative example is two-channel audio prediction for noise cancellation, where d=2 microphones capture a speech signal corrupted by correlated noise. The model estimates cross-channel coefficients \mathbf{A}_k that predict the noise in one channel from the other, enabling subtraction to recover clean audio.
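For the simplest case p = 1, the generalized Yule-Walker equation reduces to \mathbf{A}_1 \mathbf{R}(0) = \mathbf{R}(1). The following Python sketch (the 2 \times 2 coefficient matrix and the simulated data are assumptions for illustration) estimates the coefficient matrix of a two-channel model from sample covariance matrices:

import numpy as np

rng = np.random.default_rng(3)
A_true = np.array([[0.6, 0.2],
                   [-0.1, 0.5]])             # assumed stable 2x2 coefficient matrix
d, N = 2, 20_000
x = np.zeros((N, d))
for n in range(1, N):                        # simulate x(n) = A x(n-1) + noise
    x[n] = A_true @ x[n - 1] + rng.standard_normal(d)

R0 = x[:-1].T @ x[:-1] / (N - 1)             # lag-0 covariance R(0)
R1 = x[1:].T @ x[:-1] / (N - 1)              # lag-1 correlation R(1) = E[x(n) x(n-1)^T]
A_hat = R1 @ np.linalg.inv(R0)               # solves A_1 R(0) = R(1)

print(np.round(A_hat, 2))                    # close to A_true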

Relation to Autoregressive Models

Linear prediction is fundamentally equivalent to autoregressive (AR) modeling, where the prediction coefficients a_k directly correspond to the AR parameters in the model x(n) = -\sum_{k=1}^{p} a_k x(n-k) + e(n), with e(n) representing the white-noise innovation. This equivalence arises because the linear predictor minimizes the mean squared prediction error, which aligns with the AR process definition that expresses the current observation as a linear combination of past values plus uncorrelated noise. The power spectral density (PSD) of such an AR process is then given by P(\omega) = \frac{\sigma_e^2}{|A(e^{j\omega})|^2}, where A(z) = 1 + \sum_{k=1}^{p} a_k z^{-k} is the AR polynomial and \sigma_e^2 is the variance of the innovation e(n).

A key interpretation of this equivalence lies in the maximum entropy principle, under which the linear prediction spectrum is the maximum entropy (or flattest) spectrum consistent with the known autocorrelation function up to lag p. This approach, originally developed by Burg, ensures that the extrapolated autocorrelations beyond lag p are chosen to maximize the spectral entropy subject to the constraints from the observed lags, yielding an all-pole model identical to the AR spectrum. Consequently, linear prediction provides a high-resolution spectral estimate under these constraints, particularly advantageous for short data records where nonparametric methods fail.

In practice, AR parameters can be fitted directly to the data using variants of Durbin's algorithm, which performs least-squares estimation without first computing explicit autocorrelations, thus avoiding potential biases from sample autocorrelation estimates. This direct method iteratively refines the coefficients by minimizing the sum of squared prediction errors over the observed data, offering computational efficiency and numerical stability for moderate-order models.

Linear prediction, as an AR method, connects to other time series models such as moving average (MA) and autoregressive moving average (ARMA) processes; for instance, a finite-order AR approximation via linear prediction can model short-memory MA processes by using sufficiently high orders to capture the infinite AR representation of an MA process. In ARMA contexts, linear prediction coefficients approximate the AR component, providing a pole-only model that serves as a starting point for full ARMA estimation. Under Gaussian assumptions for the innovations, linear prediction estimators for AR parameters exhibit asymptotic efficiency, achieving the Cramér-Rao lower bound as the sample size increases, with the least-squares estimates being asymptotically equivalent to the maximum likelihood estimates. This property ensures consistent and normally distributed estimates, with variance determined by the inverse of the Fisher information matrix derived from the process autocovariances.
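The all-pole PSD formula is easy to evaluate directly. A brief Python sketch (the AR(2) coefficients and the frequency grid are assumptions for illustration), using the sign convention A(z) = 1 + \sum_k a_k z^{-k} from above:

import numpy as np

a = np.array([1.0, -1.2, 0.8])        # [1, a_1, a_2]; complex poles at radius ~0.89
sigma_e2 = 1.0                        # innovation variance

w = np.linspace(0, np.pi, 512)        # normalized frequency grid
k = np.arange(len(a))
A_w = np.exp(-1j * np.outer(w, k)) @ a    # A(e^{jw}) evaluated on the grid
psd = sigma_e2 / np.abs(A_w) ** 2         # P(w) = sigma_e^2 / |A(e^{jw})|^2

print(w[np.argmax(psd)])              # spectral peak near the pole angle (~0.83 rad)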
