Partial autocorrelation function
The partial autocorrelation function (PACF) is a fundamental tool in time series analysis that quantifies the direct correlation between observations in a stationary time series at a given lag, after accounting for the correlations attributable to all shorter lags. Formally, for a mean-zero stationary process \{X_t\}, the PACF at lag k, denoted \phi_{kk}, is the correlation between X_{t+k} and X_t conditional on the intervening values X_{t+1}, \dots, X_{t+k-1}; for k=1, it equals the autocorrelation at lag 1, while for k \geq 2, it is computed as \phi_{kk} = \corr(X_{t+k} - \hat{X}_{t+k}, X_t - \hat{X}_t), where \hat{X}_{t+k} and \hat{X}_t are the linear projections onto the intervening variables.[1][2] Introduced as part of the Box-Jenkins methodology for ARIMA modeling in the seminal 1970 text Time Series Analysis: Forecasting and Control, the PACF plays a central role in model identification by helping distinguish autoregressive (AR) from moving average (MA) components. For pure AR(p) processes, the PACF exhibits a sharp cutoff to zero beyond lag p (i.e., \phi_{kk} = 0 for all k > p), making it particularly useful for determining the order p of an AR model; in contrast, for MA(q) processes, the PACF tails off gradually toward zero without a strict cutoff.[2][3] Estimation of the PACF typically relies on sample autocorrelations via methods such as the Yule-Walker equations or the more efficient Levinson-Durbin recursion, which solves the system \rho_j = \sum_{i=1}^k \phi_{k i} \rho_{|j-i|} for j = 1, \dots, k, yielding \phi_{kk} = \frac{\rho_k - \sum_{j=1}^{k-1} \phi_{k-1,j} \rho_{k-j}}{1 - \sum_{j=1}^{k-1} \phi_{k-1,j} \rho_j}.[1] In practice, PACF plots are examined alongside autocorrelation function (ACF) plots within the Box-Jenkins framework to select appropriate ARIMA parameters, with confidence bands (often \pm 1.96 / \sqrt{T} for a series of length T) indicating significant lags.[2] This approach has broad applications in forecasting economic indicators, climate data, and financial series, where it aids in parsimonious model specification and diagnostic checking.
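The cutoff behavior and the \pm 1.96 / \sqrt{T} bands described above can be illustrated numerically. The following sketch assumes NumPy and the statsmodels package discussed later in this article; the AR(2) coefficients 0.6 and 0.25, the series length, and the random seed are illustrative choices rather than values from the cited sources.

```python
import numpy as np
from statsmodels.tsa.arima_process import arma_generate_sample
from statsmodels.tsa.stattools import pacf

# Simulate T observations from a stationary AR(2) process
# X_t = 0.6 X_{t-1} + 0.25 X_{t-2} + e_t (illustrative coefficients).
np.random.seed(0)
T = 2000
ar = [1.0, -0.6, -0.25]   # AR polynomial in the sign convention used by statsmodels
ma = [1.0]
x = arma_generate_sample(ar, ma, nsample=T)

# Sample PACF up to lag 10 (index 0 holds the trivial lag-0 value of 1).
phi_hat = pacf(x, nlags=10)

# Approximate large-sample 95% band for lags beyond the true order p = 2.
band = 1.96 / np.sqrt(T)
for k in range(1, 11):
    status = "significant" if abs(phi_hat[k]) > band else "within band"
    print(f"lag {k:2d}: PACF = {phi_hat[k]:+.3f} ({status})")
# Expected pattern: lags 1 and 2 clearly exceed the band, lags 3-10 mostly fall inside it.
```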
Fundamentals
Autocorrelation Function Overview
The autocorrelation function (ACF) quantifies the linear relationship between values of a time series and its own past or future values at various lags, thereby measuring the extent of dependence within the series.[4] For a weakly stationary process \{Y_t\}, the population autocorrelation at lag k is defined as \rho_k = \frac{\Cov(Y_t, Y_{t-k})}{\Var(Y_t)}, where \Cov(Y_t, Y_{t-k}) is the covariance between observations separated by k time units, and \Var(Y_t) is the variance of the process, which remains constant across time due to stationarity.[5] This normalized measure ranges from -1 to 1, with \rho_0 = 1 and values near zero indicating negligible linear dependence at that lag. The concept of correlation underlying the ACF originated with Francis Galton's work in 1886 on familial inheritance patterns, where he examined dependencies in stature across generations.[6] Its application to time series analysis was formalized in the 1920s by G. Udny Yule, who developed methods to model periodicities in disturbed data such as sunspot numbers, introducing autoregressive representations that relied on serial correlations.[7] This was further advanced in the early 1930s by Gilbert Walker through equations linking autocorrelations to model parameters, now known as the Yule-Walker equations. In time series analysis, the ACF reveals the overall pattern of serial dependence, helping to detect trends, seasonality, or randomness in data, but it aggregates both direct and indirect influences across lags without isolating them. The partial autocorrelation function serves as an extension, focusing on direct linear effects by controlling for intervening lags.
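As a concrete illustration of the definition above, the sample analogue of \rho_k replaces the population covariance and variance with sample moments computed about the sample mean. The short sketch below assumes only NumPy; the function name sample_acf and the simulated AR(1) series are illustrative, not taken from the cited sources.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelations rho_hat_k = c_k / c_0 for k = 0..max_lag,
    where c_k = (1/n) * sum_{t=k+1}^{n} (y_t - ybar)(y_{t-k} - ybar)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = y - y.mean()
    c0 = np.dot(d, d) / n                  # sample variance (divisor n)
    acf = [1.0]
    for k in range(1, max_lag + 1):
        ck = np.dot(d[k:], d[:-k]) / n     # sample autocovariance at lag k
        acf.append(ck / c0)
    return np.array(acf)

# Example: a simulated AR(1)-like series shows geometrically decaying sample ACF.
rng = np.random.default_rng(1)
e = rng.standard_normal(500)
y = np.empty(500)
y[0] = e[0]
for t in range(1, 500):
    y[t] = 0.7 * y[t - 1] + e[t]
print(sample_acf(y, 5).round(3))
```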
Definition of Partial Autocorrelation Function
The partial autocorrelation function (PACF) at lag k, denoted \phi_{kk}, measures the correlation between Y_t and Y_{t-k} after removing the linear effects of the intermediate lags from 1 to k-1.[8] This is formally defined as the conditional correlation \phi_{kk} = \corr(Y_t, Y_{t-k} \mid Y_{t-1}, \dots, Y_{t-k+1}) for a stationary time series \{Y_t\}.[8] Unlike the autocorrelation function (ACF), which captures unconditional correlations, the PACF isolates the direct association at lag k by accounting for intervening variables.[9] In population terms, \phi_{kk} can be expressed as the correlation between residuals from the best linear predictors of Y_t and Y_{t-k} based on the intervening variables: \phi_{kk} = \corr\left( Y_t - \sum_{j=1}^{k-1} \phi_{k-1,j} Y_{t-j}, \, Y_{t-k} - \sum_{j=1}^{k-1} \phi_{k-1,j} Y_{t-k+j} \right), where the coefficients \phi_{k-1,j} (for j = 1, \dots, k-1) are obtained from the autoregressive approximation of order k-1.[10] This formulation arises from the structure of the inverse of the Toeplitz autocovariance matrix, where the PACF values correspond to specific elements reflecting conditional dependencies.[11] The PACF is closely related to the Yule-Walker equations, which provide a system for solving autoregressive coefficients: for lags j = 1, \dots, k, \rho_j = \sum_{i=1}^k \phi_{ki} \rho_{|j-i|}, where \rho_j is the ACF at lag j.[8] Here, \phi_{kk} is the final coefficient in the solution to this k-th order system, representing the additional contribution of lag k beyond lower-order terms.[11] A distinctive property of the PACF is its behavior for autoregressive processes: for an AR(p) process, \phi_{kk} = 0 for all k > p, providing a theoretical cutoff that aids in model order identification.[8] This zeroing out after lag p stems directly from the finite dependence structure of AR models as captured by the Yule-Walker solutions.[12]
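To make the Yule-Walker characterization concrete, the following sketch (assuming only NumPy; the helper name pacf_from_acf is illustrative) builds the k-by-k Toeplitz matrix of autocorrelations, solves the order-k system, and takes the final coefficient as \phi_{kk}. Applied to the theoretical ACF of an AR(2) process with illustrative coefficients, it reproduces the cutoff \phi_{kk} = 0 for k > 2.

```python
import numpy as np

def pacf_from_acf(rho, max_lag):
    """Partial autocorrelations phi_kk for k = 1..max_lag, obtained by solving
    the order-k Yule-Walker system R_k phi = (rho_1, ..., rho_k)' and taking
    the last coefficient.  `rho` holds rho_0 = 1, rho_1, ..., rho_max_lag."""
    pacf = []
    for k in range(1, max_lag + 1):
        R = np.array([[rho[abs(i - j)] for j in range(k)] for i in range(k)])
        phi = np.linalg.solve(R, rho[1:k + 1])
        pacf.append(phi[-1])
    return np.array(pacf)

# Theoretical ACF of an AR(2) with phi_1 = 0.5, phi_2 = 0.3 (illustrative values),
# from the recursion rho_k = phi_1 rho_{k-1} + phi_2 rho_{k-2}, rho_0 = 1.
phi1, phi2 = 0.5, 0.3
rho = np.zeros(8)
rho[0] = 1.0
rho[1] = phi1 / (1.0 - phi2)
for k in range(2, 8):
    rho[k] = phi1 * rho[k - 1] + phi2 * rho[k - 2]

print(pacf_from_acf(rho, 6).round(6))
# phi_11 = rho_1, phi_22 = phi_2, and phi_kk = 0 for k > 2, as expected for AR(2).
```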
Computation
Theoretical Derivation
The partial autocorrelation function (PACF) for a stationary time series \{Y_t\} is derived under the assumptions of weak stationarity, which ensures that the mean is constant and the autocovariance depends only on the time lag, and the existence of finite second moments, which guarantees that the autocovariance function \gamma(h) = \mathrm{Cov}(Y_t, Y_{t+h}) is well-defined for all lags h. Consider the joint distribution of the random vector \mathbf{Z}_{k+1} = (Y_t, Y_{t-1}, \dots, Y_{t-k+1}, Y_{t-k})^\top for fixed t and lag k \geq 1. Under the additional assumption that \{Y_t\} is a Gaussian process, this vector follows a multivariate normal distribution with mean vector \boldsymbol{\mu} = \mathbb{E}[\mathbf{Z}_{k+1}] and covariance matrix \boldsymbol{\Gamma}_{k+1}, a (k+1) \times (k+1) symmetric Toeplitz matrix with elements [\boldsymbol{\Gamma}_{k+1}]_{i,j} = \gamma(|i-j|). The k-th partial autocorrelation coefficient \phi_{k,k} is defined as the partial correlation between Y_t and Y_{t-k} conditional on the intervening variables Y_{t-1}, \dots, Y_{t-k+1}, which, for multivariate normal distributions, equals the correlation between the residuals of the linear regressions of Y_t and Y_{t-k} onto the intervening variables.[13] Explicitly, \phi_{k,k} = \mathrm{Corr}(Y_t - \hat{Y}_t^{(k-1)}, Y_{t-k} - \hat{Y}_{t-k}^{(k-1)}), where \hat{Y}_t^{(k-1)} and \hat{Y}_{t-k}^{(k-1)} are the projections of Y_t and Y_{t-k} onto the span of \{Y_{t-1}, \dots, Y_{t-k+1}\}, respectively. This partial correlation can be expressed in matrix form using the inverse autocovariance matrix \boldsymbol{\Gamma}_{k+1}^{-1}. Writing \boldsymbol{\Gamma}_{k+1}^{-1} = (\pi_{i,j})_{i,j=1}^{k+1}, the partial correlation between the first and last components of \mathbf{Z}_{k+1} given the intervening ones is \phi_{k,k} = -\pi_{1,k+1} / \sqrt{\pi_{1,1} \pi_{k+1,k+1}}. Equivalently, \phi_{k,k} is the final coefficient in the solution of the order-k Yule-Walker system, confirming its role as the lag-k autoregressive coefficient in the order-k AR approximation. A recursive method to compute the PACF from the autocorrelation function \rho_h = \gamma(h)/\gamma(0) avoids direct inversion of successively larger matrices and follows from the Levinson-Durbin algorithm, originally developed for the efficient solution of Toeplitz systems in time series fitting. The forward recursion initializes with \phi_{1,1} = \rho_1 and, for k \geq 2, first computes the new partial autocorrelation and then updates the remaining coefficients: \begin{align*} \phi_{k,k} &= \frac{\rho_k - \sum_{j=1}^{k-1} \phi_{k-1,j} \rho_{k-j}}{1 - \sum_{j=1}^{k-1} \phi_{k-1,j} \rho_j}, \\ \phi_{k,j} &= \phi_{k-1,j} - \phi_{k,k} \phi_{k-1,k-j}, \quad j = 1, \dots, k-1, \end{align*} where the denominator is proportional to the innovation (one-step prediction error) variance of the order-(k-1) AR approximation, ensuring numerical stability, and the first m partial autocorrelations are obtained in O(m^2) operations. The quantities \phi_{k,k} produced in this way are the reflection coefficients of the Levinson-Durbin recursion; the identity holds under the stationarity assumption and provides the theoretical link between the ACF and the PACF used in model identification.[13]
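The recursion above translates directly into code. The sketch below, assuming only NumPy and using the illustrative name levinson_durbin_pacf, computes \phi_{k,k} first and then updates the remaining coefficients exactly as in the displayed equations; the AR(1) check at the end is an illustrative example.

```python
import numpy as np

def levinson_durbin_pacf(rho, max_lag):
    """Levinson-Durbin recursion applied to autocorrelations rho_0, ..., rho_max_lag.
    Returns phi_kk for k = 1..max_lag."""
    pacf = np.zeros(max_lag)
    phi_prev = np.zeros(0)                     # coefficients of the order-(k-1) model
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            phi_curr = np.array([phi_kk])
        else:
            num = rho[k] - np.dot(phi_prev, rho[k - 1:0:-1])   # rho_{k-1}, ..., rho_1
            den = 1.0 - np.dot(phi_prev, rho[1:k])             # rho_1, ..., rho_{k-1}
            phi_kk = num / den
            # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}, j = 1..k-1
            phi_curr = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pacf[k - 1] = phi_kk
        phi_prev = phi_curr
    return pacf

# Quick check for an AR(1) with phi = 0.6, whose theoretical ACF is rho_k = 0.6**k:
rho = 0.6 ** np.arange(6)
print(levinson_durbin_pacf(rho, 5).round(6))   # [0.6, 0, 0, 0, 0]
```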
Sample Estimation
The sample partial autocorrelation function (PACF) is estimated from observed time series data \{Y_t\}_{t=1}^n by substituting the sample autocorrelation coefficients \hat{\rho}_k into the Durbin-Levinson recursion, which provides an efficient recursive computation of the autoregressive coefficients \hat{\phi}_{k,j} for j=1,\dots,k and k=1,\dots,m, where m is the maximum lag of interest and typically m < n/2 to ensure reliable estimates.[13] The recursion begins with \hat{\phi}_{1,1} = \hat{\rho}_1 and proceeds iteratively as \hat{\phi}_{k,k} = \frac{\hat{\rho}_k - \sum_{j=1}^{k-1} \hat{\phi}_{k-1,j} \hat{\rho}_{k-j}}{1 - \sum_{j=1}^{k-1} \hat{\phi}_{k-1,j} \hat{\rho}_j}, with the remaining coefficients updated via \hat{\phi}_{k,j} = \hat{\phi}_{k-1,j} - \hat{\phi}_{k,k} \hat{\phi}_{k-1,k-j} for j=1,\dots,k-1.[13] The sample PACF at lag k is then \hat{\phi}_{k,k}, serving as a plug-in estimator for the population partial autocorrelation \phi_{k,k}. An equivalent approach computes the sample PACF through direct least-squares regression: for each lag k=1,\dots,m, fit an autoregressive model of order k by ordinary least squares (OLS) to the equation Y_t = \sum_{j=1}^k \phi_{k,j} Y_{t-j} + \epsilon_t for t=k+1,\dots,n, and extract \hat{\phi}_{k,k} as the coefficient on the k-th lag, which isolates the direct correlation after adjusting for intermediate lags.[1] This regression-based method is computationally straightforward for small m but becomes intensive for large m due to repeated matrix inversions, making the Durbin-Levinson recursion preferable for efficiency. In small samples (n < 100), the sample PACF \hat{\phi}_{k,k} exhibits bias, typically negative for k \geq 1, due to the finite-sample variability in \hat{\rho}_k and the recursive structure, leading to underestimation of the true partial autocorrelations especially at higher lags.[14] To address this bias and obtain reliable inference, truncated estimators limit computation to lags up to a small multiple of the expected order (e.g., m \approx 2p for a suspected AR(p)), reducing variance at the cost of potentially missing higher-order effects, while bootstrapping methods, such as the stationary bootstrap, generate resampled series to construct empirical confidence intervals for \hat{\phi}_{k,k}, accounting for the dependence structure without assuming normality.[15] Common software implementations facilitate these computations: the pacf() function in R's base stats package uses the Durbin-Levinson recursion by default for sample PACF estimation up to specified lags, including optional confidence bands based on asymptotic normality.[16] Similarly, the pacf() function in Python's statsmodels.tsa.stattools module supports multiple methods including Yule-Walker (via Durbin-Levinson), OLS, and Burg's algorithm, with built-in options for confidence intervals.[14]
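The regression-based estimator described above can be sketched as follows, assuming NumPy and the statsmodels pacf() function mentioned in the text. The helper name pacf_ols_manual and the simulated AR(1) series are illustrative; the statsmodels call is included only as a plausible cross-check of the same quantity.

```python
import numpy as np
from statsmodels.tsa.stattools import pacf

def pacf_ols_manual(y, max_lag):
    """Sample PACF via per-lag OLS: for each k, regress y_t on y_{t-1}, ..., y_{t-k}
    (with an intercept) and keep the coefficient on y_{t-k}."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    out = []
    for k in range(1, max_lag + 1):
        Y = y[k:]                                                # responses y_t for t > k
        X = np.column_stack([np.ones(n - k)] +
                            [y[k - j:n - j] for j in range(1, k + 1)])  # lags 1..k
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        out.append(beta[-1])                                     # coefficient on lag k
    return np.array(out)

# Illustrative comparison on a simulated AR(1) series with coefficient 0.5.
rng = np.random.default_rng(42)
y = np.zeros(400)
for t in range(1, 400):
    y[t] = 0.5 * y[t - 1] + rng.standard_normal()

print(pacf_ols_manual(y, 5).round(3))
print(pacf(y, nlags=5, method="ols")[1:].round(3))   # statsmodels OLS-based estimate
```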