Johansen test
The Johansen test is a maximum likelihood procedure for testing the presence, and determining the number, of cointegrating relations among a set of non-stationary, integrated time series variables in a vector autoregressive (VAR) model.[1] Developed by the Danish econometrician Søren Johansen in 1988, the test addresses cointegration, the situation in which linear combinations of individually non-stationary series form stationary processes, allowing long-run equilibrium relationships to hold despite short-run deviations.[2] It is particularly suited to multivariate systems and extends earlier univariate approaches by providing a framework for estimating cointegration vectors and testing hypotheses about their structure.[1] The underlying model is a Gaussian VAR process of order k, expressed in error correction form as \Delta X_t = \sum_{i=1}^{k-1} \Gamma_i \Delta X_{t-i} + \Pi X_{t-k} + \mu + \epsilon_t, where X_t is a p \times 1 vector of variables integrated of order 1 (I(1)), \Pi = \alpha \beta' with \alpha and \beta being p \times r matrices of adjustment coefficients and cointegrating vectors (rank r < p), \mu accounts for deterministic terms such as constants or linear trends, and \epsilon_t are i.i.d. Gaussian errors.[1] Under the null hypothesis of no cointegration, \Pi = 0, implying the variables are I(1) without long-run relations; otherwise, r > 0 indicates cointegration.[2] The maximum likelihood estimator of the cointegration space spanned by \beta is obtained from the eigenvectors corresponding to the largest canonical correlations between \Delta X_t and the lagged levels X_{t-k}, after adjusting for lagged differences.[1]

Central to the test are two likelihood ratio statistics for the rank r: the trace test, -T \sum_{i=r+1}^p \ln(1 - \hat{\lambda}_i), which tests the null of at most r cointegrating relations against the alternative of more than r; and the maximum eigenvalue test, -T \ln(1 - \hat{\lambda}_{r+1}), which tests exactly r against r+1.[1] Here, T is the sample size and \hat{\lambda}_i are the estimated eigenvalues. The asymptotic distributions of these statistics are non-standard, depend on the deterministic components included, and are tabulated as functionals of Brownian motion, such as the trace of a quadratic form in integrals of Brownian motion processes.[1] The sequential testing procedure starts with r = 0 and increases r until the null is no longer rejected, yielding the estimated cointegration rank.[2]

Widely applied in econometrics to analyze long-run economic relationships such as purchasing power parity or money demand, the Johansen test has prompted extensions for small samples, structural breaks, and non-Gaussian errors, and it remains a cornerstone of multivariate cointegration analysis.[1]
Introduction
Definition and Purpose
The Johansen test is a statistical procedure developed by Søren Johansen for detecting cointegration among multiple time series that are integrated of order one, denoted I(1).[3] It tests the rank r of the cointegration matrix, which gives the number of linearly independent long-run equilibrium relationships among the variables, allowing researchers to identify stationary linear combinations despite the individual non-stationarity of the series. Cointegration occurs when non-stationary I(1) time series share a stable long-run relationship such that certain linear combinations of them are stationary, or I(0), reflecting economic equilibria such as those between prices and exchange rates. This contrasts with residual-based approaches built on unit root tests such as the Augmented Dickey-Fuller (ADF) test, which assess stationarity of a single series or of residuals from pairwise regressions; the Johansen approach instead handles multivariate systems directly and can uncover multiple cointegrating relations.[4] The primary purpose of the test is to guide the specification of models that account for these long-run dependencies, enabling valid inference with non-stationary data and avoiding spurious correlations. In economics, it plays a central role in analyzing relationships, such as that between money demand and income, by preventing the invalid regressions that arise when non-cointegrated I(1) variables are analyzed together, thus ensuring reliable estimates of equilibrium dynamics.[4] The test underpins the vector error correction model framework for capturing both short-term adjustments and persistent equilibria.
Historical Development
The concept of cointegration emerged from early work on error correction models and vector autoregression (VAR) frameworks in econometrics. Clive Granger introduced the foundational idea of cointegrated variables and their connection to error-correcting mechanisms in his 1981 paper, emphasizing how non-stationary time series could maintain long-run equilibrium relationships.[5] This built upon Christopher Sims' 1980 advocacy for unrestricted VAR models as a flexible tool for analyzing multivariate time series without imposing rigid structural assumptions.[6] In 1987, Robert Engle and Granger formalized a two-step procedure for testing and estimating cointegration in bivariate systems, involving residual-based regression followed by unit root tests on the residuals, which demonstrated superconsistent estimation properties but was limited to single-equation settings.[7]

Søren Johansen extended these ideas to a full multivariate likelihood-based framework starting in 1988, deriving maximum likelihood estimators and likelihood ratio tests for cointegration vectors within Gaussian VAR processes integrated of order one.[2] His seminal 1991 paper in Econometrica provided a comprehensive procedure for estimating the cointegration rank and testing restrictions on cointegrating relations, enabling simultaneous inference across multiple variables and overcoming the limitations of the Engle-Granger approach by incorporating the full system dynamics.[8] This likelihood ratio method, often referred to as the Johansen test, marked a shift toward parametric, system-wide analysis of cointegration.

Following its introduction, the Johansen test gained widespread adoption in econometric research during the 1990s, becoming a cornerstone for analyzing long-run relationships in macroeconomic time series.[9] Refinements addressed finite-sample biases, with Johansen proposing Bartlett-type corrections for the cointegration rank test in 2000 and further adjustments for hypothesis testing on cointegrating vectors in 2002.[10] As of 2025, the test remains a standard feature in major time series software packages, such as R's urca library and Python's statsmodels, while ongoing extensions incorporate structural breaks in deterministic trends to handle regime shifts, as detailed in Johansen, Mosconi, and Nielsen's 2000 framework.[11]
Theoretical Background
Cointegration Concept
Cointegration describes a situation in which two or more time series, each of which is non-stationary, possess a stable long-run equilibrium relationship such that a linear combination of them is stationary.[7] This concept is particularly relevant for integrated processes of order one, denoted I(1), where individual series require first differencing to become stationary, but the combination achieves stationarity without differencing.[7] The intuition behind cointegration is that while the series may drift apart in the short run due to stochastic trends, they are tethered by an underlying equilibrium, with any deviations tending to correct themselves over time.[7] For instance, aggregate consumption and disposable income in an economy often wander individually but maintain a proportional long-run relationship, reflecting economic theory.[7] A time series is said to be integrated of order d, or I(d), if it must be differenced d times to induce stationarity; cointegration among I(1) series implies that the system's effective order of integration is reduced to zero for the equilibrium error.[7] The Engle-Granger representation theorem establishes that if variables are cointegrated, they can be modeled in an error correction form, which links short-run changes to adjustments toward the long-run equilibrium.[7] This differs fundamentally from spurious correlation, where regressing unrelated I(1) series produces high R-squared values and significant coefficients by chance, without implying any true relationship, as highlighted in early critiques of such regressions.[7] The vector error correction model serves as the primary framework for operationalizing cointegration in multivariate settings.[7]
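The idea can be illustrated with a short simulation, a minimal sketch not drawn from the cited sources: two series are generated from the same random-walk trend, so each is I(1) while the linear combination y1 - 2*y2 is stationary; the variable names and coefficients are arbitrary illustrative choices.

```python
# Minimal sketch: two I(1) series built from a shared random-walk trend,
# so the combination y1 - 2*y2 is stationary (the "equilibrium error").
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(42)
T = 500
trend = np.cumsum(rng.normal(size=T))      # common stochastic trend, I(1)
y1 = 2.0 * trend + rng.normal(size=T)      # each series inherits the trend
y2 = 1.0 * trend + rng.normal(size=T)

for name, series in [("y1", y1), ("y2", y2), ("y1 - 2*y2", y1 - 2.0 * y2)]:
    pval = adfuller(series)[1]             # ADF p-value; H0 = unit root
    print(f"{name:10s} ADF p-value = {pval:.3f}")
# Typically: large p-values for y1 and y2 (unit root not rejected) and a
# small p-value for the combination (stationarity of the equilibrium error).
```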
Vector Error Correction Model
The Vector Error Correction Model (VECM) serves as the foundational framework for analyzing cointegrated multivariate time series, reformulating a vector autoregressive (VAR) process so that both short-run fluctuations and long-run equilibrium constraints appear explicitly. In this model, variables integrated of order one (I(1)) are specified such that deviations from their long-run relationships trigger corrective adjustments, ensuring the process is stationary in differences while respecting the cointegration structure. This representation is essential for systems in which the individual series have unit roots but linear combinations do not, allowing researchers to capture the mechanics of economic or financial equilibria that persist over time.[2]

The VECM arises as a reparameterization of a standard VAR(p) model specified in levels for I(1) series, transforming the unrestricted differenced form into one that includes lagged error correction terms derived from the cointegrating relations. This equivalence means the VECM imposes no restrictions beyond those implied by cointegration, preserving the full information content of the underlying VAR while highlighting the equilibrium-correcting dynamics. Key elements are the adjustment matrix, typically denoted α, which quantifies how each variable responds to disequilibria in the system, and the cointegrating matrix β, which defines the stable long-run relationships among the variables. The cointegration rank r, the dimension of the space spanned by β, determines the number of such equilibria.[8]

The VECM relies on several core assumptions to facilitate estimation and inference. Errors are assumed to be independently and identically distributed Gaussian with zero mean and constant covariance matrix, ruling out serial correlation and enabling maximum likelihood estimation. In some applications, weak exogeneity is imposed on subsets of variables, meaning their marginal processes do not depend on the cointegrating errors, which simplifies conditional modeling without loss of efficiency for the parameters of interest. These assumptions underpin the model's suitability for likelihood-based procedures.[8]

This structure makes the VECM particularly amenable to testing for cointegration, as it supports direct inference on the rank of the cointegration space through maximum likelihood methods, forming the basis for procedures such as the Johansen test. By embedding the long-run relations within a dynamic error-correcting framework, the VECM provides a unified approach to both estimation and hypothesis testing in non-stationary systems.[2]
Mathematical Formulation
VECM Representation
The vector error correction model (VECM) provides the foundational representation for analyzing cointegrated time series within the Johansen test framework. For a p \times 1 vector of integrated of order one, I(1), variables Y_t, the VECM is derived from a vector autoregressive (VAR) model of order k and takes the form \Delta Y_t = \sum_{i=1}^{k-1} \Gamma_i \Delta Y_{t-i} + \Pi Y_{t-k} + \varepsilon_t, where \Delta Y_t = Y_t - Y_{t-1} is the first difference, \Pi is a p \times p matrix capturing the long-run equilibrium relationships, \Gamma_i are p \times p matrices of short-run dynamics for i = 1, \dots, k-1, and \varepsilon_t is a p \times 1 vector of white noise errors assumed to be i.i.d. Gaussian with mean zero and covariance matrix \Omega.[3][12] Under the cointegration hypothesis, the matrix \Pi admits the reduced rank factorization \Pi = \alpha \beta', where \alpha is a p \times r matrix of adjustment speeds (indicating the rate at which the system corrects deviations from equilibrium) and \beta is a p \times r matrix whose columns are the r cointegrating vectors (defining the stationary linear combinations of the variables), with 0 < r < p.[3][13] The reduced rank r of \Pi implies that there are r linearly independent long-run equilibrium relations among the p variables, so that while individual series may be non-stationary, the combinations \beta' Y_t are stationary.[3] The \Gamma_i matrices capture the short-run dynamics associated with lagged changes in the variables.[12] The lag length k-1 in the VECM corresponds to the order k of the underlying VAR model in levels, which is typically selected using information criteria such as the Akaike information criterion (AIC) or Bayesian information criterion (BIC) to balance fit and parsimony while ensuring that the residuals are approximately white noise.[12] For identification of the cointegrating relations, the matrix \beta is normalized so that one element in each column is set to unity (often the coefficient on a variable treated as numeraire), ensuring uniqueness up to rotations within the column space spanned by \beta.[13] This normalization facilitates interpretation of the equilibrium relations, while the rank r can be tested via the eigenvalues of \Pi, as detailed in the sections on test statistics below.[3]
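As an illustration of this parameterization, the following sketch (assuming simulated two-variable data and the VECM class from statsmodels) estimates the model for a hypothesized rank r = 1 and reconstructs \Pi = \alpha \beta' from the fitted \alpha and \beta; the data-generating choices are arbitrary and not part of the cited sources.

```python
# Sketch: estimating the VECM pieces Delta Y_t = Gamma_1 Delta Y_{t-1}
# + alpha beta' Y_{t-1} + eps_t on simulated cointegrated data.
import numpy as np
from statsmodels.tsa.vector_ar.vecm import VECM

rng = np.random.default_rng(0)
T = 400
trend = np.cumsum(rng.normal(size=T))                      # shared I(1) trend
Y = np.column_stack([trend + rng.normal(size=T),
                     0.5 * trend + rng.normal(size=T)])    # two cointegrated series

# k_ar_diff = number of lagged differences (k - 1); coint_rank = hypothesized r.
# The deterministic specification can be set via the `deterministic` argument.
res = VECM(Y, k_ar_diff=1, coint_rank=1).fit()

alpha, beta = res.alpha, res.beta      # adjustment speeds and cointegrating vectors
Pi = alpha @ beta.T                    # reduced-rank long-run matrix
print("alpha:\n", alpha)
print("beta:\n", beta)
print("Pi = alpha beta':\n", Pi)
print("Gamma_1 (short-run dynamics):\n", res.gamma)
```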
Cointegration Relations and Rank
In the vector error correction model (VECM), the cointegration matrix \Pi captures the long-run equilibrium relationships among the variables and is expressed as \Pi = \alpha \beta', where \alpha and \beta are p \times r matrices of full column rank and r denotes the cointegration rank, the number of linearly independent cointegrating relations.[8] The rank of \Pi is reduced (with 0 < r < p) when the variables are integrated of order 1 but share stationary linear combinations, ensuring that the process is not explosive and allowing for error correction dynamics.[13] The columns of \beta form the cointegrating vectors, spanning the space of stationary linear combinations \beta' Y_t, each of which is integrated of order 0 despite the individual variables Y_t being I(1).[8] These vectors define the long-run equilibria, while \alpha contains the adjustment speeds toward those equilibria. To characterize the non-stationary components, the orthogonal complements \alpha_\perp and \beta_\perp, p \times (p - r) matrices satisfying \alpha' \alpha_\perp = 0 and \beta' \beta_\perp = 0, span the directions of the I(1) common trends driving the system.[13] The determination of r proceeds by sequential hypothesis testing: for m = 0, 1, \dots, p-1, the null H_0: r \leq m is tested against H_1: r > m (trace test), or H_0: r = m against H_1: r = m+1 (maximum eigenvalue test), and the procedure stops at the smallest m for which the null is not rejected, which is taken as the estimated rank.[8] This approach identifies the dimension of the cointegrating space. Interpretation at the boundaries is as follows: if r = 0, no cointegration exists and the system reduces to a vector autoregression in first differences without long-run relations; if r = p, all variables are stationary and the model reduces to a VAR in levels.[13]
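A minimal sketch of this sequential determination, assuming simulated three-variable data and statsmodels' select_coint_rank helper, is shown below; with two independent common trends driving three series, the procedure should typically settle on r = 1.

```python
# Sketch of sequential rank determination with the trace test.
import numpy as np
from statsmodels.tsa.vector_ar.vecm import select_coint_rank

rng = np.random.default_rng(1)
T = 500
trend1 = np.cumsum(rng.normal(size=T))     # two independent I(1) common trends
trend2 = np.cumsum(rng.normal(size=T))
Y = np.column_stack([trend1 + rng.normal(size=T),
                     trend2 + rng.normal(size=T),
                     trend1 + trend2 + rng.normal(size=T)])   # p = 3, so r = 1

# det_order=0 includes a constant; method="trace" runs the sequential trace tests
# H0: r <= m against H1: r > m for m = 0, 1, ... until the first non-rejection.
rank_res = select_coint_rank(Y, det_order=0, k_ar_diff=1, method="trace", signif=0.05)
print("estimated cointegration rank:", rank_res.rank)
```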
Test Statistics
Trace Test
The trace test, one of the two main likelihood ratio tests in the Johansen framework, examines the null hypothesis that the cointegration rank of the matrix \Pi in the vector error correction model (VECM) is at most r, denoted H_0: \operatorname{rank}(\Pi) \leq r, against the alternative that the rank exceeds r, H_1: \operatorname{rank}(\Pi) > r.[8] The test statistic is \lambda_{\text{trace}}(r) = -T \sum_{i=r+1}^{p} \log(1 - \hat{\lambda}_i), where T is the sample size, p is the number of variables in the system, and \hat{\lambda}_i (with i = 1, \dots, p and \hat{\lambda}_1 \geq \cdots \geq \hat{\lambda}_p) are the estimated eigenvalues obtained from maximum likelihood estimation of the VECM.[8] The statistic arises from comparing the likelihood of the unrestricted VECM with that of the restricted model under the null hypothesis, capturing the joint contribution of the smaller eigenvalues to deviations from the null.[8] To determine the cointegration rank, the trace test is applied sequentially starting from r = 0: the null H_0: \operatorname{rank}(\Pi) \leq 0 (no cointegration) is tested against H_1: \operatorname{rank}(\Pi) > 0; if rejected, the procedure advances to r = 1 (H_0: \operatorname{rank}(\Pi) \leq 1 vs. H_1: \operatorname{rank}(\Pi) > 1), and so on, until the null is not rejected at the chosen significance level; the smallest r at which the null is not rejected is taken as the estimated rank.[8] This procedure exploits the nested nature of the hypotheses, providing a systematic way to identify the dimension of the cointegrating space without testing all possible ranks simultaneously.[8] A key advantage of the trace test is its ability to assess the presence of multiple cointegrating vectors in a single joint test against alternatives in which the true rank may substantially exceed the hypothesized value, making it suitable for systems with potentially rich long-run relationships.[14] Regarding power, the trace test generally performs better than the maximum eigenvalue test when the true cointegration rank is far from the null (higher ranks or stronger deviations), as it aggregates information across multiple eigenvalues, though it may exhibit size distortions in finite samples.[14] For instance, in an analysis of p = 3 economic time series, if the trace statistic rejects the null for r = 0 (indicating at least one cointegrating relation) but fails to reject for r = 1, the procedure concludes that the cointegration rank is 1, implying a single long-run equilibrium among the variables.[8] The trace test is often used alongside the maximum eigenvalue test to cross-validate rank estimates.[14]
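The following sketch, assuming simulated data and statsmodels' coint_johansen function, computes the trace statistics directly from the estimated eigenvalues using the formula above and compares them with the tabulated 95% critical values; the attribute lr1 of the result object holds the library's own trace statistics for comparison.

```python
# Sketch: trace statistics computed from the estimated eigenvalues.
import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(2)
T = 500
common = np.cumsum(rng.normal(size=T))                       # one common trend
Y = np.column_stack([common, 0.5 * common, 2.0 * common]) + rng.normal(size=(T, 3))

res = coint_johansen(Y, det_order=0, k_ar_diff=1)
lam = res.eig                          # estimated eigenvalues, largest first
p = Y.shape[1]
T_eff = T - 2                          # rough effective sample size after differencing and lags

for r in range(p):
    # lambda_trace(r) = -T * sum_{i=r+1}^{p} log(1 - lambda_i)
    trace_stat = -T_eff * np.sum(np.log(1.0 - lam[r:]))
    cv_95 = res.cvt[r, 1]              # 95% critical value for H0: rank <= r
    decision = "reject" if trace_stat > cv_95 else "do not reject"
    print(f"H0: rank <= {r}: trace = {trace_stat:8.2f}, 95% cv = {cv_95:6.2f} -> {decision}")
# res.lr1 contains the trace statistics as computed internally by the library.
```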
Maximum Eigenvalue Test
The maximum eigenvalue test, also known as the maximal eigenvalue test, is one of the two primary likelihood ratio statistics proposed by Johansen for determining the cointegration rank in vector autoregressive models. It tests the null hypothesis that the rank of the cointegration matrix Π equals r (i.e., there are exactly r cointegrating relations) against the alternative that the rank is r+1, by examining the largest remaining eigenvalue after accounting for the first r relations. The test statistic is \lambda_{\max}(r) = -T \log(1 - \hat{\lambda}_{r+1}), where T is the sample size and \hat{\lambda}_{r+1} is the (r+1)-th largest estimated eigenvalue of the matrix associated with the long-run parameters.[8] The procedure involves sequential testing, starting with r = 0 and increasing r until the null hypothesis is not rejected, similar to the trace test but focused on each incremental change in rank. This narrower focus provides greater precision in identifying the point of transition from non-cointegration to cointegration, making it particularly useful for pinpointing the exact number of stable relations in the system.[2] Compared with the trace test, which assesses the joint significance of all remaining eigenvalues, the maximum eigenvalue test is sharper for determining the exact cointegration rank, since it isolates the contribution of the next largest eigenvalue. It has been reported to offer better size control and higher power for alternatives close to the null hypothesis and in finite samples, where the trace test may suffer from greater distortions.[14] In interpretation, rejection of the null at a given r indicates at least r+1 cointegrating relations, prompting continuation of the sequential procedure; non-rejection at that r establishes it as the estimated cointegration rank, beyond which no further relations are found. The emphasis on the significance of individual eigenvalues enhances its utility for applications requiring accurate rank estimation in economic time series analysis.[8]
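A companion sketch, again on simulated data, reads the \lambda_{\max} statistics and matching critical values from coint_johansen (attributes lr2 and cvm) and walks through the sequential decision rule described above.

```python
# Sketch: sequential maximum eigenvalue testing, lambda_max(r) = -T*log(1 - lambda_{r+1}).
import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(3)
T = 500
common = np.cumsum(rng.normal(size=T))
Y = np.column_stack([common + rng.normal(size=T),
                     2.0 * common + rng.normal(size=T)])   # p = 2, true rank 1

res = coint_johansen(Y, det_order=0, k_ar_diff=1)
max_eig_stats = res.lr2            # one lambda_max statistic per null H0: rank = r
crit_95 = res.cvm[:, 1]            # matching 95% critical values

rank = 0
for r, (stat, cv) in enumerate(zip(max_eig_stats, crit_95)):
    print(f"H0: rank = {r} vs rank = {r + 1}: lambda_max = {stat:7.2f}, 95% cv = {cv:6.2f}")
    if stat > cv:
        rank = r + 1               # reject H0 and move to the next hypothesis
    else:
        break                      # first non-rejection fixes the estimated rank
print("estimated rank:", rank)
```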
Asymptotic Properties
Null Hypothesis Distribution
Under the null hypothesis of at most r cointegrating relations in a vector error correction model (VECM), the Johansen test statistics do not follow a standard chi-squared distribution, because of the non-stationarity induced by unit roots in the underlying vector autoregressive process. Instead, they have non-standard asymptotic distributions that are functionals of multidimensional stochastic processes, specifically Brownian motions, arising from the integration and cointegration properties of the system.[8] Johansen (1991) establishes that, under the null hypothesis, both the trace and maximum eigenvalue statistics converge in distribution to expressions built from Brownian motions, possibly with drift terms depending on the model specification. These limiting distributions account for the spurious-regression phenomena inherent in non-stationary systems, ensuring that the tests maintain proper size asymptotically without requiring nuisance-parameter adjustments beyond the treatment of the deterministic components.[8] The form of these asymptotic distributions depends on how deterministic trends are treated in the VECM, leading to distinct cases: Case I assumes no constant term; Case II incorporates a restricted constant within the cointegrating space; and Case III allows an unrestricted constant in the model. Each case modifies the Brownian motion functionals, for example by introducing demeaning or drift components, and so yields a different limiting distribution for the test statistics.[15] Critical values for these distributions are obtained through Monte Carlo simulation and are tabulated by the number of common trends p - r and the chosen case, providing quantiles that approximate the non-standard limits for practical inference.[15] In the special case where the hypothesized cointegrating rank r equals the system dimension p (the stationary case with no unit roots), the test statistics are asymptotically chi-squared distributed, in line with standard likelihood ratio testing under full rank.[8]
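The following Monte Carlo sketch illustrates how such critical values can be approximated for the case without deterministic terms, by replacing the Brownian motion functional with long Gaussian random walks; the function name and simulation settings are illustrative assumptions, not a published algorithm.

```python
# Monte Carlo sketch of the non-standard limit of the trace statistic (no
# deterministic terms): tr{ (int dB B')(int B B' du)^{-1}(int B dB') } with an
# m-dimensional Brownian motion B, m = p - r, approximated by random walks.
import numpy as np

def simulate_trace_limit(m, n_steps=500, n_reps=10000, seed=0):
    rng = np.random.default_rng(seed)
    stats = np.empty(n_reps)
    for j in range(n_reps):
        e = rng.normal(size=(n_steps, m))                 # Gaussian increments dB
        B = np.cumsum(e, axis=0)                          # discretized Brownian motion
        B_lag = np.vstack([np.zeros((1, m)), B[:-1]])     # B at the previous step
        S01 = B_lag.T @ e                                 # approximates int B dB'
        S11 = B_lag.T @ B_lag                             # approximates int B B' du
        stats[j] = np.trace(S01.T @ np.linalg.solve(S11, S01))
    return stats

draws = simulate_trace_limit(m=2)
print("approximate 90/95/99% critical values for two common trends:",
      np.percentile(draws, [90, 95, 99]).round(2))
```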
Critical Values and Corrections
The Johansen test accommodates different specifications for the deterministic terms in the underlying vector error correction model (VECM), which influence the distribution of the test statistics and thus the appropriate critical values. These specifications, often denoted as cases I through V, account for the presence and location of intercepts and trends. Case I assumes no deterministic terms (no intercept or trend) in either the cointegrating equations or the VAR model. Case II includes an intercept restricted to the cointegrating equations but no trend. Case III allows an unrestricted intercept in the VAR but none in the cointegrating equations, with no trend. Case IV adds a linear trend restricted to the cointegrating equations alongside the unrestricted intercept, while Case V allows both an unrestricted intercept and an unrestricted linear trend in the VAR.

Critical values for the trace and maximum eigenvalue statistics under these cases are derived from the asymptotic distributions involving stochastic integrals and are tabulated in seminal works. Johansen and Juselius (1990) provide initial tables for both statistics across the various cases, based on simulations with 6,000 replications, covering dimensions up to four variables at the 90%, 95%, and 99% levels. Osterwald-Lenum (1992) extends these with more precisely estimated quantiles of the asymptotic distributions and larger system dimensions; these tables are widely implemented in econometric software for direct use during testing and ensure reliable inference under the non-standard distributions implied by the null hypothesis of no cointegration or reduced rank.

In finite samples, the asymptotic critical values often lead to oversized tests, particularly for the trace statistic, prompting the development of small-sample corrections. A widely used degrees-of-freedom adjustment, associated with Reinsel and Ahn, scales the trace and maximum eigenvalue statistics by the factor \frac{T - d k}{T}, where T is the sample size, d is the number of variables, and k is the lag length of the VAR; this scaling reduces size distortion in samples of roughly 50-100 observations. Johansen's Bartlett-type corrections, derived from higher-order asymptotic expansions, provide more refined adjustment factors for the rank test and for hypothesis tests on the cointegrating vectors. Such corrections are particularly effective when the true cointegrating rank is low and are routinely applied in empirical analyses to mitigate bias.[16]

As alternatives to analytical corrections, bootstrap methods offer improved finite-sample accuracy by resampling from the data. Parametric bootstraps simulate VECM residuals under the null hypothesis to generate empirical critical values, while non-parametric versions resample the estimated residuals to preserve the dependence structure; both approaches reduce size distortions more effectively than asymptotic values in samples under 200 observations, especially in the presence of near-unit roots or structural breaks. Studies show that bootstrapped p-values yield rejection rates closer to nominal levels across the various cases, making them suitable for robust inference when model misspecification is suspected.
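A minimal sketch of the degrees-of-freedom scaling discussed above, applied to trace statistics from statsmodels' coint_johansen on an arbitrarily simulated small sample; the lag length and data-generating choices are assumptions for illustration only.

```python
# Sketch: scaling the trace statistics by (T - d*k)/T in a small sample.
import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(4)
T = 80                                     # deliberately small sample
common = np.cumsum(rng.normal(size=T))
Y = np.column_stack([common + rng.normal(size=T),
                     common + rng.normal(size=T),
                     np.cumsum(rng.normal(size=T))])      # third series: independent I(1)

k = 2                                      # lag length of the VAR in levels
d = Y.shape[1]                             # number of variables
res = coint_johansen(Y, det_order=0, k_ar_diff=k - 1)

factor = (T - d * k) / T                   # small-sample correction factor
print("uncorrected trace:  ", res.lr1.round(2))
print("corrected trace:    ", (factor * res.lr1).round(2))
print("95% critical values:", res.cvt[:, 1].round(2))
```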
The choice of case significantly affects test power and the validity of inference, since misspecification of the deterministic terms can inflate or deflate rejection probabilities. Guidelines recommend starting with Case III for economic time series lacking clear trends, moving to Case IV or V if unit root tests on residuals indicate non-stationarity after de-trending; visual inspection of data plots and auxiliary tests for trends in the levels versus the differences help select a parsimonious specification that aligns with economic theory and avoids over-parameterization.
Implementation
Estimation Procedure
The estimation procedure for the Johansen test begins with pre-testing the time series to confirm that they are integrated of order one, I(1), which is a prerequisite for cointegration analysis. This involves applying unit root tests, such as the Augmented Dickey-Fuller (ADF) test, to the levels of the series to check for non-stationarity and to the first differences to verify stationarity.[2] If the series are not I(1), the Johansen framework does not apply, since cointegration requires non-stationary but linearly combinable series.[8]

Next, the lag order k of the underlying vector autoregression (VAR) in levels is specified by estimating unrestricted VAR models of varying lag lengths and selecting the order that minimizes information criteria such as the Akaike Information Criterion (AIC), Schwarz Information Criterion (SIC), or Hannan-Quinn Information Criterion (HQIC). This step ensures that the model captures the short-run dynamics adequately without overfitting.[17]

With the lag order determined, the VAR is reparameterized into its vector error correction model (VECM) form and the parameters are estimated by maximum likelihood (MLE) under the reduced rank restriction on the long-run matrix \Pi = \alpha \beta'. This uses a reduced rank regression: first, residuals are obtained from regressing the differenced variables \Delta X_t and the lagged levels X_{t-1} on the lagged differences, concentrating the likelihood; the cointegrating parameters are then found from the eigenvalues of the resulting cross-product matrices. Ordinary least squares (OLS) can be applied to the differenced equations for initial unrestricted estimation, but MLE is required for imposing the rank restriction and jointly estimating the adjustment speeds \alpha and cointegrating vectors \beta.[2][8] The eigenvalues \hat{\lambda}_i are computed from the eigenvalue decomposition of the matrix derived from these cross-products and ordered in descending magnitude; they quantify the strength of the cointegrating relationships, and the number of significant eigenvalues informs the cointegration rank r.[17]

To determine r, sequential testing is applied using either the trace test or the maximum eigenvalue test, starting from the null hypothesis of rank zero and proceeding upward until the test fails to reject, under the appropriate deterministic case (e.g., no intercept, intercept in the cointegrating relations, or linear trend). This yields the estimated cointegration rank r.[2][8] Post-estimation, the cointegrating vectors \beta are normalized by setting one coefficient to unity (typically on a variable of interest) so that the relations can be interpreted economically, and if theoretical restrictions are imposed on \beta, they are tested using likelihood ratio statistics to assess the overidentifying constraints.[17]
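The steps above can be strung together in a short workflow sketch, assuming simulated data and the statsmodels functions adfuller, select_order, select_coint_rank, and VECM; the deterministic specification and lag bounds are illustrative assumptions rather than recommendations.

```python
# End-to-end sketch: pre-test for I(1), choose lags, test the rank, fit the VECM.
import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.vector_ar.vecm import VECM, select_coint_rank, select_order

rng = np.random.default_rng(5)
T = 400
common = np.cumsum(rng.normal(size=T))
Y = np.column_stack([common + rng.normal(size=T),
                     0.5 * common + rng.normal(size=T)])

# 1. Pre-test: each series should look I(1) (unit root in levels, none in differences).
for i in range(Y.shape[1]):
    p_level = adfuller(Y[:, i])[1]
    p_diff = adfuller(np.diff(Y[:, i]))[1]
    print(f"series {i}: ADF p-value in levels = {p_level:.3f}, in differences = {p_diff:.3f}")

# 2. Lag selection for the VECM (number of lagged differences, k - 1).
k_ar_diff = select_order(Y, maxlags=8).aic

# 3. Cointegration rank via sequential trace tests (typically r = 1 for this system).
r = select_coint_rank(Y, det_order=0, k_ar_diff=k_ar_diff, method="trace").rank

# 4. Fit the VECM under the selected rank and normalize the cointegrating vectors.
fit = VECM(Y, k_ar_diff=k_ar_diff, coint_rank=r, deterministic="co").fit()
beta_normalized = fit.beta / fit.beta[0]   # coefficient on the first variable set to 1
print("estimated rank:", r)
print("normalized beta:", beta_normalized.ravel().round(3))
print("alpha (adjustment speeds):", fit.alpha.ravel().round(3))
```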
Software Tools
The Johansen test is implemented in various econometric software packages, enabling researchers to perform cointegration analysis on multivariate time series data. These tools typically support both the trace and maximum eigenvalue statistics, with options for specifying lag lengths, deterministic trends, and model cases (e.g., no intercept, intercept in the cointegrating equations).[18] In R, the urca package provides comprehensive functionality for the Johansen procedure through the ca.jo() function, which conducts the test on a VAR model and reports trace and eigenvalue statistics along with eigenvectors and loading factors; it supports all standard cases including unrestricted intercepts and trends.[19] The package also includes cajorls() for estimating the VECM based on the test results, offering restricted least-squares estimation and diagnostic outputs.[20] As of version 1.3-4 (released May 2024), urca remains a standard open-source option for reproducible cointegration analysis in R.[19]
Python's statsmodels library implements the Johansen test via the coint_johansen() function in the statsmodels.tsa.vector_ar.vecm module, which returns a JohansenTestResult object containing the trace and maximum eigenvalue statistics together with their critical values (at the 90%, 95%, and 99% levels) for rank determination. This integrates with pandas DataFrames for data input and manipulation, facilitating preprocessing of time series such as differencing or lag selection.[21] Version 0.14.4 (released July 2025) includes enhancements to the vector autoregression modules, improving numerical stability for large datasets.[21]
MATLAB's Econometrics Toolbox features the jcitest() function for the Johansen cointegration test, which tests multiple ranks sequentially and outputs decisions, statistics, and maximum likelihood estimates of the VECM parameters.[18] It allows customization of the lag length (via the Lags parameter) and of the deterministic components (via the model specification parameter), and it supports table or timetable inputs for flexible data handling.[18]
Commercial packages such as EViews and Stata offer built-in commands tailored to econometric workflows. In EViews 14, the Johansen test is accessed through the cointegration estimation menu, with enhancements allowing finer control over deterministic trends and exogenous variables in the long-run relations, such as restricting them to the cointegrating vectors only.[22] Stata's vecrank command performs the Johansen rank test using the trace and maximum eigenvalue statistics, with options for lag specification, trend inclusion, and information criteria such as the BIC for model selection; it integrates with vec for subsequent VECM estimation.[23]
Further options include OxMetrics, whose PcGive module supports the full Johansen procedure for I(1) and I(2) cointegration testing in VAR models, including instability and normality diagnostics.[24] The open-source package GRETL provides the Johansen cointegration test in both GUI and script modes, with support for small-sample corrections via add-on packages such as johansensmall.[25] These tools emphasize reproducibility, with GRETL's scripting enabling automated simulations.[26]
Recent releases of these packages, such as statsmodels 0.14.4 and EViews 14, have focused on expanded model specifications and computational efficiency, although dedicated parallel-computing support for Johansen simulations remains uncommon as of late 2025.[21][27] Users should consult package changelogs when planning large-scale applications.