GEH statistic

The GEH statistic, named after transport planner Geoffrey E. Havers, is an empirical measure developed in the 1970s in the United Kingdom for traffic engineering. It evaluates the goodness-of-fit between observed field data and modeled estimates of traffic volumes, particularly in the calibration and validation of microsimulation models. It provides a single index that balances absolute and relative differences, making it suitable for comparing flows across varying magnitudes without undue bias toward high-volume links.

The formula for the GEH statistic is \text{GEH} = \sqrt{\frac{2(E - V)^2}{E + V}}, where E represents the model-estimated volume and V the observed field count, typically in vehicles per hour. This expression, mathematically akin to a chi-squared statistic but empirically derived for traffic data, yields values closer to zero for better agreement. For a given percentage discrepancy it produces larger values at higher flows, which allows broader relative tolerances at low volumes.

In practice, the GEH statistic is applied to individual links, ramps, and aggregate flows in freeway and arterial network models, with acceptability thresholds varying by jurisdiction. Examples include GEH < 5 for at least 85% of links and GEH < 4 for total flows in U.S. guidelines, or GEH < 3 for state facilities under some state-level criteria. While widely adopted in software like VISSIM and Aimsun for its simplicity and robustness, it has faced criticism for non-linearity and insensitivity to certain error types, prompting alternatives like the SQV statistic.

Background

Origin and History

The GEH statistic derives its name from Geoffrey E. Havers, a transport planner who developed it in the 1970s while working for the Greater London Council in England. Havers proposed the measure as a heuristic tool tailored to the challenges of traffic engineering in urban settings, particularly in London, where accurate comparisons between real-world data and predictive models were essential for infrastructure planning.

The initial purpose of the GEH statistic was to evaluate the goodness-of-fit between observed volumes and those generated by models, under the assumption that traffic flows follow a Poisson-like distribution where variance approximates the mean. This approach addressed limitations in traditional metrics like percentage differences, which could be misleading for low-volume links, by incorporating both relative and absolute discrepancies in a balanced manner suitable for hourly flow data.

Early adoption of the GEH statistic occurred through its integration into official guidelines by the Highways Agency, which included it in the Design Manual for Roads and Bridges (DMRB) for validating highway assignment models. This endorsement helped standardize its use across the United Kingdom, establishing it as a common measure for model validation in highway projects.

Mathematical Formulation

The GEH statistic is mathematically formulated as \text{GEH} = \sqrt{\frac{2(M - C)^2}{M + C}}, where M denotes the modeled flow and C the observed (counted) flow, both typically expressed in vehicles per hour. This expression takes the squared difference (M - C)^2 to quantify deviation and normalizes it by the sum M + C in the denominator, which balances absolute and relative discrepancies between the two values. The square root provides a scale akin to a standard deviation. The factor of 2 means that the squared difference is in effect divided by the mean of the two flows, (M + C)/2, a normalization associated with the index of dispersion under the assumption that traffic flows follow a quasi-Poisson distribution where variance is proportional to the mean. When applied to hourly flows in vehicles per hour, the resulting GEH value has units of (\text{vehicles per hour})^{1/2}, so it is not strictly unitless and its thresholds are tied to the hourly units in which the flows are expressed.
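The formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `geh`, the handling of two zero flows, and the example flows are assumptions made here.

```python
import math


def geh(modeled: float, counted: float) -> float:
    """GEH statistic for one pair of hourly flows (vehicles per hour)."""
    if modeled < 0 or counted < 0:
        raise ValueError("flows must be non-negative")
    total = modeled + counted
    if total == 0:
        # Assumption: two zero flows are treated as a perfect match.
        return 0.0
    return math.sqrt(2.0 * (modeled - counted) ** 2 / total)


# Hypothetical flows: the same 10% overestimate at two volume levels.
low = geh(110, 100)
high = geh(1100, 1000)
print(round(low, 2), round(high, 2))  # 0.98 3.09
```

Multiplying both flows by 10 multiplies the GEH value by the square root of 10, which is the square-root unit dependence described above.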

Applications and Interpretation

Primary Uses in Traffic Modeling

The GEH statistic serves as a key goodness-of-fit measure in traffic modeling, enabling practitioners to evaluate the alignment between modeled and observed volumes by balancing absolute and relative differences. It is particularly valuable in scenarios requiring precise comparisons of traffic data sets, supporting the refinement and reliability of models used in transportation planning and engineering.

One primary application involves comparing manual traffic counts with automated counts to ensure data consistency across collection methods. For instance, in video-based detection systems, GEH quantifies the accuracy of algorithm-derived volumes against manual observations, with values below 5 indicating strong agreement and validating the automated approach for practical deployment in traffic monitoring.

GEH is routinely employed to validate travel demand forecasting models against base-year observed data, assessing how well simulated traffic patterns replicate historical counts at key locations. This helps confirm the model's ability to represent real-world travel behavior before applying it to future scenarios, such as infrastructure impact assessments.

In traffic simulation models, GEH facilitates parameter adjustment during calibration, as outlined in guidelines for analytical travel forecasting. By iteratively minimizing GEH values between simulated and field-measured flows, modelers refine inputs like origin-destination matrices and route choices to achieve acceptable replication of traffic conditions.

Additionally, GEH assesses flow accuracy in highway assignment models, particularly at individual links or screenlines, to gauge the overall quality of traffic distribution outputs. This use supports the evaluation of assignment procedures in large-scale networks, ensuring modeled link volumes align closely with observed data from automatic traffic counters.
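The idea of calibrating by minimizing GEH can be illustrated with a deliberately simplified sketch. A real calibration adjusts origin-destination matrices and route choice inside a simulation. Here a single hypothetical demand scale factor stands in for the model, and the flows, counts, candidate grid, and the choice of summed GEH as the objective are all assumptions made for illustration.

```python
import math


def geh(modeled, counted):
    total = modeled + counted
    return 0.0 if total == 0 else math.sqrt(2.0 * (modeled - counted) ** 2 / total)


def total_geh(scale, base_flows, counts):
    """Sum of link GEH values when every base flow is multiplied by `scale`."""
    return sum(geh(scale * b, c) for b, c in zip(base_flows, counts))


# Hypothetical uncalibrated model flows and field counts (vehicles per hour).
base_flows = [400, 900, 1500]
counts = [480, 1080, 1800]

# Grid search over candidate scale factors 0.80, 0.85, ..., 1.40.
candidates = [s / 100 for s in range(80, 145, 5)]
best = min(candidates, key=lambda s: total_geh(s, base_flows, counts))
print(best)
```

In this toy data set the counts are exactly 1.2 times the base flows, so the search settles on a scale factor of 1.2, where the summed GEH is essentially zero.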

Thresholds and Quality Criteria

The GEH statistic is interpreted through standardized thresholds that classify the agreement between modeled and observed traffic volumes. A GEH value below 5.0 is generally considered indicative of a good match, values between 5.0 and 10.0 suggest that further investigation is needed to identify potential discrepancies, and values exceeding 10.0 signal poor performance requiring model adjustments.

In practice, the GEH is typically applied to hourly traffic flows, such as peak-hour volumes in vehicles per hour, to ensure comparability with observed counts. Aggregate quality is often assessed by calculating the percentage of individual links or observation points meeting the thresholds, with criteria such as at least 85% of links achieving GEH < 5.0 commonly used for overall model validation.

Adjustments to these thresholds may be applied in context-specific scenarios, particularly for low-volume links, where the formula inherently allows higher percentage errors due to greater relative variability in counts, as the statistic weights absolute and relative differences to accommodate Poisson-like distributions in sparse data.
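The thresholds and the 85% criterion can be applied as in the following sketch. The link flows are hypothetical, and the function names and category labels are assumptions chosen here to mirror the wording above.

```python
import math


def geh(modeled, counted):
    total = modeled + counted
    return 0.0 if total == 0 else math.sqrt(2.0 * (modeled - counted) ** 2 / total)


def classify(value):
    """Map a GEH value to the three bands described in the text."""
    if value < 5.0:
        return "good"
    if value <= 10.0:
        return "investigate"
    return "poor"


def share_below(pairs, threshold=5.0):
    """Fraction of (modeled, counted) pairs with GEH below the threshold."""
    values = [geh(m, c) for m, c in pairs]
    return sum(v < threshold for v in values) / len(values)


# Hypothetical (modeled, counted) hourly flows on five links.
links = [(520, 500), (1480, 1600), (95, 60), (2300, 2250), (800, 1150)]

for m, c in links:
    print(m, c, round(geh(m, c), 2), classify(geh(m, c)))

share = share_below(links)
print(share, "passes 85% criterion:", share >= 0.85)
```

Four of the five hypothetical links fall below 5.0, so the share is 0.8 and this example network would not meet an 85% criterion.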

Limitations

Key Criticisms

One key practical drawback of the GEH statistic is its dependency on the magnitude of traffic volumes, which imposes stricter relative tolerances on higher absolute flows compared to lower ones. This scale sensitivity arises because the formula balances absolute and relative errors in a way that requires smaller percentage discrepancies for large-volume links to achieve acceptable GEH values, rendering it less suitable for direct comparisons between datasets with differing scales, such as hourly versus daily traffic flows.

Although the formula itself is symmetric in the modeled and observed values, the GEH statistic can behave asymmetrically in practice in contexts influenced by capacity constraints, where over-predictions and under-predictions may not be penalized equivalently due to underlying distributional skews in counts. This can introduce bias in model validation, as the metric may undervalue errors in congested scenarios where volumes are censored by capacity, leading to uneven assessments of model performance across diverse network conditions.

A notable lack of standardization in GEH application stems from the absence of universal guidelines for aggregating results across spatial networks or temporal periods, such as screenlines or multi-hour totals. While thresholds like GEH < 5 are commonly applied to individual links, extending the metric to aggregated levels often yields inconsistent variance patterns, complicating network-wide evaluations and potentially misrepresenting overall model fit.

Furthermore, the GEH statistic is lenient in low-flow conditions, permitting larger percentage errors at sparse volumes. This aligns with Poisson-distributed counts on lightly loaded roads but can mask significant relative discrepancies in data-scarce areas. The leniency may obscure validation issues in rural or off-peak scenarios, where even modest absolute errors could have disproportionate impacts on decisions.
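The scale dependence can be made concrete with a short sketch. It solves the GEH formula for the largest overestimate that still meets a threshold and then compares the same percentage error on an hourly and a daily flow. The helper `max_overestimate` and all flow values are assumptions made here for illustration.

```python
import math


def geh(modeled, counted):
    total = modeled + counted
    return 0.0 if total == 0 else math.sqrt(2.0 * (modeled - counted) ** 2 / total)


def max_overestimate(counted, threshold=5.0):
    """Largest d = M - C (with M > C) for which GEH equals the threshold.

    Solves 2*d**2 / (2*C + d) = t**2, i.e. 2*d**2 - t**2*d - 2*t**2*C = 0.
    """
    t2 = threshold ** 2
    return (t2 + math.sqrt(t2 ** 2 + 16.0 * t2 * counted)) / 4.0


# Allowed percentage overestimate at GEH = 5 for three hypothetical counts.
for count in (100, 1000, 10000):
    d = max_overestimate(count)
    print(count, f"{100 * d / count:.1f}%")

# The same 5% error on an hourly flow and on a hypothetical daily total.
hourly = geh(1050, 1000)
daily = geh(21000, 20000)
print(round(hourly, 2), round(daily, 2))
```

The allowed overestimate shrinks from roughly 57% at 100 vehicles to roughly 16% at 1,000 and roughly 5% at 10,000. The 5% error that passes comfortably on the hourly flow exceeds 5.0 on the daily total, which is why hourly thresholds do not carry over to daily flows.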

Statistical Shortcomings

The GEH statistic lacks a formal statistical derivation, positioning it as an empirical heuristic rather than a rigorous hypothesis test, despite its superficial resemblance to established metrics like the chi-squared statistic under a Poisson assumption. Although rooted in the index of dispersion, which assumes traffic counts follow a Poisson-like process where variance equals the mean, the GEH does not derive from a systematic probabilistic framework, leaving its thresholds (such as 5.0 for acceptable fit) as guidelines without underlying justification from first principles. This absence of formal grounding means the metric cannot generate p-values or confidence intervals, depriving users of a probabilistic interpretation to assess the significance of discrepancies between observed and modeled volumes.

A key theoretical flaw in the GEH is its failure to account for the variance structure inherent in traffic count data, including variations due to sample size, road type, time of day, or congestion levels. The metric implicitly assumes a fixed proportionality between variance and mean (often six times the mean for links), but empirical analyses of automatic traffic counter (ATC) datasets reveal variance-to-mean ratios as high as 17, indicating that this assumption frequently does not hold and leads to unreliable assessments of model fit. Consequently, GEH treats all observations equally regardless of their statistical reliability, overlooking how smaller sample sizes or higher variability in urban or congested settings inflate errors without adjustment for standard deviation.

The nonlinear scaling of the GEH formula exacerbates these issues by producing inconsistent sensitivity across different volume ranges, where the same absolute or relative difference yields varying GEH values depending on the scale of flows. For a given absolute difference, low-volume links produce higher GEH values than high-volume links, whereas for a given percentage difference the reverse holds, complicating the distinction between meaningful and negligible deviations without scale-specific normalization. This nonlinearity lacks a probabilistic basis, as the metric does not provide a consistent scale for significance akin to standardized tests, resulting in interpretations that vary by context.

Furthermore, the GEH's design poses challenges for aggregation and cross-dataset comparisons, as it evaluates individual link or observation pairs without inherent normalization for differing scales or variances, making it difficult to combine results into network-level metrics or benchmark against diverse datasets. Screenline totals, for example, often yield misleadingly low GEH values due to aggregated high means overshadowing individual variances, hindering reliable benchmarking across models or regions. Without adjustments for these structural inconsistencies, the metric's utility in broader statistical validation remains limited, underscoring the need for theoretically sound alternatives in traffic modeling.
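The effect of the variance assumption on what a threshold means can be sketched with a back-of-the-envelope calculation. It assumes, for illustration only, that the count variance equals a constant ratio k times the mean flow and that the mean flow is taken as (M + C)/2. Under those assumptions the deviation |M - C| equals GEH times the square root of the mean, so it corresponds to GEH / sqrt(k) standard deviations. The function name is hypothetical and the result is not a formal test.

```python
import math


def sigma_multiple(geh_value, variance_to_mean):
    """Approximate number of standard deviations implied by a GEH value.

    Assumes variance = variance_to_mean * mean flow, with the mean flow
    taken as (M + C) / 2. Illustrative only, not a hypothesis test.
    """
    if variance_to_mean <= 0:
        raise ValueError("variance-to-mean ratio must be positive")
    return geh_value / math.sqrt(variance_to_mean)


# Ratios mentioned in the text: 1 (pure Poisson), 6, and 17.
for k in (1, 6, 17):
    print(k, round(sigma_multiple(5.0, k), 2))
```

Under these assumptions a GEH of 5 represents about five standard deviations for pure Poisson counts, about two when the variance is six times the mean, and only about 1.2 when the ratio is 17. The same threshold therefore carries very different statistical weight depending on how variable the counts actually are.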

Alternatives

SQV Statistic Overview

The SQV (Scalable Quality Value) statistic was introduced in 2019 by Markus Friedrich and colleagues as a scalable measure designed to validate travel demand models by comparing observed and modeled single values, such as traffic volumes or trip distances. This development addressed the need for a metric adaptable to varying data magnitudes, extending beyond traditional measures limited to specific scales like hourly traffic counts.

The SQV is mathematically defined as \text{SQV} = \frac{1}{1 + \sqrt{\frac{(M - C)^2}{f \cdot C}}}, where M represents the modeled value, C the observed (counted) value, and f a scaling factor calibrated to the expected magnitude of the indicator being evaluated. For example, f = 1,000 is typically used for hourly traffic volumes, while smaller values like f = 1 apply to person trips per day. The formula produces unitless outputs ranging from 0, indicating a poor match between modeled and observed values, to 1, signifying a perfect match. The scaling factor f ensures applicability across diverse indicators, such as traffic volumes, trip durations, or distances, by normalizing the comparison to the data's inherent scale.

As a variant of the chi-squared statistic, SQV balances absolute and relative errors in model validation, providing a more flexible assessment than non-scalable alternatives while maintaining interpretability for single-value comparisons. This foundation allows it to quantify deviations in a manner sensitive to both the magnitude of differences and their proportionality to observed counts.
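A minimal Python sketch of the SQV formula follows. The function name `sqv`, the input checks, and the example flows are assumptions made here. The value f = 1,000 for hourly volumes is the one given in the text.

```python
import math


def sqv(modeled: float, counted: float, f: float) -> float:
    """Scalable Quality Value for one modeled/observed pair."""
    if counted <= 0 or f <= 0:
        # The formula divides by f * counted, so both must be positive.
        raise ValueError("counted and f must be positive")
    return 1.0 / (1.0 + math.sqrt((modeled - counted) ** 2 / (f * counted)))


# Hypothetical hourly flows with f = 1,000.
print(sqv(1000, 1000, f=1000))            # 1.0 for a perfect match
print(round(sqv(1100, 1000, f=1000), 3))  # 0.909
print(round(sqv(1000, 1100, f=1000), 3))  # 0.913
```

The last two calls show that swapping the modeled and observed values changes the result slightly, because only the observed value C appears in the denominator.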

Advantages and Enhancements of SQV

The Scalable Quality Value (SQV) extends its utility across various fields in transport planning and travel demand modeling, including the validation of hourly and daily traffic volumes, trip distances, and mode shares. By incorporating a scaling factor f, SQV adapts to different data magnitudes; for instance, f = 1 is suitable for distances measured in kilometers, ensuring consistent application without distortion from units or scales. This flexibility allows SQV to evaluate model performance in diverse contexts, such as comparing simulated versus observed volumes at counting stations or network-wide flow distributions.

SQV establishes clear quality categories to guide model refinement: values ≥ 0.85 indicate a good match between observed and modeled data, while values ≥ 0.90 signify a very good fit; lower values suggest areas requiring adjustments. Unlike the GEH statistic, whose fixed thresholds make it overly sensitive to differences at higher flows, SQV addresses this through the scaling factor f, which sets the tolerance relative to the expected magnitude of the indicator. Its chi-squared-like form weights errors by the size of the observed value, so observational variance is accounted for only implicitly via the scaling rather than through an explicit chi-squared derivation.

Further enhancements of SQV include its scalability across levels of aggregation, enabling transitions from single-link validations to broader network-level analyses in large-scale simulations. This property supports iterative model improvements in tools like MATSim, where SQV facilitates calibration by quantifying fit across aggregated metrics such as total daily trips or modal splits. Overall, these advantages position SQV as a robust, adaptable metric that addresses key limitations of traditional measures like GEH, promoting more accurate traffic forecasting.
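The role of the scaling factor and the quality categories can be sketched as follows. The category label "needs adjustment" and the flows are assumptions made here. The value f = 10,000 for daily volumes is also an assumption chosen for illustration, not a value stated in the text.

```python
import math


def sqv(modeled, counted, f):
    if counted <= 0 or f <= 0:
        raise ValueError("counted and f must be positive")
    return 1.0 / (1.0 + math.sqrt((modeled - counted) ** 2 / (f * counted)))


def sqv_category(value):
    """Quality bands from the text: >= 0.90 very good, >= 0.85 good."""
    if value >= 0.90:
        return "very good"
    if value >= 0.85:
        return "good"
    return "needs adjustment"


# A 10% overestimate on an hourly flow with f = 1,000 ...
hourly = sqv(1100, 1000, f=1_000)
# ... and on a hypothetical daily flow with an assumed f = 10,000.
daily = sqv(11_000, 10_000, f=10_000)
# The same daily flow scored with the hourly factor instead.
daily_wrong_f = sqv(11_000, 10_000, f=1_000)

for value in (hourly, daily, daily_wrong_f):
    print(round(value, 3), sqv_category(value))
```

With f scaled to the magnitude of the indicator, the 10% error scores the same on both flows. Reusing the hourly factor on the daily flow lowers the score to about 0.76, so the choice of f matters as much as the formula itself.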

References

  1. [1]
    13.5.2.3 Throughput - Texas Department of Transportation
    The GEH statistic is typically calculated for mainline segments and ramps. ... Table 13-4: GEH Statistic Guidelines. GEH Statistics. Guidance. < 3.0.
  2. [2]
    5.0 Calibration of Microsimulation Models - FHWA Office of Operations
    GEH equals the square root of the sum of E minus V, (Equation 4). where: E = model estimated volume. V = field count. 5.7 Example Problem: Model Calibration.
  3. [3]
    [PDF] The GEH measure and quality of the highway assignment models
    The GEH statistic1 is used to represents goodness-of-fit of a model. It takes into account both the absolute difference and the percentage difference between ...
  4. [4]
    [PDF] VISSIM CALIBRATION AND VALIDATION - WSdot.com
    Aug 28, 2006 · The GEH statistic is a formula used in traffic engineering, traffic forecasting, and traffic modeling to compare two sets of traffic volumes.
  5. [5]
    [PDF] TRAFFIC APPRAISAL IN URBAN AREAS - National Highways
    May 4, 1996 · The GEH statistic: \text{GEH} = \sqrt{\frac{(M - C)^2}{(M + C)/2}} where: GEH is the GEH statistic, M is the modelled flow, and C is the observed flow, is a form of ...
  6. [6]
    Detection of vehicles and analysis of traffic volume by real-time ...
    Aug 14, 2025 · The system checks accuracy through manual counting methods combined with statistical indicators like GEH and MAPE.
  7. [7]
    [PDF] NCHRP Report 765 – Analytical Travel Forecasting Approaches for ...
    NCHRP Report 765 covers analytical travel forecasting approaches for project-level planning and design, sponsored by AASHTO and FHWA.
  8. [8]
    Statistical Methods for Model Validation - Aimsun Next Users Manual
    GEH¶. The GEH statistic is used to compare traffic volumes. Its name is derived from its inventor Geoffrey E. Havers and is used as an acceptance criterion ...
  9. [9]
    5. Model development
    Statistical models use sample data to develop mathematical relationships between factors. ... GEH statistic. The GEH statistic, a form of Chi-squared ...
  10. [10]
    [PDF] The Calibration of Traffic Simulation Models
    (2007), GEH is used as a fitness function for a traffic model calibration by summing all the values obtained from all the pairs of measured and observed data.
  11. [11]
    Scalable GEH: A Quality Measure for Comparing Observed and ...
    Mar 28, 2019 · The paper focuses on comparing pairs of single values with specific reference to the quality measure GEH (named after Geoffrey E. Havers, who introduced it for ...
  12. [12]
    [PDF] Modelling Cycling Flow for the estimation of cycling risk at a meso ...
    The GEH has limitations; it does not take account of the variability of the count data and typically uses peak hourly flows to determine 'goodness of fit' ( ...
  13. [13]
  14. [14]
    [PDF] Planning and Visual Tools for an optimal linking of On-demand ...
    ... SQV, see (Friedrich et al.,. 2019)). The SQV ranges between 0 (no match) and 1 (perfect match), where is excellent for. SQV≥0.9 and sufficient for SQV≥0.7.
  15. [15]
    Creating and Validating Hybrid Large-Scale, Multi-Modal Traffic ...
    ... limitations of the traditional GEH statistic method, which only allows for comparisons of hourly indicators. The proposed validation methods facilitated the ...