Semi-global matching
Semi-global matching (SGM) is a computer vision algorithm designed for dense stereo matching, which estimates disparity maps from pairs of rectified stereo images to enable 3D reconstruction via triangulation.[1] Introduced by Heiko Hirschmüller in 2005, it combines elements of local and global optimization techniques to achieve high accuracy while maintaining computational efficiency.[2]
The core of SGM involves computing a pixel-wise matching cost, typically using Mutual Information to compensate for radiometric differences between images, followed by aggregation of these costs along multiple one-dimensional paths (often in eight or sixteen directions) to enforce smoothness constraints and minimize discontinuities.[1] This path-based optimization approximates a global energy minimization, outperforming purely local methods in handling slanted surfaces and object boundaries, while avoiding the high complexity of full global approaches like graph cuts or belief propagation.[1] Additional features include built-in occlusion handling, sub-pixel refinement for precise disparity estimation, and post-processing steps such as outlier filtering and gap interpolation, resulting in sub-pixel accuracy competitive with top-performing algorithms on benchmarks like the Middlebury Stereo Evaluation.[1]
With a linear time complexity of O(W × H × D), where W and H are image dimensions and D is the disparity range, SGM processes typical images in 1-2 seconds and scales to large formats (up to billions of pixels) using tiling techniques.[1] It has been optimized for various platforms, including CPUs, GPUs (achieving 4.5 frames per second on 640×480 images), and FPGAs (up to 27 Hz with low power consumption), making it suitable for real-time applications.[3] Due to its robustness against illumination changes, low-texture regions, and parameter variations, SGM finds extensive use in fields such as robotics, autonomous vehicle navigation, driver assistance systems, remote sensing, and 
aerial image matching.[3]
Introduction
Overview
Semi-global matching (SGM) is a widely used algorithm in computer vision for estimating dense disparity maps from pairs of rectified stereo images, enabling accurate 3D reconstruction of scenes. It represents a hybrid approach that approximates the global optimization of disparity estimation (typically an NP-hard problem) by aggregating pixel-wise matching costs along multiple one-dimensional paths across the image. This path-based aggregation enforces smoothness constraints in a semi-global manner, drawing on the interdependence of neighboring pixels modeled through Markov random fields.[1][4]
The core innovation of SGM lies in its ability to balance the superior accuracy of global methods, which integrate information across the entire image to handle discontinuities and fine structures, with the efficiency of local methods that operate on small windows. By optimizing costs directionally along multiple paths (such as horizontal, vertical, and diagonal), SGM achieves a polynomial-time approximation that avoids the computational expense of full 2D global minimization while producing high-quality results.[1][4]
Taking a rectified stereo image pair as input, where corresponding pixels are aligned along scanlines, SGM outputs a dense disparity map assigning a disparity value to each pixel in the reference image, facilitating depth computation and applications like robotics and autonomous navigation. Key advantages include robust performance in textureless regions, where local methods often fail due to ambiguous matches, and improved handling of occlusions and radiometric variations, all while maintaining linear computational complexity suitable for real-time processing on standard hardware.[1][5][4]
Historical Development
Semi-global matching (SGM) was introduced by Heiko Hirschmüller in 2005 at the IEEE Computer Vision and Pattern Recognition (CVPR) conference in a paper titled "Accurate and Efficient Stereo Processing by Semi-Global Matching and Mutual Information."[6] Developed at the German Aerospace Center (DLR), the algorithm addressed key challenges in dense stereo matching, such as handling radiometric differences between images and achieving high accuracy at depth discontinuities while maintaining computational efficiency.[1] The initial motivation stemmed from the need for robust stereo processing in planetary imaging applications, including the mapping of Mars surfaces using DLR's High Resolution Stereo Camera (HRSC), where traditional methods struggled with large-scale, low-texture scenes and varying illumination.[1] In 2008, Hirschmüller published an extended journal version in IEEE Transactions on Pattern Analysis and Machine Intelligence, refining the method with pathwise cost aggregation and demonstrating its effectiveness on aerial and space imaging datasets, such as HRSC Mars images and urban aerial surveys. This publication solidified SGM's role in remote sensing, enabling precise 3D reconstructions for planetary exploration and Earth observation.[3] By this time, early implementations began appearing in research for robotics, highlighting SGM's balance of global optimization and real-time feasibility. Adoption accelerated in the late 2000s, with re-implementations in open-source libraries like OpenCV's StereoSGBM module, officially integrated in version 2.1 in 2010, facilitating its use in computer vision pipelines. 
This led to widespread application in robotics and autonomous systems by the early 2010s, including stereo vision for planetary rovers at DLR.[7] Milestone hardware integrations emerged around the same period, such as GPU accelerations in 2008 for real-time processing and FPGA implementations in 2009 for low-power embedded systems, paving the way for deployment in resource-constrained environments like space missions.[3]
Background
Stereo Matching Fundamentals
Stereo vision is a computer vision technique that estimates the three-dimensional structure of a scene by analyzing two or more images captured from slightly different viewpoints, typically using parallel cameras to exploit epipolar geometry for depth recovery.[8] Epipolar geometry describes the projective relationship between corresponding points in the images, where a point in one image projects onto an epipolar line in the other, constraining the search for matches and reducing the correspondence problem from 2D to 1D.[9]
Disparity refers to the horizontal pixel shift between corresponding points in the left and right images of a rectified stereo pair, serving as an inverse measure of depth.[8] The depth Z at a point can be computed from the disparity d using the formula Z = \frac{f \cdot b}{d}, where f is the focal length of the camera and b is the baseline distance between the two cameras.[8] This relationship assumes a pinhole camera model and calibrated, rectified images, enabling direct conversion from disparity maps to depth information for 3D reconstruction.[8]
Image rectification is a preprocessing step that transforms the stereo images so that epipolar lines become horizontal and aligned across both views, simplifying the matching process to a one-dimensional search along corresponding rows.[8] This transformation is achieved by applying homographies derived from the camera calibration parameters, ensuring that corresponding points share the same vertical coordinate.[10]
Stereo matching algorithms can be categorized as sparse or dense, with dense methods computing disparity values for every pixel in the image to produce complete disparity maps suitable for applications like 3D surface reconstruction and view interpolation.[8] In contrast, sparse matching focuses on distinctive features such as edges or corners, often requiring post-processing to interpolate depths across the entire image, whereas dense approaches provide complete pixel-wise coverage.[8]
Limitations of Prior Methods
Prior to the development of semi-global matching, stereo matching algorithms were broadly categorized into local and global approaches, each exhibiting significant trade-offs in accuracy, efficiency, and robustness. Local methods, such as winner-takes-all schemes with window-based similarity measures (e.g., sum of squared differences or normalized cross-correlation), prioritize computational speed by aggregating matching costs within fixed or adaptive windows around each pixel. These techniques achieve real-time performance, often processing images in under 2 seconds on contemporary hardware, making them suitable for applications requiring low latency. However, they suffer from high sensitivity to noise and illumination variations, as pixelwise cost computations are inherently ambiguous and prone to erroneous matches in low-texture regions.[8][11] A primary limitation of local methods is the production of streaking artifacts in textureless areas, where the assumption of constant disparity within the aggregation window leads to inconsistent estimates across scanlines, resulting in visible linear patterns in the disparity map. Additionally, these methods fail at depth discontinuities, causing over-smoothing and "foreground fattening" effects that blur object boundaries and fine structures, as the window-based aggregation propagates neighboring disparities inappropriately. Occlusions exacerbate these issues, often inducing front-parallel biases where hidden regions are incorrectly assigned disparities from visible surfaces, further degrading accuracy in uniform or occluded scenes.[8][11] In contrast, global methods, including graph cuts, belief propagation, and dynamic programming, aim for higher accuracy by minimizing a 2D energy function that enforces smoothness constraints across the entire image, yielding disparity maps that are more consistent and less prone to artifacts in textureless regions. 
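To make the local baseline concrete, a winner-takes-all block matcher with a sum-of-squared-differences window can be sketched in a few lines of NumPy (an illustrative sketch, not drawn from any cited implementation; the window radius and disparity range are arbitrary choices):

```python
import numpy as np

def block_match_ssd(left, right, max_disp, radius=2):
    """Naive local stereo: winner-takes-all over an SSD window.

    For each pixel and disparity d, sum squared differences over a
    (2*radius+1)^2 window, then pick the lowest-cost disparity.
    """
    h, w = left.shape
    L, R = left.astype(np.float32), right.astype(np.float32)
    cost = np.full((max_disp + 1, h, w), np.inf, np.float32)
    for d in range(max_disp + 1):
        diff2 = np.full((h, w), np.inf, np.float32)
        diff2[:, d:] = (L[:, d:] - R[:, : w - d]) ** 2  # columns < d invalid
        pad = np.pad(diff2, radius, mode="edge")
        agg = np.zeros((h, w), np.float32)
        for dy in range(2 * radius + 1):  # window aggregation
            for dx in range(2 * radius + 1):
                agg += pad[dy : dy + h, dx : dx + w]
        cost[d] = agg
    return cost.argmin(axis=0)  # winner-takes-all disparity

# Toy pair: the right view is the left view shifted by 3 pixels.
rng = np.random.default_rng(0)
left = rng.integers(0, 256, (60, 80)).astype(np.uint8)
right = np.roll(left, -3, axis=1)
disp = block_match_ssd(left, right, max_disp=8)
```

On this synthetic pair the interior of the disparity map recovers the 3-pixel shift, but nothing in the scheme couples neighboring estimates, which is the root of the streaking and fattening artifacts described above.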
These global approaches excel at handling depth discontinuities and occlusions through explicit modeling in the energy terms, producing smoother results with reduced over-smoothing compared to local techniques. However, their computational intensity is a major drawback; for instance, graph cuts and belief propagation often exhibit O(N^2) or higher complexity relative to image size N, leading to runtimes of 20 seconds to several minutes per image, alongside substantial memory demands for storing graph representations or message-passing states.[8][11]
Global methods also remain sensitive to radiometric changes, such as varying illumination between stereo pairs, requiring careful cost function design to avoid biased matches, and dynamic programming variants can introduce streaking due to independent 1D optimizations per scanline. While capable of sub-pixel accuracy in controlled benchmarks, their high resource requirements render them unsuitable for real-time applications or large-scale processing, such as in robotics or autonomous driving. These trade-offs (efficiency but inaccuracy in local methods versus accuracy but slowness in global methods) highlighted the need for hybrid approaches that approximate global optimality with manageable computational costs.[8][11]
Core Algorithm
Matching Cost Computation
In semi-global matching (SGM), the matching cost computation forms the foundation of the disparity estimation process by quantifying the dissimilarity between corresponding pixels in a rectified stereo image pair. This pixelwise cost is essential for handling radiometric differences, such as variations in illumination or sensor characteristics between the base image I_b and the match image I_m. The original SGM algorithm employs Mutual Information (MI) as the primary matching cost metric, which measures the statistical dependence between intensity distributions without assuming a linear relationship between pixel values. The MI-based cost for a pixel p in the base image and disparity d is defined as C_{MI}(p, d) = -mi_{I_b, f_D(I_m)}(I_b^p, I_m^q), where q = e_{bm}(p, d) denotes the corresponding point on the epipolar line in the match image, f_D is a warping function based on the disparity image D, and mi is the pointwise term whose sum over all pixels yields the mutual information MI(I_1, I_2) = H(I_1) + H(I_2) - H(I_1, I_2). Entropies are estimated from discretized 256-bin histograms of image intensities, smoothed with a Gaussian kernel (e.g., 7×7) to reduce noise sensitivity.
To address the computational expense of full MI calculation, a hierarchical approximation (HMI) is used: it starts with a coarse disparity estimate from downsampled images and refines it iteratively, adding approximately 14% to the runtime while maintaining accuracy. For robustness in radiometrically varying scenes, MI is preferred over simplistic metrics such as absolute or squared intensity differences, which fail under non-linear intensity transformations.
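The entropy terms above can be illustrated with a plain joint-histogram estimate of MI (a simplified sketch: the original computes Gaussian-smoothed Parzen histograms and a pointwise decomposition, while this version just evaluates whole-image MI with raw 256-bin histograms):

```python
import numpy as np

def mutual_information(img1, img2, bins=256):
    """MI(I1, I2) = H(I1) + H(I2) - H(I1, I2) from a joint histogram."""
    joint, _, _ = np.histogram2d(img1.ravel(), img2.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()            # joint intensity distribution
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)

    def entropy(p):
        p = p[p > 0]                     # convention: 0 * log 0 = 0
        return float(-np.sum(p * np.log2(p)))

    return entropy(px) + entropy(py) - entropy(pxy.ravel())

# Identical images are maximally dependent; unrelated ones are not.
rng = np.random.default_rng(0)
a = rng.integers(0, 256, (200, 200))
b = rng.integers(0, 256, (200, 200))
mi_self, mi_cross = mutual_information(a, a), mutual_information(a, b)
```

Here mi_self approaches the roughly 8-bit entropy of a uniformly distributed 8-bit image, while mi_cross is near zero; SGM negates the pointwise contributions so that well-registered disparities yield low matching cost.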
An alternative in the original implementation is the Birchfield-Tomasi (BT) cost, which computes a sampling-insensitive measure of intensity difference by linearly interpolating pixel values along epipolar lines, formulated as the minimum of forward and backward warped differences: C_{BT}(p, d) = \min(|I_b^p - I_m^{q_f}|, |I_b^p - I_m^{q_b}|, |I_b^p - I_m^q|), where q_f and q_b account for subpixel shifts. This BT cost is particularly effective for scenes with occlusions or textureless regions when combined with a small support window (e.g., 5×5 pixels). Preprocessing assumes rectified images with known epipolar geometry, though extensions allow for unrectified pairs by incorporating geometric transformations.
In practice, the matching cost is evaluated for every pixel over the full set of disparity hypotheses (typically 64–128 levels) to form an initial cost volume, which is then refined through path-based aggregation in subsequent SGM steps. This design enables subpixel accuracy (e.g., 0.25-pixel resolution) and low error rates on benchmark datasets like the Middlebury stereo evaluation, where MI-based costs yield bad-pixel errors below 10% in textured regions.
Path-Based Cost Aggregation
In semi-global matching, path-based cost aggregation approximates global smoothness constraints by propagating matching costs along multiple one-dimensional paths across the image, rather than performing exhaustive two-dimensional optimization. This step transforms the initial pixelwise matching cost volume C(p, d)—computed for each pixel p and disparity d—into a smoothed aggregate that encourages piecewise smooth disparity fields while preserving edges. By limiting aggregation to linear paths, the method achieves computational efficiency while closely approximating the minimum-energy solution of a global stereo model.[12] Paths are selected to cover the two-dimensional support region of each pixel without requiring a full graph-based computation. Typically, at least eight directions are used, such as horizontal, vertical, and diagonal paths originating from the image borders, with sixteen directions recommended for better approximation of global optimality. These paths are straight lines in the reference image but may appear non-straight in the matching image due to varying disparities along the way. For each path direction r, dynamic programming is applied sequentially from the path's starting border pixel toward the target pixel p, computing the minimum cost L_r(p, d) that reaches p at disparity d. 
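The per-path recursion can be sketched for a single left-to-right direction as follows (an illustrative NumPy sketch; the penalty values are arbitrary, and a full implementation repeats this pass for 8 or 16 directions and sums the results into S(p, d)):

```python
import numpy as np

def aggregate_path_lr(C, P1=10.0, P2=120.0):
    """One SGM aggregation pass along horizontal left-to-right paths.

    C is the cost volume with shape (height, width, disparities).
    Implements L_r(p,d) = C(p,d) + min(same d, d +/- 1 step + P1,
    any jump + P2) - min_k L_r(p-r, k), vectorized over image rows.
    """
    h, w, D = C.shape
    L = np.zeros_like(C, dtype=np.float32)
    L[:, 0] = C[:, 0]                              # paths start at the border
    for x in range(1, w):
        prev = L[:, x - 1]                         # predecessor costs, (h, D)
        prev_min = prev.min(axis=1, keepdims=True)
        up = np.full_like(prev, np.inf)
        up[:, 1:] = prev[:, :-1] + P1              # transition from d - 1
        down = np.full_like(prev, np.inf)
        down[:, :-1] = prev[:, 1:] + P1            # transition from d + 1
        best = np.minimum(np.minimum(prev, up),
                          np.minimum(down, prev_min + P2))
        L[:, x] = C[:, x] + best - prev_min        # normalization term
    return L

# Toy volume: disparity 3 has zero cost everywhere, the rest cost 100.
C = np.full((4, 10, 8), 100.0, dtype=np.float32)
C[:, :, 3] = 0.0
L = aggregate_path_lr(C)
```

Summing such passes over all directions and taking the per-pixel argmin over d then yields the winner-takes-all disparity used in the selection step.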
This recursive formulation penalizes abrupt disparity changes to enforce smoothness, allowing small variations for slanted surfaces while imposing higher costs on large jumps indicative of depth discontinuities.[12] The core dynamic programming update for the cost along path r is given by L_r(p, d) = C(p, d) + \min \left( L_r(p - r, d),\ L_r(p - r, d - 1) + P_1,\ L_r(p - r, d + 1) + P_1,\ \min_i L_r(p - r, i) + P_2 \right) - \min_k L_r(p - r, k), where p - r denotes the predecessor pixel along the path, P_1 is a small constant penalty for one-pixel disparity changes, and P_2 (with P_2 \geq P_1) is a larger penalty for changes exceeding one pixel, often adapted to local image gradients to better preserve edges. The subtraction of \min_k L_r(p - r, k) normalizes the costs to prevent unbounded growth, ensuring L_r(p, d) \leq C_{\max} + P_2, where C_{\max} is the maximum initial matching cost. This formulation balances fidelity to local evidence with global consistency, as the minimum over predecessor states favors smooth propagation unless a discontinuity justifies a higher penalty.[12]
The final aggregated cost for pixel p at disparity d is obtained by summing the minimum costs over all path directions: S(p, d) = \sum_r L_r(p, d). This summation approximates the global energy minimization of a Markov random field model for stereo, where the path-based approach reduces the computational complexity from quadratic in image dimensions to linear, specifically O(W \cdot H \cdot D \cdot N) with image width W, height H, maximum disparity D, and N paths (typically 8–16), making it suitable for real-time applications. The resulting S(p, d) provides a robust energy landscape for subsequent disparity selection, effectively mitigating the streaking artifacts common in purely local methods.[12]
Disparity Map Selection
In the final step of semi-global matching, the disparity map is generated through winner-takes-all (WTA) inference, where for each pixel p, the optimal disparity d^* is selected as the value that minimizes the aggregated matching cost: d^* = \arg\min_d S(p, d), with S(p, d) representing the total cost aggregated across multiple paths from the preceding aggregation stage.[1] This produces an initial disparity image D_b that assigns an integer disparity to every pixel based solely on the lowest cost, enabling efficient computation without iterative optimization.[1]
Post-processing refines this initial map to remove artifacts and handle inconsistencies. A 3×3 median filter is applied to D_b (and a corresponding right-to-left map D_m) to eliminate speckle noise and outliers, preserving edges while smoothing isolated erroneous disparities.[1] Additionally, a left-right consistency check detects occlusions and mismatches by comparing corresponding disparities: for a pixel p in the left image with disparity D_b(p), the corresponding point q in the right image is evaluated, and if |D_b(p) - D_m(q)| > 1, the disparity is invalidated to mark occluded or unreliable regions.[1]
To achieve sub-pixel accuracy beyond integer disparities, quadratic interpolation is performed around the WTA minimum. This fits a parabola to the aggregated costs at the minimum disparity and its two neighbors, estimating the precise sub-pixel offset that minimizes the cost curve and thereby enhancing the overall precision of the map.[1] The resulting output is a dense disparity map, where invalid regions from the consistency check may be interpolated for completeness, providing a comprehensive representation of depth that can be triangulated with camera parameters to yield a 3D point cloud for applications such as reconstruction.[1]
Variants and Extensions
Memory-Efficient Variant
The standard Semi-global matching (SGM) algorithm faces significant memory challenges due to the cost volume, which requires O(W × H × D) storage, where W and H are the image width and height, and D is the disparity range; for high-resolution images such as 1 megapixel with D=128, this can exceed 1 GB, making it prohibitive for embedded systems or large-scale processing.[1] To address this, early adaptations introduced tiling strategies, processing the image in overlapping strips or tiles to limit memory usage while maintaining boundary consistency through weighted merging of results from adjacent tiles.[1] Additionally, using 16-bit integers instead of floating-point representations for cost storage further reduces the footprint, scaling costs to fit within 11 bits for the initial matching costs.[1]
A more advanced memory-efficient variant, known as efficient SGM (eSGM), pipelines the path aggregation process to avoid storing the full cost volume by computing forward and backward passes in-place along sequential directions, such as top-down and bottom-up stripes, thereby reducing the temporary memory requirement to O(W × D).[13] This approach reuses memory by overwriting minima from previous aggregation passes and limits the number of paths (typically to four or eight instead of the full set in standard SGM) to prioritize horizontal and vertical directions while aggregating costs sequentially (e.g., horizontal paths first, followed by vertical).[13] Such pipelining enables processing of high-resolution images on resource-constrained hardware, including GPUs and FPGAs, achieving real-time performance; for instance, it processes 640×480 images with D=64 in 0.06 seconds on FPGAs.[13]
While eSGM incurs a slight accuracy trade-off, with error rates on benchmark datasets like Middlebury differing by less than 0.1% compared to standard SGM (e.g., 7.16% vs. 7.17% bad pixel error), it increases computational time by approximately 50% due to the multi-pass nature but remains suitable for applications requiring low memory, such as automotive stereo vision.[13] This variant builds on the path-based aggregation of standard SGM by adapting it for sequential reuse rather than simultaneous computation across all paths.[13]
Advanced Improvements
Since its inception, Semi-global matching (SGM) has undergone significant enhancements to address limitations in handling complex scenes, such as untextured regions and varying lighting conditions, particularly in post-2010 developments focused on accuracy and adaptability. One notable advancement is the Semi-Global Matching with Priors (SGMP) algorithm introduced in 2017, which incorporates surface orientation priors to better manage slanted untextured surfaces that traditional SGM struggles with due to over-smoothed disparities. By adding geometric constraints directly to the energy function, SGMP enforces piecewise-planar assumptions on surface normals estimated from initial disparity cues, resulting in reduced errors on benchmarks like the Middlebury dataset, where it achieves up to 20% improvement in bad pixel rates for slanted areas compared to vanilla SGM.[14] Building on these geometric refinements, the Improved Semi-Global Matching (I-SGM) variant, proposed in 2023, tailors SGM for challenging extraterrestrial environments like lunar rover navigation. I-SGM enhances edge preservation by introducing adaptive penalties that dynamically adjust based on local gradient magnitudes and illumination variations, mitigating artifacts in low-contrast, dimly lit terrains with uneven shadows. This adaptation proves particularly effective for obstacle detection, yielding disparity maps with 15-25% fewer outliers in simulated lunar datasets under complex lighting, thereby improving rover path planning reliability.[15] Adaptations for neuromorphic sensors represent another key evolution, with event-based SGM emerging around 2018 to leverage dynamic vision sensors (DVS) like event cameras. Unlike frame-based SGM, event-based versions process asynchronous brightness change events rather than full images, enabling sub-millisecond latency for disparity estimation in high-speed or dynamic scenes. 
By accumulating events into contrast-maximized representations and applying path aggregation on these sparse inputs, the method preserves depth edges in motion-blurred environments, achieving real-time performance at over 200 Hz on synthetic and real DVS datasets while reducing motion artifacts by up to 30% compared to traditional stereo.[16]
Hardware accelerations have further propelled SGM's practicality for real-time applications, exemplified by FPGA implementations in 2023 that employ parallel comparator structures to expedite cost aggregation. These designs pipeline multiple disparity hypotheses across systolic arrays, minimizing memory accesses and enabling processing of HD-resolution (1280×720) stereo pairs at up to 60 frames per second on FPGAs such as Stratix V, with power consumption under 2 W on Zynq UltraScale+. Such optimizations maintain SGM's accuracy (nearly identical to software versions) while making the algorithm suitable for embedded systems in robotics and automotive vision.[17]
By 2025, recent trends in SGM enhancements increasingly integrate deep learning for robust cost initialization, with hybrid SGM-CNN models combining neural feature extractors to initialize matching costs before classical aggregation. These hybrids, such as those using CNNs for radiometric-invariant descriptors, enhance disparity accuracy in scenarios with illumination discrepancies or textureless regions, reporting 10-15% gains in endpoint error on KITTI benchmarks over pure SGM. This fusion leverages CNNs' learned invariance while retaining SGM's global consistency, fostering deployment in diverse real-world conditions like autonomous driving.[18][19]
Applications and Evaluation
Key Applications
Semi-global matching (SGM) has found widespread adoption in robotics and autonomous vehicles, where it enables robust depth sensing for obstacle avoidance and 3D environmental mapping. In planetary exploration, the German Aerospace Center (DLR) has integrated SGM into rover systems since 2005, utilizing stereo cameras to generate dense disparity maps for safe navigation on uneven terrains, as demonstrated in the IDEFIX rover's autonomous navigation experiments. Similarly, in advanced driver-assistance systems (ADAS), SGM supports real-time stereo vision for 3D reconstruction, aiding in features like adaptive cruise control and lane departure warnings by providing accurate depth information from vehicle-mounted cameras.[3][20] In aerial and satellite imagery processing, SGM excels at deriving digital surface models (DSMs) from stereo pairs, facilitating applications such as change detection in orthoimages and urban modeling. For instance, it has been employed to monitor environmental changes, like forest area alterations, by computing precise disparity maps from satellite stereo data, enabling the identification of temporal differences in terrain elevation. In urban settings, drone-captured stereo imagery processed with SGM supports the creation of detailed 3D city models, capturing building facades and street layouts for applications in urban planning and infrastructure monitoring.[3][21][22] Medical imaging benefits from SGM in 3D reconstruction tasks, particularly with stereo endoscopes during minimally invasive surgeries. The algorithm processes binocular endoscopic images to estimate depth maps, providing surgeons with enhanced spatial awareness of internal anatomies, which improves precision in procedures like laparoscopy by visualizing tissue surfaces in three dimensions.[23] Consumer devices increasingly incorporate SGM variants for augmented reality (AR) and virtual reality (VR) functionalities. 
In smartphones, semi-global block matching (a derivative of SGM) utilizes dual-camera setups to compute depth for features like portrait mode bokeh effects and AR object placement, enabling seamless integration of virtual elements into real-world scenes captured by mobile lenses. VR headsets leverage similar techniques for environment mapping, generating disparity maps from stereo views to support immersive pass-through video and spatial tracking.[24]
Emerging applications of SGM extend to space exploration and high-speed robotics. Improved variants like I-SGM enhance obstacle detection on lunar and Martian surfaces under challenging illumination, as tested for rover autonomy in complex extraterrestrial environments. Additionally, event-based adaptations of SGM, which process asynchronous visual data from neuromorphic sensors, enable low-latency depth estimation for high-speed robotic tasks, such as agile manipulation in dynamic settings, with ongoing developments enabling near-real-time performance on specialized hardware as of 2023.[15][25]
Performance Comparisons
Semi-global matching (SGM) demonstrates strong performance in stereo matching benchmarks, particularly in balancing accuracy and computational efficiency. On earlier Middlebury stereo datasets (pre-2014), standard SGM implementations achieve bad pixel errors (pixels with disparity error >1 pixel) in the range of 5-7%, with variants like improved SGM reaching as low as 4.1%; on the more challenging 2014 dataset, standard SGM errors are around 25-30%, though variants achieve 4-7%. This outperforms traditional local methods that often exceed 20% error due to poor handling of occlusions and textureless regions.[26][27] For sub-pixel precision, SGM's endpoint error averages around 0.5-1.0 pixels on Middlebury 2014, providing reliable dense disparity maps for applications requiring fine detail.[28] In comparisons to local methods like block matching, SGM improves accuracy in textureless areas by approximately 50%, reducing erroneous matches from over 30% to under 15% in such regions. Versus global methods such as graph cuts, SGM is roughly 10 times faster while maintaining comparable accuracy, as graph cuts require iterative optimization over the full 2D energy function.[3] However, SGM is slower than basic block matching, which processes images in milliseconds but at the cost of higher error rates (e.g., 15-25% bad pixels on Middlebury).[29] Deep learning-based methods like PSMNet achieve superior accuracy, with bad pixel errors as low as ~3.1% (Out-Noc) on the KITTI 2012 benchmark compared to typical SGM variants' 9-15%, but they are 100 times slower on CPU (minutes per image versus SGM's seconds) and require GPU acceleration for real-time use.[30] Standard SGM runtime is 0.1-1 second per image on a modern CPU for VGA-resolution pairs, enabling near-real-time processing, while memory-efficient variants run in under 30 ms on embedded hardware like FPGAs.[1][31] SGM proves robust to radiometric differences and noise but remains sensitive to penalty parameter tuning (P1 
for small disparity changes, P2 for large jumps), where suboptimal values can increase errors by 20-30%.[32] Evaluated on standard datasets like Middlebury (for controlled indoor scenes) and KITTI (for outdoor driving), SGM consistently ranks high in the efficiency-accuracy trade-off, as per the Middlebury 2014 and KITTI 2012/2015 benchmarks, where optimized variants excel in non-occluded regions with ~5-10% error.[27][33] Despite these strengths, SGM struggles with thin structures and repetitive patterns, leading to over 15% error in such cases; recent hybrids combining CNN-based cost computation with SGM aggregation mitigate this, reducing errors on thin objects by up to 40% in 2020s benchmarks.[34][35]
| Method | Bad Pixel Error (Middlebury, %) | Bad Pixel Error (KITTI D1-all, %) | Runtime (CPU, VGA image) |
|---|---|---|---|
| Local Block Matching | 20+ | 15-25 | <0.1 s |
| SGM (Standard, pre-2014) | 5-7 | ~22 | 0.1-1 s |
| SGM (Optimized Variants) | 4-7 (2014) | 9-15 | 0.1-1 s |
| Graph Cuts | 4-6 | 7-10 | 1-10 s |
| PSMNet (Deep Learning) | 3-4 | ~2.8 | >100 s (no GPU) |