
Connected-component labeling

Connected-component labeling (CCL) is a computational technique in image processing and computer vision that scans a binary image to identify groups of connected pixels—known as connected components or blobs—and assigns a unique label to each such group based on their spatial adjacency, typically using 4-connectivity (horizontal and vertical neighbors) or 8-connectivity (including diagonals); it can also be extended to grayscale and three-dimensional images. This process partitions the image into distinct regions, enabling subsequent analysis such as object counting, shape measurement, or feature extraction. The concept of connected-component labeling originated in the 1960s as part of early digital picture processing methods, with Azriel Rosenfeld and John L. Pfaltz introducing sequential local operations to label connected subsets in raster-scanned images, addressing challenges like label merging for overlapping provisional assignments. Subsequent advancements, including Robert M. Haralick's 1981 iterative approach, refined the process to eliminate auxiliary storage needs while producing final labels directly from the input image. The standard two-pass algorithm subsequently emerged as a cornerstone: the first pass provisionally labels pixels and records equivalences between labels, while the second pass resolves these equivalences using union-find data structures to assign consistent unique labels. CCL plays a pivotal role in applications ranging from machine vision and automated inspection to medical imaging and geographic information systems, where it facilitates tasks like segmenting neurons in microscopy images or detecting objects in remote-sensing data. Modern variants optimize for speed and memory efficiency on parallel architectures, such as GPUs, with algorithms like Light Speed Labeling achieving linear-time performance through reduced memory accesses and efficient equivalence resolution. Despite its maturity, ongoing research as of 2025 focuses on handling sparse or high-resolution images, including CUDA-based methods for super-large binary images.

Introduction

Overview

Connected-component labeling (CCL) is a fundamental technique in image processing that identifies and assigns unique labels to distinct regions of connected foreground pixels in a binary image, where connectivity is determined by pixel adjacency. These regions, often referred to as connected components or blobs, represent coherent objects or features separated by background pixels, enabling the segmentation and analysis of visual data. The basic workflow of CCL begins with a binary input image, consisting of foreground (typically 1s) and background (0s) pixels, and produces a labeled output image in which each connected component receives a distinct identifier. This process scans the image systematically, propagating labels to adjacent pixels that satisfy the connectivity criterion, and resolves any provisional label conflicts to ensure unique assignments per component. From a graph-theoretic perspective, CCL treats foreground pixels as nodes in an undirected graph, with edges connecting adjacent foreground pixels based on the chosen neighborhood model; the connected components then correspond to the maximal connected subgraphs within this structure. This formulation facilitates the application of graph-based algorithms for region discrimination and analysis in computer vision. CCL originated in the 1960s-1970s as part of early computer vision research aimed at pattern recognition and picture processing, with foundational work by Rosenfeld and Pfaltz introducing sequential labeling methods for binary images.
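As a minimal illustration of this workflow (assuming NumPy and SciPy are available; scipy.ndimage.label is one widely used off-the-shelf implementation, not the only one), the sketch below passes a small binary array in and gets a labeled array and a component count back:
import numpy as np
from scipy import ndimage

# A small binary image: 1 = foreground, 0 = background.
image = np.array([
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
], dtype=np.uint8)

# 8-connectivity: a 3x3 structuring element of ones connects diagonal neighbors too.
structure = np.ones((3, 3), dtype=int)
labels, num_components = ndimage.label(image, structure=structure)

print(num_components)  # 2
print(labels)          # background stays 0, each component gets a label 1..N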

Applications

Connected-component labeling (CCL) plays a pivotal role in object detection within binary images for computer vision tasks, particularly in medical imaging where it facilitates the segmentation of tumors in MRI scans by identifying and isolating connected regions of interest. For instance, in brain tumor analysis, CCL classifies tumor objects by labeling connected pixel groups after thresholding, enabling precise localization and measurement. Similarly, in document analysis, CCL supports optical character recognition (OCR) by separating individual characters or text lines from binarized document images, allowing for efficient feature extraction in printed text processing. In robotics, CCL aids obstacle mapping by segmenting connected components in sensor-derived binary maps, such as those produced by projecting range-sensor data, to delineate navigable paths and avoid hazards in cluttered environments. As a preprocessing step in broader image analysis pipelines, CCL enables noise removal by discarding small spurious components and supports object counting, such as in microscopy for quantifying cells in fluorescence images, where it labels foreground regions post-segmentation to tally biological entities accurately. As of 2025, CCL integrates into autonomous vehicle systems for lane marking segmentation, where it processes edge-detected binary images to identify and label connected lane boundaries, enhancing path planning in varied road conditions. In AI-driven video surveillance, CCL contributes to crowd analysis by labeling connected pedestrian blobs in binarized video frames, facilitating counting and tracking in public spaces. The primary benefits of CCL lie in its ability to simplify complex image data into discrete, countable regions, thereby enabling subsequent measurements of component properties like size, shape, and position for quantitative analysis across these domains.

Fundamentals

Definition

Connected-component labeling (CCL) is a core technique in digital image processing that identifies and distinguishes disjoint regions, known as connected components, within a binary image. Specifically, it partitions the set of foreground pixels (typically those with value 1) into maximal subsets where pixels are connected according to a predefined neighborhood structure, such as 4-connectivity (considering only horizontal and vertical neighbors) or 8-connectivity (including diagonal neighbors as well). While primarily defined for binary images, CCL can be extended to grayscale images by first applying a thresholding operation to binarize the image. CCL algorithms assign unique labels to each connected component, often using provisional assignments and equivalence resolution in multi-pass methods, ensuring each component receives a unique, final identifier from 1 to N, where N is the total number of components. This partitioning relies on an equivalence relation induced by pixel connectivity: two pixels p and q are equivalent if there exists a path of adjacent foreground pixels connecting them, with adjacency defined by the chosen connectivity model. The connected components then correspond to the equivalence classes under this relation, forming a complete and disjoint segmentation of the foreground pixels. A binary input image is formally denoted as a function f(x,y) \in \{0,1\}, where (x,y) are pixel coordinates, 0 represents background, and 1 represents foreground; the output is a labeled image g(x,y), where g(x,y) = 0 for background pixels and g(x,y) \in \{1, 2, \dots, N\} for foreground pixels belonging to the g(x,y)-th component. Connected-component analysis (CCA) builds upon CCL by incorporating the measurement of geometric and statistical properties for each labeled component, such as its area (number of pixels), centroid (average position), and bounding box (minimal enclosing rectangle). These properties enable further quantitative analysis, like feature extraction for pattern recognition, but CCA presupposes the completion of the labeling process.
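To make the CCA step concrete, the following sketch (a hypothetical helper, assuming the labeled image is held in a NumPy integer array following the g(x,y) convention above) measures area, centroid, and bounding box per component:
import numpy as np

def component_properties(labels):
    """Compute area, centroid, and bounding box for each labeled component.

    `labels` is an integer array where 0 is background and 1..N identify components.
    Returns a dict mapping label -> (area, centroid, bounding_box).
    """
    props = {}
    for lbl in range(1, int(labels.max()) + 1):
        ys, xs = np.nonzero(labels == lbl)
        if ys.size == 0:
            continue
        area = ys.size                                    # number of pixels
        centroid = (ys.mean(), xs.mean())                 # average (row, col) position
        bbox = (ys.min(), xs.min(), ys.max(), xs.max())   # minimal enclosing rectangle
        props[lbl] = (area, centroid, bbox)
    return props
For large images, vectorized routines such as scipy.ndimage.find_objects or OpenCV's connectedComponentsWithStats compute similar properties more efficiently.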

Connectivity Models

In connected-component labeling for binary digital images, connectivity defines whether pixels belong to the same component based on adjacency rules. Formally, for a pixel p = (x, y) in the foreground, the neighborhood N(p) consists of adjacent foreground pixels according to a specified model. Two foreground pixels are connected if there exists a path of foreground pixels between them in which each consecutive pair is adjacent under the chosen neighborhood model. The 4-connectivity model, also known as the von Neumann neighborhood, considers a pixel connected only to its four orthogonal neighbors: north (N), south (S), west (W), and east (E), corresponding to positions (x-1, y), (x+1, y), (x, y-1), and (x, y+1). This model excludes diagonal adjacencies, making it suitable for shapes without diagonal extensions, such as grid-aligned structures. Mathematically, two pixels (i_1, j_1) and (i_2, j_2) are 4-connected if there is a path where each step changes exactly one coordinate by ±1 (Manhattan distance of 1 per step). In contrast, the 8-connectivity model, based on the Moore neighborhood, includes all eight surrounding pixels: the four orthogonal neighbors plus the four diagonals (NE, NW, SE, SW) at positions (x-1, y-1), (x-1, y+1), (x+1, y-1), and (x+1, y+1). This is the standard choice for most applications in image processing, as it better captures natural, irregularly shaped objects by allowing diagonal connections. Formally, an 8-path between pixels requires \max(|i_r - i_{r+1}|, |j_r - j_{r+1}|) = 1 for each step (excluding identical pixels). The choice between 4- and 8-connectivity significantly affects labeling outcomes. Under 4-connectivity, diagonally adjacent foreground pixels may form separate components, potentially increasing the total number of detected objects—for instance, a diagonal chain might split into multiple components. Conversely, 8-connectivity merges foreground regions across diagonals, which may lead to fewer but larger foreground components; background connectivity is therefore usually dual-managed (e.g., using 4-connectivity for the background complement when the foreground uses 8-connectivity) to avoid topological paradoxes. This duality ensures consistent topological properties, such as Euler numbers, in digital images. Edge cases arise with image borders and isolated pixels. Pixels at the borders have truncated neighborhoods, including only valid adjacent positions within the image bounds; for example, a corner pixel in 8-connectivity has only three neighbors. Isolated foreground pixels, lacking any neighbors in the chosen model, form singleton components and receive unique labels. These cases are handled inherently by restricting N(p) to the image domain, ensuring no external connections are assumed.
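The two models can be summarized in a few lines of code; the sketch below (illustrative Python with hypothetical helper names) lists the neighbor offsets, expresses 4-adjacency via Manhattan distance and 8-adjacency via Chebyshev distance, and restricts neighborhoods to the image domain so borders and corners are handled implicitly:
# Neighbor offsets for the two standard connectivity models, as (row, col) deltas.
OFFSETS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]                 # von Neumann neighborhood
OFFSETS_8 = OFFSETS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # Moore neighborhood

def adjacent_4(p, q):
    """4-adjacency: Manhattan distance between p and q equals 1."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1]) == 1

def adjacent_8(p, q):
    """8-adjacency: Chebyshev distance equals 1 (excludes identical pixels)."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1])) == 1

def neighbors(p, shape, offsets):
    """Neighbors of p restricted to the image domain (handles borders and corners)."""
    rows, cols = shape
    return [(p[0] + dy, p[1] + dx) for dy, dx in offsets
            if 0 <= p[0] + dy < rows and 0 <= p[1] + dx < cols]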

Algorithms

One-Component-at-a-Time

The one-component-at-a-time algorithm for connected-component labeling, also known as the flood-fill or seed-fill method, identifies and labels each connected component in a binary image sequentially by fully processing one component before proceeding to the next. It begins by scanning the image in raster order—left-to-right and top-to-bottom—to locate unlabeled foreground pixels (typically represented as 1s). Upon finding such a pixel, the algorithm assigns it a unique label from an incrementing counter and then propagates this label to all connected foreground pixels within the same component using a flood-fill traversal. This approach treats the image as an implicit graph where foreground pixels are nodes and edges connect neighboring pixels based on the chosen connectivity model (4- or 8-connected). Developed as a foundational technique in early computer vision, it emerged in the 1960s alongside pioneering work on digital picture processing and graph-based image analysis. The algorithm's steps are straightforward and rely on a single linear scan interrupted by component traversals:
  • Initialize a label counter to 1 and ensure all pixels are marked as unlabeled (e.g., via a separate visited array or by setting labels to 0).
  • Iterate through each pixel in row-major order.
  • If the current pixel is an unlabeled foreground pixel, assign it the current label value and increment the counter.
  • Perform a depth-first search (DFS) or breadth-first search (BFS) starting from this seed to propagate the label to all connected unlabeled foreground pixels, marking them as processed to prevent revisits.
For the traversal, DFS explores as far as possible along each branch before backtracking, while BFS explores level by level. Data structures are minimal: a stack implements iterative DFS (recursion can be used, though it risks stack overflow for large components), and a queue implements BFS. Unlike more advanced methods, no equivalence tables or provisional labels are required, as each component receives its final label immediately upon discovery. This method's simplicity offers key advantages, including ease of implementation and intuitive operation, making it suitable for educational purposes or small-scale applications. It ensures accurate labeling without the complexity of resolving label equivalences, directly producing a final output where connected pixels share identical labels. Historically, it provided the basis for integrating concepts from graph theory into vision tasks, influencing subsequent developments in image analysis during the 1970s. However, the algorithm has notable limitations, particularly for efficiency on larger images. While its time complexity is O(N)—with N as the total number of pixels, since each pixel is examined and potentially enqueued/dequeued a constant number of times—the practical constant factor is elevated due to traversal overhead and neighborhood checks during propagation. This makes it less suitable for dense or high-resolution images, where the repeated access to scattered memory locations can lead to poor performance. Additionally, for very large components, BFS queues or DFS stacks may consume significant memory.
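A compact, runnable version of the procedure described above (a sketch using a BFS queue and NumPy arrays; the function name and parameters are illustrative, not a standard API) might look as follows:
from collections import deque
import numpy as np

def label_one_at_a_time(image, offsets):
    """Label each component fully (BFS flood fill) before resuming the raster scan.

    `image` is a 2D NumPy array of 0/1; `offsets` selects 4- or 8-connectivity.
    """
    rows, cols = image.shape
    labels = np.zeros((rows, cols), dtype=int)
    next_label = 0
    for y in range(rows):
        for x in range(cols):
            if image[y, x] == 1 and labels[y, x] == 0:
                next_label += 1
                labels[y, x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in offsets:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny, nx] == 1 and labels[ny, nx] == 0):
                            labels[ny, nx] = next_label
                            queue.append((ny, nx))
    return labels, next_label
Passing the 4- or 8-neighbor offset list switches the connectivity model without changing the rest of the routine.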

Two-Pass Labeling

The two-pass labeling algorithm is a sequential method for connected-component labeling that processes a binary image in two raster scans to assign unique labels to connected components, minimizing memory usage while achieving linear time performance. Introduced in the foundational work on digital picture processing, this approach first assigns provisional labels during a forward scan and then resolves equivalences in a second scan. It is particularly efficient for 4- or 8-connectivity models and forms the basis for many practical implementations in image analysis. In the first pass, the algorithm scans the image from top to bottom and left to right. For each foreground pixel (typically value 1), it examines its relevant neighbors to assign a provisional label. For 8-connectivity, the neighbors checked are the west (W), northwest (NW), north (N), and northeast (NE) positions, as these are the only ones that could have been labeled earlier in the scan order. If none of these neighbors is a foreground pixel with a label, a new provisional label is assigned (incrementing a global counter). If exactly one neighbor has a label, that label is assigned to the current pixel. If multiple neighbors have different labels, the smallest is assigned, and equivalences between the differing labels are recorded to indicate they belong to the same component. This process ensures that connected foreground pixels receive compatible provisional labels without immediate resolution. Equivalences are managed using a union-find data structure, which efficiently tracks sets of provisional labels that represent the same component. The structure consists of a parent array where each provisional label points to its representative (root) parent, initially itself. The find-root operation traces the parent chain to the root, often with path compression to optimize subsequent lookups by setting each node directly to the root. The union operation merges two sets by linking the root of one to the root of the other, typically using union-by-rank or union-by-size to maintain balance and near-constant time performance (amortized O(α(n)), where α is the inverse Ackermann function). During the first pass, when multiple neighbor labels are found, the algorithm performs union operations on their roots after assigning the minimum label. This avoids redundant labels while deferring full resolution. The second pass, typically another forward raster scan, replaces each provisional label with its final equivalent by performing a find-root operation on the union-find structure. Each foreground pixel's label is updated to the root of its provisional label's set, ensuring all pixels in a component share the same unique label. To produce consecutive labels starting from 1, the roots can be remapped to sequential integers after identifying all unique roots. This pass handles any remaining conflicts from the first pass by propagating the resolved labels across the image. For 8-connectivity, the neighbor checks in the first pass suffice, as the union-find merges ensure global consistency without additional conflict resolution in the second pass. The algorithm's time complexity is O(N), where N is the number of pixels, due to constant-time operations per pixel across both passes (with union-find's near-linear amortized cost). Space complexity is O(N) in the worst case for the label array and union-find parent array, but practically O(M) where M is the number of provisional labels (bounded by the number of components, often much smaller than N). This makes it suitable for large images compared to recursive methods. A high-level pseudocode outline for the algorithm (assuming 8-connectivity and a binary image img of size rows × cols, with a label array for output, and union-find functions find and union_sets) is as follows:
# Initialization
label = 2D array of zeros, same size as img  # output label array
next_label = 1
parent = [0]  # Union-find parent array (index 0 unused so labels start at 1)
rank = [0]    # For union-by-rank

# First pass
for i from 0 to rows-1:
    for j from 0 to cols-1:
        if img[i][j] == 1:  # Foreground pixel
            candidates = []
            if i > 0 and label[i-1][j] > 0:  # N neighbor
                candidates.append(label[i-1][j])
            if i > 0 and j > 0 and label[i-1][j-1] > 0:  # NW neighbor
                candidates.append(label[i-1][j-1])
            if i > 0 and j < cols-1 and label[i-1][j+1] > 0:  # NE neighbor
                candidates.append(label[i-1][j+1])
            if j > 0 and label[i][j-1] > 0:  # W neighbor
                candidates.append(label[i][j-1])
            
            if candidates empty:
                label[i][j] = next_label
                parent.append(next_label)
                rank.append(0)
                next_label += 1
            else:
                min_label = min(candidates)
                label[i][j] = min_label
                for c in candidates:
                    if c != min_label:
                        union_sets(find(c), find(min_label))

# Second pass (after union-find resolution)
for i from 0 to rows-1:
    for j from 0 to cols-1:
        if label[i][j] > 0:
            label[i][j] = find(label[i][j])  # Replace with root
# Optional: Remap roots to 1, 2, ..., num_components
This outline captures the core steps, with full implementations requiring careful indexing and initialization of union-find operations.
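The find and union_sets operations referenced above can be backed by a small, self-contained structure; the following sketch (one possible Python realization with path compression and union by rank, not a prescribed implementation) illustrates the idea:
class UnionFind:
    """Union-find over provisional labels with path compression and union by rank."""

    def __init__(self):
        self.parent = [0]  # index 0 unused so provisional labels can start at 1
        self.rank = [0]

    def make_label(self):
        label = len(self.parent)
        self.parent.append(label)  # each new label is initially its own root
        self.rank.append(0)
        return label

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # compress the path
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return rx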

Parallel Variants

In the early 1990s, parallel variants of connected-component labeling (CCL) emerged to address the computational demands of large-scale image processing on multiprocessor systems. These approaches typically employed domain decomposition, partitioning the image into independent strips or blocks assigned to separate processors for local labeling. After local processing, border synchronization steps resolved equivalences between adjacent subimages by exchanging and merging provisional labels along shared boundaries, ensuring global consistency. This method, often implemented on mesh-connected or hypercube architectures, reduced sequential bottlenecks but required careful load balancing to minimize communication overhead. Modern implementations have seen a resurgence post-2010 with GPU-based acceleration using CUDA, enabling massive parallelism for high-resolution images. These variants adapt traditional two-pass labeling to single-pass schemes on GPUs, leveraging shared memory for efficient provisional label assignment and union-find operations within thread blocks. For instance, algorithms propagate labels across pixels in a data-parallel manner, using atomic operations or warp-level primitives to handle neighborhood checks concurrently. Distributed versions extend this to big data scenarios, such as large-scale image analysis, by decomposing images across clusters and synchronizing via message passing for cross-node equivalences. Key challenges in parallel CCL include race conditions during simultaneous label assignments, which can lead to inconsistent provisional labels in concurrent accesses to shared structures. Solutions mitigate this through prefix-sum operations for compact label allocation, avoiding conflicts by precomputing indices, or hierarchical union-find structures that perform root finding in logarithmic steps across processors. OpenCV's connectedComponentsWithAlgorithm incorporates parallel CCL via multi-threading for CPU execution, supporting both block-based and run-length-based strategies for efficiency on varied hardware. Performance gains from these variants are substantial, achieving speedups of 10-100x over sequential methods on multi-core CPUs and GPUs, particularly for images exceeding a megapixel in size. Theoretical complexity often approaches O(N/p + log N), where N is the number of pixels and p is the number of processors, dominated by local processing and logarithmic synchronization phases. As of 2025, parallel CCL integrates seamlessly into deep learning pipelines for real-time segmentation, serving as a post-processing step to refine masks from neural networks on edge AI devices, enhancing applications like autonomous driving and video surveillance.
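For reference, a typical invocation of OpenCV's entry point might look like the sketch below (assuming a reasonably recent OpenCV build in which connectedComponentsWithAlgorithm and the CCL_DEFAULT selector are available; the image size and density are arbitrary):
import cv2
import numpy as np

# Binary input must be 8-bit, single channel; nonzero pixels are foreground.
image = (np.random.rand(1024, 1024) > 0.5).astype(np.uint8)

# connectedComponentsWithAlgorithm lets the caller choose the underlying CCL algorithm;
# cv2.CCL_DEFAULT selects the library's preferred (possibly multi-threaded) variant.
num_labels, labels = cv2.connectedComponentsWithAlgorithm(
    image, 8, cv2.CV_32S, cv2.CCL_DEFAULT)

print(num_labels - 1)  # number of foreground components (label 0 is the background)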

Examples

Graphical Illustration

To illustrate the execution of connected-component labeling algorithms, consider a step-by-step graphical example using the two-pass algorithm on a 5×5 binary image, where the foreground pixels (1) form two distinct components under 8-connectivity rules. The input image is shown below with row and column indices, where 1 represents foreground pixels and 0 background:
     1 2 3 4 5
  1  0 0 0 0 0
  2  0 1 1 0 0
  3  0 1 0 1 0
  4  0 0 1 1 0
  5  1 0 0 0 0
In the first pass, the image is scanned from top to bottom and left to right. Provisional labels are assigned to foreground pixels based on already-labeled neighbors in the forward mask (the left, above-left, above, and above-right positions for 8-connectivity). New labels are issued when no labeled neighbors exist, and equivalences are recorded using a union-find structure for any conflicting provisional labels. The resulting provisional labeling is:
     1 2 3 4 5
  1  0 0 0 0 0
  2  0 1 1 0 0
  3  0 1 0 1 0
  4  0 0 1 1 0
  5  2 0 0 0 0
Here, the component at positions (2,2)-(2,3)-(3,2)-(3,4) receives label 1, with (3,4) assigned 1 based on its above-left neighbor at (2,3); the diagonally connected pixels at (4,3)-(4,4) join label 1 via the above-left (3,2) and above-right (3,4) neighbors; and the isolated pixel at (5,1) receives a new provisional label 2. No equivalences are recorded in this example. The equivalence resolution step uses union-find, but with no merges needed, the provisional labels remain roots. In the second pass, scanning from top to bottom and left to right, each provisional label is replaced by its root equivalent from the union-find structure; because labels 1 and 2 are already consecutive, no remapping is required here (in general, roots are remapped to 1, 2, ..., N). The final output assigns a unique label to each component:
     1 2 3 4 5
  1  0 0 0 0 0
  2  0 1 1 0 0
  3  0 1 0 1 0
  4  0 0 1 1 0
  5  2 0 0 0 0
This resolves any potential conflicts, yielding one main component (label 1) and one small isolated component (label 2). The 8-connectivity model is particularly evident in the diagonal connections, where pixels like (3,4) connect via an above-left neighbor and (4,3) via above-left and above-right neighbors, ensuring the L-shaped component spanning rows 2-4 is treated as a single entity. Under 4-connectivity rules, this region would instead split into separate components. Common pitfalls in the first pass often arise from corner or edge pixels with partial neighbor visibility, leading to temporary label fragmentation in more complex images. For instance, a foreground pixel on the image edge might temporarily appear disconnected if its diagonal neighbor is not yet resolved, but the algorithm's forward mask and union-find prevent over-labeling through proper neighbor checks. This highlights how the two-pass process ensures correct labeling without fragmentation in this case. For the one-component-at-a-time variant, a sequential traversal (e.g., flood fill) processes each component individually starting from an unvisited foreground pixel. Consider a simple 3×3 binary image containing a plus-shaped (roughly disk-like) blob:
     1 2 3
  1  0 1 0
  2  1 1 1
  3  0 1 0
Starting at (1,2) with label 1, a depth-first or breadth-first traversal visits all 8-connected neighbors, labeling the entire blob as 1 in one traversal, then the scan moves to the next unvisited foreground pixel for subsequent components. This approach suits sparse images but requires a visited array (or the label array itself) to avoid reprocessing. Such graphical illustrations are commonly employed in teaching and debugging via tools like MATLAB's bwlabel function, which generates labeled images and visualizations for verifying connectivity and label counts in practice.
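The worked example above can be checked mechanically; the sketch below (assuming SciPy is available, with scipy.ndimage.label standing in for the two-pass implementation) reproduces the two-component result for the same 5×5 image:
import numpy as np
from scipy import ndimage

# The 5x5 example image from the illustration above.
image = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0],
], dtype=np.uint8)

eight_connectivity = np.ones((3, 3), dtype=int)
labels, n = ndimage.label(image, structure=eight_connectivity)

print(n)       # 2 components, matching the final labeling shown above
print(labels)  # the L-shaped component gets label 1, the isolated pixel label 2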

Pseudocode

Connected-component labeling algorithms are often implemented using pseudocode that outlines the core logic in a language-agnostic manner. This section presents pseudocode for the primary sequential approaches: the one-component-at-a-time method, which propagates labels through depth-first or breadth-first search, and the two-pass method, which resolves label equivalences using union-find data structures. These assume a 2D binary image represented as an array where foreground pixels are 1 and background pixels are 0, with connectivity (4- or 8-neighbor) specified as a parameter. For large images, implementations must consider memory constraints, as the label array requires additional storage proportional to the image size, potentially leading to out-of-memory errors on resource-limited systems.

One-Component-at-a-Time Algorithm

This algorithm initializes a label counter and scans the image row-by-row, left-to-right. Upon encountering an unlabeled foreground pixel, it assigns a new label and propagates it to all connected neighbors using a stack-based DFS (or queue-based BFS) subroutine. The connectivity parameter determines the neighbor offsets checked (e.g., {(0,-1), (-1,0), (0,1), (1,0)} for 4-connectivity or including diagonals for 8-connectivity). This approach ensures each component is labeled in a single traversal but may require stack or queue space up to the component size.
function OneComponentAtATime(image, height, width, connectivity):
    labels = 2D array of size height x width, initialized to 0
    label_counter = 1
    neighbor_offsets = get_offsets(connectivity)  // e.g., 4 or 8 directions
    
    for y from 0 to height-1:
        for x from 0 to width-1:
            if image[y][x] == 1 and labels[y][x] == 0:
                labels[y][x] = label_counter
                stack = new empty stack
                stack.push((y, x))
                while stack is not empty:
                    (cy, cx) = stack.pop()
                    for each (dy, dx) in neighbor_offsets:
                        ny = cy + dy
                        nx = cx + dx
                        if 0 <= ny < height and 0 <= nx < width and image[ny][nx] == 1 and labels[ny][nx] == 0:
                            labels[ny][nx] = label_counter
                            stack.push((ny, nx))
                label_counter += 1
    
    return labels

Two-Pass Algorithm

The two-pass algorithm performs a forward raster scan to assign provisional labels and detect equivalences, followed by a second pass that resolves them using union-find for efficient root finding and merging. During the first pass, neighbor checks use the same connectivity parameter, recording equivalences when adjacent pixels have different provisional labels. The union-find structure supports path compression and union-by-rank for near-constant time operations. A second forward pass then assigns the final root label to each pixel. This method is memory-efficient compared to recursive propagation but requires an equivalence table or array sized to the maximum provisional label.
function UnionFind(parent, rank):
    function find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])  // path compression
        return parent[x]
    
    function union(x, y):
        px = find(x)
        py = find(y)
        if px != py:
            if rank[px] > rank[py]:
                parent[py] = px
            elif rank[px] < rank[py]:
                parent[px] = py
            else:
                parent[py] = px
                rank[px] += 1

function TwoPass(image, height, width, connectivity):
    labels = 2D array of size height x width, initialized to 0
    forward_offsets = get_forward_offsets(connectivity)  // e.g., left, above-left, above, above-right for 8-connectivity
    max_label = 0
    parent = []  // will resize as needed
    rank = []
    
    // Initialize union-find as labels are created
    for y from 0 to height-1:
        for x from 0 to width-1:
            if image[y][x] == 1:
                connected = set()
                for each (dy, dx) in forward_offsets:
                    ny = y + dy
                    nx = x + dx
                    if 0 <= ny < height and 0 <= nx < width and labels[ny][nx] > 0:
                        connected.add(labels[ny][nx])
                if not connected:
                    max_label += 1
                    while len(parent) <= max_label:
                        parent.append(len(parent))
                        rank.append(0)
                    labels[y][x] = max_label
                else:
                    min_label = min(connected)
                    labels[y][x] = min_label
                    for l in connected:
                        if l != min_label:
                            union(min_label, l)
    
    // Second pass: resolve labels
    for y from 0 to height-1:
        for x from 0 to width-1:
            if labels[y][x] > 0:
                labels[y][x] = find(labels[y][x])
    
    // Optional: renumber to consecutive labels
    root_to_label = {}
    new_label = 1
    for y from 0 to height-1:
        for x from 0 to width-1:
            if labels[y][x] > 0:
                root = labels[y][x]
                if root not in root_to_label:
                    root_to_label[root] = new_label
                    new_label += 1
                labels[y][x] = root_to_label[root]
    
    return labels

Parallel Variant: Prefix-Sum Label Allocation on GPU

For GPU implementations, a parallel variant adapts the two-pass approach by distributing pixel processing across threads and using a prefix-sum operation to allocate unique final labels to equivalence class roots after equivalence resolution. Threads initially assign provisional labels in parallel, detect local equivalences, and compact roots; a global prefix sum then renumbers them uniquely, avoiding atomic conflicts. This leverages CUDA's parallel scan primitives for efficiency on large images.
kernel PrefixSumAllocation(labels, num_pixels):  // Simplified GPU kernel outline
    // Phase 1: Local provisional labeling and equivalence detection per thread block
    // (Omitted: thread-local union-find and compaction)
    
    // Phase 2: Global prefix sum on root counts
    roots = compact_roots(labels)  // Array of unique roots
    prefix_sums = parallel_scan(roots, num_roots)  // CUDA prefix sum
    
    // Phase 3: Assign final labels
    for each pixel in parallel:
        root = find(labels[pixel])
        labels[pixel] = prefix_sums[root_index(root)]
Implementation notes emphasize thread synchronization for neighbor access in GPUs and warn that for images exceeding available VRAM (e.g., >10^9 pixels), tiled processing or out-of-core methods are necessary to avoid crashes.
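As a practical alternative to writing custom kernels, array libraries expose GPU-resident labeling directly; the sketch below assumes CuPy is installed and that its cupyx.scipy.ndimage.label routine (which mirrors the SciPy API) is available in the build being used:
import cupy as cp
from cupyx.scipy import ndimage as cndi

# Synthetic binary image generated directly on the GPU.
image_gpu = (cp.random.random((4096, 4096)) > 0.5).astype(cp.uint8)

structure = cp.ones((3, 3), dtype=int)   # 8-connectivity structuring element
labels_gpu, num = cndi.label(image_gpu, structure=structure)

labels = cp.asnumpy(labels_gpu)          # copy the result back to the host if needed
print(int(num))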

Performance and Implementations

Evaluation Frameworks

Evaluation frameworks for connected-component labeling (CCL) algorithms primarily focus on assessing efficiency, correctness, and robustness across diverse scenarios. Key metrics include execution time, which measures the overall processing speed on benchmark images; memory usage, evaluating auxiliary structures like equivalence tables or label arrays; labeling accuracy, ensuring correct assignment of unique labels to equivalent pixels while verifying equivalence relations; and scalability, examining behavior as image dimensions increase or component counts vary. These metrics enable standardized comparisons, with hardware-agnostic approaches emphasizing measures independent of specific architectures. A prominent benchmarking framework is YACCLAB (Yet Another Connected Components Labeling Benchmark), an open-source C++ library introduced in 2016, supporting over 37 algorithms on both synthetic and real-world 2D/3D images. YACCLAB tests algorithms using datasets like MIRflickr for natural scenes, Fingerprints for textured patterns, and medical volumes for volumetric data, providing metrics such as average runtime, memory accesses, and correctness verification through ground-truth comparisons. It facilitates reproducible evaluations by including varied image characteristics, including noise and component densities, and supports both CPU and GPU implementations. Comparative analyses via YACCLAB and similar tools reveal that two-pass labeling algorithms typically outperform one-component-at-a-time approaches by factors of 2-5x in execution time on raster-scan images, due to efficient forward and backward passes that sweep the entire image sequentially rather than iteratively per component. Parallel variants, particularly GPU-based implementations like CUDA-KE (Komura Equivalence), achieve 5-10x speedups over sequential CPU methods on large datasets, with further gains up to 50x reported for optimized block-based algorithms on high-density images, highlighting the benefits of massive parallelism in handling inter-pixel dependencies. These comparisons underscore the trade-offs: two-pass methods excel in efficiency for dense components, while parallel versions scale better for high-resolution inputs. Testing methodologies in these frameworks systematically vary parameters to probe robustness, such as 4- vs. 8-connectivity models, noise levels from 0-20% corruption, and component densities ranging from sparse (few large blobs) to dense (thousands of small regions). Hardware-agnostic metrics, like instructions per pixel or cache hit rates, ensure fair assessments across platforms, often using synthetic generators to control granularity and foreground occupancy. Real-image tests incorporate domain-specific challenges, such as varying contrasts in fingerprints or volumetric structures in medical scans, to evaluate practical performance. Recent developments integrate CCL into hybrid deep learning pipelines, where neural segmentation (e.g., producing binary masks) is followed by post-processing CCL for precise labeling in applications like document analysis and inverse design.
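A toy harness in the spirit of these frameworks (not YACCLAB itself; the function names, image size, and density sweep are illustrative) shows how density-controlled synthetic inputs and median wall-clock timings can be combined:
import time
import numpy as np
from scipy import ndimage

def benchmark(label_fn, image, repeats=10):
    """Median wall-clock time of a labeling function over several runs."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        label_fn(image)
        times.append(time.perf_counter() - start)
    return sorted(times)[len(times) // 2]

# Synthetic images with varying foreground density, one of the axes such
# benchmarks commonly sweep (granularity or noise sweeps would be added similarly).
rng = np.random.default_rng(0)
for density in (0.1, 0.5, 0.9):
    image = (rng.random((2048, 2048)) < density).astype(np.uint8)
    t = benchmark(lambda img: ndimage.label(img, structure=np.ones((3, 3))), image)
    print(f"density={density:.1f}  median time={t * 1000:.1f} ms")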

Hardware Architectures

Field-programmable gate arrays (FPGAs) offer customizable hardware architectures for accelerating connected-component labeling (CCL), particularly through single-pass designs that enable streaming processing for real-time video analysis. These implementations process images row-by-row in a single scan, resolving label equivalences on-the-fly to avoid multiple memory accesses. Pipelined scanning architectures on FPGAs divide the image into overlapping windows, propagating provisional labels through hardware pipelines while using on-chip union-find data structures for efficient merging. Equivalence tables and resolution stacks manage temporary labels, with root finding performed via path compression to minimize conflicts. An implementation on the Zynq UltraScale+ MPSoC supports 2 or 4 pixels per clock cycle, achieving real-time CCL for 4K-resolution (3840×2160) images at 60 frames per second; the 4-pixel variant operates at 150 MHz with 21% lower power consumption compared to the 2-pixel version at 300 MHz. Earlier single-pass prototypes similarly emphasize online equivalence resolution via equivalence tables, enabling high-throughput labeling without buffering entire images. Throughput gains from FPGA designs significantly outperform CPU-based methods; for instance, a run-length-based approach with 30 parallel cores on an FPGA delivers over 108 times the speed of multi-threaded CPU implementations for large images. An FPGA design tailored for high-resolution images with complex, heterogeneous structures uses a modified two-pass method optimized for low on-chip memory (under 1 MB), supporting up to 8K×8K resolutions at frame rates exceeding 30 frames per second on mid-range devices. Application-specific integrated circuits (ASICs) and custom hardware target CCL in power-constrained embedded systems, such as medical imaging devices and Internet of Things (IoT) sensors, where size, power consumption, and integration with other processing units are paramount. These designs often incorporate dedicated labeling cores with fixed-function pipelines for scanning and equivalence resolution, minimizing gate count while supporting 4- or 8-connectivity models. A 2014 ASIP (application-specific instruction-set processor) for embedded CCL provides configurable parallelism levels, allowing trade-offs between throughput and silicon area for integration into system-on-chip platforms. Low-power variants prioritize sub-threshold operation and aggressive power management to extend battery life in portable devices; for example, a heterogeneous architecture suitable for embedded applications achieves labeling speeds of 100-200 MHz while consuming under 50 mW, enabling on-device processing in portable or wearable systems without offloading to external processors. In IoT contexts, custom designs integrate CCL with sensor interfaces, using run-length compression to reduce data movement and power draw to microwatt levels for always-on detection in edge environments. Graphics processing units (GPUs) leverage massive parallelism via CUDA kernels for CCL on large-scale images, partitioning the workload into thread blocks that process image blocks independently before global resolution. Block-based approaches assign threads to local neighborhoods, using shared memory for intra-block unions and atomic operations on global memory for inter-block merges, which handle label propagation across block boundaries. This avoids full image buffering and supports scalable labeling for volumes up to gigapixel sizes. Optimized algorithms like Block-based Union-Find (BUF) and Block-based Komura Equivalence (BKE) enhance eight-connectivity handling on GPUs by minimizing atomic contention through hierarchical resolution: local union-find within blocks followed by batched global updates.
These methods achieve up to 10-20× speedup over baseline GPU CCL for synthetic and real images, with BUF excelling in dense components due to efficient path halving. Direct GPU algorithms further integrate analysis features, using atomics for tentative label assignment and synchronization barriers for coherency, reducing passes to one while supporting connected-component analysis (e.g., feature extraction) in a single kernel launch. Hybrid CPU-GPU systems offload the parallel scanning and provisional labeling to GPUs, reserving the CPU for final equivalence resolution and output formatting, which mitigates data transfer overhead in iterative workflows. A 2024 hybrid algorithm employs union-find with path compression tailored for sparse data, processing zero-suppressed encodings to cut data volume by 50% and enabling real-time CCL on datasets from imaging detectors. This offloading strategy scales to multi-GPU setups, where CPUs coordinate block partitioning for images exceeding single-device memory limits. Hardware architectures for CCL encounter persistent challenges in synchronization latency and power efficiency, particularly in parallel designs where inter-component merges require atomic updates or barriers that can serialize execution. On GPUs, atomic operations for union-find introduce contention in high-density regions, inflating latency by up to 2-5× during global resolution phases despite overall throughput gains. FPGA and ASIC implementations must optimize pipeline stalls from label propagation, often trading clock frequency for reduced dynamic power (e.g., targeting 100-200 mW for embedded use), while ensuring thermal efficiency in integrated systems.

    Dec 16, 2024 · Our GPU implementation achieved a throughput of up to 300 million hits per second, providing a two-order-of-magnitude speedup over compared CPU- ...