Connected-component labeling
Connected-component labeling (CCL) is a computational technique in image processing and computer vision that scans a binary image to identify groups of connected pixels, known as connected components or blobs, and assigns a unique label to each such group. Connectivity is determined by spatial adjacency, typically 4-connectivity (horizontal and vertical neighbors) or 8-connectivity (which also includes diagonals), and the technique can be extended to grayscale images.[1][2] This process partitions the image into distinct regions, enabling subsequent analysis such as object counting, shape measurement, or feature extraction.[2]
The concept of connected-component labeling originated in the 1960s as part of early digital picture processing methods, with Azriel Rosenfeld and John L. Pfaltz introducing sequential local operations to label connected subsets in raster-scanned images, addressing challenges like label merging for overlapping provisional assignments.[3] Subsequent advancements, including Robert M. Haralick's 1981 iterative approach, refined the process to eliminate auxiliary storage needs while producing final labels directly from the input image.[4] By the 1980s, the standard two-pass algorithm emerged as a cornerstone: the first pass provisionally labels pixels and records equivalences between labels, while the second pass resolves these equivalences using union-find data structures to assign consistent unique labels.[5]
CCL plays a pivotal role in applications ranging from pattern recognition and automated inspection to medical imaging and geographic information systems, where it facilitates tasks like segmenting neurons in microscopy images[6] or detecting objects in satellite data.[7] Modern variants optimize for speed and memory efficiency on parallel architectures, such as GPUs, with algorithms like Light Speed Labeling achieving linear-time performance through reduced memory accesses and equivalence resolution.[1] Despite its maturity, ongoing research as of 2025 focuses on handling sparse or high-resolution images, including CUDA-based methods for super-large binary images.[8][9]
Introduction
Overview
Connected-component labeling (CCL) is a fundamental technique in image processing that identifies and assigns unique labels to distinct regions of connected pixels in a binary image, where connectivity is determined by pixel adjacency.[10] These regions, often referred to as connected components or blobs, represent coherent objects or features separated by background pixels, enabling the segmentation and analysis of visual data.[3]
The basic workflow of CCL begins with a binary input image, consisting of foreground (typically 1s) and background (0s) pixels, and produces an output labeled image where each connected component receives a distinct integer identifier.[10] This process scans the image systematically, propagating labels to adjacent pixels that satisfy the connectivity criterion, and resolves any provisional label conflicts to ensure unique assignments per component.[3]
From a graph theory perspective, CCL treats pixels as nodes in an undirected graph, with edges connecting adjacent foreground pixels based on the chosen neighborhood model; the connected components then correspond to the maximal connected subgraphs within this structure.[10] This formulation facilitates the application of graph-based algorithms for region discrimination and analysis in digital pictures.[3]
CCL originated in the 1960s-1970s as part of early computer vision research aimed at pattern recognition and picture processing, with foundational work by Rosenfeld and Pfaltz introducing sequential labeling methods for binary images.[10]
Applications
Connected-component labeling (CCL) plays a pivotal role in object detection within binary images for computer vision tasks, particularly in medical imaging where it facilitates the segmentation of tumors in MRI scans by identifying and isolating connected regions of interest. For instance, in brain tumor analysis, CCL classifies tumor objects by labeling connected pixel groups after thresholding, enabling precise localization and measurement. Similarly, in document analysis, CCL supports optical character recognition (OCR) by separating individual characters or text lines from binarized document images, allowing for efficient feature extraction in printed text processing.
In robotics, CCL aids obstacle mapping by segmenting connected components in sensor-derived binary maps, such as those from LiDAR projections, to delineate navigable paths and avoid hazards in real-time environments. As a preprocessing step in broader image analysis pipelines, CCL enables noise removal by discarding small spurious components and supports object counting, such as in microscopy for quantifying cells in fluorescent images where it labels foreground regions post-segmentation to tally biological entities accurately.
As of 2025, CCL integrates into autonomous vehicle systems for lane marking segmentation, where it processes edge-detected binary images to identify and label connected lane boundaries, enhancing path planning in varied road conditions. In AI-driven surveillance, CCL contributes to crowd analysis by labeling connected pedestrian blobs in binarized video frames, facilitating density estimation and anomaly detection in public spaces.
The primary benefits of CCL lie in its ability to simplify complex image data into discrete, countable regions, thereby enabling subsequent measurements of component properties like size, shape, and centroid for quantitative analysis across these domains.
Fundamentals
Definition
Connected-component labeling (CCL) is a core technique in digital image processing that identifies and distinguishes disjoint regions, known as connected components, within a binary image. Specifically, it partitions the set of foreground pixels (typically those with value 1) into maximal subsets where pixels are connected according to a predefined neighborhood structure, such as 4-connectivity (considering only horizontal and vertical neighbors) or 8-connectivity (including diagonal neighbors as well). While primarily defined for binary images, CCL can be extended to grayscale images by first applying a thresholding operation to binarize the image.[2] CCL algorithms assign unique labels to each connected component, often using provisional assignments and equivalence resolution in multi-pass methods, ensuring each component receives a unique, final identifier from 1 to N, where N is the total number of components.[1]
This partitioning relies on an equivalence relation induced by pixel connectivity: two pixels p and q are equivalent if there exists a path of adjacent foreground pixels connecting them, with adjacency defined by the chosen connectivity model. The connected components then correspond to the equivalence classes under this relation, forming a complete and disjoint segmentation of the foreground pixels. A binary input image is formally denoted as a function f(x,y) \in \{0,1\}, where (x,y) are pixel coordinates, 0 represents background, and 1 represents foreground; the output is a labeled image g(x,y), where g(x,y) = 0 for background pixels and g(x,y) \in \{1, 2, \dots, N\} for foreground pixels belonging to the g(x,y)-th component.[1]
Connected-component analysis (CCA) builds upon CCL by incorporating the measurement of geometric and statistical properties for each labeled component, such as its area (number of pixels), centroid (average position), and bounding box (minimal enclosing rectangle). These properties enable further quantitative analysis, like feature extraction for object recognition, but CCA presupposes the completion of the labeling process.[11][1]
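As a concrete sketch of both steps, SciPy's ndimage module provides labeling and the basic CCA measurements just described; the example array and the choice of an 8-connectivity structuring element are illustrative, not part of the sources above:

```python
import numpy as np
from scipy import ndimage

# Binary input image f(x, y): 1 = foreground, 0 = background.
img = np.array([
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
], dtype=np.uint8)

# Label with 8-connectivity (full 3x3 structuring element);
# omitting `structure` would give the default 4-connectivity.
labels, n = ndimage.label(img, structure=np.ones((3, 3)))
# labels: g(x, y) with 0 for background and 1..n for components.

# Basic CCA measurements per component.
idx = list(range(1, n + 1))
areas = ndimage.sum(img, labels, index=idx)          # pixel counts
centroids = ndimage.center_of_mass(img, labels, idx) # (row, col) means
bounding_boxes = ndimage.find_objects(labels)        # slices per label
```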
Connectivity Models
In connected-component labeling for binary digital images, connectivity defines whether pixels belong to the same component based on adjacency rules. Formally, for a pixel p = (x, y) in the foreground, the neighborhood N(p) consists of adjacent foreground pixels according to a specified model. Two foreground pixels are connected if there exists a path between them where each consecutive pair shares a neighborhood relation within N(p).[12]
The 4-connectivity model, also known as the von Neumann neighborhood, considers a pixel connected only to its four orthogonal neighbors: north at (x-1, y), south at (x+1, y), west at (x, y-1), and east at (x, y+1). This model excludes diagonal adjacencies, making it suitable for shapes without diagonal extensions, such as grid-aligned structures. Mathematically, two pixels (i_1, j_1) and (i_2, j_2) are 4-connected if there is a path where each step changes exactly one coordinate by ±1 (Manhattan distance = 1).[12][2]
In contrast, the 8-connectivity model, based on the Moore neighborhood, includes all eight surrounding pixels: the four orthogonal neighbors plus the four diagonals, NW at (x-1, y-1), NE at (x-1, y+1), SW at (x+1, y-1), and SE at (x+1, y+1). This is the standard choice for most applications in image processing, as it better captures natural, irregularly shaped objects by allowing diagonal connections. Formally, an 8-path between pixels requires \max(|i_r - i_{r+1}|, |j_r - j_{r+1}|) = 1 for each step (excluding identical pixels).[12][2]
The choice between 4-connectivity and 8-connectivity significantly affects labeling outcomes. Under 4-connectivity, diagonally adjacent foreground pixels form separate components, potentially increasing the total number of detected objects; a diagonal chain, for instance, splits into multiple components. Conversely, 8-connectivity joins foreground regions across diagonals, yielding fewer but larger components. To preserve topological consistency, the background is conventionally given the dual connectivity (4-connectivity for the background complement when the foreground uses 8-connectivity); this duality ensures consistent topological properties, such as Euler numbers, in binary images.[12][2]
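The effect is easy to demonstrate: a purely diagonal chain is one component under 8-connectivity but a set of singletons under 4-connectivity. A minimal sketch with SciPy, where the test array and structuring elements are illustrative choices:

```python
import numpy as np
from scipy import ndimage

diag = np.eye(4, dtype=np.uint8)  # a purely diagonal chain of pixels

four = ndimage.generate_binary_structure(2, 1)   # 4-connectivity (cross)
eight = ndimage.generate_binary_structure(2, 2)  # 8-connectivity (3x3)

_, n4 = ndimage.label(diag, structure=four)
_, n8 = ndimage.label(diag, structure=eight)
print(n4, n8)  # 4 1 -- four singleton components vs. one chain
```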
Edge cases arise with image borders and isolated pixels. Pixels at the borders have truncated neighborhoods, including only valid adjacent positions within the image bounds; for example, a corner pixel in 8-connectivity has only three neighbors. Isolated foreground pixels, lacking any neighbors in the chosen model, form singleton components and receive unique labels. These cases are handled inherently by restricting N(p) to the image domain, ensuring no external connections are assumed.[12][13]
Algorithms
One-Component-at-a-Time
The one-component-at-a-time algorithm for connected-component labeling, also known as the flood-fill or seed-fill method, identifies and labels each connected component in a binary image sequentially by fully processing one component before proceeding to the next. It begins by scanning the image in raster order—left-to-right and top-to-bottom—to locate unlabeled foreground pixels (typically represented as 1s). Upon finding such a pixel, the algorithm assigns it a unique label from an incrementing counter and then propagates this label to all connected foreground pixels within the same component using a graph traversal technique. This approach treats the image as an implicit graph where foreground pixels are nodes and edges connect neighboring pixels based on the chosen connectivity model (4- or 8-connected). Developed as a foundational technique in early computer vision, it emerged in the 1970s alongside pioneering work on digital picture processing and graph-based image analysis.
The algorithm's steps are straightforward and rely on a single linear scan interrupted by traversals:
- Initialize a label counter to 1 and ensure all pixels are marked as unlabeled (e.g., via a separate visited array or by setting labels to 0).
- Iterate through each pixel in row-major order.
- If the current pixel is an unlabeled foreground pixel, assign it the current label value and increment the counter.
- Perform a depth-first search (DFS) or breadth-first search (BFS) starting from this seed pixel to label all connected unlabeled foreground pixels with the same value, marking them as processed to prevent revisits.
For the traversal, DFS explores as far as possible along each branch before backtracking, while BFS explores level by level. Data structures are minimal: a stack implements iterative DFS (or recursion can be used, though it risks stack overflow for large components), and a queue implements BFS. Unlike more advanced methods, no equivalence tables or provisional labels are required, as each component receives its final label immediately upon discovery.[14]
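A minimal runnable sketch of the BFS variant in Python; the function name and the hard-coded 8-connectivity offsets are illustrative choices (drop the diagonal offsets for 4-connectivity):

```python
from collections import deque

# 8-connectivity neighbor offsets (row, col).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
           (0, 1), (1, -1), (1, 0), (1, 1)]

def label_components(img):
    """Label connected components of a binary image (list of lists of 0/1)."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] == 1 and labels[y][x] == 0:
                next_label += 1            # new component: seed pixel found
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:               # BFS over the whole component
                    cy, cx = queue.popleft()
                    for dy, dx in OFFSETS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels, next_label
```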
This method's simplicity offers key advantages, including ease of implementation and intuitive operation, making it suitable for educational purposes or small-scale applications. It ensures accurate labeling without the complexity of resolving label equivalences, directly producing a final output where connected pixels share identical labels. Historically, it provided the basis for integrating graph traversal concepts from computer science into vision tasks, influencing subsequent developments in image segmentation during the 1970s.
However, the algorithm has notable limitations, particularly for efficiency on larger images. Its time complexity is O(N), where N is the total number of pixels, since each pixel is examined and enqueued or dequeued at most a constant number of times; the practical constant factor is nonetheless elevated by traversal overhead and neighborhood checks during propagation. This makes it less suitable for dense or high-resolution images, where repeated access to scattered memory locations can lead to poor cache performance. Additionally, for very large components, BFS queues or DFS stacks may consume significant memory.[14]
Two-Pass Labeling
The two-pass labeling algorithm is a sequential method for connected-component labeling that processes a binary image in two raster scans to assign unique labels to connected components, minimizing memory usage while achieving linear time performance. Introduced in the foundational work on digital picture processing, this approach first assigns provisional labels during a forward scan and then resolves equivalences in a second scan. It is particularly efficient for 4- or 8-connectivity models and forms the basis for many practical implementations in image analysis.[10]
In the first pass, the algorithm scans the image from top to bottom and left to right. For each foreground pixel (typically value 1), it examines its relevant neighbors to assign a provisional label. For 8-connectivity, the checked neighbors are the west (W), northwest (NW), north (N), and northeast (NE) positions, as these are the only ones that could have been labeled earlier in the scan order. If none of these neighbors is a foreground pixel with a label, a new provisional label is assigned (incrementing a global label counter). If exactly one neighbor has a label, that label is assigned to the current pixel. If multiple neighbors have different labels, the smallest label is assigned, and equivalences between the differing labels are recorded to indicate they belong to the same component. This process ensures that connected foreground pixels receive compatible provisional labels without immediate resolution.[15]
Equivalences are managed using a union-find data structure, which efficiently tracks sets of provisional labels that represent the same component. The structure consists of a parent array where each provisional label points to its representative (root) parent, initially itself. The find-root operation traces the parent chain to the root, often with path compression to optimize subsequent lookups by setting each node directly to the root. The union operation merges two sets by linking the root of one to the root of the other, typically using union-by-rank or union-by-size to maintain balance and near-constant time performance (amortized O(α(n)), where α is the inverse Ackermann function). During the first pass, when multiple neighbor labels are found, the algorithm performs union operations on their roots after assigning the minimum label. This avoids redundant labels while deferring full resolution.[15]
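A minimal sketch of such a union-find structure in Python, using union by rank and a path-halving form of path compression; the function names and array-based layout are illustrative choices:

```python
def make_set(parent, rank):
    """Register a new provisional label; returns its index."""
    parent.append(len(parent))
    rank.append(0)
    return len(parent) - 1

def find(parent, x):
    """Return the root of x, halving the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(parent, rank, a, b):
    """Merge the sets containing labels a and b (union by rank)."""
    ra, rb = find(parent, a), find(parent, b)
    if ra == rb:
        return
    if rank[ra] < rank[rb]:
        ra, rb = rb, ra
    parent[rb] = ra                    # attach shorter tree under taller
    if rank[ra] == rank[rb]:
        rank[ra] += 1
```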
The second pass, typically another forward or backward raster scan, replaces each provisional label with its final equivalent by performing a find-root operation on the union-find structure. Each foreground pixel's label is updated to the root of its provisional label's set, ensuring all pixels in a connected component share the same unique label. To produce consecutive labels starting from 1, the roots can be remapped to sequential integers after identifying all unique roots. This pass handles any remaining conflicts from the first pass by propagating the resolved labels across the image. For 8-connectivity, the neighbor checks in the first pass suffice, as the union-find merges ensure global consistency without additional conflict resolution in the second pass.[15]
The algorithm's time complexity is O(N), where N is the number of pixels, due to constant-time operations per pixel across both passes (with union-find's near-linear amortized cost). Space complexity is O(N) in the worst case for the label array and union-find parent array, though the union-find structure itself needs only O(M) entries, where M is the number of provisional labels (at least the number of components, and typically much smaller than N). This makes it suitable for large images compared to recursive methods.[15]
A high-level pseudocode outline for the algorithm (assuming 8-connectivity and a 2D image array img of size rows × cols, with label array for output, and union-find functions find and union_sets) is as follows:
```
# Initialization
next_label = 1
parent = []   # union-find parent array
rank = []     # for union-by-rank

# First pass
for i from 0 to rows-1:
    for j from 0 to cols-1:
        if img[i][j] == 1:                                    # foreground pixel
            candidates = []
            if i > 0 and label[i-1][j] > 0:                   # N neighbor
                candidates.append(label[i-1][j])
            if i > 0 and j > 0 and label[i-1][j-1] > 0:       # NW neighbor
                candidates.append(label[i-1][j-1])
            if i > 0 and j < cols-1 and label[i-1][j+1] > 0:  # NE neighbor
                candidates.append(label[i-1][j+1])
            if j > 0 and label[i][j-1] > 0:                   # W neighbor
                candidates.append(label[i][j-1])
            if candidates is empty:
                label[i][j] = next_label
                parent.append(next_label)
                rank.append(0)
                next_label += 1
            else:
                min_label = min(candidates)
                label[i][j] = min_label
                for c in candidates:
                    if c != min_label:
                        union_sets(find(c), find(min_label))

# Second pass (after union-find resolution)
for i from 0 to rows-1:
    for j from 0 to cols-1:
        if label[i][j] > 0:
            label[i][j] = find(label[i][j])  # replace with root

# Optional: remap roots to 1, 2, ..., num_components
```
This outline captures the core steps, with full implementations requiring careful indexing and initialization of union-find operations.[15]
Parallel Variants
In the early 1990s, parallel variants of connected-component labeling (CCL) emerged to address the computational demands of large-scale image processing on multiprocessor systems. These approaches typically employed domain decomposition, partitioning the image into independent strips or blocks assigned to separate processors for local labeling. After local processing, border synchronization steps resolved equivalences between adjacent subimages by exchanging and merging provisional labels along shared boundaries, ensuring global consistency. This method, often implemented on mesh-connected or hypercube architectures, reduced sequential bottlenecks but required careful load balancing to minimize communication overhead.[16][1]
Modern implementations have seen a resurgence post-2010 with GPU-based acceleration using CUDA, enabling massive parallelism for high-resolution images. These variants adapt traditional two-pass labeling to single-pass schemes on GPUs, leveraging shared memory for efficient provisional label assignment and union-find operations within thread blocks. For instance, algorithms propagate labels across pixels in a wavefront manner, using atomic operations or warp-level primitives to handle neighborhood checks in parallel. Distributed versions extend this to big data scenarios, such as satellite imagery analysis, by decomposing images across clusters and synchronizing via message passing for cross-node equivalences.[1]
Key challenges in parallel CCL include race conditions during simultaneous label assignments, which can lead to inconsistent provisional labels in concurrent accesses to shared data structures. Solutions mitigate this through prefix-sum operations for compact label allocation, avoiding conflicts by precomputing unique indices, or hierarchical union-find structures that perform root finding in logarithmic steps across processors. OpenCV's connectedComponentsWithAlgorithm function incorporates parallel CCL via multi-threading for CPU execution, supporting both block-based and run-length encodings for efficiency on varied hardware.[1]
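As a usage sketch of that OpenCV entry point (assuming OpenCV ≥ 3.2, where the algorithm-selecting variant and the CCL_GRANA constant are exposed to Python; the mask construction is illustrative):

```python
import cv2
import numpy as np

# Binary mask: foreground must be non-zero, dtype uint8.
mask = np.zeros((480, 640), dtype=np.uint8)
cv2.circle(mask, (100, 100), 30, 255, -1)
cv2.circle(mask, (300, 200), 50, 255, -1)

# Label with 8-connectivity using the Grana (block-based BBDT) algorithm.
num_labels, labels = cv2.connectedComponentsWithAlgorithm(
    mask, 8, cv2.CV_32S, cv2.CCL_GRANA)

# Variant that also returns per-component statistics and centroids.
num, labels, stats, centroids = cv2.connectedComponentsWithStats(
    mask, connectivity=8)
# stats[i] = [x, y, width, height, area] for component i (0 = background).
```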
Performance gains from these variants are substantial, achieving speedups of 10-100x over sequential methods on multi-core CPUs and GPUs, particularly for images exceeding 1 megapixel. Theoretical complexity often approaches O(N/p + log N), where N is the number of pixels and p is the number of processors, dominated by local processing and logarithmic synchronization phases. As of 2025, parallel CCL integrates seamlessly into deep learning pipelines for real-time segmentation, serving as post-processing to refine binary masks from neural networks in edge AI devices, enhancing applications like autonomous robotics and medical imaging.[1][17]
Examples
Graphical Illustration
To illustrate the execution of connected-component labeling algorithms, consider a step-by-step graphical example using the two-pass method on a 5x5 binary image, where foreground pixels (1) form two distinct components under 8-connectivity rules. The input image is shown below, with 1 representing foreground pixels and 0 background:
|   | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 1 | 1 | 0 | 0 |
| 3 | 0 | 1 | 0 | 1 | 0 |
| 4 | 0 | 0 | 1 | 1 | 0 |
| 5 | 1 | 0 | 0 | 0 | 0 |
In the first pass, the image is scanned from top to bottom and left to right. Provisional labels are assigned to foreground pixels based on neighboring labels in the forward mask (the left, above-left, above, and above-right neighbors for 8-connectivity). New labels are issued when no labeled neighbors exist, and equivalences are recorded using a union-find structure for any conflicting provisional labels. The resulting provisional labeling is:
|   | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 1 | 1 | 0 | 0 |
| 3 | 0 | 1 | 0 | 1 | 0 |
| 4 | 0 | 0 | 1 | 1 | 0 |
| 5 | 2 | 0 | 0 | 0 | 0 |
Here, the pixels at (2,2), (2,3), (3,2), and (3,4) receive label 1, with (3,4) assigned 1 based on its above-left neighbor at (2,3); the diagonally connected pixels at (4,3) and (4,4) also join label 1 via the above-left (3,2) and above-right (3,4) neighbors; and the isolated pixel at (5,1), whose truncated forward mask contains no labeled neighbors, receives the new provisional label 2. No equivalences are recorded in this example.
The equivalence resolution step uses union-find, but with no merges needed, each provisional label is already its own root. In the second pass, scanning from top to bottom and left to right, each provisional label is replaced by its root equivalent from the union-find structure, leaving the labels unchanged here. Because the surviving labels 1 and 2 are already consecutive, no remapping is required; in general, roots are remapped to consecutive integers when merges leave gaps in the numbering. The final output assigns a unique label to each connected component:
|   | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 1 | 1 | 0 | 0 |
| 3 | 0 | 1 | 0 | 1 | 0 |
| 4 | 0 | 0 | 1 | 1 | 0 |
| 5 | 2 | 0 | 0 | 0 | 0 |
The result is one main component (label 1) and one small isolated component (label 2).
The 8-connectivity model is particularly evident in the diagonal connections: (3,4) joins the component via its above-left neighbor, and (4,3) via its above-left and above-right neighbors, ensuring the L-shaped component spanning rows 2-4 is treated as a single entity. Under 4-connectivity, this component would instead split into separate parts.
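The worked example can be checked mechanically; a quick sketch with SciPy, where the array transcribes the 5x5 input above in 0-indexed form:

```python
import numpy as np
from scipy import ndimage

img = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0],
], dtype=np.uint8)

_, n8 = ndimage.label(img, structure=np.ones((3, 3)))
_, n4 = ndimage.label(img)  # default structure is 4-connectivity
print(n8, n4)  # 2 components under 8-connectivity, 3 under 4-connectivity
```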
Common pitfalls in the first pass often arise from corner or edge pixels with partial neighbor visibility, leading to temporary label fragmentation in more complex images. For instance, a foreground pixel on the edge might temporarily appear disconnected if its diagonal neighbor is not yet resolved, but the algorithm's forward mask and union-find prevent over-labeling by proper neighbor checks. This visualization highlights how the two-pass process ensures correct labeling without fragmentation in this case.
For the one-component-at-a-time variant, a sequential traversal (e.g., flood-fill) processes each component individually, starting from an unvisited foreground pixel. Consider a simple 3x3 binary image containing a disk-like (plus-shaped) component:

|   | 1 | 2 | 3 |
|---|---|---|---|
| 1 | 0 | 1 | 0 |
| 2 | 1 | 1 | 1 |
| 3 | 0 | 1 | 0 |
Starting at (1,2) with label 1, a depth-first or breadth-first search visits all 8-connected neighbors recursively, labeling the entire disk as 1 in one traversal, then moves to the next unvisited foreground pixel for subsequent components. This approach suits sparse images but requires a visited array to avoid reprocessing.
Such graphical illustrations are commonly employed in teaching and debugging via tools like MATLAB's bwlabel function, which generates labeled images and visualizations for verifying connectivity and label counts in binary data.
Pseudocode
Connected-component labeling algorithms are often implemented using pseudocode that outlines the core logic in a language-agnostic manner. This section presents pseudocode for the primary sequential approaches: the one-component-at-a-time method, which propagates labels through depth-first or breadth-first search, and the two-pass method, which resolves label equivalences using union-find data structures. These assume a 2D binary image represented as a grid where foreground pixels are 1 and background are 0, with connectivity (4- or 8-neighbor) specified as a parameter. For large images, implementations must consider memory constraints, as the label array requires additional space proportional to the image size, potentially leading to out-of-memory errors on resource-limited systems.
One-Component-at-a-Time Algorithm
This algorithm initializes a label counter and scans the image row-by-row, left-to-right. Upon encountering an unlabeled foreground pixel, it assigns a new label and propagates it to all connected neighbors using a stack-based DFS (or queue-based BFS) subroutine. The connectivity parameter determines the neighbor offsets checked (e.g., {(0,-1), (-1,0), (0,1), (1,0)} for 4-connectivity or including diagonals for 8-connectivity). This approach ensures each component is labeled in a single traversal but may require recursion or stack space up to the component size.
```
function OneComponentAtATime(image, height, width, connectivity):
    labels = 2D array of size height x width, initialized to 0
    label_counter = 1
    neighbor_offsets = get_offsets(connectivity)  // e.g., 4 or 8 directions
    for y from 0 to height-1:
        for x from 0 to width-1:
            if image[y][x] == 1 and labels[y][x] == 0:
                labels[y][x] = label_counter
                stack = empty stack
                stack.push((y, x))
                while stack is not empty:
                    (cy, cx) = stack.pop()
                    for each (dy, dx) in neighbor_offsets:
                        ny = cy + dy
                        nx = cx + dx
                        if 0 <= ny < height and 0 <= nx < width and image[ny][nx] == 1 and labels[ny][nx] == 0:
                            labels[ny][nx] = label_counter
                            stack.push((ny, nx))
                label_counter += 1
    return labels
```
Two-Pass Algorithm
The two-pass algorithm performs a forward raster scan to assign provisional labels and detect equivalences, which are then resolved using union-find for efficient root finding and merging. During the first pass, neighbor checks use the same connectivity parameter, recording equivalences when adjacent pixels have different provisional labels. The union-find structure supports path compression and union-by-rank for near-constant time operations. A second forward pass then assigns the final root label to each pixel. This method is memory-efficient compared to recursive propagation but requires an equivalence table or union-find array sized to the maximum provisional label.
```
function UnionFind(parent, rank):
    function find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])  // path compression
        return parent[x]

    function union(x, y):
        px = find(x)
        py = find(y)
        if px != py:
            if rank[px] > rank[py]:
                parent[py] = px
            elif rank[px] < rank[py]:
                parent[px] = py
            else:
                parent[py] = px
                rank[px] += 1

function TwoPass(image, height, width, connectivity):
    labels = 2D array of size height x width, initialized to 0
    forward_offsets = get_forward_offsets(connectivity)  // e.g., left, above-left, above, above-right for 8-connectivity
    max_label = 0
    parent = []  // will resize as needed
    rank = []
    // First pass; union-find entries are initialized as labels are created
    for y from 0 to height-1:
        for x from 0 to width-1:
            if image[y][x] == 1:
                connected = set()
                for each (dy, dx) in forward_offsets:
                    ny = y + dy
                    nx = x + dx
                    if 0 <= ny < height and 0 <= nx < width and labels[ny][nx] > 0:
                        connected.add(labels[ny][nx])
                if connected is empty:
                    max_label += 1
                    while len(parent) <= max_label:
                        parent.append(len(parent))
                        rank.append(0)
                    labels[y][x] = max_label
                else:
                    min_label = min(connected)
                    labels[y][x] = min_label
                    for l in connected:
                        if l != min_label:
                            union(min_label, l)
    // Second pass: resolve labels
    for y from 0 to height-1:
        for x from 0 to width-1:
            if labels[y][x] > 0:
                labels[y][x] = find(labels[y][x])
    // Optional: renumber to consecutive labels
    root_to_label = {}
    new_label = 1
    for y from 0 to height-1:
        for x from 0 to width-1:
            if labels[y][x] > 0:
                root = labels[y][x]
                if root not in root_to_label:
                    root_to_label[root] = new_label
                    new_label += 1
                labels[y][x] = root_to_label[root]
    return labels
```
Parallel Variant: Prefix-Sum Label Allocation on GPU
For GPU implementations, a parallel variant adapts the two-pass approach by distributing pixel processing across threads and using a prefix-sum operation to allocate unique final labels to equivalence class roots after equivalence resolution. Threads initially assign provisional labels in parallel, detect local equivalences, and compact roots; a global prefix sum then renumbers them uniquely, avoiding atomic conflicts. This leverages CUDA's parallel scan primitives for efficiency on large images.[18]
```
kernel PrefixSumAllocation(labels, num_pixels):  // simplified GPU kernel outline
    // Phase 1: local provisional labeling and equivalence detection per thread block
    // (omitted: thread-local union-find and compaction)
    // Phase 2: global prefix sum on root counts
    roots = compact_roots(labels)                  // array of unique roots
    prefix_sums = parallel_scan(roots, num_roots)  // CUDA prefix sum
    // Phase 3: assign final labels
    for each pixel in parallel:
        root = find(labels[pixel])
        labels[pixel] = prefix_sums[root_index(root)]
```
Implementation notes emphasize thread synchronization for neighbor access in GPUs and warn that for images exceeding available VRAM (e.g., >10^9 pixels), tiled processing or out-of-core methods are necessary to avoid crashes.
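The renumbering idea in phases 2-3 can be sketched sequentially; in this illustrative stand-in, NumPy's cumulative sum plays the role of the parallel scan, and the flattened label array stands in for the GPU's global label buffer:

```python
import numpy as np

# Resolved labels after equivalence resolution: each foreground pixel
# stores its root label; 0 marks background. (Illustrative values.)
labels = np.array([0, 7, 7, 0, 3, 3, 0, 9])

# Flag which label values occur as roots of some component.
is_root = np.zeros(labels.max() + 1, dtype=np.int64)
is_root[labels[labels > 0]] = 1

# A prefix sum over the flags assigns consecutive ids 1..N to the roots;
# on a GPU this scan runs in parallel (e.g., via a CUDA scan primitive).
new_id = np.cumsum(is_root)

# Final labels: background stays 0, each root maps to its compact id.
final = np.where(labels > 0, new_id[labels], 0)
print(final)  # [0 2 2 0 1 1 0 3]
```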
Evaluation Frameworks
Evaluation frameworks for connected-component labeling (CCL) algorithms primarily focus on assessing efficiency, correctness, and robustness across diverse scenarios. Key metrics include execution time, which measures the overall processing speed on binary images; memory usage, evaluating auxiliary data structures like equivalence tables or label arrays; labeling accuracy, ensuring correct assignment of unique labels to equivalent pixels while verifying equivalence relations; and scalability, examining performance as image dimensions increase or component counts vary. These metrics enable standardized comparisons, with hardware-agnostic approaches emphasizing algorithmic efficiency independent of specific architectures.[19]
A prominent benchmarking framework is YACCLAB (Yet Another Connected Components Labeling Benchmark), an open-source C++ library introduced in 2016, supporting over 37 algorithms on both synthetic and real-world 2D/3D images. YACCLAB tests algorithms using datasets like MIRflickr for natural scenes, Fingerprints for textured patterns, and Medical for volumetric data, providing metrics such as average runtime, memory accesses, and correctness verification through ground-truth comparisons. It facilitates reproducible evaluations by including varied image characteristics, including noise and component densities, and supports both CPU and GPU implementations.[20][19][21]
Comparative analyses via YACCLAB and similar tools reveal that two-pass labeling algorithms typically outperform one-component-at-a-time approaches by factors of 2-5x in execution time on raster-scan binary images, due to efficient forward and backward passes that process the entire image sequentially rather than iteratively per component. Parallel variants, particularly GPU-based implementations like CUDA-KE (Komura Equivalence), achieve 5-10x speedups over sequential CPU methods on large datasets, with further gains up to 50x reported for optimized block-based algorithms on high-density images, highlighting the benefits of massive parallelism in handling inter-pixel dependencies. These comparisons underscore the trade-offs: two-pass methods excel in memory efficiency for dense components, while parallel versions scale better for high-resolution inputs.[1][22][23]
Testing methodologies in these frameworks systematically vary parameters to probe algorithm robustness, such as 4- vs. 8-connectivity models, noise levels from 0-20% pixel corruption, and component densities ranging from sparse (few large blobs) to dense (thousands of small regions). Hardware-agnostic metrics, like instructions per pixel or cache hit rates, ensure fair assessments across platforms, often using synthetic generators to control granularity and foreground occupancy. Real-image tests incorporate domain-specific challenges, such as varying contrasts in fingerprints or volumetric noise in medical scans, to evaluate practical performance.[19][24]
Recent developments integrate CCL into hybrid deep learning pipelines, where neural segmentation (e.g., U-Net for binary masks) is followed by post-processing CCL for precise labeling in applications like document analysis and inverse design.[25][26]
Hardware Architectures
Field-programmable gate arrays (FPGAs) offer customizable hardware architectures for accelerating connected-component labeling (CCL), particularly through single-pass designs that enable streaming processing for real-time video analysis. These implementations process binary images row-by-row in a single scan, resolving label equivalences on-the-fly to avoid multiple memory accesses.[27]
Pipelined scanning architectures on FPGAs divide the image into overlapping windows, propagating provisional labels through hardware pipelines while using on-chip union-find data structures for efficient merging. Equivalence tables and resolution stacks manage temporary labels, with root finding performed via path compression to minimize conflicts. A 2021 design on Xilinx Zynq UltraScale+ MPSoC supports 2 or 4 pixels per clock cycle, achieving real-time CCL for 4K-resolution (3840×2160) images at 60 frames per second; the 4-pixel variant operates at 150 MHz with 21% lower power consumption compared to the 2-pixel version at 300 MHz.[27] Earlier single-pass prototypes from 2018 similarly emphasize online conflict resolution via equivalence tables, enabling high-throughput labeling without buffering entire images.[28]
In throughput, FPGA designs significantly outperform CPU-based methods; for instance, a run-length encoding approach with 30 parallel cores on an FPGA delivers over 108 times the speed of multi-threaded CPU implementations for large binary images.[29] A 2015 FPGA architecture tailored for high-resolution images with complex, heterogeneous structures uses a modified two-pass method optimized for low on-chip memory (under 1 MB), supporting up to 8K×8K resolutions at frame rates exceeding 30 fps on mid-range devices.[30]
Application-specific integrated circuits (ASICs) and custom hardware target CCL in power-constrained embedded systems, such as medical imaging devices and Internet of Things (IoT) sensors, where size, energy efficiency, and integration with other processing units are paramount. These designs often incorporate dedicated labeling cores with fixed-function pipelines for scanning and equivalence resolution, minimizing gate count while supporting 4- or 8-connectivity models. A 2014 ASIP (application-specific instruction-set processor) architecture for embedded CCL provides configurable parallelism levels, allowing trade-offs between throughput and silicon area for integration into system-on-chip platforms.[31]
Low-power variants prioritize sub-threshold operation and clock gating to extend battery life in portable devices; for example, a heterogeneous architecture suitable for medical applications achieves labeling speeds of 100-200 MHz while consuming under 50 mW, enabling on-device processing in ultrasound or wearable imaging systems without offloading to external processors.[32] In IoT contexts, custom ASICs integrate CCL with sensor interfaces, using run-length compression to reduce data movement and power draw to microwatt levels for always-on object detection in edge environments.[32]
Graphics processing units (GPUs) leverage massive parallelism via CUDA kernels for CCL on large-scale images, partitioning the workload into thread blocks that process image blocks independently before global resolution. Block-based approaches assign threads to local neighborhoods, using shared memory for intra-block unions and atomic operations on global memory for inter-block merges, which handle label propagation across boundaries. This avoids full image buffering and supports scalable labeling for volumes up to gigapixel sizes.[33]
Optimized algorithms like Block-based Union-Find (BUF) and Block-based Komura Equivalence (BKE) enhance eight-connectivity handling on NVIDIA GPUs by minimizing atomic contention through hierarchical resolution: local union-find within blocks followed by batched global updates. These methods achieve up to 10-20× speedup over baseline GPU CCL for synthetic and real images, with BUF excelling in dense components due to efficient path halving.[34] Direct GPU algorithms further integrate analysis features, using CUDA atomics for tentative label assignment and synchronization barriers for coherency, reducing passes to one while supporting connected component analysis (e.g., feature extraction) in a single kernel launch.[35]
Hybrid CPU-GPU systems offload the parallel scanning and provisional labeling to GPUs, reserving the CPU for final equivalence resolution and output formatting, which mitigates data transfer overhead in iterative workflows. A 2024 hybrid algorithm employs union-find with path compression tailored for sparse binary data, processing zero-suppressed encodings to cut memory bandwidth by 50% and enabling real-time CCL on datasets from particle physics imaging.[36] This offloading strategy scales to multi-GPU setups, where CPUs coordinate block partitioning for images exceeding single-device memory limits.
Hardware architectures for CCL encounter persistent challenges in synchronization latency and power efficiency, particularly in parallel designs where inter-component merges require atomic updates or barriers that can serialize execution. On GPUs, atomic operations for union-find introduce contention in high-density regions, inflating latency by up to 2-5× during global resolution phases despite overall throughput gains.[35] FPGA and ASIC implementations must optimize pipeline stalls from label propagation, often trading clock frequency for reduced dynamic power (e.g., targeting 100-200 mW for embedded use), while ensuring thermal efficiency in integrated systems.[27]