
Single-linkage clustering

Single-linkage clustering is an agglomerative algorithm that builds a nested hierarchy of clusters by starting with each point as an individual cluster and iteratively merging the two closest clusters, where the distance between clusters is defined as the minimum pairwise distance between any point in one and any point in the other. This method, also known as nearest-neighbor clustering, produces a tree-like structure known as a dendrogram that visualizes the progressive merging process and allows for the extraction of clusters at various levels of granularity. The algorithm operates in a bottom-up fashion: after initializing singleton clusters, it computes all pairwise distances, identifies and merges the pair with the smallest inter-cluster distance, updates the distance matrix to reflect the new cluster (using the minimum distance criterion to all other clusters), and repeats until only one cluster remains. This process has a time complexity of O(n³) for n data points in its naive implementation, due to the repeated scanning of the distance matrix to find the minimum; optimized algorithms can achieve O(n²). A key property is its equivalence to constructing the minimum spanning tree (MST) of a graph where nodes are data points and edge weights are distances; the dendrogram heights correspond to the MST edge weights added in sorted order. Originating in the early 1950s, single-linkage clustering was first introduced by Florek et al. in 1951 as a method for taxonomic classification, with independent reinventions by McQuitty in 1957 and Sneath in 1957. Despite its simplicity, the algorithm is prone to the chaining effect, where noise or outliers cause elongated, snake-like clusters that may not reflect compact, intuitive groupings, as it emphasizes local proximity over global structure. To mitigate this, variants like complete-linkage or average-linkage use maximum or mean distances instead, but single-linkage remains valuable for detecting irregularly shaped or density-based clusters. Single-linkage clustering finds applications in fields such as bioinformatics for phylogenetic tree construction, document retrieval for grouping similar texts based on proximity in feature space, and pattern recognition where elongated structures are expected, such as in spatial data analysis. Its sensitivity to outliers underscores the importance of preprocessing, like distance thresholding, to enhance robustness in practical implementations.

Background and Overview

Hierarchical Clustering Fundamentals

Hierarchical clustering is a fundamental technique in unsupervised learning that constructs a hierarchy of nested clusters from a set of data points, either through an agglomerative approach that merges smaller clusters into larger ones in a bottom-up manner or a divisive approach that splits larger clusters into smaller ones in a top-down manner. This approach allows for the exploration of data at multiple levels of granularity, revealing relationships that flat partitioning methods might overlook. The primary output of hierarchical clustering is a dendrogram, a branching diagram that visually represents the hierarchical structure as a tree, where the height of branches corresponds to the level of similarity or dissimilarity at which clusters merge or split, thereby encoding nested subsets of the data. This tree-like representation facilitates the identification of clusters at any desired resolution by cutting the dendrogram at appropriate heights. In machine learning, hierarchical clustering serves key use cases such as discovering underlying data structures in domains like bioinformatics, biological taxonomy, and document retrieval, particularly when the number of clusters is unknown or variable. Unlike partitioning algorithms that require a predefined number of clusters, it provides a multi-scale view of data organization without such constraints, aiding exploratory analysis. Essential parameters in hierarchical clustering include the selection of a distance metric to quantify similarity between individual data points—such as the Euclidean distance for continuous features or the Manhattan distance for grid-like data—and a linkage criterion to define distances between clusters during the hierarchical construction. Single-linkage clustering represents one specific agglomerative approach within this framework.

Agglomerative Clustering Methods

Agglomerative clustering represents a bottom-up strategy within hierarchical clustering, where the process initiates with each data point forming its own cluster and proceeds by iteratively merging the most similar pairs of clusters until a single encompassing cluster remains. This approach constructs a hierarchy of clusters that can be visualized as a dendrogram, capturing nested relationships among data points based on their proximity. The method is particularly suited for datasets where the underlying structure is hierarchical, allowing for flexible cut-off levels to derive flat partitions of varying granularity. The general algorithm for agglomerative clustering follows these steps:
  1. Initialization: Assign each of the n points to its own cluster, yielding n clusters. Compute and store a distance matrix containing the pairwise distances between all points, typically using a metric such as Euclidean distance.
  2. Distance computation and selection: In each iteration, identify the pair of clusters with the smallest inter-cluster distance, as defined by the chosen linkage criterion.
  3. Merging: Combine the two closest clusters into a new cluster, reducing the total number of clusters by one.
  4. Matrix update: Recalculate and update the distance matrix to include distances from the newly formed cluster to all remaining clusters, applying the linkage criterion to aggregate distances between cluster members.
  5. Termination: Repeat steps 2–4 until only one cluster persists or a predefined stopping condition (e.g., desired number of clusters) is met.
A distance matrix plays a central role in tracking inter-cluster similarities throughout the process; it begins as an n \times n matrix of point-to-point distances and is progressively updated to reflect cluster-to-cluster distances after each merge, ensuring efficient identification of the next closest pair. This matrix maintenance is essential for efficiency, though naive implementations can become computationally intensive for large datasets. Linkage criteria serve as the primary differentiator among agglomerative methods, specifying how to measure the "closeness" between clusters during distance updates—for instance, by considering the minimum, maximum, or average pairwise distances between their elements; a brief sketch of how the criteria plug into the generic loop appears below. Single-linkage exemplifies a minimum-distance-based criterion in this framework. Agglomerative clustering methods rose to prominence in the 1950s and 1960s, driven by advancements in numerical taxonomy for biological classification and emerging applications in bioinformatics, as exemplified by foundational works like Sneath and Sokal's principles of numerical taxonomy.
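To make the role of the linkage criterion concrete, the following minimal Python sketch shows how only the matrix-update step differs across criteria while the rest of the agglomerative loop stays identical; the function and its name are illustrative assumptions, not part of any library:

```python
def updated_distance(d_ik, d_jk, size_i, size_j, criterion):
    """Distance from the merged cluster (i ∪ j) to another cluster k,
    given pre-merge distances d_ik, d_jk and the merged clusters' sizes.
    Illustrative helper; not from any library."""
    if criterion == "single":    # minimum pairwise distance (single linkage)
        return min(d_ik, d_jk)
    if criterion == "complete":  # maximum pairwise distance
        return max(d_ik, d_jk)
    if criterion == "average":   # size-weighted mean of pairwise distances
        return (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
    raise ValueError(f"unknown criterion: {criterion}")
```

For single linkage, each such update is a constant-time minimum, a property that later optimizations exploit.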

Definition and Properties

Core Definition

Single-linkage clustering, also known as nearest-neighbor clustering, is a hierarchical agglomerative clustering method that defines the distance between two clusters C_1 and C_2 as the minimum distance between any pair of points, with one point from each cluster: d(C_1, C_2) = \min \{ d(x, y) \mid x \in C_1, y \in C_2 \} where d(x, y) is the distance metric (typically Euclidean distance) between individual points x and y. This criterion emphasizes the closest connections between clusters, making it sensitive to local structure in the data. The use of the minimum distance often results in "chaining" behavior, where clusters elongate and connect distant points through chains of intermediate points that are successively closer, potentially forming snake-like structures rather than compact groups. Upon merging two clusters A and B into a new cluster C, the distances from C to any other cluster D are updated using the minimum operation: d(C, D) = \min \{ d(A, D), d(B, D) \} This update rule, part of the broader Lance-Williams framework for agglomerative clustering, ensures efficient recomputation of inter-cluster distances without reevaluating all pairwise point distances. From a graph-theoretic perspective, single-linkage clustering is equivalent to identifying connected components in the minimum spanning tree (MST) of the complete graph formed by the data points, with edge weights equal to the point distances; clusters emerge by removing edges longer than a threshold from the MST.

Mathematical Properties

Single-linkage clustering produces a dendrogram that induces an ultrametric on the data points, satisfying the ultrametric inequality d(u, v) \leq \max(d(u, w), d(v, w)) for all points u, v, w in the space, where d denotes the cophenetic distance defined by the merge height in the dendrogram. This property ensures the hierarchy is valid and tree-like, with the subdominant ultrametric emerging as the tightest such structure compatible with the original metric. The algorithm is mathematically equivalent to constructing the minimum spanning tree (MST) of the complete graph where vertices are data points and edge weights are pairwise distances, followed by processing edges in increasing order of weight to define cluster merges; specifically, the dendrogram levels correspond to the MST edge weights at which connected components form clusters. This equivalence, first noted in foundational work on graph-based clustering, implies that single-linkage can be computed via efficient MST algorithms like Kruskal's, with the same partitioning outcomes at any height. In the Lance-Williams framework for recursive distance updates during agglomerative merging, single-linkage employs the formula d_{k l} = \alpha_i d_{i l} + \alpha_j d_{j l} + \beta d_{i j} + \gamma |d_{i l} - d_{j l}|, where clusters i and j merge into k, and l is another cluster; the parameters are \alpha_i = 0.5, \alpha_j = 0.5, \beta = 0, \gamma = -0.5. This yields d_{k l} = \min(d_{i l}, d_{j l}), preserving the nearest-neighbor merging criterion. Due to its reliance on minimum pairwise distances, single-linkage exhibits sensitivity to noise and outliers, as a single anomalous point can bridge disparate clusters via its nearest neighbor, propagating chain-like merges that distort the hierarchy. This vulnerability stems from the criterion's dependence on pairwise minima, where outliers lower the effective distance threshold for incorrect unions, often resulting in elongated or chained cluster shapes rather than compact ones. Under certain distributional assumptions, such as when data arise from a mixture of high-density regions separated by low-probability valleys in the underlying density, single-linkage is a consistent estimator of the true clustering structure as the sample size grows, converging in probability to the population-level connected components defined by density thresholds. This consistency holds particularly for high-density clusters in one- or low-dimensional settings without pervasive noise, though it fails in higher dimensions due to the "chaining" effect amplifying inconsistencies.
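Substituting these parameters confirms the reduction directly, via the elementary identity \min(a, b) = \tfrac{1}{2}(a + b) - \tfrac{1}{2}\lvert a - b \rvert for any real numbers a and b:

d_{k l} = \tfrac{1}{2} d_{i l} + \tfrac{1}{2} d_{j l} - \tfrac{1}{2} \lvert d_{i l} - d_{j l} \rvert = \frac{(d_{i l} + d_{j l}) - \lvert d_{i l} - d_{j l} \rvert}{2} = \min(d_{i l}, d_{j l}),

so the Lance-Williams recurrence with these coefficients reproduces the nearest-neighbor update rule exactly.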

Algorithm Description

Naive Algorithm Steps

The naive algorithm for single-linkage clustering implements an agglomerative hierarchical approach by starting with each data point as its own cluster. For a dataset consisting of n points, an n \times n distance matrix is computed, where the entry d(i,j) represents the distance between points i and j, typically using a metric such as Euclidean distance, and diagonal entries are set to infinity or excluded to avoid self-distances. Cluster representations are maintained using lists or union-find structures to track the membership of points in each cluster. In each iteration, the algorithm identifies the pair of clusters with the minimum single-linkage distance, defined as the smallest distance between any point in one cluster and any point in the other. The two closest clusters are then merged into a single new cluster by unioning their point memberships. The distance matrix is updated by removing the rows and columns corresponding to the merged clusters and inserting a new row and column for the combined cluster; the new distances from this cluster to all remaining clusters are computed as the minimum of the distances from the original clusters to each remaining cluster. The process repeats for n-1 iterations until only one cluster remains, encompassing all points, or until a predefined stopping criterion, such as a maximum merge distance, is reached. This builds a hierarchy of merges that can be represented as a dendrogram. The single-linkage criterion, originally introduced in early taxonomic work, ensures that merges prioritize the closest inter-cluster connections. The following pseudocode outlines the naive implementation:
Initialize:
  For each point x_i in dataset X of size n:
    Create singleton cluster C_i = {x_i}
  Compute distance matrix D[n][n] where D[i][j] = distance(x_i, x_j) for i ≠ j, and D[i][i] = ∞
  Maintain a list of active clusters: active = {C_1, ..., C_n}

While number of active clusters > 1:
  Find clusters C_p and C_q in active that minimize D[p][q]   (equal to min over x in C_p, y in C_q of d(x, y))
  Merge: Create new cluster C_new = C_p ∪ C_q
  Remove C_p and C_q from active; add C_new to active
  Update D:
    For each remaining cluster C_k in active (k ≠ new):
      D[new][k] = min(D[p][k], D[q][k])
      D[k][new] = D[new][k]
    Remove rows/columns for p and q from D

Return: Hierarchy of merges (dendrogram root = full dataset)
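A compact, runnable rendering of this pseudocode in plain Python with NumPy follows; it is an illustrative sketch (the function name and merge-record format are our own choices), not a production implementation:

```python
import numpy as np

def naive_single_linkage(X):
    """Naive single-linkage clustering, O(n^3) overall.
    Returns the merge history as (members_a, members_b, height) tuples."""
    n = len(X)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(-1))      # starts as point distances,
    np.fill_diagonal(D, np.inf)           # reused as cluster distances
    members = {i: frozenset([i]) for i in range(n)}  # cluster id -> points
    merges = []
    while len(members) > 1:
        active = list(members)
        # O(n^2) scan of the active submatrix for the closest pair.
        p, q = min(((a, b) for a in active for b in active if a < b),
                   key=lambda ab: D[ab[0], ab[1]])
        merges.append((members[p], members[q], D[p, q]))
        # Minimum-rule update (Lance-Williams); reuse index p for the merge.
        for k in active:
            if k not in (p, q):
                D[p, k] = D[k, p] = min(D[p, k], D[q, k])
        D[q, :] = np.inf                  # retire cluster q
        D[:, q] = np.inf
        members[p] = members[p] | members[q]
        del members[q]
    return merges

# Example run on the five points from the worked example below.
X = np.array([[1, 1], [2, 2], [3, 10], [4, 11], [10, 1]], dtype=float)
for a, b, h in naive_single_linkage(X):
    print(sorted(a), "+", sorted(b), "-> height", round(h, 3))
# Ties at equal heights (here 1.414 and 8.062) may merge in a different
# order than the hand-worked example, but the merge heights match.
```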

Time and Space Complexity

The naive implementation of single-linkage clustering exhibits a time complexity of O(n^3), where n is the number of data points, stemming from the need to perform n-1 merges, each involving an O(n^2) scan of the distance matrix to identify the closest cluster pair, followed by an O(n) update of the matrix row and column for the newly formed cluster. This cubic scaling arises in the general agglomerative framework applicable to single-linkage, where exhaustive searches occur without specialized optimizations. The space complexity is O(n^2), dominated by the storage of the full pairwise distance matrix required throughout the algorithm. Runtime is influenced by data characteristics, such as whether the dataset is dense or sparse—for sparse data, distance computations can leverage efficient sparse-matrix operations to reduce effective costs—and by the selected distance metric. Precomputing Euclidean distances, for example, requires O(n^2 d) time, where d is the data dimensionality, adding a preprocessing overhead that scales with feature count. These demands render the naive algorithm impractical for large-scale applications with n > 1000, where execution times become excessively long even on modern hardware, thereby motivating the development of more efficient variants. Empirically, the repeated minimum-finding scans constitute the primary bottleneck, since each scan touches the entire O(n^2) matrix at every merge step, whereas the per-merge matrix updates touch only a single row and column and incur far lower cost.

Illustrative Example

Step-by-Step Clustering Process

To illustrate the step-by-step process of single-linkage clustering, consider a small dataset consisting of five points in the two-dimensional Euclidean plane: A at (1,1), B at (2,2), C at (3,10), D at (4,11), and E at (10,1). The pairwise Euclidean distances are precomputed as follows (rounded to three decimal places for clarity):
|   | A | B | C | D | E |
|---|---|---|---|---|---|
| A | – | 1.414 | 9.219 | 10.440 | 9.000 |
| B | 1.414 | – | 8.062 | 9.219 | 8.062 |
| C | 9.219 | 8.062 | – | 1.414 | 11.402 |
| D | 10.440 | 9.219 | 1.414 | – | 11.662 |
| E | 9.000 | 8.062 | 11.402 | 11.662 | – |
In single-linkage clustering, clusters are merged iteratively based on the minimum distance between any pair of points from distinct clusters.

Step 1: The smallest distance is 1.414, shared by pairs A-B and C-D. Merging the lowest-indexed pair first, combine A and B into a new cluster AB. The updated distances from AB to the remaining points are the minima: d(AB, C) = min(9.219, 8.062) = 8.062, d(AB, D) = min(10.440, 9.219) = 9.219, and d(AB, E) = min(9.000, 8.062) = 8.062. The distance between C and D remains 1.414, and other distances are unchanged.

Step 2: The smallest distance is now 1.414 between C and D. Merge them into cluster CD. The updated distances are: d(AB, CD) = min(d(AB, C), d(AB, D)) = min(8.062, 9.219) = 8.062 and d(CD, E) = min(11.402, 11.662) = 11.402. The distance d(AB, E) remains 8.062.

Step 3: The smallest distances are now 8.062, shared by AB-E and AB-CD. Merging AB and E first (lowest indices), form cluster ABE. The updated distance is d(ABE, CD) = min(d(AB, CD), d(E, CD)) = min(8.062, 11.402) = 8.062.

Step 4: The only remaining distance is 8.062 between ABE and CD. Merge them into the final cluster ABCDE.

This merge sequence highlights the chaining effect characteristic of single-linkage, where distant points like A and C (original distance 9.219) become connected through intermediate minima along a path (e.g., via B to C at 8.062), rather than requiring all pairs to be close. The hierarchy thus forms at two merge levels: 1.414 (two merges) and 8.062 (two merges).
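The merge heights derived by hand can be checked against SciPy's linkage routine; note that with tied distances the library's merge order may differ from the walkthrough above, though the heights agree:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# The five example points A–E.
X = np.array([[1, 1], [2, 2], [3, 10], [4, 11], [10, 1]], dtype=float)

# Each row of Z records (cluster_i, cluster_j, merge_height, new_cluster_size).
Z = linkage(X, method="single", metric="euclidean")
print(np.round(Z, 3))
# Expect two merges at height 1.414 and two at height 8.062,
# matching the dendrogram levels derived by hand.
```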

Dendrogram Visualization

The dendrogram for single-linkage clustering is constructed by arranging the individual data points, or leaves, horizontally at the base, with the vertical axis scaled to represent the merge distances, or heights, at which clusters are joined. Each branch connects subclusters at the precise height corresponding to their single-linkage distance, illustrating the sequential mergers; for instance, in the example dataset, points A and B connect at height 1.414, and this subcluster later joins with E at height 8.062. Interpretation of the dendrogram involves drawing a horizontal cut at a chosen height h, where the resulting clusters correspond to the distinct connected components formed by the branches below that line, thereby revealing the full hierarchy of merges. This visualization highlights the chaining phenomenon typical of single-linkage clustering, manifested as elongated, unbalanced branches that reflect the tendency to form string-like clusters by linking nearest neighbors progressively. To extract a specific number of clusters, such as k=2, the dendrogram is cut just below the final merge—in the example, just below height 8.062, after the first three merges—which separates the dataset into {A, B, E} and {C, D} as the two primary clusters; because the third and fourth merges are tied at 8.062, this two-cluster solution follows from the merge order rather than from a strict height threshold. Software libraries like SciPy facilitate dendrogram generation from the output linkage matrix of single-linkage clustering, enabling users to gain intuitive visual insights into the hierarchical relationships without manual plotting; a brief plotting sketch follows. A key limitation of dendrograms in single-linkage clustering is their poor legibility for large datasets, as the horizontal span grows linearly with the number of points n, often rendering the diagram cluttered and difficult to interpret beyond moderate sizes.
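A minimal plotting sketch using SciPy and matplotlib (leaf labels are our own choice) illustrates this workflow:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.array([[1, 1], [2, 2], [3, 10], [4, 11], [10, 1]], dtype=float)
Z = linkage(X, method="single")

# Draw the tree; leaves correspond to points A–E.
dendrogram(Z, labels=["A", "B", "C", "D", "E"])
plt.ylabel("merge distance")
plt.show()

# Extract a flat two-cluster solution by cluster count rather than by a
# height threshold, since the last two merges are tied at 8.062.
print(fcluster(Z, t=2, criterion="maxclust"))
```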

Comparisons and Variants

Comparison with Other Linkage Criteria

Single-linkage clustering, which merges clusters based on the minimum distance between any pair of points from the two clusters, contrasts with complete linkage, which uses the maximum pairwise distance, resulting in compact, spherical clusters rather than the elongated, chain-like formations typical of single-linkage. This difference arises because complete linkage avoids bridging distant outliers, promoting tighter groupings suitable for well-separated, globular data structures. Average linkage, defined by the mean distance across all pairs of points between clusters, strikes a balance between the extremes of single and complete methods, offering greater robustness to outliers compared to single-linkage's sensitivity to nearest-neighbor chaining. By considering all inter-point distances, it tends to produce more uniform cluster shapes, though it may still underperform on highly irregular or noisy datasets where single-linkage excels in capturing density-based connections. Ward's method differs fundamentally by minimizing the increase in within-cluster variance (error sum of squares) upon merging, leading to balanced dendrograms and spherical clusters, unlike the potentially imbalanced, chain-like trees from single-linkage. This variance-focused approach, originally proposed by Joe H. Ward Jr. in 1963, favors merges that preserve homogeneity and is particularly effective in handling noise, contrasting with single-linkage's vulnerability to it. The following table summarizes the key linkage criteria, their formulas for inter-cluster distance d(C_1, C_2), effects on cluster shapes, and typical suitability:
| Linkage Criterion | Formula | Effect on Cluster Shapes | Suitability |
|---|---|---|---|
| Single linkage | \min_{x \in C_1, y \in C_2} d(x, y) | Elongated, chain-like | Detecting density-based or elongated structures with low noise |
| Complete linkage | \max_{x \in C_1, y \in C_2} d(x, y) | Compact, spherical | Cleanly separated globular clusters |
| Average linkage | \frac{1}{\lvert C_1 \rvert \, \lvert C_2 \rvert} \sum_{x \in C_1} \sum_{y \in C_2} d(x, y) | Moderately compact, relatively uniform | General-purpose use with greater robustness to outliers |
| Ward's method | Minimize \Delta ESS = \frac{\lvert C_1 \rvert \, \lvert C_2 \rvert}{\lvert C_1 \rvert + \lvert C_2 \rvert} \lVert \mu_1 - \mu_2 \rVert^2 | Compact, spherical, balanced sizes | Globular clusters of similar size; noisy data |
Single-linkage is generally chosen for datasets with elongated or density-connected patterns, such as natural clusters in spatial data, whereas complete, average, or Ward's methods are preferred for globular formations or when robustness to noise and balance are priorities.
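These behavioral differences can be observed empirically by running several linkage criteria on the same data and comparing the flat partitions; the following sketch uses an arbitrarily chosen synthetic dataset of two elongated strips, a shape that favors single linkage:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Two elongated, parallel point "strips" separated vertically.
xs = rng.uniform(0, 10, 100)
strip1 = np.column_stack([xs, 0.1 * rng.standard_normal(100)])
strip2 = np.column_stack([xs, 3 + 0.1 * rng.standard_normal(100)])
X = np.vstack([strip1, strip2])

for method in ["single", "complete", "average", "ward"]:
    labels = fcluster(linkage(X, method=method), t=2, criterion="maxclust")
    sizes = np.bincount(labels)[1:]
    print(f"{method:>8}: cluster sizes {sizes.tolist()}")
# Single linkage typically recovers the two 100-point strips exactly,
# while compactness-oriented criteria may instead split a strip in half.
```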

Strengths and Limitations

Single-linkage clustering excels at revealing chain-like or manifold structures in data, such as elongated or snake-like distributions, by connecting clusters based solely on the nearest points, which allows it to capture non-elliptical shapes effectively. This property makes it particularly suitable for datasets where objects form extended sequences or irregular paths rather than compact groups. It also detects clusters with varying densities well, as the method does not impose assumptions about uniform cluster shapes or sizes, enabling the identification of regions with differing point concentrations through minimal distance links. Furthermore, single-linkage incurs low computational overhead per merge compared to variance-based methods, since it requires only the tracking of minimum pairwise distances without additional calculations like centroid updates or variance recomputations. A major limitation of single-linkage clustering is its high sensitivity to noise and outliers, which can lead to premature merging of unrelated clusters across irrelevant gaps, as a single anomalous point may serve as a bridge between distant groups. This vulnerability often results in unbalanced dendrograms, characterized by asymmetric branching and elongated hierarchies that reflect chaining rather than balanced subgroupings. The algorithm tends to break data into many small clusters in sparse or noisy settings and performs poorly on globular data, where compact, spherical distributions are not well delineated by nearest-neighbor connections alone. Its equivalence to minimum spanning tree (MST) construction can also yield non-unique clusterings in cases of tied distances, depending on the implementation's edge selection order. Empirical studies demonstrate that single-linkage exhibits higher error rates on noisy datasets compared to average linkage, as outliers distort the proximity graph and propagate chaining effects throughout the hierarchy.

Advanced Implementations

Optimized Algorithms

The naive implementation of single-linkage clustering suffers from O(n³) time complexity due to repeated distance-matrix scans and updates after each merge, limiting its scalability to large datasets. Optimized algorithms address this by exploiting properties of single-linkage, such as its equivalence to minimum spanning tree (MST) construction, to achieve O(n²) time while using O(n) space. One foundational optimization is the nearest-neighbor chain algorithm, which leverages mutual nearest neighbors (MNN) to restrict candidate cluster pairs for merging. In this approach, only pairs where each point or cluster is the nearest neighbor of the other are considered, forming chains that guide the search for the global minimum distance. By maintaining a stack of these chains, the algorithm reduces the number of distance evaluations, achieving O(n²) overall. This method ensures all valid merges are eventually considered without exhaustive pairwise checks, as mutual nearest neighbors propagate through the clustering process. The SLINK algorithm, introduced by Sibson in 1973, provides an optimally efficient O(n²) implementation specifically for single-linkage by storing nearest-neighbor information in a pointer structure and updating distances incrementally during merges. Rather than rebuilding the full distance matrix, SLINK maintains a compact array of nearest-neighbor pointers and merge heights for each point and adjusts them only when necessary, avoiding redundant computations and using linear space. This approach attains the theoretical lower bound for single-linkage, as each of the n-1 merges requires O(n) work in the worst case. An analogous method, CLINK by Defays (1977), applies similar pointer updates for complete-linkage, but single-linkage variants emphasize reciprocal nearest neighbors due to the method's focus on minimum distances. Single-linkage's connection to MST construction enables further optimizations using union-find data structures for tracking cluster connectivity, as sketched below. By treating the clustering as building an MST via Kruskal's algorithm—sorting all pairwise distances and adding non-cycle-forming edges with union-find—the process avoids explicit distance updates between merges. With path compression and union-by-rank, the union-find operations approach amortized constant time, reducing overall complexity from the naive O(n³) to O(n² log n) for dense graphs, or better for sparse inputs where edge sorting dominates. This MST view also facilitates complexity reductions to strict O(n²) in implementations that precompute and process distances in a single pass. In modern implementations, particularly for high-dimensional or large-scale data, sparse matrix representations are employed to store only non-zero distances, significantly reducing memory and computation for datasets with many irrelevant pairs, such as in text or graph clustering. Parallelization on GPUs accelerates distance computations and MST construction; for instance, GPU-based implementations construct dendrograms in near-linear time on multi-core GPUs by distributing edge sorting and union-find operations across threads, achieving speedups of up to 37× compared to multithreaded CPU-based implementations on datasets with millions of points. These techniques maintain the conceptual simplicity of single-linkage while enabling practical use on massive datasets.
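A minimal sketch of this Kruskal-style formulation, in plain Python with a union-find structure (function names are our own; union by rank is omitted for brevity, keeping only path compression):

```python
import numpy as np

def single_linkage_mst(X):
    """Single-linkage merges via Kruskal's algorithm with union-find.

    Sorting the O(n^2) candidate edges dominates: O(n^2 log n) time,
    but no distance-matrix updates are needed between merges."""
    n = len(X)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(-1))

    # All candidate edges, sorted by weight.
    edges = sorted((D[i, j], i, j) for i in range(n) for j in range(i + 1, n))

    parent = list(range(n))
    def find(x):                      # root lookup with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    merges = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                  # edge joins two components: a merge
            parent[rj] = ri
            merges.append((i, j, w))
            if len(merges) == n - 1:  # spanning tree complete
                break
    return merges                     # MST edge weights = dendrogram heights
```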

Practical Applications

Single-linkage clustering finds application in bioinformatics for analyzing gene expression data from microarray studies, where it groups genes exhibiting similar expression patterns to uncover biological pathways. In the early 2000s, it was employed to identify non-spherical clusters in datasets like those from microarray experiments, leveraging its chaining property to connect linearly arranged genomic features such as co-expressed genes along pathways. For instance, studies on expression datasets demonstrated its utility in revealing hierarchical relationships among gene modules, though it often required combination with other methods for robustness. In social network analysis, single-linkage clustering facilitates community detection by identifying connected components in graphs, akin to chains of friendships or interactions. Post-2010 applications include analyzing social media data during notable events, such as a 2014 disease outbreak, where it merged users based on minimum retweet distances to reveal communities centered on news propagation, such as retweet networks. This approach highlights diffuse social structures, such as loosely connected user groups sharing rare keywords. For image segmentation, single-linkage clustering groups pixels by nearest-neighbor distances in feature space, proving effective for delineating elongated objects like roads in remote-sensing imagery. It handles arbitrary shapes without assuming compactness, making it suitable for linear features in high-resolution aerial data, as seen in benchmarks involving thin, extended structures. However, its performance on overlapping regions necessitates careful distance selection, such as Euclidean distance over RGB values. In anomaly detection, single-linkage clustering identifies outliers through their merging patterns in dendrograms, where isolated points form long singleton branches due to large minimum distances to other clusters. This visualization aids in spotting deviations in noisy datasets, with outliers often appearing as bridges or late mergers, enabling their removal for cleaner analysis. Single-linkage is integrated into popular software libraries for practical implementation; a usage sketch follows below. In Python's scikit-learn, it is available via the AgglomerativeClustering class with the 'single' linkage parameter, supporting connectivity constraints for sparse data. Similarly, R's hclust function includes single linkage as a method option, using minimum dissimilarity for merging. In the 2020s, these tools have been applied to single-cell RNA sequencing (scRNA-seq) data for trajectory inference, where single-linkage constructs minimum spanning trees to model cell differentiation paths, as in tools like SCELLNETOR for pseudotime ordering. Despite these uses, single-linkage clustering in practice often demands preprocessing steps like outlier removal to mitigate its noise sensitivity, as distant points can prematurely chain unrelated clusters and distort results. Techniques such as density-based trimming or imputation are commonly applied beforehand, particularly in high-dimensional data like scRNA-seq, to enhance stability without altering core linkages.
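For reference, a minimal usage sketch with scikit-learn's AgglomerativeClustering (synthetic data; all parameter values arbitrary), with the roughly equivalent R call noted in a comment:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(42)
# Synthetic data: two well-separated blobs.
X = np.vstack([rng.normal(0, 0.3, (50, 2)),
               rng.normal(5, 0.3, (50, 2))])

model = AgglomerativeClustering(n_clusters=2, linkage="single")
labels = model.fit_predict(X)
print(np.bincount(labels))  # roughly [50, 50] for well-separated blobs

# Roughly equivalent in R:
#   hc <- hclust(dist(X), method = "single"); cutree(hc, k = 2)
```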