Affinity propagation
Affinity propagation is an exemplar-based clustering algorithm that identifies representative data points, or exemplars, by iteratively exchanging real-valued messages between pairs of data points until a high-quality set of exemplars and corresponding clusters emerges.[1] Introduced in 2007 by Brendan J. Frey and Delbert Dueck, the method takes as input a matrix of pairwise similarities—such as negative squared Euclidean distances—and a preference parameter that influences the number of clusters by setting the baseline attractiveness of each point as a potential exemplar.[1] Unlike traditional clustering techniques such as k-means, which require the number of clusters to be specified in advance and depend on random initializations, affinity propagation avoids both by using a message-passing framework, closely related to belief propagation on factor graphs, that lets the data self-organize into clusters without a predefined cluster count.[1]
The algorithm operates through two types of messages: responsibility messages, which indicate how well-suited a candidate point k is to serve as the exemplar for another point i, and availability messages, which reflect how appropriate it would be for point i to choose k as its exemplar, given the support k receives from other points.[1] The responsibility r(i,k) is computed as the similarity s(i,k) minus the largest competing value of availability plus similarity over other candidate exemplars, i.e., \max_{k' \neq k} [a(i,k') + s(i,k')], while the availability a(i,k) aggregates the positive responsibilities sent to k from other points and is capped above at zero so that strong incoming support cannot dominate.[1] Updates incorporate a damping factor (typically 0.5) to stabilize convergence, and the process halts after a fixed number of iterations or when assignments stabilize.[1] The output consists of exemplars and cluster assignments, determined for each point by maximizing the sum of responsibility and availability over candidate exemplars.[1]
Affinity propagation demonstrates superior performance in benchmarks, achieving lower quantization errors—such as 108 versus 119 on a dataset of 900 face images—compared to k-centers clustering, while completing tasks up to 100 times faster, as seen in gene expression analysis on 75,066 DNA segments (6 minutes versus 208 hours).[1] It excels with non-Euclidean similarities and large datasets, making it suitable for applications including image clustering (e.g., faces), bioinformatics (e.g., gene detection in microarray data), natural language processing (e.g., sentence summarization), and network analysis (e.g., identifying central cities in airline routes from 456 airports).[1] The algorithm's flexibility and efficiency have led to extensions for streaming data,[2] mixed data types,[3] and constrained clustering,[4] though it can be computationally intensive for very large similarity matrices due to O(N²) complexity.[1]
Overview
Description
Affinity propagation is an unsupervised clustering algorithm that identifies representative exemplars among data points by exchanging real-valued messages based on pairwise similarities between them, forming clusters around these exemplars to group similar points.[1] Unlike traditional methods that rely on predefined parameters, it dynamically determines the optimal number of clusters through this message-passing process.[5]
The algorithm treats every data point as a potential exemplar initially, evaluating their suitability to serve as cluster centers based on how well they represent nearby points in the similarity space.[5] The number of resulting clusters is controlled by preference values assigned to each point, which reflect their a priori attractiveness as exemplars; higher preferences lead to more clusters, while lower ones promote fewer, more general groupings—for instance, setting all preferences to the median of the input similarities typically produces a moderate number of clusters.[5]
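A minimal NumPy sketch of the common practice of setting all preferences to the median similarity; the small synthetic dataset is assumed purely for illustration:

    import numpy as np

    # Hypothetical 2-D points for illustration
    X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9]])

    # Pairwise similarities: negative squared Euclidean distance
    S = -np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)

    # Common default: set every preference to the median off-diagonal similarity
    off_diag = S[~np.eye(len(X), dtype=bool)]
    preference = np.median(off_diag)
    np.fill_diagonal(S, preference)   # raising this value produces more clusters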
In contrast to centroid-based approaches like k-means, which require specifying the exact number of clusters and initial center positions that can lead to suboptimal results if poorly chosen, affinity propagation avoids these constraints by considering all points as candidates and using similarity measures that need not be Euclidean or symmetric.[5] The primary inputs are a similarity matrix quantifying affinities between pairs of data points and the preference values for each.[1]
History
Affinity propagation was introduced in 2007 by Brendan J. Frey and Delbert Dueck through their seminal paper "Clustering by Passing Messages Between Data Points," published in Science.[1] This work presented the algorithm as an exemplar-based clustering method that exchanges real-valued messages between data points to identify clusters without requiring the number of clusters to be specified in advance.[1]
The algorithm's development was motivated by concepts from spectral clustering and belief propagation techniques in graphical models, aiming to improve upon traditional methods by enabling automatic determination of exemplars and clusters through iterative message passing.[6] In the same year, Dueck and Frey extended its application to computer vision in their ICCV paper "Non-metric Affinity Propagation for Unsupervised Image Categorization," demonstrating its utility in identifying meaningful image categories from unlabeled data.[7]
Post-2007 developments included semi-supervised variants to incorporate prior knowledge, such as the approach by Wang et al. in 2012, which used constraint projections to integrate must-link and cannot-link constraints into affinity propagation for improved clustering on UCI datasets.[8] In economic analysis, Almeida and Balanco applied the algorithm in 2020 to uncover temporal patterns in input-output multipliers across U.S. industries from 1997 to 2017, revealing structural shifts through cluster analysis of economic data.[9]
By the early 2020s, extensions integrated affinity propagation with deep learning for scalable clustering, notably in the 2020 AAAI paper "Unsupervised Deep Learning via Affinity Diffusion" by Huang et al., which combined message-passing mechanisms with neural networks to address error propagation in high-dimensional unsupervised learning tasks.[10] Further advancements have focused on incremental and adaptive versions, such as the 2021 active affinity propagation for streaming data (AAPStream), enhancing efficiency for data streams while preserving the core message-passing framework.[11]
Algorithm
Message Passing Mechanism
Affinity propagation operates through a message-passing mechanism where data points exchange real-valued messages to determine exemplars, which serve as cluster centers.[5] The core of this process involves two primary types of messages: responsibilities and availabilities. A responsibility message from data point i to candidate exemplar k, denoted r(i,k), expresses the accumulated evidence for how well-suited k is to serve as the exemplar for i relative to other candidate exemplars.[5] In contrast, an availability message from candidate exemplar k to data point i, denoted a(i,k), reflects the accumulated evidence for how appropriate it would be for k to serve as the exemplar for i, based on the support from other points that would choose k.[5]
The algorithm begins by initializing all availability values to zero, assuming no initial preferences for any point serving as an exemplar.[5] It then enters an iterative loop where, in each iteration, responsibility messages are updated for all pairs of points based on the current availabilities and the input similarities between points; these updates quantify the relative suitability of each candidate exemplar.[5] Next, availability messages are updated using the newly computed responsibilities, incorporating the self-responsibility of the candidate exemplar to account for its preference to be an exemplar itself.[5] For each data point i, the net influence on its choice of exemplar is computed by summing the responsibility and availability for each candidate k; the point k that maximizes this sum becomes the exemplar for i, and points that are their own exemplars (i.e., k = i) are designated as the final exemplars.[5]
To stabilize the iterative process and prevent oscillations in message values, a damping factor \lambda (typically set to 0.5, with 0 < \lambda < 1) is applied during updates.[5] Each message is replaced by a weighted average of its previous value and the newly computed value, i.e., updated message = \lambda \times (previous message) + (1 - \lambda) \times (newly computed message), which ensures smoother convergence.[5] The process continues until convergence is achieved, such as when exemplar assignments remain stable over a fixed number of iterations (e.g., 10).[5]
The message-passing loop can be outlined in pseudocode as follows:
    Initialize r(i,k) = 0 and a(i,k) = 0 for all i, k
    While not converged:
        # Update responsibilities for all pairs (including k = i)
        For each i:
            For each k:
                tmp_r(i,k) = s(i,k) - max_{k' ≠ k} [a(i,k') + s(i,k')]
        # Apply damping to responsibilities
        For each i, k:
            r(i,k) = λ * r(i,k) + (1 - λ) * tmp_r(i,k)
        # Update availabilities
        For each k:
            For each i ≠ k:
                tmp_a(i,k) = min(0, r(k,k) + sum_{j ≠ i,k} max(0, r(j,k)))
            tmp_a(k,k) = sum_{j ≠ k} max(0, r(j,k))    # self-availability
        # Apply damping to availabilities
        For each i, k:
            a(i,k) = λ * a(i,k) + (1 - λ) * tmp_a(i,k)
        # Assign exemplars using damped messages
        For each i:
            k* = argmax_k [a(i,k) + r(i,k)]
            If k* = i, mark i as exemplar
        Check for convergence (e.g., exemplars stable for 10 iterations)
    Return exemplars and cluster assignments
This procedure leverages the input similarity measures to iteratively refine exemplar selections through decentralized communication among data points.[5]
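The loop can be written compactly with NumPy; the following is a minimal vectorized sketch in which the damping value, iteration limits, and convergence rule are illustrative defaults, and the final refinement step described in the original paper (reassigning each point to its most similar exemplar) is omitted:

    import numpy as np

    def affinity_propagation(S, damping=0.5, max_iter=200, conv_iter=10):
        """Message-passing loop over a similarity matrix S (preferences on the diagonal)."""
        n = S.shape[0]
        R = np.zeros((n, n))          # responsibilities r(i,k)
        A = np.zeros((n, n))          # availabilities a(i,k)
        rows = np.arange(n)
        prev_exemplars, stable = None, 0

        for _ in range(max_iter):
            # Responsibilities: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
            AS = A + S
            best = np.argmax(AS, axis=1)
            best_val = AS[rows, best]
            AS[rows, best] = -np.inf
            second_val = np.max(AS, axis=1)
            tmp_R = S - best_val[:, None]
            tmp_R[rows, best] = S[rows, best] - second_val
            R = damping * R + (1 - damping) * tmp_R

            # Availabilities: a(i,k) = min(0, r(k,k) + sum_{j != i,k} max(0, r(j,k)))
            Rp = np.maximum(R, 0)
            np.fill_diagonal(Rp, np.diag(R))            # keep r(k,k) unclipped
            col = Rp.sum(axis=0)                        # r(k,k) + sum_{j != k} max(0, r(j,k))
            tmp_A = np.minimum(0, col[None, :] - Rp)
            np.fill_diagonal(tmp_A, col - np.diag(R))   # a(k,k) = sum_{j != k} max(0, r(j,k))
            A = damping * A + (1 - damping) * tmp_A

            # Exemplar assignment and a simple stability-based convergence check
            exemplars = np.argmax(A + R, axis=1)
            if prev_exemplars is not None and np.array_equal(exemplars, prev_exemplars):
                stable += 1
                if stable >= conv_iter:
                    break
            else:
                stable = 0
            prev_exemplars = exemplars

        labels = np.argmax(A + R, axis=1)
        exemplar_indices = np.flatnonzero(labels == np.arange(n))
        return exemplar_indices, labels

Calling affinity_propagation(S) on a similarity matrix such as the one constructed above returns the exemplar indices and a label per point.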
Iterative Updates and Convergence
The affinity propagation algorithm operates through an iterative loop that alternates between updating all elements of the responsibility matrix, based on the current availability matrix, and then updating all elements of the availability matrix, based on the updated responsibilities. Following these matrix updates, the combined responsibility and availability values are used to monitor and update the current exemplar decisions for each data point. This process repeats, with messages exchanged between all pairs of data points, until the algorithm converges.[1]
Convergence is typically achieved when the changes in the responsibility and availability messages fall below a predefined threshold, after a fixed maximum number of iterations (such as 200), or when the exemplar assignments remain unchanged for a specified number of consecutive iterations (often 10 to 15). These criteria ensure that the clustering stabilizes without excessive computation, as the message-passing mechanism naturally propagates evidence toward consistent exemplar selections over iterations.[1][12]
To prevent oscillations in message updates that could hinder convergence, especially in cases with similar data points or noisy similarities, a damping factor is applied during each iteration. This factor, commonly set to 0.5, computes new messages as a weighted average of their previous values and the undamped updates, promoting stability while allowing the algorithm to exhibit self-correcting behavior in practice. If oscillations persist, increasing the damping or adding small random noise to the similarities can further aid convergence without altering the final results significantly.[1]
Upon convergence, the algorithm assigns each data point to its preferred exemplar by selecting the candidate that maximizes the sum of the corresponding responsibility and availability values, thereby defining the cluster memberships. The resulting clusters can then be evaluated for quality using the net similarity objective, which sums the similarities of all points to their assigned exemplars, providing a measure of overall clustering coherence.[1]
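As an illustration of this quality measure, the net similarity can be computed directly from the similarity matrix and the final assignments; the function name and the labels array are assumptions carried over from the sketch above:

    def net_similarity(S, labels):
        """Net similarity: sum of s(i, exemplar(i)) over all points.
        S is a 2-D NumPy array; exemplars are assigned to themselves,
        so they contribute their own preference s(k,k)."""
        return float(sum(S[i, labels[i]] for i in range(len(labels))))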
Similarity Measure
Affinity propagation requires a similarity matrix as input, where the entry s(i,k) quantifies how well-suited data point k is to serve as an exemplar for data point i. This similarity measure reflects the appropriateness of k representing i in a cluster, with higher values indicating stronger suitability.[5]
For real-valued vector data, a common choice for the similarity is the negative squared Euclidean distance, defined as s(i,k) = -\| \mathbf{x}_i - \mathbf{x}_k \|^2, which ensures that closer points yield higher similarities. The diagonal elements s(k,k), known as the preference values, represent the a priori suitability of point k to be an exemplar; these are often set to the median or minimum of the off-diagonal similarities to influence the number of clusters, where higher preferences promote more exemplars and thus more clusters.[5]
In cases involving non-Euclidean data, alternative similarity functions are employed, such as the Gaussian kernel s(i,k) = \exp\left( -\frac{\| \mathbf{x}_i - \mathbf{x}_k \|^2}{2\sigma^2} \right), which captures local similarities effectively for applications like spatiotemporal point pattern analysis. For domain-specific tasks, custom metrics may be used, such as log-likelihood ratios under probabilistic models or measures based on encoding costs for sequences like gene exons.[13][5]
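For instance, a Gaussian-kernel similarity matrix can be constructed as follows; the bandwidth sigma is an assumed tuning parameter, not a prescribed value:

    import numpy as np

    def gaussian_similarity(X, sigma=1.0):
        """Gaussian-kernel similarity matrix for rows of X; sigma is an assumed bandwidth."""
        sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
        return np.exp(-sq_dists / (2.0 * sigma ** 2))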
Preprocessing the input data is crucial to ensure robust similarity computation, particularly through normalization to scale features appropriately and prevent dominance by high-variance dimensions. For high-dimensional inputs, dimensionality reduction techniques like principal component analysis are often applied beforehand to mitigate the curse of dimensionality and improve clustering performance in areas such as text analysis.[14]
Responsibility and Availability Matrices
In affinity propagation, the responsibility matrix \mathbf{R} and availability matrix \mathbf{A} encode the message-passing dynamics between data points, where each entry r(i,k) and a(i,k) quantifies the evidence for point k serving as the exemplar for point i. These matrices are interdependent, with responsibilities influencing availabilities and vice versa, enabling the emergence of clusters through iterative updates. The input to these computations is a similarity matrix \mathbf{S}, where s(i,k) measures the suitability of k as exemplar for i.
The responsibility r(i,k) captures the accumulated evidence that point k is the exemplar for point i, relative to other candidate exemplars k' \neq k. It is computed as
r(i,k) = s(i,k) - \max_{k' \neq k} \left[ a(i,k') + s(i,k') \right],
where the subtraction reflects the net attractiveness of k after accounting for competition from more suitable alternatives, augmented by their current availabilities. The self-responsibility r(k,k) follows the same formula as the general case: r(k,k) = s(k,k) - \max_{k' \neq k} [a(k,k') + s(k,k')]. This formulation ensures that high-similarity points with low competition receive positive responsibilities, promoting them as potential exemplars.
The availability a(i,k) represents the degree to which point i is available to choose point k as its exemplar, based on the support k receives from other points. For i \neq k,
a(i,k) = \min \left\{ 0, \, r(k,k) + \sum_{i' \neq k, i} \max \left( 0, r(i',k) \right) \right\},
which sums the positive responsibilities that other points send to k (negative responsibilities are ignored by the inner max), adds k's self-responsibility, and caps the result above at zero so that strong incoming support cannot push the availability positive. The self-availability a(k,k) is simply the sum of all positive responsibilities directed to k from other points:
a(k,k) = \sum_{i' \neq k} \max \left( 0, r(i',k) \right).
These equations highlight the interdependency: availabilities depend on prior responsibilities, which in turn rely on prior availabilities, necessitating iterative refinement.
To stabilize convergence and avoid oscillations in high-dimensional or noisy data, the updates incorporate damping with a factor \lambda (typically 0.5). The new responsibility and availability are thus
r_{\text{new}}(i,k) = \lambda \, r_{\text{old}}(i,k) + (1 - \lambda) \, r_{\text{computed}}(i,k),
and analogously a_{\text{new}}(i,k) = \lambda \, a_{\text{old}}(i,k) + (1 - \lambda) \, a_{\text{computed}}(i,k), where r_{\text{computed}} and a_{\text{computed}} are the undamped values from the equations above. This damping blends the previous message with the current computation, ensuring smoother propagation.
Exemplars are identified by evaluating the net evidence r(i,k) + a(i,k) for each point i, with the exemplar for i being k = \arg\max_k [r(i,k) + a(i,k)]. Points i for which this maximum occurs at k = i (i.e., r(i,i) + a(i,i) \geq r(i,k) + a(i,k) for all k \neq i) are designated as exemplars, as they sufficiently outcompete alternatives based on self-preference and incoming support. Cluster assignments follow directly: all points sharing the same exemplar form a cluster. In practice, points with positive r(k,k) + a(k,k) are more likely to emerge as exemplars, reflecting strong self-evidence.
Applications
In Bioinformatics
Affinity propagation has found significant application in bioinformatics for transcript identification and gene co-expression clustering, particularly in analyzing microarray data. In an early demonstration, the algorithm clustered 75,066 putative exons from mouse chromosome 1 using expression levels across 12 tissue samples, where similarities were defined by genomic proximity and coordinated transcription patterns; this approach identified gene clusters with a true-positive rate of 39% at a 3% false-positive rate, surpassing k-centers clustering's 17% true-positive rate. Such clustering leverages the message-passing mechanism to group co-expressed genes without requiring a predefined number of clusters, aiding in the discovery of functional gene modules from sparse, high-dimensional expression data.
In protein-protein interaction (PPI) networks, affinity propagation identifies functional modules by treating proteins as nodes and deriving similarities from interaction strengths or association scores. An enhanced variant, Overlapped Affinity Propagation (OAP), applies the algorithm to the yeast PPI network (4,928 proteins, 17,201 interactions) to detect overlapping protein complexes, yielding 232 modules with 93% statistical significance (P-value < 0.001), precision of 50.4%, and an F-measure of 47.2% against a benchmark of 428 known complexes. This method outperforms baselines like Markov Clustering (MCL) in precision by up to 200.7% while capturing biologically relevant modules involved in cellular processes, validated using databases such as DIP and MIPS.
A notable later case study applies affinity propagation to clustering microbial genomes and metagenomic data for species identification. The tool BinSanity employs affinity propagation to bin metagenomic contigs based on coverage profiles across samples, effectively grouping sequences from the same microbial species in complex communities; it demonstrated superior performance over tools like MetaBAT and CONCOCT, leaving fewer contigs unclustered in diverse and strain-mixed datasets. These applications highlight affinity propagation's advantages in handling noisy biological data—such as incomplete interactions or variable expression—without assuming cluster shapes or sizes, enabling robust discovery in irregular, real-world genomic datasets.
In Image and Text Processing
Affinity propagation has been applied in computer vision tasks, particularly for clustering images in databases. In one early demonstration, it was used to cluster the Olivetti faces dataset, identifying exemplars that represent distinct facial expressions and identities with low error rates compared to k-centers methods. This approach leverages pairwise similarities, such as Euclidean distances in pixel space, to group similar faces without predefined cluster numbers. Additionally, affinity propagation facilitates exemplar selection for object detection, where it mines representative instances from large image corpora to train detectors. For instance, in class exemplar mining, it selects diverse yet representative object samples from relational graphs of image features, improving detection accuracy on benchmarks like Caltech-101 by focusing on high-quality exemplars.[15]
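As an illustrative sketch of this kind of experiment (not the original study's code), the Olivetti faces can be clustered with scikit-learn's implementation, whose default affinity is the negative squared Euclidean distance in pixel space:

    from sklearn.datasets import fetch_olivetti_faces
    from sklearn.cluster import AffinityPropagation

    faces = fetch_olivetti_faces()                 # 400 images of 64x64 pixels, flattened to 4096 features
    ap = AffinityPropagation(random_state=0)       # default affinity: negative squared Euclidean distance
    labels = ap.fit_predict(faces.data)
    print(len(ap.cluster_centers_indices_), "exemplar faces found")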
In text mining, affinity propagation supports document clustering and topic modeling by treating documents or sentences as data points with similarities based on metrics like cosine distance over TF-IDF vectors. The original formulation demonstrated its efficacy in selecting representative sentences from manuscripts, effectively summarizing text by identifying key exemplars that capture thematic content. A semi-supervised variant incorporates labeled constraints to guide clustering of text data, enhancing performance on partially annotated corpora by projecting similarities to respect must-link and cannot-link pairs. For example, it has been applied to group similar news articles, where exemplars serve as headlines for clusters of related stories, or to cluster high-dimensional word embeddings from models like Word2Vec, revealing semantic groupings in large text corpora.[16]
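A sketch of this workflow with scikit-learn, using a toy corpus (assumed for illustration) and cosine similarities passed as a precomputed affinity matrix:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity
    from sklearn.cluster import AffinityPropagation

    docs = ["stocks slid on rate fears", "central bank raises rates",
            "local team wins the cup", "striker scores twice in final"]   # toy corpus

    tfidf = TfidfVectorizer().fit_transform(docs)
    S = cosine_similarity(tfidf)                                          # similarities in [0, 1]
    ap = AffinityPropagation(affinity="precomputed", random_state=0)
    labels = ap.fit_predict(S)
    exemplar_docs = [docs[i] for i in ap.cluster_centers_indices_]        # representative sentences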
Due to its O(n²) complexity from the full similarity matrix, affinity propagation faces scalability challenges on large image or text datasets exceeding thousands of points. To address this, approximations such as landmark-based methods or local-global message passing reduce computations while preserving cluster quality, enabling applications to datasets with millions of high-dimensional features like image descriptors or document embeddings.[17] These variants maintain the message-passing mechanism's ability to handle irregular data structures in visual and textual domains.
Properties
Advantages
Affinity propagation offers a key advantage over traditional clustering algorithms like k-means by not requiring the number of clusters to be specified in advance. Instead, the algorithm uses a preference parameter to influence the number of exemplars, allowing the data's inherent structure to determine the optimal partitioning through iterative message passing. This flexibility makes it particularly suitable for exploratory data analysis where the cluster count is unknown or variable.[5]
The method excels in handling arbitrary cluster shapes and sizes due to its reliance on pairwise similarity measures rather than geometric assumptions, such as spherical clusters in k-means. By selecting exemplars directly from the data points, affinity propagation demonstrates robustness to outliers, as points with low similarity to others are less likely to become exemplars and can be assigned to nearby clusters without distorting the overall structure. This exemplar-based approach enables effective clustering in non-Euclidean or asymmetric similarity spaces, such as those encountered in bioinformatics or network data.[5]
Empirically, affinity propagation has shown superior performance in benchmarks, achieving faster convergence and lower error rates compared to methods like k-centers. For instance, in face image clustering, it produced solutions with a squared reconstruction error of 108, outperforming the best of 100 k-centers runs (error of 119), and in gene expression analysis, it detected genes with a 39% true-positive rate at a 3% false-positive rate, compared to 17% for k-centers—all while requiring significantly less computation time, such as 6 minutes versus 208 hours for large datasets. These results highlight its efficiency for non-convex clusters and real-world applications.[5]
The interpretability of affinity propagation stems from its use of actual data points as exemplars, providing meaningful representatives that facilitate human analysis and validation of clusters, unlike abstract centroids generated by other algorithms. This feature enhances its utility in domains requiring explainable results, such as pattern recognition in sensory data.[5]
Limitations
Affinity propagation exhibits a quadratic time and space complexity of O(n²) due to the requirement of computing and storing a full similarity matrix among all data points, which limits its practical application to datasets with approximately 10,000 points or fewer on standard hardware without specialized approximations or hardware acceleration.[18]
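A back-of-the-envelope estimate (assuming dense 64-bit floating-point matrices) illustrates why roughly 10,000 points is a practical ceiling on commodity hardware:

    n = 10_000
    bytes_needed = 8 * n * n                          # one dense float64 matrix
    print(f"{bytes_needed / 1e9:.1f} GB per matrix")  # ~0.8 GB each for S, R, and A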
The algorithm is highly sensitive to the choice of the preference parameter, which controls the number of exemplars and thus the clustering granularity; suboptimal tuning can result in excessive fragmentation (over-clustering) or insufficient separation (under-clustering), necessitating careful selection through methods like median similarity or cross-validation.[19]
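One common diagnostic for this sensitivity is to sweep the preference value and record how many clusters result; the synthetic data and candidate preference values below are illustrative:

    import numpy as np
    from sklearn.cluster import AffinityPropagation
    from sklearn.metrics.pairwise import euclidean_distances

    X = np.random.RandomState(0).randn(200, 2)        # synthetic data for illustration
    S = -euclidean_distances(X, squared=True)
    for p in [np.min(S), np.median(S), 0.0]:           # lower preference -> fewer clusters
        ap = AffinityPropagation(preference=p, random_state=0).fit(X)
        print(p, "->", len(ap.cluster_centers_indices_), "clusters")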
Compared to k-means, affinity propagation is generally slower for large-scale datasets owing to its iterative message-passing nature and quadratic scaling, making it less suitable for big data applications where k-means' per-iteration cost, which grows only linearly with the number of samples, provides better efficiency.[20] Additionally, it tends to underperform on very high-dimensional data without dimensionality reduction preprocessing, as the curse of dimensionality degrades the reliability of similarity measures.
In specific domains like protein interaction networks, Markov clustering often outperforms affinity propagation on dense graphs by demonstrating greater robustness to noise and better recovery of biologically relevant modules.[21] Unlike probabilistic models such as Gaussian mixture models, affinity propagation lacks inherent uncertainty quantification or soft assignments, providing only hard deterministic clusterings without statistical guarantees on data generation assumptions.
Implementations
Open-Source Libraries
Affinity propagation has been implemented in several open-source libraries across popular programming languages, enabling researchers and practitioners to apply the algorithm to diverse datasets. These implementations typically adhere to the core message-passing mechanism while incorporating optimizations for efficiency and flexibility, such as support for custom similarity measures and handling of large-scale data. Most initial open-source versions emerged between 2008 and 2015, with subsequent updates focusing on performance improvements and integration with modern computing environments.[12][22][23][24]
In Python, the scikit-learn library provides the AffinityPropagation class within its clustering module, offering a robust implementation that supports custom similarity matrices through the affinity parameter (e.g., precomputed distances or Euclidean metrics) and adjustable damping factors to control convergence stability. This class handles the responsibility and availability message updates internally, making it suitable for medium-sized datasets, though its quadratic complexity limits scalability for very large inputs. The implementation was introduced in early versions around 2012 and has seen efficiency enhancements in releases post-version 1.0 (from 2022 onward), including better random state handling for reproducibility.[12]
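A minimal usage sketch of this class; the dataset and parameter values are illustrative rather than recommendations:

    import numpy as np
    from sklearn.cluster import AffinityPropagation

    X = np.array([[1, 2], [1, 4], [1, 0],
                  [4, 2], [4, 4], [4, 0]])

    ap = AffinityPropagation(damping=0.5, max_iter=200, convergence_iter=15,
                             preference=None, random_state=0)   # preference=None -> median similarity
    labels = ap.fit_predict(X)
    print(ap.cluster_centers_indices_)   # indices of the exemplars
    print(labels)                        # cluster assignment of each point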
For R users, the apcluster package delivers a comprehensive affinity propagation toolkit, including the core apcluster() function for standard clustering and extensions like leveraged affinity propagation (apclusterL()) for subsampling large datasets to reduce computational load. It supports parallel computation options through integration with R's parallel processing capabilities and Rcpp-based C++ backends for faster execution, particularly beneficial for rectangular similarity matrices or sparse data structures via the Matrix package. Initially released in 2011, the package has been updated regularly, with version 1.4.0 (2014) adding sparse matrix support and the latest version (1.4.14 as of September 2025) incorporating refinements for exemplar-based agglomerative clustering.[25][22][26]
Julia's Clustering.jl package includes affinity propagation via the affinityprop() function, seamlessly integrating with the language's distance metric ecosystem (e.g., from Distances.jl) to compute similarities such as Euclidean or Manhattan distances on the fly. This implementation emphasizes high-performance computing, leveraging Julia's just-in-time compilation for efficient message passing on numerical arrays, and returns cluster assignments along with exemplar indices. First appearing in package versions around 2015, it has evolved with Julia's ecosystem, maintaining compatibility with recent releases (e.g., v1.10+ by 2025) for optimized handling of multidimensional data.[23][27]
In Java, the ELKI (Environment for Developing KDD-Applications Supported by Index-Structures) framework incorporates affinity propagation through the AffinityPropagation class in its clustering module, tailored for data mining tasks on large datasets with support for various distance functions and indexing structures to mitigate the algorithm's O(n²) complexity. It processes relational data efficiently, outputting clusterings with exemplar points, and is extensible for custom similarity definitions. ELKI's implementation dates back to early releases around 2010-2012, with ongoing updates in versions up to 0.8.0 (2022) enhancing scalability for high-dimensional inputs common in research applications.[24][28]
Integration in Frameworks
Affinity propagation is integrated into the scikit-learn machine learning library as part of its clustering module, where it is implemented as the AffinityPropagation class in sklearn.cluster. This allows seamless use within scikit-learn pipelines, such as combining it with preprocessing steps like principal component analysis (PCA) to handle high-dimensional data and improve computational efficiency by reducing the feature space before computing the similarity matrix. The algorithm's quadratic time and space complexity with respect to the number of samples makes such dimensionality reduction essential for scalability on larger datasets.[12]
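A sketch of such a pipeline; the number of principal components is an assumption chosen for illustration:

    from sklearn.pipeline import make_pipeline
    from sklearn.decomposition import PCA
    from sklearn.cluster import AffinityPropagation
    from sklearn.datasets import load_digits

    X = load_digits().data                      # 1797 samples, 64 features
    pipe = make_pipeline(PCA(n_components=16),
                         AffinityPropagation(damping=0.7, random_state=0))
    labels = pipe.fit_predict(X)                # PCA runs before the similarity matrix is built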
In MATLAB, affinity propagation is available through user-contributed implementations on the MATLAB File Exchange, including adaptive and semi-supervised variants that extend the original algorithm for specific use cases. These implementations, often based on the MATLAB code provided by the algorithm's originators, are commonly applied in bioinformatics, such as for clustering gene expression data to reveal temporal patterns in transcriptional programs. For instance, affinity propagation has been used to identify modules in time-series microarray data by accounting for delays and phase shifts in expression profiles.[29][30]
Post-2020 developments have seen custom extensions of affinity propagation within deep learning frameworks like PyTorch and TensorFlow, particularly for clustering high-dimensional neural embeddings generated by models such as autoencoders or transformers. These integrations facilitate end-to-end workflows where embeddings from neural networks serve as input similarities for affinity propagation, enhancing applications in unsupervised learning tasks like anomaly detection in embeddings. Examples include PyTorch-based implementations for speaker verification embeddings and TensorFlow adaptations for general clustering pipelines.[31][32]
To address the O(n²) complexity, practitioners often combine affinity propagation with dimensionality reduction techniques like PCA, which mitigates memory and runtime demands by lowering the effective input size, though it does not fully resolve scalability for very large n. Parallel versions are available through external packages in Apache Spark, such as SparkAffinityPropagation, enabling distributed computation of the message-passing updates across clusters for big data scenarios. The preference parameter, which influences the number of exemplars, can be tuned in these frameworks to balance cluster quality and computational cost.[33][34]