Correlation clustering

Correlation clustering is a graph-based clustering technique that partitions a set of objects into an optimal number of clusters based on pairwise similarity relations, without requiring the number of clusters to be specified in advance. Formally, it involves a graph with vertices representing objects and edges labeled as positive (+) for similar pairs or negative (−) for dissimilar pairs; the objective is to find a partitioning that minimizes disagreements (the number of positive edges cut between clusters plus the number of negative edges left uncut within clusters) or, equivalently, maximizes agreements. Introduced in 2002 by Nikhil Bansal, Avrim Blum, and Shuchi Chawla, the problem is NP-hard, with the decision version proven NP-complete via reduction from the exact cover by 3-sets problem. The approach is particularly valuable in scenarios where the underlying cluster structure is unknown or variable, contrasting with traditional methods like k-means that mandate a fixed k. It models real-world data through a classifier assigning similarity labels, enabling agnostic learning of clusterings even with noisy or incomplete data. Key algorithmic advancements include a constant-factor approximation for minimizing disagreements in complete graphs, achieved via randomized rounding of a linear programming relaxation, and a polynomial-time approximation scheme (PTAS) for maximizing agreements under the same setting. Extensions handle real-valued edge weights and general graphs, with an O(log n)-approximation for minimizing disagreements in the latter case. Applications span machine learning, data mining, and bioinformatics, notably in document clustering (such as grouping related web pages), where pairwise similarities can be derived from features like hyperlinks or content overlap, and the natural number of topics emerges from the data itself. Further developments address streaming, partial information, and adaptive queries, broadening its utility in large-scale and dynamic environments.

Introduction

Definition and Motivation

Correlation clustering is a graph-based clustering framework designed to partition a set of n objects into clusters based on pairwise similarity labels, without requiring the number of clusters to be specified in advance. It operates on a signed graph where vertices represent the objects and edges are labeled as positive (+1, indicating similarity) or negative (−1, indicating dissimilarity), often derived from a similarity function that captures relational agreements or disagreements between pairs. The core goal is to identify a partitioning that aligns as closely as possible with these labels by minimizing inconsistencies, such as placing dissimilar objects together or separating similar ones. This approach is particularly motivated by real-world applications where relationships between entities are inherently relational and qualitative, rather than quantifiable through metric distances. In social network analysis, for example, signed graphs model positive ties like friendships or alliances alongside negative ones like conflicts or rivalries, enabling the detection of balanced communities that reflect structural balance. Unlike traditional distance-based methods such as k-means, which presuppose a fixed number of clusters k and operate in Euclidean-like spaces, correlation clustering flexibly determines the optimal number of clusters, anywhere from 1 to n, based solely on the edge labels, making it suitable for agnostic learning scenarios where the similarity information is noisy or incomplete. A representative example involves clustering documents from a large corpus, such as web pages, where pairwise similarities are computed from content overlaps or similarity functions learned from historical data. Here, positive labels might indicate thematic agreement, while negative ones signal divergence, allowing the algorithm to group related documents into coherent clusters without predefined parameters. In the basic workflow, the input consists of a complete signed graph with +1/−1 edge labels, and the output is a set of clusters that seeks to minimize disagreements (negative edges within clusters or positive edges across clusters in the standard formulation), thereby capturing the underlying relational structure.
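
To make the objective concrete, here is a minimal Python sketch over a hypothetical four-document instance; the `labels` dictionary and the cluster format are illustrative assumptions, not a standard API:

```python
# Hypothetical 4-object instance: +1 = similar, -1 = dissimilar.
labels = {("a", "b"): +1, ("c", "d"): +1,
          ("a", "c"): -1, ("a", "d"): -1,
          ("b", "c"): -1, ("b", "d"): -1}

def disagreements(clusters):
    """Positive edges cut between clusters plus negative edges kept within one."""
    side = {v: i for i, c in enumerate(clusters) for v in c}
    return sum((s == +1) != (side[u] == side[v])
               for (u, v), s in labels.items())

print(disagreements([{"a", "b"}, {"c", "d"}]))  # 0 -- a perfect clustering
print(disagreements([{"a", "b", "c", "d"}]))    # 4 -- all negative edges violated
```

Note that the better partition (two clusters here) is discovered by comparing costs, not supplied in advance, which is exactly the property distinguishing correlation clustering from k-means.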

Historical Development

Correlation clustering was formally introduced in 2002 by Nikhil Bansal, Avrim Blum, and Shuchi Chawla as a framework for partitioning objects into clusters based on pairwise similarity or dissimilarity labels, motivated by applications such as clustering documents with noisy expert annotations and modeling social networks where relationships could be positive or negative. This work drew conceptual inspiration from earlier ideas in the social sciences, particularly the study of signed graphs and structural balance developed in the 1950s and 1960s by researchers like Fritz Heider and Frank Harary, which analyzed how positive and negative ties influence group formation in social structures. However, Bansal et al. provided the first rigorous computational formulation within theoretical computer science, emphasizing the challenge of determining both the number of clusters and their composition without predefined parameters. In the mid-2000s, the field advanced with approximation algorithms that made correlation clustering practical for larger datasets. A key contribution came in 2005 from Moses Charikar, Venkatesan Guruswami, and Anthony Wirth, who developed a 4-approximation for the minimization objective, which seeks to minimize the number of inconsistent edges (positive edges between clusters or negative edges within clusters), enabling its integration into data mining pipelines for tasks like entity resolution and community detection. By the late 2000s, the approach had gained traction in bioinformatics, with algorithms inspired by correlation clustering, such as the Divisive Correlation Clustering Algorithm (DCCA), adapted for gene-expression data to identify co-regulated gene groups based on correlation patterns across experiments. The maximization variant, which aims to maximize consistent edges, was explored alongside the minimization form from the outset but saw expanded theoretical and applied development by 2010, including semidefinite programming relaxations for better approximation ratios. Following the rise of large-scale data processing in the 2010s, correlation clustering experienced a resurgence post-2020 with adaptations for streaming and online settings, such as sublinear-time algorithms for dynamic graphs and learning-augmented methods that handle edge arrivals in arbitrary order while maintaining approximation guarantees. Recent advances as of 2025 include a 1.44-approximation algorithm and learning-augmented streaming methods, improving scalability for large-scale applications.

Problem Formulation

Graph-Based Representation

Correlation clustering is commonly modeled using a complete undirected graph G = (V, E), where V is the set of n vertices representing the objects to be clustered, and E consists of edges labeled either +1 (indicating similarity) or −1 (indicating dissimilarity) for every pair of distinct vertices, with no self-loops assumed. This signed structure captures pairwise relational data, where positive labels suggest objects should be grouped together and negative labels suggest separation into different groups. An equivalent representation employs a similarity matrix S, an n \times n matrix where S_{ij} \in \{+1, -1\} for i \neq j, encoding the same pairwise relations as the edge labels in the graph, with diagonal entries typically undefined or zero to reflect the absence of self-relations. This matrix form is particularly useful in implementations that require computational efficiency on dense relational data. To handle incomplete or noisy data, the model extends to partial labeling, where some edges may be unlabeled (treated as neutral or given weight 0), allowing the graph to be incomplete while positive and negative labels retain their interpretive roles for the available relations. Standard notational conventions include \tau^+ for the total number of positive edges and \tau^- for the total number of negative edges across the graph, with clusterings denoted as partitions \pi of the vertex set V. A representative example arises in social networks, where vertices represent individuals, positive edges denote friendships (suggesting co-clustering), and negative edges denote enmities (suggesting separation into different clusters).
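
A minimal numpy sketch of the matrix representation follows; the five-object instance is hypothetical, and 0 entries mark unlabeled pairs in the partial-labeling extension:

```python
import numpy as np

# Hypothetical signed similarity matrix for n = 5 objects:
# S[i, j] = +1 (similar), -1 (dissimilar), 0 (unlabeled pair or diagonal).
S = np.array([[ 0,  1,  1, -1,  0],
              [ 1,  0,  1, -1, -1],
              [ 1,  1,  0, -1, -1],
              [-1, -1, -1,  0,  1],
              [ 0, -1, -1,  1,  0]])

assert (S == S.T).all()  # pairwise relations are symmetric

# tau^+ and tau^- from the notation above, counted over unordered pairs.
iu = np.triu_indices_from(S, k=1)
tau_plus = int((S[iu] == +1).sum())    # -> 4 positive edges
tau_minus = int((S[iu] == -1).sum())   # -> 5 negative edges
```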

Objective Functions and Variants

In correlation clustering, the primary minimization objective seeks to partition the vertices of a signed graph into clusters such that the total number of disagreements is minimized, where a disagreement occurs either when a negative edge connects two vertices within the same cluster or when a positive edge connects vertices in different clusters. This cost, often denoted Cost(π) for a clustering π, balances the penalties for intra-cluster dissimilarities and inter-cluster similarities, reflecting the goal of grouping similar items together while separating dissimilar ones. Formally, \text{Cost}(\pi) = \sum_{C \in \pi} \left| \{ (u,v) \in E^- : u,v \in C \} \right| + \left| \{ (u,v) \in E^+ : u \text{ and } v \text{ in different clusters of } \pi \} \right|, where E^+ and E^- are the sets of positive and negative edges, respectively. An equivalent maximization variant aims to maximize the number of agreements, defined as positive edges within clusters plus negative edges between clusters. This formulation is linearly related to the minimization objective: every edge is either an agreement or a disagreement, so the two quantities sum to the total number of edges, and maximizing agreements yields the same optimal clusterings as minimizing disagreements. A notable variant is cluster deletion, which minimizes the number of edges removed from the graph to obtain a collection of disjoint positive cliques; negative edges are forbidden within clusters, so the objective effectively counts the positive edges deleted between clusters. This variant transforms the signed graph into one where each cluster induces a clique in the positive subgraph. Another variant incorporates a parameter to balance the relative importance of the two error types, as in the LambdaCC framework, where a resolution parameter λ ∈ (0,1) weights the costs: the objective minimizes λ times the number of intra-cluster negative edges plus (1−λ) times the number of inter-cluster positive edges. As λ → 1, this recovers the cluster deletion variant, since intra-cluster negative edges become prohibitively expensive; smaller λ places more weight on avoiding the separation of positive pairs. Unlike traditional k-means or k-clustering, correlation clustering does not fix the number of clusters k in advance; instead, the optimal number emerges naturally from the optimization process, as the partitioning adapts to the edge labels without a predefined cluster count.
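
The objectives above can be stated compactly in code. The sketch below uses assumed input formats (a dict mapping unordered vertex pairs to ±1 labels, and clusters as a list of vertex sets); it returns the standard cost, the agreement count, and the LambdaCC objective, and checks the min/max equivalence:

```python
def cc_objectives(labels, clusters, lam=0.5):
    """Cost(pi), agreements, and the LambdaCC objective for one clustering."""
    side = {v: i for i, c in enumerate(clusters) for v in c}
    neg_in = pos_out = agree = 0
    for (u, v), s in labels.items():
        together = side[u] == side[v]
        if s == +1 and not together:
            pos_out += 1        # positive edge cut between clusters
        elif s == -1 and together:
            neg_in += 1         # negative edge kept inside a cluster
        else:
            agree += 1          # edge consistent with the partition
    cost = neg_in + pos_out
    assert cost + agree == len(labels)   # disagreements + agreements = |E|
    return cost, agree, lam * neg_in + (1 - lam) * pos_out
```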

Algorithms

Approximation Algorithms

Correlation clustering, being NP-hard in general, relies on approximation algorithms that provide scalable solutions with theoretical guarantees for large instances. One foundational approach is the randomized greedy pivot algorithm of Ailon, Charikar, and Newman, which iteratively selects a random unclustered vertex as a pivot and groups it with all remaining vertices joined to it by positive edges. This process removes the pivot and its assigned cluster at each step, continuing until all vertices are clustered, and achieves an expected constant-factor approximation for minimizing disagreements on complete graphs; LP-based techniques extend the guarantee to an O(log n) ratio on general graphs. A notable LP-based algorithm for complete graphs, due to Charikar, Guruswami, and Wirth, solves a natural LP relaxation of the min-disagreement objective and rounds the solution via randomized clustering to obtain a 4-approximation. The LP formulation assigns fractional pairwise separation values to vertices, and the rounding step leverages the relaxation's structure to bound the expected cost. This method extends to O(log n)-approximations for general weighted graphs by incorporating edge weights appropriately. Recent work has improved the approximation ratio to below 3, with a 2.996-approximation algorithm developed in 2023. Local search methods offer another class of heuristics, starting from a random initial clustering and iteratively improving it by flipping vertex assignments, merging clusters, or splitting them to reduce the total cost until a local optimum is reached. These approaches, analyzed in expectation, yield a 3-approximation for the min-disagreement variant on complete graphs by ensuring that local improvements capture a significant fraction of the optimal structure. Such methods are particularly effective in practice due to their simplicity and their ability to escape poor local minima through randomized restarts. An adaptation of hierarchical clustering to correlation clustering involves building a dendrogram via agglomerative merging, where cluster distances are computed using average linkage on the signed edge labels, treating positive edges as similarities and negative edges as dissimilarities. This produces a hierarchy of clusters without specifying the number in advance, allowing cuts at various levels to approximate the optimal partitioning under the objective function. In terms of practical implementations, correlation clustering algorithms like the greedy pivot and local search variants are available in research codebases and can be integrated with graph libraries like NetworkX for efficient computation; a minimal pivot sketch appears below. Time complexities typically scale as O(n²) for dense graphs due to neighbor evaluations in pivot or local search steps, while heuristics for sparse graphs achieve near-linear time by restricting attention to local neighborhoods.
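
The pivot strategy described above is short enough to state in full. The following is a sketch of the randomized pivot (KwikCluster-style) procedure; `is_positive` is an assumed label oracle, and the small `labels` instance is hypothetical:

```python
import random

def pivot_clustering(vertices, is_positive, seed=None):
    """Pick a random unclustered pivot, cluster it with all unclustered
    positive neighbors, remove them, and repeat until nothing remains."""
    rng = random.Random(seed)
    unclustered = set(vertices)
    clusters = []
    while unclustered:
        pivot = rng.choice(sorted(unclustered))
        cluster = {pivot} | {v for v in unclustered
                             if v != pivot and is_positive(pivot, v)}
        clusters.append(cluster)
        unclustered -= cluster
    return clusters

labels = {("a", "b"): +1, ("c", "d"): +1, ("a", "c"): -1,
          ("a", "d"): -1, ("b", "c"): -1, ("b", "d"): -1}
pos = lambda u, v: labels.get((u, v), labels.get((v, u), -1)) == +1
print(pivot_clustering("abcd", pos, seed=0))  # e.g. [{'a', 'b'}, {'c', 'd'}]
```

Because each iteration examines every remaining vertex, a dense implementation runs in O(n²) time, matching the complexity noted above.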

Exact Algorithms and Special Cases

Correlation clustering is NP-hard in general, motivating the development of exact algorithms that are feasible for small instances or restricted inputs. One standard approach is to formulate the problem as an integer linear program (ILP). The ILP uses binary variables x_{ij} for each pair of vertices i < j, where x_{ij} = 1 if i and j are assigned to the same cluster and 0 otherwise. The objective is to minimize the total disagreements: \min \sum_{\{i,j\} \in E^+} (1 - x_{ij}) + \sum_{\{i,j\} \in E^-} x_{ij}, where E^+ and E^- are the sets of positive and negative edges, respectively. To enforce that the x_{ij} correspond to a valid clustering, triangle inequalities are imposed for all distinct triples i, j, k: x_{ij} + x_{jk} - x_{ik} \leq 1, \quad x_{ij} + x_{ik} - x_{jk} \leq 1. These constraints ensure transitivity within clusters. The resulting ILP has O(n^2) variables and O(n^3) constraints and can be solved exactly using off-the-shelf solvers such as Gurobi or CPLEX, which employ branch-and-bound internally; the worst-case time complexity remains exponential in n, but the formulation makes exact correlation clustering routine for small n on standard hardware.

Branch-and-bound techniques tailored to correlation clustering further enhance efficiency by pruning the search tree with strong lower bounds, often derived by relaxing the ILP to its linear programming (LP) counterpart and using the LP solution to fathom suboptimal branches early. Combinatorial branch-and-cut methods integrate cutting planes from the LP relaxation to tighten these bounds, enabling exact solutions for instances up to n \approx 30 in practice on benchmark signed graphs. Such approaches have been applied to structural balance analysis, where the goal is to partition vertices into antagonistic clusters with minimal disagreements.

For even smaller instances, dynamic programming over vertex subsets provides an alternative exact method (a code sketch appears at the end of this subsection). Define dp[S] as the minimum disagreement cost to cluster the vertex subset S \subseteq V. The recurrence is dp[S] = \min_{T \subseteq S, T \neq \emptyset} \left( dp[S \setminus T] + c(T, S \setminus T) \right), where c(T, S \setminus T) is the cost of treating T as a new cluster: the sum of negative edges within T plus positive edges between T and S \setminus T. The base case is dp[\emptyset] = 0. A direct implementation requires O(3^n) time, since there are 2^n states and the transitions summed over all states number 3^n, an average of O(1.5^n) per state. Using fast subset convolution, the recurrence can be evaluated in O(2^n \mathrm{poly}(n)) time. For larger n \leq 40, a meet-in-the-middle variant splits V into two halves of size n/2, computes all possible clusterings and costs for each half in O(3^{n/2} \mathrm{poly}(n)) time, then combines compatible partitions by matching cluster interfaces, which can remain practical on modern hardware for dense instances.

Special cases admit exact polynomial-time algorithms. For balanced signed graphs, where all cycles have an even number of negative edges, the optimal clustering achieves zero disagreements and can be found in linear time by switching signs (flipping edge signs along paths from a source vertex) to make all edges positive, then taking the connected components of the resulting all-positive graph as clusters. This process identifies the unique frustration-free partition up to sign switches.
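
The linear-time tractability of the consistent case can be illustrated with a closely related zero-disagreement check: a perfect clustering exists exactly when no negative edge joins two vertices in the same connected component of the positive subgraph, in which case those components serve as the clusters. A minimal union-find sketch (the edge-list input format is an assumption):

```python
from collections import defaultdict

def perfect_clustering(n, edges):
    """Return a zero-disagreement clustering of vertices 0..n-1 if one
    exists, else None. `edges` holds (u, v, sign) with sign in {+1, -1}."""
    parent = list(range(n))              # union-find over positive edges
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for u, v, s in edges:
        if s == +1:
            parent[find(u)] = find(v)
    if any(find(u) == find(v) for u, v, s in edges if s == -1):
        return None                      # frustrated: no perfect partition
    comps = defaultdict(set)
    for v in range(n):
        comps[find(v)].add(v)
    return list(comps.values())

print(perfect_clustering(4, [(0, 1, +1), (2, 3, +1), (0, 2, -1)]))
# -> [{0, 1}, {2, 3}]
```
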
For the two-cluster case in general signed graphs, a common polynomial-time approach is spectral relaxation on the signed Laplacian, minimizing a signed ratio-cut objective that penalizes positive edges between clusters and negative edges within clusters; the relaxation is computed via eigenvalue decomposition in O(n^3) time, though it yields an approximate rather than provably optimal partition in general. Complete bipartite graphs also permit exact solutions via dynamic programming over assignments of the partite sets to clusters, leveraging the absence of intra-partite edges to reduce the state space to combinations of cluster labels per side, solvable in O(2^{|L| + |R|} \mathrm{poly}(n)) time where |L| and |R| are the partite sizes, though tighter bounds exist for balanced labelings. When the edge labels form a tournament-like structure (i.e., the negative edges define a transitive orientation), the problem reduces to a ranking problem on tournaments that admits exact polynomial-time solutions via topological sorting after removing inconsistent edges, with linear-time heuristics used in practice for near-optimal results.
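
The subset dynamic program described earlier in this subsection can be sketched directly over bitmasks. The version below fixes the lowest vertex of S into the peeled cluster T so that each partition is enumerated once; `sign` is an assumed dense ±1 label matrix, and the approach is only feasible for small n:

```python
from functools import lru_cache

def exact_cc(n, sign):
    """Exact minimum disagreements via dp[S] = min over clusters T peeled
    from S of dp[S \\ T] + c(T, S \\ T). Runs in roughly O(3^n) time."""
    def cluster_cost(T, rest):
        members = [v for v in range(n) if T >> v & 1]
        cost = sum(1 for i, u in enumerate(members) for v in members[i + 1:]
                   if sign[u][v] == -1)                     # negatives inside T
        cost += sum(1 for u in members for v in range(n)
                    if rest >> v & 1 and sign[u][v] == +1)  # positives cut off
        return cost

    @lru_cache(maxsize=None)
    def dp(S):
        if S == 0:
            return 0
        low = S & -S          # lowest remaining vertex, forced into T
        best = float("inf")
        T = S
        while T:              # enumerate non-empty subsets T of S
            if T & low:
                best = min(best, dp(S ^ T) + cluster_cost(T, S ^ T))
            T = (T - 1) & S
        return best

    return dp((1 << n) - 1)

P, N = +1, -1   # 4-vertex example: the optimum is the zero-cost 2-clustering
sign = [[0, P, N, N], [P, 0, N, N], [N, N, 0, P], [N, N, P, 0]]
print(exact_cc(4, sign))  # -> 0
```
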

Theoretical Results

Approximation Guarantees

The standard approximation algorithm for minimizing disagreements in correlation clustering is the Pivot algorithm, which achieves an expected 3-approximation ratio. This combinatorial method, introduced by Ailon, Charikar, and Newman, selects random pivots to partition the graph and is proven to bound the expected number of disagreements by three times the optimal value through charging arguments involving bad triplets. The ratio is tight in the worst case, as demonstrated by instances where the algorithm's expected cost approaches three times the optimum, such as large complete graphs with nearly all positive edges except a single negative edge. Linear programming (LP)-based rounding methods from the same work yield a 2.5-approximation for minimization by solving a natural LP relaxation and applying pivot rounding to the fractional solution, improving upon the purely combinatorial approach under certain conditions. As of November 2025, the best known approximation ratio for minimizing disagreements is 1.485 + ε for any ε > 0, achieved via rounding of the cluster LP relaxation. For the maximization variant, which seeks to maximize intra-cluster agreements, semidefinite programming (SDP) relaxations provide stronger guarantees. Swamy developed an SDP-based algorithm achieving a 0.7666-approximation ratio for general instances, leveraging efficient SDP solvers for max-cut-like objectives and rounding techniques that preserve a constant fraction of the optimum. This outperforms the naive 0.5-approximation and is based on embedding the graph into a higher-dimensional space to capture cluster structure. Local search algorithms offer constant-factor guarantees as well; for instance, iterative flipping of vertices between clusters converges to a 3-approximation solution after O(n log n) local moves, as analyzed in extensions of the framework where each flip reduces potential violations while the total number of iterations is bounded by logarithmic factors in the input size. Bi-criteria approximations trade off the number of clusters for improved cost ratios. For example, algorithms exist that achieve an O(1)-approximation to the minimum disagreement cost while using at most O(log n) times the optimal number of clusters, often via hierarchical partitioning or rounding with relaxed cardinality constraints. This is particularly useful when the optimal number of clusters is unknown or large, allowing practical scalability at the expense of slightly more partitions. A key inapproximability result states that it is NP-hard to approximate the maximization problem within a factor of 80/79 − ε for any ε > 0 unless P = NP, based on reductions from qualitative clustering problems. Recent advances focus on efficient implementations in parallel and streaming models while nearing the 3-approximation barrier. Behnezhad, Charikar, Ma, and Tan presented a parallel algorithm that achieves a (3 + ε)-approximation in constant rounds for any ε > 0, using O(log n)-bit local computations, which nearly matches the sequential Pivot algorithm's guarantee in distributed settings.

Computational Hardness and Optimal Clusters

Correlation clustering, in its minimization form, is NP-hard even when restricted to complete graphs. This hardness follows from a reduction from the exact cover by 3-sets (X3C) problem, as established in the foundational work on the topic. For general graphs, the problem remains NP-hard, with reductions from problems like 3-SAT demonstrating the intractability of finding an optimal partitioning that minimizes disagreements between edge labels and cluster structure. The problem is also APX-hard, implying that no polynomial-time approximation scheme (PTAS) exists unless P = NP. This result arises from an approximation-preserving reduction from the minimum multicut problem, which is known to be APX-hard, showing that achieving approximations better than a constant factor is computationally challenging. For the maximization variant, where the goal is to maximize agreements, similar inapproximability holds on general graphs, with it being NP-hard to approximate within a factor of 80/79 − ε for any ε > 0 unless P = NP. In contrast, the related decision problem of whether a zero-error clustering exists, meaning a partitioning with no disagreements at all, is solvable in polynomial time: take the connected components of the positive subgraph and check that no negative edge falls within a component. The optimal number of clusters, denoted k^*, emerges naturally in correlation clustering without a predefined value, and in the fixed-k variant it can be defined as k^* = \arg\min_k \text{cost}(k). No closed-form expression for k^* is known, but structural observations constrain it; for example, some optimal clustering places every cluster inside a single connected component of the positive subgraph, since splitting a cluster that spans two such components never increases the cost. In random graph models, such as Erdős–Rényi graphs with random signs, k^* exhibits phase transitions influenced by the label noise level; for instance, a percolation-like transition occurs around a positive-edge probability of q = 1/2, where the relative size of the largest cluster shifts abruptly, causing k^* to scale accordingly as the graph grows. Despite general intractability, correlation clustering admits fixed-parameter tractability when parameterized by the treewidth t of the input graph. In this regime, the problem can be solved in time linear in n for fixed t, leveraging monadic second-order (MSO) logic encodings to express cluster partitions and evaluate costs via dynamic programming on tree decompositions.

Applications and Extensions

Data Mining Applications

Correlation clustering finds practical utility in data mining tasks involving relational data with both positive and negative affinities, such as grouping objects based on incomplete or noisy pairwise labels. In document clustering, it addresses the challenge of organizing large corpora without predefined cluster counts by minimizing disagreements in similarity labels derived from human annotations or feature embeddings. For instance, the original formulation was motivated by clustering web pages using pairwise similarity judgments from a learned classifier; topic similarities are treated as positive edges and dissimilarities as negative, enabling robust partitioning even when annotator labels are noisy. In community detection within signed networks, correlation clustering identifies friend and foe groups by minimizing intra-cluster negative edges and inter-cluster positive edges, aligning with social balance theory. Applied to signed social network datasets such as Epinions and Slashdot, where users label others as friends (+) or foes (−), it resolves inconsistent ties to form antagonistic communities, with algorithms bounding prediction errors relative to the network's inherent frustration (Δ(Y)). Empirical evaluations on these datasets demonstrate its effectiveness in link classification, outperforming baselines in mistake bounds for online settings, and enhancing the detection of polarized structures in relational data such as user interactions. Bioinformatics applications leverage correlation clustering to group genes or proteins in interaction networks, where positive edges indicate co-expression and negative edges anti-correlation. The Divisive Correlation Clustering Algorithm (DCCA) iteratively splits clusters based on Pearson correlations, ensuring positive intra-cluster affinities while separating negatively correlated pairs. Tested on gene-expression datasets (e.g., 6215 genes across multiple conditions), DCCA yields higher functional enrichment scores than alternatives such as k-means, with z-scores up to 21.9 for enriched biological attributes like metabolic pathways. In recommendation systems, correlation clustering resolves inconsistent user-item preferences by grouping similar entities via preference correlations, forming clusters that mitigate sparsity in rating matrices. Evaluation often employs the adjusted Rand index to assess robustness to label noise, where correlation clustering shows empirical superiority over k-means on relational benchmarks by better handling signed affinities.

Recent Variants in Machine Learning

Recent variants of correlation clustering have emerged to address dynamic and resource-constrained environments in machine learning, particularly online, streaming, and quantum settings. In online correlation clustering, vertices arrive sequentially, and the algorithm must assign each to a cluster irrevocably upon arrival, without revisiting prior decisions. A robust approach achieves an O(1/ε)-competitive ratio against the optimal offline solution in a semi-online model with a random sample of size εn, ensuring bounded regret even under noisy or adversarial inputs. Streaming algorithms extend this to edge streams, processing graphs in a single pass with limited memory. These methods employ graph sketches to approximate cluster structures, enabling low-space solutions with constant-factor approximation guarantees in dynamic settings. When augmented with predictions of edge labels, such algorithms improve the approximation ratio beyond the classical 3-factor, achieving better-than-3 guarantees on complete graphs while maintaining sublinear memory. Quantum-assisted variants leverage hybrid quantum-classical frameworks to solve correlation clustering on signed graphs, where edges indicate positive or negative correlations. One such method formulates the problem as a combinatorial optimization task and uses the GCS-Q hybrid solver for partitioning, demonstrating improved robustness and clustering quality over classical baselines on synthetic signed graphs up to n = 170 nodes, with potential speedups for instances around n = 100 on current hardware (as of 2025). Robust variants address adversarial perturbations, such as edge label flips, in dynamic graphs; algorithms maintain sublinear-time updates under adaptive adversaries, preserving approximation guarantees while handling noise that could otherwise degrade cluster quality. Learning-augmented techniques further refine these by integrating predictions directly into the optimization process. By using predicted labels to guide decisions, these methods boost the approximation from the standard 3 to nearly 2 in constrained settings when predictions are accurate, while providing consistency and robustness bounds for erroneous inputs. A key development in parallelizable variants achieves an almost 3-approximation in constant rounds across models like massively parallel computation: for any ε > 0, the algorithm runs in O(1/ε) rounds, nearly closing the gap to the 3-approximation barrier while scaling to large graphs.
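
To make the online model concrete, the following is a deliberately naive greedy baseline, not the competitive algorithm cited above: each arriving vertex irrevocably joins the existing cluster against which its positive edges outnumber its negative ones by the largest margin, or opens a singleton otherwise. The `is_positive` oracle is an assumption of the sketch:

```python
def online_greedy(arrivals, is_positive):
    """Irrevocably assign each arriving vertex to the best existing
    cluster by (positive - negative) edge score, else start a new one.
    Illustrative only; carries no competitive guarantee."""
    clusters = []
    for v in arrivals:
        def score(c):
            return sum(1 if is_positive(v, u) else -1 for u in c)
        best = max(clusters, key=score, default=None)
        if best is not None and score(best) > 0:
            best.add(v)
        else:
            clusters.append({v})
    return clusters
```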
