Correlation clustering

Correlation clustering is a graph-based clustering technique that partitions a set of objects into an optimal number of clusters based on pairwise similarity relations, without requiring the number of clusters to be specified in advance. Formally, it involves a graph with vertices representing objects and edges labeled as positive (+) for similar pairs or negative (−) for dissimilar pairs; the objective is to find a partitioning that minimizes disagreements (the number of positive edges cut between clusters plus the number of negative edges left uncut within clusters) or, equivalently, maximizes agreements. Introduced in 2002 by Nikhil Bansal, Avrim Blum, and Shuchi Chawla, the problem is NP-hard, with the decision version proven NP-complete via reduction from the exact cover by 3-sets problem. The approach is particularly valuable in scenarios where the underlying cluster structure is unknown or variable, contrasting with traditional methods like k-means that mandate a fixed k. It models real-world data through a classifier assigning similarity labels, enabling agnostic learning of clusterings even with noisy or incomplete data. Key algorithmic advancements include a constant-factor approximation for minimizing disagreements in complete graphs, achieved via randomized rounding of a linear programming relaxation, and a polynomial-time approximation scheme (PTAS) for maximizing agreements under the same setting. Extensions handle real-valued edge weights and general graphs, with an O(log n)-approximation for minimizing disagreements in the latter case. Applications span machine learning, data mining, and bioinformatics, notably in document clustering (such as grouping related web pages), where pairwise similarities can be derived from features like hyperlinks or content overlap, and the natural number of topics emerges from the data itself. Further developments address streaming, partial information, and adaptive queries, broadening its utility in large-scale and dynamic environments.

Introduction

Definition and Motivation

Correlation clustering is a graph-based clustering framework designed to partition a set of n objects into clusters based on pairwise similarity labels, without requiring the number of clusters to be specified in advance. It operates on a signed graph where vertices represent the objects and edges are labeled as positive (+1, indicating similarity) or negative (−1, indicating dissimilarity), often derived from a similarity function that captures relational agreements or disagreements between pairs. The core goal is to identify a partitioning that aligns as closely as possible with these labels by minimizing inconsistencies, such as placing dissimilar objects together or separating similar ones. This approach is particularly motivated by real-world applications where relationships between entities are inherently relational and qualitative, rather than quantifiable through metric distances. In social network analysis, for example, signed graphs model positive ties like friendships or alliances alongside negative ones like conflicts or rivalries, enabling the detection of balanced communities that reflect structural balance. Unlike traditional distance-based methods such as k-means, which presuppose a fixed number of clusters k and operate in Euclidean-like spaces, correlation clustering flexibly determines the optimal number of clusters, anywhere from 1 to n, based solely on the edge labels, making it suitable for agnostic learning scenarios where the similarity information is noisy or incomplete. A representative example involves clustering documents from a large corpus, such as web pages, where pairwise similarities are computed from content overlaps or similarity functions learned from historical data. Here, positive labels might indicate thematic agreement, while negative ones signal divergence, allowing the algorithm to group related documents into coherent clusters without predefined parameters. In the basic workflow, the input consists of a complete signed graph with +1/−1 edge labels, and the output is a set of clusters that seeks to minimize disagreements (negative edges within clusters or positive edges across clusters in the standard formulation), thereby capturing the underlying relational structure.
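
To make the objective concrete, here is a minimal Python sketch over a hypothetical four-document instance; the `labels` dictionary and the cluster format are illustrative assumptions, not a standard API:

```python
# Hypothetical 4-object instance: +1 = similar, -1 = dissimilar.
labels = {("a", "b"): +1, ("c", "d"): +1,
          ("a", "c"): -1, ("a", "d"): -1,
          ("b", "c"): -1, ("b", "d"): -1}

def disagreements(clusters):
    """Positive edges cut between clusters plus negative edges kept within one."""
    side = {v: i for i, c in enumerate(clusters) for v in c}
    return sum((s == +1) != (side[u] == side[v])
               for (u, v), s in labels.items())

print(disagreements([{"a", "b"}, {"c", "d"}]))  # 0 -- a perfect clustering
print(disagreements([{"a", "b", "c", "d"}]))    # 4 -- all negative edges violated
```

Note that the better partition (two clusters here) is discovered by comparing costs, not supplied in advance, which is exactly the property distinguishing correlation clustering from k-means.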

Historical Development

Correlation clustering was formally introduced in 2002 by Nikhil Bansal, Avrim Blum, and Shuchi Chawla as a framework for partitioning objects into clusters based on pairwise similarity or dissimilarity labels, motivated by applications such as clustering documents with noisy expert annotations and modeling social networks where relationships could be positive or negative. This work drew conceptual inspiration from earlier ideas in the social sciences, particularly the study of signed graphs and structural balance developed in the 1950s and 1960s by researchers like Fritz Heider and Frank Harary, which analyzed how positive and negative ties influence group formation in social structures. However, Bansal et al. provided the first rigorous computational formulation within theoretical computer science, emphasizing the challenge of determining both the number of clusters and their composition without predefined parameters. In the mid-2000s, the field advanced with approximation algorithms that made correlation clustering practical for larger datasets. A key contribution came in 2005 from Moses Charikar, Venkatesan Guruswami, and Anthony Wirth, who developed a 4-approximation for the minimization objective, which seeks to minimize the number of inconsistent edges (positive edges between clusters or negative edges within clusters), enabling its integration into data mining pipelines for tasks like entity resolution and community detection. By the late 2000s, the approach had gained traction in bioinformatics, with algorithms inspired by correlation clustering, such as the Divisive Correlation Clustering Algorithm (DCCA), adapted for gene-expression data to identify co-regulated gene groups based on correlation patterns across experiments. The maximization variant, which aims to maximize consistent edges, was explored alongside the minimization form from the outset but saw expanded theoretical and applied development by 2010, including semidefinite programming relaxations for better approximation ratios. Following the rise of large-scale data processing in the 2010s, correlation clustering experienced a resurgence post-2020 with adaptations for streaming and online settings, such as sublinear-time algorithms for dynamic graphs and learning-augmented methods that handle edge arrivals in arbitrary order while maintaining approximation guarantees. Recent advances as of 2025 include a 1.44-approximation algorithm and learning-augmented streaming methods, improving scalability for large-scale applications.

Problem Formulation

Graph-Based Representation

Correlation clustering is commonly modeled using a complete undirected graph G = (V, E), where V is the set of n vertices representing the objects to be clustered, and E consists of edges labeled either +1 (indicating similarity) or −1 (indicating dissimilarity) for every pair of distinct vertices, with no self-loops assumed. This signed structure captures pairwise relational data, where positive labels suggest objects should be grouped together and negative labels suggest separation into different groups. An equivalent representation employs a similarity matrix S, an n \times n matrix where S_{ij} \in \{+1, -1\} for i \neq j, encoding the same pairwise relations as the edge labels in the graph, with diagonal entries typically undefined or zero to reflect the absence of self-relations. This matrix form is particularly useful in implementations that require computational efficiency on dense relational data. To handle incomplete or noisy data, the model extends to partial labeling, where some edges may be unlabeled (treated as neutral or given weight 0), allowing the graph to be incomplete while positive and negative labels retain their interpretive roles for the available relations. Standard notational conventions include \tau^+ for the total number of positive edges and \tau^- for the total number of negative edges across the graph, with clusterings denoted as partitions \pi of the vertex set V. A representative example arises in social networks, where vertices represent individuals, positive edges denote friendships (suggesting co-clustering), and negative edges denote enmities (suggesting separation into different clusters).
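
A minimal numpy sketch of the matrix representation follows; the five-object instance is hypothetical, and 0 entries mark unlabeled pairs in the partial-labeling extension:

```python
import numpy as np

# Hypothetical signed similarity matrix for n = 5 objects:
# S[i, j] = +1 (similar), -1 (dissimilar), 0 (unlabeled pair or diagonal).
S = np.array([[ 0,  1,  1, -1,  0],
              [ 1,  0,  1, -1, -1],
              [ 1,  1,  0, -1, -1],
              [-1, -1, -1,  0,  1],
              [ 0, -1, -1,  1,  0]])

assert (S == S.T).all()  # pairwise relations are symmetric

# tau^+ and tau^- from the notation above, counted over unordered pairs.
iu = np.triu_indices_from(S, k=1)
tau_plus = int((S[iu] == +1).sum())    # -> 4 positive edges
tau_minus = int((S[iu] == -1).sum())   # -> 5 negative edges
```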

Objective Functions and Variants

In correlation clustering, the primary minimization objective seeks to partition the vertices of a signed graph into clusters such that the total number of disagreements is minimized, where a disagreement occurs either when a negative edge connects two vertices within the same cluster or when a positive edge connects vertices in different clusters. This cost, often denoted Cost(π) for a clustering π, balances the penalties for intra-cluster dissimilarities and inter-cluster similarities, reflecting the goal of grouping similar items together while separating dissimilar ones. Formally, \text{Cost}(\pi) = \sum_{C \in \pi} \left| \{ (u,v) \in E^- : u,v \in C \} \right| + \left| \{ (u,v) \in E^+ : u \text{ and } v \text{ in different clusters of } \pi \} \right|, where E^+ and E^- are the sets of positive and negative edges, respectively. An equivalent maximization variant aims to maximize the number of agreements, defined as positive edges within clusters plus negative edges between clusters. This formulation is linearly related to the minimization objective: every edge is either an agreement or a disagreement, so the two quantities sum to the total number of edges, and maximizing agreements yields the same optimal clusterings as minimizing disagreements. A notable variant is cluster deletion, which minimizes the number of edges removed from the graph to obtain a collection of disjoint positive cliques; negative edges are forbidden within clusters, so the objective effectively counts the positive edges deleted between clusters. This variant transforms the signed graph into one where each cluster induces a clique in the positive subgraph. Another variant incorporates a parameter to balance the relative importance of the two error types, as in the LambdaCC framework, where a resolution parameter λ ∈ (0,1) weights the costs: the objective minimizes λ times the number of intra-cluster negative edges plus (1−λ) times the number of inter-cluster positive edges. As λ → 1, this recovers the cluster deletion variant, since intra-cluster negative edges become prohibitively expensive; smaller λ places more weight on avoiding the separation of positive pairs. Unlike traditional k-means or k-clustering, correlation clustering does not fix the number of clusters k in advance; instead, the optimal number emerges naturally from the optimization process, as the partitioning adapts to the edge labels without a predefined cluster count.
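
The objectives above can be stated compactly in code. The sketch below uses assumed input formats (a dict mapping unordered vertex pairs to ±1 labels, and clusters as a list of vertex sets); it returns the standard cost, the agreement count, and the LambdaCC objective, and checks the min/max equivalence:

```python
def cc_objectives(labels, clusters, lam=0.5):
    """Cost(pi), agreements, and the LambdaCC objective for one clustering."""
    side = {v: i for i, c in enumerate(clusters) for v in c}
    neg_in = pos_out = agree = 0
    for (u, v), s in labels.items():
        together = side[u] == side[v]
        if s == +1 and not together:
            pos_out += 1        # positive edge cut between clusters
        elif s == -1 and together:
            neg_in += 1         # negative edge kept inside a cluster
        else:
            agree += 1          # edge consistent with the partition
    cost = neg_in + pos_out
    assert cost + agree == len(labels)   # disagreements + agreements = |E|
    return cost, agree, lam * neg_in + (1 - lam) * pos_out
```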

Algorithms

Approximation Algorithms

Correlation clustering, being NP-hard in general, relies on approximation algorithms that provide scalable solutions with theoretical guarantees for large instances. One foundational approach is the randomized greedy pivot algorithm of Ailon, Charikar, and Newman, which iteratively selects a random unclustered vertex as a pivot and groups it with all remaining vertices joined to it by positive edges. This process removes the pivot and its assigned cluster at each step, continuing until all vertices are clustered, and achieves an expected constant-factor approximation for minimizing disagreements on complete graphs; LP-based techniques extend the guarantee to an O(log n) ratio on general graphs. A notable LP-based algorithm for complete graphs, due to Charikar, Guruswami, and Wirth, solves a natural LP relaxation of the min-disagreement objective and rounds the solution via randomized clustering to obtain a 4-approximation. The LP formulation assigns fractional pairwise separation values to vertices, and the rounding step leverages the relaxation's structure to bound the expected cost. This method extends to O(log n)-approximations for general weighted graphs by incorporating edge weights appropriately. Recent work has improved the approximation ratio to below 3, with a 2.996-approximation algorithm developed in 2023. Local search methods offer another class of heuristics, starting from a random initial clustering and iteratively improving it by flipping vertex assignments, merging clusters, or splitting them to reduce the total cost until a local optimum is reached. These approaches, analyzed in expectation, yield a 3-approximation for the min-disagreement variant on complete graphs by ensuring that local improvements capture a significant fraction of the optimal structure. Such methods are particularly effective in practice due to their simplicity and their ability to escape poor local minima through randomized restarts. An adaptation of hierarchical clustering to correlation clustering involves building a dendrogram via agglomerative merging, where cluster distances are computed using average linkage on the signed edge labels, treating positive edges as similarities and negative edges as dissimilarities. This produces a hierarchy of clusters without specifying the number in advance, allowing cuts at various levels to approximate the optimal partitioning under the objective function. In terms of practical implementations, correlation clustering algorithms like the greedy pivot and local search variants are available in research codebases and can be integrated with graph libraries like NetworkX for efficient computation; a minimal pivot sketch appears below. Time complexities typically scale as O(n²) for dense graphs due to neighbor evaluations in pivot or local search steps, while heuristics for sparse graphs achieve near-linear time by restricting attention to local neighborhoods.
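
The pivot strategy described above is short enough to state in full. The following is a sketch of the randomized pivot (KwikCluster-style) procedure; `is_positive` is an assumed label oracle, and the small `labels` instance is hypothetical:

```python
import random

def pivot_clustering(vertices, is_positive, seed=None):
    """Pick a random unclustered pivot, cluster it with all unclustered
    positive neighbors, remove them, and repeat until nothing remains."""
    rng = random.Random(seed)
    unclustered = set(vertices)
    clusters = []
    while unclustered:
        pivot = rng.choice(sorted(unclustered))
        cluster = {pivot} | {v for v in unclustered
                             if v != pivot and is_positive(pivot, v)}
        clusters.append(cluster)
        unclustered -= cluster
    return clusters

labels = {("a", "b"): +1, ("c", "d"): +1, ("a", "c"): -1,
          ("a", "d"): -1, ("b", "c"): -1, ("b", "d"): -1}
pos = lambda u, v: labels.get((u, v), labels.get((v, u), -1)) == +1
print(pivot_clustering("abcd", pos, seed=0))  # e.g. [{'a', 'b'}, {'c', 'd'}]
```

Because each iteration examines every remaining vertex, a dense implementation runs in O(n²) time, matching the complexity noted above.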

Exact Algorithms and Special Cases

Correlation clustering is NP-hard in general, motivating the development of exact algorithms that are feasible for small instances or restricted inputs. One standard approach is to formulate the problem as an integer linear program (ILP). The ILP uses binary variables x_{ij} for each pair of vertices i < j, where x_{ij} = 1 if i and j are assigned to the same cluster and 0 otherwise. The objective is to minimize the total disagreements: \min \sum_{\{i,j\} \in E^+} (1 - x_{ij}) + \sum_{\{i,j\} \in E^-} x_{ij}, where E^+ and E^- are the sets of positive and negative edges, respectively. To enforce that the x_{ij} correspond to a valid clustering, triangle inequalities are imposed for all distinct triples i, j, k: x_{ij} + x_{jk} - x_{ik} \leq 1, \quad x_{ij} + x_{ik} - x_{jk} \leq 1. These constraints ensure transitivity within clusters. The resulting ILP has O(n^2) variables and O(n^3) constraints and can be solved exactly using off-the-shelf solvers such as Gurobi or CPLEX, which employ branch-and-bound internally; the worst-case time complexity remains exponential in n, but the formulation makes exact correlation clustering routine for small n on standard hardware.

Branch-and-bound techniques tailored to correlation clustering further enhance efficiency by pruning the search tree with strong lower bounds, often derived by relaxing the ILP to its linear programming (LP) counterpart and using the LP solution to fathom suboptimal branches early. Combinatorial branch-and-cut methods integrate cutting planes from the LP relaxation to tighten these bounds, enabling exact solutions for instances up to n \approx 30 in practice on benchmark signed graphs. Such approaches have been applied to structural balance analysis, where the goal is to partition vertices into antagonistic clusters with minimal disagreements.

For even smaller instances, dynamic programming over vertex subsets provides an alternative exact method (a code sketch appears at the end of this subsection). Define dp[S] as the minimum disagreement cost to cluster the vertex subset S \subseteq V. The recurrence is dp[S] = \min_{T \subseteq S, T \neq \emptyset} \left( dp[S \setminus T] + c(T, S \setminus T) \right), where c(T, S \setminus T) is the cost of treating T as a new cluster: the sum of negative edges within T plus positive edges between T and S \setminus T. The base case is dp[\emptyset] = 0. A direct implementation requires O(3^n) time, since there are 2^n states and the transitions summed over all states number 3^n, an average of O(1.5^n) per state. Using fast subset convolution, the recurrence can be evaluated in O(2^n \mathrm{poly}(n)) time. For larger n \leq 40, a meet-in-the-middle variant splits V into two halves of size n/2, computes all possible clusterings and costs for each half in O(3^{n/2} \mathrm{poly}(n)) time, then combines compatible partitions by matching cluster interfaces, which can remain practical on modern hardware for dense instances.

Special cases admit exact polynomial-time algorithms. For balanced signed graphs, where all cycles have an even number of negative edges, the optimal clustering achieves zero disagreements and can be found in linear time by switching signs (flipping edge signs along paths from a source vertex) to make all edges positive, then taking the connected components of the resulting all-positive graph as clusters. This process identifies the unique frustration-free partition up to sign switches.
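
The linear-time tractability of the consistent case can be illustrated with a closely related zero-disagreement check: a perfect clustering exists exactly when no negative edge joins two vertices in the same connected component of the positive subgraph, in which case those components serve as the clusters. A minimal union-find sketch (the edge-list input format is an assumption):

```python
from collections import defaultdict

def perfect_clustering(n, edges):
    """Return a zero-disagreement clustering of vertices 0..n-1 if one
    exists, else None. `edges` holds (u, v, sign) with sign in {+1, -1}."""
    parent = list(range(n))              # union-find over positive edges
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for u, v, s in edges:
        if s == +1:
            parent[find(u)] = find(v)
    if any(find(u) == find(v) for u, v, s in edges if s == -1):
        return None                      # frustrated: no perfect partition
    comps = defaultdict(set)
    for v in range(n):
        comps[find(v)].add(v)
    return list(comps.values())

print(perfect_clustering(4, [(0, 1, +1), (2, 3, +1), (0, 2, -1)]))
# -> [{0, 1}, {2, 3}]
```
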
For the two-cluster case in general signed graphs, a common polynomial-time approach is spectral relaxation on the signed Laplacian, minimizing a signed ratio-cut objective that penalizes positive edges between clusters and negative edges within clusters; the relaxation is computed via eigenvalue decomposition in O(n^3) time, though it yields an approximate rather than provably optimal partition in general. Complete bipartite graphs also permit exact solutions via dynamic programming over assignments of the partite sets to clusters, leveraging the absence of intra-partite edges to reduce the state space to combinations of cluster labels per side, solvable in O(2^{|L| + |R|} \mathrm{poly}(n)) time where |L| and |R| are the partite sizes, though tighter bounds exist for balanced labelings. When the edge labels form a tournament-like structure (i.e., the negative edges define a transitive orientation), the problem reduces to a ranking problem on tournaments that admits exact polynomial-time solutions via topological sorting after removing inconsistent edges, with linear-time heuristics used in practice for near-optimal results.
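
The subset dynamic program described earlier in this subsection can be sketched directly over bitmasks. The version below fixes the lowest vertex of S into the peeled cluster T so that each partition is enumerated once; `sign` is an assumed dense ±1 label matrix, and the approach is only feasible for small n:

```python
from functools import lru_cache

def exact_cc(n, sign):
    """Exact minimum disagreements via dp[S] = min over clusters T peeled
    from S of dp[S \\ T] + c(T, S \\ T). Runs in roughly O(3^n) time."""
    def cluster_cost(T, rest):
        members = [v for v in range(n) if T >> v & 1]
        cost = sum(1 for i, u in enumerate(members) for v in members[i + 1:]
                   if sign[u][v] == -1)                     # negatives inside T
        cost += sum(1 for u in members for v in range(n)
                    if rest >> v & 1 and sign[u][v] == +1)  # positives cut off
        return cost

    @lru_cache(maxsize=None)
    def dp(S):
        if S == 0:
            return 0
        low = S & -S          # lowest remaining vertex, forced into T
        best = float("inf")
        T = S
        while T:              # enumerate non-empty subsets T of S
            if T & low:
                best = min(best, dp(S ^ T) + cluster_cost(T, S ^ T))
            T = (T - 1) & S
        return best

    return dp((1 << n) - 1)

P, N = +1, -1   # 4-vertex example: the optimum is the zero-cost 2-clustering
sign = [[0, P, N, N], [P, 0, N, N], [N, N, 0, P], [N, N, P, 0]]
print(exact_cc(4, sign))  # -> 0
```
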

Theoretical Results

Approximation Guarantees

The standard approximation algorithm for minimizing disagreements in correlation clustering is the Pivot algorithm, which achieves an expected 3-approximation ratio. This combinatorial method, introduced by Ailon, Charikar, and Newman, selects random pivots to partition the graph and is proven to bound the expected number of disagreements by three times the optimal value through charging arguments involving bad triplets. The ratio is tight in the worst case, as demonstrated by instances where the algorithm's expected cost approaches three times the optimum, such as large complete graphs with nearly all positive edges except a single negative edge. Linear programming (LP)-based rounding methods from the same work yield a 2.5-approximation for minimization by solving a natural LP relaxation and applying pivot rounding to the fractional solution, improving upon the purely combinatorial approach under certain conditions. As of November 2025, the best known approximation ratio for minimizing disagreements is 1.485 + ε for any ε > 0, achieved via rounding of the cluster LP relaxation. For the maximization variant, which seeks to maximize intra-cluster agreements, semidefinite programming (SDP) relaxations provide stronger guarantees. Swamy developed an SDP-based algorithm achieving a 0.7666-approximation ratio for general instances, leveraging efficient SDP solvers for max-cut-like objectives and rounding techniques that preserve a constant fraction of the optimum. This outperforms the naive 0.5-approximation and is based on embedding the graph into a higher-dimensional space to capture cluster structure. Local search algorithms offer constant-factor guarantees as well; for instance, iterative flipping of vertices between clusters converges to a 3-approximation solution after O(n log n) local moves, as analyzed in extensions of the framework where each flip reduces potential violations while the total number of iterations is bounded by logarithmic factors in the input size. Bi-criteria approximations trade off the number of clusters for improved cost ratios. For example, algorithms exist that achieve an O(1)-approximation to the minimum disagreement cost while using at most O(log n) times the optimal number of clusters, often via hierarchical partitioning or rounding with relaxed cardinality constraints. This is particularly useful when the optimal number of clusters is unknown or large, allowing practical scalability at the expense of slightly more partitions. A key inapproximability result states that it is NP-hard to approximate the maximization problem within a factor of 80/79 − ε for any ε > 0 unless P = NP, based on reductions from qualitative clustering problems. Recent advances focus on efficient implementations in parallel and streaming models while nearing the 3-approximation barrier. Behnezhad, Charikar, Ma, and Tan presented a parallel algorithm that achieves a (3 + ε)-approximation in constant rounds for any ε > 0, using O(log n)-bit local computations, which nearly matches the sequential Pivot algorithm's guarantee in distributed settings.

Computational Hardness and Optimal Clusters

Correlation clustering, in its minimization form, is NP-hard even when restricted to complete graphs. This hardness follows from a reduction from the exact cover by 3-sets (X3C) problem, as established in the foundational work on the topic. For general graphs, the problem remains NP-hard, with reductions from problems like 3-SAT demonstrating the intractability of finding an optimal partitioning that minimizes disagreements between edge labels and cluster structure. The problem is also APX-hard, implying that no polynomial-time approximation scheme (PTAS) exists unless P = NP. This result arises from an approximation-preserving reduction from the minimum multicut problem, which is known to be APX-hard, showing that achieving approximations better than a constant factor is computationally challenging. For the maximization variant, where the goal is to maximize agreements, similar inapproximability holds on general graphs, with it being NP-hard to approximate within a factor of 80/79 − ε for any ε > 0 unless P = NP. In contrast, the related decision problem of whether a zero-error clustering exists, meaning a partitioning with no disagreements at all, is solvable in polynomial time: take the connected components of the positive subgraph and check that no negative edge falls within a component. The optimal number of clusters, denoted k^*, emerges naturally in correlation clustering without a predefined value, and in the fixed-k variant it can be defined as k^* = \arg\min_k \text{cost}(k). No closed-form expression for k^* is known, but structural observations constrain it; for example, some optimal clustering places every cluster inside a single connected component of the positive subgraph, since splitting a cluster that spans two such components never increases the cost. In random graph models, such as Erdős–Rényi graphs with random signs, k^* exhibits phase transitions influenced by the label noise level; for instance, a percolation-like transition occurs around a positive-edge probability of q = 1/2, where the relative size of the largest cluster shifts abruptly, causing k^* to scale accordingly as the graph grows. Despite general intractability, correlation clustering admits fixed-parameter tractability when parameterized by the treewidth t of the input graph. In this regime, the problem can be solved in time linear in n for fixed t, leveraging monadic second-order (MSO) logic encodings to express cluster partitions and evaluate costs via dynamic programming on tree decompositions.

Applications and Extensions

Data Mining Applications

Correlation clustering finds practical utility in data mining tasks involving relational data with both positive and negative affinities, such as grouping objects based on incomplete or noisy pairwise labels. In document clustering, it addresses the challenge of organizing large corpora without predefined cluster counts by minimizing disagreements in similarity labels derived from human annotations or feature embeddings. For instance, the original formulation was motivated by clustering web pages using pairwise similarity judgments from a learned classifier; topic similarities are treated as positive edges and dissimilarities as negative, enabling robust partitioning even when annotator labels are noisy. In community detection within signed networks, correlation clustering identifies friend and foe groups by minimizing intra-cluster negative edges and inter-cluster positive edges, aligning with social balance theory. Applied to signed social network datasets such as Epinions and Slashdot, where users label others as friends (+) or foes (−), it resolves inconsistent ties to form antagonistic communities, with algorithms bounding prediction errors relative to the network's inherent frustration (Δ(Y)). Empirical evaluations on these datasets demonstrate its effectiveness in link classification, outperforming baselines in mistake bounds for online settings, and enhancing the detection of polarized structures in relational data such as user interactions. Bioinformatics applications leverage correlation clustering to group genes or proteins in interaction networks, where positive edges indicate co-expression and negative edges anti-correlation. The Divisive Correlation Clustering Algorithm (DCCA) iteratively splits clusters based on Pearson correlations, ensuring positive intra-cluster affinities while separating negatively correlated pairs. Tested on gene-expression datasets (e.g., 6215 genes across multiple conditions), DCCA yields higher functional enrichment scores than alternatives such as k-means, with z-scores up to 21.9 for enriched biological attributes like metabolic pathways. In recommendation systems, correlation clustering resolves inconsistent user-item preferences by grouping similar entities via preference correlations, forming clusters that mitigate sparsity in rating matrices. Evaluation often employs the adjusted Rand index to assess robustness to label noise, where correlation clustering shows empirical superiority over k-means on relational benchmarks by better handling signed affinities.

Recent Variants in Machine Learning

Recent variants of correlation clustering have emerged to address dynamic and resource-constrained environments in machine learning, particularly online, streaming, and quantum settings. In online correlation clustering, vertices arrive sequentially, and the algorithm must assign each to a cluster irrevocably upon arrival, without revisiting prior decisions. A robust approach achieves an O(1/ε)-competitive ratio against the optimal offline solution in a semi-online model with a random sample of size εn, ensuring bounded regret even under noisy or adversarial inputs. Streaming algorithms extend this to edge streams, processing graphs in a single pass with limited memory. These methods employ graph sketches to approximate cluster structures, enabling low-space solutions with constant-factor approximation guarantees in dynamic settings. When augmented with predictions of edge labels, such algorithms improve the approximation ratio beyond the classical 3-factor, achieving better-than-3 guarantees on complete graphs while maintaining sublinear memory. Quantum-assisted variants leverage hybrid quantum-classical frameworks to solve correlation clustering on signed graphs, where edges indicate positive or negative correlations. One such method formulates the problem as a combinatorial optimization task and uses the GCS-Q hybrid solver for partitioning, demonstrating improved robustness and clustering quality over classical baselines on synthetic signed graphs up to n = 170 nodes, with potential speedups for instances around n = 100 on current hardware (as of 2025). Robust variants address adversarial perturbations, such as edge label flips, in dynamic graphs; algorithms maintain sublinear-time updates under adaptive adversaries, preserving approximation guarantees while handling noise that could otherwise degrade cluster quality. Learning-augmented techniques further refine these by integrating predictions directly into the optimization process. By using predicted labels to guide decisions, these methods boost the approximation from the standard 3 to nearly 2 in constrained settings when predictions are accurate, while providing consistency and robustness bounds for erroneous inputs. A key development in parallelizable variants achieves an almost 3-approximation in constant rounds across models like massively parallel computation: for any ε > 0, the algorithm runs in O(1/ε) rounds, nearly closing the gap to the 3-approximation barrier while scaling to large graphs.
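
To make the online model concrete, the following is a deliberately naive greedy baseline, not the competitive algorithm cited above: each arriving vertex irrevocably joins the existing cluster against which its positive edges outnumber its negative ones by the largest margin, or opens a singleton otherwise. The `is_positive` oracle is an assumption of the sketch:

```python
def online_greedy(arrivals, is_positive):
    """Irrevocably assign each arriving vertex to the best existing
    cluster by (positive - negative) edge score, else start a new one.
    Illustrative only; carries no competitive guarantee."""
    clusters = []
    for v in arrivals:
        def score(c):
            return sum(1 if is_positive(v, u) else -1 for u in c)
        best = max(clusters, key=score, default=None)
        if best is not None and score(best) > 0:
            best.add(v)
        else:
            clusters.append({v})
    return clusters
```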
