Fact-checked by Grok 2 weeks ago

Graph edit distance

Graph edit distance (GED) is a measure of similarity between two graphs, defined as the minimum cost required to transform one graph into the other through a sequence of edit operations, including the insertion, deletion, and substitution of nodes and edges.^[1] These operations can be assigned costs based on attributes of the nodes and edges, allowing GED to handle both unlabeled and attributed graphs in an error-tolerant manner for inexact graph matching.^[1] Formally, for two graphs G_1 = (V_1, E_1) and G_2 = (V_2, E_2), GED is the cost of the least expensive edit path, often modeled as an assignment problem where mappings between nodes and edges minimize the total edit cost.^[2] The concept of GED was first formalized by Sanfeliu and Fu in 1983 as a distance measure between attributed relational graphs for pattern recognition tasks. It built on earlier ideas from string edit distance and was further developed by Bunke and Allermann in 1983 for inexact graph matching in structural pattern recognition, emphasizing its role in handling distortions and noise in graph representations. Subsequent work by Messmer and Bunke in the 1990s introduced extensions like error-tolerant subgraph matching and probabilistic interpretations, establishing GED as a foundational metric in graph-based machine learning.^[1] Over time, GED has been generalized to include various cost functions, such as those forming convex polygons in labeling spaces, which influence the shape and properties of the edit surface.^[2] Computing the exact GED is NP-hard, equivalent to finding the minimum-cost edit path in a search space that grows factorially with graph size, often reduced to a quadratic assignment problem or bipartite matching for approximations.^[1] Exact algorithms, such as branch-and-bound or A* search, are feasible only for small graphs (typically under 20 vertices), while heuristics like bipartite graph matching, self-organizing maps for cost learning, or neural network-based approximations (e.g., via graph neural networks) enable efficient computation for larger structures. Key challenges include defining appropriate edit costs, handling attribute mismatches, and scaling to real-world datasets, with recent advances incorporating machine learning to estimate distances in sub-quadratic time.^[3] GED finds broad applications in pattern recognition, such as offline handwritten character recognition, shape and image analysis, and video indexing, where graphs represent structural relationships tolerant to deformations.^[1] In bioinformatics, it measures similarity between molecular structures or protein interaction networks; in computer vision, it supports sketch-to-photo matching and scene recognition; and in machine learning, it underpins tasks like graph classification and clustering by providing a flexible dissimilarity metric.^[1] Its adaptability to different domains, combined with ongoing research into faster approximations and learned edit costs, continues to drive its relevance in graph-based data analysis.^[4]

Fundamentals

Definition

Graph edit distance (GED) is a measure of similarity between two graphs, defined as the minimum total cost of a sequence of edit operations required to transform a source graph G_1 into a target graph G_2. This edit path allows for structural and label differences to be quantified in a flexible manner, making GED particularly useful for inexact graph matching in scenarios where perfect isomorphism is unlikely. The concept was introduced to address error-tolerant comparisons in structural pattern recognition, where graphs represent relational data with potential noise or variations. Formally, let G_1 = (V_1, E_1, \ell_1) and G_2 = (V_2, E_2, \ell_2) be two undirected labeled graphs, where V_i denotes the vertex set, E_i the edge set, and \ell_i the labeling function assigning labels from finite alphabets to both vertices (\ell_{i_V}: V_i \to \Sigma_V) and edges (\ell_{i_E}: E_i \to \Sigma_E). The GED is given by

\text{GED}(G_1, G_2) = \min \left\{ \sum_{k=1}^m c(o_k) \;\middle|\; (o_1, \dots, o_m) \in \Psi(G_1, G_2) \right\},

where \Psi(G_1, G_2) is the set of all edit paths transforming G_1 into G_2, and c(o_k) is the cost of the k-th edit operation o_k. Each edit operation incurs a non-negative cost via functions c_V: (\Sigma_V \cup \{\epsilon\}) \times (\Sigma_V \cup \{\epsilon\}) \to \mathbb{R}_{\geq 0} for vertices and c_E: (\Sigma_E \cup \{\epsilon\}) \times (\Sigma_E \cup \{\epsilon\}) \to \mathbb{R}_{\geq 0} for edges, with \epsilon as a dummy label for insertions or deletions; typically, c_V(\alpha, \alpha) = 0 for no substitution, insertions and deletions have unit cost (e.g., 1), and substitutions reflect label dissimilarity (e.g., c_V(\alpha, \beta) = d(\alpha, \beta) for some distance d). To illustrate, consider two simple undirected graphs each with two vertices connected by a single edge and no edge labels. Let G_1 have vertices labeled "A" and "B", while G_2 has vertices labeled "A" and "C". Assuming unit costs for insertions/deletions and substitution cost equal to 1 if labels differ (0 otherwise), the minimum edit path substitutes "B" with "C" (cost 1), leaving the structure intact, yielding GED(G_1, G_2) = 1. If labels were identical, GED would be 0, indicating isomorphism. This basic case highlights how GED captures label-based differences without altering connectivity.

Edit Operations

The primary edit operations in graph edit distance (GED) are vertex insertion, vertex deletion, edge insertion, and edge deletion, which allow for the addition or removal of structural elements between two graphs. These operations are complemented by substitution operations, specifically vertex relabeling—where the label of a vertex is changed from l_1 to l_2 at a cost determined by a label distance function \delta(l_1, l_2)—and edge relabeling, which similarly adjusts edge labels based on their dissimilarity.^[5] Costs are assigned to each operation to quantify the transformation effort, with insertions and deletions typically incurring a unit cost of 1, though higher values are sometimes used in specific applications. Substitution costs are 0 when labels match exactly, promoting mappings of identical elements, and exceed 0 otherwise to reflect the degree of change required. These cost functions ensure the total GED reflects a meaningful structural dissimilarity while allowing flexibility for domain-specific adaptations.^[5] If two graphs are isomorphic, GED is zero, as an edit path exists consisting solely of zero-cost substitutions that preserve all labels and structure.^[5] As an illustrative example, consider transforming a path graph with three unlabeled vertices connected as a - b - c (two edges) into a cycle graph with the same vertices but three edges (a - b - c - a). Under unit costs for insertions and deletions, with zero-cost substitutions for matching vertices and existing edges, the minimum edit path substitutes the vertices and the two existing edges at no cost, then inserts the closing edge c - a at cost 1, yielding a total GED of 1.^[5]

Properties and Variants

Key Properties

Graph edit distance (GED) exhibits several key mathematical properties that underpin its utility as a measure of structural similarity between graphs. It is inherently non-negative, satisfying GED(G₁, G₂) ≥ 0 for any two graphs G₁ and G₂, since edit costs are defined to be non-negative and the distance represents the minimum such cost over all possible edit paths.^[6] This non-negativity holds regardless of the specific cost assignment, as long as individual edit operations do not incur negative costs. A fundamental property is the identity of indiscernibles: GED(G₁, G₂) = 0 if and only if G₁ and G₂ are isomorphic. If the graphs are isomorphic, an edit path consisting solely of zero-cost substitutions (via a bijection preserving labels and edges) achieves this minimum. Conversely, a zero-cost path implies no insertions, deletions, or non-trivial substitutions are required, confirming structural equivalence.^[6] This direct link to graph isomorphism positions GED as an error-tolerant extension of exact matching, where non-zero distances quantify the minimal structural deviations.^[7] Symmetry holds under the condition of symmetric edit costs, where the cost of an operation equals the cost of its inverse (e.g., insertion cost equals deletion cost for vertices and edges). In this case, GED(G₁, G₂) = GED(G₂, G₁). To see this, consider any minimum-cost edit path π from G₁ to G₂ with cost c(π). Reversing π yields an edit path from G₂ to G₁, and symmetry ensures the reversed operations have identical costs, preserving the minimum. If costs are asymmetric, symmetry may fail.^[6] The triangle inequality, GED(G₁, G₃) ≤ GED(G₁, G₂) + GED(G₂, G₃), does not generally hold for GED, particularly when computed via edit paths (GED_path) and with arbitrary non-unit costs. For instance, edit paths may not compose in a way that bounds the direct path cost, even if individual edit functions satisfy the inequality. This property is satisfied only under restrictive conditions, such as unit costs for all operations or when GED is reformulated via bijections (GED_bij) with edit functions that themselves obey the triangle inequality. Consequently, GED is a pseudo-metric rather than a full metric in many settings, which can affect its use in certain geometric or optimization contexts.^[6] GED also displays intuitive behavior relative to graph sizes. For vertex edits, it is lower bounded by the difference in vertex counts: GED(G₁, G₂) ≥ ||V₁| - |V₂|| ⋅ min(c_ins^v, c_del^v), where c_ins^v and c_del^v are the minimum costs for vertex insertion and deletion, respectively. This bound follows directly from the necessity of at least |V₁| - |V₂| (assuming |V₁| ≥ |V₂|) insert or delete operations to balance the vertex sets before any substitutions or edge edits can occur. Similar bounds apply to edge counts, providing quick estimates for dissimilarity in unbalanced graphs.^[8]

Common Variants

The concept of graph edit distance was first formalized by Sanfeliu and Fu in 1983, with initial variants for inexact matching developed by Bunke and Allermann in the same year, to enable tolerance for distortions and noise through a sequence of edit operations with associated costs.^[9]^[10] Subsequent extensions in the 2000s adapted these variants for bioinformatics, incorporating domain-specific edit costs for molecular and genomic graphs. Error-tolerant graph edit distance, inherent to the foundational model, permits approximate matches by assigning costs to edit operations and bounding the total error to reflect practical dissimilarities in noisy data. Bounded graph edit distance restricts the maximum allowable edit cost or path length, facilitating efficient computation while ensuring the distance remains within a predefined threshold for applications requiring scalability.^[11] Variants for directed graphs extend the standard operations to handle edge directionality, such as inserting, deleting, or substituting directed edges while preserving orientation in the edit sequence.^[12] For weighted graphs, adaptations incorporate edge weights into substitution costs beyond mere label matching, allowing the distance to account for quantitative differences in edge properties.^[13] Subgraph edit distance modifies the objective to find the minimum edit cost transforming the source graph into any subgraph of the target graph, rather than an exact isomorphic match, which is useful for partial structure detection.^[14] An example of a practical variant is the size-normalized graph edit distance, which scales the raw distance by the average graph size—such as dividing by (|V_1| + |V_2|)/2, where |V_i| denotes the number of vertices in graph G_i—to enable fair comparisons between graphs of differing scales.^[15]

Applications

Pattern Recognition and Computer Vision

Graph edit distance (GED) was pioneered in the 1980s for syntactic pattern recognition, enabling error-tolerant matching of structural representations in visual data. Introduced by Bunke and Allermann in 1983 as a method for inexact graph matching, it addressed the limitations of exact isomorphism in handling noisy or distorted patterns common in images.^[16] By the 1990s, GED had become a cornerstone for inexact graph matching, with Bunke's 1997 work establishing theoretical links between GED and maximum common subgraph approximations, facilitating its adoption in computer vision tasks.^[7] In object recognition, GED facilitates graph matching by representing shapes as attributed graphs, where nodes capture features like curvature or position, and edges encode spatial relations, allowing similarity measurement through minimal edit costs. This approach excels in error-tolerant scenarios, such as recognizing deformed or partially occluded objects, by quantifying the cost of insertions, deletions, and substitutions to align graphs. For instance, Sebastian et al. (2003) applied GED to shock graphs—skeletal representations of shape boundaries—for robust object recognition, demonstrating high accuracy in matching silhouettes from databases like MPEG-7, where traditional feature-based methods falter under viewpoint variations. A practical example is handwritten character recognition, where stroke sequences form graphs, and GED measures shape similarity despite variations in writing style. For image analysis, GED supports attributed relational graphs (ARGs) in tasks like scene labeling, where nodes represent regions or objects with attributes (e.g., color, texture), and edges denote relations (e.g., adjacency, inclusion). This enables error-tolerant matching for segmenting and interpreting complex scenes, accommodating segmentation errors or noise. Robles-Kelly and Hancock (2005) utilized GED on ARGs derived from image adjacency matrices for content-based image retrieval, achieving lower intra-class distances on datasets like COIL compared to spectral methods alone. Unlike string edit distance, which is limited to linear sequences, GED captures 2D structural relations, making it superior for hierarchical or relational visual patterns, as highlighted in early applications to syntactic image understanding.^[16]

Chemoinformatics and Bioinformatics

In chemoinformatics, graph edit distance (GED) serves as a measure of structural similarity between molecular graphs, where atoms are represented as nodes and chemical bonds as edges, facilitating tasks in drug discovery such as virtual screening of compound libraries to identify potential therapeutic agents. The edit operations—node and edge insertions, deletions, and substitutions—directly model chemical modifications, including atom additions, replacements, or alterations in bonding patterns, thereby quantifying the minimal transformations required to align two molecules. This approach has proven effective for retrieving structurally similar compounds from databases, outperforming some fingerprint-based methods in enrichment factors for active molecules in pharmacological screens.^[17] In bioinformatics, GED is applied to compare protein structures by converting them into contact map graphs, in which nodes denote amino acid residues and edges signify close spatial contacts (typically within a predefined distance threshold), allowing for the assessment of conformational similarities and differences. Such graph representations enable GED to estimate evolutionary distances between proteins by calculating the edit cost to transform one contact map into another, providing insights into structural divergence over phylogenetic time scales. For example, GED computations on contact maps have been used to align and classify protein domains, revealing conserved motifs in families like kinases or globins. As of 2025, GED has also been applied in genomics for constructing edit-distance graphs to cluster large sets of short reads from sequencing data, aiding in genome assembly and variant detection.^[18]^[19] A specific instance of GED's utility involves approximating the maximum common edge subgraph (MCES) for pharmacophore matching, where pharmacophores—simplified graphs capturing key molecular features like hydrogen bond donors or hydrophobic regions—are compared to identify shared interaction patterns with biological targets. By optimizing GED to minimize edit costs between these pharmacophore graphs, MCES approximations highlight overlapping substructures essential for activity, supporting ligand-based virtual screening and scaffold hopping in medicinal chemistry.^[20] The use of GED in chemoinformatics and bioinformatics emerged prominently in the early 2000s, with foundational evaluations by Raymond and Willett demonstrating its viability for similarity searching in chemical databases containing thousands of 2D structures. Subsequent advancements have incorporated GED into machine learning pipelines via specialized kernels that approximate edit distances, enabling supervised classification of biological graphs such as protein-protein interaction networks or metabolic pathways in datasets from sources like the Protein Data Bank. These GED kernels, often based on random walk or convolution formulations, enhance accuracy in tasks like predicting protein function or molecular bioactivity without requiring exact distance computations.^[21] Recent applications as of 2024 include using GED for clustering travel-activity patterns in transportation analysis, where graphs represent weekly mobility sequences to identify behavioral similarities tolerant to variations in data collection.^[22]

Algorithms

Exact Methods

Exact methods for computing graph edit distance (GED) aim to find the minimum cost sequence of edit operations that transforms one graph into another, guaranteeing the optimal solution without approximation. These approaches are essential in scenarios where precision is paramount, such as verifying structural similarities in small datasets from pattern recognition tasks. However, due to the inherent complexity of the problem, exact methods are typically feasible only for graphs with a limited number of vertices, often up to 10-20 nodes depending on the graph density and edit costs.^[16] One prominent exact method employs a tree search strategy based on the A* algorithm, which explores the space of possible edit paths in a best-first manner. The search tree is constructed by sequentially assigning nodes from the source graph G_1 = (V_1, E_1) to nodes in the target graph G_2 = (V_2, E_2), including dummy nodes to account for insertions and deletions. At each step, the algorithm selects the partial mapping with the lowest estimated total cost, where the total cost is the sum of the actual cost incurred so far and an admissible heuristic lower bound on the remaining cost. Admissible heuristics, such as those derived from bipartite matching on unmatched nodes or lower bounds on edge edit costs, ensure that the first complete path found is optimal. This approach was formalized in early works on inexact graph matching and has been refined in subsequent implementations to reduce memory usage, such as through depth-first variants like DF-GED.^[16]^[23] The GED problem can be formulated as finding an optimal bijection \phi: V_1 \cup \epsilon_1 \to V_2 \cup \epsilon_2, where \epsilon_1 and \epsilon_2 are sets of dummy (null) nodes representing insertions and deletions, respectively (typically |\epsilon_1| = |V_2|, |\epsilon_2| = |V_1| to balance sizes). The total edit cost under this mapping is the sum of vertex edit costs \sum_{v \in V_1 \cup \epsilon_1} c_v(v, \phi(v)), where c_v denotes substitution, insertion, or deletion costs based on whether \phi(v) is a real node or dummy, plus edge edit costs determined by the induced mapping: for each edge (u, v) \in E_1, a cost c_e((u, v), (\phi(u), \phi(v))) for substitution (if both images are real and edge exists in E_2), deletion (if image edge missing), etc.; symmetrically, insertion costs for edges in E_2 without preimages under \phi. This formulation captures the interdependence between node and edge edits, making the search space combinatorial. For the heuristic in A*, a common lower bound is the cost of optimal bipartite matching on the remaining nodes using vertex substitution costs alone, ignoring edge dependencies to ensure admissibility.^[16]^[23] To illustrate the A* approach, consider two small undirected graphs without labels for simplicity (unit costs: 1 for insert/delete/substitute node or edge). Let G_1 have vertices \{a, b\} with edge (a, b), and G_2 have vertices \{x, y, z\} with edges (x, y), (y, z). The goal is to transform G_1 to G_2 with minimum cost.

Initial state: Empty partial mapping, g = 0 (cost so far), heuristic h based on matching 2 nodes to 3 (e.g., h = 1 for one deletion/insertion imbalance, plus edge costs).
Step 1: Assign a \to x (substitute cost 1, partial edge match possible), new g=1, h=2 (remaining b to y or z, plus edge inserts/deletes). Alternative: a \to \epsilon (delete cost 1), but higher h.
Step 2: From a \to x, assign b \to y (g=2), check induced edge: (a,b) \to (x,y) matches, no edge cost; h=1 (insert z and edge (y,z)). Total f = g + h = 3.
Step 3: Complete by inserting z and edge (y,z) (g=4), h=0, total cost 4. Explore alternatives like b \to z (would require edge substitute/delete, higher cost). The search prunes branches where f > 4, confirming optimality (actual minimum: substitute a \to x, b \to y, insert z and edge).

This example demonstrates how A* efficiently prunes the $3^2 = 9 possible assignments (including dummies) to find the exact GED of 4.^[23] Despite their accuracy, exact methods suffer from exponential time complexity in the worst case, as the search space grows factorially with the number of vertices (O(n!) partial mappings). Practical implementations mitigate this through strong heuristics and pruning, but they remain intractable for graphs beyond 20 vertices on standard hardware.^[16]^[23]

Approximation and Heuristic Approaches

Approximation and heuristic approaches for graph edit distance (GED) address the computational challenges of exact methods by trading optimality for efficiency, enabling applications on larger or more complex graphs. These methods often rely on suboptimal mappings or relaxations to estimate the minimum edit cost, providing upper or lower bounds that can guide search or serve as direct approximations. Seminal heuristics, such as those based on bipartite graph matching, have been widely adopted for their polynomial-time performance in pattern recognition tasks.^[24] A common heuristic models the node mapping as a bipartite assignment problem. Vertices of G_1 are matched to vertices of G_2 or dummy nodes using a cost matrix that incorporates vertex edit costs, solvable via the Hungarian algorithm in O(n^3) time. Edge edit costs are then added post-assignment by evaluating the induced mapping's edge discrepancies. While this provides a structured way to approximate node correspondences, it ignores the quadratic interdependence of edge costs during assignment, yielding an upper bound rather than the exact minimum. Adaptations include iterative refinement or beam search to explore promising assignments, achieving quadratic or cubic time complexity and approximations within 10-20% error on benchmark datasets like letter and protein graphs.^[25]^[24]^[26] Lower and upper bound methods leverage structural similarities to prune the search space or refine estimates without full computation. A common upper bound derives from approximations of the maximum common subgraph (MCS), where the edit cost is bounded above by the number of unmatched nodes and edges under a suboptimal common subgraph, as GED is at least as large as the cost of transforming the MCS to the full graphs. For lower bounds, techniques like branch edit distance compute relaxed costs on local substructures (e.g., stars or paths around nodes), ensuring the bound is admissible for branch-and-bound searches; for instance, the BRANCH algorithm refines these by iteratively optimizing over node neighborhoods, providing tighter bounds than prior methods with O(n³) time for graphs of size n. These bounds are particularly effective in filtering candidates during similarity search, reducing runtime by up to 90% in large databases while maintaining guarantees on the final estimate.^[27]^[28] Machine learning aids, particularly graph neural networks (GNNs), approximate GED by embedding graphs into a latent space where Euclidean or other metric distances correlate with edit costs. In these approaches, siamese GNN architectures (e.g., using Graph Isomorphism Networks) generate node and graph-level embeddings independently for input pairs, capturing structural and label information through message passing. The GED estimate is then obtained via a distance metric on the embeddings, such as L2 norm, optionally refined by a learnable matching matrix to simulate edit paths. Training on pairs with known GED labels enables end-to-end optimization, achieving mean absolute errors of 5-15% on synthetic and real-world datasets like AIDS and MUTAG, with inference times under milliseconds for graphs up to 100 nodes. This method scales well to high-dimensional graphs, outperforming traditional heuristics in accuracy for non-isomorphic structures.^[29]^[30] An example of partitioning for large graphs is the variable partitioning local search (VPLS) heuristic, which divides the assignment problem into manageable subsets to approximate GED. It formulates GED as a mixed-integer linear program, then iteratively partitions the vertex assignment variables into fixed and free sets based on instance analysis (e.g., degree distributions), solving the reduced problem via local search and recombining solutions. This divide-and-conquer strategy aggregates partial edit costs across partitions, yielding high-quality approximations for graphs with hundreds of nodes, where exact methods fail due to time limits; on the CMU-HOUSE dataset, it improves solution quality over baselines by 15-25% in under 10 seconds per pair.^[31] Recent advances in the 2020s integrate deep learning hybrids for scalable approximations, such as the GREED framework, which uses GNN embeddings to learn parametric distance functions that preserve GED's metric properties while enabling fast retrieval. By training on diverse graph pairs, these methods achieve sub-second queries on million-scale databases, with approximation ratios under 1.1 for unit-cost edits, and have been applied in chemoinformatics for rapid molecular similarity screening. More recent approaches, such as optimal transport-based approximations (as of 2024), offer provable bounds and scalability to larger graphs.^[30]^[32]

Complexity

Computational Complexity

The computation of the graph edit distance (GED) between two graphs is a fundamental problem in graph theory and pattern recognition, but it is computationally intractable in general. The decision version of GED—determining whether there exists an edit path of cost at most a given threshold k—is NP-complete, even for unlabeled undirected graphs with unit edit costs. This result follows from polynomial-time reductions from NP-complete problems like subgraph isomorphism.^[33] The NP-hardness of GED was recognized early in its development following its formalization in 1983, with seminal analyses in the late 1980s and 1990s exploring reductions from related hard problems like subgraph isomorphism. These works highlighted the inherent difficulty, showing that exact GED computation requires exponential time in the worst case due to the combinatorial explosion in possible edit sequences. In terms of parameterized complexity, GED remains challenging, though it exhibits fixed-parameter tractability when parameterized by structural graph parameters such as treewidth or pathwidth for certain restricted graph classes; however, the general case for arbitrary graphs of bounded treewidth is an open problem. Empirically, exact algorithms for GED, such as those based on A* search or integer linear programming, are feasible only for small graphs with up to approximately 15–20 vertices, beyond which the runtime becomes prohibitive even on modern hardware due to the exponential search space. For larger instances, this limits practical applications to domains with modest graph sizes, motivating the development of approximation and heuristic methods.^[34]^[35]

Approximation Guarantees

The computation of the graph edit distance (GED) is NP-hard to approximate to within a factor of n^{1-\epsilon} for any constant \epsilon > 0, unless P = NP, where n denotes the number of vertices in the input graphs. This inapproximability result holds for the case of metric edit costs, where the costs satisfy the triangle inequality, as well as for uniform costs where all edit operations have unit cost. For non-metric edit costs, GED is non-approximable in a stronger sense: there exists no polynomial-time \alpha-approximation algorithm for any constant \alpha \geq 1, unless P = NP. These hardness results extend even to very sparse graphs with maximum degree at most two and unlabeled vertices, demonstrating that structural restrictions like bounded degree do not yield constant-factor approximations in general.^[5] Despite these strong inapproximability barriers, some positive theoretical results exist for restricted settings. For certain restricted edit problems (e.g., editing to specific graph classes like bounded-degree graphs), dynamic programming techniques enable fixed-parameter tractable algorithms, though for general GED between arbitrary graphs with bounded treewidth, exact computation remains open. Constant-factor approximations remain elusive for broader classes like planar graphs, where GED computation retains NP-hardness without known sublinear approximation ratios beyond heuristics. No polynomial-time approximation scheme (PTAS) exists for metric GED, as there is no (1 + \epsilon)-approximation for sufficiently small \epsilon > 0, unless P = NP. Recent machine learning approaches, including graph neural networks, provide practical approximations in sub-quadratic time for larger graphs.^[5]^[36]^[3] Relative approximations have been explored using linear programming (LP) relaxations of the mixed-integer programming formulation of GED, particularly for cost functions where substitution costs dominate and satisfy specific monotonicity conditions. These relaxations provide lower bounds on the optimal GED and can yield (1 + \epsilon)-approximate solutions in polynomial time when the edit operations reduce to a linear assignment problem, such as in cases with zero insertion/deletion costs. For general costs, however, the integrality gap of these LP relaxations does not guarantee bounded relative error. Bounded-error approximations via semidefinite programming (SDP) have been proposed by formulating GED as a special case of the quadratic assignment problem (QAP), where SDP relaxations offer tight bounds in practice for instances with structured cost matrices, though theoretical guarantees are limited to specific QAP subclasses like those with permutationally invariant costs.^[5]^[13]^[37] Significant gaps persist in the literature on GED approximations. While the inapproximability threshold of n^{1-\epsilon} rules out sublinear factors, it remains open whether logarithmic-factor approximations (e.g., O(\log n)) are achievable in polynomial time for general graphs under metric or uniform costs. Further exploration of approximation guarantees for intermediate graph classes, such as bounded-genus surfaces or minor-closed families, could bridge these gaps, but current results indicate persistent hardness even under mild restrictions.^[5]

References

[1]
None
### Summary of Key Sections from "A Survey of Graph Edit Distance"
[2]
[PDF] On the Graph Edit Distance cost: Properties and Applications1
Our specific definition and interpretation of the Graph Edit Distance allows each class of cost to be described using a plane equation and allows the shape of ...
[3]
https://arxiv.org/pdf/2508.03111
[4]
[PDF] Graph Edit Distance Learning via Modeling Optimum Matchings with ...
Graph edit distance (GED) is a fundamental mea- sure for graph similarity analysis in many real ap- plications. GED computation has known to be. NP-hard and ...Missing: definition | Show results with:definition
[5]
[PDF] New Techniques for Graph Edit Distance Computation - arXiv
... Graph Edit Distance, Graph Matching, Algorithm Design. Copyright c 2019 by ... original definition of GED due to Bunke and Allermann [29], which is ...
[6]
https://doi.org/10.1016/j.patcog.2019.01.043
[7]
On a relation between graph edit distance and maximum common ...
Graph edit distance ... Bunke, G. Allerman. Inexact graph matching for structural pattern recognition. Pattern Recognition Letters, 1 (4) (1983), pp. 245-253.Missing: Allermann | Show results with:Allermann
[8]
[PDF] Comparing heuristics for graph edit distance computation
A very flexible, sensitive and therefore widely used measure is the graph edit distance (GED), which is defined as the minimum cost of an edit path between G ...
[9]
[PDF] Graph Kernels and Applications in Bioinformatics
A standard set of edit operations include insertions, deletions, and substitutions of both vertices and edges. Graph comparison methods based on edit distance ...
[10]
https://www.sciencedirect.com/science/article/abs/pii/016786558390013X
[11]
Graph edit distance : a new binary linear programming formulation
May 21, 2015 · Graph edit distance (GED) is a powerful and flexible graph matching paradigm that can be used to address different tasks in structural pattern ...Missing: Bunke seminal
[12]
[PDF] Graph Edit Distance as a Quadratic Assignment Problem
The Graph Edit Distance (GED) is a flexible measure of dissimilarity between graphs which arises in error-correcting graph matching.
[13]
Towards Accurate Subgraph Similarity Computation via Neural ...
Oct 19, 2022 · Based on this idea, we further design a novel neural network to approximate a type of subgraph distance: the subgraph edit distance (SED).
[14]
(PDF) Fast Suboptimal Algorithms for the Computation of Graph Edit ...
Aug 7, 2025 · In this paper, we propose two simple, but effective modifications of a standard edit distance algorithm that allow us to suboptimally compute ...Missing: seminal | Show results with:seminal
[15]
Inexact graph matching for structural pattern recognition
Inexact graph matching for structural pattern recognition. Author links open ... Bunke et al., 1982. H Bunke, G Sagerer, H Niemann. Model based analysis ...
[16]
Effectiveness of graph-based and fingerprint-based similarity ...
This paper reports an evaluation of both graph-based and fingerprint-based measures of structural similarity, when used for virtual screening of sets of 2D ...
[17]
Graph‐based methods for protein structure comparison - Fober - 2013
Jul 26, 2013 · Many structure-based approaches to the comparative analysis of proteins and the inference of protein function rely on graph formalisms for ...
[18]
Ligand-Based Virtual Screening Using Graph Edit Distance as ...
This study investigates the effectiveness of a graph-only driven molecular comparison by using extended reduced graphs along with graph edit distance methods.Missing: variants | Show results with:variants<|control11|><|separator|>
[19]
[PDF] An Exact Graph Edit Distance Algorithm for Solving Pattern ... - HAL
Jun 26, 2015 · Abstract: Graph edit distance is an error tolerant matching technique emerged as a powerful and flexible graph matching.
[20]
[PDF] A Hungarian Algorithm for Error-Correcting Graph Matching? - HAL
We propose in this paper a new formulation of the bipartite graph matching algorithm designed to solve efficiently the associated graph edit distance problem.
[21]
Approximate graph edit distance computation by means of bipartite ...
Jun 4, 2009 · In the present paper we introduce a novel algorithm which allows us to approximately, or suboptimally, compute edit distance in a substantially faster way.Missing: Hungarian | Show results with:Hungarian
[22]
[PDF] Approximate Graph Edit Distance Computation Combining Bipartite ...
Oct 28, 2016 · In this paper, we used the heuristic function defined by Riesen and Bunke in [?]. This heuristic allows to find an optimal mapping between ...Missing: seminal | Show results with:seminal
[23]
Improved Lower Bounds for Graph Edit Distance - ResearchGate
The problem of deriving lower and upper bounds for the edit distance between undirected, labelled graphs has recently received increasing attention.Missing: V2| | Show results with:V2|<|control11|><|separator|>
[24]
Comparing stars: on approximating graph edit distance
In this paper we introduce three novel methods to compute the upper and lower bounds for the edit distance between two graphs in polynomial time.
[25]
[PDF] Computing Graph Edit Distance via Neural Graph Matching
Graph edit distance (GED) computation is a fundamental NP-hard problem in graph theory. Given a graph pair (G1,G2), GED is defined as the minimum number of ...
[26]
[PDF] GREED: A Neural Framework for Learning Graph Distance Functions
Among various distance functions for graphs, graph and subgraph edit distances. (GED and SED respectively) are two of the most popular and expressive mea-.<|control11|><|separator|>
[27]
Solving the Graph Edit Distance Problem with Variable Partitioning ...
Jun 19, 2019 · In the world of graph matching, the Graph Edit Distance (GED) problem is a well-known distance measure between graphs.
[28]
Comparing stars: on approximating graph edit distance
Comparing Stars: On Approximating Graph Edit Distance ∗. Zhiping Zeng§ , Anthony K.H. Tung† , Jianyong Wang§ , Jianhua Feng§ , Lizhu Zhou§. § Tsinghua ...
[29]
Text-independent speaker recognition using graph matching
Jul 1, 2008 · With the exact algorithm, the size of the graph is restricted to 20 vertices in practise due to its exponential time complexity. This is ...
[30]
On the exact computation of the graph edit distance - ScienceDirect
Gao et al. A survey of graph edit distance. Pattern Anal. Applic. (2010). Z. Zeng et al. Comparing stars: on approximating graph edit distance. PVLDB. (2009). K ...
[31]
A Fixed-Parameter Tractable Algorithm for Elimination Distance to ...
In the literature on parameterized graph problems, there has been an increased effort in recent years aimed at exploring novel notions of graph edit-distance
[32]
Bounds for the quadratic assignment problem using the bundle ...
The nature of the quadratic assignment problem (QAP) suggests SDP as a way to derive tractable relaxations. We recall some SDP relaxations of QAP and solve them ...