Fact-checked by Grok 2 weeks ago

E-graph

An e-graph, short for equality graph, is a in that compactly represents a set of expressions (terms) and a over them, where equivalent expressions are grouped into classes to enable efficient reasoning about equalities. It consists of e-nodes, which are representations of function applications with children pointing to e-classes, and e-classes, which e-nodes based on provable equivalence under a set of rewrite rules. This structure maintains a invariant, ensuring that if two subexpressions are equivalent, the enclosing expressions are merged accordingly, allowing for compact storage of exponentially many equivalent program variants. E-graphs originated from theorem provers like Simplify, where they were used to represent equalities in , but gained prominence in programming language research through the introduction of equality saturation in 2009. Equality saturation leverages e-graphs to apply rewrite rules exhaustively and order-independently, addressing the phase-ordering problem in optimizations by saturating the graph with equalities before extracting the best representation using cost-based heuristics. In this approach, an initial expression is added to the e-graph, rules are repeatedly matched and applied until no new equalities are found (), and then the minimal-cost expression is extracted, enabling without destructive transformations. Subsequent advancements, such as the library introduced in , have optimized e-graph implementations for speed and extensibility by incorporating techniques like rebuilding for efficient invariant maintenance and e-class analyses for domain-specific computations like . These improvements have broadened e-graph applications beyond compilers to include , where they generate optimal code snippets; symbolic computation in tools like for floating-point accuracy; and even hardware design verification. Today, e-graphs power libraries in languages like , , and , facilitating automated and checking across diverse domains.

Fundamentals

Definition

An E-graph, short for equality graph, is a that compactly represents a set of equivalent expressions over uninterpreted symbols and constants, maintaining a among them. It consists of e-nodes, which are canonical representations of applications pointing to child e-classes, and e-classes, which equivalent e-nodes into classes. Terms in an E-graph are modeled as trees, where leaves are constants or variables and internal nodes are applications of uninterpreted functions; each such application forms an e-node whose children belong to specific e-classes. To enforce equivalence, E-graphs use a union-find to dynamically merge e-classes when new equalities are asserted, ensuring that equivalent terms share the same representative. Additionally, hash-consing is employed to create a representation for e-nodes by mapping unique combinations of symbols and child e-classes to a single instance, preventing duplication and enabling efficient lookups. Key invariants include congruence closure, which guarantees that if two subterms are equivalent, then any enclosing term applications are also considered equivalent, and the persistence of all established equivalences throughout operations on the structure. For example, consider arithmetic expressions where is commutative: inserting both (add x y) and (add y x) into an E-graph would merge their root e-classes via union-find, as the subterms x and y are equivalent to y and x respectively under congruence closure, resulting in a single e-class representing both forms. This structure supports techniques like equality saturation, where rewrites are applied exhaustively to explore equivalent expressions.

History

The concept of e-graphs originated in the late 1970s within the domain of , particularly through foundational work on congruence closure algorithms, such as the 1980 paper by C. G. Nelson and D. C. Oppen. These structures were initially employed to efficiently represent relations over terms in , enabling scalable reasoning about equalities in logical formulas. A pivotal development occurred in 2007 with the introduction of efficient E-matching techniques in the Z3 solver by Leonardo de Moura and Nikolaj Bjørner. Their work formalized E-graphs as a core component for incremental and closure, allowing Z3 to handle complex quantifier instantiations and theory-specific equalities with improved performance. This integration marked the first widespread use of E-graphs in production tools, influencing subsequent solver designs. In the 2010s, E-graphs were further formalized and extended beyond solvers for applications in term rewriting and . A key influence came from abstract congruence matching, as explored by Michael Stepp, Ross Tate, and Sorin Lerner in their 2011 work on equality-based translation validation for like . This approach leveraged E-graphs to verify semantic equivalence between source and target code representations, bridging automated verification with correctness. The egg library, developed by Max Willsey and colleagues and introduced in 2020, provided an efficient implementation of equality saturation (a technique originally introduced in 2009), addressing scalability limitations of prior E-graph implementations by introducing specialized analysis and reconstruction algorithms, enabling its application to program synthesis and optimization tasks that were previously infeasible. Post-2020 developments expanded E-graphs into production compilers and diverse optimization domains. Integration efforts in LLVM-based systems, such as the eqsat dialect for MLIR in 2025, allowed non-destructive rewriting passes using equality saturation directly within the compiler infrastructure. Concurrently, 2022–2023 research incorporated acyclic E-graphs into compilers like Cranelift (part of the LLVM ecosystem) to resolve pass-ordering issues and enhance mid-end optimizations. Key milestones included relational E-matching in 2022, which improved efficiency by modeling pattern searches as relational joins for worst-case optimal query plans, and Wasm-mutate in the same year, which used E-graphs to generate semantically equivalent WebAssembly variants for compiler fuzzing and binary diversification. By 2025, E-graphs saw innovative applications in diagram-based theorem proving at the , where they were adapted to maintain relations over diagrammatic representations for algebraic proofs. Additionally, educational and practical advancements, such as technology techniques taught in Cornell's CS 6120 course, demonstrated E-graphs' utility in synthesizing optimal circuit implementations from expression equivalences. These evolutions underscore E-graphs' transition from niche verification tools to versatile structures in optimization and synthesis.

Structure and Operations

E-classes and Invariants

In an e-graph, e-classes form the core partitioning mechanism, grouping terms that are equivalent under the defined rewrite rules into . Each e-class encapsulates multiple e-nodes, serving as compact representations of equivalent expressions while avoiding redundant storage of identical terms. This structure allows the e-graph to efficiently represent an exponentially large set of equivalent programs as a polynomial-sized . An e-node within an e-class specifies a function symbol, such as an like or , along with an ordered list of child e-classes that denote its subterms. For instance, an e-node for might reference two child e-classes representing the operands. To maintain a , e-nodes are uniquely identified through hash-consing, ensuring that no two e-nodes with the same and equivalent children coexist in the same e-class, which prevents duplication and facilitates efficient lookups. E-graphs uphold several key invariants to ensure correctness and usability. Acyclicity prohibits cycles in the graph to prevent the representation of infinite terms, typically enforcing a directed acyclic graph (DAG) structure for term languages. Completeness guarantees that all equivalences logically implied by the initially added equalities are fully represented across the e-classes. Congruence ensures that equivalence is preserved under contextual substitution: if subterms in child e-classes are equivalent, then the corresponding parent e-nodes must reside in the same e-class. Invariant maintenance relies on a union-find to track e-class equivalences, employing path compression during finds to canonicalize references and union-by-rank to balance the for amortized efficiency in merges. Each e-class also tracks parent lists—enumerating e-nodes that reference it as a —and child references for traversal, enabling systematic checks for new opportunities during updates. In practice, these are often relaxed during incremental additions for , with periodic rebuilding to restore full compliance. For example, consider merging e-classes under a commutative rewrite like x + y \equiv y + x: the e-graph adds an e-node for y + x with children swapped from the original x + y e-node, then unions the parent e-classes, linking all variants without duplication while propagating the change through parent lists to identify further congruent merges. This process upholds the invariants, allowing the e-graph to compactly capture both orders as equivalent without .

Basic Operations

The add operation in an E-graph inserts a into the structure by creating an e-node representing the term's and its child e-classes, if they do not already exist. To avoid duplicates, the operation employs hash-consing: a checks for an identical e-node (same and child e-classes), returning its existing e-class identifier if present; otherwise, a new singleton e-class is created for the e-node and added to the e-graph. This ensures efficient storage and sharing of subterms, as ground terms (constants like variables or literals) are inserted as leaf e-nodes in dedicated e-classes without children. The merge operation unions two e-classes to enforce , updating the e-graph to reflect this while preserving —the property that equivalent inputs to the same yield equivalent outputs. It begins by izing the input e-class identifiers using the underlying union-find structure to locate their representatives. If the representatives differ, one is redirected to the other, effectively merging the e-classes and combining their e-nodes. To maintain , the merge triggers propagation: parents of the merged e-classes are examined, and any e-nodes with now-equivalent children are added to a worklist for rebuilding, where congruent e-nodes are detected via hash-consing and merged iteratively. For efficiency, implementations handle inverse implications during path compression in the union-find find operation, allowing quick retrieval of canonical representatives without repeated traversals. The following pseudocode illustrates the core merge process, adapted from standard e-graph implementations:
function merge(egraph, class1, class2):
    rep1 = find(egraph, class1)
    rep2 = find(egraph, class2)
    if rep1 == rep2:
        return false  // already equivalent
    // Union: redirect rep1 to rep2
    egraph.union_find[rep1] = rep2
    // Propagate: add parents to worklist for congruence
    for parent in parents_of(rep1):
        add_to_worklist(parent)
    rebuild(egraph, worklist)  // canonicalize and merge congruents
    return true
Here, find follows union-find paths with to return the representative e-class, and rebuild processes the worklist by recreating e-nodes with children and merging duplicates. The find operation retrieves the representative e-class identifier for a given or e-node by following the union-find parent pointers, applying path for amortized efficiency. extends this by rebuilding a from its representative e-class, selecting a e-node (often the first or simplest in some order) and recursively reconstructing children, simulating without full evaluation—for instance, applying an operator to child e-classes yields a new e-node in the result e-class. This allows querying equivalences and extracting terms while preserving invariants like . As an example, consider building an E-graph for arithmetic equalities such as x + y = y + x. First, add the ground terms x and y as singleton e-classes. Adding x + y creates an e-node with children pointing to the x and y e-classes, forming a new e-class. Similarly, adding y + x creates another e-node in a distinct e-class. To enforce commutativity, merge the two e-classes: their representatives are unioned, and propagation examines any parent e-nodes (none initially), but subsequent additions like (x + y) + z would trigger merging with (y + x) + z to maintain . The resulting structure shares the x and y subterms, with find returning the same representative for both addition forms.

Equivalent Formulations

E-graphs can be formulated as an augmented union-find data structure, where equivalence classes are managed via union-find components to track equal terms, and each component stores a list of e-nodes representing the expressions in that class. This approach extends the standard union-find to maintain congruence closure, allowing efficient merging of classes when new equalities are added. In the term DAG view, E-graphs are represented as directed acyclic graphs of terms, where shared subterms across nodes encode equivalences, generalizing conventional term DAGs that typically handle only common subexpression elimination without full equivalence classes. When each e-class contains a single e-node, the structure reduces to a standard term DAG, but in general, multiple e-nodes per class enable richer sharing of equivalent substructures. The relational formulation models E-graphs using binary relations to capture congruences between terms, treating e-nodes and e-classes as relational tables where matches are computed via joins, as in relational e-matching. This perspective draws on database techniques to optimize over the equivalence structure. Algebraically, an E-graph can be viewed as a quotient structure on the term modulo the generated by the stored equalities, where e-classes terms into equivalence sets under the induced . These formulations support distinct optimizations: the union-find approach enables fast in-place unions for dynamic , while the term DAG view promotes space efficiency through subterm sharing, reducing redundancy in large expression sets compared to naive tree representations. For example, consider an E-graph encoding the term (a \times 2) / 2 with the x \times 2 \equiv x \ll 1. In the term DAG view, this translates to a with a shared for the constant 2 and a for a, where the and operators point to these shared children, and the adds a left-shift also sharing the a and 2 nodes, visualizing equivalences without duplicating subterms.

Key Algorithms

E-matching

E-matching is the process of searching an e-graph for all substitutions that map the variables of a given to e-classes such that the instantiated pattern is equivalent to some represented in the e-graph. This operation is essential for identifying opportunities to apply rewrite rules during equality saturation, where patterns represent the left-hand sides of such rules. Traditional e-matching algorithms employ bottom-up matching augmented with discrimination trees to efficiently locate potential matches by indexing e-nodes grouped by their function symbols. These trees, often structured as code trees or tries, allow for rapid traversal and filtering of e-graph enodes that could bind to pattern subterms, while an inverted path index prunes irrelevant branches to avoid exhaustive search. To handle and multiple matches—arising from non-deterministic bindings or commutative operators—an with a is used, enabling the enumeration of all valid substitutions without redundant recomputation. A key challenge in e-matching is mitigating exponential blowup in time , as the problem is NP-hard in general due to the of pattern variables and e-graph depth. Optimizations such as caching intermediate match results in the discrimination trees and variable ordering during help bound the search space, ensuring practicality for patterns encountered in solvers and . More recent advancements introduce a relational of e-matching, which models the search as a Datalog-like conjunctive query over the e-graph's relational representation, reducing the problem to a fixed-point via generic joins. This approach achieves worst-case optimality, with bounded by O(|M| + N \log N) in many practical scenarios, where N is the e-graph size and |M| is the output size of matches, outperforming traditional methods by orders of magnitude on complex patterns. It naturally accommodates multiple patterns and through query unfolding and join ordering, leveraging database techniques for efficiency. For example, consider matching the pattern (add ?x ?y) in an e-graph containing e-classes for numbers and operations; e-matching would yield substitutions for all pairs of e-classes and where the e-graph entails [a + b], including commutative variants like (add ?y ?x) without duplicating bindings.

Extraction

In e-graphs, refers to the process of selecting a ground from an e-class that minimizes a user-defined , such as the size or depth of the (AST), or a domain-specific metric like the number of instructions in a backend. This step typically follows saturation, where the e-graph has been populated with equivalent expressions, and aims to produce an optimal or near-optimal representation for the final output. Cost functions are often local and recursive, computing the total cost of a based on the costs of its subterms plus a for the root ; for instance, in , the cost might reflect register pressure or code size. Common algorithms for leverage dynamic programming to traverse the e-graph bottom-up, annotating each e-class with the minimal cost and corresponding e-node selection achievable from its children. For , heuristics select the lowest-cost e-node at each step without , while A* search can incorporate heuristics like admissible estimates of remaining costs to guide exploration toward optimality. Exact solutions frequently reduce the problem to integer linear programming (ILP), encoding assignments for e-node selections subject to acyclicity and coverage constraints, though this scales poorly for large e-graphs. The problem is NP-hard in general, even when allowing common subexpression sharing and using simple additive costs, as it encompasses challenges like finding minimum-cost spanning trees in dense graphs. Polynomial-time algorithms exist for restricted cases, such as e-graphs with bounded —where the graph's "tree-likeness" limits structural complexity—or when costs are linear in the term structure, enabling fixed-parameter tractable solutions via dynamic programming on tree decompositions. Cost models in practice prioritize metrics that align with application goals; for example, AST size minimizes the number of nodes to reduce parse time, while instruction count in compilers estimates generated length, often incorporating recursive analyses to propagate semantic properties like data types or bounds. Handling recursive costs requires e-class analyses that merge information during , ensuring costs reflect shared substructures without recomputation. As an illustrative example, consider an e-class representing the arithmetic expression equivalent to a \times b + a \times c, which might contain variants like (a \times b) + (a \times c) and a \times (b + c). Using a cost model where each operator (+ or ×) incurs a cost of 1 and leaves cost 0, dynamic programming would evaluate sub-e-classes to select a \times (b + c) as the minimal term with cost 2, rather than the unfolded variant with cost 3. Recent advancements from 2023 to 2025 have focused on scalable heuristics for extraction in tools like and LLVM-integrated systems. In , the ILP extractor was enhanced with acyclic constraints to detect and penalize cycles during solving, improving performance on real-world optimizer benchmarks by up to 10×. LLVM's eqsat dialect for MLIR incorporates e-graph rewriting with heuristic extraction tailored to hardware mapping, while broader innovations include e-boost's parallelized adaptive heuristics that warm-start exact solvers for 558× speedups on tensor programs, and SmoothE's differentiable approach for GPU-accelerated optimization under complex, non-linear costs.

Equality Saturation

Equality saturation is a for that leverages e-graphs to exhaustively apply rewrite rules until a fixed point is reached, where no further rewrites are possible. Introduced as a solution to the phase-ordering problem in compilers, it represents expressions as sets of equivalent terms within an e-graph, allowing the exploration of multiple optimization paths without committing to a single rewrite sequence early on. This approach ensures that all possible equivalences derivable from the initial term and a set of rules are discovered, enabling the selection of an optimal form post-saturation. The workflow begins with an initial term inserted into an empty e-graph, forming the starting e-class. Rewrite rules, which are bidirectional (i.e., if A = B, then B = A), are then applied iteratively using e-matching to identify subterms that match rule patterns. Each successful match adds new e-nodes and merges equivalent e-classes, growing the e-graph while canonicalizing representations to prevent exponential term explosion. This process repeats—scanning all rules against the current e-graph—until saturation, where additional iterations yield no changes, at which point extraction selects the best representative from the e-graph. E-matching serves as the core engine for rule application in this loop. A key benefit of equality saturation is its nature, as it operates on abstract syntax trees or term representations without relying on domain-specific heuristics. It excels at discovering non-local rewrites that traditional linear-pass optimizers might miss, such as algebraic identities spanning multiple subexpressions. Prominent implementations include the library, released in 2021 and written in , which optimizes e-graph operations for speed and extensibility through techniques like analysis-driven e-matching and of cost information. Recent extensions from 2022 to 2025 have focused on scalability, such as guided equality saturation, which decomposes complex problems into sequences of smaller saturations to handle larger inputs efficiently. Consider a simple example of saturating the expression (a + b) \times c using algebraic rules. The initial e-graph contains the term as a single e-node with children + (of a and b) and \times (applied to the sum and c). Applying the rule x + y = y + x (commutativity) adds a symmetric variant, merging the children into the same e-class. Next, the rule (x + y) \times z = x \times z + y \times z matches the structure, introducing new e-nodes for a \times c and b \times c, which are merged under a e-class equivalent to the original. Further rules, like associativity x + (y + z) = (x + y) + z, may propagate if additional terms are present, but saturation halts once all derivable forms—such as a \times c + b \times c—are represented without redundancy. Extraction then costs these equivalents to pick the simplest, like the distributed form for . Despite its power, equality saturation faces challenges from rule explosion, where a large set of rules can cause the e-graph to grow uncontrollably in size and time. This is often mitigated by scheduling—prioritizing high-impact rewrites—or partial , which limits iterations or focuses on subgraphs.

Theoretical Aspects

The construction of an E-graph from a set of n equalities can be performed in O(n \log n) time using algorithms based on union-find structures with path compression and union by rank, as originally analyzed for congruence closure. Space usage is O(n) when employing hash-consing to represent terms compactly, avoiding redundant storage of equivalent subexpressions. E-matching, the process of finding subgraphs matching a given , has worst-case exponential in the size of the due to the need to explore all possible bindings in approaches. However, relational variants of e-matching, which model the search as a conjunctive query over a relational representation of the E-graph, achieve polynomial , such as O(|Q(I)|^{1/2} \cdot N^{m/2}) where N is the number of e-nodes, m is fixed for the (number of relations), and |Q(I)| is the output size; this provides worst-case optimality relative to the output. Extraction, which selects a minimal-cost representative term from an e-class, is NP-hard in general when costs are assigned to subexpressions and common subexpression sharing is allowed. For restricted cases, such as linear cost functions (where cost depends only on the root operator) or fixed-width patterns (bounded ), extraction can be solved in O(n) time via dynamic programming or fixed-point traversals. In practice, E-graph operations exhibit amortized efficiency due to the use of invariants like e-class analyses, which redundant computations during ; however, dense E-graphs with many overlapping equivalences risk exponential blowup in size and matching time. Theoretical bounds from 2008 to 2022 emphasize these worst-case behaviors, while recent 2024 analyses on show that for linear-matching rules, the overall time is O(N \cdot I \cdot R) where N is the number of e-nodes, I is the number of iterations, and R is the number of rules, improving to sub-quadratic bounds under disjointness parameters. For example, constructing an E-graph via equality with 1000 rewrites on a program of size N \approx 2500 (as in eggcc benchmarks) typically completes in near-linear time relative to N, leveraging these parameterized bounds with I=2 iterations and modest rule counts.

Limitations and Challenges

One major practical limitation of E-graphs is their , particularly the risk of memory explosion due to the proliferation of e-nodes during . As rewrite rules are applied, the e-graph can grow exponentially in size, making it impractical for very large expressions or complex programs. To mitigate this, techniques such as rebuilding—where invariant maintenance like is deferred and recomputed periodically—provide significant speedups, up to 87.85× for , while bounded or timeouts prevent unbounded growth. strategies, including e-class analyses that simplify or discard redundant nodes, further address memory usage in production settings. Rule engineering poses significant challenges in deploying E-graphs effectively, as developing complete and non-conflicting rewrite sets requires substantial domain expertise. Rules may be subtly incorrect, overlook profitable transformations, or necessitate revalidation when underlying semantics change, such as in numerical domains like where precision and rounding introduce non-trivial equivalences. Incomplete rulesets can lead to suboptimal saturation, exacerbating the phase-ordering problem where the sequence of applications affects outcomes, though equality saturation mitigates this by exploring all possibilities exhaustively. Debugging E-graphs is complicated by their opaque representations, which compactly encode vast sets of equivalents but obscure the reasoning behind transformations. Inspecting the structure to trace rule applications or identify inefficiencies often requires custom tools, such as interactive visualizers that render e-graphs as graphs with e-classes and e-nodes for easier navigation. Recent developments, including JavaScript-based visualizers using Cytoscape.js, have emerged around 2024 to facilitate this, enabling users to explore and debug large e-graphs more intuitively. E-class analyses also support debugging by verifying properties like constant values during saturation. Domain-specific issues arise when applying E-graphs beyond pure functional settings, particularly in handling side effects or imperative languages. Rewrites that alter side effects, such as mutable state updates, are harder to express and maintain soundly, as E-graphs assume contextual equivalence without inherent support for or . In non-functional languages with loops or conditionals, additional mechanisms like partial evaluation graphs are needed to extend , but these introduce overhead and limit generality. Open problems in E-graphs include efficient incremental updates to handle dynamic changes without full resaturation, which remains underdeveloped and workload-dependent. Recent research explores integrating for automated rule discovery, such as guided equality saturation using to prioritize rewrites and generate domain-specific rules, addressing the manual engineering bottleneck. A notable example of saturation challenges occurs with cyclic rules that introduce infinite loops, such as repeated applications of identities like x \times 1 \equiv x without proper merging, leading to non-terminating growth despite cycles compactly representing infinite expressions. In such cases, equality saturation may fail to converge, requiring timeouts or depth bounds to halt, as formalized in semantic models allowing infinite E-graphs for non-terminating scenarios.

Applications

Compiler and Program Optimization

E-graphs have emerged as a powerful tool in and , primarily through the application of equality saturation, which builds an e-graph by exhaustively applying to generate equivalent variants and then extracts the optimal one based on a cost model. This approach enables systematic of intermediate representations (IRs) for loops, arithmetic expressions, and structures, allowing compilers to explore a vast space of optimizations without manual ordering of passes. In contrast to traditional optimizers, e-graphs facilitate global reasoning over equivalences, leading to more comprehensive transformations. A prominent example is the library, implemented in since 2020, which integrates e-graphs into compiler pipelines for equality saturation-based optimization of IRs. has been adopted for building custom optimizer passes, enabling automatic optimizations and the discovery of novel instruction sequences that outperform hand-written rules. For instance, in LLVM-based compilers, the eqsat dialect for MLIR, introduced in 2025, embeds e-graphs directly into the IR to support non-destructive passes, improving optimization of and constructs in workflows. Similarly, Wasm-mutate, developed in 2022, leverages e-graphs to generate semantically equivalent variants of binaries, aiding optimization and of compiler outputs for better code diversity and performance. In technology mapping, recent work from in 2025 applies e-graphs to map functions to networks, particularly for FPGA LUT remapping, by saturating equivalences and extracting minimal-cost implementations that surpass traditional methods. This technique discovers efficient circuit layouts through exhaustive equivalence exploration, reducing area and delay in hardware synthesis. Benefits of e-graphs in this domain include automated across larger scopes and the of previously unknown sequences, as demonstrated in egg's applications where yield up to 2x speedups in numerical kernels by simplifying expressions like evaluations. Case studies highlight integrations with MLIR for dialect-specific , achieving 15-30% performance gains in tensor computations through targeted optimizations. In numerical code, e-graph-based tools have produced substantial speedups in symbolic-numeric simulations by fusing operations and eliminating redundancies. Recent extensions in 2025 focus on GPU kernel optimization, with differentiable e-graph extraction methods like SmoothE enabling gradient-based cost modeling for parallel code generation, resulting in optimized kernels that reduce execution time by leveraging hardware-specific equivalences.

Automated Theorem Proving and SMT Solvers

E-graphs play a central role in satisfiability modulo theories (SMT) solvers for handling congruence closure in the theory of equalities with uninterpreted functions (EUF). In this context, they efficiently represent and propagate equalities among terms, enabling the solver to maintain a compact structure of equivalent expressions. The Z3 SMT solver, introduced in 2008, utilizes E-graphs as its core data structure for congruence closure, where equalities from the SAT solver are propagated to infer additional congruences. Similarly, the CVC4 solver employs E-graph-based procedures to compute congruence closure, supporting decision procedures for EUF logics. This representation allows SMT solvers to manage uninterpreted functions by treating them as opaque symbols while enforcing structural congruence. Beyond basic EUF, E-graphs facilitate advanced equality reasoning in solvers, particularly during quantifier instantiation and lemma learning. Quantifier instantiation relies on E-matching techniques that operate over the E-graph to identify relevant triggers and generate ground instances, avoiding redundant computations. In Z3, this E-matching process dynamically modifies the E-graph during search, enabling new instantiations based on updated congruences. For lemma learning, E-graphs maintain equivalences that inform conflict-driven clause generation, ensuring that learned clauses respect the current set of equalities. This integration enhances the solver's ability to handle quantified formulas and theory-specific lemmas efficiently. Recent extensions of E-graphs in include diagrammatic approaches for visual equation manipulation, as explored by the Topos Institute in 2025. These methods model equations as e-diagrams—hypergraph-like structures where rewrites correspond to visual transformations—leveraging E-graphs to prove equivalences in categorical settings. The benefits of E-graphs in these domains include scalable handling of large term sets through canonicalization and union-find structures, alongside tight integration with SAT solvers for hybrid decision procedures. In case studies, E-graphs serve as backends for equivalence oracles in tools like , supporting verification by generating proofs of semantic equivalence via rewrite sequences. solvers with E-graph cores, such as Z3, are routinely integrated into interactive theorem provers like and Isabelle for automated of logical properties. Emerging 2024-2025 research advances hybrid symbolic-numeric solving with E-graphs, combining discrete with . For instance, differentiable E-graph extraction enables gradient-based refinement of equivalences, aiding tasks like expression optimization in scientific computing. These developments extend E-graphs' utility in theorem proving by bridging symbolic reasoning with numerical methods, improving scalability for complex satisfiability problems.

Other Domains

E-graphs have been applied to by leveraging saturated e-graphs to generate equivalent code snippets that satisfy specifications, enabling efficient search over large program spaces. Tools like egglog, introduced around , extend equality saturation to support tasks by combining e-graph with datalog-style querying for relational reasoning over programs. For instance, egglog facilitates the construction of synthesizers that enumerate functionally equivalent implementations, improving scalability for tasks such as . In and static analysis, e-graphs represent sets of equivalent values to approximate program behaviors, aiding in bug detection by capturing relational properties that traditional lattices might miss. A 2022 framework integrates e-graphs with abstract interpretation domains, where e-classes model abstract values and rewrites propagate equivalences across , enhancing precision in tools like those for pointer analysis or taint tracking. This combination allows for bidirectional refinement: abstract domains inform e-graph saturation, while e-graphs provide over-approximations of equality for scalable analysis. For and testing, e-graphs enable equivalence-based mutation to generate semantically identical variants, supporting testing and validation. The Wasm-mutate tool, developed in 2022, constructs e-graphs from binaries via to produce diverse yet equivalent mutations, which have uncovered bugs in multiple Wasm s by amplifying subtle differences in optimization paths. This approach ensures mutations preserve behavior while exploring edge cases, outperforming random mutation in coverage for resource-constrained environments. Beyond these, e-graphs contribute to machine learning model optimization by extracting optimal subgraphs under complex, differentiable cost models. In 2025, SmoothE introduced a probabilistic algorithm that optimizes e-graphs on GPUs for ML compute graphs, reducing latency in frameworks like by rewriting tensor operations while respecting hardware constraints. Similarly, equality saturation verifies semantic in large-scale models, enabling safe parallelism and fusion optimizations in distributed training pipelines. In database systems, e-graphs optimization through rewrite rule application; for example, the library has been used to build SQL optimizers that enumerate equivalent plans via saturation, improving join ordering and subquery elimination in engines like those based on Cascades frameworks. Recent work applies learned rewriting on e-graphs to generate optimal SQL execution plans, reducing query latency by up to 50% on TPC-H benchmarks. Case studies illustrate these applications in practice. The library, originating from 2020 research, has been adapted to Haskell via the hegg package (released in 2022 and updated through 2025), enabling equality saturation for functional and optimization in a pure functional setting without external dependencies. In scientific computing, e-graphs support algebraic simplification of symbolic expressions as of 2025, such as in systems for optimizing numerical simulations by equating variant forms of differential equations, though adoption remains experimental. Looking ahead, e-graphs show promise for integration with simulations, where equivalence reasoning could optimize circuit representations in graph-based models, though current efforts focus on preliminary extensions like hypergraphs for monoidal categories underlying quantum protocols.

References

  1. [1]
    None
    Summary of each segment:
  2. [2]
    [PDF] Equality Saturation: a New Approach to Optimization ∗ - CSE Home
    Jan 18, 2009 · To this end, an E-PEG is a graph that groups together PEG nodes that are equal into equivalence classes. As an example, Figure 2(b) shows the E- ...
  3. [3]
    Fast Decision Procedures Based on Congruence Closure
    A simple proof is given that the congruence closure algorithm provides a decision procedure for the quantifier-free theory of equality.
  4. [4]
    [2004.03082] egg: Fast and Extensible Equality Saturation - arXiv
    Apr 7, 2020 · This work contributes two techniques that make e-graphs fast and extensible, specializing them to equality saturation.Missing: original | Show results with:original
  5. [5]
    [PDF] Efficient E-matching for SMT Solvers - Leonardo de Moura
    Abstract. Satisfiability Modulo Theories (SMT) solvers have proven highly scalable, efficient and suitable for integrating theory reasoning.
  6. [6]
    Z3: an efficient SMT solver - Microsoft Research
    Mar 28, 2008 · Z3 is a new and efficient SMT Solver freely available from Microsoft Research. It is used in various software verification and analysis applications.
  7. [7]
    (PDF) Z3: an efficient SMT solver - ResearchGate
    Aug 7, 2025 · Z3 is a new and efficient SMT Solver freely available from Microsoft Research. It is used in various software verification and analysis applications.
  8. [8]
    egg: Fast and extensible equality saturation - ACM Digital Library
    This work contributes two techniques that make e-graphs fast and extensible, specializing them to equality saturation.Missing: original | Show results with:original
  9. [9]
    Acyclic E-graphs for Efficient Optimization in a Production Compiler ...
    We decided in 2022 to rearchitect Cranelift's mid-end to use an e-graph representation. This solved a pre-existing pass-ordering problem, and provided us an ...
  10. [10]
    Relational e-matching | Proceedings of the ACM on Programming ...
    Relational e-matching is a new approach using relational join and database query techniques, searching the e-graph with an optimized query plan.
  11. [11]
    Wasm-mutate: Fuzzing WebAssembly Compilers with E-Graphs ...
    Wasm-mutate represents the search space for new programs as an e-graph and exploits the property that any traversal through the e-graph represents a ...
  12. [12]
    How to prove equations using diagrams, part 1 - Topos Institute
    May 27, 2025 · An e-graph, short for “equality graph,” is a data structure that maintains a congruence relation on expression trees: an equivalence relation ...Missing: science | Show results with:science
  13. [13]
    CS 6120: Technology Mapping with Egraphs
    May 13, 2025 · E-graphs are directed graphs with edges from expressions (e-nodes in an e-graph), to a group of e-nodes (an e-class in an e-graph, where all the ...
  14. [14]
    egg::tutorials::_01_background - Rust - Docs.rs
    An e-graph is a data structure that maintains equivalence over expressions, and it is used to store similar expressions compactly.
  15. [15]
    [1012.1802] Equality Saturation: A New Approach to Optimization
    Dec 2, 2010 · Title:Equality Saturation: A New Approach to Optimization. Authors:Ross Tate (University of California, San Diego), Michael Stepp (University ...
  16. [16]
    [2108.02290] Relational E-Matching - arXiv
    Aug 4, 2021 · Relational e-matching is a new approach using relational join and database query techniques, searching the e-graph with an optimized query plan.
  17. [17]
    [PDF] Relational E-matching - Yihong Zhang
    We present a new approach to e-matching based on relational join; in particular, we apply recent database query execution techniques to guarantee worst-case ...
  18. [18]
    [PDF] egg: Fast and Extensible Equality Saturation - Chandrakana Nandi
    1 Definitions. Intuitively, an e-graph is a set of equivalence classes (e-classes). Each e-class is a set of e- ...
  19. [19]
  20. [20]
    [PDF] Improving Term Extraction with Acyclic Constraints - cs.Princeton
    In this paper, inspired by these prior works, we propose breaking cycles by identifying the cycles. The cycles detected in a given e-graph will be encoded as ...<|control11|><|separator|>
  21. [21]
    Boosted E-Graph Extraction with Adaptive Heuristics and Exact ...
    Aug 18, 2025 · e-boost is a framework for e-graph extraction using parallelized heuristics, adaptive pruning, and initialized exact solving, achieving 558x ...
  22. [22]
    Fast and Optimal Extraction for Sparse Equality Graphs
    Oct 8, 2024 · Extraction is a challenging problem and is notably known to be NP-hard ... In fact, in this paper, we show that e-graph extraction is hard to ...
  23. [23]
    SmoothE: Differentiable E-Graph Extraction - ACM Digital Library
    In this work, we propose SmoothE, a differentiable e-graph extraction algorithm designed to handle complex cost models and optimized for GPU acceleration.
  24. [24]
    eqsat: An Equality Saturation Dialect for Non-destructive Rewriting
    May 14, 2025 · We take LLVM's MLIR framework and propose a new MLIR dialect named eqsat that represents e-graphs in MLIR code. This not only provides ...<|control11|><|separator|>
  25. [25]
    Equality saturation: a new approach to optimization
    Equality saturation: a new approach to optimization. Authors: Ross Tate.
  26. [26]
    Guided Equality Saturation | Proceedings of the ACM on ...
    Jan 5, 2024 · An e-graph efficiently represents a large set of equivalent terms and is grown by repeatedly applying rewrite rules in a purely additive way.<|control11|><|separator|>
  27. [27]
  28. [28]
    The E-graph extraction problem is NP-complete - Yihong Zhang
    Jun 22, 2023 · The extraction problem is in NP 1 because it can be reduced to integer linear programming (ILP). Moreover, we show the extraction problem is NP-hard by ...
  29. [29]
    Parameterized Complexity of Running an E-Graph - UW PLSE
    Jun 16, 2025 · The key insight of this blog post is that using linear-matching rules results in a much better complexity bound compared to the super- ...
  30. [30]
  31. [31]
    [PDF] Rewrite Rule Inference Using Equality Saturation
    E-graphs can compactly represent the exponentially large sets of enumerated terms and potential rewrite rules. We show that equality saturation efficiently ...Missing: engineering | Show results with:engineering
  32. [32]
    [PDF] Practical and Flexible Equality Saturation - Max Willsey
    The first equality saturation paper used an intermediate representation called Program Expression Graphs (PEGs) to encode loops and conditionals. PEGs have ...
  33. [33]
    @saulshanabrook/egraph-visualizer - NPM
    Sep 19, 2024 · Interactive visualizer for e-graphs in the serialized JSON format using Cytoscape JS and Eclipse Layout Kernel.
  34. [34]
    [PDF] for Efficient Optimization in a Production Compiler - Chris Fallin
    - harder to express rewrites that alter side-effects. - need special ... E-graphs… in Industry? • Isn't this bona-fide research? Am I not a software ...
  35. [35]
    [PDF] Machine Learning Guided Equality Saturation - Michel Steuwer
    Equality saturation overcomes downsides of naive greedy rewrite systems. Still, e-graphs face scaling issues when deal- ing with real-world problems where just ...
  36. [36]
    [PDF] Semantic foundations of equality saturation - arXiv
    Jan 5, 2025 · By explicitly allowing infinite E-graphs we can define a formal semantics even when EqSat does not terminate. We show that concepts in tree ...
  37. [37]
    [PDF] equality saturation: a new approach to optimization - Ross Tate
    EQUALITY SATURATION: A NEW APPROACH TO OPTIMIZATION. 3. A. Thus, there is no ... Our E-PEGs are in essence specialized E-graphs for reasoning about PEGs.
  38. [38]
    Egg: E-Graphs Good
    The egg project uses e-graphs to provide a new way to build program optimizers and synthesizers. An e-graph compactly represents many equivalent programs.
  39. [39]
    Fast and Effective Binary Diversification for WebAssembly - arXiv
    Sep 14, 2023 · Our results highlight that WASM-MUTATE can produce tens of thousands of unique and efficient WebAssembly variants within minutes.
  40. [40]
    [PDF] EqMap: FPGA LUT Remapping using E-Graphs
    In this paper, we introduce a way to augment FPGA technology mapping with e-graph data structures in order to find more exact solutions, without significantly ...Missing: CS 6120
  41. [41]
    [PDF] Automated Code Optimization with E-Graphs - arXiv
    Dec 30, 2021 · An e-graph cost function must return a positive, non-complex number value and must accept 3 arguments: 1. The ENode n that is being inspected. 2 ...
  42. [42]
    [PDF] SmoothE: Differentiable E-Graph Extraction
    Mar 30, 2025 · Abstract. E-graphs have gained increasing popularity in compiler op- timization, program synthesis, and theorem proving tasks.Missing: seminal | Show results with:seminal
  43. [43]
    [PDF] Z3: An Efficient SMT Solver
    Equalities asserted by the SAT solver are propagated by the congruence closure core using a data structure that we will call an E-graph following [8]. Nodes in ...
  44. [44]
  45. [45]
    How to prove equations using diagrams, part 2 - Topos Institute
    Jun 10, 2025 · As a proof of concept, my colleague Kris Brown has implemented e-graphs using hypergraph rewriting (Brown 2025). Rewrites of an e-diagram J ...
  46. [46]
    [PDF] Small Proofs from Congruence Closure - arXiv
    Sep 7, 2022 · We compare our greedy algorithm against the state-of-the-art SMT solver Z3, which performs proof reduction (see Section II) to find smaller ...<|separator|>
  47. [47]
    Programming Z3 - Stanford CS Theory
    Let $\mathcal{T}$ be a set of terms and $\mathcal{E}$ set of equalities over $\mathcal{T}$ . A congruence closure over $\mathcal{T}$ modulo $\mathcal{E}$ ...
  48. [48]
    [PDF] SmoothE: Di!erentiable E-Graph Extraction
    E-graph is an extended union-!nd [43] data structure com- pactly representing many equivalent terms. Originally devel- oped for automated theorem provers (ATPs) ...
  49. [49]
    egglog - crates.io: Rust Package Registry
    Oct 31, 2023 · egglog is a language that combines the benefits of equality saturation and datalog. It can be used for analysis, optimization, and synthesis of programs.
  50. [50]
    egglog - Rust - Egg: E-Graphs Good
    egglog is a language specialized for writing equality saturation applications. It is the successor to the rust library egg. egglog is faster and more general ...
  51. [51]
    [2205.14989] Combining E-Graphs with Abstract Interpretation - arXiv
    May 30, 2022 · Abstract interpretation is a general method to learn such properties, traditionally used in static analysis tools. We demonstrate that abstract ...
  52. [52]
    Combining E-Graphs with Abstract Interpretation - ACM Digital Library
    Abstract interpretation is a general method to learn such properties, traditionally used in static analysis tools. We demonstrate that abstract interpretation ...
  53. [53]
    [PDF] Wasm-Mutate: Fast and Effective Binary Diversification for ... - arXiv
    WASM-MUTATE is a tool that quickly generates diverse WebAssembly variants, preserving functionality and using an e-graph data structure.
  54. [54]
    [PDF] Verifying Semantic Equivalence of Large Models with Equality ...
    Apr 3, 2025 · In this work, we propose AERIFY, a framework to auto- matically expose silent errors by verifying semantic equiva- lence of models with equality ...
  55. [55]
    Building an SQL Optimizer with Egg (EGRAPHS 2023 - E-Graph ...
    In this talk, we will share our experiences from a user's perspective on how we utilized egg to construct the database optimizer, our approaches to solving ...Missing: rustc | Show results with:rustc<|control11|><|separator|>
  56. [56]
  57. [57]
    [ANN] E-graphs and equality saturation: hegg 0.1 - Haskell Discourse
    Aug 25, 2022 · It is a pure haskell implementation found on two recent awesome papers: egg: Fast and Extensible Equality Saturation (2020) and Relational E- ...Missing: original | Show results with:original
  58. [58]
    E-graphs | Hey There Buddo! - Philip Zucker
    Nov 2, 2025 · E-graphs are a data structure that efficiently holds terms and equalities between terms. It is useful for algebraic simplification, program transformations and ...
  59. [59]
    [EGRAPHS24] Equivalence Hypergraphs: E-Graphs for Monoidal ...
    Jul 23, 2024 · Michael Beverland: A Modular Quantum Computer Based on Bivariate Bicycle Codes. IQuS - The InQubator For Quantum Simulation New 37 views · 22:08.