
Common subexpression elimination

Common subexpression elimination (CSE) is a compiler optimization technique that identifies repeated computations of identical subexpressions within a program and replaces subsequent occurrences with references to the initial result, thereby eliminating redundancy and improving execution efficiency. The technique dates back to the early days of optimizing compilers, with roots in the Fortran I compiler developed in the 1950s, and saw significant advancements in the 1960s through IBM's Fortran H compiler and academic research on global data-flow analysis methods. Pioneering work by John Cocke in 1970 introduced global common subexpression elimination using value numbering to detect redundancies across the entire program flow graph, enabling optimizations beyond local scopes. Frances Allen's 1971 catalogue of optimizing transformations further formalized CSE as a key method for reducing executed instructions and code size, often in conjunction with code motion.

CSE operates by analyzing the program's control flow and data dependencies, typically in intermediate representations like static single assignment (SSA) form, to ensure that a prior computation dominates all uses and remains valid without intervening modifications. It encompasses variants such as local CSE within a single basic block, global CSE across multiple blocks using dominance relations, spatial CSE for redundancies within the same loop iteration, and temporal CSE for reusing values across loop iterations. Challenges include handling memory aliasing, side effects like exceptions, and ensuring no invalid reuse due to variable modifications, often addressed through alias analysis and data-flow algorithms.

In modern compilers, CSE remains a foundational optimization, contributing substantially to performance gains (such as up to 80% speed improvements in historical systems like Fortran H) and integrating with other techniques, such as constant folding and code motion, to generate efficient machine code. Its application extends beyond general-purpose programming to domain-specific areas, including digital signal processing, where it minimizes operations in constant multiplication decompositions.

Fundamentals

Definition

Common subexpression elimination (CSE) is a compiler optimization technique that searches for instances of identical or similar expressions within a program and replaces subsequent computations of those expressions with references to the result of the first computation, thereby eliminating redundant calculations. The core goal of CSE is to minimize execution time and reduce code size by avoiding the recomputation of identical subexpressions, such as arithmetic operations (e.g., additions or multiplications) or function calls that yield the same value under identical inputs. This optimization targets redundancies that can be detected at compile time, ensuring that the benefits are realized without introducing errors from side effects or aliasing. In terms of scope, CSE applies to both scalar expressions, like basic arithmetic or logical operations on single values, and aggregate expressions, including memory loads or operations on structures, provided the redundancies are statically identifiable and the expressions produce the same value regardless of execution path. A subexpression is defined as any reusable component within a larger expression, such as a sub-tree in an abstract syntax tree, that computes the same value multiple times and can be safely hoisted or referenced to avoid repetition.

Historical Development

Common subexpression elimination (CSE) emerged in the late 1950s as part of early efforts to optimize compiled code, with initial implementations appearing in the Fortran I compiler developed at IBM. This system incorporated basic forms of CSE alongside techniques like constant folding and fast index computations to reduce redundant operations in generated machine code. Pioneering work in the field was advanced by Soviet researcher Andrei Ershov, who in 1957 introduced a fast CSE algorithm using hashing for the BESM programming problem solver, enabling efficient reuse of computed values through backward analysis.

The 1960s and 1970s marked a foundational period for CSE through theoretical advancements in program analysis. Frances E. Allen's 1969 paper on program optimization formalized key techniques, including CSE within basic blocks and invariant code motion, laying the groundwork for systematic optimizations. In 1970, John Cocke presented "Global Common Subexpression Elimination" at the Symposium on Compiler Optimization, extending CSE across entire programs by leveraging data-flow analysis to identify and eliminate redundant expressions beyond local scopes. Allen and Cocke further collaborated on a 1971 catalogue of optimizing transformations, which detailed CSE as a core method for eliminating computations that recompute available values, influencing subsequent compiler design.

By the 1980s, CSE had become integrated into production compilers, with the GNU Compiler Collection (GCC), first released in 1987, incorporating local and global variants as standard optimization passes to enhance code efficiency. The 1990s brought significant evolution through the introduction of static single assignment (SSA) form in 1991, which facilitated more precise global CSE via value numbering techniques that track expression equivalences across program flows. This progression from rudimentary optimizations in early systems to sophisticated interprocedural analysis in modern frameworks like LLVM reflects CSE's enduring role in compiler technology.

Core Principles

Local Common Subexpression Elimination

Local common subexpression elimination (CSE) is a compiler optimization technique that identifies and removes redundant computations within a single basic block, which consists of straight-line code without branches or loops. This approach focuses on intra-block redundancy by detecting identical expressions and replacing subsequent occurrences with references to the initial computation, thereby reducing code size and execution time. It employs simple mechanisms such as value numbering or hashing to assign equivalence classes to expressions based on their operands and operators.

The detection method involves a linear traversal of the basic block from top to bottom, maintaining a hash table that maps computed expressions to their results or value numbers. For each expression encountered, such as an assignment x ← y op z, the algorithm hashes the combination of the operator op and the value numbers of operands y and z. If the hash matches an existing entry and the stored value remains available (i.e., not overwritten by side effects), the current expression is replaced with a reference to the stored value; otherwise, a new entry is created. This process ensures that equivalent expressions share the same value number, enabling efficient redundancy elimination without complex data-flow analysis. A typical algorithm for local value numbering can be sketched as follows, processing instructions sequentially in a basic block:
Initialize an empty hash table H mapping expression signatures to value numbers
Initialize a counter for new value numbers starting at 1
For each instruction in the basic block (top to bottom):
    If the instruction is x ← y op z:
        Compute signature = hash(VN[y], op, VN[z])  // VN is value number of operand
        If signature exists in H and value is available:
            Assign VN[x] = H[signature]
            Replace instruction with x ← temp (where temp holds the stored value)
        Else:
            Assign VN[x] = new value number (increment counter)
            Store H[signature] = VN[x]
    Else if the instruction is x ← constant or x ← input:
        Assign VN[x] = new value number
This illustrates how hashing facilitates quick lookups and replacements, with value numbers (VN) tracking equivalence. Consider a basic block with the instructions a ← b + c followed by d ← b + c. Upon processing the first instruction, the expression b + c receives a new value number, say 5, and is stored in the hash table. For the second instruction, the identical expression hashes to the same entry, so d is assigned value number 5 and replaced with d ← a, eliminating the redundant computation. Despite its efficiency, local CSE has limitations: it cannot detect redundancies across basic block boundaries or handle expressions interrupted by function calls, side effects like variable redefinitions, or memory aliasing, any of which may invalidate stored values. In contrast, global CSE extends this analysis across the control flow graph of an entire function.
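The following minimal C sketch instantiates this local value-numbering scheme for a straight-line block. The Entry structure, the linear-scan lookup standing in for a real hash table, and the driver in main are illustrative assumptions rather than any particular compiler's implementation; running it shows the second b + c receiving the value number already assigned to the first.

/* Minimal sketch of local value numbering for one basic block
   (hypothetical, for illustration only). */
#include <stdio.h>

#define MAX_ENTRIES 256

typedef struct {
    char op;        /* operator, e.g. '+' or '*' */
    int vn_left;    /* value number of the left operand */
    int vn_right;   /* value number of the right operand */
    int vn_result;  /* value number assigned to the expression */
} Entry;

static Entry table[MAX_ENTRIES];
static int num_entries = 0;
static int next_vn = 1;

/* Return the value number recorded for (op, l, r), or -1 if unseen.
   A linear scan keeps the sketch short; a real pass would hash. */
static int lookup(char op, int l, int r) {
    for (int i = 0; i < num_entries; i++)
        if (table[i].op == op && table[i].vn_left == l && table[i].vn_right == r)
            return table[i].vn_result;
    return -1;
}

/* Process "dst = a op b": reuse a prior value number when the same
   expression was already computed, otherwise record a fresh one. */
static int value_number(char op, int vn_a, int vn_b, int *redundant) {
    int vn = lookup(op, vn_a, vn_b);
    if (vn != -1) { *redundant = 1; return vn; }
    *redundant = 0;
    table[num_entries++] = (Entry){op, vn_a, vn_b, next_vn};
    return next_vn++;
}

int main(void) {
    int red;
    int vn_b = next_vn++, vn_c = next_vn++;          /* b and c get fresh VNs */
    int vn_a = value_number('+', vn_b, vn_c, &red);  /* a = b + c */
    printf("a = b + c -> VN %d, redundant: %d\n", vn_a, red);
    int vn_d = value_number('+', vn_b, vn_c, &red);  /* d = b + c */
    printf("d = b + c -> VN %d, redundant: %d\n", vn_d, red);
    return 0;
}

A production pass would also invalidate table entries whenever an operand is redefined, as the pseudocode above notes, and would use a genuine hash table for constant-time lookups.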

Global Common Subexpression Elimination

Global common subexpression elimination extends the scope of the optimization to program regions encompassing multiple basic blocks, including those with branches and loops, by leveraging data-flow analysis to identify redundant expressions across block boundaries. Unlike approaches confined to single basic blocks, global CSE uses reaching definitions to track which definitions propagate to a program point and available expressions to determine which expressions are computed on all incoming paths without redefinition of their operands. This enables the safe reuse of computations that would otherwise be repeated due to divergent control flow.

The detection method begins with constructing a control flow graph (CFG) that models the program's execution paths as nodes (basic blocks) and edges (control transfers). On this graph, the available expressions lattice is computed through forward data-flow analysis, where each block maintains a set of expressions that are guaranteed to be available upon entry. Redundancies are then eliminated by replacing recomputations with references to prior evaluations whenever an expression is available at the point of use, provided no intervening modifications occur. This process iteratively propagates information across the CFG to converge on fixed-point solutions for availability.

Key concepts in global CSE include the use of dominators and post-dominators to ensure safe expression reuse. A node d dominates a point p if every path from the entry to p passes through d, allowing computations in d to be safely reused at p; conversely, post-dominators ensure that reuse sites reach all necessary exit paths without omission. Handling side effects is critical, as expressions involving memory operations may alter program state; alias analysis is employed to conservatively determine whether two references might point to the same location, preventing incorrect eliminations that could change program semantics. Building upon local CSE as a foundational step within blocks, global CSE integrates these elements to address inter-block redundancies.

The available expressions set at a program point p is defined as the set of expressions computed on all paths to p, with no subsequent redefinition of their operands. Writing A_{\mathrm{in}}(p) for the set available on entry to block p and A_{\mathrm{out}}(q) for the set available on exit from block q, the equations for this forward must-analysis are: A_{\mathrm{in}}(p) = \bigcap_{q \in \mathrm{pred}(p)} A_{\mathrm{out}}(q), \qquad A_{\mathrm{out}}(q) = G(q) \cup \big( A_{\mathrm{in}}(q) \setminus K(q) \big) where G(q) is the set of expressions generated within block q, K(q) is the set of expressions killed in q by redefinition of one of their operands, and \mathrm{pred}(p) denotes the predecessors of p. The intersection-based meet operator ensures that only universally available expressions are retained, and the equations are solved iteratively until the data-flow framework stabilizes, typically using worklist algorithms on the CFG.
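These equations can be solved with a compact bit-vector iteration. The C sketch below is a toy illustration under invented assumptions: a four-block diamond CFG, a universe of two expressions (bit 0 for a + b, bit 1 for a * c), and hard-coded gen/kill sets. Each set is one machine word, the meet over predecessors is a bitwise AND, and the loop repeats until no in/out set changes.

/* Toy iterative available-expressions solver over a diamond CFG
   (hypothetical sets and shape, for illustration only). */
#include <stdio.h>
#include <stdint.h>

#define NBLOCKS 4

typedef uint32_t ExprSet;   /* one bit per expression in the universe */

/* Diamond CFG: block 0 -> {1,2}, blocks 1 and 2 -> {3}; -1 = no pred. */
static const int preds[NBLOCKS][2] = {{-1,-1}, {0,-1}, {0,-1}, {1,2}};

int main(void) {
    ExprSet gen[NBLOCKS]  = {0x1, 0x2, 0x2, 0x0};  /* expressions computed */
    ExprSet kill[NBLOCKS] = {0x0, 0x0, 0x1, 0x0};  /* operands redefined   */
    ExprSet in[NBLOCKS], out[NBLOCKS];

    /* Entry starts empty; all other sets start "full" so the intersection
       meet can only shrink them toward the greatest fixed point. */
    for (int b = 1; b < NBLOCKS; b++) { in[b] = ~0u; out[b] = ~0u; }
    in[0] = 0; out[0] = gen[0];

    int changed = 1;
    while (changed) {
        changed = 0;
        for (int b = 1; b < NBLOCKS; b++) {
            ExprSet new_in = ~0u;
            for (int i = 0; i < 2; i++)           /* meet over predecessors */
                if (preds[b][i] >= 0) new_in &= out[preds[b][i]];
            ExprSet new_out = gen[b] | (new_in & ~kill[b]);
            if (new_in != in[b] || new_out != out[b]) {
                in[b] = new_in; out[b] = new_out; changed = 1;
            }
        }
    }

    for (int b = 0; b < NBLOCKS; b++)
        printf("block %d: in=%#x out=%#x\n", b, (unsigned)in[b], (unsigned)out[b]);
    return 0;
}

At the join block 3 the solver reports only bit 1 available: a * c is generated in both branches, while a + b is killed by a redefinition in block 2, so only the former may be reused there without recomputation.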

Illustrative Examples

Basic Arithmetic Example

Consider a simple arithmetic sequence in a program where the same subexpression is computed multiple times without any intervening modifications to its operands. For instance, the following unoptimized code computes the addition of variables a and b twice:
int x = a + b;
int y = a + b;
int z = x * 2;
This assumes that the variables a and b are not modified between the two uses, for example by function calls or through pointers, ensuring the subexpression's value remains consistent. Common subexpression elimination identifies the redundant computation of a + b in the assignment to y and replaces it with a reference to the previously computed value in x. The optimized code becomes:
int x = a + b;
int y = x;
int z = x * 2;
This transformation reuses the value from x, eliminating the second addition operation. The step-by-step process involves: (1) scanning the code to detect identical subexpressions like a + b; (2) verifying that the subexpression's value has not changed since its first computation; and (3) substituting the reference to the stored value, which reduces the number of arithmetic instructions from three (two additions and one multiplication) to two (one addition and one multiplication). This demonstrates the core principle of value reuse in local common subexpression elimination by avoiding recomputation in straight-line code.

Control Flow Example

Consider a simple control flow structure where the same expression is computed in both branches of a conditional statement. Although only one branch is executed at runtime, the source code contains duplicate instances of the expression, which global CSE can eliminate by hoisting the computation to before the conditional. The following code illustrates this pattern:
if (cond) {
    t1 = a + b;
    // other statements using t1, e.g., result1 = t1 * 2;
} else {
    t2 = a + b;
    // other statements using t2, e.g., result2 = t2 * 3;
}
In this unoptimized form, the expression a + b appears redundantly in both branches, leading to duplicated code that increases program size, assuming no side effects in the conditional test or branches that modify a or b. To apply global common subexpression elimination (CSE), the compiler constructs a control flow graph (CFG) for the function, where nodes represent basic blocks and edges denote possible control transfers. Available expressions analysis is then performed on the CFG to identify expressions like a + b that are computed on all paths to their uses and remain unchanged thereafter. In this case, since the expression is identical in both branches and the uses are local to each, the compiler can hoist the computation to a point before the branch, replacing the duplicate evaluations with a single shared temporary. The optimized code becomes:
t = a + b;
if (cond) {
    // other statements, using t, e.g., result1 = t * 2;
} else {
    // other statements, using t, e.g., result2 = t * 3;
}
This transformation requires verifying that the expression a + b has no intervening definitions or side effects (e.g., no modifications to a or b within the branches that could alter the value), as such issues would invalidate the availability. If side effects are present or the expression is redundant only on some paths, the optimization may need to be limited or deferred to more advanced techniques like partial redundancy elimination. The outcome is a reduction in code size by eliminating the duplicate computation, improving instruction cache efficiency and potentially simplifying further optimizations, while preserving program semantics and ensuring the expression is evaluated exactly once per execution path. This demonstrates how global CSE, building on available expressions detection, resolves redundancies across divergent control paths in the source code.

Implementation Techniques

Algorithmic Approaches

Classical algorithms for common subexpression elimination typically operate on expression trees by performing a bottom-up traversal to label subtrees and detect identical structures, enabling the construction of a directed acyclic graph (DAG) in which common subexpressions are shared. This approach, as outlined in the Sethi-Ullman method for generating optimal code from arithmetic expressions, assigns labels to each node based on the operator and the labels of its children, allowing the compiler to recognize and reuse identical subtrees during code generation. By representing the expression as a DAG rather than a tree, redundant computations are inherently eliminated at the construction stage.

To extend this to control flow graphs for global analysis, expressions are hashed to unique identifiers during a traversal of the basic blocks, facilitating the substitution of redundant computations across the program. Hashing typically combines the operator with the value numbers (unique IDs) of the operands, stored in a table that maps hashes to temporary variables holding the computed values. If a matching hash is found, the current expression is replaced by a reference to the stored value, ensuring consistency provided the operands have not been modified.

Modern variants leverage data-flow analysis frameworks, particularly available expressions analysis, to handle global common subexpressions more robustly across paths. This forward data-flow problem iteratively solves equations to determine, at each program point, the set of expressions computed on all incoming paths without intervening modifications to their operands. Backward propagation may be incorporated in extensions, such as partial redundancy elimination, to refine availability by considering conditional anticipability of expressions. For efficiency in implementations with a large number of potential expressions, bit vectors represent sets of available expressions, allowing fast union and intersection operations during iterative fixed-point computation over the control flow graph.

The time complexity of local common subexpression elimination is linear, O(n), in the size of the basic block, achieved through a single pass with constant-time hash table lookups. Global variants exhibit worst-case O(n²) complexity due to the need for multiple iterations over the edges of the control flow graph until convergence in data-flow solving. A typical algorithm for expression hashing and substitution in a basic block can be outlined as follows:
Initialize an empty hash table mapping expression hashes to temporary variables
Initialize a value number mapping for variables (initially each has a unique ID)

For each statement in the basic block:
    If the statement assigns a variable v to an expression e = op(left, right):
        Compute hash = hash_function(op, value_number(left), value_number(right))
        If hash exists in the table:
            Replace the statement with v = temp (where temp is table[hash])
        Else:
            Generate a new temporary temp for the value of e
            Add hash -> temp to the table
            Update value_number(v) = new value number for temp
    Else if the statement is an assignment v = source:
        Update value_number(v) = value_number(source)
    Update value numbers for any modified variables
This performs local elimination; global extension requires propagating value numbers across blocks using dominance or availability information.
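The DAG-sharing approach described at the start of this section can likewise be made concrete with a small hash-consing sketch in C. The Node layout and the linear-scan pool below are illustrative assumptions (a real compiler would hash the triples): constructing a node for an (op, left, right) combination first checks whether an identical node already exists, so the second occurrence of a + b collapses onto the first.

/* Sketch of hash-consed expression DAG construction (hypothetical layout). */
#include <stdio.h>

#define MAX_NODES 128

typedef struct Node {
    char op;                   /* '+', '*', or 0 for a leaf variable */
    char leaf;                 /* variable name when op == 0 */
    const struct Node *l, *r;  /* children; NULL for leaves */
} Node;

static Node pool[MAX_NODES];
static int npool = 0;

/* Return an existing identical node when present, else allocate one.
   Pointer comparison of children suffices because children are
   themselves canonicalized by this same function. */
static const Node *mk(char op, char leaf, const Node *l, const Node *r) {
    for (int i = 0; i < npool; i++)
        if (pool[i].op == op && pool[i].leaf == leaf &&
            pool[i].l == l && pool[i].r == r)
            return &pool[i];           /* shared common subexpression */
    pool[npool] = (Node){op, leaf, l, r};
    return &pool[npool++];
}

int main(void) {
    const Node *a = mk(0, 'a', NULL, NULL);
    const Node *b = mk(0, 'b', NULL, NULL);
    /* Build (a + b) * (a + b): both operands resolve to one node. */
    const Node *s1 = mk('+', 0, a, b);
    const Node *s2 = mk('+', 0, a, b);
    const Node *prod = mk('*', 0, s1, s2);
    printf("a+b shared: %s\n", s1 == s2 ? "yes" : "no");  /* prints yes */
    printf("DAG nodes: %d (a tree would need 7)\n", npool);
    (void)prod;
    return 0;
}

Because every subtree is canonicalized on construction, code generated by walking the DAG evaluates each shared subexpression exactly once.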

Integration with SSA Form

Static single assignment (SSA) form represents code such that each variable is assigned exactly once, with all uses of that variable referring to its unique definition. This structure aids common subexpression elimination (CSE) by enabling identical expressions to share the same SSA name across multiple uses, as long as the defining assignment dominates those uses and no intervening redefinitions occur. In this way, SSA explicitly tracks value equalities through variable names, making it straightforward to detect and reuse computations without re-evaluating them.

The integration process begins with constructing the SSA-based intermediate representation (IR) from the program's control flow graph, inserting phi functions at merge points to reconcile values from different paths. Once in SSA form, sparse conditional constant propagation (SCCP) is performed to refine variable values by propagating constants along executable paths while marking unreachable code. This is followed by copy propagation, which substitutes redundant expressions with references to the original SSA variable, and dead code removal to excise unused instructions, thereby implementing CSE efficiently.

A key advantage of SSA for CSE is its simplification of global analysis, as the single static assignment per variable provides precise def-use chains that highlight value equalities without requiring complex availability checks. Phi functions further enhance this by managing value merges at control flow joins, allowing CSE to proceed without explicit tracking of expression availability across branches. This approach reduces the overhead of traditional availability computations in global CSE. For example, consider the following code snippet before transformation:
label1: x = a + b
        ... (some code)
label2: y = a + b
Assuming label1 dominates label2, the SSA form assigns the same name to both expressions:
label1: x1 = a + b
        ... (phi insertions if needed)
label2: y1 = x1  // Copy propagation replaces the redundant computation
This direct reuse eliminates the second addition, with dead code removal cleaning up any remnants.

Advantages and Limitations

Performance Benefits

Common subexpression elimination (CSE) delivers key performance benefits by removing redundant computations, resulting in faster execution, more compact code, and reduced energy use. These gains are most notable in compute-bound applications where expressions are recomputed frequently, such as in scientific simulations or numerical algorithms.

Runtime improvements from CSE arise from avoiding unnecessary operations, particularly in loops. A verified global CSE implementation in a formally verified compiler achieved average speedups of 10% to 20% across Polybench/C benchmarks, which emphasize compute-intensive workloads similar to those in SPEC CPU suites. These reductions stem from fewer arithmetic instructions and better utilization of processor resources, with the impact scaling based on the density of redundant expressions in the code.

CSE also contributes to smaller code size by substituting repeated expressions with temporary variables or registers, thereby decreasing the overall instruction count in optimized binaries and improving instruction fetch efficiency. This reduction in code footprint lowers instruction cache miss rates, providing indirect runtime benefits through better memory access patterns. In embedded contexts, optimizations including CSE lead to energy savings; for instance, analyses on ARM-based platforms using benchmarks such as FDCT have shown reductions of up to 4% in energy consumption. Empirical evaluations in LLVM-based compilers highlight CSE's role in broader optimization pipelines.

Potential Drawbacks

One major limitation of common subexpression elimination (CSE) arises from pointer aliasing, where multiple pointers or references may access the same memory location. In such cases, the compiler cannot safely eliminate redundant expressions involving memory loads or stores, as a write through one pointer might invalidate the value loaded via another, leading to false negatives where potential optimizations are conservatively skipped. This issue is particularly pronounced in languages like C and C++, where alias analysis must be precise to avoid incorrect assumptions, but conservative analyses often err on the side of caution to prevent runtime errors. For instance, if two pointers potentially alias, a subsequent load cannot reuse a prior computation even if the expressions appear identical, requiring additional alias analysis infrastructure that may still miss opportunities due to its inherent limitations; the sketch at the end of this section illustrates the hazard.

Global CSE, which performs analysis across function boundaries or entire programs, introduces substantial compilation overhead compared to local variants. The interprocedural data-flow analysis required to identify common subexpressions at a global scope demands significant computational resources, particularly for large or complex codebases, often extending overall compile times due to repeated passes over the intermediate representation. This overhead stems from the need to track expression availability and liveness across multiple basic blocks and functions, which can be resource-intensive and may not always yield proportional runtime benefits in memory-constrained or time-sensitive build environments.

Basic CSE is ineffective against partial redundancies, where an expression is computed multiple times along some but not all paths to a program point. Unlike fully redundant expressions that CSE can eliminate outright, partial redundancies require insertions of computations at strategic points to transform them into fully redundant ones, a capability beyond standard CSE algorithms. As a result, basic CSE leaves such inefficiencies unaddressed, necessitating extensions like partial redundancy elimination (PRE) to achieve more comprehensive optimization, though this adds further analytical complexity.

CSE can also complicate debugging by obscuring the relationship between source code and the optimized binary. By replacing multiple instances of an expression with a single computation, the optimizer may eliminate intermediate assignments or reorder instructions, causing debuggers to display noncurrent values that do not match expectations during single-stepping or breakpoint hits. This leads to issues such as the program appearing to jump unexpectedly or failing to halt at apparent source lines, as the mapping from source statements to machine instructions becomes distorted, often requiring specialized recovery techniques to approximate unoptimized behavior.
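To make the aliasing limitation above concrete, consider the following minimal C sketch; the function is hypothetical and exists only to show the hazard. If p and q may refer to the same object, the second load of *q is not a reusable common subexpression of the first, because the intervening store through p may have changed it.

/* Hypothetical example: why aliasing blocks CSE on memory loads. */
int sum_twice(int *p, int *q) {
    int a = *q;   /* first load of *q */
    *p = a + 1;   /* if p == q, this store changes the value at *q */
    int b = *q;   /* must be re-loaded unless the compiler proves
                     that p and q never alias */
    return a + b;
}

Only when alias analysis proves the two pointers distinct, or the programmer asserts it (for example with C's restrict qualifier on the parameters), can the compiler safely replace the second load with a reuse of a.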
