
Superoptimization

Superoptimization is a technique that aims to automatically discover the most efficient possible implementation of a given program fragment, typically the shortest sequence of machine instructions or the fastest-executing code, while preserving exact input-output behavior. This is achieved through systematic search methods, such as exhaustive enumeration or stochastic sampling, which explore vast spaces of possible instruction sequences to identify globally optimal solutions unattainable by traditional heuristic-based compilers.

The term and foundational approach were introduced by Henry Massalin in 1987 in the seminal paper "Superoptimizer: A Look at the Smallest Program," published in ACM SIGARCH Computer Architecture News. Massalin's superoptimizer targeted the Motorola 68020 instruction set architecture, using a probabilistic equivalence testing method to verify candidate programs efficiently and make exhaustive searches practical for small functions, often yielding counterintuitive bit-manipulation sequences that minimized code length. This work highlighted superoptimization's potential not only for code generation but also for evaluating and improving instruction set designs by revealing underutilized or suboptimal opcodes.

Over the decades, superoptimization has advanced significantly, addressing scalability challenges through innovations in search algorithms and equivalence verification. Early extensions included constraint-based synthesizers that model equivalence checking as satisfiability problems, enabling broader application. Notable tools include STOKE, a superoptimizer developed in 2013 that employs Markov chain Monte Carlo (MCMC) sampling to optimize x86 assembly functions by balancing correctness guarantees with performance gains. Complementing this, Souper, introduced in 2017 as a synthesizing superoptimizer, integrates with the LLVM compiler infrastructure to automatically derive and apply optimizations for LLVM intermediate representation, discovering novel rewrite rules that enhance real-world compiler passes.

Recent developments have incorporated machine learning to further scale superoptimization to larger, real-world programs. Neural sequence-to-sequence models, as explored in 2021, learn from vast datasets to predict optimized transformations, outperforming traditional methods on benchmarks of real-world assembly programs. By 2025, large language models have been adapted for superoptimization, generating faster programs while maintaining semantic equivalence through guided stochastic search. These ML-driven approaches, alongside traditional methods, have expanded superoptimization's utility to domains like WebAssembly bytecode optimization and smart contract optimization on blockchains, where compactness and efficiency are critical. Despite its computational intensity, superoptimization remains a cornerstone of advanced compiler research, providing provably optimal code snippets that inform broader optimization strategies and continue to push the boundaries of software efficiency.

Fundamentals

Definition and Principles

Superoptimization is the automated process of searching for the shortest or fastest loop-free sequence of machine instructions that implements a given function or specification. This involves systematically exploring possible instruction combinations to identify the optimal one according to a defined cost metric. Unlike traditional compiler optimizations, which rely on heuristic rules and approximations for efficiency, superoptimization pursues provably optimal results through exhaustive or guided enumeration, often yielding surprising and non-intuitive code sequences.

The fundamental principles of superoptimization center on exhaustive enumeration within a bounded space, where the search proceeds through sequences of increasing length until a valid optimal one is found. It focuses exclusively on straight-line, loop-free code without branches or loops, as these structures exponentially increase the search space and render exhaustive methods impractical for larger programs. The goal is to achieve verifiable optimality for small code fragments, contrasting with broader program optimizations that prioritize compilation speed over perfection.

Central to superoptimization are input-output specifications for equivalence checking, where candidate sequences are evaluated against a comprehensive set of test inputs to ensure they produce the same outputs as the original code. Probabilistic or deterministic testing, often supplemented by formal checks, confirms correctness while pruning invalid candidates early. Cost models define optimality, such as minimizing instruction count for compact code or estimating cycle counts for execution speed, allowing adaptation to specific architecture characteristics. This targeted approach suits peephole-style improvements on small operations rather than whole-program transformation.

A representative example is the absolute-value function, where the input x resides in register d0 on a Motorola 68000-series processor. A superoptimized sequence avoids conditional branches by leveraging arithmetic carry propagation and the subtract-with-extend instruction (subx) to handle the sign:
move.l d0, d1
add.l d1, d1
subx.l d1, d1
eor.l d1, d0
sub.l d1, d0
This five-instruction routine computes |x| in d0, reducing code size and potential branch mispredictions compared to naive implementations.
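
The input-output testing principle can be illustrated by simulating this sequence in software and comparing it against a reference implementation. The following Python sketch (written for illustration here, not code from Massalin's superoptimizer) models the five instructions over 32-bit two's-complement registers and checks them against abs(x) on edge cases and random inputs:

import random

MASK = 0xFFFFFFFF  # 32-bit register width

def abs_candidate(x):
    """Simulate the branch-free 68000 sequence on a 32-bit two's-complement input."""
    d0 = x & MASK
    d1 = d0                                  # move.l d0, d1
    total = d1 + d1                          # add.l  d1, d1
    x_flag = 1 if total > MASK else 0        #   carry/X flag = sign bit of x
    d1 = total & MASK
    d1 = (d1 - d1 - x_flag) & MASK           # subx.l d1, d1 -> 0 or 0xFFFFFFFF
    d0 ^= d1                                 # eor.l  d1, d0
    d0 = (d0 - d1) & MASK                    # sub.l  d1, d0
    return d0

def abs_reference(x):
    return abs(x) & MASK

tests = [0, 1, -1, 2**31 - 1, -2**31]
tests += [random.randint(-2**31, 2**31 - 1) for _ in range(10_000)]
assert all(abs_candidate(x) == abs_reference(x) for x in tests)
print("candidate matches |x| on all", len(tests), "test inputs")

Massalin's probabilistic test worked in exactly this spirit: a candidate that survives a battery of inputs is accepted as equivalent with high confidence, with formal or exhaustive checks reserved for the survivors.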

Relation to Compiler Optimization

Traditional compiler optimizations, such as constant folding, dead code elimination, and peephole optimization, typically employ heuristic rules applied locally to improve efficiency without guarantees of global optimality. These methods rely on predefined transformations that approximate improvements but may miss superior sequences due to their rule-based, non-exhaustive nature. In contrast, superoptimization searches exhaustively for the shortest or fastest instruction sequence that implements a given specification, providing provably optimal results for targeted code blocks and addressing limitations in traditional approaches by exploring the full space of possibilities.

Superoptimization enhances compilers by generating highly efficient code patterns that can be integrated as rewrite rules into peephole optimizers or applied as refinements in later passes. For instance, it can synthesize machine-specific idioms, such as optimized arithmetic combinations tailored to a particular instruction set, that heuristic methods often overlook, leading to measurable performance gains in specialized domains. This integration allows compilers to leverage superoptimized fragments for critical paths, improving overall code quality without overhauling existing optimization pipelines. Hybrid systems further exemplify this synergy, where superoptimization techniques generate or refine the rules used by conventional optimizers, combining the scalability of heuristics with the precision of exhaustive search, as sketched in the example below. Such approaches enable compilers to achieve better approximations of optimality in practice, particularly for low-level operations.

However, superoptimization is generally scoped to small, loop-free code fragments, differing from full-program analysis in compilers that handle control flow, memory access, and larger structures across entire applications. This limitation ensures computational feasibility but positions superoptimization as a complementary tool rather than a replacement for comprehensive strategies.
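
As a schematic of this integration (a toy Python sketch with hypothetical rule names and a made-up three-address instruction format, not any production compiler's API), superoptimizer-derived results can be tabulated as peephole rewrite rules and applied by a simple pattern-matching pass:

# Hypothetical rules discovered and verified offline by a superoptimizer:
# each maps a matched instruction window to a cheaper equivalent sequence.
REWRITE_RULES = {
    (("mul", "r0", "r0", 2),): (("shl", "r0", "r0", 1),),                      # x * 2 -> x << 1
    (("xor", "r0", "r0", "r0"), ("add", "r0", "r0", 1)): (("mov", "r0", 1),),  # clear then increment -> load 1
}

def peephole(program, rules=REWRITE_RULES):
    """Scan a flat instruction list, replacing any window that matches a rule."""
    out, i = [], 0
    while i < len(program):
        for pattern, replacement in rules.items():
            if tuple(program[i:i + len(pattern)]) == pattern:
                out.extend(replacement)
                i += len(pattern)
                break
        else:
            out.append(program[i])
            i += 1
    return out

prog = [("mul", "r0", "r0", 2), ("add", "r1", "r1", "r0")]
print(peephole(prog))  # [('shl', 'r0', 'r0', 1), ('add', 'r1', 'r1', 'r0')]

The expensive search runs offline; the compiler only pays for cheap table lookups at compile time, which is how superoptimized fragments reach production pipelines without changing their overall structure.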

Historical Development

Origins and Early Research

The term superoptimization was first coined by Henry Massalin in his seminal 1987 paper "Superoptimizer: A Look at the Smallest Program," presented at the Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS II). In this work, Massalin described a system that performs an exhaustive search over a processor's instruction set to identify the shortest possible machine-language program for computing a given function, focusing on small, straight-line code sequences without branches or jumps. This approach contrasted with traditional compiler heuristics by aiming for provably optimal code tailored to specific tasks.

Massalin's research was driven by frustrations with the inefficiencies of compiler-generated code in performance-critical kernels. At the time, manual assembly programming was often necessary to achieve compact and fast implementations for low-level routines, but this was labor-intensive and error-prone. The superoptimizer addressed this by automating the discovery of unconventional instruction combinations, such as intricate bit manipulations, that compilers overlooked, thereby providing insights into both code generation and instruction set design.

Implemented for the Motorola 68020 architecture, the system restricted its search to a subset of common instructions to manage computational demands, targeting functions up to about 13 instructions in length. Initial experiments demonstrated significant improvements, such as reducing the signum function from 9 to 4 instructions, through brute-force enumeration of all possible sequences up to length 10. A crucial innovation was the use of input-output testing to bound the search space: candidate programs were generated and verified against a large set of random inputs to ensure functional equivalence, with probabilistic methods accelerating the process while maintaining high confidence in optimality. By limiting the scope to straight-line code, Massalin made exhaustive search practical on 1980s hardware, highlighting the trade-offs between completeness and feasibility in automated optimization.

Evolution and Key Milestones

Building upon the foundational superoptimizer concept introduced by Henry Massalin in 1987, subsequent developments in the 1990s focused on practical integration into production compilers. In 1992, Torbjörn Granlund and Richard Kenner developed the GNU Superoptimizer (GSO), a tool designed to generate optimal peephole optimization patterns by exhaustively searching for the shortest instruction sequences that implement given input-output behaviors. This system was integrated into the GNU Compiler Collection (GCC), where it automated the discovery of branch elimination transformations for the IBM RS/6000 architecture, enabling the compiler to produce more efficient code by replacing suboptimal sequences with verified optimal ones. The GSO's approach marked a shift toward automated, verifiable optimizations that could be directly embedded in compiler backends, demonstrating superoptimization's viability beyond academic prototypes.

By 1995, GSO was released as a standalone GNU package, supporting multiple architectures such as the Motorola 68000, SPARC, and RS/6000, and featuring capabilities for automated generation of optimal instruction tables for assemblers. This release broadened accessibility, allowing users to superoptimize small functions independently of GCC and generate reusable optimization tables, which facilitated its adoption in diverse embedded and systems-programming contexts.

Advancements in the early to mid-2000s introduced more sophisticated search strategies to address the limitations of exhaustive enumeration. Around 2002, the Denali project at Compaq's Systems Research Center employed goal-directed search powered by an automatic theorem prover to generate near-optimal code for critical loops and subroutines on modern processors, emphasizing mathematical verification over brute-force enumeration. Building on this, the 2006 TOAST (Total Optimisation using Answer Set Technology) system applied answer set programming, a declarative paradigm for combinatorial problems, to superoptimize acyclic functions, achieving efficiency gains by modeling instruction selection as a logic program solvable via stable model computation.

Post-2010 milestones emphasized scalability and stochastic techniques for complex architectures. In 2013, the STOKE framework pioneered stochastic superoptimization for x86-64 binaries, using Markov chain Monte Carlo sampling guided by cost functions and symbolic verification to outperform the output of production compilers at -O0, and in some cases -O3, on loop-free sequences, with applications in improving functions like memcpy. By 2016, efforts to superoptimize larger code blocks advanced through the LENS algorithm, which pruned invalid candidates during search to handle sequences up to 20 instructions long, enabling the discovery of optimizations infeasible for prior tools and highlighting superoptimization's potential for broader integration. In 2017, Souper was introduced as a synthesizing superoptimizer that integrates with the LLVM infrastructure to automatically derive and apply optimizations for LLVM intermediate representation, discovering novel rewrite rules that enhance real-world compiler passes.

Core Techniques

Exhaustive and Brute-Force Methods

Exhaustive and brute-force methods in superoptimization rely on systematically enumerating all possible sequences within a constrained search space to identify the optimal program that computes a given function. This approach generates candidates starting from the shortest lengths and expands them incrementally, typically using iterative deepening to ensure the first valid sequence found is the minimal one according to a predefined cost metric. By exhaustively exploring combinations of machine instructions, these methods guarantee global optimality for the bounded space, distinguishing them from alternatives that may miss superior solutions.

To manage the exponential growth of the search space, techniques limit the maximum instruction sequence length to between 5 and 12 instructions, depending on the target architecture and function complexity. Additionally, the set of allowable opcodes is restricted to 10-15 semantically relevant instructions, excluding those unlikely to contribute to the target computation, such as unrelated branch or memory operations for purely arithmetic tasks. Domain-specific pruning further bounds exploration by eliminating redundant or semantically invalid partial sequences early, such as those that produce unused results or exceed register constraints, thereby reducing the number of candidates evaluated without sacrificing completeness.

Cost functions prioritize sequences by minimizing factors like instruction count or estimated execution cycles, with the search halting upon discovering the lowest-cost candidate that meets the optimality criterion. Verification ensures correctness through input-output testing on a suite of test cases designed to cover all possible inputs for the target function, such as exhaustive enumeration over small integer domains for arithmetic operations; this confirms the candidate produces identical results to a reference implementation across the tested behaviors. For instance, in early implementations targeting the Motorola 68020 architecture, this exhaustive process yielded optimal code for arithmetic operations, including a reduction of the signum function from several instructions to a compact sequence leveraging carry flags and negation, demonstrating improvements of up to 50% in code size for critical kernels. A toy version of this generate-and-test loop is sketched below.
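
The following Python sketch illustrates the generate-and-test loop on a deliberately tiny, hypothetical two-register machine (the opcode set, register model, and 8-bit width are assumptions made for illustration, not any real ISA). It searches sequences of increasing length for a branch-free 8-bit absolute value and verifies each candidate exhaustively over the full input domain, failing fast on discriminating inputs:

from itertools import product

MASK, SIGN = 0xFF, 0x80                       # 8-bit toy machine

def to_signed(v):
    return v - 0x100 if v & SIGN else v

# Pruned, semantically relevant opcode set; each op computes a new value for dst
# from the current dst and src register values.
OPS = {
    "mov":  lambda d, s: s,
    "neg":  lambda d, s: -s & MASK,
    "xor":  lambda d, s: d ^ s,
    "add":  lambda d, s: (d + s) & MASK,
    "sub":  lambda d, s: (d - s) & MASK,
    "sar7": lambda d, s: (to_signed(s) >> 7) & MASK,   # arithmetic shift right by 7
}
INSTRS = [(op, dst, src) for op in OPS for dst in (0, 1) for src in (0, 1)]

def run(program, x):
    regs = [x & MASK, 0]                      # r0 holds the input, r1 is scratch
    for op, dst, src in program:
        regs[dst] = OPS[op](regs[dst], regs[src])
    return regs[0]                            # the result is read back from r0

def target(x):                                # specification: 8-bit |x|
    return abs(to_signed(x)) & MASK

def superoptimize(max_len=4):
    inputs = [1, 0xFF, 0x80, 0x7F, 0] + list(range(256))   # fail-fast ordering
    for length in range(1, max_len + 1):                   # iterative deepening
        for program in product(INSTRS, repeat=length):
            if all(run(program, x) == target(x) for x in inputs):
                return program                              # first hit is shortest
    return None

print(superoptimize())
# e.g. (('sar7', 1, 0), ('xor', 0, 1), ('sub', 0, 1)) -- a branch-free three-instruction |x|

Real superoptimizers follow the same skeleton but enumerate actual machine instructions, prune with dataflow information, and substitute cheaper probabilistic equivalence tests for the exhaustive check.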

Stochastic and Search-Based Approaches

Stochastic superoptimization employs probabilistic sampling techniques to navigate the enormous space of possible instruction sequences, avoiding the exhaustive enumeration required for smaller fragments. A seminal example is STOKE, which uses Markov chain Monte Carlo (MCMC) sampling to generate candidate programs equivalent to an input function while optimizing for performance. This approach treats superoptimization as a stochastic search problem, proposing mutations to the instruction stream and accepting or rejecting them based on a probabilistic criterion derived from a cost function. By focusing on loop-free x86-64 code, STOKE explores billions of potential programs without complete enumeration, enabling the discovery of non-obvious optimizations like novel implementations of Montgomery multiplication or linear algebra routines.

Central to these methods are cost models that balance functional correctness and execution efficiency. Correctness is typically ensured through empirical testing on input-output pairs, augmented by rewrite rules or symbolic validation to prune invalid candidates early; for instance, STOKE approximates equivalence by measuring bit-level output differences across test cases before final verification. Performance is modeled using hardware-specific metrics, such as instruction latencies or cycle counts from microarchitectural simulators, which correlate strongly with measured runtimes (e.g., Pearson correlation >0.9 for x86-64). To avoid local optima, annealing-style schedules adjust the exploration bias over time, starting with a high "temperature" for broad sampling and cooling to favor high-quality solutions, allowing STOKE to converge on optimal code in separate phases dedicated to synthesis and refinement. A minimal sketch of this acceptance loop appears below.

Heuristic-guided searches complement this by incorporating evolutionary strategies, such as genetic algorithms that evolve populations of code variants through mutation, crossover, and fitness-based selection; one such system, for example, applies this to GPU kernels in LLVM IR, achieving up to 4x speedups on standard GPU benchmark suites by iteratively improving large sequences while preserving semantics via runtime testing. These approaches scale to practical sizes, optimizing blocks of 20-30 instructions in minutes on multi-core hardware. STOKE, running on a 40-node cluster, synthesizes and optimizes such sequences in about 30 minutes per phase, yielding up to 29% speedups over -O3 on cryptographic and numerical benchmarks. Random restarts enhance robustness by initializing multiple MCMC chains, mitigating the impact of poor starting points in the vast space.

Advancements like the LENS algorithm further scale search-based methods by using bidirectional search with selective refinement to prune invalid paths, automatically extracting generalizable rewrite rules from superoptimized examples, such as fusing operations across basic blocks, which can then inform searches for larger programs; this approach solved 20 out of 22 benchmarks twice as fast as prior enumerative techniques and enabled 82% average speedups on embedded workloads.
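
A minimal Python sketch of the Metropolis-style acceptance loop is shown below. It is a toy illustration of the MCMC idea described above, not STOKE itself: the candidate "programs" are tuples of hypothetical unary 8-bit operations, the specification is f(x) = -x mod 256, and the cost adds a small length penalty to the count of failing test inputs:

import math, random

OPS = {"inc": lambda v: (v + 1) & 0xFF, "dec": lambda v: (v - 1) & 0xFF,
       "not": lambda v: ~v & 0xFF, "neg": lambda v: -v & 0xFF,
       "shl": lambda v: (v << 1) & 0xFF}

def execute(prog, x):
    for op in prog:
        x = OPS[op](x)
    return x

TESTS = range(256)

def cost(prog):
    wrong = sum(execute(prog, x) != (-x & 0xFF) for x in TESTS)
    return wrong + 0.01 * len(prog)            # correctness first, then brevity

def mutate(prog):
    prog, r = list(prog), random.random()
    if r < 0.3 and prog:
        prog.pop(random.randrange(len(prog)))                                    # delete
    elif r < 0.6:
        prog.insert(random.randrange(len(prog) + 1), random.choice(list(OPS)))   # insert
    elif prog:
        prog[random.randrange(len(prog))] = random.choice(list(OPS))             # replace
    return tuple(prog)

def mcmc_superoptimize(initial, iterations=20_000, beta=1.0):
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    for _ in range(iterations):
        cand = mutate(current)
        cand_cost = cost(cand)
        delta = cand_cost - current_cost
        # Metropolis criterion: always accept improvements, sometimes accept regressions.
        if delta <= 0 or random.random() < math.exp(-beta * delta):
            current, current_cost = cand, cand_cost
            if cand_cost < best_cost:
                best, best_cost = cand, cand_cost
    return best, best_cost

print(mcmc_superoptimize(("not", "not")))
# Typically finds a correct short sequence such as ('not', 'inc'); reaching the
# single-instruction optimum ('neg',) generally needs restarts or a different schedule.

In STOKE the same loop operates on x86-64 instructions, the correctness term measures bit-level output differences on test cases, the performance term comes from a latency model, and surviving candidates are passed to a formal validator.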

Formal Methods and Solvers

Formal methods and solvers in superoptimization leverage constraint solving and automated theorem proving to model program semantics precisely, enabling the discovery and validation of optimal code sequences without exhaustive enumeration. These approaches encode instruction behaviors and optimization goals as constraints or logical formulas, which are then solved using specialized tools to find minimal or equivalent implementations that satisfy functional specifications. This contrasts with enumerative or stochastic searches by providing provable guarantees of correctness and optimality within defined bounds.

Satisfiability modulo theories (SMT) solvers play a central role by modeling instruction semantics as constraints over bit-vectors and arrays, capturing how operations transform program states. For instance, instruction semantics are represented symbolically, and candidate instruction sequences are checked for observational equivalence, ensuring identical input-output behavior, while minimizing objectives like sequence length or execution cycles. Solvers such as Z3 iteratively tighten cost bounds on satisfying assignments until no better solution exists, allowing unbounded search over arbitrary-length sequences without predefined limits. This technique has been applied to loop-free code, demonstrating improvements on standard benchmark suites.

Goal-directed methods, exemplified by the Denali system, employ backward reasoning to derive optimal code by starting from desired output specifications and tracing feasible paths to input instructions. Using refutation-based theorem provers and SAT solvers, Denali matches goal terms against an equality graph of expressions, resolving propositional constraints to prune invalid paths and ensure the generated code meets cycle budgets on architectures such as the Alpha. This formal approach sidesteps manual rule creation, directly synthesizing near-optimal subroutines through automated proof search.

Answer set programming (ASP) provides a declarative framework for superoptimization, as in the TOAST system, where problems are encoded as logic programs representing constraints on instruction semantics and optimization objectives. ASP solvers like Smodels or DLV compute stable models, that is, minimal sets of instructions satisfying the program, enabling scalable search over non-monotonic reasoning spaces for acyclic functions. This encoding allows declarative specification of machine-specific rules, yielding optimal sequences without imperative search algorithms.

Verification in formal superoptimization ensures equivalence between original and optimized code using techniques like symbolic execution and model checking, avoiding exhaustive testing by proving properties over abstract states. Symbolic execution translates code into SMT formulas, exploring paths under preconditions derived from testing or abstract interpretation to confirm bit-precise correctness; for example, in LLVM IR, tools generate constraints for peephole optimizations, verifying replacements like instruction fusions without runtime evaluation. Model checking complements this by exhaustively validating state transitions in bounded contexts, such as confirming no side effects in IR passes, thus scaling proofs to larger fragments. These methods, often integrated with SMT solving, provide conditional guarantees in conditionally correct superoptimization systems. A minimal equivalence check of this kind is sketched below.
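
The kind of equivalence query such systems discharge can be sketched with the Z3 SMT solver's Python bindings (assuming the z3-solver package is installed; the example is illustrative and not taken from any particular superoptimizer):

from z3 import BitVec, If, Solver, unsat   # requires the z3-solver package

x = BitVec("x", 32)

# Specification: |x| written with an explicit branch, as a front end might emit it.
spec = If(x < 0, -x, x)

# Candidate found by search: the branch-free mask trick; ">>" on z3 bit-vectors
# is an arithmetic (sign-preserving) shift.
mask = x >> 31
candidate = (x ^ mask) - mask

# Ask for any input on which the two differ; unsat means they are observationally
# equivalent for every 32-bit value of x.
s = Solver()
s.add(spec != candidate)
print("equivalent" if s.check() == unsat else "counterexample: %s" % s.model())

Superoptimizers built on this approach additionally encode a cost bound as part of the query and tighten it until the solver reports unsatisfiability, at which point the last satisfying candidate is optimal within the modeled semantics.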

Implementations

Open-Source Tools

The GNU Superoptimizer (GSO), developed between 1991 and 1995, is a standalone program that applies an exhaustive generate-and-test methodology to discover the shortest instruction sequences equivalent to a specified target function across supported instruction set architectures (ISAs). Distributed under the GNU General Public License (GPL), it provides source code for generating optimal assembly code tables and has been applied to enhance branch elimination in the GNU Compiler Collection (GCC) for architectures such as the IBM RS/6000.

Souper, launched as an open-source project around 2014, functions as an LLVM-based superoptimizer that employs satisfiability modulo theories (SMT) solvers to detect and synthesize overlooked peephole optimizations within LLVM intermediate representation (IR). Integrated into the Clang/LLVM pipeline, it facilitates local code improvements for C and C++ programs, automatically deriving novel optimization patterns to bolster the LLVM midend.

Slumps, an open-source project initiated in the late 2010s, targets superoptimization of WebAssembly (WASM) modules by performing exhaustive searches over bounded code blocks to produce semantically equivalent binaries with reduced instruction counts. It extends Souper's capabilities through a dedicated pipeline for translating and refining WASM bytecode, supporting research in code analysis and diversification for web applications.

The Unbounded Superoptimizer, developed in the 2010s for the ARMv7-A ISA, is an open-source tool that employs SMT-based techniques to exhaustively explore instruction sequences of arbitrary length, generating code from compiler intermediate representation without predefined bounds on program size. It shifts the optimization search directly into the solver domain to ensure correctness and efficiency on modern processors.

Research and Specialized Systems

One prominent example of academic superoptimization research is STOKE, developed at Stanford University in 2013 as a stochastic superoptimizer targeting the x86-64 instruction set for optimizing hot code paths in loop-free binaries. STOKE formulates superoptimization as a stochastic search problem, employing MCMC sampling to explore vast program spaces while balancing code correctness and performance through a cost function that penalizes invalid transformations and rewards shorter or faster equivalents. In experiments on benchmarks like SPEC CPU2006, STOKE achieved speedups of up to 20% over -O3 for selected functions by discovering non-obvious instruction sequences.

Extensions to STOKE in 2016 focused on automating the generation of optimization rules for integration into production compilers such as LLVM, enabling the superoptimizer to generate peephole optimizations that could be applied more broadly without exhaustive per-function searches. These enhancements used STOKE's search machinery to infer rules from synthesized examples, demonstrating improvements in LLVM's optimizer by incorporating rules that reduced instruction counts in real workloads by 5-10% on average. This work highlighted the potential of stochastic methods to bridge research prototypes with practical compiler ecosystems.

Earlier academic prototypes laid foundational groundwork using formal methods. Denali, introduced in 2002 by researchers at Compaq (now part of HP), was a goal-directed superoptimizer that employed an automatic theorem prover to verify and generate optimal code sequences, targeting straight-line code without loops. By framing code generation as a proof search problem, Denali proved the equivalence of candidate programs to specifications using decision procedures for linear arithmetic and arrays, achieving optimal code for small Alpha assembly fragments that outperformed hand-tuned assembly in cycle counts. Its influence persists in later formal verification approaches to superoptimization, emphasizing provable correctness over heuristic searches.

Building on such formal techniques, the TOAST system from 2006, developed at the University of Bath, applied answer set programming, a declarative logic paradigm, to superoptimize machine code for general instruction set architectures. TOAST encoded the superoptimization task as a logic program of constraints on instruction semantics, using stable model semantics to enumerate and select minimal programs that satisfy functional equivalence and resource limits, such as minimizing cycles or code size. This prototype influenced subsequent tools by showing how non-monotonic reasoning could handle the combinatorial explosion in program search spaces.

More recent advancements include the LENS framework from 2016, developed collaboratively by university and industry researchers, which scales enumerative superoptimization for larger code fragments through aggressive pruning of invalid candidates. LENS employs a bidirectional search strategy combined with version-space algebra to refine equivalence classes incrementally, verifying partial programs against test cases with reduced bitwidths (e.g., 4 bits) to accelerate exploration while ensuring full-precision correctness. Applied to server code and embedded benchmarks like MiBench, LENS solved 29 out of 32 superoptimization problems up to 16 instructions long, yielding up to 82% speedups over -O3 in some cases, and integrated effectively with stochastic methods for hybrid search.

In the 2020s, ongoing research, including course investigations at Cornell University, has explored search techniques to approximate an idealized -O∞ optimization level, treating superoptimization as an unbounded quest for asymptotically optimal code. These efforts, documented in 2025 course investigations, build on STOKE-like approaches but adapt them for modern ISAs with vector extensions, aiming to optimize kernels in scientific simulations by iteratively refining cost models that incorporate latency and throughput metrics. Preliminary results suggest potential 10-25% gains in throughput for vectorized operations on x86, underscoring the continued evolution of search-based methods in academic settings.

Challenges and Applications

Computational and Practical Limitations

Superoptimization faces fundamental computational challenges due to the exponential growth of the search space. For a code block of length n instructions drawn from a set of k possible opcodes, the number of candidate sequences can reach up to k^n, rendering exhaustive enumeration intractable beyond small values of n. This complexity confines traditional brute-force methods to short, straight-line code fragments, typically limited to 5-12 instructions, as longer sequences demand prohibitive resources. Even with pruning techniques like dataflow analysis or canonicalization, which can reduce the effective search space by factors of 50x or more for short lengths, scalability remains a barrier for practical application to larger blocks. The magnitude of this growth is illustrated in the calculation below.

Verification of superoptimized code introduces significant overhead, requiring rigorous checks to ensure semantic equivalence with the original program. Comprehensive test suites or formal verification via solvers are essential to confirm correctness, but these methods are computationally intensive; for instance, solver-based validation in tools like Souper can extend compilation times to around 37 hours for benchmarks like SPEC CPU 2017. Despite such efforts, risks persist, including missed edge cases due to incomplete test coverage or overlaps in feature spaces between valid and invalid candidates, potentially leading to subtle bugs if pruning heuristics discard promising optimizations. Test-case-based approximations, while faster, further compromise exhaustiveness by relying on finite inputs that may not capture all behaviors.

Optimizations derived from superoptimization are inherently tied to specific instruction set architectures (ISAs), complicating portability across hardware platforms. Achieving optimality often exploits ISA-unique features, such as fused multiply-add instructions or architecture-specific register conventions, necessitating ISA-specific emulators, performance models, and equivalence checkers for retargeting. This dependency makes superoptimizers laborious to adapt, requiring custom implementations for each new ISA, while the runtime costs during compilation, including low validation throughput (under 100 checks per second in some systems), amplify delays in production environments.

Practical deployment is further hindered by issues in stochastic methods and compiler integration. Stochastic approaches, such as MCMC sampling in tools like STOKE, introduce non-determinism by forgoing completeness for scalability, yielding variable results across runs and potentially suboptimal outcomes on irregular search spaces. Integrating superoptimization into production compilers incurs additional overhead, including restrictions to loop-free code and the need for expert-crafted rules to handle broader contexts, limiting its seamless adoption without substantial engineering effort.
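
A back-of-the-envelope Python calculation makes the k^n growth concrete (the enumeration rate below is an assumed round figure for illustration only):

RATE = 1e9        # assumed candidates tested per second (illustrative, not measured)

for k in (16, 64):                    # size of the allowed opcode set
    for n in (4, 8, 12):              # candidate sequence length
        candidates = k ** n
        print(f"k={k:2d}  n={n:2d}  candidates={candidates:.2e}  "
              f"seconds={candidates / RATE:.2e}")

At k = 16 and n = 12 the enumeration already takes on the order of days at a billion candidates per second, while k = 64 and n = 12 would take on the order of 10^5 years, which is why pruning, solver-based methods, and stochastic sampling are essential in practice.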

Real-World Uses and Impact

Superoptimization has been integrated into the GNU Compiler Collection (GCC) through the GNU Superoptimizer (GSO), which generates optimal instruction sequences for arithmetic operations, bit manipulations, and branch eliminations, enabling the compiler to produce more compact code for resource-constrained environments. For instance, GSO-derived patterns replace multi-instruction sequences, such as four-instruction comparisons, with single instructions like rlinm on the IBM RS/6000 (POWER) architecture, directly reducing code size and execution cycles in systems where memory footprint and efficiency are critical. This approach has influenced GCC's peephole optimizations, prioritizing minimal instruction counts to shrink binaries without sacrificing correctness.

In the LLVM compiler infrastructure, the Souper superoptimizer serves as a pass that synthesizes optimizations for intermediate representation (IR), identifying and applying thousands of local transformations during compilation. Souper's solver-based discovery, applied at standard optimization levels, has yielded reductions in binary sizes such as 4.4% for the Clang compiler itself and up to 15 KB in SPEC CINT 2006 benchmarks, enhancing performance in just-in-time (JIT) compilation scenarios. These optimizations extend to browsers and servers through LLVM's use in JIT engines, where Souper contributes to faster code generation for dynamic workloads, including WebAssembly execution via LLVM backends.

Practical applications of superoptimization appear in embedded systems and high-performance computing (HPC), particularly for performance-critical library functions in sandboxed environments, where techniques like stochastic search replace instruction sequences to minimize cycles and memory accesses. For example, superoptimizers have reduced code sizes by approximately 30% in loop-heavy routines while maintaining correctness, benefiting components with tight resource constraints. In WebAssembly for browsers, superoptimization pipelines built on LLVM IR and Souper have achieved median instruction count reductions of 0.33% across benchmarks, with case studies like the Babbage problem showing up to 46.67% size compression through dead code elimination and constant propagation, leading to measurable speedups in compute-intensive routines such as cryptographic operations.

The broader impact of superoptimization includes automated generation of optimization rules for MLIR dialects in AI compiler stacks, enabling efficient tensor graph transformations in domain-specific languages for machine learning. In the 2020s, equality saturation-based superoptimizers have been integrated into MLIR workflows to superoptimize tensor programs, yielding speedups in AI inference kernels by exploring vast rewrite spaces beyond manual rewrite rules. These advancements facilitate auto-tuning in frameworks like XLA, where superoptimized patterns reduce compilation overhead and improve runtime efficiency for machine learning models.

References

  1. "Superoptimizer: A Look at the Smallest Program." ACM Digital Library.
  2. "Stochastic Superoptimization." arXiv:1211.0557, 2012.
  3. "Learning to Superoptimize Real-world Programs." arXiv:2109.13498, 2021.
  4. "Super-optimization of Smart Contracts." ACM Digital Library, 2022.
  5. "Automatic Generation of Peephole Superoptimizers" (PDF), 2006.
  6. "Compiler Optimization via Superoptimization" (PDF), Louis Jenkins.
  7. "Scaling up Superoptimization." ACM Digital Library.
  8. "Superoptimizer: A Look at the Smallest Program" (summary).
  9. Granlund, T. and Kenner, R. "Eliminating Branches Using a Superoptimizer and the GNU C Compiler."
  10. "Superopt." GNU Project, Free Software Foundation.
  11. "TOAST: Applying Answer Set Programming to Superoptimisation."
  12. "Stochastic Superoptimization." ACM Digital Library.
  13. "Stochastic Superoptimization" (PDF), Stanford CS Theory.
  14. "Superoptimization" (PDF), Embecosm.
  15. "Symbolic Optimization with SMT Solvers" (PDF), cs.wisc.edu.
  16. "Unbounded Superoptimization" (PDF), Greta Yorsh, 2017.
  17. "Denali: A Goal-directed Superoptimizer" (PDF), Bitsavers.org, 2001.
  18. "Conditionally Correct Superoptimization" (PDF), Stanford CS Theory.
  19. google/souper: A Superoptimizer for LLVM IR. GitHub.
  20. "GNU Superoptimizer: Summary." Savannah, 2020.
  21. "Souper: A Synthesizing Superoptimizer." arXiv:1711.04422, 2017.
  22. "Unbounded Superoptimization." ACM Digital Library.
  23. "Stochastic Program Optimization" (PDF), Stanford CS Theory.
  24. "Denali: A Goal-directed Superoptimizer" (PDF), Rajeev Joshi.
  25. "Scaling up Superoptimization" (PDF), Phitchaya Mangpo Phothilimthana.
  26. "CS 6120: Superoptimization: A Quest for -O∞." April 2025.
  28. "GreenThumb: Superoptimizer Construction Framework."
  29. "Eliminating Branches Using a Superoptimizer and the GNU C Compiler" (PDF).
  30. "GSO 2.0." Embecosm.
  31. "Superoptimization of WebAssembly Bytecode." ACM Digital Library, 2020.
  32. "Sound Loop Superoptimization for Google Native Client" (PDF), 2017.
  33. "Superoptimization of WebAssembly Bytecode" (PDF), arXiv, 2020.
  34. "Mirage: A Multi-Level Superoptimizer for Tensor Programs." USENIX, 2025.