
Binary translation

Binary translation is a technique that recompiles machine code from a source instruction set architecture (ISA) into an equivalent form for a target ISA, enabling software binaries to run on incompatible architectures without access to the original source code. The process reconstructs the program's semantics by mapping instructions while preserving behavior, such as control flow and data dependencies, despite the absence of high-level information like types or subroutine boundaries. Binary translation is a key enabler for emulation, virtualization, and legacy system migration, and it often outperforms pure interpretation because it generates native executable code.

The technique divides into static and dynamic categories based on when translation occurs. Static binary translation performs a complete, offline recompilation of the entire binary prior to execution, making it efficient for fixed, non-self-modifying code but limited in handling dynamic linking, self-modifying instructions, or unresolved references. Dynamic binary translation, in contrast, operates at runtime by translating and caching small units of code, such as basic blocks or execution traces, as they are encountered. This allows it to adapt to runtime behaviors like computed branches or system calls while applying optimizations to hot paths. The on-the-fly approach incurs initial overhead but achieves better long-term performance through code caching and profiling-driven improvements, sometimes coming within a factor of 2 to 6 of native speed.

Historically, binary translation gained prominence in the late 1980s and early 1990s for transitioning enterprise systems to new hardware, exemplified by Hewlett-Packard's offline translator from HP 3000 minicomputers to HP Precision Architecture processors and Digital Equipment Corporation's VEST and mx systems for migrating VAX and MIPS binaries to Alpha AXP. Commercial milestones include Apple's Rosetta (2006–2012), a dynamic translator that bridged PowerPC applications to x86 during the Macintosh architecture shift, and Transmeta's Crusoe microprocessor (2000–2005), which used just-in-time binary translation to run x86 software on its custom VLIW core for power-efficient mobile computing. Frameworks like Hewlett-Packard Laboratories' Dynamo further advanced the field by integrating dynamic optimization, demonstrating up to 20% performance gains through trace-based translation.

In contemporary applications, binary translation supports virtualization by rewriting guest OS instructions to avoid hardware conflicts, as in early VMware implementations that translated x86 code for non-privileged execution. It also powers binary instrumentation tools such as DynamoRIO, which translate code to insert profiling or debugging hooks at runtime, and open-source emulators like QEMU, which uses dynamic translation for cross-platform execution. Apple's Rosetta 2 (introduced 2020) enables running x86-64 applications on ARM-based Macs via ahead-of-time and just-in-time translation. Emerging uses in embedded systems involve accelerating frequently executed binary loops on custom hardware via translation, yielding up to 12x speedups and 11x energy reductions. Core challenges persist, including preserving precise semantics across ISAs, coping efficiently with architectural mismatches, and scaling to multi-threaded or just-in-time generated code without excessive overhead.

Fundamentals

Definition

Binary translation is the process of converting sequences of machine code instructions from a source instruction set architecture (ISA) to an equivalent set for a target ISA, enabling binaries compiled for one platform to execute on another without requiring access to the original source code. This technique allows software designed for legacy or incompatible hardware to run on modern systems, often achieving performance close to native execution by generating optimized target code.

Unlike emulation, which typically interprets source instructions on the fly by simulating the original machine state, binary translation compiles the code into native target instructions ahead of or during execution, reducing the overhead of repeated interpretation. In contrast to recompilation, which rebuilds executables from high-level source code for a new target, binary translation operates solely on the compiled binary, so it remains applicable when the source is proprietary or unavailable.

The scope of binary translation encompasses both static (ahead-of-time) approaches, where the entire binary is translated before execution, and dynamic (just-in-time) methods, which translate code as the program runs. Implementations can be purely software-based or hardware-accelerated, supporting migrations between diverse architectures such as CISC to RISC. The basic workflow involves disassembling the source binary into an intermediate representation, mapping instructions and semantics to target equivalents while handling architectural differences, and reassembling the result into an executable target binary.
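The disassemble, map, and reassemble workflow can be illustrated with a deliberately small sketch. Everything specific here is an assumption made for illustration: the two-byte accumulator-style source encoding, its opcode values, and the register-style target mnemonics are hypothetical and do not correspond to any real ISA. A real translator must also handle control flow, memory, and condition flags.

```python
from typing import List, Tuple

# Hypothetical source ISA: each instruction is one opcode byte plus one
# immediate byte, operating on a single implicit accumulator.
SRC_OPCODES = {0x01: "LOADI", 0x02: "ADDI", 0x03: "MULI"}

# Hypothetical target ISA: the accumulator is mapped to register r0.
TARGET_TEMPLATES = {
    "LOADI": "mov r0, #{}",
    "ADDI": "add r0, r0, #{}",
    "MULI": "mul r0, r0, #{}",
}

IRInstr = Tuple[str, int]  # (operation, immediate)


def disassemble(code: bytes) -> List[IRInstr]:
    """Front end: decode source bytes into a simple IR."""
    if len(code) % 2:
        raise ValueError("truncated instruction")
    ir = []
    for offset in range(0, len(code), 2):
        opcode, operand = code[offset], code[offset + 1]
        if opcode not in SRC_OPCODES:
            raise ValueError(f"unknown opcode {opcode:#x} at offset {offset}")
        ir.append((SRC_OPCODES[opcode], operand))
    return ir


def emit_target(ir: List[IRInstr]) -> List[str]:
    """Back end: map each IR operation to target assembly text."""
    return [TARGET_TEMPLATES[op].format(arg) for op, arg in ir]


def translate(code: bytes) -> List[str]:
    """Whole pipeline: source bytes -> IR -> target assembly."""
    return emit_target(disassemble(code))


print(translate(bytes([0x01, 5, 0x02, 3, 0x03, 2])))
# prints ['mov r0, #5', 'add r0, r0, #3', 'mul r0, r0, #2']
```

Separating the front end from the back end through an IR is what makes a translator retargetable: supporting a new target only requires a new set of templates.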

Key Concepts

Binary translation converts machine code from a source instruction set architecture (ISA) to a target ISA, enabling execution on different platforms. The core terminology includes the source ISA, which defines the original binary's instruction format and semantics, and the target ISA, which specifies the destination architecture's instructions. The translation process typically employs a front-end for disassembly, which decodes source instructions into a higher-level form, and a back-end for code generation, which emits target machine code. An intermediate representation (IR) often bridges these stages, facilitating analysis and optimization independent of the specific ISAs.

Fundamental mechanisms ensure functional equivalence between source and target code. Instruction decoding parses the source binary to identify operations, operands, and semantics, often expanding complex instructions into simpler primitives. Register mapping assigns source registers to target registers, potentially using memory for overflow when register counts differ, while preserving dependencies. Control flow preservation is critical: it involves reconstructing branches, function calls, and exceptions to maintain program semantics, for example by inserting traps or handlers for interrupts.

Binary translation faces unique challenges due to low-level code characteristics. Self-modifying code, where instructions alter themselves at runtime, complicates static analysis and requires dynamic detection and retranslation of affected regions. Indirect jumps, whose targets are computed at runtime, hinder precise control-flow graph construction and demand runtime resolution mechanisms like dispatchers. Architecture-specific features, such as differing floating-point precisions or SIMD extensions, necessitate careful emulation or approximation to avoid precision loss.

Optimization passes enhance translated code efficiency without altering behavior. Dead code elimination removes unused instructions or computations identified through liveness analysis on the IR. Instruction scheduling reorders operations to minimize pipeline stalls, exploiting parallelism within basic blocks while respecting dependencies. These passes are applied after decoding. In static binary translation they rely on static analysis alone, while dynamic binary translation can also incorporate runtime information, such as just-in-time profiling data, for further optimization.
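Dead code elimination driven by liveness analysis can be sketched as a single backward pass over one straight-line block. The IR below is hypothetical (tuples of destination, operation, and sources), and the sketch assumes every instruction has exactly one destination and no other side effects; stores, calls, and faulting instructions would need to be kept unconditionally. The example mirrors a common situation in translated code, where a source instruction updates condition flags that are overwritten before anything reads them.

```python
from typing import List, Set, Tuple, Union

Operand = Union[str, int]                      # register/temporary name or immediate
Instr = Tuple[str, str, Tuple[Operand, ...]]   # (dest, op, sources)


def eliminate_dead_code(block: List[Instr], live_out: Set[str]) -> List[Instr]:
    """Drop instructions whose result is never read.

    `live_out` names the values still needed after the block ends.
    Walking backward, a definition is kept only if its destination is live
    at that point; its sources then become live in turn.
    """
    live = set(live_out)
    kept: List[Instr] = []
    for dest, op, srcs in reversed(block):
        if dest not in live:
            continue                           # result is overwritten or unused
        live.discard(dest)                     # this definition satisfies the use
        live.update(s for s in srcs if isinstance(s, str))
        kept.append((dest, op, srcs))
    kept.reverse()
    return kept


block = [
    ("r0", "add", ("r0", 1)),
    ("flags", "cmp0", ("r0",)),   # dead: flags are recomputed below
    ("r0", "add", ("r0", 2)),
    ("flags", "cmp0", ("r0",)),
]
print(eliminate_dead_code(block, {"r0", "flags"}))
# prints [('r0', 'add', ('r0', 1)), ('r0', 'add', ('r0', 2)), ('flags', 'cmp0', ('r0',))]
```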

Historical Development

Origins and Early Systems

Binary translation emerged in the 1960s as a solution for software compatibility during hardware transitions in the mainframe era. One of the earliest documented systems was Honeywell's Liberator, introduced in 1963, which translated IBM 1401 programs into equivalent instructions for the Honeywell Series 200 computers. This tool addressed the obsolescence of the IBM 1401 by enabling customers to migrate their existing applications to Honeywell's faster architecture without rewriting code, focusing primarily on mainframe environments where hardware upgrades were costly and disruptive.

In the 1980s, binary translation gained traction for minicomputer migrations, exemplified by Hewlett-Packard's Object Code Translator (OCT), developed in 1987. OCT facilitated the shift from the HP 3000 Series running MPE V to the new HP Precision Architecture systems, such as the Series 930 and 950 under MPE XL, by converting object code from the older instruction set into native executable modules. Designed for simple single-file translations, it handled applications without requiring recompilation, emphasizing compatibility in commercial computing settings where older minicomputers were becoming obsolete. This approach provided 2-5 times the performance of emulation by generating optimized native code that leveraged the new architecture's 32 general-purpose registers.

By the early 1990s, more sophisticated systems tackled complex architectural differences, as seen in Digital Equipment Corporation's VEST translator, released in 1993. VEST converted VAX binaries to run on Alpha AXP processors, addressing challenges such as instruction mapping and timing preservation to ensure near-native performance. Written in C++ and supported by the Translated Image Environment (TIE) runtime, it enabled migration from VAX minicomputers to the 64-bit Alpha architecture amid hardware evolution.

Early systems like VEST highlighted key limitations, including inadequate support for atomic operations in multitasking environments, intricate OS interactions such as calling standards, and read-write memory behavior that could affect program correctness. These challenges arose from the need to maintain atomicity and granularity in translated code without incurring the overhead of full emulation.

Key Milestones and Modern Tools

In the early 2000s, the Transmeta Crusoe processor marked a significant milestone in dynamic binary translation by implementing a software layer known as Code Morphing Software to translate x86 instructions into native VLIW instructions on its underlying hardware, enabling full x86 compatibility while optimizing for low power consumption in mobile devices. This approach, introduced in 2000, demonstrated the practical viability of runtime translation for bridging complex instruction set architectures in commercial processors. Later in the decade, Apple's Rosetta, released in 2006 as part of the transition from PowerPC to Intel x86 processors in Macintosh computers, provided dynamic translation that allowed legacy PowerPC applications to run on Intel-based systems without recompilation.

The 2010s saw continued evolution with tools emphasizing cross-platform emulation and performance. QEMU's Tiny Code Generator (TCG), integrated into the emulator starting around 2008 and refined through the decade, facilitated cross-ISA binary translation by converting guest instructions into an intermediate representation before generating host code, supporting efficient emulation across diverse architectures such as x86 to ARM. In 2020, Apple's Rosetta 2 extended this legacy for the shift to Apple silicon, translating x86-64 binaries to ARM64 with ahead-of-time translation and caching, achieving approximately 78-80% of native performance in many workloads on M1 chips.

Advancements in the 2020s focused on open-source and Linux-centric solutions for emerging hardware. FEX-Emu, launched in 2021, emerged as a high-performance user-mode emulator for running x86 and x86-64 Linux applications on ARM64 systems, leveraging dynamic translation with adaptive caching to support gaming and productivity software. By 2023, integrations of LLVM backends in binary translators, such as in hybrid systems like MFHBT, enabled retargetable translation pipelines that lift binaries to LLVM IR for multi-stage optimization and feedback-driven improvements, reducing memory accesses by up to 81% in benchmarks.

Modern tools continue to build on these foundations for instrumentation and ecosystem support. DynamoRIO, a dynamic binary instrumentation framework first publicly released in 2002 and maintained through ongoing updates, provides a platform for runtime code manipulation and analysis across x86 and ARM, powering analysis and optimization tools with low overhead. Microsoft's x86-to-ARM translator, enhanced in updates around 2022 and formalized as the Prism emulation layer by 2024, just-in-time compiles x86/x64 code to ARM64 with optimizations for compatibility, enabling unmodified Windows applications to run on ARM devices while improving support for vector instructions like AVX.

In June 2025, Apple announced at WWDC that macOS 26 (Tahoe) would be the last release to support Intel-based Macs and that Rosetta 2 would remain available as a general-purpose translation tool through macOS 27, after which only a subset of its functionality would be retained for select older games, marking the final phase of the transition to Apple silicon. Recent work as of 2025 continues to advance hybrid static-dynamic binary translation, combining ahead-of-time static lifting with runtime adjustments for optimized performance on heterogeneous hardware, as demonstrated in systems like BP-QEMU, which improves execution efficiency through branch prediction.

Motivations

Compatibility and Migration

Binary translation serves a primary role in instruction set architecture (ISA) migrations by enabling the execution of legacy binaries on new hardware platforms without requiring recompilation. This capability is essential during CPU upgrades, where organizations aim to adopt more efficient architectures while maintaining backward compatibility with established software ecosystems. For example, Digital Equipment Corporation's transition from VAX to Alpha AXP used binary translation to port applications, allowing existing binaries to execute on the new RISC-based processors. Such migrations preserve investments in legacy code, which often represents decades of development.

In addition to ISA shifts, binary translation addresses OS and ecosystem compatibility challenges, particularly application binary interface (ABI) differences, system calls, and library dependencies during cross-platform ports. For instance, translating from x86 to ARM requires mapping divergent calling conventions, memory access patterns, and OS-specific semantics to ensure functional equivalence on the host system. This matters wherever dynamic translation layers convert x86 instructions to ARM64 equivalents, since they must accommodate differences such as ARM's weaker memory model to support diverse software stacks.

Practical use cases demonstrate binary translation's versatility across industries. In enterprise settings, it facilitates migrations from older systems to newer infrastructures, as seen in historical efforts like the VAX-to-Alpha ports that enabled applications to run on modern hardware without source modifications. In gaming, it supports backward compatibility for older titles on new consoles, and optimized translation techniques accelerate x86 emulation on ARM-based mobile or handheld devices. For embedded systems updates, specialized dynamic translators adapt binaries to resource-constrained processors, ensuring continuity during hardware refreshes in industrial and automotive applications.

The approach offers significant benefits for developers, particularly for closed-source applications where source code is unavailable or proprietary, reducing porting timelines and costs compared to full rewrites. However, ensuring semantic equivalence poses challenges, especially for non-deterministic behaviors like threading and concurrency, where architectural differences, such as memory ordering in x86 versus ARM, can introduce discrepancies in parallel execution. Translators must emulate these aspects precisely to avoid behavioral deviations, often requiring careful handling of atomic operations and thread synchronization.

Performance Considerations

Binary translation introduces several sources of overhead that affect overall system efficiency. Translation time is an initial cost in static approaches, where the entire binary must be processed upfront, potentially delaying application startup. In dynamic translation, runtime overhead arises from on-the-fly translation and management of code caches, including the cost of evicting and reloading translated fragments. Code size expansion is also common, with translated binaries often growing by a factor of 1.46x or more due to differences in instruction encoding and the need to emulate complex semantics, leading to increased memory usage and potential cache pressure.

Performance metrics for binary translation vary by approach and optimization level. Static binary translation typically achieves 60-80% of native execution speed on large benchmarks, as exemplified by a reported 67% of native performance for peephole-optimized translations of PowerPC to x86 code. Dynamic binary translation, leveraging just-in-time (JIT) compilation and caching, often reaches 80-95% of native speed for steady-state execution, and overall slowdowns can be minor in optimized systems like Rosetta 2.

Several factors influence the efficiency of binary translation. Differences in instruction density between source and target ISAs can lead to expanded code, reducing fetch efficiency and increasing cache misses. Branch prediction accuracy is affected by translation-induced changes in code layout, potentially degrading predictor effectiveness and incurring more misprediction penalties. Cache pollution occurs when translated code fragments evict useful instructions or data, exacerbating misses in shared caches, particularly in dynamic systems with frequent code updates.

Binary translation involves inherent trade-offs between static and dynamic methods. Static translation provides predictable performance without runtime overhead but demands complete upfront analysis, limiting adaptability to self-modifying code or dynamic loads. Dynamic translation offers flexibility and runtime adaptations, such as profile-guided optimizations, but suffers initial slowdowns from translation and caching during warmup phases.

Broader impacts of binary translation extend to resource-constrained environments. In mobile and embedded devices, performance overheads directly increase energy consumption, as slower execution prolongs CPU activity and raises power draw; optimized translations can mitigate this by shortening execution time. Scalability for large applications is challenged by code cache management and memory demands: persistent caching helps sustain performance but risks bloat in systems with vast code footprints.
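The warmup trade-off in dynamic translation can be made concrete with simple break-even arithmetic: translating a block costs a one-time amount, after which each execution is cheaper than interpreting it. The sketch below is a simplified cost model, not a measurement; the function name and the cycle counts in the example are hypothetical, and real systems also pay for cache lookups, evictions, and dispatch.

```python
import math
from typing import Optional


def break_even_executions(translate_cost: float,
                          interp_cost: float,
                          translated_cost: float) -> Optional[int]:
    """Smallest execution count n for which translating is no slower overall.

    Translation pays off once
        translate_cost + n * translated_cost <= n * interp_cost,
    that is, n >= translate_cost / (interp_cost - translated_cost).
    Returns None when translated code is not faster per execution.
    """
    saving_per_run = interp_cost - translated_cost
    if saving_per_run <= 0:
        return None
    return math.ceil(translate_cost / saving_per_run)


# Hypothetical numbers: 1000 cycles to translate a block, 20 cycles per
# interpreted execution, 2 cycles per translated execution.
print(break_even_executions(1000, 20, 2))  # prints 56
```

This is why many dynamic translators only translate code once an execution counter marks it as hot: blocks that run fewer times than the break-even count are cheaper to interpret.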

Static Binary Translation

Process and Techniques

Static binary translation is an offline process that disassembles the entire source binary ahead of time, reconstructing its control flow and data dependencies to generate a complete executable for the target architecture. It begins with disassembly, using tools such as IDA Pro, to recover the instruction stream and build a control-flow graph (CFG), identifying basic blocks, functions, and call graphs without executing the program.

Key techniques include instruction mapping, where source instructions are mapped to semantically equivalent target instructions, often via an intermediate representation (IR) like LLVM IR to facilitate retargeting across ISAs. Register allocation addresses mismatches in register counts or semantics by spilling to memory or remapping, while address translation handles differences in memory models, such as mapping x86 segment registers to the flat addressing of RISC architectures. Control flow recovery resolves indirect branches and jumps through jump-target identification, though unresolved targets may require runtime resolution stubs.

Optimization passes, such as peephole rewriting, eliminate redundancies and apply target-specific idioms after mapping, improving code density and performance. Handling dynamic elements like self-modifying code or dynamic linking often requires assuming static behavior or adopting hybrid approaches with minimal runtime support, since fully static translation assumes the code does not modify itself. External references, such as library calls, are resolved by linking against target libraries or providing wrappers.

The output is a standalone target binary that executes directly without translation overhead, though the initial translation time can be significant for large programs. Frameworks like QEMU's user-mode emulation can incorporate static modes, but pure static tools focus on complete recompilation for portability.
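The first step of control-flow graph construction, splitting a decoded instruction stream into basic blocks, can be sketched with the classic leader rule: the first instruction, every branch target, and every instruction following a branch or return starts a new block. The instruction tuples and mnemonics below are hypothetical, and indirect branches are represented with a target of `None`, which is exactly the case that static analysis cannot resolve on its own.

```python
from typing import Dict, List, Optional, Tuple

# (address, mnemonic, branch_target); branch_target is None for
# non-branches and for indirect branches with unknown targets.
Instr = Tuple[int, str, Optional[int]]

# Hypothetical mnemonics that end a basic block.
TERMINATORS = {"jmp", "jz", "jnz", "ret"}


def split_basic_blocks(instrs: List[Instr]) -> Dict[int, List[Instr]]:
    """Partition a linear instruction list into basic blocks keyed by start address."""
    if not instrs:
        return {}
    leaders = {instrs[0][0]}
    for i, (_addr, mnemonic, target) in enumerate(instrs):
        if mnemonic in TERMINATORS:
            if target is not None:
                leaders.add(target)               # branch target starts a block
            if i + 1 < len(instrs):
                leaders.add(instrs[i + 1][0])     # so does the fall-through
    blocks: Dict[int, List[Instr]] = {}
    current: List[Instr] = []
    for ins in instrs:
        if ins[0] in leaders:
            current = blocks.setdefault(ins[0], [])
        current.append(ins)
    return blocks


program = [
    (0, "mov", None),
    (1, "jz", 4),
    (2, "add", None),
    (3, "jmp", 5),
    (4, "sub", None),
    (5, "ret", None),
]
print(sorted(split_basic_blocks(program)))  # prints [0, 2, 4, 5]
```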

Examples

A notable modern application occurred in 2014, when developer "notaz" performed a static recompilation of the 1998 game StarCraft from x86 to the ARM architecture, facilitating its port to handheld devices like the OpenPandora without access to source code. This effort involved analyzing and directly translating the binary to generate an equivalent executable, demonstrating static translation's utility for migrating legacy games to mobile platforms.

Among open-source tools, RevGen, developed in the early 2010s at EPFL, is a retargetable static binary translator that lifts x86 binaries to LLVM intermediate representation (IR), enabling cross-architecture analysis and optimization without source code. Similarly, McSema, released by Trail of Bits starting in 2014, is an executable lifter that statically translates x86 and x86-64 binaries to LLVM bitcode, supporting both Linux and Windows formats for tasks like decompilation and recompilation.

A practical case study illustrating outcomes is the 2014 static recompilation of Cube World's x86 terrain generation binary to ARM and other architectures, part of an open-server implementation project. This translation converted the original executable's code sections into portable C++ equivalents, allowing terrain data to be generated across platforms, with supporting code to handle relocations and flags.

In practice, static binary translation faces limitations when dealing with obfuscated or packed binaries, as these techniques disrupt disassembly and control-flow recovery, often leading to incomplete or erroneous translations. For instance, packers commonly employ code encryption and dynamic unpacking that evade static analysis, requiring additional dynamic techniques for resolution.

Dynamic Binary Translation

Process and Techniques

Dynamic binary translation operates through a runtime process that disassembles guest code blocks on demand, often in the form of traces (sequences of frequently executed instructions), into an intermediate representation (IR). This IR is then optimized and compiled just-in-time (JIT) into host-native code, which is executed and stored in a code cache for reuse, minimizing repeated translation overhead. The process typically begins with an interpreter, or a similarly low-cost execution mechanism, that runs initial code fragments until a hot path is detected, triggering translation to avoid interpretive slowdowns.

Key techniques include trace selection, where execution counters identify hot code paths based on branch frequencies, prioritizing translation of these paths to focus resources on performance-critical regions. Binary instrumentation inserts profiling code during disassembly to gather runtime data, such as branch outcomes or memory accesses, enabling adaptive decisions without halting execution. Runtime optimizations, like loop unrolling, expand repetitive structures in traces to reduce branch overhead and improve instruction-level parallelism during JIT compilation.

To handle program dynamism, dynamic binary translators employ speculation for conditional branches, predicting paths and generating code accordingly, with fallback mechanisms, such as cache exits to the interpreter, that ensure correctness when mispredictions occur. Syscall integration involves intercepting system calls and emulating them on the host OS via wrappers that preserve state and handle asynchronous events like signals.

Optimization passes leverage profile data from instrumentation to guide retranslation of traces, refining code based on observed behaviors like branch frequencies. Vectorization transforms scalar operations in the IR to single instruction, multiple data (SIMD) equivalents on the host, exploiting wider vector units for data-parallel workloads when guest instructions align. Garbage collection of the code cache evicts cold traces using heuristics like least-recently-used or generational policies, reclaiming space to prevent fragmentation and maintain translation efficiency.

Frameworks like Valgrind facilitate dynamic binary instrumentation by translating code to an intermediate representation, inserting tool-specific analysis code, and resynthesizing host code, emphasizing heavyweight analysis over lightweight speed.
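The interplay of interpretation, hot-path detection, and the code cache can be sketched with a toy translator. Everything in it is an assumption made for illustration: the guest "ISA" has only `add` and `mul` on one accumulator, each block ends with a single conditional exit, the class name `ToyDBT` is hypothetical, and "host code" is simulated by compiling a Python lambda rather than emitting machine instructions.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A guest block: straight-line ops, then an exit (limit, taken_pc, fall_pc)
# meaning "continue at taken_pc while acc < limit, otherwise at fall_pc".
Exit = Tuple[int, Optional[int], Optional[int]]
Block = Tuple[List[Tuple[str, int]], Exit]
Compiled = Callable[[int], Tuple[int, Optional[int]]]


class ToyDBT:
    def __init__(self, blocks: Dict[int, Block], hot_threshold: int = 3):
        self.blocks = blocks
        self.hot_threshold = hot_threshold
        self.counters: Dict[int, int] = {}        # per-block execution counters
        self.code_cache: Dict[int, Compiled] = {}
        self.interpreted = 0
        self.translations = 0

    def _interpret(self, pc: int, acc: int) -> Tuple[int, Optional[int]]:
        ops, (limit, taken, fall) = self.blocks[pc]
        for op, arg in ops:                       # decode and execute every time
            acc = acc + arg if op == "add" else acc * arg
        self.interpreted += 1
        return acc, (taken if acc < limit else fall)

    def _translate(self, pc: int) -> Compiled:
        ops, (limit, taken, fall) = self.blocks[pc]
        expr = "acc"
        for op, arg in ops:                       # decode once, emit one expression
            expr = f"({expr} {'+' if op == 'add' else '*'} {int(arg)})"
        body = eval(f"lambda acc: {expr}")        # stand-in for host code generation
        self.translations += 1

        def translated(acc: int) -> Tuple[int, Optional[int]]:
            acc = body(acc)
            return acc, (taken if acc < limit else fall)

        return translated

    def run(self, pc: Optional[int], acc: int = 0) -> int:
        while pc is not None:
            fn = self.code_cache.get(pc)
            if fn is None:
                count = self.counters[pc] = self.counters.get(pc, 0) + 1
                if count >= self.hot_threshold:   # block became hot: translate it
                    fn = self.code_cache[pc] = self._translate(pc)
            acc, pc = fn(acc) if fn else self._interpret(pc, acc)
        return acc


# One block that adds 1 and loops back to itself while acc < 10.
dbt = ToyDBT({0: ([("add", 1)], (10, 0, None))}, hot_threshold=3)
print(dbt.run(0), dbt.interpreted, dbt.translations)  # prints 10 2 1
```

The block runs ten times but is decoded only three times: twice by the interpreter and once by the translator, after which every execution is served from the code cache.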

Software Implementations

Software implementations of dynamic binary translation primarily involve just-in-time (JIT) compilers and emulators that translate and execute guest instructions on the host CPU at runtime, enabling cross-architecture compatibility without hardware assistance. These systems often employ code caches to reuse translated blocks, reducing overhead for frequently executed code paths. Notable examples include frameworks optimized for user-mode emulation, full-system emulation, and instrumentation.

Apple's Rosetta 2, introduced in 2020 with the transition to Apple silicon, serves as a translator for running x86-64 applications on ARM-based Macs. It performs ahead-of-time (AOT) translation for static code and JIT translation for dynamically generated code, such as the output of just-in-time compilers, storing translated binaries in a cache to achieve near-native performance, typically 78-90% of equivalent ARM-native execution in benchmarks across various workloads. This caching mechanism minimizes repeated translation, allowing most x86 programs to run efficiently after an initial translation phase.

QEMU, developed since 2003, uses its Tiny Code Generator (TCG) as a dynamic translation backend for full-system and user-mode emulation across multiple instruction set architectures (ISAs). TCG breaks down guest instructions into intermediate micro-operations, which are then optimized and emitted as host-native code blocks stored in a translation cache, supporting translations like MIPS to x86 with features for handling self-modifying code and exceptions. This portable approach enables QEMU to emulate entire operating systems, such as running ARM guest systems on x86 hosts, while maintaining reasonable performance through block chaining and optimizations.

The Dynamo project from Hewlett-Packard Laboratories in the late 1990s pioneered dynamic optimization via binary translation on PA-RISC processors under HP-UX. It interpreted code to identify hot traces (frequently executed paths) and translated them into optimized fragments stored in a software code cache, applying optimizations like redundancy elimination to yield average speedups of 7-12% on SPECint95 benchmarks. Building on this, DynamoRIO, released in 2002, evolved into an open-source dynamic instrumentation framework for x86 on Windows and Linux, allowing clients to insert code for analysis and optimization with minimal overhead, achieving up to 40% performance gains in select cases through adaptive code modification. It has been widely adopted for research prototypes and security tools, such as intrusion detection.

More recent developments include FEX-Emu, launched in 2021 as an open-source usermode emulator for x86 and x86-64 binaries on ARM64 hosts. It focuses on low-overhead execution for gaming and desktop applications, supporting Wine and Proton for Windows titles through API forwarding (e.g., OpenGL, Vulkan) and an experimental code cache to reduce stuttering, while maintaining broad compatibility with 32- and 64-bit binaries on Linux distributions such as Ubuntu. FEX-Emu achieves this via a fast translation pipeline optimized for ARMv8+ hardware, enabling practical performance for demanding workloads like commercial games.

Beyond specific tools, dynamic binary translation underpins broader applications in security, where systems like DynamoRIO enable reversible execution and taint analysis for vulnerability detection; in virtualization, as in QEMU's full-system emulation for OS migration; and in reverse engineering, facilitating cross-platform binary inspection and instrumentation without source access. These uses leverage translation caches and instrumentation to balance accuracy and efficiency in analyzing opaque executables.
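The idea of breaking guest instructions into simpler micro-operations before host code generation can be sketched as follows. This is a hypothetical lowering, not QEMU's actual TCG operation set or API: the micro-op names (`ld`, `st`), the temporary `t0`, and the bracket syntax for memory operands are all invented for the example. It lowers a CISC-style two-operand instruction, which may read and write memory in a single instruction, into load/operate/store primitives that a RISC-style back end can emit directly.

```python
from typing import List, Tuple

MicroOp = Tuple[str, ...]

ALU_OPS = {"add", "sub", "and", "or", "xor"}


def is_mem(operand: str) -> bool:
    """Memory operands are written as [register] in this toy syntax."""
    return operand.startswith("[") and operand.endswith("]")


def lower_to_micro_ops(mnemonic: str, dst: str, src: str) -> List[MicroOp]:
    """Lower `mnemonic dst, src` (meaning dst = dst op src) into micro-ops."""
    if mnemonic not in ALU_OPS:
        raise ValueError(f"unsupported mnemonic: {mnemonic}")
    if is_mem(dst) and is_mem(src):
        raise ValueError("at most one memory operand is allowed")
    ops: List[MicroOp] = []
    if is_mem(src):
        ops.append(("ld", "t0", src[1:-1]))        # t0 = mem[src register]
        src = "t0"
    if is_mem(dst):
        addr = dst[1:-1]
        ops.append(("ld", "t0", addr))             # read
        ops.append((mnemonic, "t0", "t0", src))    # modify
        ops.append(("st", "t0", addr))             # write back
    else:
        ops.append((mnemonic, dst, dst, src))
    return ops


print(lower_to_micro_ops("add", "[rbx]", "rax"))
# prints [('ld', 't0', 'rbx'), ('add', 't0', 't0', 'rax'), ('st', 't0', 'rbx')]
```

Working on a small, regular set of micro-ops means that optimizations and host back ends only need to understand a handful of primitives instead of every guest instruction form.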

Hardware Implementations

Hardware implementations of dynamic binary translation (DBT) integrate specialized processor circuitry and architectural features to accelerate runtime translation, minimizing the overhead of decoding, optimization, and code generation compared to software-only systems. These approaches often involve co-designed hardware and software, where dedicated units handle initial decoding or caching of translated micro-operations, enabling compatibility across instruction set architectures (ISAs) while optimizing for power and performance. Early examples focused on VLIW-based hosts to exploit instruction-level parallelism in translated code, while modern designs leverage caches and buffers to reduce re-translation costs.

A pioneering hardware implementation is the Transmeta Crusoe processor family, launched in 2000, which featured VLIW cores with integrated support for a dynamic translator that emulated x86 instructions. The Code Morphing Software (CMS) layer interpreted and translated x86 binaries into native VLIW code, speculatively optimizing for common execution paths to reduce power consumption in mobile applications; this co-design achieved near-native performance for many workloads while simplifying hardware complexity. The successor, Efficeon, introduced in 2003, enhanced this architecture with wider issue widths and improved translation caching, further boosting efficiency for x86 compatibility on non-x86 silicon.

IBM's DAISY (Dynamically Architected Instruction Set from Yorktown) system, developed in the 1990s, provided hardware-assisted DBT to execute binaries for existing architectures, such as System/390, on a custom VLIW host processor. DAISY used tree-structured intermediate representations for rapid translation and optimization, with hardware support for exception handling and architectural state to ensure 100% compatibility; this enabled legacy code to run on the VLIW host without recompilation, achieving up to 90% of native performance in key workloads.

Key techniques in hardware DBT include dedicated translation engines, which perform front-end tasks like instruction fetching, decoding, and basic remapping in specialized circuits to offload the main processor core. Micro-op caches, present in Intel processors since the Sandy Bridge microarchitecture (2011), store decoded micro-operations from complex CISC instructions, allowing fast retrieval and avoiding repeated decoding overhead. Hardware trace buffers, akin to trace caches in out-of-order processors, capture sequences of executed instructions or translated blocks in on-chip memory, enabling quick replay and optimization of hot code paths to improve translation throughput by up to 2-3x in simulated DBT scenarios.

In contemporary systems, ARM Cortex processors (2010s onward) incorporate features like enhanced branch prediction and configurable cache hierarchies that facilitate efficient JIT compilation, supporting software translators in low-power embedded environments without dedicated DBT units. Similarly, Intel's ongoing refinements to micro-op caches in its processor families of the 2020s provide indirect acceleration for software translators by streamlining the handling of translated instruction streams.

References

  1. [1]
    [PDF] binary-translation.pdf
    A translated binary program is a sequence of new-architecture in- structions that reproduces the behav- ior of an old-architecture program. Typically, much of ...Missing: science | Show results with:science
  2. [2]
    [PDF] Dynamic Binary Translation - Compilers and Languages
    Dynamic binary translation is the process of translating code for one instruction set architecture to another on the fly.Missing: computer | Show results with:computer
  3. [3]
    [PDF] Machine-Adaptable Dynamic Binary Translation-
    Dynamic binary translation is the process of translating and optimizing executable code for one machine to another at runtime, while the program is "executing" ...
  4. [4]
    [PDF] Binary-to-Binary Translation Literature Survey
    Mar 16, 1998 · In this paper, we will briefly review the history of binary translation in section II. In section III, we will discuss the alternatives to ...
  5. [5]
  6. [6]
    [PDF] P3.An Overview on Binary Translation | PEPCC
    From Hack to Elaborate Technique – A Survey on Binary Rewriting. ACM Comput. Surv. 52, 3, Article 49 (June 2019), 37 pages. 6. Page 7. Binary Translation for ...
  7. [7]
    Experience in the design, implementation and use of a retargetable ...
    Binary translation, the process of translating binary executables, makes it possible to run code compiled for source (input) machine Ms on target (output) ...
  8. [8]
    Machine-adaptable dynamic binary translation - ACM Digital Library
    Dynamic binary translation is the process of translating and optimizing executable code for one machine to another at runtime, while the program is "executing" ...
  9. [9]
    Hardware-accelerated dynamic binary translation - ACM Digital ...
    Abstract—Dynamic Binary Translation (DBT) is often used in hardware/software co-design to take advantage of an architecture model while using binaries from ...
  10. [10]
  11. [11]
    [PDF] Virtual Machines and Binary Translation
    May 4, 2016 · More SBT problems: Self-modifying code. • Rare in most code, but has to be handled if allowed by guest ISA. • Usually handled by including ...
  12. [12]
    Microprogramming History -- Mark Smotherman - Clemson University
    ... conversion tools, like the Honeywell "Liberator" program that accepted IBM 1401 programs and converted them into programs for the 1401-like Honeywell H-200.
  13. [13]
    [PDF] hewlett - vtda.org
    Dec 8, 1987 · MPE V machine emulation is supported by the HP 3000. Emulator and the HP 3000 Object Code Translator (OCT).8. The emulator is a program that ...
  14. [14]
    Binary translation | Communications of the ACM - ACM Digital Library
    Low overhead dynamic binary translation on ARM. The ARMv8 architecture introduced AArch64, a 64-bit execution mode with a new instruction set, while retaining ...
  15. [15]
    [PDF] The Technology Behind Crusoe™ Processors
    The software layer is called Code Morphing™ software because it dynamically “morphs” x86 instructions into VLIW instructions. The Code Morphing software ...
  16. [16]
    [PDF] The Transmeta Code Morphing Software: Using Speculation ...
    Transmeta's Crusoe microprocessor is a full, system- level implementation of the x86 architecture, comprising a native VLIW microprocessor with a software ...
  17. [17]
    Apple Unveils New MacBook Featuring Intel Core Duo Processors
    PRESS RELEASE May 16, 2006. Apple Unveils New MacBook Featuring Intel Core Duo Processors ... ***See https://www.apple.com/rosetta/ for information on ...
  18. [18]
    Translator Internals — QEMU documentation
    QEMU's dynamic translation backend is called TCG, for “Tiny Code Generator”. For more information, please take a look at TCG Intermediate Representation.
  19. [19]
    How x86 to arm64 Translation Works in Rosetta 2 - InfoQ
    Nov 30, 2020 · Thanks to Rosetta 2, most x86 programs will be able to execute after an initial translation step. Apple started to use binary translation ...
  20. [20]
    FEX-Emu/FEX: A fast usermode x86 and x86-64 emulator for Arm64 ...
    FEX allows you to run x86 applications on ARM64 Linux devices, similar to qemu-user and box64. It offers broad compatibility with both 32-bit and 64-bit ...
  21. [21]
    (PDF) MFHBT: Hybrid Binary Translation System with Multi-stage ...
    We implement a prototype of this new system powered by LLVM. Experimental results demonstrate an 81% decrease in the number of memory access instructions and a ...
  22. [22]
    History of DynamoRIO
    DynamoRIO originated from MIT and HP in 2001, was used by Determina, acquired by VMware in 2007, and open-sourced in 2009.
  23. [23]
  24. [24]
    Binary Translation and Cross-architecture compatibility with focus on ...
    Sep 18, 2025 · This paper provides a comprehensive review of Binary Translation and Cross-architecture, focusing on operating system-level implementations.
  25. [25]
    [PDF] Instruction Set Migration at Warehouse Scale - arXiv
    Oct 16, 2025 · Modern ISA migrations can often build on a robust open-source ecosystem, making it possible to recompile all relevant software from scratch.
  26. [26]
    A Dynamic and Static Binary Translation Method Based on Branch ...
    Jul 10, 2023 · Binary translation is a technique that automatically translates code from a target architecture into functionally equivalent code for a host ...
  27. [27]
    Porting OpenVMS from VAX to Alpha AXP - ACM Digital Library
    Porting OpenVMS from VAX to Alpha AXP · Formats available · References · Cited By · Index Terms · Recommendations · Comments · Information & Contributors.
  28. [28]
    Static/dynamic real-time legacy software migration
    Oct 30, 2020 · Binary translation can address this incompatibility by migrating applications from one legacy ISA to a new one, although binary translation has ...
  29. [29]
    A Dynamic Binary Translator for Weak Memory Model Architectures
    If we translate the MP program's binary from x86 to Arm, without taking their memory models into account, the resulting Arm binary may exhibit undesirable ...
  30. [30]
    Dynamic binary translation specialized for embedded systems
    This paper describes the design and implementation of a novel dynamic binary translation technique specialized for embedded systems.
  31. [31]
    ARMing x86 Games: Accelerating Binary Translation Using Software ...
    Sep 25, 2025 · We propose a novel optimization method that enhances compatibility and performance by leveraging software-only strategies tailored to ARM ...
  32. [32]
    No Source Code? No Problem! - ACM Queue
    Oct 2, 2003 · What if you have to port a program, but all you have is a binary? Typical software development involves one of two processes: the creation of ...
  33. [33]
  34. [34]
    An Instruction Inflation Analyzing Framework for Dynamic Binary ...
    Mar 23, 2024 · Dynamic binary translation enables applications built for a guest ISA to run on a host ISA machine, with uses in several areas.
  35. [35]
    [PDF] Binary Translation Using Peephole Superoptimizers - USENIX
    This paper presents a new binary translation scheme that automatically learns translation rules using superoptimization techniques and peephole rules.
  36. [36]
    [PDF] Using Dynamic Binary Translation to Fuse Dependent Instructions
    fect the instruction set and dynamic binary translation. In this ... Instruction Density. An ISA with good coding density can reduce instruction ...
  37. [37]
    [PDF] HDTrans: A Low-Overhead Dynamic Translator
    In order to reduce register pressure and cache pollution, the sieve is implemented using blocks of instructions rather than blocks of data. An indirect ...
  38. [38]
    [PDF] Fast Binary Translation: Translation Efficiency and Runtime Efficiency
    Fast binary translation is a key component for modern software, using dynamic translation at runtime. fastBT is a generator for low-overhead, table-based ...
  39. [39]
    [PDF] Efficient and Retargetable Dynamic Binary Translation
    Dynamic binary translation (DBT) is a core technology to many important applications such as system virtualization, dynamic binary instrumentation and ...
  40. [40]
    [PDF] A General Persistent Code Caching Framework for Dynamic Binary ...
    Jun 22, 2016 · 4.3 Performance Overhead and Code Size. Figure 11 shows the ... Binary Translation. Journal of Computer Research and Development 51, 10 ...
  41. [41]
    [PDF] Dynamic Binary Translation & Instrumentation
    Instrumented code needs extra registers. E.g.: • Virtual registers available to the tool. • A virtual stack pointer pointing to the instrumentation stack. • ...
  42. [42]
    [PDF] Dynamic Binary Translation and Optimization Erik R. Altman Kemal ...
    Dec 13, 2000 · • Control Speculation: Operations Above Branches. • Data Speculation: Loads above possibly aliased stores. ... DAISY Data Speculation.
  43. [43]
    [PDF] A Framework for Heavyweight Dynamic Binary Instrumentation
    Valgrind is a dynamic binary instrumentation (DBI) framework that occupies a unique part of the DBI framework design space. This paper describes how it works, ...
  44. [44]
    Optimising hot paths in a dynamic binary translator
    In dynamic binary translation, code is translated "on the fly" at run-time, while the user perceives ordinary execution of the program on the target machine.
  45. [45]
    Improving SIMD Parallelism via Dynamic Binary Translation
    This article presents a dynamic binary translation technique that enables short-SIMD binaries to exploit benefits of new SIMD architectures by rewriting short- ...
  46. [46]
    Playing StarCraft On An ARM - Hackaday
    Jul 31, 2014 · Blizzard could take the code for StarCraft, port it to an ARM ... Static recompilation, but literally recompiled! What a hack. This is ...
  47. [47]
    [PDF] Enabling Sophisticated Analyses of x86 Binaries with RevGen
    RevGen uses static binary translation to convert binary code to the widely-used LLVM IR, without relying on the source code.
  48. [48]
    lifting-bits/mcsema - GitHub
    Aug 23, 2022 · McSema is an executable lifter. It translates ("lifts") executable binaries from native machine code to LLVM bitcode.
  49. [49]
    Practical and Portable X86 Recompilation - that mat blog
    Apr 14, 2014 · Binary recompilation is a subject of intense research, but for mere mortals, recompiling binary code or executables can seem completely off-limits.
  50. [50]
    [PDF] Static Disassembly of Obfuscated Binaries - UCSB Computer Science
    The paper presents novel binary analysis techniques that substantially improve the success of the disassembly process when confronted with obfuscated binaries ...
  51. [51]
    [PDF] Binary-code obfuscations in prevalent packer tools - Paradyn Project
    By contrast, static analysis of indirect control-transfer targets is particularly difficult in packed binaries, as they frequently use instructions whose tar-.
  52. [52]
    About the Rosetta translation environment - Apple Developer
    Rosetta is a translation process that allows users to run apps that contain x86_64 instructions on Apple silicon. Rosetta is meant to ease the transition to ...
  53. [53]
    Apple Silicon M1 Emulating x86 is Still Faster Than Every Other Mac ...
    Nov 15, 2020 · Rosetta 2 running x86 code appears to be achieving 78%-79% of the performance of native Apple Silicon code. Despite the impact on performance, ...
  54. [54]
    [PDF] QEMU, a Fast and Portable Dynamic Translator - USENIX
    We present the internals of QEMU, a fast machine emulator using an original portable dynamic translator. It emulates several CPUs (x86, PowerPC, ...
  55. [55]
    [PDF] Dynamo TR
    Dynamic optimization refers to the runtime optimization of a native program binary. This report describes the design and implementation of Dynamo, ...
  56. [56]
    [PDF] An Infrastructure for Adaptive Dynamic Optimization
    The main contribution of this paper is a framework for implementing dynamic analyses and optimizations. The framework is based on the DynamoRIO dynamic code.
  57. [57]
    [PDF] Dynamic Analysis and Debugging of Binary Code for Security ...
    In this paper, we present our work on developing a cross-platform interactive analysis tool, which leverages techniques such as symbolic execution and taint ...
  58. [58]
    [PDF] DYNAMIC BINARY TRANSLATION FOR DETERMINISTIC REPLAY
    The translate and execute loop continues until the program terminates. Several optimization techniques are introduced to do this efficiently (discussed in ...
  59. [59]
    [PDF] Hardware-Accelerated Dynamic Binary Translation
    Dynamic binary translation (DBT) consists in translating – at runtime – a program written for a given instruction set to another instruction set. Dynamic ...
  60. [60]
    [PDF] Transmeta Crusoe and efficeon:
    Jan 10, 2003 · Code Morphing Software layer provides a completely compatible implementation of the x86 architecture on the embedded VLIW processor:.
  61. [61]
    [PDF] DAISY: Dynamic Compilation for 100% Architectural Compatibility
    The paper is organized as follows: We first give an example illustrating the new fast dynamic compilation algorithm used by DAISY. Next, various architectural ...
  62. [62]
    (PDF) DAISY/390: Full System Binary Translation of IBM System/390
    We describe the design issues in an implementation of the ESA/390 architecture based on binary translation to a very long instruction word (VLIW) processor.
  63. [63]
    [PDF] Hardware-Accelerated Dynamic Binary Translation - Hal-Inria
    Apr 3, 2017 · In future work, we will perform register allocation and apply optimizations such as loop unrolling and superblock formation. Thanks to the.
  64. [64]
    [PDF] I See Dead μops: Leaking Secrets via Intel/AMD Micro-Op Caches
    Modern Intel and AMD processors cache decoded micro-ops in a dedicated streaming cache, often called the decoded stream buffer or the micro-op cache, in order ...
  65. [65]
    [PDF] Evaluating the Impact of Dynamic Binary Translation Systems on ...
    The effect of dynamic binary translation is lower in the L2 cache, with an increase in the number of misses by 12% for Pin and 24% for DynamoRIO. The L1 data ...
  66. [66]
    Just-In-Time Compilation on ARM—A Closer Look at Call-Site Code ...
    This article studies how the lack of strong hardware support for Self Modifying Code (SMC) in low-power architectures (e.g., absence of cache coherence) ...
  67. [67]
    [PDF] Mobilizing the Micro-Ops: Exploiting Context Sensitive Decoding for ...
    Flushing the micro-op cache every translation mode switch could have a major performance impact. We instead choose to extend the tag bits of the micro-op cache ...