
Binary translation

Binary translation is a computing technique that recompiles machine code from a source instruction set architecture (ISA) into an equivalent form for a target ISA, enabling the execution of software binaries on incompatible processor architectures without access to the original source code.^[1] This process reconstructs the program's semantics by mapping instructions while preserving behavior, such as control flow and data dependencies, despite the absence of high-level information like types or subroutine boundaries.^[2] Binary translation serves as a key enabler for software portability, emulation, and legacy system migration, often outperforming pure interpretation by generating native executable code.^[3]

The technique divides into static and dynamic categories based on when translation occurs. Static binary translation performs a complete, offline recompilation of the entire binary prior to runtime, making it efficient for fixed code that does not modify itself but limited in handling dynamic linking, self-modifying instructions, or unresolved references.^[4] Dynamic binary translation, in contrast, operates at runtime by translating and caching small units of code (such as basic blocks or execution traces) as they are encountered, allowing adaptation to runtime behaviors like computed branches or system calls while applying optimizations to hot code paths.^[2] This on-the-fly approach incurs initial overhead but achieves better long-term performance through code reuse and profiling-driven improvements, sometimes coming within a factor of 2 to 6 of native speed.^[3]

Historically, binary translation gained prominence in the late 1980s and early 1990s for transitioning enterprise systems to new hardware, exemplified by Hewlett-Packard's offline translator from HP 3000 minicomputers to PA-RISC processors and Digital Equipment Corporation's VEST and mx systems for migrating VAX and MIPS binaries to Alpha AXP.^[1] Commercial milestones include Apple's Rosetta (2006–2012), a dynamic translator that bridged PowerPC applications to x86 during the Macintosh architecture shift, and Transmeta's Crusoe microprocessor (2000–2005), which used just-in-time binary translation to run x86 software on its custom VLIW core for power-efficient computing.^[5] Frameworks like HP Labs' Dynamo further advanced the field by integrating dynamic optimization, demonstrating up to 20% performance gains through trace-based translation.^[2]

In contemporary applications, binary translation supports virtualization by rewriting guest OS instructions to avoid hardware conflicts, as in early VMware implementations that translated x86 code for non-privileged execution.^[3] It also powers tools for binary instrumentation, such as DynamoRIO and Valgrind, which translate code to insert profiling or debugging hooks at runtime,^[2] and open-source emulators like QEMU, which uses dynamic translation for cross-platform execution.^[6] Apple's Rosetta 2 (introduced in 2020) enables running x86-64 applications on ARM-based Apple Silicon Macs via ahead-of-time and just-in-time translation. Emerging uses in embedded systems involve accelerating frequent binary loops on custom hardware via translation, yielding up to 12x speedups and 11x energy reductions while exploiting untapped instruction-level parallelism.^[7]

Core challenges persist, including precise exception handling across ISAs, efficient register allocation amid architectural mismatches, and scaling to multi-threaded or just-in-time generated code without excessive overhead.^[1]

Fundamentals

Definition

Binary translation is the process of converting sequences of machine code instructions from a source instruction set architecture (ISA) to an equivalent set for a target ISA, enabling the execution of binaries compiled for one platform on another without requiring access to the original source code.^[8] This technique allows software designed for legacy or incompatible hardware to run on modern systems, often achieving performance close to native execution by generating optimized target code.^[9] Unlike emulation, which typically involves interpreting source instructions on the fly through simulation of the original hardware state, binary translation compiles the code into native target instructions ahead of or during execution, reducing overhead from repeated interpretation.^[8] In contrast to recompilation, which rebuilds executables from high-level source code for a new architecture, binary translation operates solely on the compiled binary, preserving proprietary or unavailable source material.^[8] The scope of binary translation encompasses both static (ahead-of-time) approaches, where the entire binary is translated before execution, and dynamic (runtime) methods, which translate code on demand as the program runs.^[9] Implementations can be purely software-based or hardware-accelerated, supporting migrations between diverse architectures such as CISC to RISC.^[10] The basic workflow involves disassembling the source binary into an intermediate representation, mapping instructions and semantics to target equivalents while handling architectural differences, and reassembling the result into an executable target binary.^[8]
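The basic workflow (decode to an intermediate representation, map to the target, emit target code) can be sketched on a toy example. Everything in this sketch is hypothetical: the two-operand source syntax, the three-operand target syntax, the register map, and the function names are invented for illustration and do not correspond to any real ISA or translator.

```python
# Toy decode -> IR -> emit pipeline. Both "ISAs" are invented for illustration.
from typing import List, Tuple

IR = Tuple[str, str, str, str]  # (operation, destination, source1, source2)

# Assumed fixed mapping from source registers to target registers.
REG_MAP = {"r1": "x0", "r2": "x1", "r3": "x2"}


def decode(source: List[str]) -> List[IR]:
    """Front end: parse two-operand source text, where 'add r1, r2' means r1 = r1 + r2."""
    ir = []
    for line in source:
        op, operands = line.split(maxsplit=1)
        dst, src = [s.strip() for s in operands.split(",")]
        # Make the implicit first source operand explicit in the IR.
        ir.append((op, dst, dst, src))
    return ir


def emit(ir: List[IR]) -> List[str]:
    """Back end: print three-operand target text using the register map."""
    return [
        f"{op.upper()} {REG_MAP[dst]}, {REG_MAP[a]}, {REG_MAP[b]}"
        for op, dst, a, b in ir
    ]


def translate(source: List[str]) -> List[str]:
    """Whole pipeline: source text in, target text out."""
    return emit(decode(source))
```

Calling `translate(["add r1, r2", "sub r3, r1"])` yields `["ADD x0, x0, x1", "SUB x2, x2, x0"]`. A real translator works on encoded bytes rather than text and must also handle flags, memory operands, and control flow, none of which this sketch attempts.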

Key Concepts

Binary translation involves converting machine code from a source instruction set architecture (ISA) to a target ISA, enabling execution on different hardware platforms.^[11] The core terminology includes the source ISA, which defines the original binary's instruction format and semantics, and the target ISA, which specifies the destination architecture's instructions for optimized execution.^[2] The translation process typically employs a front-end for disassembly, which decodes source instructions into a higher-level form, and a back-end for code generation, which emits target machine code.^[11] An intermediate representation (IR) often bridges these stages, facilitating analysis and optimization independent of the specific ISAs. Fundamental mechanisms ensure functional equivalence between source and target code. Instruction decoding parses the source binary to identify operations, operands, and semantics, often expanding complex instructions into simpler primitives.^[2] Register allocation maps source registers to target registers, potentially using memory for overflow or to align with differing register counts, while preserving data dependencies.^[11] Control flow preservation is critical, involving the reconstruction of branches, function calls, and exception handling to maintain program semantics, such as by inserting traps or handlers for interrupts. Binary translation faces unique challenges due to low-level code characteristics. 
Self-modifying code, where instructions alter themselves at runtime, complicates static analysis and requires dynamic detection and retranslation of affected regions.^[12] Indirect jumps, whose targets are computed at runtime, hinder precise control flow graphing and demand runtime resolution mechanisms like dispatchers.^[2] Architecture-specific features, such as varying floating-point instruction precisions or vector extensions, necessitate careful emulation or approximation to avoid precision loss.^[11] Optimization passes enhance translated code efficiency without altering behavior. Dead code elimination removes unused instructions or computations identified through liveness analysis on the IR. Instruction scheduling reorders operations to minimize stalls, exploiting parallelism within basic blocks while respecting dependencies.^[2] These passes, applied post-decoding, improve performance. In static binary translation, they rely on static analysis and avoid runtime-specific adaptations, while dynamic binary translation can incorporate runtime information, such as through just-in-time profiling, for enhanced optimizations.^[2]
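Dead code elimination driven by liveness analysis can be illustrated on a single straight-line block. The sketch below is a minimal version under stated assumptions: the IR tuple layout, the `SIDE_EFFECTS` set, and the function name are hypothetical, and a production translator would run the analysis across the whole control flow graph rather than one block.

```python
# Backward liveness scan over one basic block of a made-up IR.
# Each instruction is (dst, op, srcs): dst is a register name or None,
# srcs is a tuple of register names read by the instruction.

# Operations that must be kept even if their result is unused (assumed set).
SIDE_EFFECTS = {"store", "call"}


def eliminate_dead_code(block, live_out):
    """Return the block without instructions whose results are never read.

    live_out is the set of registers still needed after the block ends.
    """
    live = set(live_out)
    kept = []
    for dst, op, srcs in reversed(block):
        if dst in live or op in SIDE_EFFECTS:
            kept.append((dst, op, srcs))
            live.discard(dst)   # dst is redefined here, so earlier values are dead
            live.update(srcs)   # operands are needed before this point
        # Otherwise the result is never read and the instruction is dropped.
    kept.reverse()
    return kept
```

A typical use in a translator is dropping condition-flag computations that the source ISA performs implicitly but that no later instruction reads. In the example below the `mul` result `t1` is never used, so that instruction disappears while the chain feeding the `store` is kept:

```python
block = [
    ("t0", "add", ("a", "b")),
    ("t1", "mul", ("a", "a")),
    ("t2", "sub", ("t0", "c")),
    (None, "store", ("t2", "p")),
]
optimized = eliminate_dead_code(block, live_out=set())
```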

Historical Development

Origins and Early Systems

Binary translation emerged in the 1960s as a solution for software compatibility during hardware transitions in the mainframe era. One of the earliest documented systems was Honeywell's Liberator, introduced in 1963, which translated IBM 1401 object code into equivalent instructions for the Honeywell Series 200 computers. This tool addressed the obsolescence of the IBM 1401 by enabling customers to migrate their existing applications to Honeywell's faster architecture without rewriting code, focusing primarily on mainframe environments where hardware upgrades were costly and disruptive.^[13] In the 1980s, binary translation gained traction for minicomputer migrations, exemplified by Hewlett-Packard's Object Code Translator (OCT) developed in 1987. OCT facilitated the shift from the HP 3000 Series running MPE V to the new HP Precision Architecture systems, such as the Series 930 and 950 under MPE XL, by converting object code from the older instruction set into native executable modules. Designed for simple single-file translations, it handled legacy applications without requiring source code recompilation, emphasizing compatibility in commercial computing settings where minicomputers were becoming obsolete. This approach provided 2-5 times the performance of emulation by generating optimized native code that leveraged the new architecture's 32 general-purpose registers.^[14] By the early 1990s, more sophisticated systems tackled complex architectural differences, as seen in Digital Equipment Corporation's VEST translator released in 1993. VEST converted OpenVMS VAX binaries to run on Alpha AXP processors, addressing challenges like instruction mapping, exception handling, and timing preservation to ensure near-native performance. Written in C++ and supported by the Translator Interface Environment (TIE) runtime, it enabled migration from VAX minicomputers to the 64-bit Alpha architecture amid hardware evolution. 
Early systems like VEST highlighted key limitations, including inadequate support for parallel processing in multitasking environments, intricate OS interactions such as calling standards, and issues with read-write memory ordering that could affect program correctness. These challenges arose from the need to maintain atomicity and granularity in translated code without full emulation overhead.^[15]

Key Milestones and Modern Tools

In the early 2000s, the Transmeta Crusoe processor marked a significant milestone in dynamic binary translation by implementing a software layer known as Code Morphing Software to translate x86 instructions into native VLIW instructions on its underlying hardware, enabling full x86 compatibility while optimizing for low power consumption in mobile devices.^[16] This approach, introduced in 2000, demonstrated the practical viability of runtime translation for bridging complex instruction set architectures in commercial processors.^[17] Later in the decade, Apple's Rosetta, released in 2006 as part of the transition from PowerPC to Intel x86 processors in Mac computers, provided dynamic translation to allow legacy PowerPC applications to run seamlessly on Intel-based systems without recompilation.^[18] The 2010s saw continued evolution with tools emphasizing cross-platform emulation and performance. QEMU's Tiny Code Generator (TCG), integrated into the emulator starting around 2008 and refined through the decade, facilitated cross-ISA binary translation by converting guest instructions into an intermediate representation before generating host code, supporting efficient emulation across diverse architectures like x86 to ARM.^[19] In 2020, Apple's Rosetta 2 extended this legacy for the shift to Apple Silicon, translating x86-64 binaries to ARM64 with just-in-time compilation and caching, achieving approximately 78-80% of native performance in many workloads on M1 chips.^[20] Advancements in the 2020s focused on open-source and Linux-centric solutions for emerging hardware. 
FEX-Emu, launched in 2021, emerged as a high-performance user-mode emulator for running x86 and x86-64 Linux applications on ARM64 systems, leveraging dynamic translation with adaptive caching to support gaming and productivity software.^[21] By 2023, integrations of LLVM backends in binary translators, such as in hybrid systems like MFHBT, enabled retargetable translation pipelines that lift binaries to LLVM IR for multi-stage optimization and feedback-driven improvements, reducing memory accesses by up to 81% in benchmarks.^[22] Modern tools continue to build on these foundations for instrumentation and ecosystem support. DynamoRIO, a dynamic instrumentation framework first publicly released in 2002 and evolved through ongoing updates, provides a platform for runtime code manipulation and analysis across x86 and ARM, powering tools for security, optimization, and debugging with low overhead.^[23] Microsoft's x86-to-ARM translator, enhanced in Windows 11 updates around 2022 and formalized as the Prism emulation layer by 2024, just-in-time compiles x86/x64 code to ARM64 with optimizations for compatibility, enabling unmodified Windows applications to run on ARM devices while improving support for vector instructions like AVX.^[24] In June 2025, Apple announced at WWDC that macOS 26 Tahoe would be the last version supporting Intel-based Macs and that Rosetta 2 would remain generally available through macOS 27, after which only a subset of its functionality would be kept for older, unmaintained games, marking the full transition to Apple Silicon.^[25] Recent trends as of 2025 continue to advance hybrid static-dynamic binary translation methods, combining ahead-of-time static lifting with runtime adjustments for optimized performance on heterogeneous hardware, as demonstrated in systems like BP-QEMU, which improves execution efficiency through branch prediction.^[26]

Motivations

Compatibility and Migration

Binary translation serves a primary role in instruction set architecture (ISA) migrations by enabling the execution of legacy binaries on new hardware platforms without requiring recompilation. This capability is essential during CPU upgrades, where organizations aim to leverage more efficient architectures while maintaining compatibility with established software ecosystems. For example, Digital Equipment Corporation's transition from VAX to Alpha AXP utilized binary translation to port OpenVMS applications, allowing seamless execution of existing binaries on the new RISC-based processors.^[27] Such migrations preserve investments in legacy code, which often spans decades and involves critical business logic.^[28] In addition to ISA shifts, binary translation addresses OS and ecosystem compatibility challenges, particularly in handling application binary interface (ABI) differences, system calls, and library dependencies during cross-platform ports. For instance, translating from x86 to ARM requires mapping divergent calling conventions, memory access patterns, and OS-specific semantics to ensure functional equivalence on the host system. This is critical in environments like Windows on ARM, where dynamic translation layers convert x86 instructions to ARM64 equivalents, accommodating variations in weak memory models and synchronization to support diverse software stacks.^[29]^[24] Practical use cases demonstrate binary translation's versatility across industries. In enterprise settings, it facilitates migrations from legacy mainframes to cloud infrastructures, as seen in historical efforts like VAX-to-Alpha ports that enabled enterprise applications to run on modern hardware without source code modifications. 
In gaming, it supports backward compatibility for older titles on new consoles, such as accelerating x86 PC games on ARM-based mobile or handheld devices through optimized translation techniques.^[30] For embedded systems updates, specialized dynamic translators adapt binaries to resource-constrained processors, ensuring compatibility during hardware refreshes in IoT and automotive applications.^[31]^[30] The approach offers significant benefits for developers, particularly in porting closed-source applications where source code is unavailable or proprietary, thereby reducing migration timelines and costs compared to full rewrites. However, ensuring semantic equivalence poses challenges, especially for non-deterministic behaviors like threading and concurrency, where architectural differences—such as memory ordering in x86 versus ARM—can introduce discrepancies in parallel execution. Translators must emulate these aspects precisely to avoid behavioral deviations, often requiring advanced handling of atomic operations and thread synchronization.^[32]^[33]^[29]

Performance Considerations

Binary translation introduces several sources of overhead that impact overall system efficiency. Translation time represents an initial cost in static approaches, where the entire binary must be processed upfront, potentially delaying application startup. In dynamic translation, runtime overhead arises from on-the-fly translation and management of code caches, including the cost of evicting and reloading translated fragments. Additionally, code size expansion is common, with translated binaries often growing by a factor of 1.46x or more due to differences in instruction encoding and the need to emulate complex semantics, leading to increased memory footprint and potential instruction cache pressure.^[34] Performance metrics for binary translation vary by approach and optimization level. Static binary translation typically achieves 60-80% of native execution speed on large benchmarks, as exemplified by a median of 67% performance relative to native compilation in peephole-optimized translations of PowerPC to x86 code.^[35] Dynamic binary translation, leveraging just-in-time (JIT) compilation and caching, often reaches 80-95% of native speed for steady-state execution, though overall slowdowns can be minor in optimized systems like Rosetta 2.^[34] Several factors influence the efficiency of binary translation. Differences in instruction density between source and target ISAs can lead to expanded code, reducing fetch efficiency and increasing instruction cache misses. Branch prediction accuracy is affected by translation-induced changes in control flow, potentially degrading predictor effectiveness and incurring more misprediction penalties. Cache pollution occurs when translated code fragments evict useful native instructions or data, exacerbating misses in shared caches, particularly in dynamic systems with frequent code cache updates.^[36]^[37] Binary translation involves inherent trade-offs between static and dynamic methods. 
Static translation provides predictable performance without runtime overhead but demands complete upfront analysis, limiting adaptability to self-modifying code or dynamic loads. Dynamic translation offers flexibility and runtime adaptations, such as profile-guided optimizations, but suffers initial slowdowns from translation and caching during warmup phases.^[38] Broader impacts of binary translation extend to resource-constrained environments. In mobile and embedded devices, performance overheads directly increase energy consumption, as slower execution prolongs CPU activity and raises power draw; optimized translations can mitigate this by reducing cycles per instruction. Scalability for large applications is challenged by code cache management and memory demands, where persistent caching helps sustain performance but risks bloat in systems with vast code footprints.^[39]^[40]
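The warmup trade-off can be made concrete with a simple cost model. The model below is an assumption introduced for illustration, not a result from the cited sources: a block costs a one-time `translate_cost` to translate, then `native_cost` per execution, versus `interp_cost` per execution if it is only ever interpreted. All names and the abstract integer cost units are hypothetical.

```python
# Break-even point for translating a block instead of interpreting it,
# under a deliberately simple linear cost model (an assumption).

def break_even_executions(translate_cost, interp_cost, native_cost):
    """Smallest execution count n for which translation is strictly cheaper.

    Solves translate_cost + n * native_cost < n * interp_cost for integer
    costs. Returns None if translated code is not faster per execution,
    in which case translation never pays off.
    """
    saving_per_run = interp_cost - native_cost
    if saving_per_run <= 0:
        return None
    return translate_cost // saving_per_run + 1
```

For example, with a translation cost of 1000 units, 50 units per interpreted execution, and 5 units per translated execution, `break_even_executions(1000, 50, 5)` returns 23: code that runs fewer than 23 times is cheaper to interpret. This is the reasoning behind translating only code that has proven to be hot.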

Static Binary Translation

Process and Techniques

Static binary translation is an offline process that disassembles the entire source binary ahead of time, reconstructing its control flow and data dependencies to generate a complete executable for the target architecture. It begins with disassembly, using tools like IDA Pro or objdump, to recover the instruction stream and build a control flow graph (CFG) that identifies basic blocks, functions, and call graphs without executing the program.^[41]

Key techniques include instruction mapping, where each source instruction is mapped to a semantically equivalent sequence of target instructions, often via an intermediate representation (IR) such as LLVM IR to facilitate retargeting across ISAs. Register allocation addresses mismatches in register counts or semantics by spilling to memory or remapping, while address translation handles differences in memory models, such as mapping x86 segment registers onto the flat addressing of a RISC target. Control flow recovery resolves indirect branches and jumps through data-flow analysis or jump-target identification, though unresolved targets may require runtime resolution stubs.^[42]^[43]

Optimization passes, such as peephole rewriting, eliminate redundancies and apply target-specific idioms after mapping, improving code density and performance. Handling dynamic elements like self-modifying code or dynamic linking often requires assuming static behavior or adopting hybrid approaches with minimal runtime support, since fully static translation assumes that code does not modify itself. External references, such as library calls, are resolved by linking against target libraries or providing emulation wrappers.^[1]

The output is a standalone target binary, enabling direct execution without translation overhead, though initial translation time can be significant for large programs. Frameworks like QEMU's user-mode emulation can incorporate static modes, but pure static tools focus on complete recompilation for portability.^[41]
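Identifying basic blocks is the first step of control flow recovery, and the classic "leader" rule can be sketched briefly. This is a minimal sketch under assumptions: the instruction tuples, the `BRANCHES` set, and the function name are hypothetical, and indirect branches are represented with a target of `None` because their destinations are unknown to a static pass.

```python
# Partition a disassembled instruction list into basic blocks.
# Each instruction is (address, mnemonic, target), where target is the
# destination address of a direct branch or None otherwise.

# Mnemonics that end a basic block in this toy ISA (assumed set).
BRANCHES = {"jmp", "jz", "ret"}


def find_basic_blocks(instrs):
    """Return a list of basic blocks, each a list of instruction tuples."""
    if not instrs:
        return []
    addrs = [addr for addr, _, _ in instrs]
    valid = set(addrs)
    leaders = {addrs[0]}  # the entry point always starts a block
    for i, (_, mnemonic, target) in enumerate(instrs):
        if mnemonic in BRANCHES:
            if target in valid:
                leaders.add(target)        # a branch destination starts a block
            if i + 1 < len(instrs):
                leaders.add(addrs[i + 1])  # so does the instruction after a branch
    blocks, current = [], []
    for ins in instrs:
        if ins[0] in leaders and current:
            blocks.append(current)
            current = []
        current.append(ins)
    blocks.append(current)
    return blocks
```

Because an indirect jump contributes no known destination, any block reachable only through one is never marked as a leader here. That gap is what the runtime resolution stubs mentioned above are meant to cover.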

Examples

A notable modern application occurred in 2014 when developer "notaz" performed static recompilation of the 1998 game StarCraft from x86 to ARM architecture, facilitating its port to handheld devices like the OpenPandora without access to source code. This effort involved reverse engineering and direct translation of the binary to generate an equivalent ARM executable, demonstrating static translation's utility for legacy game migration to mobile platforms.^[44] Among open-source tools, RevGen, developed in the early 2010s at EPFL, serves as a retargetable static binary translator that lifts x86 binaries to LLVM intermediate representation (IR), enabling cross-architecture analysis and optimization without source code. Similarly, McSema, released by Trail of Bits starting in 2014, is an executable lifter that statically translates x86 and x86-64 binaries to LLVM bitcode, supporting both Linux and Windows formats for tasks like decompilation and recompilation.^[43]^[45] A practical case study illustrating outcomes is the 2014 static recompilation of Cube World's x86 terrain generation binary to x86-64 and other architectures, part of an open-server implementation project. This translation converted the original executable's code sections into portable C++ equivalents, allowing successful generation of terrain data across platforms while integrating with a runtime library for handling relocations and flags.^[46] In practice, static binary translation faces limitations when dealing with obfuscated or packed binaries, as these techniques disrupt disassembly and control-flow recovery, often leading to incomplete or erroneous translations. For instance, packers commonly employ code encryption and dynamic unpacking that evade static analysis, requiring additional dynamic techniques for resolution.^[47]^[48]

Dynamic Binary Translation

Process and Techniques

Dynamic binary translation operates through a runtime process that involves on-demand disassembly of guest code blocks, often in the form of traces—sequences of frequently executed instructions—into an intermediate representation (IR). This IR is then optimized and compiled just-in-time (JIT) into host-native code, which is executed and stored in a code cache for reuse, minimizing repeated translation overhead.^[2] The process begins with an interpreter or dispatcher that executes initial code fragments until a hot path is detected, triggering translation to avoid interpretive slowdowns. Key techniques include trace selection, where execution counters identify hot code paths based on branch frequencies, prioritizing translation of these paths to focus resources on performance-critical regions. Binary instrumentation inserts profiling code during disassembly to gather runtime data, such as branch outcomes or memory accesses, enabling adaptive decisions without halting execution.^[49] Runtime optimizations, like loop unrolling, expand repetitive structures in traces to reduce branch overhead and improve instruction-level parallelism during JIT compilation.^[50] To handle program dynamism, dynamic binary translators employ speculative execution for conditional branches, predicting paths and generating code accordingly, with rollback mechanisms—such as cache exits to the interpreter—if mispredictions occur, ensuring correctness. 
Syscall integration involves intercepting guest system calls, emulating them on the host OS via wrappers that preserve state and handle asynchronous events like signals.^[51] Optimization passes leverage profile data from instrumentation to guide retranslation of traces, refining code based on observed behaviors like loop frequencies.^[52] Vectorization transforms scalar operations in IR to single instruction, multiple data (SIMD) equivalents on the host, exploiting wider vector units for data-parallel workloads when guest instructions align.^[53] Garbage collection of the code cache evicts cold traces using heuristics like least-recently-used or generational policies, reclaiming space to prevent fragmentation and maintain translation efficiency.^[50] Frameworks like Valgrind facilitate instrumentation by dynamically translating code to IR, applying tool-specific insertions for profiling or debugging, and resynthesizing to host code in a cache, emphasizing heavyweight analysis over lightweight speed.^[51]
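The interplay of the dispatcher, execution counters, and an evicting code cache can be modeled in a few lines. This is a sketch of the control logic only, under loud assumptions: guest blocks are represented as Python callables that return the next guest program counter, "translation" simply reuses that callable as a stand-in for JIT output, and the class names, threshold, and capacity are hypothetical.

```python
# Toy dispatcher: interpret cold blocks, translate hot ones, cache with LRU eviction.
from collections import OrderedDict


class CodeCache:
    """Fixed-capacity code cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # guest pc -> translated code

    def get(self, pc):
        code = self.entries.get(pc)
        if code is not None:
            self.entries.move_to_end(pc)  # mark as most recently used
        return code

    def put(self, pc, code):
        self.entries[pc] = code
        self.entries.move_to_end(pc)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the coldest entry


class Dispatcher:
    def __init__(self, guest_blocks, hot_threshold=3, cache_capacity=2):
        self.guest_blocks = guest_blocks  # pc -> callable(state) -> next pc or None
        self.hot_threshold = hot_threshold
        self.cache = CodeCache(cache_capacity)
        self.counts = {}
        self.stats = {"interpreted": 0, "translated": 0, "cached_runs": 0}

    def run(self, pc, state):
        while pc is not None:
            code = self.cache.get(pc)
            if code is None:
                self.counts[pc] = self.counts.get(pc, 0) + 1
                if self.counts[pc] >= self.hot_threshold:
                    code = self.guest_blocks[pc]  # stand-in for real JIT output
                    self.cache.put(pc, code)
                    self.stats["translated"] += 1
            if code is not None:
                self.stats["cached_runs"] += 1
                pc = code(state)
            else:
                self.stats["interpreted"] += 1
                pc = self.guest_blocks[pc](state)
```

Running a single-block loop that executes ten times with the default threshold of 3 interprets the block twice, translates it once, and then runs the cached version eight times. Real systems add trace formation, retranslation, and safe handling of signals on top of this skeleton.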

Software Implementations

Software implementations of dynamic binary translation primarily involve just-in-time (JIT) compilers and emulators that translate and execute guest instructions on the host CPU at runtime, enabling cross-architecture compatibility without hardware assistance. These systems often employ code caching to reuse translated blocks, reducing overhead for frequently executed code paths. Notable examples include frameworks optimized for user-mode emulation, full-system virtualization, and runtime instrumentation. Apple's Rosetta 2, introduced in 2020 with the transition to Apple Silicon, serves as a JIT-based translator for running x86-64 applications on ARM-based Macs. It performs ahead-of-time (AOT) translation for static code and JIT for dynamically generated code, such as from just-in-time compilers, storing translated binaries in a cache to achieve near-native performance—typically 78-90% of equivalent ARM-native execution in benchmarks across various workloads. This caching mechanism minimizes repeated translation, allowing most x86 programs to run efficiently after an initial compilation phase.^[54]^[20] QEMU, developed since 2003, utilizes its Tiny Code Generator (TCG) as a dynamic translation backend for full-system and user-mode emulation across multiple instruction set architectures (ISAs). TCG breaks down guest instructions into intermediate micro-operations, which are then optimized and emitted as host-native code blocks stored in a translation cache, supporting translations like MIPS to x86 with features for handling self-modifying code and exceptions. This portable approach enables QEMU to emulate entire operating systems, such as running Linux on x86 hosts for ARM guests, while maintaining reasonable performance through block chaining and register allocation optimizations.^[55] The Dynamo project from Hewlett-Packard Laboratories in the late 1990s pioneered dynamic optimization via binary translation on PA-RISC processors under HPUX. 
It interpreted code to identify hot traces—frequently executed paths—and translated them into optimized fragments stored in a software code cache, applying runtime optimizations like redundancy elimination to yield average speedups of 7-12% on SPECint95 benchmarks. Building on this, DynamoRIO, released in 2002, evolved into an open-source dynamic instrumentation framework for IA-32 on Windows and Linux, allowing clients to insert code for analysis and optimization with minimal overhead, achieving up to 40% performance gains in select cases through adaptive code modification. It has been widely adopted for research prototypes and security tools, such as intrusion detection via runtime monitoring.^[56]^[57] More recent developments include FEX-Emu, launched in 2021 as an open-source usermode emulator for x86 and x86-64 binaries on ARM64 Linux hosts. It focuses on low-overhead execution for gaming and desktop applications, supporting Wine and Proton for Windows titles through API forwarding (e.g., Vulkan, OpenGL) and an experimental code cache to reduce stuttering, while maintaining broad compatibility with 32- and 64-bit binaries on distributions like Ubuntu and Fedora. FEX-Emu achieves this via a fast translation pipeline optimized for ARMv8+ hardware, enabling practical performance for demanding workloads like commercial games.^[21] Beyond specific tools, dynamic binary translation underpins broader applications in debugging, where systems like DynamoRIO enable reversible execution and taint analysis for vulnerability detection; virtualization, as in QEMU's full-system emulation for OS migration; and reverse engineering, facilitating cross-platform binary inspection and instrumentation without source code access. These uses leverage translation caches and runtime feedback to balance accuracy and efficiency in analyzing opaque executables.^[58]^[59]
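Handling self-modifying code in a translation cache comes down to discarding translations whose guest bytes have been overwritten. One way to do this is to track, per guest memory page, which translated blocks cover it. The sketch below assumes that page-granularity scheme purely for illustration; it does not describe the actual data structures of QEMU or any other tool named above, and the class, method names, and page size are hypothetical.

```python
# Page-granularity invalidation of translated blocks on guest writes.

PAGE_SIZE = 4096  # assumed guest page size


class TranslationCache:
    def __init__(self):
        self.blocks = {}   # guest start pc -> translated code (opaque here)
        self.by_page = {}  # page number -> set of start pcs with bytes on that page

    def insert(self, pc, length, code):
        """Record a translation covering guest bytes [pc, pc + length)."""
        self.blocks[pc] = code
        first_page = pc // PAGE_SIZE
        last_page = (pc + length - 1) // PAGE_SIZE
        for page in range(first_page, last_page + 1):
            self.by_page.setdefault(page, set()).add(pc)

    def on_guest_write(self, addr):
        """Drop every translation touching the written page; return their pcs.

        A block spanning two pages may leave a stale pc in the other page's
        set; that is harmless because removal from self.blocks tolerates
        missing keys.
        """
        stale = self.by_page.pop(addr // PAGE_SIZE, set())
        for pc in stale:
            self.blocks.pop(pc, None)
        return stale
```

Invalidating a whole page is conservative: a write to data that merely shares a page with code also discards translations, trading some retranslation work for simple bookkeeping.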

Hardware Implementations

Hardware implementations of dynamic binary translation (DBT) integrate specialized processor circuitry and architectural features to accelerate runtime translation, minimizing the overhead of decoding, optimization, and code generation compared to software-only systems. These approaches often involve co-designed hardware and software, where dedicated units handle initial instruction decoding or caching of translated micro-operations, enabling compatibility across instruction set architectures (ISAs) while optimizing for power and performance. Early examples focused on VLIW-based hosts to exploit instruction-level parallelism in translated code, while modern designs leverage caches and buffers to reduce re-translation costs.^[60] A pioneering hardware implementation is the Transmeta Crusoe processor family, launched in 2000, which featured VLIW cores with integrated support for an on-chip dynamic translator to emulate x86 instructions. The Code Morphing Software (CMS) layer interpreted and translated x86 binaries into native VLIW code, speculatively optimizing for common execution paths to reduce power consumption in mobile applications; this co-design achieved near-native performance for many workloads while simplifying hardware complexity. The successor, Efficeon in 2004, enhanced this architecture with wider issue widths and improved translation caching, further boosting efficiency for x86 compatibility on non-x86 silicon.^[61] IBM's DAISY (Dynamically Architected Instruction Set from Yorktown) system, developed in the 1990s for AS/400 enterprise servers, provided hardware-assisted DBT to execute System/390 binaries on a custom VLIW host processor.
DAISY used tree-structured intermediate representations for rapid translation and optimization, with hardware units managing exception handling and architectural state to ensure 100% compatibility; this enabled seamless migration from legacy System/390 code to PowerPC without recompilation, achieving up to 90% of native performance in key workloads.^[62]^[63] Key techniques in hardware DBT include dedicated translation engines, which perform front-end tasks like instruction fetching, decoding, and basic remapping in specialized circuits to offload the main processor core.^[64] Micro-op caches, present in Intel processors since the Sandy Bridge microarchitecture (2011), store decoded micro-operations from complex CISC instructions, allowing fast retrieval and fusion during translation to avoid repeated decoding overheads.^[65] Hardware trace buffers, akin to trace caches in out-of-order processors, capture sequences of executed instructions or translated blocks in on-chip memory, enabling quick replay and optimization of hot code paths to improve translation throughput by up to 2-3x in simulated DBT scenarios.^[66] In contemporary systems, ARM Cortex processors (2010s onward) incorporate features like enhanced branch prediction and configurable cache hierarchies that facilitate efficient JIT compilation and DBT, supporting software translators in low-power embedded environments without dedicated DBT units.^[67] Similarly, Intel's ongoing refinements to micro-op caches in Xeon and Core series (2020s) provide indirect acceleration for DBT by streamlining the handling of translated instruction streams in virtualization and emulation contexts.^[68]

References

[1]
[PDF] binary-translation.pdf
A translated binary program is a sequence of new-architecture instructions that reproduces the behavior of an old-architecture program. Typically, much of ...
[2]
[PDF] Dynamic Binary Translation - Compilers and Languages
Dynamic binary translation is the process of translating code for one instruction set architecture to another on the fly.
[3]
[PDF] Machine-Adaptable Dynamic Binary Translation
Dynamic binary translation is the process of translating and optimizing executable code for one machine to another at runtime, while the program is "executing" ...
[4]
[PDF] Binary-to-Binary Translation Literature Survey
Mar 16, 1998 · In this paper, we will briefly review the history of binary translation in section II. In section III, we will discuss the alternatives to ...
[5]
https://www.sciencedirect.com/science/article/pii/B9780123745149000264
[6]
[PDF] P3.An Overview on Binary Translation | PEPCC
From Hack to Elaborate Technique – A Survey on Binary Rewriting. ACM Comput. Surv. 52, 3, Article 49 (June 2019), 37 pages. 6. Page 7. Binary Translation for ...
[7]
Experience in the design, implementation and use of a retargetable ...
Binary translation, the process of translating binary executables, makes it possible to run code compiled for source (input) machine Ms on target (output) ...
[8]
Machine-adaptable dynamic binary translation - ACM Digital Library
Dynamic binary translation is the process of translating and optimizing executable code for one machine to another at runtime, while the program is "executing" ...
[9]
Hardware-accelerated dynamic binary translation - ACM Digital ...
Abstract—Dynamic Binary Translation (DBT) is often used in hardware/software co-design to take advantage of an architecture model while using binaries from ...
[10]
https://dl.acm.org/doi/pdf/10.5555/3130379.3130632
[11]
[PDF] Virtual Machines and Binary Translation
May 4, 2016 · More SBT problems: Self-modifying code. • Rare in most code, but has to be handled if allowed by guest ISA. • Usually handled by including ...
[12]
Microprogramming History -- Mark Smotherman - Clemson University
... conversion tools, like the Honeywell "Liberator" program that accepted IBM 1401 programs and converted them into programs for the 1401-like Honeywell H-200.
[13]
[PDF] hewlett - vtda.org
Dec 8, 1987 · MPE V machine emulation is supported by the HP 3000. Emulator and the HP 3000 Object Code Translator (OCT).8. The emulator is a program that ...
[14]
Binary translation | Communications of the ACM - ACM Digital Library
Low overhead dynamic binary translation on ARM The ARMv8 architecture introduced AArch64, a 64-bit execution mode with a new instruction set, while retaining ...
[15]
[PDF] The Technology Behind Crusoe™ Processors
The software layer is called Code Morphing™ software because it dynamically “morphs” x86 instructions into VLIW instructions. The Code Morphing software ...
[16]
[PDF] The Transmeta Code Morphing Software: Using Speculation ...
Transmeta's Crusoe microprocessor is a full, system-level implementation of the x86 architecture, comprising a native VLIW microprocessor with a software ...
[17]
Apple Unveils New MacBook Featuring Intel Core Duo Processors
PRESS RELEASE May 16, 2006. Apple Unveils New MacBook Featuring Intel Core Duo Processors ... ***See https://www.apple.com/rosetta/ for information on ...
[18]
Translator Internals — QEMU documentation
QEMU's dynamic translation backend is called TCG, for “Tiny Code Generator”. For more information, please take a look at TCG Intermediate Representation.
[19]
How x86 to arm64 Translation Works in Rosetta 2 - InfoQ
Nov 30, 2020 · Thanks to Rosetta 2, most x86 programs will be able to execute after an initial translation step. Apple started to use binary translation ...
[20]
FEX-Emu/FEX: A fast usermode x86 and x86-64 emulator for Arm64 ...
FEX allows you to run x86 applications on ARM64 Linux devices, similar to qemu-user and box64. It offers broad compatibility with both 32-bit and 64-bit ...
[21]
(PDF) MFHBT: Hybrid Binary Translation System with Multi-stage ...
We implement a prototype of this new system powered by LLVM. Experimental results demonstrate an 81% decrease in the number of memory access instructions and a ...
[22]
History of DynamoRIO
DynamoRIO originated from MIT and HP in 2001, was used by Determina, acquired by VMware in 2007, and open-sourced in 2009.
[23]
https://dynamorio.org/page_history.html
[24]
Binary Translation and Cross-architecture compatibility with focus on ...
Sep 18, 2025 · This paper provides a comprehensive review of Binary Translation and Cross-architecture, focusing on operating system-level implementations.
[25]
[PDF] Instruction Set Migration at Warehouse Scale - arXiv
Oct 16, 2025 · Modern ISA migrations can often build on a robust open-source ecosystem, making it possible to recompile all relevant software from scratch.
[26]
A Dynamic and Static Binary Translation Method Based on Branch ...
Jul 10, 2023 · Binary translation is a technique that automatically translates code from a target architecture into functionally equivalent code for a host ...
[27]
Porting OpenVMS from VAX to Alpha AXP - ACM Digital Library
[28]
Static/dynamic real-time legacy software migration
Oct 30, 2020 · Binary translation can address this incompatibility by migrating applications from one legacy ISA to a new one, although binary translation has ...
[29]
A Dynamic Binary Translator for Weak Memory Model Architectures
If we translate the MP program's binary from x86 to Arm, without taking their memory models into account, the resulting Arm binary may exhibit undesirable.
[30]
Dynamic binary translation specialized for embedded systems
This paper describes the design and implementation of a novel dynamic binary translation technique specialized for embedded systems.
[31]
ARMing x86 Games: Accelerating Binary Translation Using Software ...
Sep 25, 2025 · We propose a novel optimization method that enhances compatibility and performance by leveraging software-only strategies tailored to ARM ...
[32]
No Source Code? No Problem! - ACM Queue
Oct 2, 2003 · What if you have to port a program, but all you have is a binary? Typical software development involves one of two processes: the creation of ...
[33]
https://dl.acm.org/doi/10.1145/3483790
[34]
An Instruction Inflation Analyzing Framework for Dynamic Binary ...
Mar 23, 2024 · Dynamic binary translation enables applications built for a guest ISA to run on a host ISA machine, with uses in several areas.
[35]
[PDF] Binary Translation Using Peephole Superoptimizers - USENIX
This paper presents a new binary translation scheme that automatically learns translation rules using superoptimization techniques and peephole rules.
[36]
[PDF] Using Dynamic Binary Translation to Fuse Dependent Instructions
fect the instruction set and dynamic binary translation. In this ... Instruction Density. An ISA with good coding density can reduce instruction ...
[37]
[PDF] HDTrans: A Low-Overhead Dynamic Translator
In order to reduce register pressure and cache pollution, the sieve is implemented using blocks of instructions rather than blocks of data. An indirect ...
[38]
[PDF] Fast Binary Translation: Translation Efficiency and Runtime Efficiency
Fast binary translation is a key component for modern software, using dynamic translation at runtime. fastBT is a generator for low-overhead, table-based ...
[39]
[PDF] Efficient and Retargetable Dynamic Binary Translation
Dynamic binary translation (DBT) is a core technology to many important applications such as system virtualization, dynamic binary instrumentation and ...
[40]
[PDF] A General Persistent Code Caching Framework for Dynamic Binary ...
Jun 22, 2016 · 4.3 Performance Overhead and Code Size. Figure 11 shows the ... Binary Translation. Journal of Computer Research and. Development 51, 10 ...
[41]
[PDF] Dynamic Binary Translation & Instrumentation
Instrumented code needs extra registers. E.g.: • Virtual registers available to the tool. • A virtual stack pointer pointing to the instrumentation stack. • ...
[42]
[PDF] Dynamic Binary Translation and Optimization Erik R. Altman Kemal ...
Dec 13, 2000 · • Control Speculation: Operations Above Branches. • Data Speculation: Loads above possibly aliased stores. Page 39. DAISY Data Speculation.
[43]
[PDF] A Framework for Heavyweight Dynamic Binary Instrumentation
Valgrind is a dynamic binary instrumentation (DBI) framework that occupies a unique part of the DBI framework design space. This paper describes how it works, ...
[44]
Optimising hot paths in a dynamic binary translator
In dynamic binary translation, code is translated "on the fly" at run-time, while the user perceives ordinary execution of the program on the target machine.
[45]
Improving SIMD Parallelism via Dynamic Binary Translation
This article presents a dynamic binary translation technique that enables short-SIMD binaries to exploit benefits of new SIMD architectures by rewriting short- ...
[46]
Playing StarCraft On An ARM - Hackaday
Jul 31, 2014 · Blizzard could take the code for StarCraft, port it to an ARM ... Static recompilation, but literally recompiled! What a hack. This is ...
[47]
[PDF] Enabling Sophisticated Analyses of x86 Binaries with RevGen
RevGen uses static binary translation to convert binary code to the widely-used LLVM IR, without relying on the source code.
[48]
lifting-bits/mcsema - GitHub
Aug 23, 2022 · McSema is an executable lifter. It translates ("lifts") executable binaries from native machine code to LLVM bitcode.
[49]
Practical and Portable X86 Recompilation - that mat blog
Apr 14, 2014 · Binary recompilation is a subject of intense research, but for mere mortals, recompiling binary code or executables can seem completely off-limits.
[50]
[PDF] Static Disassembly of Obfuscated Binaries - UCSB Computer Science
The paper presents novel binary analysis techniques that substantially improve the success of the disassembly process when confronted with obfuscated binaries ...
[51]
[PDF] Binary-code obfuscations in prevalent packer tools - Paradyn Project
By contrast, static analysis of indirect control-transfer targets is particularly difficult in packed binaries, as they frequently use instructions whose tar-.
[52]
About the Rosetta translation environment - Apple Developer
Rosetta is a translation process that allows users to run apps that contain x86_64 instructions on Apple silicon. Rosetta is meant to ease the transition to ...
[53]
Apple Silicon M1 Emulating x86 is Still Faster Than Every Other Mac ...
Nov 15, 2020 · Rosetta 2 running x86 code appears to be achieving 78%-79% of the performance of native Apple Silicon code. Despite the impact on performance, ...
[54]
[PDF] QEMU, a Fast and Portable Dynamic Translator - USENIX
We present the internals of QEMU, a fast machine emulator using an original portable dynamic translator. It emulates several CPUs (x86, PowerPC, ...
[55]
[PDF] Dynamo TR
Dynamic optimization refers to the runtime optimization of a native program binary. This report describes the design and implementation of Dynamo, ...
[56]
[PDF] An Infrastructure for Adaptive Dynamic Optimization
The main contribution of this paper is a framework for implementing dynamic analyses and optimizations. The framework is based on the DynamoRIO dynamic code.
[57]
[PDF] Dynamic Analysis and Debugging of Binary Code for Security ...
In this paper, we present our work on developing a cross-platform interactive analysis tool, which leverages techniques such as symbolic execution and taint ...
[58]
[PDF] DYNAMIC BINARY TRANSLATION FOR DETERMINISTIC REPLAY
The translate and execute loop continues until the program terminates. Several optimization techniques are introduced to do this efficiently (discussed in ...
[59]
[PDF] Hardware-Accelerated Dynamic Binary Translation
Dynamic binary translation (DBT) consists in translating. – at runtime – a program written for a given instruction set to another instruction set. Dynamic ...
[60]
[PDF] Transmeta Crusoe and efficeon:
Jan 10, 2003 · Code Morphing Software layer provides a completely compatible implementation of the x86 architecture on the embedded VLIW processor:.
[61]
[PDF] DAISY: Dynamic Compilation for 100% Architectural Compatibility
The paper is organized as follows: We first give an example illustrating the new fast dynamic compilation algorithm used by DAISY. Next, various architectural ...
[62]
(PDF) DAISY/390: Full System Binary Translation of IBM System/390
We describe the design issues in an implementation of the ESA/390 architecture based on binary translation to a very long instruction word (VLIW) processor.
[63]
[PDF] Hardware-Accelerated Dynamic Binary Translation - Hal-Inria
Apr 3, 2017 · In future work, we will perform register allocation and apply optimizations such as loop unrolling and superblock formation. Thanks to the.
[64]
[PDF] I See Dead μops: Leaking Secrets via Intel/AMD Micro-Op Caches
Modern Intel and AMD processors cache decoded micro-ops in a dedicated streaming cache, often called the decoded stream buffer or the micro-op cache, in order ...
[65]
[PDF] Evaluating the Impact of Dynamic Binary Translation Systems on ...
The effect of dynamic binary translation is lower in the L2 cache, with an increase in the number of misses by 12% for Pin and 24% for. DynamoRIO. The L1 data ...
[66]
Just-In-Time Compilation on ARM—A Closer Look at Call-Site Code ...
This article studies how the lack of strong hardware support for Self Modifying Code (SMC) in low-power architectures (e.g., absence of cache coherence)
[67]
[PDF] Mobilizing the Micro-Ops: Exploiting Context Sensitive Decoding for ...
Flushing the micro-op cache every translation mode switch could have a major performance impact. We instead choose to extend the tag bits of the micro-op cache ...