Long mode
Long mode is the operating mode of the x86-64 architecture in which 64-bit software runs. It extends the x86 instruction set architecture with 64-bit virtual addressing and wider registers, and it retains backward compatibility with existing 32-bit and 16-bit protected-mode applications through a dedicated sub-mode.[1][2] Developed by AMD as an extension to the x86 architecture and first implemented in the AMD Opteron processor released on April 22, 2003, long mode allows a 64-bit operating system to run unmodified 32-bit and 16-bit protected-mode applications without recompilation; real-mode and virtual-8086 programs cannot run while long mode is active.[3][1] Intel adopted the technology in 2004, initially as EM64T and later under the Intel 64 branding, beginning with processors such as the Xeon Nocona.[4][2]
Long mode is activated by setting the Long Mode Enable (LME) bit in the Extended Feature Enable Register (EFER) and then enabling paging with Physical Address Extension (PAE) turned on, which moves the processor out of legacy protected mode.[2] It comprises two sub-modes. In 64-bit mode the default operand size is 32 bits and addresses are 64 bits wide; software can address up to $2^{48}$ bytes (256 TiB) of virtual memory with 4-level paging, or $2^{57}$ bytes (128 PiB) with 5-level paging on supported hardware, and can use the extended registers R8–R15 and RIP-relative addressing. Compatibility mode provides the 32-bit protected-mode environment for legacy applications and restricts them to a 4 GB virtual address space.[1][2][5]
Key enhancements in long mode include eight new general-purpose registers (R8 to R15), eight additional SSE registers (XMM8 to XMM15), and the REX prefix, which is used to reach these registers and to select 64-bit operands, without disrupting existing x86 code.[1] In 64-bit mode segmentation is largely disabled in favor of a flat memory model: the descriptor tables are still required, but segment bases and limits are ignored for most segments, while compatibility mode retains full segment support for older software.[2] This design has made long mode the foundation for modern 64-bit operating systems such as Windows, Linux, and macOS, which run on the vast majority of personal computers and servers today.[1]
Introduction
Definition and Purpose
Long mode is an operating mode of x86-64 processors that provides 64-bit virtual addresses with an architectural limit of 2^64 bytes (16 exabytes), widens the general-purpose registers to 64 bits and adds eight new ones (R8–R15), and offers a superset of the x86 instruction set with extensions such as the REX prefix for 64-bit operations and RIP-relative addressing.[6] The mode consists of two submodes: 64-bit mode for native 64-bit code and compatibility mode for legacy 32-bit applications. Binary compatibility with existing x86 software comes from compatibility mode, while 64-bit mode uses a flat memory model with segmentation largely disabled.[6]
The purpose of long mode is to overcome the limitations of the 32-bit IA-32 architecture, such as the 4 GB virtual address space ceiling and 32-bit register constraints, which restricted scalability for memory-intensive workloads in servers, databases, and scientific computing.[6] In practice, implementations expose 256 TB of virtual address space with 48-bit addressing (the original implementations) or 128 PB with 57-bit addressing on processors that support 5-level paging (as of 2025), together with up to 4 PB of physical memory through 52-bit physical addresses. This allows larger datasets and better performance for 64-bit applications without a complete architectural overhaul.[6][2]
x86-64, the architecture encompassing long mode, extends the IA-32 instruction set and was first announced by AMD in 1999 as a compatible evolution for 64-bit computing.[7] Intel subsequently adopted the specification, eventually branding it Intel 64, which enabled widespread deployment of 64-bit operating systems including Windows 64-bit editions and Linux x86_64 kernels.[7] Support for long mode can be detected in software by executing the CPUID instruction with EAX set to the extended function 0x80000001; long mode is available if bit 29 of EDX (the Long Mode flag) is 1.[6]
Historical Development
Long mode, a 64-bit operating mode in the x86-64 architecture, originated from AMD's efforts in the late 1990s to extend the 32-bit x86 instruction set architecture (ISA) while maintaining backward compatibility. AMD developed the AMD64 architecture, initially known as x86-64, in response to the limits that 32-bit addressing placed on growing memory capacities. AMD announced it in 1999 and released the first detailed public specification in 2000, outlining the core principles of the 64-bit extensions: wider registers, enhanced addressing, and compatibility modes for legacy software.[1][8]
Hardware followed in 2003. The AMD Opteron server processors, based on the K8 microarchitecture, were the first to implement x86-64 and thus long mode, launching on April 22, 2003, followed by the consumer-oriented Athlon 64 on September 23, 2003.[9][10] Intel adopted the architecture in 2004 under the name Extended Memory 64 Technology (EM64T), initially shipping with Xeon processors on June 28, 2004, and later rebranding it as Intel 64; this move validated AMD's approach across the industry.[11] Version 2.0 of the UEFI specification, published in 2006 by the Unified EFI Forum (founded in 2005), further facilitated long mode adoption by providing a standardized firmware interface for 64-bit booting on x86-64 systems, replacing legacy BIOS limitations.[12]
The evolution of long mode continued through AMD's architectural advancements. The K8 family (2003) established the foundational 64-bit capabilities, including 48-bit virtual addressing. By 2007, the K10 microarchitecture (Family 10h) added a shared L3 cache, an improved memory controller (with DDR3 support in later parts), and physical addressing of up to 48 bits, enabling better scalability for multi-core systems and larger memory footprints.[13] Integration into mobile processors accelerated in 2006, with AMD's Turion 64 X2 launch in May and Intel's Core 2 Merom in July, extending long mode to laptops. By 2010, long mode had become dominant in servers and personal computers, powering the majority of new x86-based systems as the 64-bit software ecosystem grew.
Long mode's development had a profound industry impact, particularly in shifting 64-bit computing away from Intel's incompatible IA-64 (Itanium) architecture toward the backward-compatible x86-64. The emphasis on x86 compatibility reduced migration costs for existing software, leading to IA-64's marginalization in favor of x86-64 for servers and desktops. As of 2025, x86-64 processors, all of which support long mode, account for over 94% of the PC market, with AMD and Intel commanding the vast majority of shipments in desktops, laptops, and servers.[14][15]
Operating Modes in x86-64
Comparison with Legacy Modes
x86 processors support multiple operating modes to accommodate evolving software requirements while maintaining backward compatibility. These include real mode, a 16-bit environment emulating the original Intel 8086 processor for legacy DOS applications; protected mode, a 32-bit IA-32 mode introducing memory protection and multitasking; and long mode, a 64-bit extension designed for modern operating systems.[16] Mode switches occur through control registers such as CR0 and CR4, along with the IA32_EFER model-specific register.[16]
Key differences among these modes lie in addressing capabilities, memory management, and protection mechanisms. Real mode limits physical addressing to 20 bits (1 MB maximum) using a segment:offset scheme with no inherent protection or paging support.[16] Protected mode expands to 32-bit linear addressing (up to 4 GB) with optional paging (4 KB or 4 MB pages) and complex segmentation via descriptor tables for access control and privilege levels (rings 0-3).[16] In contrast, long mode uses 64-bit virtual addresses in canonical form, of which 48 bits (256 TB) are implemented with 4-level paging or 57 bits (128 PB) with 5-level paging. Paging is mandatory, with 4 KB, 2 MB, or 1 GB pages, and segmentation is reduced to a flat model in which segment bases and limits are ignored except for the FS and GS bases.[16] This flat model eliminates much of the segmentation overhead of protected mode, while page-level protection is strengthened by features such as the no-execute (NX) bit.[16]
Mode transitions reflect the layered nature of the architecture, with processors powering on in real mode to ensure compatibility with early firmware.[16] Bootloaders typically transition from real mode to protected mode by setting the PE bit in CR0, enabling segmentation and paging as needed, before entering long mode via the long-mode enable (LME) bit in IA32_EFER and paging activation.[16] Long mode depends on protected mode: paging requires CR0.PE to be set, so a boot path that starts in real mode must enable protection before, or in the same CR0 write as, paging.[16] Within long mode, the compatibility submode allows execution of legacy 32-bit protected-mode code by selecting an appropriate code segment descriptor, providing a bridge without a full mode switch.[16]
Architecturally, long mode builds directly on protected mode to address the limitations of 32-bit addressing, while dropping real mode and virtual-8086 mode and most legacy segmentation for improved performance and scalability in 64-bit environments.[16] This evolution balances the need for legacy software support with the efficiency required for contemporary operating systems.[16]
Submodes of Long Mode
Long mode in the x86-64 architecture encompasses two submodes: 64-bit mode and compatibility mode. Together they let the processor execute both modern 64-bit code and legacy 32-bit code within a single environment that shares the same paging and protection mechanisms.[6][17]
64-bit mode is the native execution environment for 64-bit applications and exposes the full extended register set and instruction architecture. In this submode, the general-purpose registers RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, and the additional R8 through R15 are 64 bits wide, allowing operations on larger data types and addresses. The instruction pointer, RIP, is also 64 bits wide. Instruction encodings gain the REX prefix, which selects 64-bit operand sizes and gives access to the extended registers, neither of which is available in legacy modes.[6][17]
Compatibility mode, in contrast, reproduces the IA-32 protected-mode environment so that 32-bit applications run unchanged under a 64-bit operating system. Code in this submode sees only the 32-bit register subsets (e.g., EAX, EBX), uses 32-bit addresses, and cannot use the REX prefix; instruction addresses are confined to the low 4 GB, and the segment registers behave as in IA-32 protected mode. Address translation still goes through long mode's paging structures, so memory management remains under the control of the 64-bit operating system while unmodified 32-bit software keeps working.[6][17]
Switching between these submodes is controlled by flags in the Extended Feature Enable Register (EFER), a model-specific register (MSR), and by the code segment descriptor. The Long Mode Enable (LME) flag in EFER (bit 8) enables long mode, while the Long Mode Active (LMA) flag (bit 10, read-only) indicates that long mode is in effect once paging is enabled. The Long attribute bit (CS.L) in the code segment descriptor then determines the active submode: CS.L set to 1 (with CS.D cleared) selects 64-bit mode, and CS.L set to 0 selects compatibility mode. Transitions typically involve far jumps, calls, or returns that load a new code segment descriptor.[6][17]
In practice, 64-bit mode is used for new applications that benefit from the larger address space and register set, such as high-performance computing or large-scale data processing. Compatibility mode runs legacy 32-bit protected-mode software without recompilation, so a 64-bit operating system can support existing IA-32 applications alongside native 64-bit code.[6][17]
| Aspect | 64-bit Mode | Compatibility Mode |
|---|---|---|
| Register Width | 64-bit (e.g., RAX, RIP) | 32-bit subsets (e.g., EAX, EIP) |
| Addressing | 64-bit virtual addresses | 32-bit virtual addresses |
| Instruction Extensions | REX prefix for 64-bit ops and extended registers | IA-32 instructions; no REX for 64-bit |
| Submode Selection | CS.L = 1 | CS.L = 0 |
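
The selection rule summarized in the table can be written as a small decision function. The Python sketch below is illustrative only: the function name `execution_mode` and the returned labels are hypothetical, and real software cannot read CS.L this way. The CS.D handling goes slightly beyond the table and is an assumption drawn from the architecture manuals: with CS.L = 1 the D bit must be 0, and in compatibility mode D selects a 32-bit or 16-bit default operand size.

```python
EFER_LME = 1 << 8   # Long Mode Enable
EFER_LMA = 1 << 10  # Long Mode Active (read-only, set by hardware)


def execution_mode(efer: int, cs_l: int, cs_d: int) -> str:
    """Classify the current mode from an EFER value and the CS.L / CS.D bits.

    Hypothetical helper for illustration; the inputs are plain integers
    supplied by the caller rather than values read from hardware.
    """
    if not efer & EFER_LMA:
        return "legacy mode"
    if cs_l:
        # CS.L = 1 together with CS.D = 1 is a reserved combination.
        return "reserved" if cs_d else "64-bit mode"
    # CS.L = 0: compatibility mode; CS.D picks the default operand size.
    return "compatibility mode (32-bit)" if cs_d else "compatibility mode (16-bit)"
```

For example, an EFER value with both LME and LMA set and a code segment with L = 1, D = 0 classifies as 64-bit mode, while the same EFER value with L = 0, D = 1 classifies as 32-bit compatibility mode.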
Technical Implementation
Enabling Long Mode
To enable long mode on x86-64 processors, the CPU must first support the feature, which is verified using the CPUID instruction with function 80000001H; bit 29 (LM flag) in the EDX register indicates long mode capability.[18][19] The processor must be in protected mode with paging disabled (CR0.PG, bit 31, cleared) when the sequence starts, and valid 64-bit page tables must already be prepared in memory.[18][19] The activation sequence begins by writing to the Extended Feature Enable Register (EFER) at MSR address C0000080H using the WRMSR instruction, setting the LME (Long Mode Enable) bit 8 to 1.[18][19] Next, Physical Address Extension (PAE) is activated by setting CR4.PAE (bit 5) to 1, and CR3 is loaded with the physical base address of the Page Map Level 4 (PML4) table.[18][19] Paging is then enabled by setting CR0.PG to 1, which causes the processor to set EFER.LMA (Long Mode Active) bit 10 to 1 because LME was previously enabled; from this point the processor is in long mode, initially in compatibility mode.[18][19] The Global Descriptor Table Register (GDTR) and Interrupt Descriptor Table Register (IDTR) are loaded using the LGDT and LIDT instructions with tables containing 64-bit compatible descriptors, including a code segment with the L (long) bit set to 1.[18][19] Finally, a far jump to the 64-bit code segment (e.g., jmp far 0x08:long_mode_start) reloads CS and switches the processor from compatibility mode into 64-bit mode.[18][19]
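The ordering constraints in this sequence can be illustrated with a toy register model. The Python sketch below is not boot code and does not touch hardware: `ToyCpu`, `enable_long_mode`, `code64_descriptor`, and `GeneralProtectionFault` are hypothetical names introduced for illustration. It assumes the architectural behavior that enabling paging with LME set and PAE clear raises #GP, and it builds the descriptor bits for a ring-0 code segment with L = 1; the final far jump is not modeled.

```python
from dataclasses import dataclass

CR0_PE = 1 << 0      # protection enable
CR0_PG = 1 << 31     # paging enable
CR4_PAE = 1 << 5     # physical address extension
EFER_LME = 1 << 8    # long mode enable
EFER_LMA = 1 << 10   # long mode active (set by hardware)


class GeneralProtectionFault(Exception):
    """Stands in for #GP in this toy model."""


@dataclass
class ToyCpu:
    cr0: int = CR0_PE  # protected mode, paging off
    cr3: int = 0
    cr4: int = 0
    efer: int = 0

    def write_cr0(self, value: int) -> None:
        turning_on_paging = bool(value & CR0_PG) and not self.cr0 & CR0_PG
        if turning_on_paging and self.efer & EFER_LME:
            if not self.cr4 & CR4_PAE:
                raise GeneralProtectionFault("CR0.PG=1 with LME=1 but PAE=0")
            self.efer |= EFER_LMA  # hardware sets LMA when paging activates
        self.cr0 = value


def enable_long_mode(cpu: ToyCpu, pml4_phys: int) -> None:
    """Apply the register writes in the order described in the text."""
    cpu.efer |= EFER_LME             # WRMSR to EFER (MSR C0000080H)
    cpu.cr4 |= CR4_PAE               # enable PAE
    cpu.cr3 = pml4_phys              # physical base of the PML4 table
    cpu.write_cr0(cpu.cr0 | CR0_PG)  # paging on, so EFER.LMA becomes 1


def code64_descriptor() -> int:
    """Build a ring-0 64-bit code segment descriptor (L=1, D=0)."""
    present = 1 << 47
    code_or_data = 1 << 44   # S=1: not a system descriptor
    executable = 1 << 43
    readable = 1 << 41
    long_bit = 1 << 53       # CS.L
    return present | code_or_data | executable | readable | long_bit
```

Calling `enable_long_mode(ToyCpu(), 0x1000)` leaves EFER.LMA set in the model, and `code64_descriptor()` yields `0x00209A0000000000`, a value that would sit in the GDT slot named by the selector used in the far jump.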
In legacy BIOS environments, the processor starts in 16-bit real mode after reset, requiring the bootloader to transition through protected mode before following the long mode enablement steps.[18][19] Bootloaders such as GRUB perform this sequence, setting up descriptors and paging before jumping to the 64-bit kernel entry point. UEFI firmware starts from the same reset state but performs the switch itself, entering long mode no later than the Driver Execution Environment (DXE) phase, and hands off to the OS loader (e.g., BOOTX64.EFI) already in 64-bit mode with paging and IA-32e mode active.[20]
Invalid configurations during enablement trigger a general protection fault (#GP); the typical case is setting CR0.PG while EFER.LME is 1 but CR4.PAE is 0.[18][19] Other faults, such as a stack fault (#SS), can occur after the transition when a stack reference uses a non-canonical address.[19] CPUID function 80000001H reports only that the processor is capable of long mode; to verify that long mode is actually active, software reads EFER with RDMSR and checks that LMA (bit 10) is set.[18][19]
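The two checks, capability through CPUID and activation through EFER, reduce to simple bit tests. The Python sketch below decodes register values passed in as integers; it does not execute CPUID or RDMSR itself, and the names `supports_long_mode` and `decode_efer` are hypothetical. The NXE position (bit 11) is taken from the architecture manuals rather than from the text above.

```python
CPUID_EXT_FEATURES = 0x80000001  # extended function number placed in EAX
EDX_LM = 1 << 29                 # Long Mode flag returned in EDX

# EFER bit positions: LME and LMA as described above; NXE is an addition.
EFER_BITS = {"LME": 8, "LMA": 10, "NXE": 11}


def supports_long_mode(edx: int) -> bool:
    """Test the LM flag in the EDX value returned by CPUID 80000001H."""
    return bool(edx & EDX_LM)


def decode_efer(efer: int) -> dict:
    """Return the state of selected EFER flags from a value read via RDMSR."""
    return {name: bool((efer >> bit) & 1) for name, bit in EFER_BITS.items()}
```

With an illustrative EFER value of `0x500`, `decode_efer` reports LME and LMA set and NXE clear, which corresponds to long mode being active without no-execute protection enabled.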
Register and Instruction Set Extensions
Long mode significantly expands the register set available to software, building upon the legacy 32-bit IA-32 architecture to support 64-bit operations. The general-purpose registers are extended from eight 32-bit registers (EAX through EDI) to sixteen 64-bit registers, named RAX through R15, where the original eight are widened to 64 bits and eight new ones (R8 through R15) are added.[6] These registers can be accessed in their full 64-bit form or as narrower subsets (e.g., RAX's lower 32 bits as EAX, 16 bits as AX, or 8 bits as AL), with 32-bit operations automatically zero-extending results into the upper 32 bits of the 64-bit register.[21] The flags register is likewise extended from the 32-bit EFLAGS to a 64-bit RFLAGS; the lower 32 bits carry the legacy flags unchanged, and the upper 32 bits are reserved and read as zero.[6] The instruction pointer grows from the 32-bit EIP to a 64-bit RIP, enabling addressing of the full 64-bit virtual address space.[6]
For vector and SIMD processing, 64-bit mode adds eight 128-bit XMM registers (XMM8 through XMM15), extending the legacy set to sixteen, and every long-mode processor implements SSE2 as a baseline.[6] Subsequent extensions introduce wider registers: AVX provides sixteen 256-bit YMM registers (YMM0 through YMM15, whose lower 128 bits alias the corresponding XMM registers), while AVX-512 expands to thirty-two 512-bit ZMM registers (ZMM0 through ZMM31) in 64-bit mode, along with eight 64-bit opmask registers (K0 through K7) for masked operations.[22] These vector registers enhance parallel processing for floating-point, integer, and multimedia workloads, with AVX and AVX-512 instructions encoded using VEX and EVEX prefixes, respectively, to support three-operand formats and, with EVEX, per-element masking.[22]
The instruction set in long mode incorporates a new REX prefix, a single byte placed immediately before the opcode, to specify 64-bit operand sizes and to access the additional registers (R8-R15 and XMM8-XMM15). A REX prefix also makes the low bytes of RSP, RBP, RSI, and RDI addressable as 8-bit registers (SPL, BPL, SIL, DIL) in place of the legacy high-byte registers AH, CH, DH, and BH.[6] Arithmetic and logical instructions default to 32-bit operands, and setting REX.W selects 64-bit operands. Results of 32-bit operations are zero-extended to 64 bits, while 16-bit and 8-bit operations leave the upper bits of the destination register unchanged.[6][2] SSE2 instructions, mandatory in long mode, provide scalar and packed floating-point and integer operations on the 128-bit XMM registers, forming the baseline for SIMD capabilities.[6]
Addressing modes are refined for a flat 64-bit memory model. For CS, DS, ES, and SS the segment base is treated as zero and limits are not checked; only FS and GS keep a base address, which is set through MSRs, so most legacy segmentation overhead disappears.[6] A key addition is RIP-relative addressing, which applies a displacement relative to the 64-bit RIP value of the next instruction, facilitating position-independent code without relying on absolute addresses or register indirection.[6] This mode, encoded via the ModR/M byte, supports efficient data access in large address spaces and is widely used in modern operating systems and libraries.[21]
Addressing and Memory Management
Virtual and Physical Addressing
In long mode, virtual addresses are 64 bits wide, but the canonical form restricts the implemented width to 48 bits in the standard configuration: bits 63 through 48 must be copies of bit 47, giving an addressable virtual space of 256 terabytes (2^48 bytes). Modern processors support optional 5-level paging, extending canonical addressing to 57 bits (bits 63:57 sign-extended from bit 56), for a virtual space of 128 petabytes (2^57 bytes).[23] This space is conventionally divided into a user space occupying the lower half (addresses from 0 to 2^47 - 1 in 4-level mode, or 0 to 2^56 - 1 in 5-level mode) and a kernel space in the upper half (addresses from -2^47 to -1 or -2^56 to -1, represented as 0xFFFF800000000000 to 0xFFFFFFFFFFFFFFFF in 4-level mode). The split is an operating-system convention enforced through page-level privilege bits.[23]
Physical addressing in long mode varies by processor implementation. Early AMD64 processors, such as the initial Opteron family, supported 40 bits for a 1 terabyte physical address space, later expanded to 48 bits (256 terabytes) in subsequent cores, and ultimately up to 52 bits (4 petabytes) in modern implementations.[24] Intel's initial 64-bit processors started with 36 bits and widened in later generations, up to 52 bits; the implemented width is the MAXPHYADDR value reported via CPUID.[23][25]
The canonical addressing rule mandates that for a virtual address to be valid, its upper bits (63:48 in 4-level mode or 63:57 in 5-level mode) must either all be 0s (for the lower half) or all 1s (for the upper half), matching bit 47 or bit 56 respectively; a violation triggers a general-protection fault (#GP), or a stack fault (#SS) for stack references.[23] This mechanism keeps the upper address bits out of software's hands, so that future processors can implement more virtual address bits without breaking existing software.[23]
Software in long mode uses full 64-bit pointers for portability across implementations, and operating systems keep addresses canonical by sign-extending from the highest implemented bit when generating and validating addresses.[23] Canonical virtual addresses are then translated to physical addresses via paging structures, as detailed in the section on page size and translation.[23]
Page Size and Translation
In long mode, the memory management unit (MMU) employs a four-level paging hierarchy to translate 48-bit canonical virtual addresses by default; modern processors also support an optional five-level hierarchy for 57-bit canonical virtual addresses, enabled by setting the LA57 bit in control register CR4.[26][27] The four-level structure consists of the page map level-4 table (PML4), page directory pointer table (PDPT), page directory (PD), and page table (PT), with each level containing 512 entries addressed by 9 bits from the virtual address. The five-level structure adds a page map level-5 table (PML5) using an additional 9-bit index (bits 56:48).[26][27] Each table occupies exactly one 4 KB page, 512 entries of 8 bytes each, which is why every index is 9 bits wide.[26] Page sizes in long mode include the standard 4 KB granularity for fine-grained control, as well as larger 2 MB pages mapped directly via PD entries and 1 GB huge pages via PDPT entries. 
The same three page sizes are available in five-level paging; the additional level extends the address space and does not introduce larger pages.[26][27] The larger sizes, indicated by the page size (PS) bit in the respective table entries, reduce TLB pressure by covering larger memory regions with fewer translations, which is particularly beneficial for operating systems handling large data structures or file-backed mappings.[26] For instance, a 2 MB page uses a 21-bit offset, while a 1 GB page employs a 30-bit offset, allowing the MMU to bypass lower-level tables when PS is set.[26][27]
The address translation process begins with the canonical virtual address, sign-extended to 64 bits (bits 63:48 matching bit 47 in 4-level mode, or bits 63:57 matching bit 56 in 5-level mode).[26][27] For 4 KB pages in four-level mode, the address splits into a 12-bit byte offset and four 9-bit indices: bits 47:39 for PML4, 38:30 for PDPT, 29:21 for PD, and 20:12 for PT. In five-level mode, an additional 9-bit PML5 index (56:48) is used.[26][27] The CR3 control register holds the 4 KB-aligned physical base address of the PML4 table (or of the PML5 table in five-level mode). From there the MMU walks the hierarchy: at each level it adds the index multiplied by the 8-byte entry size to the table base, reads the entry, and takes the base of the next table from it, until the PT entry yields the physical page frame, to which the offset is appended.[26][27] Each entry also carries protections such as read/write permissions and the no-execute (NX) bit, located at bit 63, which prevents instruction fetches from the page when the NXE bit in the Extended Feature Enable Register is set, aiding in executable space protection.[26][27]
Long mode paging builds on physical address extension (PAE) from protected mode, requiring CR4.PAE to be enabled for entry into long mode, and extends it to the full four- or five-level hierarchy for 48- or 57-bit virtual addressing.[26][27] PAE in 32-bit mode uses a three-level structure (a four-entry PDPT, PD, and PT) with 64-bit entries to translate 32-bit linear addresses; long mode keeps the same 64-bit entry format throughout, allowing up to 52-bit physical addresses without altering the core translation mechanics.[26][27]
| Virtual Address Bit Range | Field | Purpose (for 4 KB Pages in 4-Level Paging) |
|---|---|---|
| 63:48 | Sign Extension | Canonical form (all 0s or 1s) |
| 47:39 | PML4 Index | Selects entry in PML4 table |
| 38:30 | PDPT Index | Selects entry in PDPT |
| 29:21 | PD Index | Selects entry in PD (or 2 MB map) |
| 20:12 | PT Index | Selects entry in PT |
| 11:0 | Page Offset | Byte offset within 4 KB page |
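
The bit-field split in the table, together with the canonical-form rule, can be checked with a few shifts and masks. The following Python sketch assumes 4-level paging with 4 KB pages and unsigned 64-bit address values; the function names `is_canonical` and `split_virtual_address` are hypothetical and chosen for illustration.

```python
def is_canonical(va: int, va_bits: int = 48) -> bool:
    """True if bits 63 down to (va_bits - 1) of a 64-bit address are equal.

    va is treated as an unsigned 64-bit value; va_bits is 48 for 4-level
    paging and 57 for 5-level paging.
    """
    upper = va >> (va_bits - 1)               # top implemented bit and above
    all_ones = (1 << (64 - va_bits + 1)) - 1  # that many bits, all set
    return upper == 0 or upper == all_ones


def split_virtual_address(va: int) -> dict:
    """Split a 4-level, 4 KB-page virtual address into its table indices."""
    if not is_canonical(va):
        raise ValueError(f"non-canonical address: {va:#018x}")
    return {
        "pml4": (va >> 39) & 0x1FF,   # bits 47:39
        "pdpt": (va >> 30) & 0x1FF,   # bits 38:30
        "pd": (va >> 21) & 0x1FF,     # bits 29:21
        "pt": (va >> 12) & 0x1FF,     # bits 20:12
        "offset": va & 0xFFF,         # bits 11:0
    }
```

For the first upper-half address, `0xFFFF800000000000`, the split yields PML4 index 256 with all other fields zero, which shows why the upper half of the address space starts at the midpoint of the 512-entry PML4 table. An address such as `0x0000800000000000`, where bit 47 is set but bits 63:48 are clear, is rejected as non-canonical.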