x86 assembly language

x86 assembly language is a low-level programming language that provides a symbolic representation of the machine instructions executed by processors implementing the x86 instruction set architecture (ISA), originally developed by Intel for its 8086 microprocessor and subsequently extended by both Intel and AMD. It enables direct manipulation of hardware resources such as registers, memory, and I/O ports, making it essential for tasks requiring fine-grained control over system behavior, including operating system kernels, device drivers, and performance-optimized applications. The language supports multiple addressing modes and a rich set of arithmetic, logical, and control-flow instructions, and it has evolved to include extensions like MMX, SSE, and AVX for vector processing and multimedia operations.

The x86 architecture traces its origins to the Intel 8086, a 16-bit microprocessor introduced in 1978 that laid the foundation for the personal computer revolution through the use of its 8088 variant in the IBM PC. Subsequent processors, such as the 80286 in 1982, introduced protected mode for enhanced memory management and multitasking capabilities, while the 80386 in 1985 extended the architecture to 32-bit operations with paged virtual memory support. The shift to 64-bit computing came in 2003 when AMD launched the Opteron processor family, introducing the AMD64 extension (also known as x86-64) that added 64-bit registers and addressing while maintaining backward compatibility with 32-bit and 16-bit code. Intel adopted this extension as Intel 64 (formerly EM64T) starting with its Nocona-based Xeon processors in 2004, solidifying x86-64 as the dominant mode for modern computing.

Key features of x86 assembly include its segmented memory model in real mode, the flat memory model commonly used in protected and long modes, and a variety of general-purpose registers (e.g., EAX, EBX in 32-bit; RAX, RBX in 64-bit) alongside specialized ones for floating-point (x87 FPU) and vector operations (XMM, YMM, ZMM). Processors operate in several modes: real mode for legacy 16-bit compatibility, protected mode for 32-bit multitasking with privilege levels, and long mode for 64-bit execution. Software switches between these modes during boot and, where needed, at runtime.

Assembly code can be written in Intel syntax, which places the destination operand before the source (e.g., mov eax, ebx copies EBX into EAX) and is used in official Intel documentation, or in AT&T syntax, common in Unix-like systems and GAS, which places the source first (e.g., movl %ebx, %eax), appends size suffixes to mnemonics, and prefixes registers with percent signs. Despite its complexity due to backward compatibility and irregular instruction encodings, x86 assembly remains vital for embedded systems, reverse engineering, and high-performance computing where higher-level languages fall short.

Overview

History and Evolution

The x86 assembly language originated with the Intel 8086 microprocessor, introduced in 1978 as a 16-bit complex instruction set computing (CISC) architecture designed to support advanced applications and serve as a template for future processors. Developed in just 18 months, the 8086 featured a microcoded implementation and became the foundation for the x86 family, powering the IBM PC released in 1981, which used the closely related 8088 variant and established widespread software and hardware compatibility standards. This integration into the IBM PC ecosystem ensured the persistence of x86 despite the rise of reduced instruction set computing (RISC) alternatives, as backward compatibility drove industry adoption and locked in a vast software base.

The architecture evolved significantly with the Intel 80286 in 1982, which introduced protected mode to enable multitasking and memory protection, enhancing system reliability for emerging multi-user environments. This was followed by the Intel 80386 in 1985, marking the shift to 32-bit processing with support for paging and a flat memory model, allowing larger address spaces and improved efficiency for operating systems like Windows. The Pentium series, launched in 1993, advanced the design with superscalar execution for parallel instruction processing, dropping the "86" suffix while maintaining compatibility to sustain the PC market's growth. A pivotal extension occurred in 2003 with the introduction of 64-bit addressing via AMD's AMD64 architecture, which Intel adopted as Intel 64 in 2004, enabling larger memory capacities and enhanced performance for data-intensive applications without breaking legacy support.

Key instruction set extensions further propelled x86's relevance: MMX, announced in 1996 and shipped with the Pentium MMX in 1997, added multimedia acceleration; SSE in 1999 with the Pentium III introduced floating-point SIMD for vector operations; AVX in 2011 expanded vector widths to 256 bits for high-performance computing; and AVX-512 in 2016 provided 512-bit vectors suited to high-performance computing and machine learning workloads. In 2023, Intel announced AVX10, a converged instruction set incorporating AVX-512 features to simplify implementation across processors. These developments maintained x86's dominance by balancing innovation with the enduring IBM PC compatibility legacy.

Key Characteristics and Usage

x86 assembly language is rooted in the Complex Instruction Set Computing (CISC) architecture, which supports a diverse array of instructions designed to perform complex operations in a single command, contrasting with the simpler, fixed-length instructions typical of Reduced Instruction Set Computing (RISC) designs. This CISC approach lets x86 instructions vary in length from 1 to 15 bytes, a flexible encoding that keeps common operations short while still accommodating specialized ones, and so maintains high code density. A hallmark of the x86 architecture is its strong emphasis on backward compatibility, supporting execution in 16-bit, 32-bit, and 64-bit modes through mechanisms like compatibility mode in long mode, which permits unmodified legacy applications to run alongside modern 64-bit software without requiring emulation.

In practice, x86 assembly is primarily employed in domains demanding precise control and efficiency, such as operating system kernel development, where it facilitates low-level system calls and interrupt handling, device drivers for direct hardware interaction, and embedded systems constrained by resource limitations. It also plays a key role in performance-critical applications like game engines, where optimized routines enhance rendering and physics simulations, and in hand optimization through inline assembly embedded in higher-level languages like C/C++ to bypass inefficiencies in compiler-generated code.

Despite these strengths, x86 assembly presents challenges due to its inherent complexity, including variable instruction lengths and intricate addressing modes that can lead to programming errors and difficult debugging. However, it offers significant advantages in code density, reducing program size compared to equivalent RISC implementations, and provides direct hardware control, enabling fine-tuned access to CPU registers, memory, and peripherals for maximal performance.

As of 2025, x86 remains the dominant architecture in desktops and servers, holding the majority of that market with processors from Intel and AMD. Its relevance persists in security research, where assembly-level analysis uncovers vulnerabilities in low-level code, and in just-in-time (JIT) compilers for JavaScript engines such as V8, which generate optimized x86 machine code to accelerate web applications while posing novel attack surfaces studied in defenses against JIT spraying and related exploits.

Syntax and Notation

Syntax Variants

x86 assembly language supports multiple syntax variants, each tailored to different assemblers and development environments, primarily differing in operand ordering, notation for registers and memory, and directive usage. The most prominent variants are Intel syntax, used by assemblers like Microsoft's MASM, and AT&T syntax, employed by the GNU Assembler (GAS).

Intel syntax, as implemented in MASM, places the destination before the source (e.g., mov rax, rbx copies RBX into RAX), mirroring an assignment read from left to right. Registers are denoted without prefixes (e.g., rax), memory addresses use square brackets (e.g., [rcx + r10 * 2 + 100h]), and data sizes are specified via qualifiers like DWORD PTR when ambiguous (e.g., mov eax, DWORD PTR [ecx]). Directives include .data for initialized data sections and .code for executable code, with EQU for defining constants (e.g., myvar EQU 100). Comments begin with a semicolon (;). This syntax is prevalent in Windows development because MASM is integrated with Microsoft's development tools.

In contrast, AT&T syntax in GAS reverses the operand order, placing sources before destinations (e.g., movl %esi, %ebx), and uses size suffixes on mnemonics (e.g., movl for 32-bit, movb for 8-bit). Registers are prefixed with % (e.g., %eax), immediates with $ (e.g., movb $10, %al), and memory operands use a displacement(base, index, scale) format with parentheses (e.g., 4(%esp)). Directives such as .data and .text organize sections, and comments start with #. This variant originated on Unix systems and emphasizes explicitness to avoid ambiguity in operand types.

Other assemblers introduce portable or specialized variants of Intel syntax. NASM employs a clean, portable Intel-like syntax with destination-first ordering (e.g., mov eax, ebx), mandatory square brackets for memory (e.g., [ebx + esi * 4 + 10]), and no register prefixes. It uses section .data and section .text for segments, EQU for constants (e.g., MAX EQU 100), and ; for comments. NASM's design prioritizes cross-platform compatibility and modularity. FASM adopts a flat-model-focused Intel syntax, also destination-first (e.g., mov eax, [ebx]), with square brackets for memory and size operators like dword (e.g., mov eax, dword [100h]). Equates use = (e.g., x = 1), sections are declared with a section directive whose exact form depends on the output format (e.g., section '.text' executable), and comments use ;. FASM emphasizes optimization, resolving code in multiple passes to reduce its size, and omits high-level MASM constructs like PROC.

Converting between these variants presents challenges, such as reversing operand orders, adding or removing the % register prefix of AT&T syntax, adjusting memory notation from parentheses to brackets, and harmonizing directives (e.g., .data vs. section .data). Syntax converters or manual rewriting are often required, and automated translation can introduce errors in complex addressing or macros.
```assembly
; Example in Intel/MASM syntax
.data
msg db "Hello", 0
.code
mov rax, offset msg  ; Destination first, no % prefix
```
```assembly
# Example in AT&T/GAS syntax
.data
msg: .ascii "Hello\0"
.text
movq $msg, %rax  # Source first, % prefix, $ for immediate
```
```assembly
; Example in NASM syntax
section .data
msg db 'Hello', 0
section .text
mov rax, msg     ; Loads the address of msg; [msg] would read its contents
```
```assembly
; Example in FASM syntax (ELF64 object output)
format ELF64
section '.data' writeable
msg db 'Hello',0
section '.text' executable
mov rax, msg    ; Destination first; loads the address of msg
```

Mnemonics and Opcodes

In x86 assembly language, mnemonics serve as human-readable symbolic representations of instructions, such as MOV for data movement or ADD for addition, which correspond to specific opcodes executed by the processor. These opcodes are fixed byte values that define the operation, with examples including 0x89 for MOV from a register to a register or memory operand (MOV r/m32, r32) and 0x01 for ADD from a register to a register or memory operand (ADD r/m32, r32). The mapping ensures that assemblers translate mnemonic-based source code into the processor's native binary format, maintaining compatibility across Intel 64 and AMD64 architectures.

x86 instructions employ a variable-length encoding scheme, ranging from 1 to 15 bytes, comprising optional prefixes, one or more opcode bytes, a ModR/M byte (if required for operand specification), an optional Scale-Index-Base (SIB) byte, displacement fields, and immediate data. The ModR/M byte, an 8-bit field, encodes addressing modes and register selection using three subfields: Mod (2 bits for mode), Reg/Opcode (3 bits for a register or an opcode extension), and R/M (3 bits for a register or memory base). This flexible structure allows efficient encoding of diverse operand types, from register-to-register operations to complex memory accesses.

Opcode organization relies on hierarchical tables: primary opcodes use a single byte (0x00 to 0xFF for basic operations like ADD), while secondary opcodes are reached through the escape byte 0x0F, giving two-byte opcodes (e.g., 0F 01 for system instructions). Further extensions include three-byte formats like 0F 38 or 0F 3A for advanced instructions (e.g., 0F 38 01 for packed horizontal addition). Modern extensions distinguish legacy encodings from enhanced ones; for instance, the REX prefix (0x40 to 0x4F) in 64-bit mode selects 64-bit operand size and gives access to the additional registers (R8-R15). RIP-relative addressing, by contrast, is selected through the ModR/M byte and needs no REX prefix. Similarly, the VEX prefix (a 2-byte form starting with 0xC5 or a 3-byte form starting with 0xC4) supports AVX instructions by folding in the legacy prefixes and specifying vector lengths.

Prefixes modify instruction behavior and contribute to variable length: the LOCK prefix (0xF0) makes a read-modify-write operation on memory atomic for multiprocessor synchronization, while REP (0xF3) or REPNE (0xF2) repeats a string operation until the count in CX/ECX/RCX is exhausted or a condition is met. Immediate operands also add to the length; a simple MOV r32, imm32, for example, occupies 5 bytes: the opcode B8 plus the register number, followed by the 4-byte immediate value.

Vendor-specific extensions introduce additional opcode spaces; AMD's 3DNow! uses the opcode 0x0F 0x0F followed by a ModR/M byte and an 8-bit immediate (imm8) that selects the SIMD floating-point operation, such as 0F 0F /r 9E for packed floating-point addition (PFADD). This format reserves the imm8 for up to 256 unique operations, distinguishing it from Intel's SSE/AVX paths, though AMD now recommends migrating to standard extensions for broader compatibility.

Disassembly tools like objdump from the GNU Binutils suite reverse this process, displaying both opcodes and corresponding mnemonics from object files or executables, as in objdump -d binary outputting lines like 89 c3: mov %eax,%ebx alongside the raw bytes. This aids in verifying encodings and debugging low-level code.
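The ModR/M layout described above can be checked by hand against the 89 c3 bytes from the objdump example. The following is a minimal Python sketch, not a full decoder: the function names are illustrative, and it assumes a 32-bit operand size, no REX prefix, and the register-direct form (Mod = 11) only.

```python
# Register numbering used by the Reg and R/M fields for 32-bit operands.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_modrm(byte):
    """Split a ModR/M byte into its Mod (2-bit), Reg/Opcode (3-bit), and R/M (3-bit) fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("ModR/M is a single byte")
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm

def describe_mov_89(modrm):
    """Describe opcode 89 /r (MOV r/m32, r32) in Intel syntax, register-direct case only."""
    mod, reg, rm = decode_modrm(modrm)
    if mod != 0b11:
        raise NotImplementedError("memory forms are outside this sketch")
    # For opcode 89 the R/M field is the destination and Reg is the source.
    return f"mov {REG32[rm]}, {REG32[reg]}"

print(decode_modrm(0xC3))     # (3, 0, 3)
print(describe_mov_89(0xC3))  # mov ebx, eax
```

The Intel-syntax result mov ebx, eax is the same instruction that objdump prints as mov %eax,%ebx in AT&T syntax.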

Reserved Words and Directives

In x86 assembly language, reserved words encompass identifiers that the assembler treats as fixed and cannot be redefined by the programmer, including register names and instruction mnemonics, to prevent conflicts with the processor's instruction set. These reservations ensure consistent interpretation during assembly, as redefining them can lead to errors or unexpected behavior, such as a failed assembly when attempting to use a register name as a label.

Register names like EAX, EBX, and their variants (e.g., AL, AH, and AX for the 8-bit and 16-bit portions) are prime examples of reserved words across assemblers, as they map directly to hardware registers and cannot be reassigned without triggering errors. In the Microsoft Macro Assembler (MASM), the full list includes EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP, and the segment registers CS, DS, SS, ES, FS, and GS, all of which are protected under all CPU modes to maintain compatibility. Similarly, in the Netwide Assembler (NASM), registers such as RAX (in 64-bit mode) and their low and high byte variants are reserved, with legacy high bytes like AH inaccessible in instructions that carry a REX prefix. Misuse, such as redefining EAX as a label, results in an immediate assembly failure, so programmers need to avoid keyword conflicts.

Directives, also known as assembler pseudo-instructions, are non-executable commands that guide the assembly process, such as defining data, managing memory layout, or structuring code, and they vary slightly between assemblers like MASM and NASM. For data definition, common directives include DB (define byte), DW (define word, 2 bytes), and DD (define doubleword, 4 bytes), which allocate storage and initialize it with specified values; for example, DB 42 reserves one byte with the value 42, while DD 0x12345678 reserves four bytes for a 32-bit value. These are shared by MASM, NASM, and FASM (GAS uses .byte, .word, and .long instead) and are essential for embedding constants or arrays without runtime overhead.

Segment and layout directives control how code and data are organized in the output file. In MASM, SEGMENT (or a simplified directive such as .DATA) defines a memory segment, such as DATA SEGMENT to group variables, and ASSUME specifies register-segment associations, like ASSUME DS:DATA, to tell the assembler which segment register to assume when it encodes a memory reference. NASM uses SECTION (or its synonym SEGMENT) similarly to switch between sections like .text for code or .bss for uninitialized data, with ORG setting the absolute origin address in flat binary outputs, e.g., ORG 0x1000 to start at a specific address. MASM's INCLUDE directive (%include in NASM) incorporates external source files, e.g., INCLUDE macros.inc, to modularize assembly. Improper use, such as mismatched ASSUME declarations, can cause assembly errors or incorrect memory references during execution.

Program structure directives mark the boundaries of code units. In MASM, PROC declares a procedure, e.g., main PROC, paired with ENDP to close it, enabling modular code with local labels, while END signals the end of the source file and optionally specifies an entry point, as in END main. NASM lacks native PROC/ENDP but uses %define for single-line macro definitions, e.g., %define MAX 100, which acts as a text substitution for constants or simple macros without procedure semantics. These directives ensure proper scoping; for instance, omitting ENDP in MASM produces an error at assembly time. Assembler-specific variations, such as MASM's DUP for repeating data definitions (e.g., array DW 10 DUP(0)), highlight the need to consult variant-specific documentation to avoid portability issues.

Processor Architecture

Registers

The x86 architecture features a diverse set of registers that form the core of its programming model, enabling efficient data manipulation, memory addressing, and control of processor state across various operating modes. These registers have evolved from the original 16-bit design of the 8086 to support 32-bit and 64-bit extensions, with additional specialized registers introduced through SIMD and other enhancements. The general-purpose, segment, control, and debug registers provide the foundational hardware for programming, while the flags register captures execution status for conditional operations. The x87 floating-point unit (FPU) includes eight 80-bit floating-point registers organized as a stack (ST0 through ST7), along with control (FCW), status (FSW), tag (FTW), instruction pointer (FIP), data pointer (FDP), and opcode (FOP) registers for managing floating-point operations and exceptions.

General-purpose registers (GPRs) serve as the primary storage for operands, addresses, and computation results in x86 assembly. In the original 16-bit 8086 architecture, there are eight 16-bit GPRs: AX, BX, CX, DX, SI, DI, BP, and SP. The first four can also be accessed as 8-bit sub-registers for the high and low bytes (e.g., AH and AL for AX). These were extended to 32-bit registers in the 80386 processor (EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP), allowing larger data handling while maintaining backward compatibility through the lower 16- and 8-bit portions. In 64-bit mode (Intel 64), these expand to 64-bit registers (RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP) plus eight additional ones (R8 through R15), requiring the REX prefix for access to the new registers and full 64-bit widths; all GPRs support byte-level subregister access (e.g., AL, R8B), though the REX prefix is required for certain subregisters like SPL, BPL, SIL, DIL and the new registers R8B–R15B. The ESP/RSP register specifically functions as the stack pointer, while EBP/RBP acts as the base pointer for stack frames.
| Register Group | 16-bit | 32-bit | 64-bit | Key Roles |
|---|---|---|---|---|
| Accumulator | AX | EAX | RAX | Arithmetic, I/O operations |
| Base | BX | EBX | RBX | Base addressing, data storage |
| Counter | CX | ECX | RCX | Loop counters, shifts |
| Data | DX | EDX | RDX | I/O port addressing, multiplication/division |
| Source Index | SI | ESI | RSI | String source addressing |
| Destination Index | DI | EDI | RDI | String destination addressing |
| Base Pointer | BP | EBP | RBP | Stack frame base |
| Stack Pointer | SP | ESP | RSP | Stack top management |
| Additional (64-bit only) | - | - | R8–R15 | General data and addressing |
Segment registers manage memory segmentation in real and protected modes, defining the base addresses and attributes for different memory regions. There are six 16-bit segment registers: CS (code segment), DS (data segment), ES (extra segment), FS and GS (general-purpose segments, often used for thread-local storage in modern systems), and SS (stack segment). In protected mode these registers hold selectors that index into the Global Descriptor Table (GDT) or Local Descriptor Table (LDT) to obtain segment bases, limits, and access rights; in 64-bit mode, segmentation is largely flat, with CS, DS, ES, and SS treated as having a base of zero and FS/GS retaining base functionality via model-specific registers.

The instruction pointer (EIP in 32-bit mode, RIP in 64-bit mode) holds the address of the next instruction to execute, facilitating sequential and branched program flow. The flags register, EFLAGS in 32-bit mode and RFLAGS in 64-bit mode (with the upper 32 bits reserved), stores processor status and control information. Key status flags include the zero flag (ZF, bit 6), set when the result of an operation is zero, the carry flag (CF, bit 0), indicating carry or borrow in arithmetic, and the overflow flag (OF, bit 11), detecting signed arithmetic overflow; these flags influence conditional jumps and other control-flow instructions. Additional bits manage interrupts (IF, bit 9), direction for string operations (DF, bit 10), and other modes.

Control registers oversee operating modes, memory management, and extensions. CR0 controls basic features like protected mode enablement (PE, bit 0) and numeric error handling; CR3 holds the physical base address of the page directory for virtual memory paging; CR4 extends controls for features like SIMD exception handling (the OSXMMEXCPT bit). Debug registers (DR0–DR3 for breakpoint addresses, DR6 for status, DR7 for control) support hardware breakpoints and watchpoints for debugging.

The x86 register set has evolved with SIMD extensions to support vector processing. The MMX extension (1997) introduced eight 64-bit MMX registers (MM0–MM7) aliased onto the x87 FPU registers for packed integers. SSE (1999) added 128-bit XMM registers (XMM0–XMM7, extended to 16 in 64-bit mode), while AVX (2011) introduced 256-bit YMM registers (YMM0–YMM7 in 32-bit mode, YMM0–YMM15 in 64-bit mode) and AVX-512 (2016) added 512-bit ZMM registers (ZMM0–ZMM7 in 32-bit mode, ZMM0–ZMM31 in 64-bit mode), enabling wider parallel operations on floating-point and integer data across multiple lanes. These extensions significantly expand the register file for high-performance computing without altering the core GPRs.
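The flag bit positions given for EFLAGS can be illustrated with a short Python sketch. The table and function names here are illustrative, and only the five flags whose bit numbers appear in this section are decoded.

```python
# Bit positions of the flags named in this section (CF, ZF, IF, DF, OF).
FLAG_BITS = {"CF": 0, "ZF": 6, "IF": 9, "DF": 10, "OF": 11}

def decode_flags(eflags):
    """Return a dict mapping each known flag name to True if its bit is set."""
    return {name: bool((eflags >> bit) & 1) for name, bit in FLAG_BITS.items()}

def je_taken(eflags):
    """JE/JZ is taken exactly when ZF is set."""
    return decode_flags(eflags)["ZF"]

# 0x246 has bits 1, 2, 6, and 9 set; of the flags decoded here that is ZF and IF.
flags = decode_flags(0x246)
print(flags)            # {'CF': False, 'ZF': True, 'IF': True, 'DF': False, 'OF': False}
print(je_taken(0x246))  # True
```

A value like 0x246 is what a debugger typically shows after a comparison of two equal operands with interrupts enabled, which is why a following JE would branch.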

Memory Addressing

In x86 assembly language, addressing modes determine how operands are specified for instructions, allowing access to registers, immediate values, or memory locations. These modes provide flexibility in forming effective addresses, which are computed as offsets within segments or as linear addresses in flat models. The primary modes include immediate, register, direct, register indirect, and more complex forms combining base registers, indices, scales, and displacements.

Immediate addressing embeds a constant value directly in the instruction, used for operations like loading a literal into a register. For example, mov eax, 5 places the value 5 into the EAX register without referencing memory. Register addressing operates solely on registers, such as mov eax, ebx, which copies the contents of EBX to EAX. These modes are efficient as they avoid memory access.

Direct addressing specifies a fixed memory address in the instruction, as in mov eax, [100h], where the contents at address 100h are loaded into EAX. Register indirect addressing uses a register to hold the address, for instance mov eax, [ebx], dereferencing the value in EBX as the address. Unlike some other architectures, x86 does not support automatic pre- or post-increment in these indirect modes; increments require separate instructions such as INC or ADD.

The most versatile mode is the base-plus-index-plus-scale-plus-displacement form, which computes the effective address as base register + (index register × scale) + displacement. Here, the base and index are general-purpose registers (e.g., EBX and ESI), the scale is 1, 2, 4, or 8 to match common array element sizes, and the displacement is an optional constant. An example is mov eax, [ebx + esi*4 + 10h], useful for traversing data structures like arrays. In 64-bit mode, this mode supports 64-bit registers but limits displacements to 32 bits, sign-extended during calculation.

RIP-relative addressing, available only in 64-bit mode, forms addresses relative to the instruction pointer (RIP) plus a 32-bit signed displacement, enabling position-independent code without absolute addresses. For example, mov eax, [rip + offset] loads from a location at a fixed displacement from the end of the current instruction. This mode is what makes position-independent shared libraries efficient.

When operand sizes are ambiguous, especially for memory references, explicit size specifiers disambiguate the instruction. Operators like BYTE PTR for 8-bit, WORD PTR for 16-bit, or DWORD PTR for 32-bit ensure correct interpretation, as in mov byte ptr [esi], 5. Failure to specify can lead to assembler errors or unintended sizes.

x86 supports both flat and segmented addressing models. In the flat model, prevalent in 64-bit mode, addresses are linear without segment bases (the base defaults to zero), simplifying access to a continuous address space. Segmented addressing, used in IA-32 real or protected modes, combines segment selectors with offsets but is detailed separately; the addressing modes here form the offset component in both cases.
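The base + (index × scale) + displacement formula can be worked through numerically. The Python sketch below uses illustrative function names and register values chosen for the example; it models the wraparound to the address width and the sign extension of a 32-bit displacement, and nothing else.

```python
def sign_extend32(value):
    """Interpret the low 32 bits of value as a signed two's-complement displacement."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

def effective_address(base=0, index=0, scale=1, disp=0, bits=32):
    """Compute base + index*scale + disp, truncated to the address width in bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & ((1 << bits) - 1)

# mov eax, [ebx + esi*4 + 10h] with EBX = 0x1000 and ESI = 3 (example values)
print(hex(effective_address(base=0x1000, index=3, scale=4, disp=0x10)))  # 0x101c

# 64-bit mode: a disp32 of 0xFFFFFFF0 is sign-extended to -16 before the add
addr = effective_address(base=0x7000_0000_0000,
                         disp=sign_extend32(0xFFFFFFF0), bits=64)
print(addr == 0x7000_0000_0000 - 16)  # True
```

With a scale of 4, ESI acts as an index into an array of 32-bit elements that starts 0x10 bytes past the address in EBX.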

Segmented Memory Model

The segmented memory model in x86 divides the memory address space into variable-sized segments to facilitate addressing beyond the limitations of early processors. In real mode, which is the default execution mode upon processor reset and emulates the 8086 environment, memory addressing employs a 20-bit address space calculated using a segment:offset pair. The segment register, such as CS for code or DS for data, holds a 16-bit value that is shifted left by 4 bits (multiplied by 16) and added to a 16-bit offset to yield the physical address, allowing access to up to 1 MB of memory while each segment is limited to 64 KB. For instance, the instruction pointer IP combined with CS forms the address of the next instruction as CS * 16 + IP.

In protected mode, introduced with the 80286 and expanded in subsequent processors, the segmented model evolves to support memory protection, larger address spaces, and multitasking through descriptor tables. The Global Descriptor Table (GDT) provides system-wide segment definitions, while the Local Descriptor Table (LDT) allows task-specific segments, both held in memory and located through the GDTR and LDTR registers, respectively. Each segment descriptor is an 8-byte structure containing a base address (anywhere in the 4 GB linear address space in 32-bit mode), a limit defining the segment size (expandable via the granularity bit to 4 GB), and access rights including privilege levels (0-3 for ring protection), type (code, data, stack), and attributes like readability or writability.

Segment registers in protected mode hold 16-bit selectors that index into the GDT or LDT to retrieve the corresponding descriptor, enabling dynamic segment relocation and protection checks. A selector comprises a 13-bit index, a 1-bit Table Indicator (TI) to distinguish GDT (TI=0) from LDT (TI=1), and a 2-bit Requested Privilege Level (RPL) for access validation against the descriptor's privilege. Upon loading a selector, the processor uses the descriptor's base and limit to compute the linear address as base + offset, with violations triggering exceptions like general-protection (#GP) for out-of-limit accesses or privilege mismatches.

In 32-bit and 64-bit modes, operating systems typically adopt a flat memory model that minimizes segmentation's complexity by using a single, continuous address space spanning 0 to 4 GB in 32-bit mode or 0 to 2^64 bytes in 64-bit mode. This is achieved by configuring segment descriptors with a base address of 0 and a limit of 4 GB (limits are not checked in 64-bit mode), effectively ignoring segmentation for most operations while retaining the mechanism for compatibility. Exceptions include the FS and GS segments, which can have non-zero bases to support thread-local storage (TLS) and other OS-specific uses without altering the flat addressing for code, data, and stack.

The segmented model's legacy from real mode introduces challenges, such as wraparound behavior in which offsets that exceed 64 KB wrap back to 0, and the fact that different segment:offset pairs can refer to the same physical address, causing unintended overlaps between segments and complicating the porting of legacy code. These issues persist for compatibility with 8086 software, requiring careful handling in emulators or mode transitions to avoid faults like invalid memory accesses.
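The selector layout (13-bit index, TI bit, 2-bit RPL) and the 8-byte descriptor size together determine where a descriptor sits in its table. A minimal Python sketch, with illustrative function names and example selector values chosen for the illustration:

```python
def decode_selector(selector):
    """Split a 16-bit segment selector into index, table indicator, and RPL."""
    selector &= 0xFFFF
    return {
        "index": selector >> 3,      # bits 15..3: descriptor index
        "ti": (selector >> 2) & 1,   # bit 2: 0 = GDT, 1 = LDT
        "rpl": selector & 0b11,      # bits 1..0: requested privilege level
    }

def descriptor_offset(selector):
    """Byte offset of the descriptor within its table (descriptors are 8 bytes each)."""
    return decode_selector(selector)["index"] * 8

print(decode_selector(0x23))    # {'index': 4, 'ti': 0, 'rpl': 3}
print(descriptor_offset(0x23))  # 32
print(decode_selector(0x0F))    # {'index': 1, 'ti': 1, 'rpl': 3}
```

Because the low three bits carry TI and RPL, masking them off (selector & ~7) gives the same byte offset as index × 8.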

Operating Modes

Real Mode

Real mode, also known as real-address mode, is the default operating mode for x86 processors upon reset or boot, providing backward compatibility with the original 8086 architecture. In this environment, the processor uses a segmented model with 16-bit segment registers and 16-bit offsets to form 20-bit physical addresses, limiting the addressable space to 1 MB (from 0x00000 to 0xFFFFF). The physical address is calculated by shifting the 16-bit segment value left by 4 bits (multiplying by 16) and adding the 16-bit offset, with no memory protection mechanisms in place, so all code has unrestricted access to the full address space, as if running at privilege level 0.

Interrupt handling in real mode relies on the interrupt vector table (IVT), a fixed structure located at 0000:0000 (the first 1 KB of memory), containing 256 four-byte entries that point to interrupt service routines. This setup enables direct invocation of BIOS and DOS services through software interrupts, as seen in traditional DOS programming where applications interact with hardware via standardized interrupt vectors such as INT 21h for DOS functions and INT 13h for disk operations.

Real mode imposes several key limitations suited to early 16-bit systems. It supports no native multitasking, as there are no privilege rings or task switching mechanisms, and all code executes with equal access to memory and I/O ports. Segments are capped at 64 KB in size and start on 16-byte boundaries, restricting code and data blocks while permitting direct I/O operations without mediation, which facilitates low-level control but risks system instability.

In contemporary systems, real mode persists primarily for compatibility in bootloaders, such as the initial stage of GRUB on BIOS-based x86 platforms, which starts in real mode and loads the core image before transitioning to higher modes. It also remains relevant for embedded applications running under legacy environments, enabling direct hardware manipulation in resource-constrained settings like industrial controllers or vintage software emulation.

To exit real mode and enter protected mode, software must first initialize a Global Descriptor Table (GDT) and then set the Protection Enable (PE) bit in the CR0 register, using a MOV to CR0 or the older LMSW (Load Machine Status Word) instruction, enabling segment-level protection and expanded addressing.
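The real-mode address calculation and the IVT layout described in this section reduce to simple arithmetic. The Python sketch below uses illustrative function names; it models the 8086's 20-bit address bus by masking the result, which is an assumption of the sketch rather than a description of every later processor.

```python
def real_mode_address(segment, offset):
    """Physical address = segment * 16 + offset, truncated to 20 bits as on the 8086."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    return linear & 0xFFFFF

def ivt_entry_address(vector):
    """Physical address of an interrupt vector's 4-byte IVT entry (table starts at 0)."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("interrupt vectors range from 0 to 255")
    return vector * 4

print(hex(real_mode_address(0xB800, 0x0000)))  # 0xb8000
# Two different segment:offset pairs can name the same byte:
print(real_mode_address(0x1234, 0x0005) == real_mode_address(0x1000, 0x2345))  # True
# FFFF:0010 exceeds 20 bits and wraps to address 0 in this model:
print(hex(real_mode_address(0xFFFF, 0x0010)))  # 0x0
print(hex(ivt_entry_address(0x21)))            # 0x84
```

The last line shows that the handler pointer for INT 21h is stored at physical address 0x84, inside the first 1 KB of memory.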

Protected Mode

Protected mode is an operating mode of the x86 architecture, introduced in 16-bit form with the 80286 and extended to 32 bits with the 80386 processor, that enables advanced memory management, protection mechanisms, and support for multitasking. It is activated from real mode by setting the Protection Enable (PE) bit (bit 0) in the CR0 control register using a MOV to CR0 instruction, followed by a far jump or intersegment return to load a code segment selector from the Global Descriptor Table (GDT). The GDT, whose location is loaded into the GDTR register via the LGDT instruction, contains segment descriptors that define up to 4 GB of linear address space through base addresses, limits (up to 4 GB per segment with the granularity bit), and access rights. This segmentation allows logical addresses (segment selector + offset) to be translated into linear addresses, providing a foundation for protected execution.

A key feature of protected mode is its hierarchical protection rings, which enforce privilege levels to isolate code execution and prevent unauthorized access to system resources. There are four rings (0 to 3), with Ring 0 designated for the most privileged kernel-mode code and Ring 3 for least-privileged user-mode applications. The Current Privilege Level (CPL), encoded in bits 0-1 of the CS and SS segment registers, determines the executing ring, while the Descriptor Privilege Level (DPL) in segment descriptors and the Requested Privilege Level (RPL) in selectors govern access checks. Privilege transitions, such as from Ring 3 to Ring 0, are controlled through mechanisms like call gates, interrupt gates, and task gates, which validate levels before allowing sensitive operations like system calls.

Virtual memory in protected mode is implemented via paging, which maps linear addresses to physical addresses for abstraction and isolation. Paging is enabled by setting the PG bit (bit 31) in CR0, after which the CR3 register points to the base of a page directory containing 1024 entries, each referencing a page table with another 1024 entries for 4 KB pages. A linear address is divided into three parts: a directory index (bits 31-22), a table index (bits 21-12), and a page offset (bits 11-0), enabling up to 4 GB of virtual address space per process. The Translation Lookaside Buffer (TLB), a hardware cache, stores recent address translations to accelerate paging operations and reduce latency.

Multitasking support in protected mode relies on the Task State Segment (TSS) for context switching between tasks and the Interrupt Descriptor Table (IDT) for handling interrupts. The TSS, described in the GDT as a system segment, stores the complete state of a task, including general-purpose registers, segment registers, and stack pointers for each privilege level (0-2), and is loaded into the task register via the LTR instruction. Task switches occur via the CALL, JMP, IRET, or exception/interrupt mechanisms, saving the current task state to its TSS and loading the new one. The IDT, located through the IDTR register that LIDT loads, contains up to 256 interrupt vectors, each a gate descriptor (task, interrupt, or trap gate) that directs control to handlers, often in Ring 0, with privilege checks enforced.

In practice, operating systems such as Windows and Linux utilize protected mode with a flat memory model, where segment registers are set to cover the entire 4 GB linear address space (base 0, limit 4 GB), minimizing segmentation overhead while relying on paging for protection and memory management. This approach allows each process to have its own page directory for isolated virtual address spaces, enabling secure multitasking without complex segment usage.
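The three-way split of a 32-bit linear address used by two-level paging can be checked with a few shifts and masks. The Python sketch below uses an illustrative function name and an arbitrary example address; it covers 4 KB pages only.

```python
def split_linear_address(addr):
    """Split a 32-bit linear address into (directory index, table index, page offset)."""
    addr &= 0xFFFFFFFF
    directory = (addr >> 22) & 0x3FF  # bits 31..22 select one of 1024 directory entries
    table = (addr >> 12) & 0x3FF      # bits 21..12 select one of 1024 table entries
    offset = addr & 0xFFF             # bits 11..0 locate the byte within the 4 KB page
    return directory, table, offset

directory, table, offset = split_linear_address(0xC0101234)
print(directory, table, hex(offset))  # 768 257 0x234
```

Recombining the parts as (directory << 22) | (table << 12) | offset returns the original address, and 1024 × 1024 × 4 KB gives the 4 GB address space mentioned above.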

Long Mode

Long mode (called IA-32e mode in Intel documentation) is the core extension introduced by AMD with the x86-64 architecture to enable full 64-bit processing on x86 processors, first implemented in the AMD Opteron in 2003. It widens virtual addresses to 64 bits, though most implementations translate only the low 48 bits and require the upper bits to be a sign extension (canonical form), giving up to 256 terabytes of virtual address space per process. General-purpose registers are widened to 64 bits (e.g., RAX, RBX), and eight additional 64-bit registers (R8 through R15) are provided to reduce register pressure. RIP-relative addressing permits memory operands to be addressed as an offset from the instruction pointer (RIP), facilitating the position-independent code used in modern shared libraries and executables.

Long mode operates in two sub-modes to balance new capabilities with legacy support: 64-bit mode for native execution of 64-bit code, and compatibility mode, which allows unmodified 32-bit and 16-bit protected-mode applications to run under a 64-bit operating system by reproducing the protected-mode environment (e.g., a default operand size of 32 or 16 bits). Canonical addressing requires every virtual address to lie within the signed range from -2^47 to 2^47 - 1, meaning bits 63 through 48 must replicate bit 47; a reference to a non-canonical address raises a general-protection fault (or a stack fault for stack references).

In 64-bit mode, the segmented memory model is simplified to a flat address space: the CS, DS, ES, and SS segments are treated as having base address 0 and no limit checking (an effective limit of 2^64 - 1), so user code does not need to manage segment descriptors. The exceptions are the FS and GS segments, which remain functional for purposes such as thread-local storage and can carry 64-bit base addresses loaded via the model-specific registers FS_BASE (MSR C000_0100h) and GS_BASE (MSR C000_0101h).

Paging is required for all operation in long mode and builds on Physical Address Extension (PAE). Four-level page tables map 48-bit virtual addresses; optional five-level page tables (specified by Intel in 2017 and implemented in more recent processors) map 57-bit virtual addresses. Either scheme maps to physical addresses of up to 52 bits, with support for 4 KB, 2 MB, and 1 GB page sizes.

Adoption of long mode accelerated with major operating systems. The stable Linux 2.6.0 kernel, released on December 17, 2003, included x86-64 support, enabling widespread use in distributions by 2004. Microsoft followed with Windows XP Professional x64 Edition, released on April 25, 2005, the first consumer x86-64 version of Windows, built on the Windows Server 2003 code base.
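
The canonical-form rule can be checked in software by sign-extending bit 47 and comparing the result with the original address. The following is a minimal sketch under stated assumptions: NASM syntax, a 64-bit Linux process, 48-bit virtual addresses (four-level paging), and a helper named `is_canonical` that is hypothetical and exists only for this example.

```assembly
; Build (assumed toolchain): nasm -f elf64 canon.asm && ld -o canon canon.o
BITS 64
GLOBAL _start
SECTION .text

; is_canonical (hypothetical helper): RDI = virtual address.
; Returns EAX = 1 if bits 63-48 all equal bit 47, otherwise 0.
is_canonical:
    MOV RAX, RDI
    SHL RAX, 16                   ; discard bits 63-48
    SAR RAX, 16                   ; refill them with copies of bit 47
    CMP RAX, RDI                  ; equal only if the address was canonical
    SETE AL
    MOVZX EAX, AL
    RET

_start:
    MOV RDI, 0x0000800000000000   ; bit 47 set, bits 63-48 clear: non-canonical
    CALL is_canonical
    MOV EBX, EAX                  ; EBX = 0
    MOV RDI, 0xFFFF800000000000   ; lowest address of the upper canonical half
    CALL is_canonical             ; EAX = 1
    LEA EDI, [RAX + 2*RBX]        ; exit status 1 when both checks behave as described
    MOV EAX, 60                   ; sys_exit on x86-64 Linux
    SYSCALL
```

With five-level paging the same idea applies to bit 56, so the shift count would be 7 instead of 16.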

Mode Transitions

Mode transitions in x86 assembly language involve precise sequences of instructions to switch between operating modes, ensuring compatibility with the processor's state and avoiding exceptions. These transitions are critical for bootloaders and operating system kernels, as they enable access to advanced features like protected memory and 64-bit addressing while maintaining backward compatibility. The process typically requires configuring control registers, loading descriptor tables, and executing jumps to update the processor's execution environment.

The transition from real mode to protected mode begins with enabling the A20 address line to access memory above 1 MB, followed by loading the Global Descriptor Table (GDT) using the LGDT instruction to specify its base address and limit. Interrupts are disabled (CLI) to prevent interference, and the Protection Enable (PE) bit in CR0 is set to 1 via MOV CR0, EAX (with the appropriate value in EAX). A far jump (or, less commonly, a far return) is then executed to load a valid 32-bit code segment selector into CS from the GDT, flushing the instruction prefetch queue and completing the switch to protected mode. Finally, the other segment registers (DS, SS, ES, FS, GS) are loaded with appropriate selectors, and the Interrupt Descriptor Table (IDT) is loaded using LIDT. This sequence allows the use of segmented memory and privilege levels.

Switching from protected mode to long mode (IA-32e mode) requires paging to be off at the start and Physical Address Extension (PAE) to be enabled by setting the PAE bit in CR4 to 1. The CR3 register is loaded with the physical address of the Page Map Level 4 (PML4) table, the root of the four-level hierarchy whose entries point to page-directory-pointer tables, page directories, and page tables in turn. The long mode enable (LME) bit in the Extended Feature Enable Register (EFER, MSR C000_0080h) is set to 1 using a WRMSR write. Paging is then enabled by setting the PG bit in CR0 to 1, which activates long mode in its compatibility sub-mode, and a far jump to a 64-bit code segment selector (with the L bit set in the GDT descriptor) enters 64-bit mode. After these steps, four-level paging is active and 64-bit features such as RIP-relative addressing are available.

Transitioning from 64-bit mode to 32-bit compatibility mode within long mode occurs by loading a code segment descriptor with the L bit cleared (indicating 32-bit operation) via a far return (RETF) or interrupt return (IRET) instruction, using a selector from the GDT or LDT that points to a compatibility-mode code segment. Alternatively, a SYSRET instruction executed without a 64-bit operand size returns to compatibility-mode code. This allows legacy 32-bit code to execute without leaving long mode, preserving the paging and segment structures.

Invalid mode transitions can trigger exceptions, such as a general-protection fault (#GP) from malformed GDT entries or a page fault (#PF) from invalid paging setups, potentially escalating to a double fault (#DF) if the handler cannot be invoked. A triple fault results when invoking the double-fault handler itself causes an exception (e.g., due to an invalid IDT entry or stack), leading to a processor shutdown and reset with no software recovery possible. When entering protected mode, failing to enable the A20 line causes addresses above 1 MB to wrap around, corrupting accesses to that memory.

Initial mode handling is managed by firmware: a traditional BIOS starts the processor in real mode, loads the boot sector at 0x7C00, and leaves every transition to the bootloader. UEFI firmware, in contrast, runs boot applications in long mode from the start on x86-64 systems, providing a PE/COFF loader for applications and handling initial paging and descriptor setup before transferring control.
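
The real-mode to protected-mode sequence described above can be condensed into a single boot sector. The listing is a hedged sketch, not a production bootloader: it assumes NASM syntax, a BIOS that loads the sector at 0x7C00, and a chipset that supports the "fast A20" gate on port 0x92 (not every machine does). The selector names and labels are hypothetical.

```assembly
; Build (assumed toolchain): nasm -f bin pm_switch.asm -o pm_switch.bin
BITS 16
ORG 0x7C00                       ; BIOS loads the boot sector here

CODE_SEL EQU 0x08                ; GDT entry 1 (hypothetical name)
DATA_SEL EQU 0x10                ; GDT entry 2 (hypothetical name)

start:
    CLI                          ; no interrupts until an IDT exists
    XOR AX, AX
    MOV DS, AX                   ; DS = 0 so labels match ORG addresses

    IN  AL, 0x92                 ; "fast A20" gate (assumed to be present)
    OR  AL, 0x02                 ; set the A20 enable bit
    AND AL, 0xFE                 ; keep bit 0 clear to avoid a system reset
    OUT 0x92, AL

    LGDT [gdt_descriptor]        ; load GDT base and limit into GDTR

    MOV EAX, CR0
    OR  EAX, 1                   ; set PE (bit 0)
    MOV CR0, EAX

    JMP CODE_SEL:protected_entry ; far jump reloads CS from the GDT

BITS 32
protected_entry:
    MOV AX, DATA_SEL             ; load the flat data selector everywhere
    MOV DS, AX
    MOV ES, AX
    MOV FS, AX
    MOV GS, AX
    MOV SS, AX
    MOV ESP, 0x7C00              ; stack grows down from below the boot sector
.hang:
    HLT                          ; nothing else to do in this sketch
    JMP .hang

gdt_start:
    DQ 0x0000000000000000        ; mandatory null descriptor
    DQ 0x00CF9A000000FFFF        ; code: base 0, limit 4 GB, 32-bit, Ring 0
    DQ 0x00CF92000000FFFF        ; data: base 0, limit 4 GB, writable, Ring 0
gdt_end:

gdt_descriptor:
    DW gdt_end - gdt_start - 1   ; limit = size of the table minus one
    DD gdt_start                 ; linear base address of the table

TIMES 510 - ($ - $$) DB 0        ; pad to 510 bytes
DW 0xAA55                        ; boot signature
```

The sketch stops after loading the data segment registers. A real loader would go on to install an IDT before re-enabling interrupts, and would probe for A20 support rather than assume it.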

Instruction Set

Data Movement Instructions

Data movement instructions in x86 assembly language transfer data between registers, memory locations, and immediate values, forming the foundation for data manipulation without performing arithmetic or logical operations. These instructions support various sizes, including bytes, words (16 bits), doublewords (32 bits), and quadwords (64 bits) in 64-bit mode, and use the processor's addressing modes for memory access. They are essential for initializing variables, passing parameters, and managing data flow in programs, and they generally leave the processor's flags unchanged.

The MOV instruction performs a general-purpose transfer, copying the contents of the source to the destination while leaving the source unchanged. It supports transfers between registers (e.g., MOV EAX, EBX), from memory to registers or vice versa (e.g., MOV EAX, [EBX]), and from immediate values to registers or memory (e.g., MOV EAX, 42), but it does not allow memory-to-memory transfers, loading an immediate directly into a segment register, or using CS as a destination. In 64-bit mode, MOV operates on 64-bit registers like RAX, and it does not affect any flags. For example, MOV ECX, [EAX + 4] loads a 32-bit value from the address EAX + 4 into ECX using base-plus-displacement addressing. MOV does not accept the LOCK prefix; naturally aligned loads and stores are atomic on their own, and locked read-modify-write operations use instructions such as XCHG or a LOCK-prefixed ADD.

PUSH and POP instructions handle stack-based data movement, automatically adjusting the stack pointer (ESP in 32-bit mode or RSP in 64-bit mode) as values are pushed onto or popped from the stack. PUSH decrements the stack pointer by the operand size (e.g., 8 bytes for quadwords in 64-bit mode) and stores the source (register, memory, or immediate) at the new top of the stack, as in PUSH EAX, which saves the value of EAX before a subroutine call. Conversely, POP loads the value from the top of the stack into the destination (register or memory) and increments the stack pointer, restoring the saved value with POP EAX after the subroutine returns. These instructions do not affect flags and are crucial for function calls, local variable allocation, and interrupt handling; PUSH accepts immediate values of up to 32 bits even in 64-bit mode, where they are sign-extended to 64 bits. Neither instruction checks for stack exhaustion itself; running past the end of the stack is caught by segment limits or by guard pages set up by the operating system.

The XCHG instruction exchanges the contents of two operands, swapping a register with another register or with a memory operand, which makes it useful for implementing locks in multithreaded applications. For instance, XCHG EAX, EBX interchanges the values in EAX and EBX, while XCHG EAX, [MEM] swaps EAX with the doubleword at MEM. It supports byte, word, doubleword, or quadword sizes. An XCHG with a memory operand is always performed as a locked operation, whether or not the LOCK prefix is present, so other processors cannot read or write the location during the exchange. XCHG does not affect flags and requires at least one operand to be a register, making it a compact synchronization primitive. In 64-bit mode, it operates on 64-bit registers like RAX.

LEA (Load Effective Address) computes the effective address of a memory operand and stores it in a register without accessing memory, enabling efficient address arithmetic such as scaling and indexing. An example is LEA EAX, [EBX + 4*ECX], which calculates the address EBX + 4*ECX and loads it into EAX, useful for pointer manipulation or array indexing. It accepts every addressing-mode component (base, index, scale, and displacement) but treats the operand as an expression rather than dereferencing it. LEA does not affect flags and is available in 32-bit and 64-bit modes, where it can produce 64-bit addresses in registers like RAX. A single LEA can replace a short sequence of shifts and additions, although its source must always be written as a memory operand.

String movement instructions like MOVS and LODS enable efficient block transfers of data using dedicated index registers (ESI for the source and EDI for the destination, or RSI and RDI in 64-bit mode), with the direction determined by the DF (Direction Flag) in the EFLAGS register. MOVS copies a byte, word, doubleword, or quadword from the source string (at [RSI]) to the destination string (at [RDI]), then increments or decrements both pointers by the element size (forward if DF=0, backward if DF=1), as in MOVS DWORD PTR [EDI], DWORD PTR [ESI]. The REP prefix repeats the operation ECX/RCX times, decrementing the counter until it reaches zero, making it well suited to memcpy-like operations on large buffers. Similarly, LODS loads a string element from [RSI] into AL/AX/EAX/RAX and updates RSI; because each load overwrites the accumulator, LODS is normally used inside an explicit loop rather than with REP. The element size is selected either by the mnemonic suffix (MOVSB, MOVSW, MOVSD, MOVSQ) or by an explicit size specifier such as BYTE PTR, and a segment override may be applied to the source operand. In 64-bit mode, these instructions handle elements up to quadwords with 64-bit indices. MOVS and LODS do not affect the arithmetic flags.
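
As an illustration of the string instructions, the following sketch copies a four-byte buffer with REP MOVSB and then reads back the first byte. It assumes NASM syntax and a 64-bit Linux process; the labels `src` and `dst` are hypothetical names chosen for the example.

```assembly
; Build (assumed toolchain): nasm -f elf64 copy.asm && ld -o copy copy.o
BITS 64
GLOBAL _start

SECTION .data
src: DB "x86", 0                ; four bytes including the terminator

SECTION .bss
dst: RESB 4                     ; destination buffer

SECTION .text
_start:
    CLD                         ; DF = 0: copy forward
    LEA RSI, [REL src]          ; source pointer (address only, no memory read)
    LEA RDI, [REL dst]          ; destination pointer
    MOV ECX, 4                  ; element count for REP
    REP MOVSB                   ; copy RCX bytes from [RSI] to [RDI]

    MOVZX EDI, BYTE [REL dst]   ; first copied byte, 'x' (120)
    MOV EAX, 60                 ; sys_exit on x86-64 Linux
    SYSCALL
```

The exit status is 120, the ASCII code of 'x', which confirms that the destination buffer received the data. The LEA instructions use RIP-relative addressing, so the example stays position-independent.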

Arithmetic and Logic Instructions

The arithmetic and logic instructions in x86 assembly language form the core of computations performed by the arithmetic logic unit (ALU), operating on register, memory, or immediate values while updating status flags in the EFLAGS register to indicate results such as zero, sign, carry, and overflow. These instructions support both unsigned and signed operations, with flag updates enabling conditional branching for error handling and flow control. Unlike data movement instructions, which merely transfer values, arithmetic and logic operations modify operands to produce new results, with carry-aware forms for values wider than one register.

Addition instructions include ADD, which adds the source operand to the destination operand and stores the result in the destination, setting the carry flag (CF) if there is a carry out of the most significant bit and the overflow flag (OF) for signed overflow. The ADC variant also adds the carry flag from a previous operation, facilitating multi-precision arithmetic; for example, in 32-bit mode, ADC EAX, EBX adds EBX and CF to EAX, updating flags including the auxiliary carry (AF) used for BCD arithmetic. Both instructions set the parity (PF), sign (SF), and zero (ZF) flags based on the result, with operands sized from 8 to 64 bits depending on mode and prefixes.

Subtraction mirrors addition with SUB, which subtracts the source from the destination and stores the result in the destination, setting CF on a borrow and OF on signed overflow. The SBB form subtracts the source and CF (as borrow) from the destination, which is essential for chained subtractions; for instance, SBB EAX, EBX computes EAX - EBX - CF and sets the flags for the next step of a multi-word subtraction. ADD, ADC, SUB, and SBB all write the six status flags according to the result, and they can be made atomic with the LOCK prefix when the destination is a memory operand.

Multiplication instructions handle unsigned and signed integers using the accumulator registers. MUL performs unsigned multiplication: for byte operands, it multiplies AL by the source and stores the 16-bit result in AX; for word operands, AX by the source into DX:AX; and for doubleword operands, EAX by the source into EDX:EAX, setting CF and OF if the high half is nonzero. The signed counterpart IMUL supports one, two, or three operands: the one-operand form multiplies the accumulator by the source into the accumulator pair, the two-operand form multiplies the destination register by a register or memory source, and the three-operand form stores the product of a register or memory source and an immediate in the destination register. IMUL sets CF and OF if the full product does not fit in the destination (i.e., the discarded high bits are not a sign extension of the result). In 64-bit mode, a REX.W prefix extends these forms to RAX and RDX:RAX.

Division instructions divide the accumulator by the source, producing a quotient and a remainder and leaving the status flags undefined. DIV is unsigned: for byte operands, AX divided by the source yields the quotient in AL and the remainder in AH; for word operands, DX:AX is divided into AX (quotient) and DX (remainder); doubleword operands use EDX:EAX similarly. A divide-error exception (#DE) is raised on division by zero or when the quotient does not fit in the destination. Signed division via IDIV follows the same register conventions but uses two's-complement arithmetic with truncation toward zero, also triggering #DE on division by zero or an out-of-range quotient. Division is considerably slower than multiplication, although recent processors have reduced its latency.

Shift instructions move bit positions for multiplication or division by powers of two and for bit-field manipulation. SHL (or its synonym SAL) shifts the destination left by a count given in CL or an immediate, filling with zeros and setting CF to the last bit shifted out; for single-bit shifts, OF indicates whether the sign bit changed. SHR shifts right logically, filling the high bit with zero and setting CF to the last bit shifted out; for single-bit shifts, OF receives the original most significant bit. The arithmetic right shift SAR preserves the sign bit when filling, which suits signed division by powers of two (rounding toward negative infinity); it clears OF for single-bit shifts and sets CF in the same way. OF is undefined after shifts by more than one bit. The rotate variants ROL and ROR move bits circularly without loss, copying the bit that wraps around into CF; for example, ROL EAX, 1 rotates left, with CF receiving the original most significant bit. Shifts set SF, ZF, and PF according to the result and leave AF undefined, whereas rotates affect only CF and OF. The count is masked to 5 bits (6 bits for 64-bit operands), so it never exceeds the operand width minus one, and a count of zero leaves all flags unchanged.

Logical instructions perform bitwise operations, clearing CF and OF while setting SF, ZF, and PF according to the result. AND computes the bitwise AND of source and destination, storing the result in the destination; it masks bits, which is useful for clearing fields or isolating them for a test. OR performs bitwise OR, setting bits where either operand has a 1, and XOR toggles the bits that differ; XOR EAX, EAX is the idiomatic way to clear EAX to zero. NOT inverts all bits in the destination without changing any flags, serving as a unary complement. TEST ANDs source and destination but discards the result, updating only the flags for conditional checks, such as TEST EAX, 1 to probe the least significant bit. These instructions operate on any operand size and accept memory operands.

Overflow handling relies on the OF flag, set by signed arithmetic instructions like ADD and SUB when the result's sign differs from what the operands imply (e.g., positive + positive yielding negative). The JO instruction jumps if OF is 1, branching to an overflow handler, while JNO jumps if OF is 0 to continue normal execution; both use relative offsets (short or near) and do not modify flags. For example, following ADD EAX, EBX, the instruction JO overflow_label detects signed overflow.
```assembly
; Example: Multi-precision addition with overflow check
ADD EAX, EBX        ; Add low words, set flags
ADC EDX, ECX        ; Add high words + carry
JO overflow_handler ; Jump if signed overflow
```
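
The register conventions for division can be seen by running DIV and IDIV on the same magnitude. The program below is a small sketch assuming NASM syntax and a 64-bit Linux process; the operand values are arbitrary.

```assembly
; Build (assumed toolchain): nasm -f elf64 divide.asm && ld -o divide divide.o
BITS 64
GLOBAL _start
SECTION .text
_start:
    MOV ECX, 7              ; divisor used by both divisions

    MOV EAX, 100            ; low half of the unsigned dividend
    XOR EDX, EDX            ; high half must be zero, or #DE / wrong result
    DIV ECX                 ; EAX = 14 (quotient), EDX = 2 (remainder)
    MOV EBX, EDX            ; keep the unsigned remainder (2)

    MOV EAX, -100           ; signed dividend
    CDQ                     ; sign-extend EAX into EDX:EAX
    IDIV ECX                ; EAX = -14, EDX = -2 (truncation toward zero)

    SUB EBX, EDX            ; 2 - (-2) = 4
    MOV EDI, EBX            ; exit status reports the difference
    MOV EAX, 60             ; sys_exit on x86-64 Linux
    SYSCALL
```

The exit status is 4, showing that the signed remainder takes the sign of the dividend. Forgetting to initialize EDX (with XOR for DIV or CDQ for IDIV) is a common source of unexpected divide errors.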

Control Flow Instructions

Control flow instructions in x86 assembly language enable dynamic alteration of program execution by transferring control to different addresses, either unconditionally or based on processor flags set by prior arithmetic or logic operations. These instructions are essential for implementing conditional logic, procedure calls, loops, and interrupt handling in both IA-32 and Intel 64 architectures. They operate by modifying the instruction pointer (IP, EIP, or RIP) and, in some cases, the code segment register (CS), supporting both near transfers (within the same code segment) and far transfers (across segments in non-flat memory models like real or protected mode).

Unconditional Transfers

Unconditional jumps, calls, and returns provide direct control flow changes without testing conditions. The JMP instruction transfers execution to a specified target address, either near (updating only IP/EIP/RIP) or far (also loading a new CS value in segmented modes). Near JMP supports immediate, register, or memory operands, while far JMP uses a pointer operand for segment:offset addressing. Neither variant affects flags. For example:
```assembly
JMP rel32         ; Relative jump by 32-bit signed displacement
JMP FAR ptr16:32  ; Far jump to segment:offset
```
The CALL instruction invokes a subroutine by pushing the return address (EIP/RIP for near calls, or CS:EIP/RIP for far calls) onto the stack and jumping to the target, enabling modular code structure; far CALLs are legacy features that are rarely needed in 64-bit mode. RET reverses this by popping the return address from the stack to resume execution, with an optional immediate operand that is added to the stack pointer after the pop to discard arguments. Like JMP, CALL and RET do not modify flags and support both near and far variants. Example:
```assembly
CALL near_proc  ; Near call, pushes EIP/RIP
RET 8           ; Near return, pops EIP/RIP and adds 8 to ESP/RSP
```
These instructions are available in all operating modes, including real, protected, and 64-bit modes.

Conditional Branches

Conditional jump instructions (Jcc) branch to a target only if a specific flag condition is met, facilitating if-then-else constructs and loops. They use relative displacements (8-, 16-, or 32-bit signed) and do not alter flags themselves. Common variants include JZ (jump if ZF=1, after operations like CMP yielding equality) and JNZ (ZF=0, for inequality); JC (jump if CF=1, e.g., after an unsigned overflow or borrow) and JNC (CF=0); signed comparisons like JG (greater: ZF=0 and SF=OF) and JL (less: SF≠OF); and their unsigned counterparts JA (above: CF=0 and ZF=0) and JB (below: CF=1). For instance:
```assembly
CMP EAX, EBX     ; Sets flags based on EAX - EBX
JG positive      ; Jump if EAX > EBX (signed)
JNZ not_equal    ; Jump if EAX != EBX
```
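Conditional jumps exist only in short (rel8) and near (rel16/rel32) forms, with the displacement measured from the address of the following instruction; in 64-bit mode this makes them relative to RIP. There is no far conditional jump, so a conditional transfer to another segment is built from an inverted Jcc that skips over a far JMP. Jcc instructions operate in all modes and test flags generated by arithmetic and logic instructions such as ADD, SUB, or CMP.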
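
The difference between signed and unsigned conditions is easiest to see when the same comparison is followed by both kinds of jump. The program below is a minimal sketch assuming NASM syntax and a 64-bit Linux process; the labels are hypothetical.

```assembly
; Build (assumed toolchain): nasm -f elf64 branch.asm && ld -o branch branch.o
BITS 64
GLOBAL _start
SECTION .text
_start:
    MOV EAX, -1             ; 0xFFFFFFFF: -1 if signed, 4294967295 if unsigned
    MOV EBX, 1
    CMP EAX, EBX            ; flags from EAX - EBX: ZF=0, CF=0, SF=1, OF=0
    JG  signed_greater      ; not taken: SF != OF, so -1 is not greater than 1
    JA  unsigned_above      ; taken: CF=0 and ZF=0, so 0xFFFFFFFF is above 1
    XOR EDI, EDI            ; fall-through case (not reached here)
    JMP done
signed_greater:
    MOV EDI, 1
    JMP done
unsigned_above:
    MOV EDI, 2
done:
    MOV EAX, 60             ; sys_exit on x86-64 Linux
    SYSCALL
```

The exit status is 2: the same bit pattern compares as smaller under the signed condition and larger under the unsigned one. Because Jcc does not modify flags, a single CMP can feed several consecutive conditional jumps.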

Loops

Loop instructions simplify repetitive execution by combining counter decrement with conditional jumps. The LOOP instruction decrements the ECX (32-bit) or RCX (64-bit) register and jumps to a label if the counter is non-zero, providing a basic counted loop without flag involvement. It uses a relative 8-bit displacement and is supported in IA-32 and Intel 64 modes. Example:
```assembly
MOV ECX, 10      ; Set loop count
loop_start:
  ; Loop body
LOOP loop_start  ; Decrement ECX, jump if !=0
```
REP (repeat) prefixes enhance string operations (like MOVS or CMPS) for iteration, repeating the instruction ECX/RCX times until the counter reaches zero. Variants include REPE/REPZ (repeat while equal: ZF=1, stops on mismatch or zero count) and REPNE/REPNZ (repeat while not equal: ZF=0, stops on match or zero count), useful for memory scans or copies. These do not affect flags directly but inherit effects from the repeated instruction. For example:
```assembly
REP MOVSB        ; Copy ECX bytes from [ESI] to [EDI]
REPE CMPSB       ; Compare bytes until mismatch or ECX=0
```
LOOP and REP family instructions are available across all x86 modes, with 64-bit extensions using RCX and RFLAGS.
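
A counted loop built on LOOP can be sketched as follows. The example assumes NASM syntax and a 64-bit Linux process, where LOOP decrements the full RCX register; the label name is hypothetical.

```assembly
; Build (assumed toolchain): nasm -f elf64 sum.asm && ld -o sum sum.o
BITS 64
GLOBAL _start
SECTION .text
_start:
    MOV ECX, 10             ; loop count (writing ECX zero-extends into RCX)
    XOR EAX, EAX            ; running total
sum_loop:
    ADD EAX, ECX            ; add the current counter value: 10, 9, ..., 1
    LOOP sum_loop           ; decrement RCX, jump while it is non-zero

    MOV EDI, EAX            ; 10 + 9 + ... + 1 = 55
    MOV EAX, 60             ; sys_exit on x86-64 Linux
    SYSCALL
```

The exit status is 55. If the counter is zero on entry, LOOP wraps around and iterates 2^64 times, so code that may receive a zero count usually guards the loop with JRCXZ (or JECXZ in 32-bit code).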

Interrupts

Interrupt instructions handle software-generated interrupts and returns from handlers. INT n causes a software interrupt by pushing the current flags, CS, and EIP/RIP onto the stack and jumping to the handler for vector n (0-255), which is looked up in the real-mode interrupt vector table or, in protected and long mode, in the IDT. The interrupt flag (IF) is cleared on entry in real mode and when the vector is an interrupt gate; trap gates leave it unchanged. INT takes an 8-bit immediate vector number and operates in all modes. Example:
```assembly
INT 21h          ; DOS interrupt (legacy)
```
IRET (interrupt return) restores execution by popping EIP/RIP, CS, and flags from the stack, reinstating the prior state including IF; the 64-bit variant IRETQ pops RIP, CS, and RFLAGS and then SS:RSP. Unlike a near RET, IRET also restores the flags and can return to a less privileged level in protected mode. These instructions are fundamental for system calls and exception handling in x86 architectures. Far control transfers, such as far JMP, CALL, RET, and IRET, load CS along with the instruction pointer, enabling inter-segment transfers in real mode or segmented protected mode. In 64-bit mode, the direct far forms of JMP and CALL (with an immediate segment:offset operand) are invalid; only the memory-indirect far forms, far RET, and IRETQ remain.

Stack Instructions

The stack in x86 serves as a last-in, first-out (LIFO) data structure primarily used for temporary storage during procedure calls, local variable allocation, and parameter passing. Stack instructions manage this structure by manipulating the stack pointer (SP, ESP, or RSP depending on mode) and by creating stack frames in function prologues and epilogues. These operations avoid explicit address calculations by relying on the hardware-supported stack segment (SS).

The PUSH instruction decrements the stack pointer by the size of the operand (2, 4, or 8 bytes in 16-, 32-, or 64-bit modes, respectively) and stores the source operand at the new top of the stack. For example, in 32-bit mode, PUSH EAX first subtracts 4 from ESP, then writes the value of EAX to memory at [ESP]. This instruction supports immediate values, registers, or memory operands and does not affect the flags register. Variants like PUSHF (or PUSHFD/PUSHFQ) push the flags register onto the stack for preservation across code that modifies it. Additionally, PUSHAD pushes all eight general-purpose registers in 32-bit mode with a single instruction; it is not available in 64-bit mode, where registers are saved with individual PUSH instructions.

Conversely, the POP instruction loads the value from the top of the stack into the destination and then increments the stack pointer by the operand size. For instance, POP EAX reads the 4-byte value at [ESP] into EAX and adds 4 to ESP in 32-bit mode. Like PUSH, it supports registers or memory but cannot pop into the CS segment register; a far RET or IRET is used for control transfers that reload CS. The POPF (or POPFD/POPFQ) variant restores the flags register, while POPAD (32-bit mode only) restores the general-purpose registers saved by PUSHAD, discarding the stored ESP value. These instructions do not modify flags except when popping them explicitly.

For procedure management, the ENTER instruction establishes a stack frame by pushing the frame pointer (EBP/RBP), allocating space for local variables, and optionally copying outer frame pointers for block-structured languages such as Pascal with nested procedures. It takes two operands: the allocation size (in bytes) and a nesting level (0-31), adjusting EBP to point to the frame base and reserving space on the stack. The companion LEAVE instruction reverses this by restoring the stack pointer from the frame pointer (MOV ESP, EBP) and popping EBP, deallocating the frame just before a RET. This pair shortens prologue and epilogue code compared to manual PUSH/MOV/SUB and MOV/POP sequences, though modern compilers generally emit the manual prologue because ENTER is slow. For example, ENTER 8, 0 in 32-bit mode pushes EBP, sets EBP to ESP, and subtracts 8 from ESP for two local dwords.

In 64-bit mode under the System V ABI (common on Linux and other Unix-like systems), the stack must be 16-byte aligned at every call site so that SIMD loads and stores of stack data can use aligned instructions; padding is inserted during pushes or allocations when necessary. The ABI specifies that the stack pointer (RSP) is a multiple of 16 immediately before each CALL, so on entry to a function, after the return address has been pushed, RSP modulo 16 equals 8. Misalignment can degrade performance or cause exceptions in aligned instructions like MOVAPS.

Stack overflow occurs when PUSH or ENTER exceeds the stack segment limit or runs into an unmapped page, triggering a stack-segment fault (#SS) for limit violations or a page fault (#PF) on flat, paged systems; underflow from excessive POP or LEAVE accesses invalid memory in the same way and can also raise a general-protection fault (#GP). These hardware-detected conditions rely on segment descriptors and page tables rather than EFLAGS bits like overflow (OF) or carry (CF), which apply to arithmetic operations. The operating system handles the resulting exception, for example by growing the stack or terminating the process.
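
A conventional frame-pointer prologue and epilogue can be sketched as follows. The example assumes NASM syntax, a 64-bit Linux process, and the System V convention of passing the first integer argument in RDI; the function name `square_plus_one` is hypothetical.

```assembly
; Build (assumed toolchain): nasm -f elf64 frame.asm && ld -o frame frame.o
BITS 64
GLOBAL _start
SECTION .text

; square_plus_one (hypothetical): returns RDI * RDI + 1 in RAX
square_plus_one:
    PUSH RBP                ; save caller's frame pointer (RSP now 16-byte aligned)
    MOV RBP, RSP            ; establish this function's frame
    SUB RSP, 16             ; one 8-byte local plus padding keeps alignment
    MOV [RBP - 8], RDI      ; store the argument in the local slot
    MOV RAX, [RBP - 8]
    IMUL RAX, RAX           ; RAX = n * n
    INC RAX                 ; RAX = n * n + 1
    LEAVE                   ; MOV RSP, RBP followed by POP RBP
    RET

_start:
    MOV EDI, 5              ; argument n = 5
    CALL square_plus_one    ; RAX = 26
    MOV EDI, EAX            ; exit status = result
    MOV EAX, 60             ; sys_exit on x86-64 Linux
    SYSCALL
```

The exit status is 26. The local variable is unnecessary for such a small function and is kept only to show how space is reserved and released; an optimizing compiler would hold the value in a register.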

Floating-Point Instructions

The x87 floating-point unit (FPU) provides scalar floating-point operations in x86 assembly language. It originated as the separate 8087 coprocessor and has been integrated into the CPU itself since the 80486DX. It employs a stack-based register file with eight 80-bit registers, denoted ST(0) through ST(7), where ST(0) serves as the top of the stack (TOS). Each register holds a value in extended-precision format: a 1-bit sign, a 15-bit biased exponent, and a 64-bit significand (with an explicit leading 1 for normalized numbers). The stack pointer TOP, stored in bits 11-13 of the FPU status word, indicates the current TOS, allowing implicit operand addressing relative to ST(0). The tag word tracks the content type of each register (valid, zero, special, or empty) to support stack-fault detection and exception handling.

Basic arithmetic instructions in the x87 FPU operate primarily on the TOS and another stack element or memory operand, with results replacing the TOS unless specified otherwise. The FADD instruction adds the source (ST(i) or memory) to ST(0), storing the result in ST(0); for example, FADD ST(0), ST(1) computes ST(0) + ST(1) and places it in ST(0), while the reversed operand order FADD ST(1), ST(0) places the sum in ST(1). Similarly, FSUB subtracts the source from ST(0), FMUL multiplies them, and FDIV divides ST(0) by the source, each with popping variants like FADDP that store the result in ST(i) and then pop the stack. These instructions accept memory operands in single (32-bit) or double (64-bit) precision and use the FPU's internal 80-bit format for computations to minimize rounding errors. Opcodes vary by operand type, such as D8 /0 for FADD with a 32-bit memory operand or DC C0+i for FADD ST(i), ST(0).

For storing results, the FST instruction copies the TOS to a destination without altering the stack, such as FST m64fp to write ST(0) as a 64-bit double-precision value to memory; the popping variant FSTP additionally pops the register stack (incrementing TOP) and is the form that can store the full 80-bit value. These operations convert to IEEE 754 single or double formats when writing to memory, while internal computations retain extended precision for accuracy.

Transcendental instructions compute specialized functions on the TOS. FSIN calculates the sine of ST(0) in radians (range -2^63 to +2^63), replacing ST(0) with the result and setting the C2 flag for out-of-range inputs; FCOS does likewise for cosine. FPATAN computes the arctangent of ST(1)/ST(0), stores it in ST(1), and pops the stack, which is useful for polar-coordinate computations; Intel documents its accuracy as better than 1 ulp on Pentium processors and later.

Comparison instructions like FCOM evaluate the TOS against a source operand, setting condition codes C0, C2, and C3 in the status word to indicate the relation: C3=0 and C0=0 for ST(0) > source, C3=0 and C0=1 for ST(0) < source, and C3=1 and C0=0 for equality, with C2=0 in all three cases; an unordered result (a NaN operand) sets all three bits. For instance, FCOM ST(1) compares ST(0) and ST(1), raising an invalid-operation exception if either is NaN. Conditional branching then uses FSTSW AX followed by SAHF to transfer the condition codes into the EFLAGS register.

Control instructions manage FPU state: FINIT initializes the FPU by setting the control word to 037FH (masking all exceptions, rounding to nearest), clearing the status word, and tagging all registers as empty; FCLEX clears pending exception flags in the status word after checking for unmasked exceptions, while FNCLEX does so without the check.
| Instruction | Primary Operation | Key Flags/Effects | Example Usage |
| --- | --- | --- | --- |
| FADD | Addition | Updates C1 for inexact results | FADD ST(0), ST(2) (ST(0) += ST(2)) |
| FSUB | Subtraction | As above | FSUBR m32fp (ST(0) = memory - ST(0), reverse subtract) |
| FMUL | Multiplication | As above | FMULP ST(1), ST(0) (pops after multiply) |
| FDIV | Division | As above | FDIV ST(3), ST(0) (ST(3) /= ST(0)) |
| FST | Store TOS | No stack pop | FST m64fp (store ST(0) as a double) |
| FSIN | Sine | C2=1 if out-of-range | FSIN (ST(0) = sin(ST(0))) |
| FCOS | Cosine | C2=1 if magnitude of ST(0) ≥ 2^63 | FCOS (ST(0) = cos(ST(0))) |
| FPATAN | Arctangent | Pops stack | FPATAN (ST(1) = atan(ST(1)/ST(0)), then pop) |
| FCOM | Compare | Sets C0/C2/C3 | FCOM m64fp (compare ST(0) with a double in memory) |
| FINIT | Initialize | Resets to default | FINIT (clear exceptions, empty stack) |
| FCLEX | Clear exceptions | Clears flags | FCLEX (reset after error) |
Although the x87 FPU remains fully supported in modern x86 processors for backward compatibility, it has been largely supplanted by SSE and SSE2 instructions, which offer higher performance in scalar and vectorized floating-point tasks. It persists in applications that need the extra precision of its 80-bit format to avoid intermediate rounding losses in chained computations.
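
The stack-oriented style of x87 code can be illustrated with a short computation. The sketch below assumes NASM syntax and a 64-bit Linux process; the data labels are hypothetical and the constants were chosen so that every intermediate result is exactly representable.

```assembly
; Build (assumed toolchain): nasm -f elf64 x87.asm && ld -o x87 x87.o
BITS 64
GLOBAL _start

SECTION .data
a:    DQ 1.5                     ; double-precision operands
b:    DQ 2.25
four: DD 4                       ; 32-bit integer operand for FIMUL

SECTION .bss
result: RESD 1                   ; 32-bit integer result

SECTION .text
_start:
    FINIT                        ; default control word, empty register stack
    FLD   QWORD [REL a]          ; push 1.5:         ST(0) = 1.5
    FADD  QWORD [REL b]          ; add from memory:  ST(0) = 3.75
    FIMUL DWORD [REL four]       ; integer multiply: ST(0) = 15.0
    FISTP DWORD [REL result]     ; store as integer and pop the stack

    MOV EDI, [REL result]        ; exit status = 15
    MOV EAX, 60                  ; sys_exit on x86-64 Linux
    SYSCALL
```

The exit status is 15. Ending with the popping store FISTP leaves the register stack empty, which matters in longer routines because pushing onto a full stack raises a stack-fault exception.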

SIMD Instructions

SIMD (Single Instruction, Multiple Data) instructions in x86 assembly language enable parallel processing of multiple data elements within a single operation, significantly enhancing performance for vectorized computations. These extensions build upon the scalar floating-point capabilities by introducing wider registers and specialized operations for packed data types, such as integers and floating-point values. Introduced progressively since the late 1990s, SIMD instructions form a cornerstone of high-performance computing on x86 processors.

The earliest SIMD extension, MMX (MultiMedia eXtension), introduced in 1997, provides integer operations on 64-bit MMX registers (MM0 through MM7, aliasing the x87 FPU registers). It supports data types like 8 packed bytes, 4 packed words, 2 packed doublewords, or a single quadword, with instructions such as PADDB (add packed bytes with wraparound), PMULHW (multiply packed words, keeping the high part), and MOVQ (move quadword). MMX enables integer arithmetic, logical operations, and packing and unpacking for tasks like image processing, but requires EMMS to clear the FPU tags after use to avoid conflicts with floating-point code. It laid the groundwork for later SIMD sets but is limited to 64-bit width.

The foundational SIMD extension for floating-point, Streaming SIMD Extensions (SSE), uses 128-bit XMM registers (XMM0 through XMM7, extended to XMM15 in 64-bit mode) to handle packed data. The original SSE operates on packed single-precision (32-bit) floating-point values, and SSE2 added double-precision and 128-bit integer vectors. Key instructions include MOVAPS for aligned moves of packed single-precision values and ADDPS for adding such vectors element-wise. For example, the instruction ADDPS xmm1, xmm2 adds the packed single-precision values in xmm2 to those in xmm1, storing the result in xmm1. SSE instructions use legacy opcode encodings (prefix and escape bytes) and are essential for basic vector processing.

Advanced Vector Extensions (AVX) extend SIMD capabilities to 256-bit YMM registers (YMM0 through YMM15), doubling the vector width for greater throughput. AVX employs the VEX encoding prefix (2 or 3 bytes) to specify vector length and operands, replacing the legacy prefix and escape bytes. Instructions like VADDPD add packed double-precision floating-point (64-bit) values, as in VADDPD ymm1, ymm2, ymm3, which processes four elements simultaneously. The VEX.vvvv field encodes an additional source register, giving AVX instructions a non-destructive three-operand form; general per-element writemasking arrives only with AVX-512. AVX2 extends integer SIMD to 256 bits, with instructions such as VPACKSSDW, which packs signed doublewords into signed words with saturation (e.g., VPACKSSDW ymm1, ymm2, ymm3), useful for narrowing data in multimedia code. Additionally, VPSHUFB, the VEX form of PSHUFB, shuffles bytes within each 128-bit lane based on a control mask (e.g., VPSHUFB ymm1, ymm2, ymm3), enabling flexible data rearrangement for tasks like byte-level reordering.

AVX-512 further advances to 512-bit ZMM registers (ZMM0 through ZMM31), supporting up to 16 single-precision or 8 double-precision elements per operation. It introduces the EVEX encoding (4-byte prefix) for features like writemasking (using k registers for element-wise control, e.g., {k1}{z} to zero non-masked elements) and broadcasting from memory. The instruction VGATHERDPD gathers double-precision values using 32-bit indices (e.g., VGATHERDPD zmm1 {k1}, vm32y), facilitating sparse access in irregular datasets. AVX-512 also widens earlier instructions, so VADDPD, for example, gains a ZMM form with masking.

These SIMD instructions find primary use in multimedia applications, where parallel operations accelerate video encoding, image filtering, and audio processing; for instance, ADDPS suits pixel value adjustments and PSHUFB suits color channel swaps. In machine learning, they speed up vectorized computations like matrix additions (VADDPD) and gather operations (VGATHERDPD) for neural network training on large datasets.
| Extension | Register Width | Key Registers | Encoding | Example Vector Capacity (Single-Precision Float) |
|---|---|---|---|---|
| SSE | 128-bit | XMM0-XMM15 | Legacy SSE | 4 elements |
| AVX | 256-bit | YMM0-YMM15 | VEX | 8 elements |
| AVX-512 | 512-bit | ZMM0-ZMM31 | EVEX | 16 elements |
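
As a rough illustration of the packed single-precision addition described in this section, the following NASM sketch wraps ADDPS in a small routine. The routine name add4_ps and the use of the System V AMD64 calling convention (pointers arriving in RDI, RSI, and RDX) are assumptions made for this example, not requirements of the instruction. MOVUPS is used in place of MOVAPS so that the caller's arrays need not be 16-byte aligned.

```nasm
; add4_ps.asm - hypothetical file name
; Assemble with: nasm -f elf64 add4_ps.asm -o add4_ps.o
; Assumed C prototype: void add4_ps(const float *a, const float *b, float *out);

section .text
global add4_ps

add4_ps:
    movups xmm0, [rdi]      ; load four floats from a (unaligned load allowed)
    movups xmm1, [rsi]      ; load four floats from b
    addps  xmm0, xmm1       ; xmm0[i] = xmm0[i] + xmm1[i] for i = 0..3
    movups [rdx], xmm0      ; store the four sums to out
    ret
```

Because SSE is part of the baseline x86-64 instruction set, a routine like this needs no CPU feature check on 64-bit systems; the wider AVX and AVX-512 forms would require one.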

Program Flow and Examples

Program Flow Control

In x86 assembly language, program flow control encompasses mechanisms for structuring code execution beyond basic linear sequencing, including subroutine management, asynchronous event handling, and conditional logic. These features enable modular code, response to hardware events, and error recovery, forming the backbone of complex software such as operating systems. Procedures allow for reusable code blocks, while interrupts and exceptions provide hooks for system-level interactions, all orchestrated through the processor's interrupt architecture and stack-based control transfers.

Procedures in x86 assembly are defined using assembler-specific directives and invoked via the CALL and RET instructions, which use the stack to save and restore the return address. In Microsoft Macro Assembler (MASM), procedures are delimited by PROC and ENDP directives, which declare the entry point and scope, respectively, facilitating linkage and scoping for the subroutine. For instance, a simple procedure might be structured as follows:
MyProc PROC
    ; procedure body
    ret
MyProc ENDP
This setup supports parameter passing and return value handling according to established application binary interfaces (ABIs). The cdecl convention, common on 32-bit Unix-like systems and in C code generally, passes parameters on the stack from right to left, with the caller responsible for stack cleanup after the callee's RET returns, which allows variable-argument functions. In contrast, the stdcall convention, prevalent in Win32 API calls, assigns the cleanup duty to the callee (typically through RET with an immediate byte count), which removes the cleanup code from every call site but requires a fixed argument count. These ABIs ensure interoperability between assembly and higher-level languages, with parameters often accessed via offsets from the EBP register in 32-bit modes or through registers in the 64-bit System V ABI.

Interrupt service routines (ISRs) handle asynchronous events from hardware or software, configured through the Interrupt Descriptor Table (IDT), a system data structure that maps interrupt vectors to handler addresses. The IDT's base and limit are loaded into the processor's IDTR register using the LIDT instruction, with each entry holding a gate descriptor that specifies the ISR offset, segment selector, and descriptor privilege level. ISRs are invoked automatically when an interrupt occurs, with the processor saving the return state on the stack before transferring control. To manage interrupt enabling and disabling, the CLI (Clear Interrupt Flag) and STI (Set Interrupt Flag) instructions toggle the IF bit in the EFLAGS register, allowing software to mask maskable interrupts during critical sections. An ISR concludes with IRET to restore the saved state and return.

Exceptions represent synchronous events triggered by execution errors or violations, routed through the IDT like interrupts but classified as faults, traps, or aborts based on restartability. The #GP (General Protection) exception, vector 13, occurs on protection violations such as executing a privileged instruction outside ring 0, privilege-level mismatches, or segment-limit violations, pushing an error code onto the stack for handler analysis. Exception handlers, installed in the IDT as interrupt or trap gates, process the event (for example, a page-fault handler reads the faulting address from CR2) and typically execute IRET to resume execution. Hardware exceptions like #GP thus enable robust error handling in protected-mode environments.

High-level constructs like loops are implemented using conditional jumps that alter flow based on flag states set by earlier instructions. A typical loop decrements a counter and jumps back if it is non-zero, as in:
mov ecx, 10      ; loop counter
loop_start:
    ; loop body
    dec ecx
    jnz loop_start  ; jump if not zero
This leverages instructions like JNZ (jump if not zero) to test the ZF flag, providing efficient iteration without relying on the dedicated LOOP instruction. Conditional directives further enhance flow control at assemble time; in the Netwide Assembler (NASM), %if evaluates numeric expressions to include or exclude code blocks, and %ifdef tests whether a symbol is defined, which is convenient for platform-specific variants.

Debugging integrates via software breakpoints, where the INT 3 instruction (opcode CC) generates a #BP (Breakpoint) exception, vector 3, pausing execution for debugger intervention. Because this trap is a single byte, it can overwrite the first byte of any instruction; the handler installed in the IDT routes to the debugger's routine, which can inspect registers and memory before single-stepping with the TF flag in EFLAGS.
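
The assemble-time conditionals and the INT 3 breakpoint described above can be combined. The NASM sketch below emits a breakpoint only when a symbol is defined on the command line. The symbol DEBUG_BUILD, the routine name checkpoint, and the file name are hypothetical names chosen for this example.

```nasm
; checkpoint.asm - hypothetical file name
; Release build: nasm -f elf64 checkpoint.asm -o checkpoint.o
; Debug build:   nasm -f elf64 -DDEBUG_BUILD checkpoint.asm -o checkpoint.o

section .text
global checkpoint

checkpoint:
%ifdef DEBUG_BUILD
    int3                    ; assembles to the single byte 0xCC and raises #BP
%endif
    ret                     ; in a release build the routine is just this RET
```

Run under a debugger, the debug build stops at the breakpoint; outside a debugger, Linux would typically deliver SIGTRAP to the process, so this pattern suits development builds only.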

Basic Hello World Programs

A basic "Hello World" program in x86 assembly demonstrates fundamental output operations and program termination specific to the target operating system. These examples illustrate how assembly code interacts with the system for simple text output, highlighting differences in calling conventions, system calls, and linking requirements across platforms. The programs are kept minimal to focus on core concepts like data declaration, register usage, and invocation of OS services.

For 16-bit MS-DOS using MASM syntax, the program employs DOS interrupt 21h with function 09h in AH to print a string (terminated by '$') whose address is in DS:DX, followed by function 4Ch for program termination. The .model small directive specifies the small memory model, with one code segment and one data segment, suitable for DOS executables.
; hello.asm - 16-bit MS-DOS Hello World in MASM
.model small
.stack 128
.data
Msg db 'Hello, World!', 13, 10, '$'  ; Message with CR/LF and terminator
.code
start:
    mov ax, @data
    mov ds, ax
    mov ah, 09h
    lea dx, Msg
    int 21h
    mov ah, 4Ch
    int 21h
end start
To assemble and link: use ml hello.asm to produce the executable hello.exe; this requires a MASM release that ships a 16-bit linker (for example MASM 6.x). The program runs in real mode on MS-DOS or compatible emulators.

In 32-bit Windows using MASM syntax, a graphical "Hello World" can invoke MessageBoxA from user32.dll to display the message in a dialog box, with ExitProcess from kernel32.dll for termination. The .model flat directive enables flat memory addressing, and the program follows the stdcall calling convention, pushing arguments right to left.
; hello.asm - 32-bit Windows Hello World in MASM with MessageBoxA
.386
.model flat, stdcall
option casemap:none
include windows.inc
include kernel32.inc
include user32.inc
includelib kernel32.lib
includelib user32.lib

.data
    titleMsg db 'x86 Assembly', 0
    msg db 'Hello, World!', 0

.code
Main:
    push 0                  ; MB_OK
    push offset titleMsg    ; Caption
    push offset msg         ; Text
    push 0                  ; HWND_DESKTOP
    call MessageBoxA
    push 0
    call ExitProcess
end Main
Assemble with ml /c /coff hello.asm and link with link /subsystem:windows hello.obj user32.lib kernel32.lib /entry:Main /libpath:"C:\path\to\libs" to generate hello.exe.

For 32-bit Linux using NASM syntax, the program uses system call 4 (sys_write) via INT 80h to output to stdout (file descriptor 1), with the call number in EAX and arguments in EBX (descriptor), ECX (buffer address), and EDX (length), followed by system call 1 (sys_exit) with EBX as the exit code. No external libraries are required; the program talks to the kernel directly.
; hello.asm - 32-bit Linux Hello World in NASM
SECTION .data
    msg db 'Hello, World!', 10
    msgLen equ $ - msg

SECTION .text
    global _start
_start:
    mov eax, 4      ; sys_write
    mov ebx, 1      ; stdout
    mov ecx, msg    ; buffer
    mov edx, msgLen ; length
    int 80h
    mov eax, 1      ; sys_exit
    mov ebx, 0      ; exit code
    int 80h
Assemble with nasm -f elf32 hello.asm -o hello.o and link with ld -m elf_i386 hello.o -o hello to produce the executable.

In 64-bit Linux using NASM syntax, a higher-level approach links against libc to call printf for formatted output, following the x86-64 System V ABI, where the first integer or pointer argument is passed in RDI, AL holds the number of vector registers used by a variadic call, and the stack must be 16-byte aligned at each CALL. RIP-relative addressing is used for the string reference so that the code is position-independent.
; hello.asm - 64-bit Linux Hello World in NASM with printf
extern printf
extern exit

SECTION .data
    msg db 'Hello, World!', 10, 0

SECTION .text
    global main
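main:
    sub rsp, 8              ; Realign stack to 16 bytes (entry leaves RSP at 16n+8)
    lea rdi, [rel msg]      ; First argument in RDI, RIP-relative address
    xor eax, eax            ; AL = 0: no vector register arguments
    call printf wrt ..plt   ; Call through the PLT so the code links as PIE
    xor edi, edi            ; Exit status 0
    call exit wrt ..plt     ; Does not return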
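Assemble with nasm -f elf64 hello.asm -o hello.o and link with gcc hello.o -o hello, which supplies the C runtime startup code that calls main as well as libc itself (add -no-pie if the object contains absolute address references). Linking by hand with ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 hello.o -lc -o hello is also possible, but it bypasses the C runtime startup code that normally calls main, so gcc is the more reliable choice. Either route produces a dynamically linked executable.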

Advanced Usage Examples

Advanced usage of x86 assembly language often involves low-level manipulation of processor state and hardware interactions, enabling optimized or specialized code such as position-independent executables, dynamic code generation, and custom interrupt processing. These techniques leverage specific instructions to interact with flags, the instruction pointer, and system events, but require careful handling to ensure correctness across processor generations.

Flag manipulation is crucial for conditional control in performance-critical loops, where instructions like ADD can set flags such as the Carry Flag (CF) and Zero Flag (ZF) based on arithmetic results. The ADD instruction adds the source operand to the destination and stores the result in the destination, setting CF if there is a carry out of the most significant bit for unsigned operations and ZF if the result is zero. Following this, the JC (Jump if Carry) instruction can branch to a label if CF is set, enabling efficient handling of overflow in unsigned arithmetic loops. For instance, in a loop accumulating values until overflow:
mov eax, 0xFFFFFFFF  ; Initialize accumulator to max unsigned 32-bit value
mov ecx, 10          ; Loop counter
loop_start:
add eax, 1           ; Increment; sets CF on unsigned wrap-around (INC would leave CF unchanged)
jc overflow_handler  ; Jump if carry (overflow)
dec ecx
jnz loop_start
jmp done             ; No overflow: skip the handler
overflow_handler:
; Handle wrap-around
done:
This pattern detects unsigned overflow without additional comparisons, optimizing tight loops in numerical computations.

Accessing the instruction pointer (IP, or RIP in 64-bit mode) supports position-independent code (PIC), essential for shared libraries and other code that may be loaded at any address. The LEA (Load Effective Address) instruction computes the effective address of its source operand without memory access, storing it in the destination register; in PIC contexts, RIP-relative addressing, available only in 64-bit mode, expresses operands as offsets from the current instruction position. Using NASM syntax like lea rbx, [rel $] loads the address of the current instruction into RBX, providing the code's position for runtime address calculations in PIC binaries. An example in 64-bit PIC code that reaches a data item through a relative offset:
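lea rbx, [rel $]     ; RBX = run-time address of this LEA instruction
add rbx, data_offset ; data_offset = (address of data) - (address of the LEA), fixed at build time
mov rax, [rbx]       ; Access data at a load-address-independent location
; Idiomatic equivalent, letting the assembler compute the displacement:
;   mov rax, [rel data_label]   ; data_label is a placeholder name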
This avoids absolute addresses, ensuring the code relocates correctly when loaded at arbitrary base addresses.

Self-modifying code alters instructions at runtime, useful for just-in-time compilation or adaptive optimization, but the processor must not execute stale copies of the old bytes that it has already fetched. After writing to a code region, executing a serializing instruction like CPUID (or taking a jump to the modified code) forces the processor to discard prefetched instructions and refetch the updated bytes. The CPUID instruction returns processor identification but also acts as a serializing barrier that drains the instruction pipeline. A simple self-modifying example jumps to a modifier routine, patches an opcode (for example, replacing a NOP), and resumes at the patched location:
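jmp modify_code              ; Jump to the patching routine
original_code:
    nop                      ; Placeholder byte (0x90) to be patched
    jmp done                 ; Skip the patcher once the patched byte has run
modify_code:
    mov byte [original_code], 0x50  ; Patch NOP to PUSH EAX (0x50 in 32-bit mode); page must be writable
    xor eax, eax             ; Select CPUID leaf 0
    cpuid                    ; Serialize; clobbers EAX, EBX, ECX, EDX
    jmp original_code        ; Resume at the patched instruction
done:
    ; Execution continues here; the patched PUSH left one extra dword on the stack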
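Such techniques incur performance penalties because the processor must flush its pipeline and refetch the affected instructions, but they enable runtime code adaptation in embedded or virtualized environments.

Custom interrupt handlers allow direct hardware interaction, such as processing keyboard input via IRQ 1 (vector 0x21 when the legacy PIC is remapped to base 0x20). In protected mode, the Interrupt Descriptor Table (IDT) routes hardware interrupts to user-defined handlers, and the processor saves the current instruction pointer, code segment, and flags (EIP, CS, and EFLAGS; RIP and RFLAGS in long mode) before transferring control. A basic keyboard handler reads the scancode from I/O port 0x60 and then acknowledges the interrupt by sending an end-of-interrupt (EOI) command to the PIC. Example handler stub in 32-bit protected mode: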
keyboard_handler:
pushad               ; Save registers
in al, 0x60          ; Read scancode from keyboard controller
; Process scancode (e.g., map to ASCII)
mov [key_buffer], al ; Store in buffer
mov al, 0x20         ; EOI command for the PIC
out 0x20, al         ; Acknowledge interrupt
popad
iret                 ; Return, restoring EIP, CS, and EFLAGS
This setup, registered in the IDT at vector 33 (IRQ 1 + 32), enables real-time input capture in low-level system code.

In 64-bit mode, advanced usage extends to system calls via the SYSCALL instruction, which saves the address of the next instruction (RIP) to RCX and RFLAGS (the 64-bit extension of EFLAGS) to R11 before switching to kernel mode. RFLAGS carries condition codes and status bits, while RIP tracks execution position; in Linux system calls, parameters are passed in registers (RDI, RSI, RDX, R10, R8, R9, with the call number in RAX), and SYSCALL itself performs no stack accesses, which makes the transition fast. An example write syscall:
mov rax, 1           ; Syscall number: write
mov rdi, 1           ; File descriptor: stdout
mov rsi, msg         ; Buffer address
mov rdx, len         ; Length
syscall              ; Invoke; RCX = saved RIP, R11 = saved RFLAGS
The kernel later returns with SYSRET, which reloads RIP from RCX and RFLAGS from R11, keeping overhead low for frequent kernel interactions.
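
For context, the fragment above can be expanded into a complete program. The following is a minimal sketch for 64-bit Linux that assumes the standard Linux x86-64 system call numbers (1 for write, 60 for exit); the file name and the labels msg and len are chosen for this example only.

```nasm
; hello64.asm - hypothetical file name
; Assemble: nasm -f elf64 hello64.asm -o hello64.o
; Link:     ld hello64.o -o hello64

section .data
msg:    db 'Hello, World!', 10      ; text followed by a newline
len:    equ $ - msg                 ; length computed at assemble time

section .text
global _start

_start:
    mov eax, 1              ; system call number: write
    mov edi, 1              ; file descriptor: stdout
    lea rsi, [rel msg]      ; buffer address, RIP-relative
    mov edx, len            ; number of bytes to write
    syscall                 ; RCX and R11 are overwritten by SYSCALL

    mov eax, 60             ; system call number: exit
    xor edi, edi            ; exit status 0
    syscall                 ; does not return
```

Writing to the 32-bit registers (EAX, EDI, EDX) zero-extends into the full 64-bit registers, which gives shorter encodings than the 64-bit forms. Since SYSCALL overwrites RCX and R11, any values needed after the call should be kept in other registers.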
