CPUID
CPUID is a supplementary instruction in the x86 instruction set architecture that enables software to query a processor for detailed information about its identity, supported features, and capabilities, such as vendor string, family, model, stepping, cache parameters, and extensions like MMX, SSE, and AVX.[1][2] Introduced by Intel in 1993 with the Pentium processor to facilitate software compatibility across evolving architectures, it was subsequently adopted by AMD starting with the Am486 DX4 processor in 1994 and has become a standard mechanism for processor enumeration in x86-compatible systems from multiple vendors.[3][4]
The instruction operates by loading a function code, or "leaf," into the EAX register (with ECX sometimes serving as a sub-leaf selector) prior to execution; it then returns data in the EAX, EBX, ECX, and EDX registers, serializing the processor state to ensure consistent results.[1] Standard functions, accessible via EAX values from 0 to the maximum basic function (queried with EAX=0), provide core identification details like the vendor ID ("GenuineIntel" for Intel or "AuthenticAMD" for AMD) and legacy feature flags in EDX (e.g., FPU, MMX) and ECX (e.g., SSE3, SSSE3).[1][2] Higher standard leaves such as 7H and 0DH report structured extended features and extended state management (used, for example, by AVX-512), while extended functions, starting from EAX=8000_0000H, reveal further details including vendor-specific extensions such as AMD's 3DNow! or SVM virtualization, processor topology, and power management capabilities.[1][2]
Beyond basic identification, CPUID plays a critical role in modern computing by allowing operating systems, applications, and BIOS firmware to detect hardware support for optimizations, security features (e.g., Intel SGX or AMD SEV), and multi-core configurations, ensuring portability and performance tuning across diverse x86 processors.[1][2] Its extensible design accommodates ongoing architectural advancements, with new leaves added in successive processor generations to report evolving capabilities like machine learning accelerations or enhanced security mitigations.[1][2]
Introduction and History
Introduction to the CPUID Instruction
The CPUID instruction is a supplementary x86 processor instruction designed to return detailed information about the processor's identification, supported features, and configuration parameters. It operates by loading a function code into the EAX register as input, which selects the specific type of data to retrieve, and then populates the EAX, EBX, ECX, and EDX registers with the corresponding output values upon execution.[5][3]
This instruction allows software to directly query the underlying hardware capabilities of x86-compatible processors, bypassing the need for operating system mediation and enabling portable detection of features such as instruction set extensions or cache configurations. Its atomic execution ensures that the query completes as a single, uninterruptible operation, providing reliable results even in multithreaded environments. Introduced by Intel in 1993 alongside the Pentium processor, CPUID has been available on all subsequent x86 processors, marking its role in standardizing hardware introspection.[6][7][3]
In terms of register conventions, EAX serves as the primary input selector for the main function (or "leaf"), while ECX can specify sub-functions (or "sub-leaves") within certain leaves to access more granular details. Understanding CPUID requires familiarity with basic x86 assembly concepts, such as using the MOV instruction to prepare the EAX value and the CPUID opcode (0F A2) to invoke it, though the instruction itself handles the data transfer implicitly. Historically, CPUID was developed to supplant inconsistent, vendor-proprietary techniques for feature detection, such as timing-based probes or reset vector analysis, thereby promoting cross-processor compatibility in software development.[5][8][3]
Historical Development and Evolution
The CPUID instruction was introduced by Intel in 1993 alongside the Pentium processor (codenamed P5) and certain SL-enhanced 486 variants, primarily to enable software detection of the processor vendor, family, model, and basic feature flags through initial function leaves EAX=0 (vendor identification and maximum function) and EAX=1 (processor identification and features).[9] This marked a shift from earlier x86 identification methods, such as fixed flags in the FLAGS register, providing a standardized mechanism for runtime enumeration in an era of evolving processor architectures.[10]
Subsequent expansions occurred with the Pentium Pro processor in 1995, which added EAX=2 for cache and TLB descriptor information to address the growing complexity of on-chip caching hierarchies.[9] The EAX=3 leaf for processor serial number was introduced later with the Pentium III in 1999, allowing unique hardware identification but was deprecated and disabled in subsequent processors around 2000 due to privacy concerns over unique tracking capabilities.[11] AMD adopted CPUID starting with the Am486 DX4 processor in 1994, with the K5 in 1996 providing further compatibility and the K6 in 1997 introducing the extended function range beginning at EAX=80000000h for vendor-specific features, including early support for 3DNow! extensions.[12][13] This range later supported AMD64 64-bit architecture enumeration with the Athlon 64 in 2003.
Key evolutions in the mid-2000s reflected multicore and advanced feature proliferation: Intel's Core microarchitecture in 2006 introduced EAX=4 for deterministic cache parameters, offering more precise and structured cache hierarchy details compared to the legacy EAX=2 descriptors.[14] The Nehalem microarchitecture in 2008 added EAX=0Bh for extended topology enumeration, enabling software to query core, thread, and cache topology in multicore systems.[15] This was further enhanced in 2013 with the Haswell microarchitecture's EAX=7 leaf for structured extended feature enumeration, providing a scalable way to enumerate newer instruction set extensions such as AVX2, BMI1, and BMI2 without fragmenting the feature space.[16]
Recent developments continue to adapt CPUID for emerging workloads. In 2024, Intel introduced EAX=24H for AVX10 (Advanced Vector Extensions 10) enumeration, simplifying detection of converged vector ISAs with versioned sub-leaves for granular feature support across 128-bit, 256-bit, and 512-bit widths.[17] AMD's Zen 5 microarchitecture, launched in 2024, extends the 80000000h range with new leaves for advanced enumeration supporting optimizations in hybrid core topologies.[18] Intel's Software Developer's Manual updates through March 2025 incorporate precursors for features like TMUL (tensor multiply) and ChkTag (tag check) in extended leaves, aligning with APX (Advanced Performance Extensions) for enhanced security and vector processing.[19]
These evolutions have significantly impacted software portability, enabling operating system kernels like Linux to dynamically configure features via /proc/cpuinfo based on CPUID queries, and libraries to detect SIMD capabilities for optimized code paths without hardcoded assumptions.[20]
Invoking CPUID
Basic Calling Mechanism
The CPUID instruction is invoked in assembly language by first loading the desired main function code (leaf) into the EAX register, followed by execution of the CPUID opcode, after which the results are retrieved from the EAX, EBX, ECX, and EDX registers.[21] This process provides processor-specific information without operands, as the instruction is self-contained and serializes execution to ensure prior instructions complete.[21]
For functions that support sub-leaves, such as leaf 4 (deterministic cache parameters) or leaf 7 (extended features), the ECX register must be set to the sub-leaf index (e.g., 0 for the base sub-leaf) before executing CPUID to access additional levels of detail.[21] The outputs from these sub-leaves follow the same register format but provide progressively more specific data, with invalid sub-leaf indices typically resulting in all output registers being set to zero.[21]
Each output register holds a 32-bit value, where the interpretation of individual bits or fields depends on the selected leaf; for instance, in leaf 0, the EBX, EDX, and ECX registers collectively form a 12-character ASCII vendor identification string when concatenated.[21] These 32-bit outputs are standardized across x86 processors from Intel and AMD, though the specific data encodings may vary by vendor for certain leaves.[21][22]
Specifying an invalid main leaf, such as a value in EAX exceeding the maximum supported function, does not raise an exception. On Intel processors the instruction instead returns the data for the highest supported basic leaf, so all four output registers hold that leaf's values rather than an error indicator; other vendors may return zeros.[21] To prevent such cases, software must first execute CPUID with EAX=0 to retrieve the maximum basic function in EAX, ensuring subsequent calls remain within supported bounds.[21] The range 40000000H to 4FFFFFFFH is reserved for hypervisor-specific leaves. Software should detect the presence of a hypervisor (e.g., via bit 31 of ECX in leaf 1) before accessing this range; otherwise, the input is treated as invalid and handled as described above. Leaves in this range should generally be avoided in bare-metal environments.[21]
In x86-64 environments, CPUID writes its results to the low halves of RAX, RBX, RCX, and RDX, but it processes and returns only 32-bit values, clearing the upper 32 bits of these registers upon completion.[21] This maintains backward compatibility with 32-bit modes. Because the instruction takes no explicit operands, its encoding is the same in every mode and the core mechanism remains a legacy 32-bit operation.[21]
The following pseudocode illustrates a basic invocation for leaf 0 to obtain the maximum function and vendor information:
function_value ← 0
EAX ← function_value
CPUID
max_function ← EAX
vendor_part1 ← EBX
vendor_part2 ← EDX
vendor_part3 ← ECX
This sequence sets EAX to the maximum basic leaf supported and populates the other registers with vendor string components.[21]
Usage Guidelines and Safety
The CPUID instruction requires careful invocation to ensure compatibility and avoid undefined behavior across processor generations and vendors. Software must first execute CPUID with EAX set to 0 to retrieve the maximum supported basic function in EAX, as well as the vendor identification string in EBX, ECX, and EDX; this step is essential before querying any higher function codes, which could otherwise result in reserved or invalid returns. Similarly, for extended functions, EAX=80000000h must be used to determine the maximum extended leaf, preventing attempts to access unsupported ranges that may return arbitrary data.[23][24]
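The range checks described above can be sketched as a small helper. This is an illustration only: the function name `leaf_is_queryable` and its parameters are hypothetical, and the register values are assumed to have been captured elsewhere (for example by a native helper), since Python has no built-in way to execute CPUID.

```python
# Illustrative constants for the leaf ranges discussed in the text.
HYPERVISOR_BASE = 0x40000000
HYPERVISOR_LIMIT = 0x4FFFFFFF
EXTENDED_BASE = 0x80000000


def leaf_is_queryable(leaf, max_basic, max_extended, hypervisor_present=False):
    """Return True if `leaf` lies in a range the processor reported as valid.

    max_basic          -- EAX returned by CPUID with EAX=0
    max_extended       -- EAX returned by CPUID with EAX=80000000h
    hypervisor_present -- bit 31 of ECX from leaf 1 (assumed already read)
    """
    if leaf <= max_basic:
        return True
    if HYPERVISOR_BASE <= leaf <= HYPERVISOR_LIMIT:
        # Only meaningful when a hypervisor has announced itself.
        return hypervisor_present
    if leaf >= EXTENDED_BASE:
        # A reported maximum below 80000000h means the extended range is absent.
        return max_extended >= EXTENDED_BASE and leaf <= max_extended
    return False
```

A fuller version would also compare hypervisor leaves against the maximum hypervisor leaf reported at 40000000H; that check is omitted here for brevity.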
CPUID operates at all privilege levels in protected mode, including ring 3 (user mode), without requiring elevated privileges, enabling its use in application software for feature detection. However, in virtualized environments, hypervisors such as VMware may intercept the instruction and modify outputs; for instance, when EAX=1, bit 31 in ECX (the hypervisor-present bit) is set to indicate that the code is running under a hypervisor, and the hypervisor's vendor signature (e.g., VMware's) is typically exposed through the reserved hypervisor leaf at EAX=40000000H rather than through leaf 1. Developers should verify physical hardware contexts when results impact security or performance assumptions.[23]
Each execution of CPUID reports on the logical processor that runs it, so individual executions are inherently thread-safe, but feature flags and topology information can vary across cores in multi-socket or heterogeneous systems. Detection should therefore occur once during system initialization or boot, on each logical processor where per-core differences matter, with results cached for runtime use rather than repeated queries. Vendor-specific behaviors pose common pitfalls. Maximum function values vary by processor generation and vendor (for example, modern Intel and AMD processors support basic leaves up to 0x24 or higher as of 2025), and deprecated leaves like EAX=3 (processor serial number) often return undefined or zeroed values on modern CPUs and should not be relied upon.[24][23]
CPUID is a comparatively slow instruction (on the order of a hundred cycles or more on bare metal, and considerably more when a hypervisor intercepts it), and because it is serializing it should be avoided in hot code paths or real-time loops to prevent performance bottlenecks; caching results at startup mitigates this while ensuring accuracy for static processor characteristics. From a security perspective, avoid depending on potentially spoofable data like serial numbers (EAX=3), especially in virtual machines where emulation can alter outputs; always cross-check hypervisor indicators to detect virtualized environments. In recent AMD architectures like Zen 5 (Family 1Ah), new extended leaves for features such as performance monitoring require validation of the extended maximum function (EAX=80000000h) to confirm support.[25][24]
Core Standard Functions
EAX=0: Vendor Identification and Maximum Function
The CPUID instruction, when invoked with EAX set to 0, serves as the initial query for identifying the processor vendor and determining the highest supported basic function leaf in the standard range (0 to the returned maximum). This leaf outputs the maximum value for EAX in subsequent basic calls, allowing software to bound its enumeration of processor capabilities without invoking unsupported functions. The leaf has no sub-leaves, so the value in ECX on input is ignored. Unlike other leaves, no feature bits are returned here, focusing solely on identification and scoping.
Upon execution, EAX receives the largest valid input value for basic CPUID functions, typically ranging from 1 on early supported processors to higher values like 0x1A on modern implementations, indicating support for leaves up to that number (for example, a maximum of 7 signals availability of the extended feature leaf EAX=7). The vendor identification string, a 12-character ASCII sequence, is distributed across EBX, EDX, and ECX as four-byte little-endian words, which must be concatenated in the order EBX + EDX + ECX to form the readable string. Each register holds the bytes such that extracting from low to high byte order yields the correct sequence: for instance, on Intel processors, EBX=0x756E6547 (bytes 'G','e','n','u'), EDX=0x49656E69 (bytes 'i','n','e','I'), and ECX=0x6C65746E (bytes 'n','t','e','l'), yielding "GenuineIntel".
Common vendor strings include "GenuineIntel" for Intel processors, "AuthenticAMD" for AMD processors (e.g., EBX=0x68747541 for bytes 'A','u','t','h'; EDX=0x69746E65 for 'e','n','t','i'; ECX=0x444D4163 for 'c','A','M','D'), and "CentaurHauls" for Centaur Technology (VIA) processors. This leaf is essential as the first invocation in any CPUID enumeration routine, enabling vendor-specific handling and preventing invalid queries that could return undefined results. Historically, the CPUID instruction was introduced with the Intel Pentium processor in 1993; on x86 processors that predate it, such as the original 486, executing CPUID triggers an invalid opcode exception (#UD), while early supported models return a small maximum in EAX (such as 1), indicating that only the vendor and basic identification leaves are available.
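The register ordering for the vendor string can be illustrated with a short sketch. It assumes the three register values from leaf 0 have already been captured; the function name `vendor_string` is hypothetical.

```python
import struct


def vendor_string(ebx, edx, ecx):
    """Assemble the 12-character vendor ID from leaf 0 register values.

    Each register is packed as a little-endian 32-bit word, and the words
    are concatenated in the order EBX, EDX, ECX.
    """
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


# Register values quoted in the text for an Intel processor.
intel = vendor_string(0x756E6547, 0x49656E69, 0x6C65746E)
```

Passing the registers in the intuitive EBX, ECX, EDX order instead would scramble the middle and final four characters, which is a common mistake when first implementing this query.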
EAX=1: Processor Identification and Basic Features
When the CPUID instruction is executed with EAX set to 1, it returns detailed processor identification and basic feature information in the general-purpose registers. This leaf provides the processor's version details in EAX, a brand index and related parameters in EBX, and feature indication bit vectors in ECX (bits 0-31) and EDX (bits 0-31, corresponding to features 32-63 in the overall set).[26][22]
The value returned in EAX encodes the processor's stepping, model, family, and extended identifiers. Bits 3:0 hold the stepping ID, bits 7:4 the base model, and bits 11:8 the base family; bits 13:12 indicate the processor type (e.g., 0 for the original OEM processor); bits 19:16 supply the extended model (used when the base family is 6 or 15); bits 27:20 supply the extended family; and the remaining bits are reserved. To decode the effective family ID, software computes base = (EAX >> 8) & 0xF; extended = (EAX >> 20) & 0xFF; effective family = base + (base == 0xF ? extended : 0). Similarly, when the extended model applies, the full model is (((EAX >> 16) & 0xF) << 4) | ((EAX >> 4) & 0xF). This decoding allows operating systems to identify the processor generation precisely during initialization.[26][22]
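The family and model arithmetic above can be sketched as follows. The function name `decode_signature` is hypothetical, and the input is assumed to be the EAX value already returned by leaf 1.

```python
def decode_signature(eax):
    """Decode (family, model, stepping) from the leaf 1 EAX signature."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is added only when the base family saturates at 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family

    # The extended model supplies the high nibble for base families 6 and 15.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping
```

For example, a signature of 0x000906EA decodes to family 6, model 0x9E, stepping 0xA, while 0x00870F10 decodes to family 0x17, model 0x71, stepping 0.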
EBX returns auxiliary information, including bits 7:0 as the brand index (deprecated in favor of extended leaves for processor name strings), bits 15:8 as the CLFLUSH line size in units of 8 bytes (a value of 8 therefore means 64 bytes), bits 23:16 as the maximum number of addressable IDs for logical processors in the physical package (meaningful when the HTT flag, EDX bit 28, is set), and bits 31:24 as the initial local APIC ID. The brand index, when non-zero, historically selected an entry in a fixed table of brand names, but modern processors return 0 here and provide names through the brand string in extended leaves.[26]
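A minimal sketch of splitting the leaf 1 EBX value into its four byte-wide fields follows. It assumes the Intel SDM convention that the CLFLUSH field is expressed in 8-byte units; the function name `decode_leaf1_ebx` and the dictionary keys are hypothetical.

```python
def decode_leaf1_ebx(ebx):
    """Split the leaf 1 EBX value into its four 8-bit fields."""
    return {
        "brand_index": ebx & 0xFF,
        # The raw field counts 8-byte chunks, so scale it to bytes.
        "clflush_line_bytes": ((ebx >> 8) & 0xFF) * 8,
        "max_logical_ids": (ebx >> 16) & 0xFF,
        "initial_apic_id": (ebx >> 24) & 0xFF,
    }


# Hypothetical EBX value: APIC ID 5, 16 addressable IDs, 64-byte CLFLUSH line.
fields = decode_leaf1_ebx(0x05100800)
```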
ECX and EDX contain bit flags indicating support for core x86 features, with each bit set to 1 denoting availability. These flags are crucial for software to enable or disable instruction sets and capabilities at runtime. The following table summarizes key ECX bits; full lists are defined per vendor, with some AMD-specific flags like 3DNow! in extended leaves.
| ECX Bit | Feature Name | Description |
|---|---|---|
| 0 | SSE3 | Streaming SIMD Extensions 3 support. |
| 1 | PCLMULQDQ | Carry-less multiplication for AES-GCM. |
| 3 | MONITOR | MONITOR/MWAIT for lightweight sleeping. |
| 5 | VMX | Intel VT-x for virtualization. |
| 9 | SSSE3 | Supplemental Streaming SIMD Extensions 3. |
| 13 | CMPXCHG16B | 128-bit compare-and-swap. |
| 19 | SSE4.1 | Streaming SIMD Extensions 4.1. |
| 21 | x2APIC | Extended xAPIC mode. |
| 22 | MOVBE | MOV with byte-order reversal. |
| 23 | POPCNT | Population count instruction. |
| 25 | AESNI | AES new instructions. |
| 28 | AVX | Advanced Vector Extensions (precursor to later SIMD expansions). |
| 29 | F16C | Half-precision floating-point conversion. |
| 30 | RDRAND | Hardware random number generator. |
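Testing these flags amounts to simple bit masking. The sketch below mirrors the ECX bit positions listed in the table; the names `LEAF1_ECX_FEATURES` and `leaf1_ecx_features` are hypothetical, and the ECX value is assumed to have been read from leaf 1 already.

```python
# Bit positions copied from the ECX table above (not an exhaustive list).
LEAF1_ECX_FEATURES = {
    0: "SSE3", 1: "PCLMULQDQ", 3: "MONITOR", 5: "VMX", 9: "SSSE3",
    13: "CMPXCHG16B", 19: "SSE4.1", 21: "x2APIC", 22: "MOVBE",
    23: "POPCNT", 25: "AESNI", 28: "AVX", 29: "F16C", 30: "RDRAND",
}


def leaf1_ecx_features(ecx):
    """Return the names of the tabulated features whose bit is set in ECX."""
    return [
        name
        for bit, name in sorted(LEAF1_ECX_FEATURES.items())
        if (ecx >> bit) & 1
    ]
```

Note that a set AVX bit alone does not guarantee usable AVX: the operating system must also have enabled the corresponding register state, which is outside the scope of this sketch.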
Leaf 1 defines no sub-leaves, so the value in ECX on input is ignored. The processor brand string is not returned here; software obtains it from extended functions 80000002h through 80000004h instead. This leaf is essential for operating system bootstrapping, as it enables detection of fundamental capabilities like SSE2 (EDX bit 26), which is part of the x86-64 baseline; support for 64-bit long mode itself is reported separately by EDX bit 29 of extended function 80000001h.[26]
EAX=2: Cache and TLB Descriptors
When the CPUID instruction is executed with EAX set to 2, it returns information about the processor's internal caches and translation lookaside buffers (TLBs) in the form of compact descriptor bytes encoded in the EAX, EBX, ECX, and EDX registers.[21] This function was introduced to allow software to discover memory subsystem characteristics without relying on vendor-specific assumptions, enabling optimized code for different processor generations.[21]
Each of the four 32-bit registers contains up to four 8-bit descriptor values, which software scans byte by byte across EAX, EBX, ECX, and EDX. The least significant byte of EAX is not a descriptor: it returns 01H and should be disregarded. Bit 31 of each register indicates whether that register holds valid descriptors (0 for valid, 1 for reserved), and software must skip registers in which it is set. A descriptor value of 00H is a null descriptor that carries no information and is simply ignored.[21] Descriptors are unordered and may repeat or vary by processor model, so enumeration involves collecting and decoding all non-zero bytes from the valid registers.[21]
The descriptor bytes encode specific attributes of caches and TLBs, such as size, associativity, line length, and entry counts, using standardized codes defined by Intel. For example, the code 06H describes an 8 KiB, 4-way set associative instruction cache with 32-byte lines, while 50H indicates a 64-entry instruction TLB covering 4 KiB and 2 MiB or 4 MiB pages. Another representative encoding, 2CH, denotes a 32 KiB, 8-way first-level data cache with 64-byte lines. These codes prioritize brevity for legacy software but lack the granularity of modern leaves, requiring lookup tables for decoding.[21] A value of FFH in a descriptor signals that detailed cache parameters should instead be queried using CPUID with EAX=4.[21]
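Collecting descriptor bytes can be sketched as below. The sketch follows the Intel SDM convention that a register with bit 31 clear holds valid descriptors and that the low byte of EAX is not a descriptor. The names `leaf2_descriptors`, `KNOWN_DESCRIPTORS`, and `describe` are hypothetical, and the lookup table holds only three of the codes mentioned in the text.

```python
# A tiny excerpt of the descriptor table, limited to codes cited in the text.
KNOWN_DESCRIPTORS = {
    0x06: "L1 instruction cache: 8 KiB, 4-way, 32-byte lines",
    0x2C: "L1 data cache: 32 KiB, 8-way, 64-byte lines",
    0xFF: "no descriptor data; query leaf 4 instead",
}


def leaf2_descriptors(eax, ebx, ecx, edx):
    """Return the non-null descriptor bytes reported by leaf 2."""
    descriptors = []
    for index, reg in enumerate((eax, ebx, ecx, edx)):
        if reg & 0x80000000:
            continue  # bit 31 set: this register holds no valid descriptors
        for shift in (0, 8, 16, 24):
            if index == 0 and shift == 0:
                continue  # the low byte of EAX (AL) is not a descriptor
            byte = (reg >> shift) & 0xFF
            if byte != 0x00:  # 00H is a null descriptor
                descriptors.append(byte)
    return descriptors


def describe(byte):
    """Look up a descriptor, falling back to a generic label."""
    return KNOWN_DESCRIPTORS.get(byte, "descriptor not in this excerpt")
```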
This function is primarily relevant to Intel processors and has been deprecated in favor of more structured enumerations like EAX=4 (deterministic cache parameters) on processors supporting Intel 64 architecture, as the original descriptors can be ambiguous or processor-specific.[21] On AMD processors (Family 0Fh and later), EAX=2 is reserved, returning undefined or zeroed values in all registers, with cache and TLB details instead provided via extended functions such as EAX=80000005H (L1 details) and EAX=80000006H (L2 details).[24] Software targeting cross-vendor compatibility must therefore detect the vendor via EAX=0 and fall back to the appropriate leaves.[24]
EAX=3: Processor Serial Number
When the CPUID instruction is executed with EAX set to 3, it returns portions of a 96-bit processor serial number on supported hardware. Specifically, ECX holds bits 31–0 and EDX holds bits 63–32 of the serial number, while EAX and EBX are reserved; the upper 32 bits (95–64) are the processor signature returned in EAX by the EAX=1 leaf.[5][27] Support for this feature is indicated by bit 18 (PSN) in the EDX register from the EAX=1 leaf.[5]
Introduced with the Pentium III processor in 1999, the serial number was intended for applications such as anti-piracy protection, digital content authentication, and enterprise asset management.[11] The unique identifier was programmed into the processor at manufacturing and could be read by software, extending basic processor identification.[28]
The feature quickly drew criticism for privacy risks, as it allowed websites and applications to retrieve a persistent hardware identifier, potentially enabling user tracking across sessions without consent. Organizations like the American Civil Liberties Union highlighted these dangers, stating that the risks outweighed potential security benefits.[29] Privacy advocates, including Junkbusters, called for a boycott, amplifying concerns over surveillance implications.[30]
In response, Intel disabled the serial number by default via BIOS settings starting in late 1999 and fully phased it out by 2001 in subsequent processors like the Pentium 4.[31] If disabled—typically controlled by a model-specific register bit—the instruction returns all zeros in EAX, ECX, and EDX. Modern Intel and AMD processors deprecate this leaf entirely, returning zeros with no sub-leaves available.[5] Due to its obsolescence and privacy issues, software developers are advised to avoid querying EAX=3.
EAX=4: Deterministic Cache Parameters
The CPUID instruction with EAX set to 4 provides a standardized, deterministic method to enumerate the parameters of a processor's cache hierarchy, allowing software to query detailed attributes for each cache level through sub-leaves indexed by ECX. This leaf was introduced to offer more precise and consistent information compared to earlier mechanisms, enabling better cache-aware optimizations in applications such as compilers and operating systems. When invoked, the outputs in EAX, EBX, ECX, and EDX describe the cache type, size components, sharing characteristics, and behavioral flags for the specified sub-leaf.[32]
Enumeration starts at sub-leaf 0 (ECX=0), which describes the first cache, and continues with ECX=1, 2, and so on until a null entry is encountered; the leaf does not report the number of caches directly. The cache type is encoded in EAX bits 4:0, where 0 denotes a null (invalid) entry, 1 indicates a data cache, 2 an instruction cache, and 3 a unified cache that handles both data and instructions. EAX bits 7:5 specify the cache level (starting from 1 for L1), while bit 8 flags a self-initializing cache (requiring no software initialization), and bits 25:14 report the maximum number of addressable IDs for logical processors (threads) sharing the cache (adding 1 to the value). Additionally, EAX bits 31:26 report the maximum number of addressable IDs for processor cores in the physical package (adding 1).[32]
The EBX register provides key sizing parameters: bits 11:0 give the system coherency line size in bytes (adding 1), bits 21:12 the physical line partitions (adding 1), and bits 31:22 the ways of associativity (adding 1). ECX bits 31:0 represent the number of sets in the cache (adding 1 to compute the full count). These values allow calculation of the total cache size in bytes as (ways of associativity + 1) × (physical partitions + 1) × (coherency line size + 1) × (number of sets + 1), where each term is the raw field value plus one, offering a complete picture of capacity without ambiguity. EDX bits 2:0 further detail cache behavior: bit 0 describes how WBINVD/INVD affects lower-level caches, bit 1 denotes cache inclusivity (where the cache includes the contents of lower cache levels), and bit 2 flags complex indexing (beyond simple direct mapping). The remaining bits in EDX are reserved.[32]
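The size formula can be made concrete with a short sketch. The function name `decode_leaf4` and the returned dictionary keys are hypothetical; the inputs are assumed to be the EAX, EBX, and ECX values already returned for one sub-leaf.

```python
CACHE_TYPES = {1: "data", 2: "instruction", 3: "unified"}


def decode_leaf4(eax, ebx, ecx):
    """Decode one leaf 4 sub-leaf; return None for a null (type 0) entry."""
    cache_type = eax & 0x1F
    if cache_type == 0:
        return None  # null entry: enumeration stops here

    # Every sizing field is stored as (value - 1), hence the +1 on each.
    line_size = (ebx & 0xFFF) + 1
    partitions = ((ebx >> 12) & 0x3FF) + 1
    ways = ((ebx >> 22) & 0x3FF) + 1
    sets = ecx + 1

    return {
        "type": CACHE_TYPES.get(cache_type, "reserved"),
        "level": (eax >> 5) & 0x7,
        "size_bytes": ways * partitions * line_size * sets,
    }


# Hypothetical raw values: 8 ways, 1 partition, 64-byte lines, 64 sets.
l1d = decode_leaf4(0x00000021, 0x01C0003F, 0x0000003F)
```

With these inputs the product is 8 × 1 × 64 × 64 = 32,768 bytes, i.e. a 32 KiB level-1 data cache.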
This leaf is fully supported on Intel processors starting from the Core microarchitecture, where, for example, the first sub-leaf typically describes an 8-way associative, single-partition L1 data cache of 32 KB (with 64-byte lines and 64 sets), while higher sub-leaves cover L2 unified caches (e.g., 256 KB per core) and shared L3 caches (e.g., 8 MB with 16 ways). On AMD processors, including Zen-based designs, equivalent information in the same register format is generally provided through the extended leaf EAX=8000001DH rather than leaf 4, so cross-vendor software should select the leaf by vendor. Enumeration via this leaf facilitates optimizations like data alignment to cache lines or prefetching strategies, and it supersedes the packed descriptors from the legacy EAX=2 leaf by providing indexed, hierarchical details. It describes caches only; translation lookaside buffer (TLB) parameters are reported through other leaves.[32][33]
EAX=B: Thread and Core Topology
The CPUID leaf B (hexadecimal 0BH, decimal 11) provides information on processor topology, enabling software to enumerate relationships between logical processors, physical cores, and higher-level domains such as processor packages. This leaf was introduced with the Intel Nehalem microarchitecture in 2008 and is supported on subsequent Intel 64 processors.[34] To invoke it, software sets EAX to 0BH and uses ECX to specify a sub-leaf index corresponding to a topology level, starting from 0 and incrementing until an invalid level is reached. The outputs in EAX, EBX, ECX, and EDX describe the properties of the current level, including the number of logical processors within it and identifiers for mapping the hierarchy.[21]
The register outputs for a valid sub-leaf are detailed as follows:
| Register | Bits | Field Name | Description |
|---|---|---|---|
| EAX | 4:0 | Level Shift Width | Number of bits to shift right the x2APIC ID to obtain the ID of the next higher-level domain (e.g., from thread to core). Reserved on invalid levels. |
| EAX | 31:5 | Reserved | Always 0. |
| EBX | 15:0 | Logical Processors | Number of logical processors at this topology level, counted across all lower levels it contains (e.g., 2 at the SMT level of a two-thread core). The value is reported directly, not minus 1, and reflects the configuration as shipped. |
| EBX | 31:16 | Reserved | Always 0. |
| ECX | 7:0 | Sub-leaf Index | The input sub-leaf value (matches ECX input). |
| ECX | 15:8 | Level Type | Encodes the topology domain: 1 for SMT (simultaneous multithreading, or logical processor level), 2 for core (physical core level). Values 0 and 3-255 indicate invalid or reserved levels. |
| ECX | 31:16 | Reserved | Always 0. |
| EDX | 31:0 | x2APIC ID | The 32-bit x2APIC ID of the current logical processor, used for identifying and mapping processors in multi-socket systems. |
These fields allow derivation of bit masks for extracting sub-IDs (e.g., SMT ID from low bits of x2APIC, core ID from subsequent bits) to build the full topology hierarchy.[35] All returned values are 32 bits wide; in 64-bit mode they occupy the low halves of RAX, RBX, RCX, and RDX, with the upper halves cleared.[21]
Enumeration begins by executing CPUID with EAX=0BH and ECX=0 to query the lowest level (typically SMT), then increments ECX for subsequent levels (e.g., core at ECX=1) until EBX[15:0] returns 0, indicating an invalid sub-leaf. Each valid level provides the shift width in EAX[4:0] to distinguish entities at that scope from lower ones, assuming symmetric topology across processors. For example, on a four-core processor with hyper-threading, the SMT level (type 1) might report EBX[15:0]=2 (2 threads per core), and the core level (type 2) might report EBX[15:0]=8 (8 logical processors per package, i.e., 4 cores × 2 threads). Because the counts are cumulative rather than per-level factors, the package total is read directly from the highest valid level instead of being computed as a product. This process supports up to three primary domains (SMT, core, package) but may include reserved types for future extensions like modules.[34]
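The enumeration loop and the use of the shift widths can be sketched as follows. The sketch assumes the sub-leaf outputs have already been captured in order as (EAX, EBX, ECX, EDX) tuples; the function names `collect_shift_widths` and `split_x2apic_id` are hypothetical.

```python
def collect_shift_widths(subleaves):
    """Map level type (1 = SMT, 2 = core) to its x2APIC shift width.

    `subleaves` is a sequence of (eax, ebx, ecx, edx) tuples captured for
    ECX = 0, 1, 2, ... in that order.
    """
    shifts = {}
    for eax, ebx, ecx, _edx in subleaves:
        if (ebx & 0xFFFF) == 0:
            break  # invalid sub-leaf: enumeration is complete
        level_type = (ecx >> 8) & 0xFF
        shifts[level_type] = eax & 0x1F
    return shifts


def split_x2apic_id(x2apic_id, smt_shift, core_shift):
    """Split an x2APIC ID into (package_id, core_id, smt_id)."""
    smt_id = x2apic_id & ((1 << smt_shift) - 1)
    core_bits = core_shift - smt_shift
    core_id = (x2apic_id >> smt_shift) & ((1 << core_bits) - 1)
    package_id = x2apic_id >> core_shift
    return package_id, core_id, smt_id
```

The shift widths, rather than the logical processor counts, are what software should use to derive IDs, because APIC ID space may be sparsely populated.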
This topology data is essential for NUMA-aware scheduling in operating systems and applications, where x2APIC IDs help map threads to nearby cores or packages to minimize latency in multi-socket systems. It complements cache sharing information from leaf 4 by providing hierarchical processor relationships rather than just cache parameters. Software must first verify support (e.g., by confirming that the maximum basic leaf is at least 0BH and checking ECX bit 21 of leaf 1 for x2APIC) before using leaf B.[36] Limitations include lack of support on pre-Nehalem processors (those before 2008, which return invalid results), and the enumeration assumes uniform topology, requiring additional handling for asymmetric configurations. Leaf B has been largely superseded by the more flexible leaf 1FH for extended topologies in modern processors.[35]
EAX=15 and EAX=16: Clock Frequencies
CPUID leaves 15H and 16H provide information on processor clock frequencies, enabling software to determine nominal rates for the Time Stamp Counter (TSC), core crystal, base processor, maximum turbo, and bus reference clocks.[37] These leaves were introduced in Intel's Skylake microarchitecture in 2015 and are supported on subsequent Intel Core and Xeon processors.[5] They return data in the general-purpose registers following execution of the CPUID instruction with EAX set to 15H or 16H and ECX=0 for the primary subleaf.[37]
When EAX=15H and ECX=0, the leaf returns parameters for calculating the TSC frequency relative to the nominal core crystal clock.[37] The registers are populated as follows:
| Register | Bits | Value |
|---|---|---|
| EAX | 31:0 | Denominator of the TSC/core crystal clock ratio |
| EBX | 31:0 | Numerator of the TSC/core crystal clock ratio (0 if the ratio is not enumerated) |
| ECX | 31:0 | Nominal core crystal clock frequency in Hz (non-zero if enumerated; typically 24 MHz or 25 MHz; 0 if not available) |
| EDX | 31:0 | Reserved (returns 0) |
The TSC frequency can be computed as the core crystal clock frequency multiplied by the ratio of numerator to denominator: TSC frequency (Hz) = ECX × (EBX / EAX).[37] If EBX=0, the ratio is not enumerated by the processor, and if ECX=0, the crystal frequency is not enumerated; in either case the formula cannot be applied.[5] Availability of this leaf is determined by checking that the maximum basic leaf returned by EAX=0 is at least 15H.[37]
These values are useful for calibrating the TSC in performance analysis tools and counters, as the TSC increments at a constant rate tied to the core crystal but independent of dynamic frequency scaling.[5] However, the returned frequencies represent nominal specifications and do not reflect actual operating frequencies, which may vary due to turbo boost or power management states.[37]
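The calculation above can be sketched as a small helper. This is an illustrative sketch, not a reference implementation: the function name `tsc_frequency_hz` is hypothetical, the register values are assumed to have been read elsewhere (for example from a raw CPUID dump), and the sample numbers are made up.

```python
from typing import Optional


def tsc_frequency_hz(eax: int, ebx: int, ecx: int) -> Optional[int]:
    """Nominal TSC frequency in Hz from leaf 15H outputs.

    eax: ratio denominator, ebx: ratio numerator,
    ecx: nominal core crystal clock frequency in Hz.
    Returns None when the ratio or the crystal frequency is not enumerated.
    """
    if eax == 0 or ebx == 0 or ecx == 0:
        return None
    # Multiply before dividing so integer arithmetic keeps full precision.
    return ecx * ebx // eax


# Made-up example: 24 MHz crystal with a 188/2 ratio.
print(tsc_frequency_hz(eax=2, ebx=188, ecx=24_000_000))  # 2256000000
```

A caller that receives `None` would typically fall back to another calibration source, such as measuring the TSC against a known timer.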
Leaf 16H: Base, Maximum, and Bus Frequencies
When EAX=16H and ECX=0, the leaf returns nominal processor frequency information in MHz units for diagnostic and display purposes.[37] The registers are populated as follows:
| Register | Bits | Value |
|---|
| EAX | 15:0 | Processor base frequency in MHz (0 if not supported) |
| EAX | 31:16 | Reserved (returns 0) |
| EBX | 15:0 | Maximum frequency in MHz (0 if not supported) |
| EBX | 31:16 | Reserved (returns 0) |
| ECX | 15:0 | Bus (reference) frequency in MHz (0 if not supported) |
| ECX | 31:16 | Reserved (returns 0) |
| EDX | 31:0 | Reserved (returns 0) |
No additional calculations are required, as frequencies are directly encoded in the lower 16 bits of EAX, EBX, and ECX.[5] Leaf 16H is available when the maximum basic leaf reported in EAX by CPUID leaf 0 is at least 16H.[37] Like leaf 15H, the data pertains to nominal values from the processor specification sheet and should not be used for real-time performance monitoring, where actual frequencies can exceed the base due to turbo modes.[37] For dynamic event-based monitoring, software may cross-reference these with architectural performance counters from leaf 0AH.[5]
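As a minimal sketch of that decoding, assuming the three register values are supplied by the caller, the fields can be extracted with a 16-bit mask. The names `NominalFrequencies` and `decode_leaf_16h` are hypothetical, and the sample values are invented.

```python
from typing import NamedTuple


class NominalFrequencies(NamedTuple):
    base_mhz: int  # EAX[15:0]
    max_mhz: int   # EBX[15:0]
    bus_mhz: int   # ECX[15:0]


def decode_leaf_16h(eax: int, ebx: int, ecx: int) -> NominalFrequencies:
    """Extract the MHz fields from leaf 16H; a value of 0 means 'not reported'."""
    mask = 0xFFFF  # upper 16 bits of each register are reserved
    return NominalFrequencies(eax & mask, ebx & mask, ecx & mask)


# Invented example: 3000 MHz base, 4200 MHz maximum, 100 MHz bus clock.
print(decode_leaf_16h(0x0BB8, 0x1068, 0x0064))
```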
EAX=23: Architectural Performance Monitoring Extended Leaf
The CPUID leaf EAX=23H provides detailed enumeration of architectural performance monitoring (APM) capabilities on supported Intel processors, allowing software to query the configuration and features of performance counters for workload analysis and debugging. This leaf extends the foundational performance monitoring support of leaf EAX=0AH with subleaf-specific details on general-purpose (GP) counters, fixed-function counters, supported events, and advanced sampling mechanisms such as Precise Event-Based Sampling (PEBS). It is valid only when the extended architectural performance monitoring feature is enumerated, indicated by CPUID.(EAX=07H, ECX=01H):EAX[8]=1 (ArchPerfmonExt).[27]
To invoke this leaf, software sets EAX to 23H and ECX to the desired subleaf index, with results returned in EAX, EBX, ECX, and EDX. Subleaf 0 is the main leaf: EAX is a bitmap of valid subleaves (bit n set means subleaf n is implemented), EBX carries feature flags such as UnitMask2 support (bit 0) and EQ-bit support (bit 1), ECX[7:0] gives the number of slots per cycle used for top-down slot accounting, and EDX is reserved. The version ID and counter bit widths remain in leaf 0AH. Higher subleaves (ECX=1 through 5) carry the detail: subleaf 1 provides bitmaps of supported GP counters (EAX) and fixed counters (EBX), subleaf 2 describes Auto Counter Reload, subleaf 3 reports a bitmap of supported architectural events, such as core cycles (bit 0) and instructions retired (bit 1), so that software can identify available metrics without vendor-specific assumptions, and subleaves 4-5 detail PEBS configurations, including support for Architectural PEBS (APEBS) via bitmaps of compatible counters and features like counter groups (e.g., METRICS, FIXED, GPR) and PEBS with Precise Distribution (PDIST).[27]
This leaf facilitates detection of both programmable GP counters for custom events and fixed counters for invariant metrics like total cycles or reference cycles, which lets tools such as the Linux perf subsystem configure monitoring without hardcoded assumptions. By counting the set bits in the subleaf 1 bitmaps (EAX for GP counters, EBX for fixed counters) and reading counter bit widths from leaf 0AH, software can allocate counters efficiently and handle overflow in long-running traces. PEBS enables low-overhead, precise sampling of events; its architectural variant (APEBS) is described by subleaves 4 and 5 and uses dedicated MSRs (e.g., IA32_PEBS_BASE) for record buffering.[27]
Introduced as an extension for modern architectures, EAX=23H saw significant enhancements in 2024 with Intel's Granite Rapids processors (Family 06H, Models ADH/AEH), adding subleaves 4 and 5 for APEBS bitmaps, expanded event support (including new branch and cache metrics), and counter group configurations via MSRs like IA32_PMC_GPn_CFG for including subgroups (e.g., LBR, AUX) in PEBS records. AMD's Zen 5 architecture (e.g., Ryzen 9000 series, released 2024) similarly bolstered performance monitoring with additional events tailored to its wider dispatch (up to 8 operations per cycle) and improved branch prediction, integrated into the Linux perf framework via kernel patches for Zen 5-specific events, though AMD does not implement leaf 23H and instead uses extended leaves like 80000022H for analogous discovery. These updates bridge gaps in prior generations, providing more events for hybrid core analysis and energy efficiency tracking without relying on clock frequencies from leaves like EAX=15H/16H.[27][40]
| Subleaf (ECX) | Key Focus | Representative Outputs |
|---|
| 0 | General APM info | EAX: Valid-subleaf bitmap; EBX: Feature flags (UnitMask2, EQ bit); ECX[7:0]: Slots per cycle |
| 1 | Counter support | EAX: GP bitmap; EBX: Fixed bitmap |
| 2 | Auto Counter Reload (ACR) | EAX/EBX: GP/fixed ACR bitmaps; ECX/EDX: Reload-trigger bitmaps |
| 3 | Event enumeration | EAX: Supported events bitmap (e.g., cycles, retired instructions) |
| 4 | PEBS capabilities | EBX: Groups (e.g., bit 3: ALLOW_IN_RECORD; bits 7:4: subgroups) |
| 5 | APEBS/PEBS details (Granite Rapids+) | EAX/ECX: APEBS GP/fixed bitmaps; EBX/EDX: PDIST support |
This structure ensures comprehensive, processor-agnostic discovery while prioritizing scalability for tools analyzing multi-core workloads.[27]
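Because subleaf 1 reports counters as bitmaps rather than counts, a consumer has to expand the set bits into counter indices. The following sketch shows one way to do that; the helper names are hypothetical, the EAX and EBX values are assumed to come from CPUID.(EAX=23H, ECX=1) as read elsewhere, and the sample bitmaps are invented.

```python
from typing import Dict, List


def set_bit_positions(bitmap: int) -> List[int]:
    """Return the indices of the bits that are set in a 32-bit value."""
    return [bit for bit in range(32) if (bitmap >> bit) & 1]


def decode_counter_subleaf(eax: int, ebx: int) -> Dict[str, List[int]]:
    """Expand the subleaf 1 bitmaps: EAX lists GP counters, EBX fixed counters."""
    return {
        "gp_counters": set_bit_positions(eax),
        "fixed_counters": set_bit_positions(ebx),
    }


# Invented example: GP counters 0-7 present, fixed counters 0, 1 and 3 present.
info = decode_counter_subleaf(0xFF, 0b1011)
print(len(info["gp_counters"]), info["fixed_counters"])  # 8 [0, 1, 3]
```

Working with indices instead of a count matters because a bitmap may be sparse, so counter numbers are not guaranteed to be contiguous.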
Power, Thermal, and Specialized Features
EAX=5: MONITOR and MWAIT Capabilities
CPUID leaf 5, accessed by loading EAX with the value 5 prior to executing the CPUID instruction, provides parameters related to the MONITOR and MWAIT instructions, which enable efficient thread synchronization and power management by allowing a processor core to wait for specific memory events while entering low-power states.[5]
The MONITOR instruction arms the processor to detect writes to a specified address range, defined in terms of monitor-line sizes, while MWAIT causes the core to enter an implementation-dependent optimized state—typically a light sleep corresponding to C-states—until a monitored event occurs or another break condition is met.[41][42] These instructions are particularly useful for hinting the processor about idle periods, allowing it to reduce power consumption more granularly than the HLT instruction. The monitor-line sizes returned indicate the granularity for the address range, which must align with cache line boundaries to ensure correct operation and avoid false wakeups or missed events.[5]
Upon execution, the registers return the following information:
| Register | Bits | Description |
|---|
| EAX | 15:0 | Smallest monitor-line size in bytes (typically 64, matching the processor's cache line granularity). |
| EAX | 31:16 | Reserved (must be 0). |
| EBX | 15:0 | Largest monitor-line size in bytes (a power-of-2 multiple of the value in EAX). |
| EBX | 31:16 | Reserved (must be 0). |
| ECX | 0 | MONITOR/MWAIT extensions supported (1 indicates support for advanced features). |
| ECX | 1 | Interrupts as break events for MWAIT supported even when interrupts are disabled (IF=0 in EFLAGS). |
| ECX | 31:2 | Reserved. |
| EDX | 3:0 | Number of C0 sub C-states supported using MWAIT. |
| EDX | 7:4 | Number of C1 sub C-states supported using MWAIT. |
| EDX | 11:8 | Number of C2 sub C-states supported using MWAIT. |
| EDX | 15:12 | Number of C3 sub C-states supported using MWAIT. |
| EDX | 19:16 | Number of C4 sub C-states supported using MWAIT. |
| EDX | 23:20 | Number of C5 sub C-states supported using MWAIT. |
| EDX | 27:24 | Number of C6 sub C-states supported using MWAIT. |
| EDX | 31:28 | Number of C7 sub C-states supported using MWAIT. |
ECX bit 0 indicates that the extended fields of this leaf (beyond EAX and EBX) are enumerated, which enables MWAIT capabilities such as sub-C-state hints, while bit 1 allows interrupts to wake the processor from MWAIT even when they are masked (EFLAGS.IF=0), enhancing responsiveness in interrupt-driven environments.[5] The EDX register details the supported sub-C-states (implementation-specific hints for deeper idle levels), with values indicating how many variants of each C-state (from C0, the active state, to C7, a deeper sleep) can be requested via MWAIT encodings.[42] No sub-leaves are defined for this function, so the ECX input value is ignored.[5]
Support for leaf 5 is indicated by the MONITOR feature flag in CPUID leaf 1 (ECX bit 3 = 1) and has been available since Intel processors implementing Streaming SIMD Extensions 3 (SSE3), starting with the Prescott-core Pentium 4 family in 2004.[5] In practice, operating systems use these parameters in idle loops to select appropriate monitor-line sizes and C-state hints, ensuring the waiting core aligns with cache coherency domains for optimal power savings without excessive latency.[41] For example, Linux kernels query these values to configure per-core idle polling, balancing energy efficiency with quick resumption on events like timer interrupts. AMD processors that support MONITOR/MWAIT implement this leaf as well, reporting at least the monitor-line sizes and the ECX extension flags.
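The register layout in the table above maps directly onto shifts and masks. The sketch below decodes it under the assumption that the four register values were captured elsewhere; `decode_leaf_5` is a hypothetical name and the sample values are invented.

```python
from typing import Dict, List, Union


def decode_leaf_5(eax: int, ebx: int, ecx: int, edx: int) -> Dict[str, Union[int, bool, List[int]]]:
    """Decode MONITOR/MWAIT parameters from CPUID leaf 5 outputs."""
    return {
        "min_line_bytes": eax & 0xFFFF,          # EAX[15:0]
        "max_line_bytes": ebx & 0xFFFF,          # EBX[15:0]
        "extensions": bool(ecx & 0x1),           # ECX bit 0
        "interrupt_break_event": bool(ecx & 0x2),  # ECX bit 1
        # EDX holds eight 4-bit fields: entry n is the sub C-state count for Cn.
        "sub_cstates": [(edx >> (4 * n)) & 0xF for n in range(8)],
    }


# Invented example: 64-byte monitor lines, both ECX flags set,
# two C1 sub-states and one sub-state each for C2 and C3.
info = decode_leaf_5(0x40, 0x40, 0x3, 0x00001120)
print(info["sub_cstates"])  # [0, 2, 1, 1, 0, 0, 0, 0]
```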
EAX=6: Thermal and Power Management
The CPUID instruction with EAX set to 6 enumerates key thermal and power management features available on Intel processors, enabling software to detect capabilities for temperature monitoring, performance scaling, and power efficiency optimizations. This leaf returns values in the EAX, EBX, ECX, and EDX registers that describe support for on-die sensors, timer behaviors, and hardware feedback mechanisms essential for managing thermal throttling and power limits. Introduced with the Core microarchitecture in 2006, it has no sub-leaves and can be executed in any operating mode.[43]
The EAX register primarily flags core thermal and power features. Bit 0 (DTS) indicates support for the Digital Thermal Sensor, an on-die mechanism that provides temperature readings relative to a throttling threshold via the IA32_THERM_STATUS MSR, allowing software to detect and respond to thermal conditions before throttling occurs. Bit 1 signals availability of Intel Turbo Boost Technology, which raises frequency opportunistically within power budgets. Bit 2 (ARAT) confirms the Always Running APIC Timer, meaning the local APIC timer keeps counting at a constant rate regardless of P-state transitions and deep C-states, which aids precise timing in virtualized or multi-threaded environments. Bit 7 (HW P-state) denotes support for Hardware P-states (HWP), where the processor autonomously adjusts performance levels based on workload hints from the OS, reducing software overhead in power management. Bit 8 indicates HWP Notification support (IA32_HWP_INTERRUPT MSR), facilitating interrupt-based feedback on performance state changes.[5][43]
| EAX Bit | Feature | Description |
|---|
| 0 | DTS | Digital Thermal Sensor support for on-die temperature monitoring. |
| 1 | Performance States | Support for dynamic frequency scaling like Turbo Boost. |
| 2 | ARAT | Always Running APIC Timer for constant-rate operation. |
| 7 | HW P-state | Hardware P-states (HWP) for autonomous performance control. |
| 8 | HWP Notification | Support for IA32_HWP_INTERRUPT MSR for HWP feedback. |
The EBX register reports the number of interrupt thresholds supported by the DTS in bits 3:0, allowing configuration of multiple temperature alert levels for proactive thermal management; higher bits are reserved. ECX details hardware feedback capabilities, with bit 0 indicating the hardware coordination feedback capability (the IA32_MPERF and IA32_APERF MSRs used to measure delivered frequency), bit 3 indicating support for the performance-energy bias preference (IA32_ENERGY_PERF_BIAS MSR), and bits 15:8 specifying the number of Intel Thread Director classes in hybrid architectures. EDX describes the Hardware Feedback Interface (HFI): bits 7:0 form a bitmap of the capabilities reported for each logical processor (bit 0 for performance, bit 1 for energy efficiency), bits 11:8 give the size of the feedback structure in 4 KB pages minus one, and bits 31:16 give the logical processor's row index in that structure, which an OS scheduler can use to place threads on the most suitable cores.[5][43]
These features are particularly useful for detecting thermal throttling risks, as DTS readings can trigger OS interventions like workload migration or fan speed adjustments before hardware-enforced limits activate. The ARAT feature, for instance, ensures that local APIC timer interrupts still fire while a core sits in a deep C-state, complementing wait mechanisms like MWAIT from CPUID leaf 5. In modern implementations, such as successors to Meteor Lake (e.g., Lunar Lake and Arrow Lake architectures released around 2024), leaf 6 enhancements extend HFI reporting to new power domains, including low-power islands and integrated accelerators, improving efficiency in heterogeneous computing scenarios.[44] AMD processors implement only a small part of this leaf (such as the ARAT flag) and report most power management features through extended leaves (e.g., EAX=80000007h for advanced power management and invariant TSC).[2]
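A decoder for this leaf combines flag tests on EAX with field extraction from EBX, ECX, and EDX. The sketch below is illustrative only: `decode_leaf_6` and the flag labels are hypothetical names, the register values are assumed to be supplied by the caller, and the HFI size follows the "pages minus one" encoding, which is why one is added back.

```python
from typing import Dict, Set, Union

# EAX bit -> label, limited to the flags listed in the EAX table above.
LEAF6_EAX_FLAGS: Dict[int, str] = {
    0: "DTS",
    1: "TURBO_BOOST",
    2: "ARAT",
    7: "HWP",
    8: "HWP_NOTIFICATION",
}


def decode_leaf_6(eax: int, ebx: int, ecx: int, edx: int) -> Dict[str, Union[int, Set[str]]]:
    """Decode selected thermal and power fields from CPUID leaf 6 outputs."""
    return {
        "features": {name for bit, name in LEAF6_EAX_FLAGS.items() if (eax >> bit) & 1},
        "dts_thresholds": ebx & 0xF,                   # EBX[3:0]
        "thread_director_classes": (ecx >> 8) & 0xFF,  # ECX[15:8]
        # The EDX fields are only meaningful when the processor supports HFI.
        "hfi_capabilities": edx & 0xFF,                # EDX[7:0]
        "hfi_pages": ((edx >> 8) & 0xF) + 1,           # EDX[11:8] stores pages - 1
        "hfi_row_index": (edx >> 16) & 0xFFFF,         # EDX[31:16]
    }


# Invented example values.
info = decode_leaf_6(eax=0x187, ebx=0x2, ecx=0x400, edx=(5 << 16) | 0x3)
print(sorted(info["features"]), info["hfi_pages"], info["hfi_row_index"])
```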
EAX=7: Structured Extended Feature Enumeration
The CPUID instruction with EAX set to 7 enables structured extended feature enumeration, a mechanism that first reported features on Intel's Ivy Bridge microarchitecture and was greatly expanded with Haswell, to systematically report support for advanced instruction sets, security features, and hardware capabilities that extend beyond the unstructured flags in leaf 1. Unlike earlier leaves, this leaf organizes information across multiple sub-leaves indexed by the ECX input register, where ECX=0 provides the primary set of features, and higher values (up to the maximum reported in EAX for ECX=0) offer additional details for evolving architectures. Software typically queries sub-leaf 0 first to determine the highest valid sub-leaf, then probes subsequent ones to enable conditional use of instructions like AVX2 or AVX-512 variants, in conjunction with a check of OS support for state management via XGETBV. This structured approach facilitates forward compatibility, allowing detection of features like vector extensions and power management without relying on legacy bitmasks.[45][21]
For sub-leaf 0 (ECX=0), the EBX register enumerates foundational extended features, focusing on base address management, vector processing, and security primitives. Key bits include bit 0 (FSGSBASE) for instructions that read and write the FS and GS segment bases at any privilege level, bit 5 (AVX2) for 256-bit integer SIMD operations, and bit 16 (AVX512F) signaling the foundation for 512-bit vector computations when combined with appropriate XCR0 state. The ECX register covers advanced vector manipulations and protection mechanisms, such as bit 1 (AVX512_VBMI) for byte-level permutation instructions, bit 3 (PKU) for user-mode memory protection keys, whose OS enablement is reflected in bit 4 (OSPKE), and bit 16 (LA57) for 57-bit linear addresses in 64-bit mode. The EDX register reports further capabilities, including bit 8 (AVX512_VP2INTERSECT) for vector intersection operations, bit 10 (MD_CLEAR) for buffer clearing with VERW, and bit 29 (ARCH_CAPABILITIES) indicating the presence of an MSR that describes vulnerability mitigations. AVX512_BF16, which adds bfloat16 support for AI workloads, is reported in sub-leaf 1 (EAX bit 5) rather than in sub-leaf 0.[45][21]
| Register | Bit | Feature | Description |
|---|
| EBX | 0 | FSGSBASE | Supports RDFSBASE, RDGSBASE, WRFSBASE, WRGSBASE instructions for efficient segment base access.[45] |
| EBX | 5 | AVX2 | Supports Advanced Vector Extensions 2 for enhanced 256-bit SIMD operations.[45] |
| EBX | 7 | SMEP | Supports Supervisor Mode Execution Prevention to restrict kernel execution of user pages.[45] |
| EBX | 16 | AVX512F | Supports AVX-512 Foundation for 512-bit vector processing (requires XCR0 enablement).[45] |
| EBX | 19 | ADX | Supports multi-precision arithmetic with ADCX and ADOX instructions.[45] |
| EBX | 30 | AVX512BW | Supports AVX-512 Byte and Word operations for finer-grained vector handling.[45] |
| ECX | 0 | PREFETCHWT1 | Supports PREFETCHWT1, a prefetch with intent to write and a T1 locality hint.[45] |
| ECX | 1 | AVX512VBMI | Supports vector byte manipulation for permutation and comparison.[45] |
| ECX | 3 | PKU | Supports Protection Keys for User-mode pages to control access permissions.[45] |
| ECX | 5 | WAITPKG | Supports TPAUSE, UMONITOR, UMWAIT for lightweight waiting mechanisms.[45] |
| ECX | 8 | GFNI | Supports Galois Field arithmetic instructions used in cryptography and error-correcting codes.[45] |
| ECX | 9 | VAES | Supports vectorized AES instructions for accelerated cryptography.[45] |
| ECX | 16 | LA57 | Supports 57-bit linear addresses for expanded virtual memory.[45] |
| ECX | 24 | BUS_LOCK_DETECT | Supports debug exceptions that report bus locks.[45] |
| ECX | 27 | MOVDIRI | Supports direct store of a doubleword or quadword from a register to memory, bypassing caches.[45] |
| ECX | 28 | MOVDIR64B | Supports 64-byte direct stores for bulk data transfer.[45] |
| EDX | 8 | AVX512VP2INTERSECT | Supports vector intersect instructions for set operations in databases.[45] |
| EDX | 10 | MD_CLEAR | Supports VERW instruction for microarchitectural data clearing.[45] |
| EDX | 29 | ARCH_CAPABILITIES | Indicates support for IA32_ARCH_CAPABILITIES MSR detailing side-channel mitigations.[45] |
| EAX (sub-leaf 1) | 5 | AVX512BF16 | Supports bfloat16 instructions for low-precision AI computations.[45] |
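A feature table like the one above translates naturally into a lookup keyed by register and bit position. The following sketch covers only a handful of sub-leaf 0 flags; `LEAF7_SUBLEAF0_BITS` and `leaf7_features` are hypothetical names, and the register values are assumed to be supplied by the caller.

```python
from typing import Dict, Set, Tuple

# (register, bit) -> feature name for a subset of sub-leaf 0 flags.
LEAF7_SUBLEAF0_BITS: Dict[Tuple[str, int], str] = {
    ("ebx", 0): "FSGSBASE",
    ("ebx", 5): "AVX2",
    ("ebx", 7): "SMEP",
    ("ebx", 16): "AVX512F",
    ("ebx", 19): "ADX",
    ("ecx", 1): "AVX512VBMI",
    ("ecx", 3): "PKU",
    ("ecx", 5): "WAITPKG",
    ("ecx", 8): "GFNI",
    ("ecx", 9): "VAES",
    ("edx", 10): "MD_CLEAR",
}


def leaf7_features(ebx: int, ecx: int, edx: int) -> Set[str]:
    """Return the names of the listed features whose bits are set."""
    regs = {"ebx": ebx, "ecx": ecx, "edx": edx}
    return {
        name
        for (reg, bit), name in LEAF7_SUBLEAF0_BITS.items()
        if (regs[reg] >> bit) & 1
    }


# Invented example: AVX2 and AVX512F in EBX, PKU in ECX, MD_CLEAR in EDX.
print(sorted(leaf7_features((1 << 5) | (1 << 16), 1 << 3, 1 << 10)))
```

A set bit only says the processor implements the feature. For AVX2 and AVX-512, software would still need to confirm through XGETBV that the OS has enabled the corresponding state in XCR0.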
Sub-leaf 1 (ECX=1), available on processors that report a maximum sub-leaf of at least 1 in EAX for sub-leaf 0, reports most of its features in EAX, with further flags in EBX and EDX. As of 2025 architectures, EAX bit 3 (RAO-INT) enumerates support for remote atomic operations on integers, enabling instructions like AADD, AAND, AOR, and AXOR that update shared memory without returning a result to the issuing core. Other EAX bits cover features such as AVX-VNNI (bit 4), AVX512_BF16 (bit 5), linear address space separation (LASS, bit 6), and the architectural performance monitoring extended leaf (ArchPerfmonExt, bit 8), while EDX reports newer extensions such as AVX10 (bit 19) and APX (bit 21); ECX is largely reserved.[45][44]
Sub-leaf 2 (ECX=2), supported on select recent processors, extends enumeration to speculative execution controls, which are reported in EDX. Its bits indicate which control fields of the IA32_SPEC_CTRL MSR are implemented, for example PSFD (bit 0), IPRED_CTRL (bit 1), RRSBA_CTRL (bit 2), DDPD_U (bit 3), and BHI_CTRL (bit 4), along with the MCDT_NO indicator (bit 5). The maximum sub-leaf value in EAX of sub-leaf 0 lets software probe only as far as the processor supports, typically 2 in 2024-2025 models, so feature discovery needs no unnecessary queries. AMD processors also implement leaf 7 for the features they share with Intel (such as FSGSBASE, AVX2, and SMEP), while keeping vendor-specific extensions in the 8000_0000H range.[45][44]
| Register | Bit | Feature | Description |
|---|
| EDX | 0 | PSFD | IA32_SPEC_CTRL control for disabling the fast store forwarding predictor.[45] |
| EDX | 1 | IPRED_CTRL | IA32_SPEC_CTRL controls for indirect branch predictor behavior.[45] |
| EDX | 2 | RRSBA_CTRL | IA32_SPEC_CTRL controls for restricted return stack buffer alternate behavior.[45] |
| EDX | 4 | BHI_CTRL | IA32_SPEC_CTRL control for branch history injection mitigation.[45] |
EAX=D: XSAVE and Extended States
CPUID leaf D (EAX=0DH) enumerates the processor's support for the XSAVE feature set, which enables software to save and restore extended processor states beyond the traditional x87 FPU and SSE registers, such as those introduced by AVX and later extensions.[37] This leaf is accessed by setting EAX to 0DH and using ECX to specify a sub-leaf index, with outputs returned in EAX, EBX, ECX, and EDX providing details on supported state components, their sizes, offsets, and operational capabilities.[37] The XSAVE feature set is indicated by bit 26 (XSAVE) in ECX from CPUID leaf 1, and its enablement by the operating system is reflected in bit 27 (OSXSAVE) of the same register, which allows use of XGETBV/XSETBV to manage the XCR0 register.[37]
For sub-leaf 0 (ECX=0), the outputs provide foundational information on the XSAVE area: EAX returns the lower 32 bits of the bitmap of state components that XCR0 can enable (with bit 0 for x87 FPU/MMX state, bit 1 for SSE state, bit 2 for AVX YMM state, and higher bits for extensions like AVX-512 components); EBX indicates the size in bytes of the XSAVE area required for the features currently enabled in XCR0; ECX returns the maximum size in bytes of the XSAVE area for all features the processor supports in XCR0; and EDX returns the upper 32 bits of the supported XCR0 bitmap.[37] Sub-leaf 1 (ECX=1) focuses on operational components: EAX enumerates XSAVE sub-features, namely bit 0 for XSAVEOPT, bit 1 for XSAVEC and the compacted form of XRSTOR, bit 2 for XGETBV with ECX=1, and bit 3 for XSAVES/XRSTORS and IA32_XSS; EBX gives the size of the XSAVE area for the state components currently enabled in XCR0 and IA32_XSS combined; ECX provides the lower 32 bits of the supported IA32_XSS bitmap (supervisor state components); and EDX returns the upper 32 bits of supported IA32_XSS.[37]
Higher sub-leaves (ECX > 1) detail individual state components, where the sub-leaf index corresponds to the bit position in XCR0 or IA32_XSS: EAX returns the size in bytes of the save area for that state component; EBX provides the offset in bytes of the component from the start of the XSAVE area in the standard (non-compacted) format, and is 0 for supervisor components; ECX bit 0 indicates whether the component is managed through XCR0 (0) or IA32_XSS (1), bit 1 indicates that the component is placed on the next 64-byte boundary in the compacted format, and bits 31:2 are reserved; EDX is reserved. An invalid sub-leaf returns 0 in all four registers.[37] Representative state components include bit 0 for x87 FPU/MMX and bit 1 for SSE (XMM registers and MXCSR), both held in the 512-byte legacy region, bit 2 for AVX YMM upper halves (256 bytes), bits 5–7 for the AVX-512 opmask registers (64 bytes), ZMM_Hi256 (512 bytes), and Hi16_ZMM (1024 bytes), bit 9 for PKRU protection keys (8 bytes), bit 17 for AMX TILECFG (64 bytes), and bit 18 for AMX TILEDATA (8192 bytes).[37] These components are enumerated dynamically via CPUID to reflect processor capabilities, with support varying by microarchitecture from the Sandy Bridge-era introduction of AVX through 2025 processors including AMX.[37]
The primary usage of leaf D information is in operating system context switching, where software employs XSAVE (or XSAVEOPT, XSAVEC, or XSAVES) to save selected extended states to memory based on the XCR0 or IA32_XSS bitmaps, and XRSTOR or XRSTORS to restore them, ensuring efficient management of vector and other extension states without saving unused components.[37] This avoids the limitations of the legacy FXSAVE/FXRSTOR instructions, which always save the full 512-byte legacy region and cannot cover newer state.[37] Supervisor-mode states (via IA32_XSS) require XSAVES and XRSTORS; XSAVES can also skip components that are in their initial configuration or unmodified since the last restore.[37]
Compaction support, enumerated in sub-leaf 1 (EAX bit 1 for XSAVEC and bit 3 for XSAVES) together with the alignment flag in the per-component sub-leaves (ECX bit 1), enables the XSAVEC and XSAVES instructions to use a compacted XSAVE area format, where only the requested state components are saved contiguously after the 512-byte legacy region and the 64-byte XSAVE header, potentially reducing the area size from thousands of bytes (e.g., full AVX-512) to hundreds for minimal configurations.[37] The compacted format records the saved components in the XCOMP_BV field of the XSAVE header, whose bit 63 marks the area as compacted, and places a component on a 64-byte boundary only when its alignment flag is set.[37] This feature, available since Skylake processors, improves performance in virtualization and multitasking scenarios by reducing memory footprint and bandwidth.[37]
| Sub-leaf (ECX) | EAX Output | EBX Output | ECX Output | EDX Output |
|---|
| 0 (Basic) | Lower 32 bits of the supported XCR0 bitmap | XSAVE area size for features currently enabled in XCR0 (bytes) | Maximum XSAVE area size for all supported XCR0 features (bytes) | Upper 32 bits of the supported XCR0 bitmap |
| 1 (Operational) | XSAVE sub-features (bit 0: XSAVEOPT; bit 1: XSAVEC; bit 2: XGETBV with ECX=1; bit 3: XSAVES) | XSAVE area size for features enabled in XCR0 and IA32_XSS combined (bytes) | Lower 32 bits of the supported IA32_XSS bitmap | Upper 32 bits of the supported IA32_XSS bitmap |
| n > 1 (Component n) | Size of component n save area (bytes) | Offset of component n in the standard format (0 for supervisor components) | Bit 0: in XCR0 (0) or IA32_XSS (1); bit 1: 64-byte alignment in compacted format; bits 31:2 reserved | Reserved (0) |
[37]
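The compacted layout can be derived from the per-component sub-leaves alone. The sketch below shows the idea under stated assumptions: each tuple holds a component index, its size (EAX of that sub-leaf), and its alignment flag (ECX bit 1), all read elsewhere; the 64-byte XSAVE header is assumed to follow the 512-byte legacy region; and `compacted_layout` is a hypothetical helper name.

```python
from typing import Dict, Iterable, Tuple

LEGACY_REGION_BYTES = 512  # x87/SSE legacy area
XSAVE_HEADER_BYTES = 64    # header that follows the legacy area


def compacted_layout(
    components: Iterable[Tuple[int, int, bool]]
) -> Tuple[Dict[int, int], int]:
    """Compute compacted-format offsets.

    components: (index, size_bytes, align64) for each saved component >= 2.
    Returns (offset per component index, total area size in bytes).
    """
    offset = LEGACY_REGION_BYTES + XSAVE_HEADER_BYTES
    offsets: Dict[int, int] = {}
    # Components are laid out in ascending index order.
    for index, size, align64 in sorted(components):
        if align64:
            offset = (offset + 63) & ~63  # round up to a 64-byte boundary
        offsets[index] = offset
        offset += size
    return offsets, offset


# Invented configuration: opmask (5), PKRU (9) and an aligned 8 KB component (18).
offsets, total = compacted_layout([(18, 8192, True), (5, 64, False), (9, 8, False)])
print(offsets, total)  # {5: 576, 9: 640, 18: 704} 8896
```

In the example, the third component would otherwise start at byte 648, so the alignment flag pushes it to 704, which shows why offsets in the compacted format cannot be taken from EBX of the per-component sub-leaves.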
Security and Advanced Capabilities
EAX=12: Software Guard Extensions (SGX)
The CPUID leaf EAX=12h provides enumeration of Intel Software Guard Extensions (SGX) capabilities, enabling software to detect support for secure enclave execution environments that protect sensitive code and data from higher-privilege software, including the operating system. This leaf is accessed by loading EAX with 12h prior to executing the CPUID instruction, with the ECX register specifying the sub-leaf index. Support for this leaf is indicated by bit 2 (SGX) of EBX returned from CPUID with EAX=07h and ECX=0h being set to 1.[46] SGX facilitates confidential computing by allowing applications to create isolated enclaves within processor-reserved memory, ensuring integrity and confidentiality through hardware-enforced mechanisms.[47]
For the primary sub-leaf (ECX=0), the registers return details on core SGX functionality and limitations. EAX bits indicate supported instruction sets: bit 0 denotes SGX1 support, covering the basic enclave lifecycle through ENCLS leaf functions such as ECREATE, EADD, EEXTEND, EINIT, and EREMOVE and ENCLU leaf functions such as EENTER, EEXIT, EGETKEY, and EREPORT; bit 1 denotes SGX2 support, adding dynamic enclave memory management through ENCLS leaf functions such as EAUG, EMODPR, and EMODT and ENCLU leaf functions such as EACCEPT, EACCEPTCOPY, and EMODPE. Additional bits in EAX enumerate later leaf functions: bit 5 for the ENCLV leaves used in virtualization (EINCVIRTCHILD, EDECVIRTCHILD, ESETCONTEXT), bit 6 for the ENCLS leaves ETRACKC, ERDINFO, ELDBC, and ELDUC, and bit 7 for the ENCLU leaf EVERIFYREPORT2. EBX provides a bit vector of the MISCSELECT extended features the processor supports, such as EXINFO (bit 0), which records details of page faults and general-protection exceptions on enclave exit. ECX is reserved in this sub-leaf. EDX specifies maximum enclave sizes: bits 7:0 encode the exponent for the largest enclave in non-64-bit mode (maximum size = 2^value bytes), while bits 15:8 do the same for 64-bit mode.[46][48]
Sub-leaf ECX=1 enumerates valid attributes for the SECS (SGX Enclave Control Structure) that can be set during enclave creation with ECREATE. The registers report bitmasks of supported ATTRIBUTES bits: EAX covers bits 31:0 (e.g., DEBUG, MODE64BIT, PROVISIONKEY, and EINITTOKEN_KEY), EBX covers bits 63:32, and ECX and EDX cover bits 95:64 and 127:96, which hold the XFRM mask of XSAVE-managed state components an enclave may use. These attributes define enclave behavior: for example, PROVISIONKEY controls whether the enclave may obtain the provisioning key through EGETKEY, and EINITTOKEN_KEY controls access to the launch (EINIT token) key.[46]
Higher sub-leaves (ECX ≥ 2) enumerate sections of the Enclave Page Cache (EPC), a protected memory region reserved for enclave pages. Each valid sub-leaf (type 0001b in EAX bits 3:0) describes one EPC section: EAX bits 31:12 and EBX bits 19:0 form the 52-bit physical base address; ECX bits 31:12 and EDX bits 19:0 form the 52-bit size; ECX bits 3:0 indicate properties like confidentiality and integrity protection (0001b). Software enumerates these sequentially until an invalid type (0000b) is returned, allowing indirect determination of total EPC capacity by summing section sizes—typically up to 128 MB or more depending on the processor configuration. The number of sections varies by implementation but is usually small (e.g., 1-8), enabling detection of available secure memory for enclave allocation.[46][48]
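The split 52-bit fields described above can be reassembled with a mask and a shift. This is a sketch under the assumption that the caller has already collected the register tuples for sub-leaves 2, 3, and so on; `decode_epc_section` and `total_epc_bytes` are hypothetical names and the sample values are invented.

```python
from typing import Dict, Iterable, Optional, Tuple

Regs = Tuple[int, int, int, int]  # (EAX, EBX, ECX, EDX)


def decode_epc_section(eax: int, ebx: int, ecx: int, edx: int) -> Optional[Dict[str, int]]:
    """Decode one EPC section sub-leaf; returns None for an invalid (type 0) entry."""
    section_type = eax & 0xF
    if section_type == 0:
        return None
    return {
        "type": section_type,
        "properties": ecx & 0xF,
        # Bits 31:12 come from the low register, bits 51:32 from the high one.
        "base": (eax & 0xFFFFF000) | ((ebx & 0xFFFFF) << 32),
        "size": (ecx & 0xFFFFF000) | ((edx & 0xFFFFF) << 32),
    }


def total_epc_bytes(subleaves: Iterable[Regs]) -> int:
    """Sum section sizes, stopping at the first invalid sub-leaf."""
    total = 0
    for regs in subleaves:
        section = decode_epc_section(*regs)
        if section is None:
            break
        total += section["size"]
    return total


# Invented example: one section above 4 GiB, then the terminating entry.
dump = [(0x70200001, 0x1, 0x05D80001, 0x0), (0, 0, 0, 0)]
print(hex(decode_epc_section(*dump[0])["base"]), total_epc_bytes(dump))
```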
This leaf was introduced with the Skylake microarchitecture in 2015, appearing in subsequent Intel Core and Xeon processors where SGX is enabled in hardware and BIOS. However, SGX has been deprecated on client platforms (Intel Core processors) starting from the 11th generation (Tiger Lake, 2020) and 12th generation (Alder Lake, 2021), with support continuing on server and Xeon processors as of 2025.[49]
| Register | Bits | Description (ECX=0) |
|---|
| EAX | 0 | SGX1 leaf functions supported |
| EAX | 1 | SGX2 leaf functions supported |
| EAX | 5 | ENCLV leaf functions supported (EINCVIRTCHILD, EDECVIRTCHILD, ESETCONTEXT) |
| EAX | 6 | ENCLS leaf functions supported (ETRACKC, ERDINFO, ELDBC, ELDUC) |
| EAX | 7 | ENCLU leaf function EVERIFYREPORT2 supported |
| EBX | 31:0 | MISCSELECT: Supported extended SGX features |
| EDX | 7:0 | MaxEnclaveSize_Not64 (exponent for non-64-bit mode) |
| EDX | 15:8 | MaxEnclaveSize_64 (exponent for 64-bit mode) |
| Register | Bits Coverage (ECX=1) | Description |
|---|
| EAX | SECS.ATTRIBUTES[31:0] | Valid flag bits (e.g., DEBUG, MODE64BIT, PROVISIONKEY, EINITTOKEN_KEY) |
| EBX | SECS.ATTRIBUTES[63:32] | Valid bits in the upper half of the flag field |
| ECX | SECS.ATTRIBUTES[95:64] | Valid XFRM bits (lower 32 bits of the XSAVE feature request mask) |
| EDX | SECS.ATTRIBUTES[127:96] | Valid XFRM bits (upper 32 bits of the XSAVE feature request mask) |
EAX=14: Intel Processor Trace
CPUID leaf 14H provides enumeration information for Intel Processor Trace (Intel PT), a hardware-based tracing mechanism that captures low-overhead execution flow data, such as branches and timing packets, to aid in software debugging and performance analysis.[21] To access this leaf, software sets EAX to 14H and ECX to 0 for the primary sub-leaf, which returns capabilities in the general-purpose registers.[21]
In sub-leaf 0 (ECX=0), EAX indicates the maximum number of supported sub-leaves, typically 1 on processors with basic Intel PT support. EBX contains bit fields detailing key features: bit 0 indicates support for CR3 filtering (via IA32_RTIT_CTL.CR3Filter and IA32_RTIT_CR3_MATCH MSR); bit 1 for configurable Packet Stream Boundary (PSB) and cycle-accurate mode; bit 2 for IP filtering, TraceStop filtering, and MSR preservation across warm resets; bit 3 for MTC (mini time counter) timing packets and suppression of COFI-based (change of flow instruction) packets; bit 4 for PTWRITE instruction support; and bit 5 for Power Event Trace packets.[21] ECX specifies output schemes and extensions: bit 0 for ToPA (Table of Physical Addresses) output, enabling buffer management through a table of output regions for efficient trace collection; bit 1 for ToPA tables that can hold any number of output entries; bit 2 for single-range output; bit 3 for Trace Transport Subsystem (TTS) support; and bit 31 indicating that IP payloads carry linear instruction pointer (LIP) values, which include the code segment (CS) base. EDX is reserved and returns 0.[21]
Sub-leaf 1 (ECX=1), if supported, provides further configuration details: EAX bits 2:0 report the number of configurable address ranges for filtering (0 to 7); bits 15:3 are reserved; and bits 31:16 form a bitmap of supported MTC period encodings. EBX bits 15:0 indicate supported cycle threshold encodings for precise timing, while bits 31:16 cover PSB frequency encodings for packet boundaries. ECX and EDX are reserved in this sub-leaf. These bits collectively define packet types (e.g., branch targets, timing) and buffer management options like ToPA for handling trace overflows without halting execution.[21]
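The sub-leaf 1 fields are bitmaps of supported encodings, so a trace tool usually expands them into lists before choosing a configuration. A minimal sketch, assuming the EAX and EBX values were read elsewhere; `decode_pt_subleaf_1` is a hypothetical name and the sample values are invented.

```python
from typing import Dict, List, Union


def _encodings(bitmap: int, width: int = 16) -> List[int]:
    """Expand a bitmap into the list of supported encoding values."""
    return [n for n in range(width) if (bitmap >> n) & 1]


def decode_pt_subleaf_1(eax: int, ebx: int) -> Dict[str, Union[int, List[int]]]:
    """Decode CPUID.(EAX=14H, ECX=1) outputs for Intel PT configuration."""
    return {
        "address_ranges": eax & 0x7,                        # EAX[2:0]
        "mtc_periods": _encodings((eax >> 16) & 0xFFFF),    # EAX[31:16]
        "cycle_thresholds": _encodings(ebx & 0xFFFF),       # EBX[15:0]
        "psb_frequencies": _encodings((ebx >> 16) & 0xFFFF),  # EBX[31:16]
    }


# Invented example: two address ranges, MTC period encodings 0, 3, 6 and 9.
info = decode_pt_subleaf_1(0x02490002, 0x003F3FFF)
print(info["address_ranges"], info["mtc_periods"], info["psb_frequencies"])
```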
Intel PT, enumerated via this leaf, is primarily used for non-intrusive tracing in debugging and performance profiling, capturing execution flows to reconstruct control paths and identify bottlenecks. It integrates with tools like Linux perf, which decodes PT packets for analysis, supporting features such as cycle-accurate tracing through configurable PSB and MTC periods.[50][21]
Intel PT was introduced in Broadwell processors in 2014; software first detects its availability via CPUID leaf 7H, sub-leaf 0, EBX bit 25.[21] Extensions for cycle-accurate tracing, including configurable PSB and MTC support, appeared in subsequent generations such as Skylake. In hybrid architectures starting with Alder Lake (12th Gen Core, 2021), Intel PT operates on both Performance-cores (P-cores) and Efficient-cores (E-cores). E-core-only variants such as Alder Lake-N add minor features, including Event Trace and the ability to disable Taken/Not-Taken (TNT) packets, so tracing remains consistent across core types without major architectural changes as of 2025.[51][50]
EAX=17: System-on-Chip Vendor Attributes
The CPUID instruction with an input value of EAX=17h enumerates System-on-Chip (SoC) vendor attributes, enabling software to identify key details about the SoC design and its components in modern processors.[21] This leaf is particularly relevant for disaggregated SoCs where multiple vendors contribute subsystems, such as CPUs, integrated GPUs, and neural processing units (NPUs).[27]
When ECX=0 (the main sub-leaf), CPUID returns MaxSOCID_Index in EAX (the highest sub-leaf index supported for this leaf), the SoC vendor ID in EBX bits 15:0 along with bit 16 (IsVendorScheme, set to 1 if the vendor ID follows an industry-standard enumeration scheme or 0 if it is Intel-assigned), the project ID (a unique vendor-assigned number for the SoC project) in ECX, and the stepping ID (unique within the project) in EDX.[21] Sub-leaves ECX=1 through 3 return the SoC vendor brand string: each sub-leaf supplies 16 bytes of UTF-8 text across the EAX, EBX, ECX, and EDX registers, for 48 bytes in total, padded with null bytes (00h) as needed; sub-leaves beyond MaxSOCID_Index return zeros.[21] These ASCII-compatible strings let software identify the SoC vendor in heterogeneous computing configurations, for example in multi-vendor SoCs combining Intel x86 cores with ARM-based peripherals.[5]
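As a rough illustration of the layout just described, the sketch below splits the main sub-leaf into fields and joins the brand-string sub-leaves. It assumes the caller supplies register values from a CPUID wrapper that is not shown; `decode_soc_main` and `soc_brand_string` are hypothetical helper names.

```python
import struct


def decode_soc_main(eax, ebx, ecx, edx):
    """Split sub-leaf 0 of leaf 17H into its fields."""
    return {
        "max_socid_index": eax,
        "vendor_id": ebx & 0xFFFF,                  # EBX bits 15:0
        "is_vendor_scheme": bool((ebx >> 16) & 1),  # EBX bit 16
        "project_id": ecx,
        "stepping_id": edx,
    }


def soc_brand_string(subleaves):
    """Join (eax, ebx, ecx, edx) tuples from the string sub-leaves into text.

    Each register contributes four bytes in little-endian order; trailing
    null padding is removed before decoding as UTF-8.
    """
    raw = b"".join(struct.pack("<4I", *regs) for regs in subleaves)
    return raw.rstrip(b"\x00").decode("utf-8", errors="replace")
```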
This functionality became available starting with Intel's Meteor Lake microarchitecture in 2023, targeting SoCs with integrated accelerators like GPUs and NPUs for AI workloads.[52] Beyond the provided vendor strings and IDs, further details are vendor-specific and not standardized, limiting portability to interpretation of the enumerated data.[21] In multi-die topologies, this leaf complements broader processor enumeration but focuses solely on SoC-level vendor identification.[21]
EAX=19: Intel Key Locker Features
The CPUID leaf EAX=19H, with ECX=0, provides enumeration information for Intel Key Locker, a hardware feature that enables secure management of AES encryption keys through an internal wrapping key mechanism.[53] This leaf returns details in the EAX, EBX, and ECX registers about supported restrictions, instruction enablement, and parameters, while EDX is reserved.[5] Key Locker allows software to encode application keys using AES instructions augmented with hardware protection, preventing key exposure in memory or during operations, and serves as a lightweight alternative to full trusted execution environments for key handling.[53]
The features enumerated by this leaf include support for the AES Key Locker instructions (AESENC128KL/AESENC256KL, AESDEC128KL/AESDEC256KL, ENCODEKEY128/ENCODEKEY256, and their wide variants), which encrypt and decrypt data using a wrapped key handle instead of the raw key; handles are produced with an internal wrapping key (IWKey) derived from a random or platform-specific source.[53] Key wrapping employs AES-GCM-SIV for authenticity and integrity, ensuring keys remain protected even if metadata is compromised.[53] This integration supports usage in scenarios like secure storage for disk encryption or data-at-rest protection, where the IWKey is loaded via LOADIWKEY and handles are operated on without direct software access to the raw IWKey.[53]
| Register | Bit Position | Feature Description |
|---|---|---|
| EAX | 0 | 1 = CPL0-only restriction supported on handles (handle usable in kernel mode only).[5] |
| EAX | 1 | 1 = No-encrypt restriction supported on handles (handle cannot be used for encryption).[5] |
| EAX | 2 | 1 = No-decrypt restriction supported on handles (handle cannot be used for decryption).[5] |
| EAX | 31:3 | Reserved.[5] |
| EBX | 0 | 1 = AES Key Locker instructions (AESKLE) fully enabled by OS and firmware.[53] |
| EBX | 2 | 1 = Wide AES Key Locker instructions (processing eight blocks per instruction) supported (WIDE_KL).[5] |
| EBX | 4 | 1 = IWKey backup via Key Locker MSRs (e.g., IA32_COPY_LOCAL_TO_PLATFORM) supported.[5] |
| EBX | 1, 3, 31:5 | Reserved.[5] |
| ECX | 0 | 1 = NoBackup parameter supported in LOADIWKEY (prevents IWKey persistence).[5] |
| ECX | 1 | 1 = Random IWKey generation (KeySource=1) supported.[5] |
| ECX | 31:2 | Reserved.[5] |
| EDX | 31:0 | Reserved.[5] |
Key Locker integrates with Intel Software Guard Extensions (SGX) by allowing key operations within enclaves, enhancing trusted execution without the full overhead of enclave management for simple key tasks.[53] It was introduced in 11th Generation Intel Core processors (code-named Tiger Lake) in 2020 and remains available in subsequent client architectures like Alder Lake and Raptor Lake.[54] No sub-leaves beyond ECX=0 are defined for this leaf.[5]
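The table for leaf 19H maps naturally onto a dictionary of booleans. This is a minimal sketch, assuming the three register values come from a CPUID wrapper that is not shown; the key names and the helpers `decode_key_locker` and `key_locker_usable` are invented for illustration.

```python
def decode_key_locker(eax, ebx, ecx):
    """Map the leaf 19H bits listed in the table to named booleans."""
    def bit(reg, n):
        return bool((reg >> n) & 1)

    return {
        "cpl0_only_restriction": bit(eax, 0),
        "no_encrypt_restriction": bit(eax, 1),
        "no_decrypt_restriction": bit(eax, 2),
        "aeskle_enabled": bit(ebx, 0),
        "wide_kl": bit(ebx, 2),
        "iwkey_backup": bit(ebx, 4),
        "nobackup_param": bit(ecx, 0),
        "random_iwkey": bit(ecx, 1),
    }


def key_locker_usable(info):
    """The instructions can be used only once AESKLE reports them enabled."""
    return info["aeskle_enabled"]
```

Checking `aeskle_enabled` separately from `wide_kl` reflects that hardware support and OS enablement are reported by different bits.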
Emerging and Vector Extensions
EAX=1D: AMX Tile Information
CPUID leaf 1Dh provides detailed information about the Advanced Matrix Extensions (AMX) tile architecture, enabling software to query the configuration of tile registers used for matrix operations in AI and machine learning workloads. When executed with EAX set to 1Dh and ECX set to 0, this leaf returns basic architecture parameters in the general-purpose registers. Specifically, EAX bits 31:0 specify the maximum number of palettes supported, which is 1 in implementations supporting AMX tiles. EBX, ECX, and EDX are reserved and return 0 in this subleaf.[5]
Subleaf 1, accessed with ECX set to 1, enumerates the configuration details for Palette 1, the primary palette for AMX tile operations. The outputs are as follows:
| Register | Bits | Description | Value (Palette 1) |
|---|---|---|---|
| EAX | 15:0 | Total tile bytes | 8192 |
| EAX | 31:16 | Bytes per tile | 1024 |
| EBX | 15:0 | Bytes per row | 64 |
| EBX | 31:16 | Maximum number of tile names (registers) | 8 |
| ECX | 15:0 | Maximum rows per tile | 16 |
| ECX | 31:16 | Reserved | 0 |
| EDX | 31:0 | Reserved | 0 |
This configuration defines tiles as 16 rows by 64 bytes each, resulting in 1024 bytes (1 KiB) per tile across 8 tile registers, for a total tile memory of 8192 bytes (8 KiB).[5][55]
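The relationships between the palette fields can be checked with a short decoder. The sketch below assumes register values obtained from leaf 1DH, sub-leaf 1 by an unspecified CPUID wrapper; `decode_amx_palette` is a hypothetical helper, and the example values are simply the Palette 1 figures from the table packed into registers.

```python
def decode_amx_palette(eax, ebx, ecx):
    """Unpack the 16-bit fields of a palette sub-leaf of leaf 1DH."""
    return {
        "total_tile_bytes": eax & 0xFFFF,
        "bytes_per_tile": (eax >> 16) & 0xFFFF,
        "bytes_per_row": ebx & 0xFFFF,
        "max_names": (ebx >> 16) & 0xFFFF,
        "max_rows": ecx & 0xFFFF,
    }


# Palette 1 values from the table, packed as the registers would hold them.
palette1 = decode_amx_palette(0x04002000, 0x00080040, 0x00000010)

# One tile is max_rows rows of bytes_per_row bytes each.
tile_bytes = palette1["max_rows"] * palette1["bytes_per_row"]
# All tiles together are max_names tiles of bytes_per_tile bytes each.
all_tiles = palette1["max_names"] * palette1["bytes_per_tile"]
```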
This leaf was introduced in the Sapphire Rapids microarchitecture, part of the 4th Generation Intel Xeon Scalable processors launched in January 2023, where it supports tile configurations used for BF16 and INT8 data types in matrix multiply operations.[56][55] Palette 1 accommodates these formats, allowing software to detect the available tile memory size for setting up efficient matrix multiplications in AI/ML applications. The total tile bytes of a palette equal the number of tile registers multiplied by the bytes per tile (8 × 1024 = 8192 for Palette 1), and the value is read directly from the sub-leaf output. TMUL details are available if AMX-TILE is supported (CPUID.(EAX=07H, ECX=0H):EDX bit 24 = 1).[5][55]
EAX=1E: TMUL Engine Details
The CPUID leaf accessed by setting EAX to 1EH provides information on the Tile Matrix Multiply unit (TMUL), the accelerator within Intel's Advanced Matrix Extensions (AMX) for matrix multiplication operations such as C[M][N] += A[M][K] * B[K][N], supporting dense formats in AI and HPC workloads. This leaf complements the AMX tile architecture enumerated in leaf 1DH by providing engine parameters. TMUL support requires AMX-TILE (CPUID.(EAX=07H, ECX=0H):EDX bit 24 = 1) and was introduced with AMX in Sapphire Rapids (4th Gen Xeon Scalable, January 2023), with enhancements in Xeon 6 processors with P-cores (Granite Rapids, CPUID 06_ADH) launched in 2024.[5][55]
When ECX is set to 0, EBX bits 7:0 report tmul_maxk (maximum rows/columns in K dimension, 16), and bits 23:8 report tmul_maxn (maximum column bytes in N dimension, 64); EAX, ECX, and EDX are reserved (return 0). Sub-leaves ECX >=1 are reserved.[5]
EAX=21: Trust Domain Extensions (TDX)
The CPUID leaf EAX=21H provides enumeration of Intel Trust Domain Extensions (TDX), a hardware-based technology that enables confidential virtual machines (VMs) by isolating guest memory and execution from the host and physical threats through encryption and attestation mechanisms.[58] When invoked with ECX=0, this leaf returns values indicating TDX support and basic module identification, allowing software such as the host Virtual Machine Monitor (VMM) and guest Trust Domains (TDs) to detect the TDX environment.[58] EAX returns 0x00000000 (maximum sub-leaf 0, indicating TDX module presence), while EBX=0x65746E49 ("Inte"), EDX=0x5844546C ("lTDX"), and ECX=0x20202020 (four spaces), concatenated in that order, form the 12-byte ASCII identifier "IntelTDX" padded with four trailing spaces. For sub-leaves greater than 0 (ECX>0), all registers return 0x00000000, reserving space for future extensions related to module attestation and key management.[58]
Guest TDs use this leaf, emulated by the TDX module, to confirm operation in a protected environment and invoke TDCALL instructions for attestation reports and key provisioning, while the host VMM samples it during TDX initialization (via TDH.SYS.INIT and TDH.SYS.LP.INIT) to ensure platform compatibility across logical processors.[58] These features enable enhanced security for VM-scale confidential computing, contrasting with application-scale isolation in Software Guard Extensions (SGX) enumerated via EAX=12.
| Register | Value (Sub-leaf 0) | Description |
|---|---|---|
| EAX | 0x00000000 | Maximum sub-leaf (0); indicates TDX module presence |
| EBX | 0x65746E49 | Identifier bytes 0-3: "Inte" |
| EDX | 0x5844546C | Identifier bytes 4-7: "lTDX" |
| ECX | 0x20202020 | Identifier bytes 8-11: four spaces |
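The identifier can be rebuilt from the three register values by packing them as little-endian 32-bit words. This sketch assumes the byte order EBX, EDX, ECX, which is the order in which the listed values spell readable text; `tdx_signature` and `TDX_IDENT` are illustrative names rather than part of any documented API.

```python
import struct

# Expected 12-byte identifier: "IntelTDX" followed by four spaces.
TDX_IDENT = "IntelTDX    "


def tdx_signature(ebx, ecx, edx):
    """Concatenate the register bytes in EBX, EDX, ECX order."""
    return struct.pack("<3I", ebx, edx, ecx).decode("ascii")


def looks_like_tdx_guest(ebx, ecx, edx):
    """True when leaf 21H, sub-leaf 0 carries the TDX identifier."""
    return tdx_signature(ebx, ecx, edx) == TDX_IDENT
```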
TDX first became available on select 4th Generation Intel Xeon Scalable processors (Sapphire Rapids) in 2023, with general availability on 5th Generation (Emerald Rapids) and later families.[59] As of 2025, enhancements in TDX module version 1.5 introduce partial support for guest ownership models, allowing TDs greater control over attestation and key derivation processes, though full implementation remains platform-specific.[58] Higher sub-leaves remain reserved for evolving attestation protocols and key management extensions.[58]
EAX=24: AVX10 Vector ISA Features
The CPUID leaf EAX=24 enumerates features of the AVX10 Vector Instruction Set Architecture (ISA), a converged extension building on prior AVX instructions to provide standardized vector processing capabilities across x86 processors. This leaf allows software to query support for vector widths up to 512 bits and specific ISA elements, enabling portable SIMD code that extends beyond AVX-512 foundations without fragmentation. It was introduced as part of the AVX10 specification to facilitate runtime detection of capabilities like integer fused multiply-add (IFMA) and scatter/gather operations, promoting interoperability in high-performance computing workloads.[17]
When EAX is set to 24H, the sub-leaf index in ECX determines the queried information: ECX=0 provides base converged ISA details, including the AVX10 version and supported vector lengths, while ECX=1 exposes discrete feature bits for additional extensions. For ECX=0, EAX reports the highest supported sub-leaf, and EBX bits 7:0 indicate the AVX10 version (value ≥1 for supported implementations, with 2 or higher indicating AVX10.2), with bits 16, 17, and 18 signaling support for 128-bit (VL128), 256-bit (VL256), and 512-bit (VL512) vector operations, respectively; ECX and EDX are reserved. For ECX=1, EBX, ECX, and EDX primarily hold reserved or processor-specific discrete bits, with future allocations possible for AVX10 sub-extensions like "AVX10-XXXX" features. These bitfields ensure precise enumeration of vector register usage (XMM for 128-bit, YMM for 256-bit, ZMM for 512-bit).[17]
This structure supports portable SIMD implementations by confirming converged ISA elements without relying on legacy AVX-512 flags alone. AVX10 was first announced in July 2023, and its first version, AVX10.1, debuted on Intel's Granite Rapids microarchitecture launched in September 2024. By October 2025, Intel and AMD ratified AVX10 as a standardized x86 extension through the x86 Ecosystem Advisory Group, ensuring full compatibility and enumeration consistency across both vendors' processors, including AMD's next-generation implementations.[17][60]
| Register | Bit(s) | Feature | Description |
|---|---|---|---|
| EBX (ECX=0) | 7:0 | AVX10 Version | Converged ISA version (≥1 indicates support; ≥2 indicates AVX10.2) |
| EBX (ECX=0) | 16 | VL128 | 128-bit vector support |
| EBX (ECX=0) | 17 | VL256 | 256-bit vector support |
| EBX (ECX=0) | 18 | VL512 | 512-bit vector support |
| ECX (ECX=0) | 31:0 | Reserved | For future extensions |
| EDX (ECX=0/1) | 31:0 | Reserved | For future extensions |
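A dispatcher choosing between vector widths might decode the EBX fields as follows. This is a sketch under the assumption that EBX was read from leaf 24H, sub-leaf 0 by a CPUID wrapper not shown here; `decode_avx10` and `max_vector_width` are hypothetical helpers.

```python
def decode_avx10(ebx):
    """Decode EBX of leaf 24H, sub-leaf 0."""
    version = ebx & 0xFF  # bits 7:0
    widths = [
        width
        for bit, width in ((16, 128), (17, 256), (18, 512))
        if (ebx >> bit) & 1
    ]
    return {
        "version": version,
        "supported": version >= 1,
        "vector_widths": widths,
    }


def max_vector_width(ebx):
    """Widest enumerated vector length in bits, or 0 if none is reported."""
    widths = decode_avx10(ebx)["vector_widths"]
    return max(widths) if widths else 0
```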
Extended AMD64 Functions
EAX=80000000h: Maximum Extended Function
When the CPUID instruction is executed with EAX set to 80000000h, it provides the entry point for determining support for extended functions in the AMD64 architecture and compatible processors.[2] The value returned in EAX indicates the highest extended function number supported by the processor, enabling software to safely query subsequent extended leaves up to that maximum without invoking undefined behavior.[2] On processors that do not implement the extended range, the returned EAX is not a valid extended function number, so software should confirm that it is at least 80000000h before querying further extended leaves.
In AMD implementations, the maximum value in EAX varies by processor family; recent models such as those in the Zen 4 and Zen 5 architectures report a maximum beyond 8000001Fh, the leaf that enumerates memory encryption capabilities.[2] Additionally, EBX, ECX, and EDX return the hexadecimal values 68747541h, 444D4163h, and 69746E65h, respectively, which form the ASCII string "AuthenticAMD" when read in the order EBX, EDX, ECX, confirming the vendor in the extended function context.[2]
Intel processors, for compatibility with the AMD64 extended range, return a more limited maximum in EAX, typically 80000008h, supporting basic topology and address size queries but not the full AMD-specific extensions.[32] On Intel, EBX, ECX, and EDX are reserved and return 0 for this input.[5]
This function supports no sub-leaves and must be invoked prior to any other 80000000h-range calls to establish the valid query bounds, preventing errors from accessing unsupported functions.[2]
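The bounds check and the vendor string can be expressed in a few lines. The sketch assumes the four register values from leaf 80000000h are already available from a CPUID wrapper (not shown); `extended_leaf_supported` and `vendor_string` are illustrative helper names.

```python
import struct


def extended_leaf_supported(max_ext_eax, leaf):
    """True if `leaf` lies inside the extended range reported in EAX.

    A returned EAX below 0x80000000 yields False for every extended leaf,
    which covers processors without an extended range.
    """
    return 0x80000000 <= leaf <= max_ext_eax


def vendor_string(ebx, ecx, edx):
    """The 12 vendor bytes are ordered EBX, EDX, ECX."""
    return struct.pack("<3I", ebx, edx, ecx).decode("ascii")
```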
EAX=80000001h: Extended Processor Info and Features
When the CPUID instruction is executed with EAX set to 80000001h, it returns extended processor identification and feature information tailored to the AMD64 architecture. This function provides details on the processor's version, including support for 64-bit operations and AMD-specific instruction extensions, serving as an extension to the basic CPUID leaf 1 for AMD processors. The outputs are stored in EAX for version details, ECX for certain extended features, and EDX for a broader set of capability flags, while EBX contains a brand identifier that is typically reserved or used for specific branding in AMD implementations. This leaf has no sub-leaves and is essential for software to verify AMD64 compatibility before enabling long mode.[61]
The EAX register delivers the extended processor signature, mirroring the structure of the basic CPUID EAX=1 but incorporating fields for higher revision levels common in AMD families starting from 0Fh (15 decimal). Bits 3–0 hold the stepping ID, bits 7–4 the base model ID, and bits 11–8 the base family ID; bits 13–12 indicate the processor type, with bits 19–16 and 27–20 providing extended model and family IDs, respectively, while other bits are reserved. For processors with a base family ID of 0Fh (or certain models in families 06h/07h), the effective model is calculated as (extended model ID shifted left by 4) + base model ID, and the effective family as base family ID + extended family ID; this enables precise identification of advanced AMD processors like those in the Zen architecture series.[61]
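The effective family and model calculation can be written out directly. This sketch applies the extended fields only when the base family is 0Fh, which is the AMD rule described above; it deliberately ignores the family 06h/07h cases, and `decode_signature` is a hypothetical helper. The sample value is an illustrative signature, not a claim about a particular product.

```python
def decode_signature(eax):
    """Return (family, model, stepping) from a processor signature."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    if base_family == 0xF:
        # Extended fields apply: add the families, prepend the model nibble.
        family = base_family + ext_family
        model = (ext_model << 4) + base_model
    else:
        family, model = base_family, base_model
    return family, model, stepping


# Illustrative signature: base family 0Fh, extended family 0Ah.
family, model, stepping = decode_signature(0x00A60F12)
```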
The ECX register specifies AMD64-oriented feature flags, focusing on enhancements for 64-bit execution and compatibility. Bit 0 (LAHF_SAHF) indicates support for the LAHF and SAHF instructions within 64-bit mode, allowing efficient flag manipulation across operating modes. Bit 1 (CMP_LEGACY) signals core multi-processing legacy mode, in which the hyper-threading technology (HTT) bit from basic CPUID reflects multiple physical cores rather than logical threads. Bit 6 (SSE4a) indicates support for the AMD-specific SSE4a instructions, including EXTRQ and INSERTQ. Other bits report further extensions, such as SVM (bit 2), or are reserved.[61]
In contrast, the EDX register enumerates a comprehensive set of feature bits, many aligning with x86 standards but highlighting AMD64 essentials like bit 20 (NX), which enables no-execute page protection to enhance security by preventing code execution from data pages. Bit 11 confirms SYSCALL and SYSRET availability, bit 29 (Long Mode) verifies 64-bit addressing and execution support (critical for detecting full AMD64 capability), and bits 22 (MMXEXT) and 23 (MMX) indicate multimedia extensions. For legacy multimedia acceleration, bit 30 (3DNow! Ext.) and bit 31 (3DNow!) report support for AMD's 3DNow! instructions and their extensions, though these are deprecated and absent from recent AMD processors. Other bits mirror foundational features from leaf 1, such as FPU (bit 0) and PAE (bit 6), providing a holistic view that complements basic CPUID for robust 64-bit detection and feature enabling. The following table summarizes key EDX bits relevant to AMD64:
| Bit | Feature | Description |
|---|---|---|
| 11 | SYSCALL | Support for SYSCALL/SYSRET instructions in 64-bit mode. |
| 20 | NX | No-execute bit for page tables, supporting XD (execute disable). |
| 29 | Long Mode | Indicates support for 64-bit operation, including RIP-relative addressing. |
| 30 | 3DNow! Ext. | Extensions to 3DNow! for enhanced SIMD floating-point. |
| 31 | 3DNow! | Original AMD 3DNow! SIMD instructions for 3D graphics. |
This function's outputs are pivotal for operating systems and applications to probe 64-bit readiness without assuming basic CPUID suffices, ensuring compatibility across AMD processor generations from Opteron onward.[61]
EAX=80000002h to 80000004h: Processor Brand String
The processor brand string is retrieved using the extended CPUID functions with EAX set to 80000002h, 80000003h, and 80000004h in sequence.[2][62] Each invocation returns 16 ASCII bytes distributed across the EAX, EBX, ECX, and EDX registers, with EAX holding bytes 0-3, EBX bytes 4-7, ECX bytes 8-11, and EDX bytes 12-15 of the respective segment.[2][62] The full string is formed by concatenating the outputs from these three functions, yielding up to 48 bytes of a null-terminated ASCII string that identifies the processor's marketing name.[2][62]
To parse the string, software must first execute CPUID with EAX=80000000h to determine the maximum extended function supported; if the returned EAX value is less than 80000004h, the brand string is not available and no further calls should be made.[2][62] Upon support confirmation, the concatenated result provides a human-readable identifier, such as "AMD Ryzen 9 7950X 16-Core Processor" for recent AMD models.[63] This string is programmed by the BIOS during system initialization and serves primarily for user-facing displays in operating systems and applications, enabling straightforward processor identification without reliance on numeric signatures.[2]
All AMD64 processors, beginning with the Athlon 64 family (introduced in 2003, family 0Fh), support the brand string, and earlier AMD processors such as the K6 and Athlon already provided it.[64] Intel processors have supported it since the Pentium 4; on older processors that lack these leaves, software must fall back to alternative identification methods such as the processor signature from EAX=1.[62]
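Assembling the brand string amounts to concatenating twelve little-endian registers and cutting at the first null byte. The sketch below assumes the caller has already verified that leaf 80000004h is supported and has collected the three register tuples with a CPUID wrapper that is not shown; `brand_string` is a hypothetical helper.

```python
def brand_string(leaves):
    """Build the brand string from leaves 80000002h..80000004h.

    `leaves` is a sequence of three (eax, ebx, ecx, edx) tuples in leaf
    order. Each register holds four ASCII bytes, lowest byte first.
    """
    raw = b"".join(
        reg.to_bytes(4, "little") for regs in leaves for reg in regs
    )
    # Stop at the terminating null and drop padding spaces.
    text = raw.split(b"\x00", 1)[0]
    return text.decode("ascii", errors="replace").strip()
```

Stripping whitespace is a design choice for display purposes; software that compares strings byte-for-byte would keep the raw 48 bytes instead.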
EAX=80000005h: L1 Cache and TLB Identifiers
When executed with EAX set to 80000005h on AMD processors of family 0Fh or later, the CPUID instruction returns detailed parameters for the per-core L1 instruction and data caches, as well as the L1 instruction and data translation lookaside buffers (TLBs). This extended function, introduced with the K8 architecture in 2003, provides AMD-specific encodings that complement the legacy cache descriptors from EAX=2 and the deterministic enumeration from EAX=4, enabling software to query L1 characteristics without relying on hardcoded assumptions.[24]
The L1 instruction cache details are encoded in EDX as follows: bits 31:24 specify the cache size in kilobytes; bits 23:16 indicate the associativity (encoded as 00h for reserved, 01h for direct-mapped, 02h–FEh for the number of ways, and FFh for fully associative); bits 15:8 denote the number of lines per tag; and bits 7:0 give the line size in bytes. The L1 data cache uses an identical format in ECX. For instance, in the Zen 2 microarchitecture, these registers describe a 32 KB instruction cache that is 8-way associative with 64-byte lines and 1 line per tag, alongside a matching 32 KB data cache configuration.[24][65]
TLB parameters occupy EAX and EBX, focusing on supported page sizes of 4 KB, 2 MB, and 4 MB. EBX encodes 4 KB page support, with bits 31:24 for data TLB associativity, bits 23:16 for data TLB entries, bits 15:8 for instruction TLB associativity, and bits 7:0 for instruction TLB entries (using the same associativity encoding as caches). EAX covers larger pages, with bits 31:24 for 2 MB/4 MB data TLB associativity, bits 23:16 for 2 MB data TLB entries (where 4 MB uses half that number), bits 15:8 for 2 MB/4 MB instruction TLB associativity, and bits 7:0 for 2 MB instruction TLB entries (again, halved for 4 MB). These values allow software to optimize memory access patterns by accounting for TLB coverage and potential misses for different page granularities.[24]
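The cache and TLB encodings share the same byte-wide layout, so one pair of helpers covers all four registers. This sketch assumes register values from leaf 80000005h supplied by a CPUID wrapper not shown here; `decode_l1_cache`, `decode_l1_tlb`, and the example register values are illustrative.

```python
def _associativity(code):
    """Translate the 8-bit associativity encoding described above."""
    if code == 0x00:
        return "reserved"
    if code == 0x01:
        return "direct-mapped"
    if code == 0xFF:
        return "fully associative"
    return f"{code}-way"


def decode_l1_cache(reg):
    """Decode ECX (data cache) or EDX (instruction cache)."""
    return {
        "size_kb": (reg >> 24) & 0xFF,
        "associativity": _associativity((reg >> 16) & 0xFF),
        "lines_per_tag": (reg >> 8) & 0xFF,
        "line_size": reg & 0xFF,
    }


def decode_l1_tlb(reg):
    """Decode EBX (4 KB pages) or EAX (2 MB/4 MB pages)."""
    return {
        "dtlb_associativity": _associativity((reg >> 24) & 0xFF),
        "dtlb_entries": (reg >> 16) & 0xFF,
        "itlb_associativity": _associativity((reg >> 8) & 0xFF),
        "itlb_entries": reg & 0xFF,
    }
```

For the EAX register, the entry counts apply to 2 MB pages; as noted above, a 4 MB page consumes two entries, so the 4 MB capacity is half the decoded value.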
Advanced Extended Functions
EAX=80000006h: L2 Cache and L3 Features
When EAX is set to 80000006h, the CPUID instruction returns detailed descriptors for the L2 and L3 caches on AMD processors, enabling software to query cache parameters for performance tuning. This extended function has been supported since the K8 microarchitecture (AMD Family 0Fh).[24]
The EAX and EBX registers provide L2 TLB information, while ECX encodes L2 cache size and organization details, with bits 31:16 specifying the cache size in kilobytes per core. The total L2 cache size per core is calculated as the value in ECX bits 31:16 multiplied by 1024 bytes. For example, in Zen-based processors like the Ryzen 5000 series, this yields a 512 KB L2 cache per core, which is 8-way associative with 64-byte lines, aiding in efficient data access patterns.[24]
The EDX register reports L3 cache size and organization, with bits 31:18 indicating the size in units of 512 KB for the shared L3 cache across cores, as implemented in the Zen architecture and later. This allows computation of the total L3 capacity by multiplying the bit field value by 512 KB; for instance, Zen 3 cores in a chiplet typically report 32 MB shared L3 per core complex (CCD), supporting higher bandwidth in multi-threaded workloads. Bits 15:12 encode the L3 associativity (e.g., 16-way) and bits 11:8 the lines per tag to describe the cache's organization.[24]
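The two size calculations differ only in their unit and field width. The following sketch assumes ECX and EDX were read from leaf 80000006h by a CPUID wrapper that is not shown; it also assumes the low L2 fields (associativity code, lines per tag, line size) occupy the same bit positions as the L3 fields described above. `decode_l2` and `decode_l3` are hypothetical helpers, and the associativity is returned as the raw 4-bit code rather than a way count.

```python
def decode_l2(ecx):
    """Decode the L2 descriptor; size is reported in 1 KB units."""
    return {
        "size_bytes": ((ecx >> 16) & 0xFFFF) * 1024,
        "assoc_code": (ecx >> 12) & 0xF,
        "lines_per_tag": (ecx >> 8) & 0xF,
        "line_size": ecx & 0xFF,
    }


def decode_l3(edx):
    """Decode the L3 descriptor; size is reported in 512 KB units."""
    return {
        "size_bytes": ((edx >> 18) & 0x3FFF) * 512 * 1024,
        "assoc_code": (edx >> 12) & 0xF,
        "lines_per_tag": (edx >> 8) & 0xF,
        "line_size": edx & 0xFF,
    }
```

A field value of 64 in EDX bits 31:18 therefore corresponds to 64 × 512 KB = 32 MB.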
These descriptors are primarily used for optimizing memory bandwidth in operating systems and applications, such as adjusting thread affinity or prefetching strategies to leverage cache hierarchy without explicit inclusivity details, which are not enumerated in this leaf.[24]
EAX=80000007h: Advanced Power Management and RAS
The CPUID leaf EAX=80000007h returns information on advanced power management features, thermal monitoring capabilities, and reliability, availability, and serviceability (RAS) mechanisms specific to AMD processors. This leaf enables operating systems and firmware to detect support for invariant time-stamp counter (TSC) behavior, frequency and voltage scaling controls, thermal safeguards, and error recovery options, facilitating optimized power policies and system stability. Supported since AMD Family 10h processors (introduced in 2007), it has no sub-leaves and returns data primarily in the EDX and EBX registers, with ECX providing power estimation ratios and EAX largely reserved.[66]
In the EDX register, bit fields indicate various power and thermal features. Bit 8 (ITSC) signals support for an invariant TSC, which maintains a constant rate independent of processor power states (P-states) or sleep states (C-states), ensuring reliable timing for operating system scheduling and performance monitoring across frequency changes. This invariance is crucial for applications requiring precise timekeeping without recalibration during power transitions. Other notable bits include bit 0 (TS) for on-die temperature sensor presence, enabling hardware-based thermal readings; bit 4 (TM) for thermal monitoring support; bit 5 (STC) for software-controlled thermal throttling; and bit 9 (CPB) for core performance boost, allowing dynamic frequency increases under low thermal loads. Higher bits, such as bit 14 (RAPL), indicate compatibility with running average power limit interfaces for energy tracking, similar to Intel's mechanisms but AMD-specific in implementation. These features aid in implementing OS-level power management, such as ACPI P-state transitions, by providing hardware guarantees on timing consistency and thermal safety.[67][66]
The EBX register focuses on RAS capabilities, reporting support for advanced error handling and recovery. Bit 0 (MCAOVR) denotes machine check architecture (MCA) overflow recovery, allowing the processor to handle excessive error logs without system halt. Bit 1 (SUCCOR) indicates software uncorrectable error containment and recovery, enabling firmware or OS to isolate and recover from non-fatal uncorrectable errors, such as those in memory or interconnects, without full system crashes. Bit 3 (SCMCA) signals scalable MCA extensions, which expand error reporting granularity for multi-core environments, including support for error scrubbing to preemptively correct correctable errors in caches and memory. These RAS bits are essential for server-grade reliability, allowing proactive error mitigation in data centers. Bit 2 (HWA) provides hardware assert mechanisms for debugging RAS events.[67][68]
The ECX register returns the CPU power sample time ratio (bits 31:0), a value representing the ratio of the power accumulator sample period to the TSC period, used for estimating core power consumption via MSRs like APERF and MPERF when referenced against TSC counts. This supports fine-grained power profiling without external hardware, integrating with OS policies for energy-aware scheduling. EAX is reserved and returns 0, though some implementations may echo the input value.[67][66]
In recent architectures like Zen 5 (Family 1Ah, released in 2024), this leaf includes enhanced RAS reporting for the data fabric interconnect, extending scalable MCA to cover fabric-level errors such as link failures or protocol violations, improving overall system resilience in high-density computing scenarios. Usage remains focused on initialization routines where software queries these bits to configure power governors, thermal limits, and error handlers, ensuring compatibility with ACPI and UEFI standards.
| Register | Key Bits | Feature | Description |
|---|---|---|---|
| EDX | 8 | ITSC | Invariant TSC across P- and C-states for consistent timing. |
| EDX | 0 | TS | On-die temperature sensor for thermal monitoring. |
| EDX | 4 | TM | Hardware thermal monitoring support. |
| EDX | 9 | CPB | Core performance boost for dynamic overclocking. |
| EDX | 14 | RAPL | Running average power limit interface for energy management. |
| EBX | 1 | SUCCOR | Software recovery from uncorrectable errors. |
| EBX | 0 | MCAOVR | Recovery from MCA status overflows. |
| EBX | 3 | SCMCA | Scalable machine check architecture for detailed error reporting. |
| ECX | 31:0 | CmpUnitPwrSampleTimeRatio | Ratio for power estimation using TSC and accumulators. |
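The bit positions in the table lend themselves to flag enumerations. This is a minimal sketch using Python's standard `enum.IntFlag`, assuming EDX and EBX were read from leaf 80000007h by a CPUID wrapper not shown here; the class and function names are invented for illustration and cover only the bits listed in the table.

```python
from enum import IntFlag


class ApmEdx(IntFlag):
    """Power and thermal bits of EDX listed in the table."""
    TS = 1 << 0
    TM = 1 << 4
    ITSC = 1 << 8
    CPB = 1 << 9
    RAPL = 1 << 14


class RasEbx(IntFlag):
    """RAS bits of EBX described in the text."""
    MCAOVR = 1 << 0
    SUCCOR = 1 << 1
    HWA = 1 << 2
    SCMCA = 1 << 3


def has_invariant_tsc(edx):
    """True when the ITSC bit (EDX bit 8) is set."""
    return bool(edx & ApmEdx.ITSC)


def ras_features(ebx):
    """Names of the RAS capabilities reported in EBX, in bit order."""
    return [flag.name for flag in RasEbx if ebx & flag]
```

An operating system would typically consult `has_invariant_tsc` before selecting the TSC as a clock source.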
EAX=80000008h: Physical Address Size and Topology
When executed with EAX set to 80000008h and ECX set to 0, the CPUID instruction returns key details on address sizes and basic processor topology for AMD64 processors. The EAX register provides the maximum physical address size in bits 7:0 (typically 40 for early implementations, up to 48 or more in modern variants), the maximum linear (virtual) address size in bits 15:8 (usually 48 for canonical addressing in 64-bit mode), and the maximum guest physical address size in bits 23:16 for use in Secure Virtual Machine (SVM) environments, where a value of 0 indicates fallback to the host physical address size. Bits 31:24 of EAX are reserved. EBX and EDX were reserved on early implementations; newer AMD processors use EBX for additional feature flags, such as CLZERO (bit 0) and the IBPB speculation barrier (bit 12).[2]
The ECX register delivers topology information, with bits 7:0 holding NC, the number of CPU cores minus one (thus total cores = NC + 1, representing cores per processor package), and bits 15:12 specifying ApicIdCoreIdSize, the number of bits in the initial APIC ID dedicated to identifying the core (where a value of 0 indicates use of legacy methods for core enumeration). Bits 11:8 and 31:16 in ECX are reserved. This data enables software to parse the APIC ID for core-level identification, facilitating accurate topology construction in multi-core systems.[2]
These outputs are critical for 64-bit operating systems to configure memory management units (MMUs) according to supported addressing limits, preventing invalid access beyond hardware capabilities, and for building system topology models that optimize thread scheduling and resource allocation across cores. In multi-socket NUMA configurations, the core count and APIC ID encoding from this leaf combine with other CPUID functions (such as leaf 0Bh) to detect node boundaries and inter-node latencies, improving performance in scalable server environments.[2]
Leaf 80000008h itself defines no sub-leaves. On AMD processors, more granular topology details, such as the threads sharing a core or compute unit, are enumerated by separate leaves: 8000001Eh on processors that implement the topology extensions and, on Zen 4 and later, the extended topology leaf 80000026h. These leaves extend the package-level information reported here and aid fine-grained scheduling for chiplet-based designs.
As of 2025, AMD's Zen 5 architecture, as implemented in EPYC 9005 series processors, supports 52-bit physical addressing alongside 57-bit virtual addressing, providing up to 4 PiB (2^52 bytes) of physical address space in high-end server configurations, a step beyond the 48-bit limit of prior consumer-oriented Zen implementations. This capability, enumerated via EAX[7:0], addresses growing demands for massive-scale data centers and AI workloads without requiring non-standard extensions.[69]
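The address-size fields and the resulting capacity can be computed as shown below. The sketch assumes EAX and ECX come from leaf 80000008h via a CPUID wrapper that is not shown; `decode_address_info` and `addressable_bytes` are hypothetical helpers, and the sample register values are constructed for illustration.

```python
def decode_address_info(eax, ecx):
    """Decode the address-size and core-count fields of leaf 80000008h."""
    phys = eax & 0xFF
    guest = (eax >> 16) & 0xFF
    return {
        "phys_bits": phys,
        "linear_bits": (eax >> 8) & 0xFF,
        # A guest size of 0 means "same as the host physical size".
        "guest_phys_bits": guest if guest else phys,
        "cores": (ecx & 0xFF) + 1,            # NC + 1
        "apic_id_core_bits": (ecx >> 12) & 0xF,
    }


def addressable_bytes(bits):
    """Size of an address space with the given number of address bits."""
    return 1 << bits


# Illustrative values: 52-bit physical, 57-bit linear, 16 cores.
info = decode_address_info(0x00003934, 0x0000400F)
```

With 52 physical address bits, `addressable_bytes(52)` equals 4 × 2^50 bytes, i.e. 4 PiB.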
EAX=8000000Ah: Secure Virtual Machine (SVM) Features
The CPUID instruction with EAX set to 8000000Ah queries the Secure Virtual Machine (SVM) features on AMD processors supporting AMD-V virtualization technology. This extended function is available starting with AMD Family 10h processors and provides essential information for hypervisors to detect and enable hardware-assisted virtualization capabilities. Execution requires prior detection of SVM support via CPUID function 80000001h ECX bit 2. No sub-leaves are supported; ECX input must be 0.[24]
Upon invocation, the processor returns data in EAX, EBX, and EDX registers, while ECX is reserved and returns 0. EAX bits 7:0 specify the SVM revision level, an 8-bit value indicating the implementation version of the SVM architecture. EBX bits 31:0 report the number of supported Address Space Identifiers (ASIDs), which are used to tag memory translations for efficient context switching in virtualized environments. EDX enumerates the available SVM features through individual bit flags.[24]
The following table summarizes the key bits in EDX for SVM features:
| Bit | Feature | Description |
|---|---|---|
| 0 | NP (Nested Paging) | Indicates support for nested paging (also known as Rapid Virtualization Indexing or RVI), which accelerates two-dimensional page table walks by hardware.[24] |
| 1 | LbrVirt (LBR Virtualization) | Supports virtualization of the Last Branch Record (LBR) MSRs, allowing guest access to branch tracing without hypervisor intervention.[24] |
| 2 | SVML (SVM Lock) | Enables hardware locking of the SVM enable bit in EFER to prevent unauthorized enabling of virtualization mode.[24] |
| 3 | NRIPS (Next RIP Save) | Saves the address of the next sequential instruction in the VMCB on #VMEXIT, so the hypervisor can skip an intercepted instruction without decoding its length.[24] |
| 4 | TscRateMsr (TSC Rate MSR) | Provides MSR-based control for scaling the guest Timestamp Counter (TSC) rate independently of the host.[24] |
| 5 | VmcbClean (VMCB Clean) | Supports clean bits in the Virtual Machine Control Block (VMCB) to track unmodified fields and reduce load overhead.[24] |
| 6 | FlushByAsid (Flush by ASID) | Enables TLB flushes targeted by ASID, minimizing global flushes and improving performance in multi-VM scenarios.[24] |
| 7 | DecodeAssists | Offers hardware assists for decoding instructions intercepted in guest mode, such as opcode grouping for faster emulation.[24] |
| 10 | PauseFilter | Implements a PAUSE instruction filter to count and skip excessive PAUSE loops in guests, reducing hypervisor exit frequency.[24] |
| 12 | PauseFilterThreshold | Allows configuration of the threshold for the PAUSE filter via MSR, tuning sensitivity for spinlock-heavy workloads.[24] |
| 13 | AVIC (Advanced Virtual Interrupt Controller) | Supports AVIC for direct virtual interrupt delivery from I/O devices to guest vCPUs, bypassing the hypervisor for common cases.[68] |
These features enhance virtualization performance and security by offloading common operations from software to hardware. For example, hypervisors like KVM in the Linux kernel query this function to enable compatible features, such as nested paging for memory management or AVIC for interrupt handling, ensuring optimal VM isolation and efficiency on supported AMD hardware.[68]
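The field layout described above can be sketched as a small decoder. This is a minimal illustration, not hypervisor code: the `svm_info` structure and `decode_svm_leaf` function are hypothetical names, and the register values are assumed to have been obtained already from CPUID leaf 8000000Ah.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of CPUID leaf 8000000Ah (names are illustrative only). */
struct svm_info {
    uint8_t  revision;       /* EAX[7:0]: SVM revision level        */
    uint32_t asid_count;     /* EBX[31:0]: number of ASIDs          */
    bool     nested_paging;  /* EDX bit 0                           */
    bool     nrip_save;      /* EDX bit 3                           */
    bool     flush_by_asid;  /* EDX bit 6                           */
    bool     pause_filter;   /* EDX bit 10                          */
    bool     avic;           /* EDX bit 13                          */
};

/* Extract the fields from raw register values returned by the leaf. */
struct svm_info decode_svm_leaf(uint32_t eax, uint32_t ebx, uint32_t edx)
{
    struct svm_info info;
    info.revision      = (uint8_t)(eax & 0xFF);
    info.asid_count    = ebx;
    info.nested_paging = (edx >> 0) & 1u;
    info.nrip_save     = (edx >> 3) & 1u;
    info.flush_by_asid = (edx >> 6) & 1u;
    info.pause_filter  = (edx >> 10) & 1u;
    info.avic          = (edx >> 13) & 1u;
    return info;
}
```

A caller would typically refuse to enable a feature such as nested paging unless the corresponding field is set, and would size its ASID allocator from `asid_count`.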
Recent and Vendor-Specific Extensions
EAX=8000001Fh: Memory Encryption Capabilities
The CPUID leaf 8000001Fh, accessed by setting EAX to 8000001Fh and ECX to 0, provides details on AMD processor support for memory encryption technologies, particularly those enabling Secure Encrypted Virtualization (SEV). This extended function returns values in the general-purpose registers that indicate hardware capabilities for encrypting virtual machine (VM) memory to protect against physical attacks and malicious hypervisors. The primary output in EAX enumerates the encryption features: bit 0 signals Secure Memory Encryption (SME) for host memory, bit 1 signals basic SEV support for per-VM encryption keys, bit 3 indicates SEV-ES (Encrypted State) for protecting VM register and state data during transitions, and bit 4 denotes SEV-SNP (Secure Nested Paging) for enhanced integrity and attestation via a Reverse Map Table (RMP) that prevents memory remapping and replay attacks.[70]
EBX from this leaf specifies implementation details: bits 5:0 give the position of the encryption bit (C-bit) in page table entries (typically 47 or 51), and bits 11:6 give the number of physical address bits lost when encryption is active. ECX reports the number of encrypted guests that can be supported simultaneously (509 or more on modern hardware), while EDX provides the minimum ASID (Address Space Identifier) usable by an SEV guest that does not enable SEV-ES. These values allow software to configure encryption without exceeding hardware limits, ensuring guest-specific keys managed by the AMD Secure Processor isolate VM memory from the host and other guests.[70]
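As a rough sketch of how software might unpack this leaf, the decoder below follows the bit assignments in AMD's published documentation as understood here (SME in EAX bit 0, SEV in bit 1, SEV-ES in bit 3, SEV-SNP in bit 4). The structure and function names are hypothetical, and the raw register values are assumed to come from an earlier CPUID call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of CPUID leaf 8000001Fh (illustrative names). */
struct mem_encrypt_info {
    bool     sme, sev, sev_es, sev_snp; /* EAX bits 0, 1, 3, 4          */
    unsigned c_bit;                     /* EBX[5:0]: C-bit position     */
    unsigned phys_addr_reduction;       /* EBX[11:6]: address bits lost */
    uint32_t max_guests;                /* ECX: simultaneous guests     */
    uint32_t min_sev_asid;              /* EDX: min ASID, SEV w/o SEV-ES*/
};

struct mem_encrypt_info decode_mem_encrypt_leaf(uint32_t eax, uint32_t ebx,
                                                uint32_t ecx, uint32_t edx)
{
    struct mem_encrypt_info m;
    m.sme     = (eax >> 0) & 1u;
    m.sev     = (eax >> 1) & 1u;
    m.sev_es  = (eax >> 3) & 1u;
    m.sev_snp = (eax >> 4) & 1u;
    m.c_bit               = ebx & 0x3Fu;
    m.phys_addr_reduction = (ebx >> 6) & 0x3Fu;
    m.max_guests          = ecx;
    m.min_sev_asid        = edx;
    return m;
}

/* Usable physical address width once encryption is active, given the
   width reported by leaf 80000008h EAX[7:0]. */
unsigned effective_phys_bits(unsigned reported_bits,
                             const struct mem_encrypt_info *m)
{
    return reported_bits - m->phys_addr_reduction;
}
```

For instance, a processor reporting 48 physical address bits and a reduction of 5 would leave 43 bits for addressing while encryption is enabled.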
For sub-leaf ECX=1, the leaf focuses on RMP configuration for SEV-SNP, returning details on the table's structure for reverse mapping guest physical addresses to host pages, which enforces integrity protections like preventing unauthorized page assignments. This sub-leaf is relevant only if the SEV-SNP bit in EAX is set, enabling hypervisors to initialize the RMP securely during VM launch. In usage, these capabilities support VM encryption where private guest pages use unique keys, while shared pages may fall back to host-managed keys; the bits guide key allocation and ASID assignment to avoid collisions.[70]
SEV support via this leaf first appeared in AMD EPYC processors, with full SEV-SNP introduced in the 3rd-generation EPYC "Milan" family released in 2021. The 5th-generation EPYC "Turin" processors, launched in October 2024, include enhancements to SEV-SNP such as ABI version 1.58 (as of May 2025) for improved compatibility and security.
EAX=80000021h: Additional Extended Features
Executing the CPUID instruction with EAX set to 80000021h enumerates additional extended processor features on AMD x86-64 implementations, with primary information returned in the EAX register. This leaf extends feature detection beyond the extended feature flags provided by EAX=80000001h, focusing on modern enhancements in security mitigations and instruction set capabilities relevant to virtualization and high-performance computing. It first appeared in the Zen 4 microarchitecture (family 19h, model 60h and higher) released in 2022, enabling software to identify support for these features in processors such as the Ryzen 7000 series and EPYC 9004 "Genoa" series.[71]
The EAX register bits specify individual capabilities:
- Bit 0 (NO_NESTED_DATA_BP): Set if the processor ignores nested data breakpoints, so that debug register configurations cannot erroneously trigger on inner breakpoints during exception handling; this ensures reliable debugging in complex scenarios without hardware errata workarounds.
- Bit 8 (AUTOIBRS): Indicates hardware support for Automatic Indirect Branch Restricted Speculation, an enhancement to the SPEC_CTRL MSR's IBRS bit that automatically enforces speculation barriers across privilege level transitions, mitigating Spectre variant 2 attacks with reduced performance overhead compared to software-managed IBRS; the Linux kernel enables this by default on compatible hardware.[71]
- Bit 17 (CpuidUserDis, CPUID faulting for non-privileged software): Denotes support for disabling CPUID execution in user mode through a control bit in the Hardware Configuration Register (HWCR, MSR C001_0015h), after which CPUID executed outside privilege level 0 raises a general protection exception (#GP). This prevents unprivileged code from querying sensitive processor information, bolstering security in multi-tenant or virtualized environments.
The Zen 6 microarchitecture (znver6), expected in 2026, is planned to extend this leaf with bit 23 (AVX512_BMM), signaling availability of AVX-512 Bit Matrix Multiply instructions (VBMACOR16x16x16, VBMACXOR16x16x16, and VBITREV), which perform packed 16x16 bit matrix accumulations and reversals optimized for sparse matrix operations in AI inference and training workloads.[72]
This function has no sub-leaves (ECX=0), and EBX, ECX, and EDX return 0 or are reserved. Software typically queries the maximum extended function via EAX=80000000h before accessing this leaf to confirm availability, using it for Zen 4+ detection in hypervisors like KVM for optimized SVM configurations or security hardening.
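The availability check and bit tests described above can be sketched as follows. The function and structure names are hypothetical; `max_ext_leaf` is assumed to be the EAX value returned by leaf 80000000h and `eax` the value returned by leaf 80000021h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flags from CPUID leaf 80000021h EAX that this section discusses. */
struct ext_features2 {
    bool no_nested_data_bp; /* EAX bit 0  */
    bool auto_ibrs;         /* EAX bit 8  */
    bool cpuid_user_dis;    /* EAX bit 17 */
};

/* The leaf may only be queried if the maximum extended leaf reaches it. */
bool ext_features2_available(uint32_t max_ext_leaf)
{
    return max_ext_leaf >= 0x80000021u;
}

/* Decode the EAX value; callers pass 0 when the leaf is unavailable. */
struct ext_features2 decode_ext_features2(uint32_t eax)
{
    struct ext_features2 f;
    f.no_nested_data_bp = (eax >> 0) & 1u;
    f.auto_ibrs         = (eax >> 8) & 1u;
    f.cpuid_user_dis    = (eax >> 17) & 1u;
    return f;
}
```

Passing zero for an unavailable leaf keeps every flag false, which lets the rest of the detection code avoid special cases for older processors.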
EAX=80000025h: New AMD-Specific Enumeration (2025)
The CPUID leaf EAX=80000025h, with ECX=0, is a new extended function introduced in the Zen 5 microarchitecture, available in processors such as the mobile Ryzen AI 300 series and the desktop Ryzen 9000 series, both launched in 2024. According to the AMD64 Architecture Programmer's Manual Volume 3 (revision 3.37, July 2025), this leaf supports AMD-specific enumeration, with the maximum extended function reported as at least 80000025h via EAX=80000000h. Public details on bit fields and specific capabilities remain limited.[73][74]
Software can detect its presence for Zen 5 targeting, with initial tooling support in utilities like CPUIDx version 0.18 (April 2025) and fuller implementation in version 0.20 (August 2025). Future sub-leaves (ECX>0) may provide additional diagnostics.
Hypervisor and Reserved Ranges (40000000h+)
The range from 40000000h to 4FFFFFFFh in the CPUID leaf space is reserved exclusively for hypervisor (virtual machine monitor, VMM) implementations on x86 processors from both Intel and AMD.[75][76] This allocation allows hypervisors to expose vendor-specific information and capabilities to guest software without conflicting with native processor CPUID functions; on bare metal these leaves have no defined meaning, and querying them returns zeros or other data that must not be interpreted.[75][77]
When EAX is set to 40000000h, the instruction returns the maximum supported hypervisor CPUID leaf in EAX (typically up to 400000FFh) and a 12-character ASCII vendor identification string distributed across EBX (characters 1-4), ECX (5-8), and EDX (9-12).[75][76] Examples include "KVMKVMKVM" for the KVM hypervisor (padded with three NUL bytes to fill the 12 characters) and "VMwareVMware" for VMware products.[75][76] This leaf functions analogously to the basic CPUID leaf 0h for processor vendors but is tailored for VMM identification.[77]
Subsequent leaves, such as 40000001h, report hypervisor-specific features in EAX, EBX, ECX, and EDX, including interface version details and capability bits like support for shadow paging or MSR access.[75][76] For instance, Microsoft Hyper-V uses leaves from 40000000h to 4000000Ah to enumerate enlightenments such as APIC access virtualization, while VMware employs similar leaves for features like nested paging.[78] These outputs enable guest software to detect the presence of a hypervisor—often in conjunction with bit 31 of ECX from leaf 1h—and optimize behavior accordingly, such as enabling paravirtualized I/O.[78][77]
Higher leaves within 40000002h to 4FFFFFFFh remain reserved for vendor-defined extensions, with no standardized processor-level functionality.[75][76] Software must respect the maximum leaf value reported by 40000000h.EAX to avoid querying undefined areas, which could trigger hypervisor-specific traps or crashes in emulated environments.[75][77]
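To make the register layout concrete, the following sketch assembles the vendor string from EBX, ECX, and EDX (in that order) and checks a leaf number against the reported maximum. The helper names are hypothetical, and the register values are assumed to come from a CPUID query of leaf 40000000h made only after the hypervisor-present bit has been seen.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Copy one 32-bit register into four characters, least significant
   byte first, independent of the host's byte order. */
void put_reg(char *dst, uint32_t reg)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (char)((reg >> (8 * i)) & 0xFF);
}

/* Build the 12-character hypervisor signature; out must hold 13 bytes.
   Note the order EBX, ECX, EDX (leaf 0h uses EBX, EDX, ECX instead). */
void hv_vendor(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13])
{
    put_reg(out + 0, ebx);
    put_reg(out + 4, ecx);
    put_reg(out + 8, edx);
    out[12] = '\0';
}

/* A hypervisor leaf is valid only up to the maximum reported in EAX. */
bool hv_leaf_in_range(uint32_t leaf, uint32_t max_leaf)
{
    return leaf >= 0x40000000u && leaf <= max_leaf;
}
```

Because short signatures are NUL-padded, the resulting buffer can be compared with `strcmp` against strings such as "KVMKVMKVM" without trimming.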
As of 2025, no major architectural changes have been introduced to this range by Intel or AMD, though it interacts with confidential computing technologies like Intel Trust Domain Extensions (TDX) and AMD Secure Encrypted Virtualization (SEV) for reporting VMM-mediated security features.[75][76]
Other Vendor Extensions
Xeon Phi Functions (20000000h Series)
The 20000000h series of CPUID leaves represents a vendor-specific extension introduced by Intel for the Many Integrated Core (MIC) architecture in Xeon Phi coprocessors. When EAX is set to 20000000h, the instruction returns the maximum supported Phi function in EAX, enabling software to determine the range of available Phi-specific enumerations. This leaf is essential for identifying the presence of Xeon Phi hardware in systems employing an offload model, where computational tasks are delegated to the coprocessor for parallel processing.[79]
Subsequent leaves within this range enumerate Phi-specific features in the general-purpose registers, tailored for the coprocessor's in-order cores and high-throughput design. The outputs mirror standard CPUID formats but focus on configurations unique to MIC implementations like Knights Ferry (2010 prototype) and Knights Corner (2012 production).[80][79]
Originally developed for coprocessor-based systems from 2012 to 2016, these leaves supported legacy detection in high-performance computing environments but became deprecated after 2018 as MIC technology was discontinued and features shifted to mainstream Xeon processors via conventional leaves like EAX=07h.[80]
Centaur Technology Extensions (C0000000h)
Centaur Technology, a subsidiary of VIA Technologies, introduced custom CPUID leaves in the range starting at C0000000h to enumerate proprietary features in their x86 processors, primarily targeted at embedded and low-power applications. These extensions allow software to detect support for specialized hardware accelerations, such as cryptographic engines, distinguishing VIA designs from mainstream Intel and AMD processors. The vendor identification string "CentaurHauls", obtained from the standard CPUID leaf 0h (EAX=0, read from the EBX, EDX, and ECX registers in that order), signals the presence of Centaur-derived processors capable of LongHaul power management technology, which enables dynamic frequency and voltage scaling for energy efficiency in mobile and embedded systems.
Executing CPUID with EAX set to C0000000h returns the highest supported Centaur-specific function in EAX, typically C0000001h, meaning that only one feature leaf follows this basic enumeration. The subsequent leaf C0000001h provides feature flags in the EDX register, focusing on VIA's PadLock security suite, which integrates hardware acceleration for cryptographic operations directly into the processor. Bit 2 of EDX (RNG) indicates that the Random Number Generator is present, providing high-quality entropy for secure key generation and random number operations, while bit 3 (RNG-E) indicates that it is enabled. Similarly, bit 6 (ACE) signals the Advanced Cryptography Engine for AES encryption/decryption, and bit 7 (ACE-E) indicates its enabled state; these were first implemented in the VIA C3 Nehemiah core in 2003.[67][81]
Additional PadLock components are enumerated in the same register: bits 8 and 9 report the second-generation cryptography engine (ACE2) and its enabled state, bits 10 and 11 the PadLock Hash Engine (PHE) for SHA-1 and SHA-256 acceleration, and bits 12 and 13 the PadLock Montgomery Multiplier (PMM) for efficient modular exponentiation in RSA operations. Bit 28 of EDX (LongHaul power) further confirms advanced power management capabilities, allowing software to access MSRs like 0x110A for frequency control, enhancing battery life in portable devices. In later designs like the VIA Nano (Isaiah architecture, introduced 2008), ECX bits in this leaf extend PadLock with support for AES-NI equivalents and improved RNG, though core functionality remains consistent across the lineage. These features have been available since the Nehemiah generation of the VIA C3, remaining niche for embedded markets such as thin clients and industrial systems, with limited adoption outside VIA ecosystems.[67][81]
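Because each PadLock unit is reported with a pair of bits (present and enabled), software generally needs both before using the unit. The sketch below illustrates that check for the RNG and ACE bits named above; the function names are hypothetical, `max_centaur_leaf` is assumed to be the EAX value from leaf C0000000h, and `edx` the EDX value from leaf C0000001h.

```c
#include <stdbool.h>
#include <stdint.h>

/* The feature leaf exists only if the Centaur range reaches C0000001h. */
bool centaur_feature_leaf_present(uint32_t max_centaur_leaf)
{
    return max_centaur_leaf >= 0xC0000001u;
}

/* A unit is usable when both its "present" bit and the following
   "enabled" bit are set in EDX of leaf C0000001h. */
bool padlock_unit_usable(uint32_t edx, unsigned present_bit)
{
    uint32_t mask = (1u << present_bit) | (1u << (present_bit + 1));
    return (edx & mask) == mask;
}

/* RNG: bits 2 (present) and 3 (enabled). */
bool padlock_rng_usable(uint32_t edx)
{
    return padlock_unit_usable(edx, 2);
}

/* ACE: bits 6 (present) and 7 (enabled). */
bool padlock_ace_usable(uint32_t edx)
{
    return padlock_unit_usable(edx, 6);
}
```

A unit that is present but not enabled is reported as unusable here; enabling it is a separate, privileged step that this sketch does not attempt.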
AMD Easter Eggs (8FFFFFFFh)
AMD processors from the K7 (Athlon) and K8 (Opteron/Hammer) families include undocumented Easter eggs in the CPUID instruction, activated by input values in the EAX register that far exceed the maximum extended function leaf reported by EAX=80000000h, typically around 8000001Fh for these architectures. These high leaves, such as 8FFFFFFFh, fall outside the defined range and are considered reserved or invalid in official specifications, yet AMD engineers implemented whimsical responses instead of default zeroed registers to entertain curious developers.[82][2]
For EAX=8FFFFFFFh on K7 and K8 processors, the CPUID instruction returns the 16-character ASCII string "IT'S HAMMER TIME" split four characters per register: "IT'S" in EAX, " HAM" in EBX, "MER " in ECX, and "TIME" in EDX. This playful message references the 1990 MC Hammer hit "U Can't Touch This," tying into the K8's internal codename "Hammer" and adding a lighthearted nod to the era's pop culture. The feature was introduced with the K7 architecture in 1999 and serves no diagnostic, performance, or compatibility purpose, existing purely as an inside joke from AMD's design team.[82][83]
These eggs have been exploited in niche applications, such as detecting virtual machine emulators that fail to replicate the non-standard behavior, but they offer no practical utility for general software and are safe to query without risking system stability. Discovered shortly after the K7's launch, they highlight early x86 design whimsy but have not been extended in subsequent AMD architectures. As of November 2025, no additional Easter eggs in this range have been reported for modern Zen-based processors like Ryzen.[82][84]
Software Usage Patterns
Inline Assembly Implementation
Inline assembly provides a direct method to execute the CPUID instruction within high-level languages like C or C++, allowing developers to query processor details at the lowest level without relying on external libraries. This approach is particularly useful for performance-critical applications or when fine-grained control over register usage is required. In GCC-compatible compilers, extended inline assembly syntax facilitates mapping C variables to CPU registers, ensuring safe interaction between assembly and host code. The CPUID instruction serializes execution, making it suitable for feature detection during initialization.[37]
A basic example targets the standard leaves EAX=0 (for maximum input value and vendor string) and EAX=1 (for processor version and feature flags). The following GCC inline assembly snippet first detects CPUID support by attempting to toggle the ID bit (bit 21) in the EFLAGS register, which is only possible on processors supporting CPUID; if unsuccessful, the code assumes lack of support. Upon confirmation, it executes CPUID with EAX=0 to retrieve the vendor ID as a 12-character string from EBX, EDX, and ECX registers, then with EAX=1 to extract the processor family, model, and stepping from EAX bits, along with feature flags in EDX (e.g., bit 25 for SSE support) and ECX (e.g., bit 0 for SSE3). Vendor extraction involves concatenating the registers into a char array, while features are bit-tested for presence.[37]
```c
#include <stdio.h>
#include <string.h>

/* Returns nonzero if CPUID is available. On 32-bit x86 this probes whether
   the ID flag (bit 21 of EFLAGS) can be toggled. */
static int has_cpuid_support(void) {
#if defined(__x86_64__)
    return 1; /* every x86-64 processor implements CPUID */
#else
    unsigned int before, after;
    __asm__ volatile (
        "pushfl\n\t"
        "popl %0\n\t"               /* before = current EFLAGS */
        "movl %0, %1\n\t"
        "xorl $0x00200000, %1\n\t"  /* flip the ID bit */
        "pushl %1\n\t"
        "popfl\n\t"                 /* try to write it back */
        "pushfl\n\t"
        "popl %1\n\t"               /* after = EFLAGS as actually stored */
        "pushl %0\n\t"
        "popfl\n\t"                 /* restore the original flags */
        : "=&r" (before), "=&r" (after)
        :
        : "cc");
    /* The bit differs only if the processor let us change it. */
    return ((before ^ after) & 0x00200000) != 0;
#endif
}

void get_vendor_and_features(char vendor[13], unsigned int *features_edx,
                             unsigned int *features_ecx) {
    /* Give the outputs defined values even if CPUID or leaf 1 is missing. */
    vendor[0] = '\0';
    *features_edx = 0;
    *features_ecx = 0;
    if (!has_cpuid_support()) {
        printf("CPUID not supported\n");
        return;
    }

    /* EAX=0: maximum basic leaf in EAX, vendor ID in EBX, EDX, ECX. */
    unsigned int eax0 = 0, ebx, ecx0, edx;
    __asm__ volatile ("cpuid"
        : "=a" (eax0), "=b" (ebx), "=c" (ecx0), "=d" (edx)
        : "a" (eax0), "c" (0)
        : "memory");
    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx0, 4);
    vendor[12] = '\0';

    /* EAX=1: feature flags, only if the maximum basic leaf allows it. */
    if (eax0 < 1) return;
    unsigned int eax1 = 1, ebx_unused;
    __asm__ volatile ("cpuid"
        : "=a" (eax1), "=b" (ebx_unused), "=c" (*features_ecx), "=d" (*features_edx)
        : "a" (eax1), "c" (0)
        : "memory");
    /* Base family is (eax1 >> 8) & 0xF; the extended family
       (eax1 >> 20) & 0xFF is added only when the base family is 0xF. */
}

int main(void) {
    char vendor[13];
    unsigned int edx_feat, ecx_feat;
    get_vendor_and_features(vendor, &edx_feat, &ecx_feat);
    printf("Vendor: %s\n", vendor);
    if (edx_feat & (1u << 25)) printf("SSE supported\n");
    if (ecx_feat & (1u << 0))  printf("SSE3 supported\n");
    return 0;
}
```
This code uses AT&T syntax (GAS), the GCC default, in which each operand carries a constraint: "a", "b", "c", and "d" bind a C variable to EAX, EBX, ECX, and EDX, and a leading "=" marks an output. The "memory" clobber ensures the compiler does not reorder memory accesses around CPUID. Compile for a 32-bit target with gcc -m32 -o cpuid_test cpuid_test.c or for a 64-bit target with gcc -o cpuid_test cpuid_test.c, assuming a compatible processor.[85][86]
In 64-bit mode the CPUID calls are unchanged: the instruction still takes its inputs from EAX and ECX and writes 32-bit results to EAX, EBX, ECX, and EDX, clearing the upper halves of RAX, RBX, RCX, and RDX. The ID-bit probe is different, because PUSHF and POPF operate on 64-bit stack slots there (PUSHFQ/POPFQ) and need 64-bit operands; in practice the probe can be skipped, since every x86-64 processor implements CPUID. Checking the maximum basic leaf (returned in EAX by EAX=0) still matters, because the result of an out-of-range query is not useful: Intel processors return the data of the highest basic leaf, while AMD processors return zeros in all four registers. With these points handled, the same source builds for 32-bit and 64-bit targets with only a change of compiler flags.[37]
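The family, model, and stepping fields returned in EAX by leaf 1 follow a two-part encoding that is easy to get wrong. The sketch below applies the rule documented by Intel, which AMD processors also satisfy in practice: the extended family is added only when the base family is 0Fh, and the extended model is prepended only when the base family is 06h or 0Fh. The structure and function names are illustrative.

```c
#include <stdint.h>

/* Display values derived from CPUID leaf 1 EAX (illustrative names). */
struct cpu_signature {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

struct cpu_signature decode_signature(uint32_t eax)
{
    unsigned stepping    = eax & 0xF;          /* bits 3:0   */
    unsigned base_model  = (eax >> 4) & 0xF;   /* bits 7:4   */
    unsigned base_family = (eax >> 8) & 0xF;   /* bits 11:8  */
    unsigned ext_model   = (eax >> 16) & 0xF;  /* bits 19:16 */
    unsigned ext_family  = (eax >> 20) & 0xFF; /* bits 27:20 */

    struct cpu_signature sig;
    sig.stepping = stepping;
    /* Extended family counts only when the base field is saturated. */
    sig.family = (base_family == 0xF) ? base_family + ext_family
                                      : base_family;
    /* Extended model supplies the high nibble for families 6 and 15. */
    sig.model = (base_family == 0x6 || base_family == 0xF)
                    ? (ext_model << 4) | base_model
                    : base_model;
    return sig;
}
```

For example, an EAX value of 000906EAh decodes to family 6, model 158 (9Eh), stepping 10, and 00A20F10h decodes to family 25 (19h), model 33 (21h), stepping 0.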
For enumerating cache parameters using the deterministic cache leaf (EAX=4), a loop iterates over sub-leaves in ECX starting from 0 until the cache type field (EAX bits 4:0) is 0, which means there are no more caches. Each iteration returns the cache type and level in EAX; the coherency line size (bits 11:0), physical line partitions (bits 21:12), and ways of associativity (bits 31:22) in EBX, each encoded as the value minus one; the number of sets minus one in ECX; and flags such as inclusiveness in EDX. For example, a level 1 data cache might report 8 ways, 1 partition, 64-byte lines, and 64 sets, for a size of 8 × 1 × 64 × 64 = 32 KiB. The leaf is available on Intel processors when the maximum basic leaf reported by EAX=0 is at least 4; AMD processors report equivalent data through extended leaf 8000001Dh.[37]
```c
#include <stdio.h>

/* Run CPUID for a leaf/sub-leaf pair; r[] receives EAX, EBX, ECX, EDX. */
static void cpuid(unsigned int leaf, unsigned int subleaf, unsigned int r[4]) {
    __asm__ volatile ("cpuid"
        : "=a" (r[0]), "=b" (r[1]), "=c" (r[2]), "=d" (r[3])
        : "a" (leaf), "c" (subleaf)
        : "memory");
}

void enumerate_caches(void) {
    unsigned int r[4];

    /* Leaf 4 is usable only if the maximum basic leaf (leaf 0, EAX) is >= 4. */
    cpuid(0, 0, r);
    if (r[0] < 4) {
        printf("Deterministic cache parameters not supported\n");
        return;
    }

    for (unsigned int sub = 0; sub < 32; sub++) {    /* defensive upper bound */
        cpuid(4, sub, r);
        unsigned int cache_type = r[0] & 0x1F;       /* 0 = no more caches */
        if (cache_type == 0) break;

        unsigned int level      = (r[0] >> 5) & 0x7;
        unsigned int ways       = (r[1] >> 22) + 1;
        unsigned int partitions = ((r[1] >> 12) & 0x3FF) + 1;
        unsigned int line_size  = (r[1] & 0xFFF) + 1;
        unsigned int sets       = r[2] + 1;
        /* 64-bit arithmetic so very large caches cannot overflow. */
        unsigned long long size_bytes =
            (unsigned long long)ways * partitions * line_size * sets;

        printf("Cache %u: L%u, type %u, size %llu KiB, %u-way, line %u bytes\n",
               sub, level, cache_type, size_bytes >> 10, ways, line_size);
    }
}

int main(void) {
    enumerate_caches();
    return 0;
}
```
This GAS-based example compiles the same way as the previous one; the loop stops at the first sub-leaf whose cache type is 0, so it never runs more iterations than there are caches. For standalone assembly development, NASM offers an alternative: a minimal program that defines _start in section .text, executes mov eax, 1 followed by cpuid, and then exits through a system call can be assembled with nasm -f elf64 example.asm, linked with ld example.o -o example, and run with ./example on a supporting system. NASM uses Intel syntax (e.g., mov eax, 1), in contrast to GAS's default AT&T syntax (mov $1, %eax), but both encode CPUID identically.[37]
Common pitfalls start with assuming that CPUID exists: the instruction works in every operating mode, including 16-bit real mode, but processors that predate it (early 486 models and older) raise an invalid opcode exception (#UD), so code that may run on such hardware should perform the EFLAGS ID-bit check first. A set feature flag reports hardware capability only; the operating system must also enable the corresponding state (CR4.OSFXSR for SSE, CR4.OSXSAVE and XCR0 for AVX) before the instructions can be used, so software should additionally test the OSXSAVE flag (leaf 1, ECX bit 27) and read XCR0 with XGETBV. Inline assembly constraints tell the compiler which registers CPUID overwrites, so no manual save and restore is needed, but querying a leaf above the reported maximum returns meaningless data without faulting; robust code always checks the EAX=0 result first. Additionally, hypervisors may virtualize CPUID and alter its outputs, so results obtained inside a virtual machine describe the virtual CPU rather than the host.[86][37]
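One pitfall that lends itself to a short example is treating a hardware feature flag as proof that the feature is usable. For AVX, the commonly recommended check combines two flags from leaf 1 ECX (OSXSAVE in bit 27, AVX in bit 28) with the XMM and YMM state bits of XCR0 (bits 1 and 2). The sketch below takes both values as parameters so that it stays independent of how they are read; the function name is hypothetical, and `xcr0` is assumed to come from XGETBV with ECX=0, which may only be executed once OSXSAVE is known to be set.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when AVX is both implemented by the processor and enabled by the
   operating system. leaf1_ecx is ECX from CPUID leaf 1; xcr0 is the
   value of extended control register 0. */
bool avx_usable(uint32_t leaf1_ecx, uint64_t xcr0)
{
    const uint32_t osxsave = 1u << 27; /* OS has enabled XSAVE/XGETBV */
    const uint32_t avx     = 1u << 28; /* processor implements AVX    */
    const uint64_t xmm_ymm = 0x6;      /* XCR0 bits 1 (SSE), 2 (AVX)  */

    if ((leaf1_ecx & (osxsave | avx)) != (osxsave | avx))
        return false;
    return (xcr0 & xmm_ymm) == xmm_ymm;
}
```

Keeping the decision in a pure function like this also makes it straightforward to unit-test the logic with synthetic register values.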
High-Level Language Wrappers and Libraries
High-level language wrappers for the CPUID instruction provide abstractions that simplify access to processor information without directly embedding inline assembly, enabling portable and readable code in C and C++ environments. In Microsoft Visual C++ (MSVC), the __cpuid intrinsic takes an output array of four integers followed by the leaf value and fills the array with the EAX, EBX, ECX, and EDX results, while __cpuidex adds a third argument that supplies the ECX sub-leaf.[6] For example, to test for AVX support, developers can call __cpuid(regs, 1) and check bit 28 of regs[2], which holds ECX.[6] In GCC and Clang, the <cpuid.h> header offers the __get_cpuid function, which takes the leaf and four pointers that receive EAX, EBX, ECX, and EDX and returns 0 if the leaf is not supported; __get_cpuid_count does the same for a given sub-leaf. These intrinsics abstract the low-level register handling, reducing errors in feature detection routines.
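A minimal sketch of the GCC/Clang route is shown below, using `__get_cpuid` for leaf 1 and `__get_cpuid_count` for leaf 7, sub-leaf 0. The wrapper names `cpu_reports_avx` and `cpu_reports_avx2` are hypothetical, and the preprocessor guard is an assumption added here so that the file still compiles on non-x86 targets, where both functions simply return false.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

/* AVX flag: leaf 1, ECX bit 28. Reports hardware support only. */
bool cpu_reports_avx(void)
{
#if HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;               /* leaf 1 not available */
    return (ecx & (1u << 28)) != 0;
#else
    return false;
#endif
}

/* AVX2 flag: leaf 7, sub-leaf 0, EBX bit 5. */
bool cpu_reports_avx2(void)
{
#if HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;               /* leaf 7 not available */
    return (ebx & (1u << 5)) != 0;
#else
    return false;
#endif
}
```

Both helpers rely on the library functions to compare the requested leaf against the maximum supported leaf, so no separate range check is written out here.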
Several libraries build on these intrinsics to offer higher-level APIs for CPUID enumeration, focusing on cross-platform detection of features and topology. The cpuid library from Steinwurf provides a C++ interface for querying instruction sets like MMX, SSE, and AVX on x86, abstracting leaf traversals into simple boolean checks such as cpuid::has_sse4_2().[87] For hardware topology, the Hardware Locality (hwloc) library uses CPUID leaves (e.g., leaf 4 for cache parameters) to map processor cores, caches, and NUMA nodes, exposing them via a portable object model suitable for parallel applications. Cross-platform options like SDL's SDL_cpuinfo module leverage CPUID to detect features such as SSE2 or AltiVec, returning structured data via functions like SDL_GetCPUCacheLineSize(), which aids game engines and multimedia software in runtime optimization. Google's cpu_features library extends this to a C99 API supporting x86, ARM, and MIPS, with CPUID handling for feature bits and cache sizes, emphasizing simplicity for embedded and desktop use.[88]
Common usage patterns in these wrappers emphasize efficiency and reliability, such as initializing a global cache of all accessible leaves at program startup to avoid repeated CPUID calls, which can vary per core and impact performance in multi-threaded contexts.[89] This caching approach, often implemented as a singleton function returning a feature map, allows quick queries for optimizations like vectorization paths. For diagnostic tools, libraries may serialize CPUID data into JSON format, enabling structured output for logging or configuration, as seen in utilities that enumerate leaves 0 through 8000001Fh and encode bitfields as key-value pairs. Portability between 32-bit and 64-bit builds needs little care on the data side, because CPUID returns four 32-bit values in both modes and fixed-width 32-bit integers hold them without truncation; conditional compilation (e.g., via the __x86_64__ macro) is mainly needed for mode-specific details such as the EFLAGS ID-bit probe on 32-bit targets.[88] Since x86 architectures are uniformly little-endian, no byte-swapping is required when interpreting CPUID outputs such as the vendor string.[88]
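The startup-cache pattern can be sketched as follows. To keep the example runnable on any host, the CPUID query is injected as a function pointer, and `fake_query` is a hypothetical stub that stands in for a real wrapper such as one built on `__get_cpuid_count`; all names here are illustrative, and the sketch is not thread-safe as written.

```c
#include <stdint.h>

/* Signature of whatever routine actually executes CPUID. */
typedef void (*cpuid_query_fn)(uint32_t leaf, uint32_t subleaf,
                               uint32_t regs[4]);

/* Leaves captured once and then served from memory. */
struct cpuid_cache {
    int      ready;
    uint32_t max_basic;
    uint32_t leaf1[4];
    uint32_t leaf7[4];
};

/* Fill the cache on first use; later calls return the stored copy
   without touching the query function again. */
const struct cpuid_cache *cpuid_cache_get(cpuid_query_fn query)
{
    static struct cpuid_cache cache; /* zero-initialized */
    if (!cache.ready) {
        uint32_t r[4];
        query(0, 0, r);
        cache.max_basic = r[0];
        if (cache.max_basic >= 1) query(1, 0, cache.leaf1);
        if (cache.max_basic >= 7) query(7, 0, cache.leaf7);
        cache.ready = 1;
    }
    return &cache;
}

/* Stub for demonstration: counts calls and returns canned values. */
int fake_calls = 0;

void fake_query(uint32_t leaf, uint32_t subleaf, uint32_t regs[4])
{
    (void)subleaf;
    fake_calls++;
    regs[0] = regs[1] = regs[2] = regs[3] = 0;
    if (leaf == 0) regs[0] = 7;          /* pretend max basic leaf is 7 */
    if (leaf == 1) regs[2] = 1u << 28;   /* pretend AVX is reported     */
    if (leaf == 7) regs[1] = 1u << 5;    /* pretend AVX2 is reported    */
}
```

In a multi-threaded program the first call would typically be made before worker threads start, or be protected by a once-only initialization primitive.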
In 2025, library updates have incorporated support for new AMD-specific extended leaves introduced in 2024-2025 for Zen 5 processors, with tools like CPUIDx reflecting the latest AMD documentation for detecting enhanced security and topology features.[74] Glibc's runtime feature detection, which relies on CPUID for selecting optimized code paths in sysdeps (e.g., for AVX2 on AMD), has seen refinements to better accommodate these extensions, ensuring compatibility in Linux distributions without breaking existing binaries.
CPUID Outside Traditional x86
Adaptations in ARM and Other ISAs
In non-x86 instruction set architectures (ISAs), mechanisms analogous to the x86 CPUID instruction exist to query processor identification, version, and feature support, but they typically rely on dedicated control or status registers rather than a single parameterized instruction. These approaches provide essential information for software to detect capabilities, though they often require privileged access or kernel mediation for security. Unlike x86's extensible leaf-based queries, these mechanisms use fixed registers that enumerate specific attributes, promoting simplicity but limiting dynamic extensibility.
In the ARM architecture, processor identification is handled through AArch64 system registers such as MIDR_EL1 (Main ID Register), which encodes the implementer code, variant, architecture, part number, and revision of the current core. Feature detection uses registers like ID_AA64PFR0_EL1 (AArch64 Processor Feature Register 0), which reports support for floating-point and Advanced SIMD among other fields, and ID_AA64ISAR0_EL1, which covers instruction set extensions such as CRC32 and atomic operations. These registers are read with the MRS instruction at EL1 or higher; on Linux, reads from userspace (EL0) trap and are emulated by the kernel, which advertises this ABI through the HWCAP_CPUID hardware capability and returns sanitized values that hide inconsistencies between cores in heterogeneous multi-core systems.[90][91]
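For comparison with CPUID's family/model/stepping encoding, the MIDR_EL1 fields can be unpacked as in the sketch below. The bit positions follow the Arm architecture definition of the register; the structure and function names are illustrative, and the raw value is assumed to have been read elsewhere (for example by kernel code or through the emulated userspace access described above).

```c
#include <stdint.h>

/* Fields of the AArch64 Main ID Register (illustrative names). */
struct midr_fields {
    unsigned implementer;  /* bits 31:24, e.g. 0x41 for Arm Ltd. */
    unsigned variant;      /* bits 23:20, major revision         */
    unsigned architecture; /* bits 19:16                         */
    unsigned part_number;  /* bits 15:4                          */
    unsigned revision;     /* bits 3:0, minor revision           */
};

struct midr_fields decode_midr(uint32_t midr)
{
    struct midr_fields f;
    f.implementer  = (midr >> 24) & 0xFF;
    f.variant      = (midr >> 20) & 0xF;
    f.architecture = (midr >> 16) & 0xF;
    f.part_number  = (midr >> 4) & 0xFFF;
    f.revision     = midr & 0xF;
    return f;
}
```

As an example, the value 410FD034h decodes to implementer 41h, variant 0, architecture Fh, part number D03h, and revision 4.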
The RISC-V ISA employs Control and Status Registers (CSRs) for similar purposes: mvendorid holds a 32-bit vendor identifier based on the JEDEC manufacturer ID, marchid identifies the base microarchitecture of the hart (hardware thread), and mimpid gives the implementation version; together they identify a hart's implementation. These read-only CSRs are read in machine mode with CSR instructions such as CSRRS (usually through the csrr pseudo-instruction), so no dedicated query instruction is needed. The misa CSR complements them by reporting the native base integer width and one bit per standard extension letter. This design supports RISC-V's modular nature, where standard identification focuses on core attributes while extensions are discovered separately.
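A short sketch of interpreting a misa value follows. It assumes the standard layout in which bit 0 corresponds to extension "A", bit 1 to "B", and so on up to bit 25 for "Z", with the two most significant bits of the register holding the MXL width code (1 for 32-bit, 2 for 64-bit, 3 for 128-bit). The function names are hypothetical, and the raw value is assumed to have been read in machine mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the single-letter extension ('A'..'Z') is flagged in misa. */
bool misa_has_extension(uint64_t misa, char letter)
{
    if (letter < 'A' || letter > 'Z')
        return false;
    return ((misa >> (letter - 'A')) & 1u) != 0;
}

/* Native integer width in bits, or 0 if the MXL field is zero.
   reg_bits is the width of the misa register itself (32 or 64),
   because MXL always occupies its top two bits. */
unsigned misa_xlen(uint64_t misa, unsigned reg_bits)
{
    unsigned mxl = (unsigned)((misa >> (reg_bits - 2)) & 0x3);
    return mxl ? (16u << mxl) : 0u;
}
```

For example, a 64-bit hart implementing the I, M, A, F, D, and C extensions would report 800000000000112Dh, which this sketch decodes as a 64-bit base with those six letters set.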
For PowerPC processors, the Processor Version Register (PVR) serves as the primary identification mechanism, containing a 32-bit value that combines the version number (bits 0-15) and revision number (bits 16-31) to uniquely identify the processor model, such as distinguishing between variants like the PowerPC 750 or e500 core family. Accessed via the mfspr instruction in supervisor mode, the PVR enables software to branch based on hardware specifics, with values defined per implementation to support compatibility in embedded and server environments.[93]
Compared to x86's CPUID, which offers comprehensive, input-dependent enumeration across multiple registers for features like cache hierarchy and vendor strings, these non-x86 mechanisms are generally more streamlined, using auxiliary registers for targeted queries rather than broad leaves, which reduces overhead but requires architecture-specific code for full capability detection. In emulation contexts, tools like QEMU bridge this gap by synthesizing CPUID responses for x86 guest operating systems running on ARM or other hosts, configuring virtual CPU models to expose compatible identification data based on the emulated processor type, ensuring portability without native hardware support.[94]
Virtualization and Emulation Contexts
In virtualized environments, hypervisors such as KVM and VMware intercept CPUID instructions to prevent guests from directly accessing host processor details, instead returning synthetic values that reflect the configured virtual hardware. This interception ensures guest stability by masking or emulating features that may not be fully supported in the virtual context, such as advanced instruction sets or cache topologies. For example, VMware's vSphere can configure guest CPUID masks via advanced settings to expose only a subset of host features, avoiding compatibility issues.[95][96]
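Conceptually, masking reduces to combining the host's feature word with a per-VM allow mask and then setting the hypervisor-present bit. The sketch below illustrates this for ECX of leaf 1 only. It is not taken from KVM or VMware, and the function name, macro, and mask policy are assumptions made for illustration.

```c
#include <stdint.h>

/* CPUID.1:ECX bit 31 is reserved for hypervisors to signal their presence. */
#define CPUID1_ECX_HYPERVISOR (1u << 31)

/* Hypothetical helper: compute the ECX value a guest would see for leaf 1.
 * host_ecx   - value the physical CPU reports
 * allow_mask - feature bits the VM configuration permits the guest to see */
static inline uint32_t guest_cpuid1_ecx(uint32_t host_ecx, uint32_t allow_mask)
{
    /* Drop every feature not explicitly allowed, then advertise the VMM. */
    return (host_ecx & allow_mask) | CPUID1_ECX_HYPERVISOR;
}
```

With a host value of 0x00000201 (SSE3 in bit 0 and SSSE3 in bit 9) and an allow mask of 0x00000001, the guest sees 0x80000001: SSE3 remains, SSSE3 is hidden, and the hypervisor bit is set.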
Hypervisor presence can be detected through specific CPUID leaves. When CPUID is executed with EAX=1, bit 31 of ECX (the hypervisor-present bit) is set if a hypervisor is active, signaling to software that it is running in a virtual machine. Additionally, the reserved range starting at EAX=0x40000000 provides VMM identification: EAX returns the maximum leaf in this range, while EBX, ECX, and EDX form the hypervisor vendor ID string, such as "KVMKVMKVM" for KVM or "Microsoft Hv" for Hyper-V. These mechanisms allow operating systems and applications to adapt behavior accordingly.[78]
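The two checks can be sketched in C using the `<cpuid.h>` header shipped with GCC and Clang. The function names are hypothetical, and on non-x86 targets the probes compile to stubs that report no hypervisor. Leaf 0x40000000 is queried only after the hypervisor-present bit has been confirmed, since its contents are not meaningful on bare metal.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>          /* GCC/Clang helper macros for CPUID */
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

/* Assemble the 12-character vendor string from EBX, ECX, EDX
 * (in that order, least significant byte first). */
static inline void hv_vendor_from_regs(uint32_t ebx, uint32_t ecx,
                                       uint32_t edx, char out[13])
{
    const uint32_t regs[3] = { ebx, ecx, edx };
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFF);
    out[12] = '\0';
}

/* Returns 1 if CPUID.1:ECX[31] is set, 0 otherwise. */
static inline int hypervisor_present(void)
{
#if HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ecx >> 31) & 1u);
#else
    return 0;
#endif
}

/* Fills out with the vendor string and returns the maximum hypervisor
 * leaf, or returns 0 with an empty string if no hypervisor is reported. */
static inline uint32_t hypervisor_vendor(char out[13])
{
    out[0] = '\0';
#if HAVE_X86_CPUID
    if (hypervisor_present()) {
        unsigned int eax, ebx, ecx, edx;
        __cpuid(0x40000000, eax, ebx, ecx, edx);
        hv_vendor_from_regs(ebx, ecx, edx, out);
        return eax;
    }
#endif
    return 0;
}
```

The raw `__cpuid` macro is used for leaf 0x40000000 because `__get_cpuid` validates the requested leaf against the basic or extended maximum and would reject the hypervisor range.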
In emulation scenarios, tools like QEMU either synthesize CPUID responses from a configured CPU model or pass host values through. The -cpu host option (which libvirt exposes as its host-passthrough mode) presents the host CPU model, features, and stepping to the guest, minimizing overhead for performance-critical workloads, while named CPU models (e.g., "Skylake-Server") synthesize fixed feature sets for broader compatibility. For Hyper-V emulation, QEMU overrides leaves 0x40000000 to 0x4000000A to mimic Microsoft's signatures when enlightenments are enabled. Passthrough depends on a hardware-assisted accelerator such as KVM and requires careful alignment with the hypervisor to avoid exposing unintended host details.[97]
Challenges in CPUID handling include feature masking for security and support for nested virtualization. Hypervisors often mask vulnerable features, such as those related to Spectre and Meltdown, by clearing specific bits in synthetic CPUID responses to prevent guest exploitation of host-side attacks; for instance, QEMU applies mitigations by default, exposing only safe feature subsets. In nested virtualization, where a guest hypervisor runs inside another VM, CPUID interception becomes layered, requiring the outer hypervisor to emulate inner ones accurately to avoid propagation of incorrect feature flags. This can introduce performance overhead from additional VM exits.[98][95]
As of 2025, technologies like Intel Trust Domain Extensions (TDX) and AMD Secure Encrypted Virtualization-SNP (SEV-SNP) mandate precise CPUID enumeration for attestation processes. In TDX, the module trusts and exposes most CPUID leaves to guests for integrity verification during remote attestation, ensuring the virtual trust domain's configuration matches expected processor capabilities. Similarly, SEV-SNP attestation reports incorporate CPUID-derived platform details, such as family and model, to cryptographically verify the guest's initial state against the host's secure processor. Inaccurate CPUID emulation here could invalidate attestations, compromising confidential computing deployments.[99][100][101]
Tools like the virt-what script leverage CPUID probes to identify virtualization contexts. It executes targeted CPUID calls to check for hypervisor signatures in standard and reserved leaves, outputting detected environments (e.g., "kvm" or "vmware") based on matching vendor IDs and feature bits, aiding system administrators in diagnosing virtual setups without invasive checks.[102]
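The signature-matching step can be sketched as a table lookup. This is not virt-what's actual implementation: the function name and the short output labels are illustrative, and the table lists only a few widely documented vendor strings from leaf 0x40000000.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: map a hypervisor vendor string (as returned in
 * EBX, ECX, EDX of leaf 0x40000000) to a short environment label. */
static inline const char *classify_hypervisor(const char *vendor)
{
    static const struct {
        const char *sig;   /* vendor ID string */
        const char *name;  /* illustrative label */
    } table[] = {
        { "KVMKVMKVM",    "kvm"    },
        { "Microsoft Hv", "hyperv" },
        { "VMwareVMware", "vmware" },
        { "XenVMMXenVMM", "xen"    },
    };

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(vendor, table[i].sig) == 0)
            return table[i].name;
    return "unknown";
}
```

A real detector would typically combine this lookup with other evidence, since a hypervisor may be configured to report a different or empty signature.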