Zen 3
Zen 3 is a central processing unit (CPU) microarchitecture developed by Advanced Micro Devices (AMD) as the successor to Zen 2, introduced with the Ryzen 5000 series desktop processors on November 5, 2020.[1] Fabricated on TSMC's 7 nm process node, it emphasizes performance efficiency through a chiplet-based design that integrates multiple core complex dies (CCDs) via Infinity Fabric.[2] The architecture delivers an average 19% uplift in instructions per clock (IPC) over Zen 2, driven by enhancements in branch prediction, a wider execution engine, and optimized pipeline throughput.[2] A defining feature of Zen 3 is its redesigned eight-core CCD, which unifies 32 MB of L3 cache accessible to all cores within the chiplet, doubling the directly available L3 cache per core compared to Zen 2 and significantly reducing inter-core communication latency for latency-sensitive workloads like gaming.[2] This configuration enables up to 16 cores and 32 threads in high-end desktop models such as the Ryzen 9 5950X, with boost clocks reaching 4.9 GHz and support for PCIe 4.0 and DDR4-3200 memory. 
In server applications, Zen 3 powers the EPYC 7003 "Milan" series, scaling to 64 cores per socket with up to 256 MB of L3 cache, 128 PCIe 4.0 lanes, and improved energy efficiency for data center tasks.[3] Mobile variants, including the Ryzen 5000 "Cezanne" APUs, integrate Zen 3 cores with Radeon graphics for laptops, offering up to eight cores and improved integrated graphics performance.[2] Zen 3 marked a pivotal advancement for AMD, establishing leadership in 1080p gaming performance at launch while remaining competitive in productivity and multi-threaded workloads against contemporaries such as Intel's 10th- and 11th-generation Core processors.[1] Its core-level improvements, together with carried-over features such as simultaneous multithreading (SMT) and refined prefetching, contributed to broad adoption across consumer, professional, and enterprise segments. The architecture remained relevant through refreshes such as Zen 3+ in mobile products until the transition to Zen 4.[2]
Development
Announcement and design goals
AMD first provided high-level details on Zen 3 during its EPYC "Rome" launch event on August 7, 2019, confirming that the microarchitecture would continue the chiplet-based design for scalability in high-core-count configurations. Roadmaps at the time placed it on TSMC's enhanced 7 nm process (7 nm+), although the shipping chiplets were ultimately built on TSMC's N7 node. This positioned Zen 3 as the successor to Zen 2. Early roadmap updates emphasized its role in extending AMD's competitive edge in both desktop and server markets through modular chiplet integration, which allows scaling beyond 16 cores without the manufacturing challenges of large monolithic dies. The design phase for Zen 3 was completed by mid-2019, with tape-out later that year. AMD targeted production readiness in 2020 while keeping desktop variants compatible with the existing AM4 socket.[4]
The development of Zen 3 was driven by intensifying competition, particularly as Intel faced repeated delays in moving to its 10 nm process, which hampered its ability to deliver competitive core counts and performance density. AMD aimed to solidify its market position by focusing on single-threaded performance improvements critical for gaming and productivity workloads. It also relied on the chiplet's flexibility to support up to 64 cores in server products such as the upcoming EPYC "Milan" without compromising efficiency.[5]
On October 8, 2020, AMD formally unveiled Zen 3 at a dedicated event, detailing key design goals including a 19% increase in instructions per clock (IPC) over Zen 2. AMD attributed the gain to several changes. A redesigned core complex consolidates the L3 cache into a single 32 MB domain per eight-core chiplet, reducing core-to-core latency, and this was paired with improved branch prediction and a wider execution engine.[1] The architecture targeted boost clocks slightly above Zen 2's 4.7 GHz, reaching up to 4.9 GHz, while improving power efficiency. This enabled higher sustained performance in latency-sensitive tasks without a significant increase in thermal design power.
Manufacturing and release
The compute chiplets of Zen 3 processors were fabricated on TSMC's 7 nm process, each featuring 4.15 billion transistors in an area of approximately 80.7 mm². The I/O die handles interconnects and peripheral interfaces. It was produced on GlobalFoundries' 12 nm process for desktop processors and on its 14 nm process for server processors, while the mobile APUs instead use a single monolithic die. This combination places the core logic on the leading-edge node and keeps I/O fabrication on a cheaper, more established one.[6] Engineering samples reached OEM partners in Q2 2020, allowing early validation and system integration. The official launch followed on November 5, 2020, introducing the Ryzen 5000 series desktop lineup, headlined by the 16-core Ryzen 9 5950X at a launch MSRP of $799. The modular chiplet architecture improved yields, because small dies are less likely to contain a defect than one large die. It also reduced costs: the same compute chiplets serve both desktop and server products, and the desktop I/O die is carried over from the Zen 2-based Ryzen 3000 series, which streamlines production and minimizes variant-specific redesigns.[7] Launch availability was limited by COVID-19-related supply chain constraints, resulting in widespread shortages and elevated resale prices for Ryzen 5000 processors in late 2020.[8]
Architecture
Core microarchitecture
The Zen 3 core is a superscalar, out-of-order design. It builds on the foundations of prior Zen generations while introducing targeted refinements for throughput and efficiency. Its decoders handle four x86 instructions per cycle, and the integer pipeline is 19 stages deep, supporting deep speculation and high-frequency operation. Each core runs two threads through simultaneous multithreading (SMT). Its dispatch stage can send up to six macro-operations per cycle to the integer and floating-point schedulers, keeping execution balanced across diverse workloads. https://www.agner.org/optimize/microarchitecture.pdf https://en.wikichip.org/wiki/amd/microarchitectures/zen_3
Integer execution is fed by four 24-entry schedulers (96 entries in total), which issue to four arithmetic logic units (ALUs) and three address generation units (AGUs). Two dedicated branch units keep control flow moving through the pipeline. The core is identified via CPUID function 0000_0001h, which returns family 19h and distinguishes it from Zen 2 (family 17h). Because Zen 4 also reports family 19h, the model number is needed as well (for example, 01h for Milan, 21h for Vermeer, and 50h for Cezanne). Each Zen 3 core complex (CCX) contains up to eight cores with shared access to the complex's resources, which benefits single-threaded performance. https://smartos.org/man/3cpc/amd_f19h_zen3_events https://en.wikichip.org/wiki/amd/microarchitectures/zen_3 https://wccftech.com/amd-zen-3-ryzen-4000-vermeer-cpus-detailed-up-to-16-cores-32-threads/
The floating-point unit pairs two 256-bit fused multiply-add (FMA) pipes with two 256-bit add pipes. This gives a peak of 16 double-precision or 32 single-precision floating-point operations per cycle with AVX2 instructions.
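The peak-throughput figures follow from simple arithmetic. The Python sketch below is illustrative only, and the helper names are hypothetical. It assumes two 256-bit FMA pipes per core and counts each FMA as two floating-point operations. It leaves out the add-only pipes, following the usual peak-FLOPs convention.

```python
# Illustrative sketch: peak vector FLOPs for an FMA-based FPU.
# Assumption: two 256-bit FMA pipes per Zen 3 core; each FMA counts as two
# floating-point operations (multiply + add). Add-only pipes are not counted.

ZEN3_FMA_PIPES = 2  # assumed FMA pipe count per core
VECTOR_BITS = 256   # AVX2 register width


def peak_flops_per_cycle(fma_pipes: int, vector_bits: int, element_bits: int) -> int:
    """Floating-point operations per cycle per core at peak FMA throughput."""
    lanes = vector_bits // element_bits  # elements per vector register
    return fma_pipes * lanes * 2         # x2 because an FMA is a multiply and an add


def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical chip-wide peak in GFLOPS (cores x GHz x FLOPs per cycle)."""
    return cores * clock_ghz * flops_per_cycle


dp = peak_flops_per_cycle(ZEN3_FMA_PIPES, VECTOR_BITS, 64)  # double precision
sp = peak_flops_per_cycle(ZEN3_FMA_PIPES, VECTOR_BITS, 32)  # single precision
print(dp, sp)  # 16 32

# Example: 16 cores at a 3.4 GHz base clock (Ryzen 9 5950X figures).
print(f"{peak_gflops(16, 3.4, dp):.1f} GFLOPS FP64")  # 870.4 GFLOPS FP64
```

Sustained all-core AVX2 clocks differ from the base clock. The result is therefore a theoretical ceiling, not a benchmark figure.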
Zen 3 also reduces FMA latency to 4 cycles, from 5 in Zen 2, raising throughput for dependent vector arithmetic. It does not support AVX-512, which AMD introduced with Zen 4. Store-to-load forwarding latency is 5 cycles, supporting efficient handling of data dependencies in numerical applications. https://www.realworldtech.com/forum/?threadid=195965&curpostid=195985 https://en.wikichip.org/wiki/amd/microarchitectures/zen_3 https://www.agner.org/optimize/microarchitecture.pdf
The load/store unit can perform up to three loads or two stores per cycle, with at most two 256-bit loads and one 256-bit store. Zen 2, by comparison, handles two loads and one store. Three AGUs compute addresses in parallel. Floating-point stores and FP-to-integer conversions move to dedicated pipes, so they no longer compete with arithmetic for the main FP pipes, which cuts latency by one to two cycles for some dependent operations. These changes reduce stalls in bandwidth-sensitive code and contribute to the core's overall 19% IPC uplift over Zen 2. https://www.nextplatform.com/2021/03/26/deep-dive-into-amds-milan-epyc-7003-architecture/ https://forums.anandtech.com/threads/design-changes-in-zen-3-cpu-core-chiplet-only.2585982/
Chiplet design and interconnect
The Zen 3 microarchitecture uses a multi-chip module (MCM) design. One or more compute chiplets, known as core complex dies (CCDs), connect to a central input/output (I/O) die through AMD's Infinity Fabric interconnect. Each CCD is fabricated on TSMC's 7 nm process and integrates a single eight-core core complex (CCX) with 32 MB of L3 cache shared among all eight cores. This departs from Zen 2, which placed two four-core CCXs on each CCD, each with its own 16 MB L3 slice. Because all eight cores now share one L3, cores on the same CCD no longer exchange data by way of the I/O die, which reduces core-to-core latency within the CCD compared with Zen 2's dual-CCX arrangement. Each CCD connects to the I/O die through an Infinity Fabric On-Package (IFOP) link with 16 bidirectional lanes, clocked by the Infinity Fabric clock (FCLK). Each link transfers up to 32 bytes read and 16 bytes write per FCLK cycle. At an FCLK of 1.8 GHz, that gives about 57.6 GB/s read and 28.8 GB/s write bandwidth, or about 86.4 GB/s combined, per link.[9] The I/O die is built on a 12 nm process for desktop processors and a 14 nm process for server processors. It contains the integrated memory controller, which supports DDR4-3200, and provides 24 PCIe 4.0 lanes on desktop parts and 128 PCIe 4.0 lanes on server parts. Within the I/O die, a mesh of Infinity Fabric routers carries traffic among the CCDs, memory, and external peripherals while maintaining cache coherence.[9] Server configurations, such as the EPYC "Milan" processors on the SP3 socket, support up to eight CCDs per package, for a maximum of 64 cores. All of these CCDs reach shared resources through the central I/O die.
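The per-link bandwidth figures are the bytes moved per FCLK cycle multiplied by the fabric clock. The minimal Python sketch below reproduces them. The helper name is hypothetical, and the code assumes the 32-byte read and 16-byte write widths quoted above.

```python
# Sketch: Infinity Fabric On-Package (IFOP) link bandwidth from per-cycle width.
# Assumption: 32 bytes read and 16 bytes write per FCLK cycle per CCD link.

READ_BYTES_PER_CLK = 32
WRITE_BYTES_PER_CLK = 16


def link_bandwidth_gbs(bytes_per_clk: int, fclk_ghz: float) -> float:
    """Bytes per cycle x 10^9 cycles per second = GB/s."""
    return bytes_per_clk * fclk_ghz


for fclk in (1.6, 1.8):
    read = link_bandwidth_gbs(READ_BYTES_PER_CLK, fclk)
    write = link_bandwidth_gbs(WRITE_BYTES_PER_CLK, fclk)
    print(f"FCLK {fclk} GHz: read {read:.1f} GB/s, write {write:.1f} GB/s, "
          f"combined {read + write:.1f} GB/s")
```

A 1.6 GHz FCLK is the 1:1 setting commonly paired with DDR4-3200. At that clock, a single link's read bandwidth works out to 51.2 GB/s, the same as the theoretical peak of dual-channel DDR4-3200 memory.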
Each Zen 3 CCD contains approximately 4.15 billion transistors, packing eight high-performance cores and their cache hierarchy into roughly 81 mm² of silicon. This chiplet approach lets AMD scale core counts flexibly. It also improves manufacturing yields by keeping compute logic on an advanced node, separate from the I/O functions on a more mature process.[3][10]
Key features and improvements
Performance enhancements
Zen 3 achieves an average 19% increase in instructions per clock (IPC) over Zen 2, with gains reaching up to 25% in certain integer-heavy workloads such as decompression and encryption. Much of the uplift comes from a larger out-of-order window: the reorder buffer grows to 256 entries from 224 in Zen 2, allowing deeper speculation and fewer stalls at instruction retirement. Faster recovery from branch mispredictions also raises integer throughput, helping the core handle complex code paths more efficiently.[11][12]
A key contributor to the IPC gains is the overhauled branch prediction unit. The L1 branch target buffer (BTB) doubles to 1024 entries from 512 in Zen 2, while the L2 BTB holds 6656 entries, slightly fewer than Zen 2's 7K. Zen 3 retains the perceptron-based predictor of earlier generations but adds higher prediction bandwidth and "zero-bubble" prediction for direct branches, improving accuracy over Zen 2 in branch-intensive benchmarks. These changes reduce pipeline bubbles, particularly in workloads with frequent conditional branches.[13][14][12]
On the execution side, dispatch remains six macro-operations per cycle, but the integer issue width grows from seven to ten ports, so up to ten integer operations can begin each cycle. Floating-point handling improves in two ways: FP stores and conversions get dedicated ports, and FMA latency falls to 4 cycles (from 5). Up to six FP micro-operations can be issued per cycle. SMT resource allocation between the two threads on each core is also refined, reducing contention in mixed workloads.
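IPC and clock speed combine multiplicatively in the usual first-order performance model, where instructions per second equal IPC times frequency. The Python sketch below applies that model using a hypothetical helper. Its inputs are AMD's average 19% IPC figure and the peak single-core boost clocks of the Ryzen 9 5950X (4.9 GHz) and its Zen 2 predecessor, the Ryzen 9 3950X (4.7 GHz). The model ignores memory latency, boost behavior under load, and workload variation, so the result is an estimate rather than a measurement.

```python
# First-order model: single-thread throughput ~ IPC x clock frequency.
# Assumed inputs: 19% average IPC uplift; 4.9 GHz (Zen 3) vs 4.7 GHz (Zen 2) boost.


def relative_performance(ipc_ratio: float, new_clock_ghz: float, old_clock_ghz: float) -> float:
    """Estimated speedup of the new core over the old one."""
    return ipc_ratio * (new_clock_ghz / old_clock_ghz)


speedup = relative_performance(1.19, 4.9, 4.7)
print(f"estimated single-thread speedup: {speedup:.3f}x")  # 1.241x
```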
Together with the unified cache design, these changes improve performance per watt by 15-20%, which helps sustain single-core boost clocks of up to 4.9 GHz.[11][12][14]
Cache and memory subsystem
The cache hierarchy in Zen 3 processors has three levels: private L1 and L2 caches for each core, and an L3 cache shared across the core complex. Each core has a 32 KiB, 8-way set-associative instruction cache (L1I) and a 32 KiB, 8-way set-associative data cache (L1D), both using 64-byte cache lines. The L1D is a write-back cache with a load-to-use latency of about 4 cycles, low enough to keep the out-of-order engine supplied.[12] The private L2 cache per core is 512 KiB and 8-way set-associative, also with 64-byte lines. It holds both instructions and data and is inclusive of the L1 caches, extending the effective capacity for frequently accessed data. Its hit latency of approximately 12 cycles balances size and speed while limiting pressure on higher levels.[12][13] At the core complex (CCX) level, Zen 3 unifies the L3 cache into a single 32 MB structure shared by all eight cores. This is a key change from Zen 2's split design, giving every core on the CCD direct access to the full capacity. The L3 is 16-way set-associative with 64-byte lines and acts as a victim cache, capturing lines evicted from the L2 caches. Its access latency of about 46 cycles is slightly higher than Zen 2's, a cost offset by each core reaching twice as much L3 directly.[12][15] On desktop processors, the I/O die integrates a dual-channel DDR4 controller supporting DDR4-3200, for a theoretical peak bandwidth of 51.2 GB/s.
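The 51.2 GB/s figure is the number of channels times the bus width times the transfer rate. The short Python sketch below computes it with a hypothetical helper, assuming the standard 64-bit (8-byte) DDR4 channel. The eight-channel line corresponds to the memory configuration of the EPYC 7003 series.

```python
# Theoretical peak DRAM bandwidth = channels x bytes per transfer x transfers/s.
# Assumption: standard 64-bit (8-byte) DDR4 channels.

DDR4_BUS_BYTES = 8


def ddr_peak_gbs(channels: int, mega_transfers: int) -> float:
    """Peak bandwidth in GB/s for `channels` channels at `mega_transfers` MT/s."""
    return channels * DDR4_BUS_BYTES * mega_transfers / 1000


print(ddr_peak_gbs(2, 3200))  # dual-channel desktop: 51.2
print(ddr_peak_gbs(8, 3200))  # eight-channel server: 204.8
```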
The memory controller connects to the compute chiplets through the Infinity Fabric interconnect, a design that prioritizes bandwidth over latency in multi-core scenarios.[16] Select later variants, such as the Ryzen 7 5800X3D and the EPYC 7003X "Milan-X" processors, add AMD's 3D V-Cache technology. It stacks an additional 64 MB of L3 cache on top of the core complex die using through-silicon vias (TSVs), for a total of 96 MB per eight-core CCX. The extra capacity is aimed mainly at gaming, where it reduces cache miss rates.[17][18]
Specifications
Processor tables
The Zen 3 processors encompass a range of desktop and server models without integrated graphics, emphasizing high-performance computing across segments. Desktop variants use the AM4 socket and provide 24 PCIe 4.0 lanes in total. Of these, 20 are available for devices such as GPUs and NVMe storage, and the remaining four connect to the chipset. Unlocked multipliers allow overclocking on applicable models.[19][20] Server configurations use the SP3 socket and provide up to 128 PCIe 4.0 lanes per processor for extensive I/O scalability.[21][22] The following table summarizes key specifications for representative desktop processors based on the Zen 3 architecture:
| Model | Cores/Threads | Base Clock (GHz) | Boost Clock (GHz) | L3 Cache (MB) | TDP (W) | Socket |
|---|---|---|---|---|---|---|
| Ryzen 9 5950X | 16/32 | 3.4 | 4.9 | 64 | 105 | AM4 |
| Ryzen 9 5900X | 12/24 | 3.7 | 4.8 | 64 | 105 | AM4 |
| Ryzen 7 5800X | 8/16 | 3.8 | 4.7 | 32 | 105 | AM4 |
| Ryzen 5 5600X | 6/12 | 3.7 | 4.6 | 32 | 65 | AM4 |
Representative server (EPYC 7003 "Milan") processors:
| Model | Cores/Threads | Base Clock (GHz) | Boost Clock (GHz) | L3 Cache (MB) | TDP (W) | Socket |
|---|---|---|---|---|---|---|
| EPYC 7763 | 64/128 | 2.45 | 3.5 | 256 | 280 | SP3 |
| EPYC 7543 | 32/64 | 2.80 | 3.70 | 256 | 225 | SP3 |
| EPYC 7443 | 24/48 | 2.85 | 4.00 | 128 | 200 | SP3 |
| EPYC 7303 | 16/32 | 2.40 | 3.40 | 64 | 130 | SP3 |
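Every Zen 3 CCD carries 32 MB of L3, so the tables can be used to infer how many CCDs each model enables and how many cores are active on each one. The Python sketch below does this for the listed models using a hypothetical helper. It assumes that each enabled CCD keeps its full 32 MB of L3 and that cores are spread evenly across CCDs. Neither assumption holds for 3D V-Cache parts.

```python
# Infer CCD count and active cores per CCD from the specification tables.
# Assumptions: 32 MB of L3 per enabled CCD (no 3D V-Cache); cores split evenly.

L3_PER_CCD_MB = 32

MODELS = {  # model: (cores, L3 MB), taken from the tables above
    "Ryzen 9 5950X": (16, 64),
    "Ryzen 9 5900X": (12, 64),
    "Ryzen 7 5800X": (8, 32),
    "Ryzen 5 5600X": (6, 32),
    "EPYC 7763": (64, 256),
    "EPYC 7543": (32, 256),
    "EPYC 7443": (24, 128),
    "EPYC 7303": (16, 64),
}


def ccd_layout(cores: int, l3_mb: int) -> tuple[int, int]:
    """Return (CCD count, active cores per CCD)."""
    ccds, leftover = divmod(l3_mb, L3_PER_CCD_MB)
    if leftover or ccds == 0 or cores % ccds:
        raise ValueError("specs do not fit the assumed 32 MB-per-CCD layout")
    return ccds, cores // ccds


for name, (cores, l3_mb) in MODELS.items():
    ccds, per_ccd = ccd_layout(cores, l3_mb)
    print(f"{name}: {ccds} CCD(s) x {per_ccd} cores")
```

For example, the 24-core EPYC 7443 comes out as four CCDs with six active cores each, and the 32-core EPYC 7543 as eight CCDs with four active cores each.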
APU tables
Zen 3-based APUs, codenamed Cezanne and its refresh Barcelo, combine up to eight Zen 3 CPU cores with integrated Radeon Vega graphics, targeting mobile client devices such as laptops and thin clients. These APUs use a monolithic die fabricated on TSMC's 7 nm process. They support DDR4-3200 or LPDDR4x-4266 memory and use soldered BGA packaging for compact, power-efficient form factors.[25] The integrated graphics use the Vega (GCN 5th generation) architecture with 6 to 8 compute units (CUs), providing up to 512 stream processors clocked as high as 2.1 GHz in higher-power variants. Their VCN 2.2 media engine accelerates decoding of formats such as H.264, HEVC, and VP9 without a discrete GPU. Hardware AV1 decoding arrived later, with the Zen 3+ "Rembrandt" APUs.[26][2] Power scaling across these APUs accommodates diverse laptop designs. Configurable thermal design power (cTDP) ranges from 10 W in ultra-low-power U-series models for thin-and-light devices to 54 W in H-series and HX variants for performance-oriented systems. Only the HX models have unlocked multipliers for overclocking, and on all models the actual power limits are set by the OEM within the cTDP range.[27][28]
U-series models:
| Model | Cores/Threads | Base Clock (GHz) | Boost Clock (GHz) | iGPU (CUs @ Peak GHz) | TDP (W) | Form Factor |
|---|---|---|---|---|---|---|
| Ryzen 7 5800U (Cezanne) | 8/16 | 1.9 | 4.4 | Radeon Vega 8 (8 @ 2.0) | 15 | BGA, soldered |
| Ryzen 5 5600U (Cezanne) | 6/12 | 2.3 | 4.2 | Radeon Vega 7 (7 @ 1.8) | 15 | BGA, soldered |
| Ryzen 3 5400U (Cezanne) | 4/8 | 2.6 | 4.0 | Radeon Vega 6 (6 @ 1.6) | 15 | BGA, soldered |
| Ryzen 7 5825U (Barcelo) | 8/16 | 2.0 | 4.5 | Radeon Vega 8 (8 @ 2.0) | 15 | BGA, soldered |
H-series (HS, H, and HX) models:
| Model | Cores/Threads | Base Clock (GHz) | Boost Clock (GHz) | iGPU (CUs @ Peak GHz) | TDP (W, configurable) | Form Factor |
|---|---|---|---|---|---|---|
| Ryzen 9 5980HS (Cezanne) | 8/16 | 3.0 | 4.8 | Radeon Vega 8 (8 @ 2.1) | 35 | BGA, soldered |
| Ryzen 7 5800H (Cezanne) | 8/16 | 3.2 | 4.4 | Radeon Vega 8 (8 @ 2.0) | 45 | BGA, soldered |
| Ryzen 9 5980HX (Cezanne) | 8/16 | 3.3 | 4.8 | Radeon Vega 8 (8 @ 2.1) | 45 (up to 54) | BGA, soldered |
| Ryzen 7 5825HS (Barcelo) | 8/16 | 3.0 | 4.5 | Radeon Vega 8 (8 @ 2.0) | 35-54 | BGA, soldered |
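The iGPU column can be translated into theoretical FP32 throughput with the standard GCN formula. The Python sketch below assumes 64 stream processors per Vega compute unit, consistent with 8 CUs giving the 512 stream processors quoted above. It also assumes one fused multiply-add per stream processor per clock, counted as two operations. The helper name is hypothetical, and real-world throughput is lower.

```python
# Theoretical FP32 throughput of a GCN/Vega iGPU:
# CUs x 64 stream processors x 2 FLOPs (one FMA) x clock in GHz = GFLOPS.

SP_PER_CU = 64


def vega_fp32_gflops(compute_units: int, clock_ghz: float) -> float:
    """Peak single-precision GFLOPS for a Vega iGPU configuration."""
    return compute_units * SP_PER_CU * 2 * clock_ghz


# (label, CUs, peak GHz) taken from the APU tables
for label, cus, ghz in [
    ("Radeon Vega 8 @ 2.1 GHz", 8, 2.1),
    ("Radeon Vega 8 @ 2.0 GHz", 8, 2.0),
    ("Radeon Vega 7 @ 1.8 GHz", 7, 1.8),
    ("Radeon Vega 6 @ 1.6 GHz", 6, 1.6),
]:
    print(f"{label}: {vega_fp32_gflops(cus, ghz):.1f} GFLOPS")
```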