RDNA 3
RDNA 3 is the third generation of AMD's Radeon DNA (RDNA) graphics processing unit (GPU) architecture. Unveiled on November 3, 2022, it is designed for high-performance gaming, content creation, and AI workloads.[1] Its larger discrete implementations introduce a chiplet-based design that combines a 5 nm Graphics Compute Die (GCD) with multiple 6 nm Memory Cache Dies (MCDs), enabling modular scalability and improved power efficiency.[1] Entry-level discrete GPUs and integrated graphics based on RDNA 3 instead use conventional monolithic dies. The architecture powers the AMD Radeon RX 7000 series graphics cards, starting with the flagship Radeon RX 7900 XTX and RX 7900 XT, which launched in December 2022 with up to 96 compute units (CUs), up to 24 GB of GDDR6 memory, and second-generation Infinity Cache providing up to 96 MB of on-package cache for an effective bandwidth exceeding 3 TB/s.[1] According to AMD, RDNA 3 delivers up to 54% better performance per watt than its predecessor RDNA 2, along with 1.7× faster 4K gaming performance, 1.8× better ray tracing performance, and 2.7× higher AI throughput.[1] At the core of RDNA 3's design are enhanced compute units, each equipped with 64 stream processors arranged as two 32-wide single instruction, multiple data (SIMD) units. Dual-issue execution of 32-bit floating-point (FP32) instructions effectively doubles each CU's peak FP32 throughput compared to RDNA 2.[2] The Navi 31 die used in the RX 7900 XTX carries 6,144 stream processors and reaches a peak FP32 rate of 61.4 TFLOPS at clock speeds up to 2.5 GHz.[2] The chiplet approach, with a single GCD handling compute tasks and up to six MCDs providing L3 cache and memory interfaces over die-to-die links with 5.3 TB/s of aggregate bandwidth, reduces manufacturing costs and improves silicon yield through smaller dies, while the interconnect itself consumes less than 5% of total GPU power.[1][2] RDNA 3 advances ray tracing with second-generation dedicated accelerators (up to 96 per GPU) featuring optimized bounding volume hierarchy (BVH) traversal and 1.5× larger vector general-purpose register (VGPR) files, resulting in up to 1.8× performance gains over RDNA 2 in ray-traced workloads.[2] For AI and machine learning tasks, it integrates 192 AI accelerators that execute wave matrix multiply-accumulate (WMMA) operations on formats including bfloat16 (BF16) and 4-bit integer (INT4), providing up to 2.7× faster matrix computations to accelerate features such as AI upscaling and denoising in games.[2] Additional enhancements include shader write compression optimized for Wave64 thread groups, improved workgroup tiling for better Infinity Cache utilization, support for DisplayPort 2.1 with up to 8K at 60 Hz, and new media engines with AV1 encoding and decoding.[3] These innovations make RDNA 3 a versatile architecture that extends beyond desktop graphics cards to the integrated graphics of AMD's Ryzen processors, with potential scalability to data center and embedded systems. As of 2025, a refined derivative, RDNA 3.5, powers the integrated graphics in AMD's Ryzen AI 300 series laptop processors.[2][4]
Background
Development and announcement
The chiplet-based design concept for RDNA 3 originated from a sketch by AMD corporate fellow Sam Naffziger on a hotel notepad during an off-site staff meeting, drawing on experience with Ryzen CPU chiplets.[5] Development progressed with input from AMD fellows such as Andy Pomianowski and Mike Mantor, targeting significant efficiency gains. AMD first teased RDNA 3 at Computex in June 2022 and officially unveiled the architecture on November 3, 2022.[6][1] The first products, the Radeon RX 7900 XTX and RX 7900 XT, launched on December 13, 2022.
Design goals and innovations
The primary design goals for AMD's RDNA 3 architecture centered on achieving a 50-70% performance uplift and an over 50% improvement in power efficiency compared to RDNA 2, primarily through chiplet modularity and advances in manufacturing processes. This approach allowed greater flexibility in scaling compute resources while optimizing energy consumption, enabling GPUs to deliver higher frame rates at the same power or equivalent performance at reduced wattage. For instance, RDNA 3 GPUs were engineered to run at 1.3 times the frequency of RDNA 2 equivalents within the same power envelope, or at RDNA 2 frequencies while consuming roughly half the power.[2][1] A key innovation in realizing these goals was a dual-process-node strategy: TSMC's 5 nm node for the Graphics Compute Die (GCD), maximizing compute density and performance, and the more cost-effective 6 nm node for the Memory Cache Dies (MCDs), which handle memory and caching functions. This hybrid fabrication balanced high performance with economical production, reducing overall die costs without compromising efficiency targets. The GCD, measuring roughly 300 mm², concentrates the processing-intensive work, whereas the MCDs, each about 37.5 mm², hold the Infinity Cache and GDDR6 memory controllers, minimizing memory latency and external bandwidth demand.[1][2] RDNA 3 also emphasized scalability across discrete graphics cards, integrated graphics in processors, and professional workstations, using the chiplet-based design to pair varying numbers of compute units with varying numbers of cache dies.
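Why smaller dies help yield can be illustrated with the textbook Poisson yield model, Y = exp(-A × D0), where A is die area and D0 is defect density. The Python sketch below is a hypothetical illustration, not AMD's cost model: the defect density of 0.1 defects/cm² is an assumed value (real figures are not published and differ between the 5 nm and 6 nm nodes), and the "monolithic" die is an imaginary single die holding the same silicon as one roughly 300 mm² GCD plus six 37.5 mm² MCDs.

```python
import math

# ASSUMPTION: one illustrative defect density for every die. Real values are
# not published by AMD or TSMC and differ between the 5 nm and 6 nm nodes.
DEFECTS_PER_CM2 = 0.1

GCD_MM2 = 300.0  # approximate Navi 31 GCD area (from the text)
MCD_MM2 = 37.5   # area of one MCD (from the text)
# HYPOTHETICAL monolithic die holding the same silicon as one GCD plus six MCDs.
MONOLITHIC_MM2 = GCD_MM2 + 6 * MCD_MM2


def poisson_yield(area_mm2: float, d0_per_cm2: float = DEFECTS_PER_CM2) -> float:
    """Fraction of defect-free dies under the simple Poisson model Y = exp(-A * D0)."""
    area_cm2 = area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * d0_per_cm2)


if __name__ == "__main__":
    for name, area in (("GCD", GCD_MM2), ("MCD", MCD_MM2), ("monolithic", MONOLITHIC_MM2)):
        print(f"{name:>10}: {area:6.1f} mm^2 -> {poisson_yield(area):.1%} defect-free")
```

With these assumed inputs the GCD comes out at roughly 74% defect-free, each MCD at about 96%, and the hypothetical 525 mm² monolithic die at about 59%. A defective MCD also wastes only 37.5 mm² of cheaper 6 nm silicon. The model ignores salvage of partially defective dies (for example, by disabling compute units), which real products also rely on.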
Chiplet modularity supports configurations of up to 96 compute units in high-end discrete GPUs, while monolithic implementations of the same architecture cover entry-level discrete cards and compact integrated graphics, giving the architecture broad applicability while preserving its efficiency and performance objectives.[2][1] Among its innovations, RDNA 3 uses AMD Infinity Fabric for die-to-die interconnects, providing up to 5.3 TB/s of bandwidth between the GCD and MCDs and reducing bottlenecks in memory access. The architecture also adds AI acceleration through Wave Matrix Multiply-Accumulate (WMMA) instructions, which perform matrix operations across an entire wavefront in formats such as FP16 and BF16, potentially doubling AI inference throughput compared to prior generations.[1][2][7]
Architecture
Chiplet packaging and interconnects
RDNA 3 employs a multi-chip module (MCM) design, representing the first implementation of chiplets in a consumer graphics processing unit, with a single Graphics Compute Die (GCD) serving as the primary compute element connected to multiple Memory Cache Dies (MCDs).[1] This modular approach builds on AMD's prior success with chiplet-based central processing units, such as the Ryzen series, by extending high-bandwidth interconnects to GPU architectures for improved scalability.[2] The dies are integrated using TSMC's Integrated Fan-Out Re-Distribution Layer (InFO-RDL) packaging technology, which provides dense, high-performance fanout interconnects without relying on traditional silicon bridges.[2] Inter-die communication occurs via AMD's Infinity Fabric links, delivering up to 5.3 TB/s of bidirectional bandwidth while consuming less than 5% of the GPU's total power.[1][2] Compared to the monolithic designs of previous architectures such as RDNA 2, the chiplet strategy improves manufacturing yields by producing smaller, more manageable dies on process nodes suited to each function (5 nm for the GCD and 6 nm for the MCDs), while reducing overall costs through reusable components and making memory capacity easier to scale.[2] In high-end configurations, a single GCD connects to up to six MCDs, allowing up to 96 MB of shared Infinity Cache while maintaining efficient power delivery and thermal management.[1]
Graphics Compute Die (GCD)
The Graphics Compute Die (GCD) serves as the primary processing component in AMD's RDNA 3 architecture, executing the core graphics and compute operations within the multi-chiplet GPU design. Fabricated on TSMC's 5 nm process node, the GCD places the GPU's compute resources on a single die, enabling high-performance rasterization, ray tracing, and general-purpose computing.[2][8] In its full Navi 31 configuration, the GCD contains approximately 45.7 billion transistors in a die area of about 300 mm².[2][8][9] It houses up to 96 compute units with programmable shaders optimized for both graphics rendering and parallel compute, alongside dedicated accelerators for specific domains.[2][9] The GCD also incorporates fixed-function hardware such as the media engine for video encoding and decoding and the ray accelerators for hardware ray tracing, keeping the rendering pipeline centralized.[2][8] As the central hub of the GPU, the GCD coordinates data flow and execution across the system and connects directly to the Memory Cache Dies (MCDs) through AMD's Infinity Fabric-based chiplet interconnects to reach shared memory resources.[2][8] To serve different market segments, a single GCD design ships in several configurations: flagship products enable every compute unit, while cut-down products use the same die with some compute units disabled to lower cost and power (the Radeon RX 7900 XT, for example, enables 84 of Navi 31's 96 CUs). Mid-range and entry-level products use separate, smaller dies, Navi 32 and Navi 33.[2][9] This approach lets AMD derive multiple GPU SKUs from a common GCD foundation, balancing performance and efficiency.[8]
Memory Cache Dies (MCDs)
The Memory Cache Dies (MCDs) in AMD's RDNA 3 architecture are auxiliary chiplets dedicated to memory interfacing and caching, enabling a modular chiplet-based GPU design.[2] Each MCD is fabricated on TSMC's 6 nm process node and measures 37.5 mm².[1] Its 2.05 billion transistors are devoted to memory-related tasks rather than compute.[2] Each MCD carries 16 MB of second-generation Infinity Cache, which serves as the GPU's L3 cache.[1] Every MCD also provides a 64-bit GDDR6 memory interface, the physical link to high-speed memory.[2] In high-end configurations such as Navi 31, six MCDs are integrated into a single package, for a total of 96 MB of Infinity Cache and a 384-bit memory bus capable of up to 960 GB/s of bandwidth.[1][2] By offloading memory and cache responsibilities to the MCDs, the design reduces access latency and raises effective bandwidth for the Graphics Compute Die (GCD), which reaches the MCDs through high-speed Infinity Fabric links providing up to 5.3 TB/s of L3-to-L2 bandwidth.[2] Separating memory from compute also aids power efficiency: the Infinity Fabric links consume less than 5% of total GPU power, while the architecture as a whole achieves up to 54% better performance per watt than RDNA 2.[1][2] Thermally, the smaller, specialized MCDs distribute heat more evenly across the package, improving cooling and allowing higher sustained frequencies without excessive hotspots.[2]
Compute units and shaders
The compute units (CUs) in RDNA 3 are the core programmable processing elements that execute shaders for both graphics rasterization and general-purpose compute. Each CU contains two SIMD32 units, for a total of 64 stream processors (ALU lanes) per CU, within a unified shader architecture that executes wave32 wavefronts of 32 threads each. The design adds dual-issue capability through VOPD instructions, which issue two independent vector ALU operations at once in wave32 mode; for eligible operations such as FP32 additions, this doubles instruction throughput compared to RDNA 2's single-issue SIMDs.[10][11] Enhancements to scalar and vector operations further improve execution efficiency. Up to 106 scalar general-purpose registers (SGPRs) are available per wavefront, plus additional registers such as VCC and trap-handler temporaries, while vector general-purpose registers (VGPRs) support packed 16-bit formats for higher density in AI and graphics pipelines. Wave64 mode is retained for compatibility: instructions are dual-issued across the low and high 32-thread halves of a 64-thread wavefront, with data replicated across the halves where operations such as matrix multiplies require it. In its full configuration, the Graphics Compute Die (GCD) supports up to 96 CUs, delivering a peak FP32 performance of 61 TFLOPS at reference clocks.[10][11][2] RDNA 3's unified shader model supports modern graphics and compute APIs, including DirectX 12 Ultimate with features such as mesh shaders, Vulkan for cross-platform rendering, and OpenCL for heterogeneous computing. Compute, pixel, geometry, and hull shaders all use a shared instruction set with arithmetic, bitwise, and conversion operations across FP16, FP32, and FP64 data types.
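The 61 TFLOPS peak figure can be reproduced with a short calculation. The Python sketch below assumes the usual conventions for peak-rate figures: a fused multiply-add counts as two floating-point operations, dual issue doubles the per-lane rate, and the clock is the 2.5 GHz boost clock quoted for the RX 7900 XTX. The function name `peak_fp32_tflops` is hypothetical.

```python
def peak_fp32_tflops(compute_units: int, clock_ghz: float,
                     lanes_per_cu: int = 64, dual_issue: bool = True) -> float:
    """Peak FP32 TFLOPS = CUs x lanes x 2 (FMA) x issue factor x clock."""
    flops_per_lane_per_clock = 2           # one fused multiply-add counts as two FLOPs
    issue_factor = 2 if dual_issue else 1  # dual issue doubles peak FP32 ops per lane
    flops_per_clock = compute_units * lanes_per_cu * flops_per_lane_per_clock * issue_factor
    return flops_per_clock * clock_ghz * 1e9 / 1e12


if __name__ == "__main__":
    # Navi 31 as configured in the RX 7900 XTX: 96 CUs at a 2.5 GHz boost clock.
    print(f"{peak_fp32_tflops(96, 2.5):.2f} TFLOPS")
    # Same CU count without dual issue, as with a single-issue RDNA 2-style SIMD.
    print(f"{peak_fp32_tflops(96, 2.5, dual_issue=False):.2f} TFLOPS")
```

This prints 61.44 TFLOPS, and 30.72 TFLOPS with dual issue disabled. The result is a theoretical ceiling: only instruction pairs that meet the VOPD pairing rules (or eligible wave64 FP32 operations) issue twice per clock, so sustained throughput in real shaders is lower.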
For compute-intensive tasks, the local data share (LDS) provides 64 KB per workgroup with banked access to minimize conflicts, supporting up to 1024 work-items per workgroup across 32 active workgroups per workgroup processor.[10] AI workloads are served by Wave Matrix Multiply-Accumulate (WMMA) instructions, which accelerate matrix operations for machine learning. WMMA operates on 16×16×16 matrix tiles, for example accumulating in F32 from F16 or BF16 inputs, and uses VOP3P encoding to control signed/unsigned handling and rounding modes; the computation runs on the compute units' existing SIMD hardware rather than on standalone matrix engines. Combined with dual-issue vector operations, these features improve performance in the dense matrix operations common to inference and training pipelines.[10][11]
Ray tracing hardware
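The core operations that ray tracing accelerators speed up, testing a ray against the axis-aligned bounding boxes of a bounding volume hierarchy (BVH) and pruning every subtree the ray misses, can be sketched in software. The Python example below is a minimal, hypothetical illustration using the standard slab method; `Node`, `ray_aabb_hit`, `traverse`, and `example_bvh` are invented names, and the node format is simplified. It does not reflect AMD's hardware design or BVH layout, and it omits the final ray/triangle test.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def ray_aabb_hit(origin: Sequence[float], direction: Sequence[float],
                 box_min: Sequence[float], box_max: Sequence[float],
                 t_max: float = math.inf) -> bool:
    """Slab test: does origin + t * direction, for 0 <= t <= t_max, touch the box?"""
    t_near, t_far = 0.0, t_max
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of slabs; it must start between them.
            if o < lo or o > hi:
                return False
            continue
        inv_d = 1.0 / d
        t0, t1 = (lo - o) * inv_d, (hi - o) * inv_d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False  # slab intervals do not overlap: miss
    return True


@dataclass
class Node:
    box_min: Vec3
    box_max: Vec3
    children: List["Node"] = field(default_factory=list)
    primitive_id: Optional[int] = None  # set only on leaf nodes


def traverse(root: Node, origin: Vec3, direction: Vec3) -> List[int]:
    """Stack-based BVH traversal; returns IDs of leaves whose boxes the ray hits."""
    hits: List[int] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not ray_aabb_hit(origin, direction, node.box_min, node.box_max):
            continue  # prune this node and its whole subtree
        if node.primitive_id is not None:
            # A real tracer would run a ray/triangle test here.
            hits.append(node.primitive_id)
        stack.extend(node.children)
    return hits


def example_bvh() -> Node:
    """Two leaf boxes under one root: leaf 0 lies on the +x axis, leaf 1 above it."""
    leaf0 = Node((1.0, -0.5, -0.5), (2.0, 0.5, 0.5), primitive_id=0)
    leaf1 = Node((1.0, 2.0, -0.5), (2.0, 3.0, 0.5), primitive_id=1)
    return Node((0.0, -1.0, -1.0), (3.0, 3.0, 1.0), children=[leaf0, leaf1])


if __name__ == "__main__":
    print(traverse(example_bvh(), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)))
```

A ray fired along +x from the origin prints `[0]`: the root and leaf 0 pass the box test, while leaf 1 is rejected without further work. In hardware, many such box and triangle tests are evaluated in parallel, which is where dedicated accelerators pay off.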
RDNA 3 introduces second-generation ray tracing accelerators, building on the hardware first implemented in RDNA 2 to enhance real-time ray tracing.[12] These dedicated units accelerate key operations in ray tracing pipelines, such as bounding volume hierarchy (BVH) traversal and ray-primitive intersection testing, allowing more efficient handling of complex lighting, shadows, and reflections in games and applications.[3] Each compute unit includes one ray tracing accelerator, so the count scales with the GPU configuration: high-end variants such as the Radeon RX 7900 XTX feature 96 accelerators, which collectively process BVH nodes during traversal and perform ray-triangle intersection tests.[2] The accelerators integrate closely with the shader cores, enabling hybrid rendering pipelines that combine traditional rasterization with ray-traced effects: shaders manage ray generation and any-hit/closest-hit invocations, while intersection computations are offloaded to the dedicated hardware.[13] The architecture supports industry-standard ray tracing APIs, including DirectX Raytracing (DXR) in DirectX 12 Ultimate and the Vulkan ray tracing extensions, ensuring compatibility with a wide range of modern game engines and software.[12] Performance improvements stem from architectural optimizations, including larger vector general-purpose register (VGPR) files (1.5 times the capacity of RDNA 2) that allow more rays in flight per shader, and enhanced cache hierarchies that reduce latency during BVH traversal.[2] RDNA 3 delivers up to 1.8 times the ray tracing performance of RDNA 2, driven by these changes along with higher clock speeds and a higher accelerator count.[2] In Cyberpunk 2077, for instance, the RX 7900 XTX reaches about 5.22 billion ray-triangle intersection tests per second, versus 3.21 billion on an RDNA 2 equivalent, roughly 1.6 times higher.[13]
Cache and memory subsystem
The RDNA 3 architecture employs a multi-level cache hierarchy designed to optimize data access for graphics workloads, with higher capacity and bandwidth than prior generations. At the lowest level, the vector L0 cache provides 32 KB per compute unit (CU), double the size in RDNA 2, keeping data close to the shaders.[11] The L1 cache, shared across a shader array, offers 256 KB with 16-way set associativity, also doubled from RDNA 2's 128 KB; it serves scalar and vector data accesses with improved latency and approximately double the bandwidth of its predecessor.[11][2] The L2 cache is a centralized resource on the Graphics Compute Die (GCD) with 6 MB of total capacity, a 50% increase over RDNA 2's 4 MB; it is 16-way set associative, aggregates data from multiple L1 caches, and reduces pressure on higher levels.[11][2] The L2 delivers roughly double the bandwidth of RDNA 2's, aiding coherent memory access across the GCD.[11] Above the L2 sits the second-generation Infinity Cache, an L3 cache with up to 96 MB of total capacity distributed across the Memory Cache Dies (MCDs), each contributing 16 MB.[14][11] Implemented as a 16-way set-associative structure on the MCDs, it provides a 1.8× bandwidth increase over RDNA 2's implementation (with a theoretical peak of up to 2.7×), although access latency is slightly higher because the cache sits on chiplets separate from the GCD.[11][15] Each MCD contains two 32-bit memory controllers; on the 384-bit bus of top-end variants such as Navi 31, GDDR6 memory at up to 20 Gbps yields a peak bandwidth of 960 GB/s.
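Peak GDDR6 bandwidth follows directly from the per-pin data rate and the bus width: bandwidth in GB/s equals the data rate in Gb/s times the bus width in bits, divided by 8. The Python sketch below checks the figures quoted for the three dies. The 19.5 Gb/s rate used for the 256-bit Navi 32 configuration is the rate of the RX 7800 XT's memory, consistent with its 624 GB/s figure; the function and table names are hypothetical.

```python
def peak_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak DRAM bandwidth (GB/s) = per-pin data rate (Gb/s) x bus width (bits) / 8."""
    return data_rate_gbps * bus_width_bits / 8


# (configuration, per-pin GDDR6 data rate in Gb/s, bus width in bits)
CONFIGS = [
    ("Navi 31, 384-bit", 20.0, 384),
    ("Navi 32, 256-bit", 19.5, 256),
    ("Navi 33, 128-bit", 18.0, 128),
]

if __name__ == "__main__":
    for name, rate, width in CONFIGS:
        print(f"{name}: {peak_bandwidth_gbs(rate, width):.0f} GB/s")
```

This yields 960, 624, and 288 GB/s respectively. These are raw DRAM figures; Infinity Cache hits raise the effective bandwidth above them.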
Lower variants scale down to 128-bit or 256-bit buses with correspondingly reduced bandwidth, such as 288 GB/s for the 128-bit Navi 33 or 624 GB/s for the 256-bit Navi 32.[16] The Infinity Cache augments this by acting as a victim cache for the L2, boosting effective memory bandwidth beyond raw GDDR6 figures on cache hits, with aggregate GCD-to-MCD interconnect bandwidth reaching up to 5.3 TB/s.[17][11] This design prioritizes high-throughput access for compute-intensive tasks while managing power efficiency in the chiplet configuration.
Media engine
The media engine in AMD's RDNA 3 architecture utilizes the Video Core Next (VCN) 4.0 IP block, a dedicated hardware component for video encoding and decoding integrated into the Graphics Compute Die (GCD). This engine enables efficient processing of high-resolution video streams, supporting applications in gaming, live streaming, and professional content creation. VCN 4.0 represents an evolution from prior generations by incorporating hardware acceleration tailored for modern video codecs and workflows.[18] A key feature of VCN 4.0 is its native support for the AV1 codec, providing both encoding and decoding capabilities at resolutions up to 8K at 60 Hz. This allows for up to 7X faster 8K video encoding compared to CPU-based software solutions, significantly reducing processing time for high-quality video output. In addition to AV1, the engine handles H.264 (AVC), H.265 (HEVC), and VP9 codecs for both encode and decode operations, ensuring broad compatibility with existing media formats. The AV1 implementation, in particular, delivers improved compression efficiency for bandwidth-intensive tasks like 8K streaming.[14] The RDNA 3 media engine features a dual-engine design, enabling up to two simultaneous encode or decode streams—such as one 8K60 HEVC decode paired with an AV1 encode—without performance degradation. This multi-format capability supports concurrent processing of different codecs, making it suitable for scenarios like real-time video editing or multi-camera streaming. For enhanced flexibility, the engine integrates with the GPU's programmable shaders to perform video post-processing tasks, including scaling, color correction, and effects application, leveraging RDNA 3's compute resources for superior output quality.[14] Overall, VCN 4.0's architecture prioritizes power efficiency, achieving up to 1.8X higher engine clock speeds than RDNA 2 equivalents while consuming less energy during video workloads. 
This design benefits streaming platforms and content creators by minimizing thermal output and extending battery life in mobile configurations, without sacrificing throughput for resolutions up to 8K.[14]
Display engine
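The headline display figures for RDNA 3 can be sanity-checked with simple arithmetic. The Python sketch below is a minimal illustration under stated assumptions: a DisplayPort 2.1 UHBR13.5 link of four lanes at 13.5 Gb/s each, 128b/132b channel coding, and only active pixels counted. It therefore gives an upper bound on uncompressed refresh rates, ignoring blanking intervals and other protocol overhead, which is why the engine's quoted 4K limits (229 Hz at 8 bits and 187 Hz at 10 bits per channel) are lower. The function names are hypothetical.

```python
# ASSUMPTIONS: DisplayPort 2.1 UHBR13.5 link, 4 lanes x 13.5 Gb/s,
# 128b/132b channel coding; blanking and packet overhead are ignored.
LANES = 4
LANE_RATE_GBPS = 13.5
CODING_EFFICIENCY = 128 / 132


def colors(bits_per_channel: int) -> int:
    """Number of distinct RGB colors at a given per-channel bit depth."""
    return 2 ** (3 * bits_per_channel)


def max_refresh_hz(width: int, height: int, bits_per_channel: int) -> float:
    """Upper bound on the uncompressed refresh rate, counting active pixels only."""
    payload_bps = LANES * LANE_RATE_GBPS * 1e9 * CODING_EFFICIENCY
    bits_per_frame = width * height * 3 * bits_per_channel
    return payload_bps / bits_per_frame


if __name__ == "__main__":
    print(f"12-bit color: {colors(12):,} colors")
    for bpc in (8, 10):
        print(f"4K at {bpc} bits per channel: at most {max_refresh_hz(3840, 2160, bpc):.0f} Hz")
```

The color count, 2^36 or about 68.7 billion, matches the "68 billion colors" figure for 12 bits per channel. The refresh bounds come out at about 263 Hz (8-bit) and 210 Hz (10-bit), above the quoted limits as expected once real link overhead is accounted for.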
The display engine in AMD's RDNA 3 architecture is the Radiance Display Engine, which drives high-resolution and high-refresh-rate monitors. Integrated into the Graphics Compute Die (GCD), it supports DisplayPort 2.1 with UHBR 13.5 (up to 54 Gbps bandwidth) and HDMI 2.1a interfaces.[14][2] Key capabilities include 12 bits per channel color depth, supporting up to 68 billion colors and the full REC 2020 color gamut for HDR content. Without compression, it can drive 4K resolutions at 229 Hz (8 bits per channel) or 187 Hz (10 bits per channel). With Display Stream Compression (DSC), it supports 4K at up to 480 Hz or 8K at 165 Hz. These features enhance visual fidelity and smoothness in gaming and professional applications, building on prior generations with improved bandwidth and color accuracy.[14][2]
Power efficiency enhancements
RDNA 3 achieves more than a 50% improvement in performance per watt over the RDNA 2 architecture, driven by advanced process nodes and a chiplet-based design that optimizes power distribution. The Graphics Compute Die (GCD) is fabricated on TSMC's 5 nm node, while the Memory Cache Dies (MCDs) use the 6 nm node, enabling tailored manufacturing that balances performance and energy use across specialized functions. This combination, coupled with the chiplet separation, reduces power density by isolating high-power compute elements from lower-power memory and I/O components, resulting in decreased leakage currents and improved thermal efficiency for sustained operation.[14][19] Dynamic power management in RDNA 3 features refined adaptive techniques, including fine-grained clock gating at the individual compute unit level, which allows for granular control over clock signals to idle or underutilized sections, minimizing dynamic power consumption without compromising responsiveness. This approach builds on RDNA 2's foundations but incorporates tighter integration with the modular chiplet structure to handle varying workloads more efficiently. The overall design supports TDP configurations ranging from 355 W in high-end discrete implementations to 55 W in integrated variants, demonstrating scalability across power envelopes.[20][21][22] Efficiency gains are particularly notable in ray tracing (RT) and AI workloads, where RDNA 3 delivers up to 2× better performance per watt than RDNA 2 in select benchmarks, facilitated by second-generation RT accelerators and dedicated AI matrix cores. These enhancements stem from unified compute units that streamline RT traversal and AI tensor operations, reducing overhead and improving throughput while maintaining lower power draw relative to prior generations. 
The modular architecture also helps by enabling higher sustained clocks through better power delivery and reduced hotspots, ensuring consistent efficiency under prolonged loads.[14][23]
Die variants
| Die Variant | Compute Units (CUs) | Die Size (mm²) | Manufacturing Node | Memory Bus Width (bits) |
|---|---|---|---|---|
| Navi 31 | 96 | 304 (GCD) + 6 × 37.5 (MCDs) | 5 nm GCD, 6 nm MCDs | 384 |
| Navi 32 | 60 | 200 (GCD) + 4 × 37.5 (MCDs) | 5 nm GCD, 6 nm MCDs | 256 |
| Navi 33 | 32 | 204 (monolithic) | 6 nm | 128 |
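Several figures quoted for each die follow from a few per-unit constants. The Python sketch below assumes 64 stream processors per CU and, for the chiplet dies, one MCD per 64 bits of memory bus, each carrying 16 MB of Infinity Cache. Navi 33 is modeled as monolithic because its 32 MB of Infinity Cache is on-die rather than on MCDs. The `Die` class and helper functions are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Die:
    name: str
    compute_units: int
    bus_width_bits: int
    chiplet: bool  # True if the memory interface lives on separate MCDs


STREAM_PROCESSORS_PER_CU = 64
MCD_BUS_BITS = 64  # each MCD provides a 64-bit GDDR6 interface
MCD_CACHE_MB = 16  # each MCD carries 16 MB of Infinity Cache


def stream_processors(die: Die) -> int:
    return die.compute_units * STREAM_PROCESSORS_PER_CU


def mcd_count(die: Die) -> int:
    """Number of MCDs needed for the given bus width; 0 for a monolithic die."""
    if not die.chiplet:
        return 0
    if die.bus_width_bits % MCD_BUS_BITS:
        raise ValueError("bus width must be a multiple of 64 bits")
    return die.bus_width_bits // MCD_BUS_BITS


DIES = [
    Die("Navi 31", 96, 384, chiplet=True),
    Die("Navi 32", 60, 256, chiplet=True),
    Die("Navi 33", 32, 128, chiplet=False),
]

if __name__ == "__main__":
    for d in DIES:
        n = mcd_count(d)
        cache = f"{n * MCD_CACHE_MB} MB of Infinity Cache on MCDs" if n else "on-die Infinity Cache"
        print(f"{d.name}: {stream_processors(d)} stream processors, {n} MCDs, {cache}")
```

For the full dies this reproduces 6,144, 3,840, and 2,048 stream processors, and six and four MCDs (96 MB and 64 MB of cache) for Navi 31 and Navi 32. A cut-down product such as the RX 7700 XT, which pairs 54 CUs with a 192-bit bus, maps to three MCDs and 48 MB of cache when modeled with the narrower bus.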
Navi 31
The Navi 31 is the flagship Graphics Compute Die (GCD) variant in AMD's RDNA 3 architecture, designed for high-end discrete graphics. Its full GCD, fabricated on a 5 nm process node, has 96 compute units (CUs) comprising 6,144 stream processors; together with its six 6 nm MCDs, the package holds about 58 billion transistors. In the RX 7900 XTX it runs at a game clock of 2.3 GHz and a boost clock of up to 2.5 GHz, for a peak theoretical FP32 performance of approximately 61 TFLOPS. This configuration represents a significant scaling from prior generations, emphasizing parallelism for demanding rasterization and compute workloads.[26][27][2] Navi 31's six Memory Cache Dies (MCDs) each contribute 16 MB of Infinity Cache for a total of 96 MB of L3 cache, which helps mitigate latency in memory-bound scenarios. The memory subsystem uses a 384-bit interface to GDDR6 memory running at 20 Gbps, delivering up to 960 GB/s of bandwidth for high-resolution gaming and professional visualization. The die also carries 96 dedicated ray tracing accelerators, one per CU, providing up to 1.8 times the ray tracing performance of RDNA 2 at equivalent power levels, along with AV1 hardware encoding at up to 8K60.[26][9][28] In discrete configurations, Navi 31 powers the Radeon RX 7900 XTX and RX 7900 XT for gaming and the Radeon Pro W7900 for professional workloads, with board power reaching 355 W on the RX 7900 XTX. This power envelope balances high performance with performance-per-watt gains of over 50% relative to the previous architecture, facilitated by dual-issue wavefront execution and optimized cache hierarchies.[26][29][2]
Navi 32
The Navi 32 is a mid-range graphics compute die (GCD) in AMD's RDNA 3 architecture, designed for discrete GPUs targeting 1080p and 1440p gaming with a balance of performance and power efficiency.[30][31] The GCD is fabricated on TSMC's 5 nm process and supports up to 60 compute units (CUs), or 3,840 stream processors when fully enabled; the complete package, including its MCDs, contains 28.1 billion transistors.[2][32] Configurations vary: the Radeon RX 7800 XT uses all 60 CUs, while the RX 7700 XT enables 54 CUs for a more restrained power envelope.[30][31] Clock speeds emphasize efficiency, with game clocks of up to 2.2 GHz and boost clocks of up to 2.6 GHz depending on the product and cooling solution.[33][34] The full Navi 32 package integrates four memory cache dies (MCDs), each contributing 16 MB of Infinity Cache for a total of 64 MB, improving bandwidth efficiency in games.[35] On the RX 7800 XT, the 256-bit GDDR6 interface runs at 19.5 Gbps for a peak bandwidth of 624 GB/s, while the RX 7700 XT uses a 192-bit bus with three active MCDs, 12 GB of VRAM, and 48 MB of cache to suit mid-range builds.[33][34] The RX 7800 XT delivers high frame rates at 1440p with a 263 W total board power, while the RX 7700 XT targets similar resolutions at 245 W; both outperform prior-generation mid-range cards in rasterization by leveraging dual-issue FP32 execution and improved instruction scheduling.[30][31] Ray tracing and AI hardware scale with the enabled CU count: a fully enabled Navi 32 provides 60 ray accelerators and 120 AI accelerators, while AV1 encoding and decoding are handled by the separate media engine.[30] These features support hardware-accelerated ray tracing in games and efficient video workloads, with the RX 7700 XT's 54 ray accelerators remaining competitive in hybrid rendering at 1080p and 1440p.[31] Overall, Navi 32's design focuses on value-oriented gaming without excessive power draw, making it suitable for mainstream discrete GPUs in the Radeon RX 7000 series.[36]
Navi 33
Navi 33 is the entry-level die in AMD's RDNA 3 architecture, designed for budget discrete GPUs. Unlike Navi 31 and Navi 32, it is a monolithic die fabricated on TSMC's 6 nm process node rather than a chiplet design. It features 32 compute units (CUs), providing 2,048 stream processors for rasterization and compute workloads, and contains 13.3 billion transistors in 204 mm², making it a compact implementation optimized for power efficiency in mainstream scenarios.[25][37] Clock speeds are tuned for balanced performance, with a game clock of up to 2.25 GHz and a boost clock of 2.655 GHz in reference configurations. Its 32 MB of second-generation Infinity Cache is integrated on the die and augments a 128-bit GDDR6 interface running at 18 Gbps, which yields a raw bandwidth of 288 GB/s (an effective bandwidth of up to 477 GB/s with the cache). Primary implementations carry 8 GB of VRAM, and the cache's hit rates for texture and compute data help maintain efficiency at lower resolutions.[37][28][16] Navi 33 powers the Radeon RX 7600 discrete GPU, launched in May 2023 and targeting 1080p gaming with features such as hardware ray tracing and AI acceleration. It delivers competitive performance in full HD gaming with a total board power of around 165 W in desktop variants. Despite its smaller scale, Navi 33 includes full AV1 encode and decode support via the Video Core Next 4.0 engine, enabling high-quality video streaming and creation at up to 8K resolutions with 12-bit color depth.[37][38][16]
Products
Desktop gaming GPUs
The AMD Radeon RX 7900 XTX is the flagship desktop gaming GPU in the RDNA 3 lineup. Launched on December 13, 2022, it is built on the Navi 31 die with 24 GB of GDDR6 memory and targets high-end 4K gaming.[14][26] Its launch price was $999, and AMD claimed up to 1.7 times the 4K performance of the previous-generation Radeon RX 6950 XT, thanks to enhanced compute units and ray tracing capabilities.[14] The Radeon RX 7900 XT, also based on Navi 31, launched the same day with 20 GB of GDDR6 memory at $899, positioned as a slightly more accessible high-end option for 4K gaming.[39][14]
| Model | Launch Date | Die | Memory | Launch Price | Target Resolution |
|---|---|---|---|---|---|
| RX 7900 XTX | Dec 13, 2022 | Navi 31 | 24 GB GDDR6 | $999 | 4K gaming at 60+ FPS |
| RX 7900 XT | Dec 13, 2022 | Navi 31 | 20 GB GDDR6 | $899 | 4K gaming at 60+ FPS |
| RX 7800 XT | Sep 6, 2023 | Navi 32 | 16 GB GDDR6 | $499 | 1440p gaming at 60+ FPS, with 4K capability |
| RX 7700 XT | Sep 6, 2023 | Navi 32 | 12 GB GDDR6 | $449 | 1440p gaming at 60+ FPS |
| RX 7600 XT | Jan 24, 2024 | Navi 33 | 16 GB GDDR6 | $329 | 1080p/1440p gaming at 60+ FPS |
| RX 7600 | May 25, 2023 | Navi 33 | 8 GB GDDR6 | $269 | 1080p gaming at 60+ FPS, with 1440p support |
| RX 7700 | Sep 18, 2025 | Navi 32 | 16 GB GDDR6 | N/A | 1440p gaming at 60+ FPS |
| RX 7400 | Aug 10, 2025 | Navi 33 | 8 GB GDDR6 | N/A (OEM) | 1080p gaming |
Mobile gaming GPUs
The AMD Radeon RX 7000M series brings discrete RDNA 3 GPUs to high-end gaming laptops, using chiplet-based designs for improved power efficiency while delivering desktop-like performance.[45] These GPUs adapt the Navi 3x dies with reduced power envelopes and lower clock speeds to manage heat dissipation in compact chassis, enabling sustained frame rates in demanding titles at 1440p and beyond.[46] The flagship Radeon RX 7900M, launched on October 19, 2023, uses a cut-down Navi 31 with 72 compute units (4,608 stream processors) and 16 GB of GDDR6 memory on a 256-bit bus.[47] Its total graphics power (TGP) reaches up to 180 W, with clock speeds between 1,825 MHz and 2,090 MHz, giving rasterization performance competitive with NVIDIA's GeForce RTX 4080 Laptop GPU in select workloads.[46][48] RDNA 3's decoupled clock domains, which let the front end run at higher frequencies than the shader arrays, also help the mobile parts mitigate thermal throttling under sustained loads.[2] Following in September 2024, the Radeon RX 7800M uses the Navi 32 die with 60 compute units (3,840 stream processors) and 12 GB of GDDR6 memory, also at up to 180 W TGP.[49] It has a game clock of 2,145 MHz, prioritizing balanced efficiency for 1440p gaming with ray tracing enabled, and includes 48 MB of Infinity Cache to reduce memory latency in power-constrained environments.[50] Lower peak clocks than the desktop counterparts (around 2.5 GHz on Navi 32 desktop variants) help maintain stability during prolonged sessions by limiting heat buildup.[2] These GPUs appear in premium gaming laptops from manufacturers such as Alienware and MSI, including the Alienware m18 R1, which is equipped with the RX 7900M for high-frame-rate 1080p and 1440p gaming.[51] Overall, RDNA 3 mobile implementations use AI accelerators and AV1 encoding for enhanced upscaling and content creation on the go, with power efficiency gains of up to 25% over prior generations in hybrid CPU-GPU scenarios.[45]
| GPU Model | Die | Compute Units | Memory | Max TGP | Clock Speeds |
|---|---|---|---|---|---|
| RX 7900M | Navi 31 | 72 | 16 GB GDDR6 | 180 W | 1.8-2.1 GHz |
| RX 7800M | Navi 32 | 60 | 12 GB GDDR6 | 180 W | Up to 2.15 GHz |
Workstation GPUs
The AMD Radeon Pro W7900, released in April 2023, is the flagship workstation GPU in the RDNA 3 lineup, built on the Navi 31 graphics processor and equipped with 48 GB of GDDR6 memory.[52] Priced at $3,999 USD at launch, it targets demanding professional workloads in compute, CAD, and content creation, delivering up to 61 TFLOPS of FP32 performance.[53] Its dual-slot design and 295 W TDP enable high-throughput tasks while maintaining compatibility with enterprise systems.[54] Launched alongside it, the Radeon Pro W7800 uses a cut-down Navi 31 with 32 GB of GDDR6 memory and was priced at $2,499 USD.[55] It provides 45 TFLOPS of FP32 compute and a 260 W TDP, optimized for AI-accelerated rendering and visualization in professional environments.[56] A 48 GB variant of the W7800, introduced in November 2024, increases memory capacity for AI and VFX applications, supporting larger datasets in machine learning inference and complex simulations.[57] The Radeon Pro W7700, released in November 2023 on Navi 32 with 16 GB of GDDR6 memory, is a mid-range professional option at $999 USD.[58] With 35 TFLOPS of FP32 performance and a 190 W TDP, it serves CAD designers and content creators who need cost-effective acceleration without sacrificing reliability.[59] In August 2023, AMD expanded the lineup with the Radeon Pro W7600 (Navi 33, 8 GB GDDR6, $599 USD, 130 W TDP) and W7500 (Navi 33, 8 GB GDDR6, $429 USD, 70 W TDP), targeting small-to-medium businesses and entry-level professional workflows with efficient performance for visualization and light compute tasks.[60] These RDNA 3 workstation GPUs support error-correcting code (ECC) memory to ensure data integrity in mission-critical tasks and carry Independent Software Vendor (ISV) certifications for applications such as Autodesk Maya and Adobe Premiere Pro.[61] Thermal design power across the series ranges from 70 W (W7500) to 295 W (W7900), balancing performance with power efficiency for sustained professional use.[52]
| Model | Release Date | Die | Memory | Launch Price (USD) | TDP (W) |
|---|---|---|---|---|---|
| W7900 | Apr 2023 | Navi 31 | 48 GB GDDR6 | 3,999 | 295 |
| W7800 | Apr 2023 | Navi 31 | 32 GB GDDR6 | 2,499 | 260 |
| W7700 | Nov 2023 | Navi 32 | 16 GB GDDR6 | 999 | 190 |
| W7800 (48 GB) | Nov 2024 | Navi 31 | 48 GB GDDR6 | N/A | 260 |
| W7600 | Aug 2023 | Navi 33 | 8 GB GDDR6 | 599 | 130 |
| W7500 | Aug 2023 | Navi 33 | 8 GB GDDR6 | 429 | 70 |