Graphics Core Next

Graphics Core Next (GCN) is a family of graphics processing unit (GPU) microarchitectures developed by Advanced Micro Devices (AMD), introduced in 2011 with the Radeon HD 7000 series (Southern Islands) graphics cards. It marked a significant redesign from AMD's prior TeraScale architectures, shifting from vector-oriented very long instruction word (VLIW) processing to a scalar, CPU-like instruction set to improve programmability and performance predictability for both graphics and general-purpose computing tasks. Key innovations include the introduction of dedicated Compute Units (CUs) as the core building blocks, support for unified virtual addressing, and coherent L1 and L2 caching to enable seamless data sharing between CPU and GPU in heterogeneous systems. GCN's Compute Unit design features four 16-wide single-instruction multiple-data (SIMD) engines, delivering 64 stream processors per CU, along with a 64 KB local data share (LDS) for fast intra-workgroup memory access and a dedicated scalar unit for branch execution and scalar operations. Each CU includes 16 KB of read/write vector data cache and supports up to 40 concurrent wavefronts (groups of 64 threads), optimizing for high-throughput parallel workloads while maintaining IEEE-754 compliance for floating-point arithmetic. The architecture also incorporates Asynchronous Compute Engines (ACEs), allowing independent execution of graphics and compute pipelines to boost overall system efficiency. Over its lifespan, GCN evolved across five generations, starting with the first-generation implementation in 28 nm process technology for products like the Radeon HD 7970 (Tahiti GPU), and progressing to more efficient variants in later nodes, including integrated graphics in Ryzen APUs up to the Cezanne APU in 2021.
Subsequent generations, such as GCN 2.0 (Sea Islands, 2013), GCN 3.0 (Volcanic Islands, 2014), GCN 4.0 (Polaris, 2016), and GCN 5.0 (Vega, 2017), introduced refinements like improved power efficiency, higher clock speeds, and enhanced support for APIs including OpenCL 1.2, DirectCompute 11, and C++ AMP. This progression powered AMD's discrete GPUs through the Vega series (2017) and Radeon VII (2019) and served as the foundation for compute-focused derivatives like the CDNA architecture in Instinct accelerators. Driver support for GCN-based products ended in mid-2022, though the architecture's legacy persists in AMD's ecosystem for backward compatibility and specialized applications. GCN's emphasis on compute density and memory throughput, exemplified by Tahiti's roughly 710 GB/s of L2 cache bandwidth, excelled in compute benchmarks such as FFT operations, though it faced competition in pure graphics rasterization from rivals like NVIDIA's Kepler architecture. GCN's memory subsystem features a unified L2 cache with 64-128 KB slices per memory channel and 40-bit virtual addressing using 64 KB pages, facilitating integration with x86 CPUs for advanced features like GPU-accelerated scientific simulations. GCN was eventually succeeded by the RDNA architecture in 2019.
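The headline throughput figures above follow from simple arithmetic over the CU count, per-CU lane count, and clock speed. A short illustrative sketch (the function name and defaults are ours, not AMD tooling):

```python
def gcn_peak_fp32_gflops(compute_units, clock_mhz, lanes_per_cu=64, flops_per_lane=2):
    """Peak FP32 rate: each of a CU's 64 lanes can retire one fused
    multiply-add (counted as 2 FLOPs) per clock."""
    return compute_units * lanes_per_cu * flops_per_lane * clock_mhz / 1000.0

# Tahiti (Radeon HD 7970): 32 CUs at 925 MHz -> ~3.79 TFLOPS
print(gcn_peak_fp32_gflops(32, 925))  # 3788.8
```

The same formula scales down to the integrated GCN APU graphics mentioned above, which simply ship far fewer CUs at lower clocks.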

Overview

Development and history

AMD's development of the Graphics Core Next (GCN) architecture was rooted in its 2006 acquisition of ATI Technologies, which expanded its graphics expertise and spurred internal research and development focused on general-purpose computing for GPUs. The acquisition, completed on October 25, 2006, for approximately $5.4 billion, integrated ATI's graphics IP with AMD's CPU technology, enabling advancements in integrated graphics and setting the stage for a unified approach to CPU-GPU integration. Following this, AMD shifted its GPU design from the VLIW-based TeraScale architecture to a SIMD-based model with GCN, aiming to improve programmability, power efficiency, and performance consistency for both graphics and general-purpose compute workloads. In 2011, AMD demonstrated its next-generation 28 nm graphics processor, previewing the architecture as a successor to TeraScale designed to deliver enhanced compute performance and full DirectX 11.1 support. The architecture was formally detailed in December 2011, emphasizing its design for scalable compute capabilities in discrete GPUs and integrated solutions. Initial silicon for Tahiti (Southern Islands family) was taped out in 2011, with the first product, the Radeon HD 7970, announced on December 22, 2011, as AMD's flagship single-GPU card built on GCN 1.0. GCN evolved through several iterations, starting with the Southern Islands (GCN 1.0) in 2011-2012, followed by Sea Islands (GCN 2.0) in 2013 with products like the Radeon R9 290 series, and Volcanic Islands (GCN 3.0) beginning in 2014 with the Tonga-based Radeon R9 285 and continuing with the Fiji-based Radeon R9 Fury series in 2015. Later generations included Polaris (GCN 4.0) in 2016 with the Radeon RX 400 series, Vega (GCN 5.0) launched in August 2017 with the Radeon RX Vega 64, and Vega 20 (GCN 5.1) in November 2018, marking the final major update before the transition to the RDNA architecture in 2019. These milestones reflected incremental improvements in efficiency, feature support, and process nodes while maintaining the core SIMD design. GCN played a pivotal role in AMD's strategy for integrated accelerated processing units (APUs) and discrete GPUs, enabling seamless CPU-GPU collaboration through features like unified virtual memory.
First integrated into APUs with the Kabini/Temash series in 2013, GCN powered subsequent designs like Kaveri (2014) and later Carrizo, enhancing everyday computing and thin-client applications. In the professional and datacenter markets, GCN underpinned GPUs such as the FirePro and Radeon Instinct series, with the Radeon Instinct MI25 (Vega-based) launching in June 2017 to target machine learning and high-performance computing workloads. This versatility solidified GCN's importance in AMD's push toward heterogeneous systems and expanded market presence beyond consumer graphics.

Key innovations and design goals

Graphics Core Next (GCN) represented a fundamental shift in AMD's GPU design philosophy, moving away from the very long instruction word (VLIW) architecture of the preceding TeraScale generation to a single-instruction multiple-data (SIMD) model. This transition aimed to enhance efficiency across diverse workloads by enabling better utilization of hardware resources through wavefront-based execution, where groups of 64 threads (a wavefront) are processed in a more predictable manner. The SIMD approach allowed for issuing up to five instructions per clock cycle per compute unit across vector and scalar pipelines, improving instruction throughput and reducing the scheduling complexity associated with VLIW's multi-issue dependencies. A core design goal of GCN was to elevate general-purpose GPU (GPGPU) computing, with full support for OpenCL 1.2 and later standards, alongside DirectCompute 11.1 and C++ AMP, to facilitate heterogeneous computing applications. This emphasis targeted at least 2x the shader performance of TeraScale architectures, achieved through optimized compute units that balanced graphics and parallel processing demands. The architecture integrated graphics and compute pipelines into a unified framework, supporting DirectX 11 and preparing for DirectX 12 feature levels, while enabling compatibility with AMD's Heterogeneous System Architecture (HSA) for seamless CPU-GPU collaboration via shared virtual memory. Power efficiency was another paramount objective, addressed through innovations like ZeroCore Power, which powers down idle GPU components to under 3 W during long idle periods, a feature first implemented in GCN 1.0. Complementary technologies such as fine-grained clock gating and PowerTune for dynamic voltage and frequency management further optimized energy use, enabling configurations from low-power APUs consuming 2-3 W to high-end discrete GPUs delivering over 3 TFLOPS at 250 W. This scalability was inherent in GCN's modular compute unit design, allowing flexible integration across market segments while maintaining consistent architectural principles.

Core microarchitecture

Instruction set

The Graphics Core Next (GCN) instruction set architecture (ISA) is a 32-bit RISC-like design optimized for both graphics and general-purpose computing workloads, featuring distinct scalar (S) and vector (V) instruction types that enable efficient ALU operations across wavefronts. Scalar instructions operate on a single value per wavefront for control flow and address calculations, while vector instructions process one value per thread, supporting up to three operands in formats like VOP2 (two inputs) and VOP3 (up to three inputs, including 64-bit operations). This separation allows scalar units to handle program control independently from vector units focused on data-parallel computation. Key instruction categories encompass arithmetic operations such as S_ADD_I32 for scalar addition and V_ADD_F32 or V_ADD_F64 for floating-point addition; bitwise operations including S_AND_B32 and V_AND_B32; and transcendental functions like V_SIN_F32 or V_LOG_F32 for approximations of sine, cosine, and logarithms in the vector ALU (VALU). Control flow is managed primarily through scalar instructions such as S_BRANCH for unconditional jumps and S_CBRANCH for conditional branches based on execution masks, alongside barrier and synchronization primitives to coordinate work-groups. These categories support a wavefront-based execution model where each wavefront comprises 64 threads, executed 16 lanes at a time over four cycles on the 16-wide SIMD units, enabling SIMD processing of instructions across the group. From GCN 1.0 onward, the ISA includes support for 64-bit integer arithmetic and double-precision floating-point operations (e.g., V_FMA_F64 for fused multiply-add), ensuring IEEE-754 compliance for compute-intensive tasks. Starting with GCN 3.0, the ISA includes half-precision floating-point (FP16) instructions like V_ADD_F16 for improved efficiency in mixed-precision workloads, alongside packed conversion instructions such as V_CVT_PK_U8_F32, with full-rate packed FP16 math (Rapid Packed Math) arriving in GCN 5.0.
The ISA maintains broad compatibility across GCN generations (1.0 through 5.0), with new capabilities added via minor extensions rather than breaking changes, providing a largely consistent programming model for shaders and kernels.
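The scalar/vector split described above can be illustrated with a toy model (not real GCN microcode): a scalar operation produces one value shared by the whole wavefront, while a vector operation applies per lane under a 64-bit execution mask, mirroring how mask-based divergence handling works.

```python
WAVEFRONT = 64  # threads per wavefront

def v_add_f32(dst, src0, src1, exec_mask):
    """Vector add: only lanes whose exec-mask bit is set write a result."""
    for lane in range(WAVEFRONT):
        if exec_mask >> lane & 1:
            dst[lane] = src0[lane] + src1[lane]
    return dst

def s_and_b32(a, b):
    """Scalar bitwise AND: one result for the entire wavefront."""
    return a & b

lanes = list(range(WAVEFRONT))
mask = (1 << 32) - 1                    # only lanes 0..31 active
out = v_add_f32([0.0] * WAVEFRONT, lanes, lanes, mask)
print(out[0], out[31], out[32])         # 0 62 0.0  (lane 32 masked off)
```

The masked lanes retain their previous contents, which is exactly why divergent branches on real hardware cost time: inactive lanes still occupy their slots in the wavefront.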

Command processing and schedulers

The Graphics Command Processor (GCP) serves as the front-end unit in the Graphics Core Next (GCN) architecture responsible for parsing high-level commands from the graphics driver, such as draw calls and state changes, and mapping them to the appropriate processing elements in the GPU. It coordinates the traditional rendering pipeline by distributing workloads across shader stages and fixed-function hardware units, enabling efficient handling of graphics-specific tasks like vertex processing and rasterization setup. The GCP processes separate command streams for different workload types, which facilitates multitasking and improves overall pipeline utilization by allowing concurrent execution of graphics operations. Complementing the GCP, Asynchronous Compute Engines (ACEs) manage independent compute queues, allowing compute workloads to execute in parallel with graphics tasks for better resource overlap. Each ACE fetches commands from dedicated queues, forms prioritized task lists ranging from background to real-time levels, and dispatches workgroups to compute units (CUs) while checking for resource availability. GCN supports up to eight ACEs in later generations, enabling multiple independent queues that share shader hardware with the graphics pipeline but operate asynchronously, with graphics typically holding priority during contention. This design reduces idle time on CUs by interleaving compute shaders with graphics rendering, though it incurs a small overhead known as the "async tax" due to scheduling and context switching. The scheduler hierarchy in GCN begins with a global command processor that dispatches work packets from user-visible queues in memory to workload managers, which then distribute tasks across shader engines and their compute units. These managers route commands to per-SIMD schedulers within each CU, where each of the four SIMD units maintains a scheduler partition buffering up to 10 wavefronts for execution. This tiered structure supports dispatching one wavefront per cycle per ACE or GCP, with up to five instructions issued per cycle across a CU's pipelines to maximize throughput.
Hardware schedulers within the ACEs and per-SIMD units handle thread management by prioritizing queues and enabling preemption for efficient workload balancing. Priority queuing allows higher-priority tasks to preempt lower-priority ones by flushing active workgroups and switching contexts, supporting out-of-order completion while ensuring ordering through fences or semaphores. This mechanism accommodates up to 81,920 in-flight work items across 32 CUs, promoting high occupancy and reducing latency in heterogeneous workloads. Introduced in the fourth generation of GCN (GCN 4.0), the Primitive Discard Accelerator (PDA) enhances command processing by early rejection of degenerate or small primitives before they reach the rasterizer. It filters triangles with zero area or no sample coverage during input assembly, reducing unnecessary vertex fetches and geometry workload by up to 3.5 times in high-density scenarios. The PDA integrates into the front-end to cull non-contributing primitives efficiently, improving throughput in graphics-heavy applications without impacting valid geometry.
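The 81,920 in-flight figure is just the product of the per-SIMD wavefront limit, the SIMD count, the wavefront width, and the CU count. A minimal sketch of that arithmetic (function name is ours):

```python
def max_inflight_work_items(cus, simds_per_cu=4, waves_per_simd=10, wave_size=64):
    """Upper bound on resident threads: each of a CU's four SIMDs can
    track up to 10 wavefronts of 64 work-items."""
    return cus * simds_per_cu * waves_per_simd * wave_size

print(max_inflight_work_items(32))  # 81920, matching a 32-CU GPU
print(max_inflight_work_items(1))   # 2560 per single CU
```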

Compute units and wavefront execution

The Compute Unit (CU) serves as the fundamental processing element in the Graphics Core Next (GCN) architecture, comprising 64 shader processors organized into four 16-wide SIMD units. Each SIMD unit handles 16 work-items simultaneously, enabling the CU to process a full wavefront of 64 threads by executing it across four clock cycles in a pipelined manner. This structure emphasizes massive parallelism while maintaining scalar control for divergence handling. At the heart of execution is the wavefront, the basic scheduling unit consisting of 64 threads that operate in lockstep across the SIMD units. These threads execute instructions synchronously, with the hardware decomposing each wavefront into four groups of 16 lanes processed sequentially over four cycles to accommodate the 16-wide SIMD width. GCN supports dual-issue capability, allowing the scheduler to dispatch one scalar instruction alongside a vector instruction in the same cycle, which enhances throughput for mixed workloads involving uniform operations and per-thread computations. The CU scheduler oversees wavefront dispatch using round-robin arbitration across up to six execution pipelines, managing instruction buffers and ensuring balanced utilization while tracking outstanding operations like ALU counts. The vector arithmetic logic unit (VALU) within each SIMD performs core floating-point and integer operations, supporting full IEEE-754 compliance for FP32 and INT32 at a rate of one operation per lane per cycle, yielding 64 FP32 operations per CU clock in the base configuration. Export units integrated into the CU handle output from wavefronts, facilitating stores to global buffers via vector memory instructions and raster operations such as exporting colors or positions to render targets. These units are shared across wavefronts to synchronize data flow with downstream raster or compute pipelines. Double-precision floating-point performance evolved significantly across GCN generations to better support scientific computing.
In GCN 1.0, double-precision throughput depended on the product: the flagship Tahiti GPU ran FP64 at 1/4 the single-precision rate, while smaller Southern Islands dies were limited to 1/16 because their shared hardware resources prioritized FP32 workloads. Later products improved on this, with GCN 2.0's Hawaii supporting up to 1/2 the single-precision rate in professional FirePro configurations, aided by ALU enhancements and instructions like V_FMA_F64, enabling higher throughput for applications requiring FP64 arithmetic without compromising the core scalar-vector balance.
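The wavefront cadence and FP64 ratios above reduce to simple arithmetic; a brief sketch (names and the sample rate divisors are illustrative):

```python
def wave_issue_cycles(wave_size=64, simd_width=16):
    """A 64-thread wavefront occupies a 16-wide SIMD for 4 cycles."""
    return wave_size // simd_width

def fp64_gflops(fp32_gflops, rate_divisor):
    """FP64 throughput as a fraction of FP32 (e.g. divisor 4 on Tahiti,
    16 on smaller Southern Islands dies)."""
    return fp32_gflops / rate_divisor

print(wave_issue_cycles())      # 4
print(fp64_gflops(3788.8, 4))   # 947.2 GFLOPS FP64 on a 1/4-rate part
```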

Graphics and compute pipeline

Geometric processing

In the Graphics Core Next (GCN) architecture, geometric processing encompasses the initial stages of the graphics pipeline, handling vertex data ingestion, programmable shading for transformations, and fixed-function optimization to prepare primitives for rasterization. This pipeline begins with vertex fetch, where vertex attributes are retrieved from buffers in memory using buffer load instructions such as TBUFFER_LOAD_FORMAT, which access data through a unified read/write cache hierarchy including a 16 KB L1 data cache per compute unit (CU) and a shared 768 KB L2 cache. Primitive assembly follows, where fetched vertices are grouped into primitives (such as triangles, lines, or points) by dual geometry engines capable of processing up to two primitives per clock cycle, enabling high throughput, for instance 1.85 billion primitives per second on the Radeon HD 7970 at 925 MHz. The programmable vertex shader stage transforms these vertices using shaders executed on the scalable array of CUs, where each CU contains four 16-wide SIMD units that process 64-element wavefronts in parallel via a non-VLIW instruction set architecture (ISA) with vector ALU (VALU) operations for tasks like position calculations and attribute interpolation. This design allows flexible control flow and IEEE-754 compliant floating-point arithmetic, distributing workloads across up to 32 CUs for efficient parallel execution without the rigid bundling of prior VLIW architectures. Tessellation and geometry shaders extend this programmability, with a dedicated hardware tessellator performing efficient domain subdivision at tessellation factors up to 64, up to four times faster than previous generations through improved parameter caching and vertex reuse that spills to the coherent L2 cache when needed. Geometry shaders, also run on CUs, enable primitive amplification and manipulation using instructions like S_SENDMSG for task signaling, supporting advanced effects such as fur or grass generation.
Fixed-function clipping and culling stages then optimize the workload by rejecting unnecessary primitives, including backface culling to discard triangles facing away from the viewer and view-frustum culling to eliminate those outside the camera's view volume, reducing downstream computational load. The setup engine concludes pre-raster processing by converting assembled vertices into a standardized primitive form (typically triangles, but also points or lines) for handover to the rasterizer, which generates up to 16 pixels per cycle per rasterizer while integrating hierarchical Z-testing for early occlusion detection. These stages collectively leverage GCN's unified virtual addressing and scalable design, supporting up to 1 terabyte of addressable memory to handle complex scenes efficiently across generations.
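Backface culling as described reduces to a sign test on the screen-space signed area of each triangle. A simplified sketch, assuming counter-clockwise winding marks front faces (the hardware's convention is configurable):

```python
def signed_area_2x(v0, v1, v2):
    """Twice the signed screen-space area (z of the 2D cross product)."""
    return (v1[0] - v0[0]) * (v2[1] - v0[1]) - (v2[0] - v0[0]) * (v1[1] - v0[1])

def backface_cull(tri, front_ccw=True):
    a = signed_area_2x(*tri)
    if a == 0:
        return True                  # degenerate zero-area triangle: cull
    return (a < 0) if front_ccw else (a > 0)

ccw = [(0, 0), (1, 0), (0, 1)]       # counter-clockwise -> front-facing
print(backface_cull(ccw))            # False: kept
print(backface_cull(ccw[::-1]))      # True: back-facing, culled
```

Note the zero-area case is exactly what the later Primitive Discard Accelerator also rejects, just earlier and in bulk.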

Rasterization and pixel processing

In the Graphics Core Next (GCN) architecture, the rasterization stage converts primitives into fragments by scanning screen-space tiles, with each rasterizer unit processing one triangle per clock cycle and generating up to 16 pixels per cycle. This target-independent rasterization offloads anti-aliasing computations to fixed-function hardware, reducing overhead on programmable shaders. Hierarchical Z-testing is integrated early in the pipeline, performing coarse depth comparisons on tile-level buffers to cull occluded fragments before they reach the shading stage, thereby improving efficiency by avoiding unnecessary pixel shader invocations. Fragment shading occurs within the compute units (CUs), where pixel shaders execute as 64-wide wavefronts, leveraging the same SIMD hardware as vertex and compute shaders for unified processing. GCN supports multi-sample anti-aliasing (MSAA) up to 8x coverage, with render back-ends (RBEs) equipped with 16 KB color caches per RBE for sample storage and compression, enabling efficient handling of anti-aliased render targets without excessive bandwidth demands. Enhanced quality AA (EQAA) extends this to 16x coverage samples in some configurations, supported by 4 KB depth caches per RBE. Texture sampling is managed by texture fetch units integrated into each CU, typically four per CU in first-generation implementations, which compute up to 16 sampling addresses per cycle and fetch texels from the L1 cache. These units support bilinear, trilinear, and anisotropic filtering up to 16x, with anisotropic modes incurring up to N times the cost of bilinear filtering based on the anisotropy factor to enhance texture clarity at oblique angles. Following shading, fragments undergo depth and stencil testing in the RBEs, which apply configurable tests to determine visibility and resolve multi-sample coverage. Blending operations then combine fragment colors with existing framebuffer data using coverage-weighted accumulation, supporting formats like RGBA8 and advanced blending modes for final pixel output.
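The hierarchical Z-test above can be sketched as a conservative tile-level rejection. This toy version assumes a LESS depth function: the tile buffer keeps the farthest depth already written, so any fragment at least that far away must fail the per-pixel test everywhere in the tile.

```python
def hiz_reject(tile_max_depth, frag_depth):
    """Coarse Hi-Z test for a LESS depth function: reject the fragment
    (skip pixel shading entirely) when it cannot beat any stored depth
    in the tile. Smaller depth means closer to the camera here."""
    return frag_depth >= tile_max_depth

print(hiz_reject(0.3, 0.7))  # True: provably occluded, never shaded
print(hiz_reject(0.3, 0.1))  # False: may be visible, shade and test per pixel
```

The key property is conservatism: the coarse test may pass fragments that later fail per pixel, but it never rejects a fragment that could have been visible.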
Pixel exports from CUs route directly to these RBEs, bypassing the L2 cache in pre-Vega generations in favor of the RBEs' dedicated caches. GCN integrates dedicated multimedia accelerators for audio and video processing. The Video Coding Engine (VCE) provides hardware-accelerated video encoding (decoding is handled by the separate Unified Video Decoder, UVD), starting with H.264/AVC support at 1080p/60 fps in first-generation GCN via VCE 1.0, and evolving to include HEVC (H.265) in VCE 3.0 (third-generation) and VCE 4.0 (fifth-generation Vega). TrueAudio, introduced in second-generation GCN, is a dedicated DSP co-processor that simulates spatial audio effects, enhancing realism by processing 3D soundscapes in real time alongside graphics rendering.

Compute and asynchronous operations

Graphics Core Next (GCN) architectures introduced robust support for compute shaders, enabling general-purpose computing on graphics processing units (GPGPU) through APIs such as OpenCL 1.2 and DirectCompute 11, which provide CUDA-like programmability for parallel workloads. These compute shaders incorporate synchronization primitives including barriers for intra-work-group coordination and atomic operations (e.g., add, max, min) on local and global memory to ensure data consistency across threads. Barriers are implemented via the S_BARRIER instruction supporting up to 16 wavefronts per work-group, while atomics leverage the 64 KB local data share (LDS) with 32-bit wide entries for efficient thread-level operations. A key innovation in GCN is the Asynchronous Compute Engines (ACEs), which manage compute workloads independently from graphics processing to enable overlapping execution of graphics and compute tasks on the same resources. Each ACE handles multiple task queues with priority-based scheduling (ranging from background to real-time), supporting up to 8 queues per ACE, with high-end implementations featuring multiple ACEs for greater parallelism (up to 64 queues total), facilitating concurrent dispatch without stalling the graphics pipeline. This asynchronous model supports out-of-order completion of tasks, synchronized through mechanisms like cache, LDS, or the global data share (GDS), thereby maximizing CU utilization during idle periods in rendering. Compute wavefronts, groups of 64 threads executed in lockstep, are dispatched directly to compute units by the ACEs, bypassing the graphics command processor and fixed-function stages to streamline non-graphics workloads. Each CU can schedule up to 40 wavefronts (10 per SIMD unit across 4 SIMDs), enabling high throughput for compute-intensive kernels while sharing resources with graphics shaders when possible. This direct path allows for efficient multitasking, where compute operations fill gaps left by memory latency, such as during texture fetch or geometry processing waits.
GCN supports large work-group sizes of up to 1024 threads per group, divided into multiple wavefronts for execution, providing flexibility for algorithms requiring extensive intra-group communication. Shared memory is facilitated by the 64 KB LDS per CU, banked into 16 or 32 partitions to minimize contention and support fast atomic accesses within a work-group. Occupancy is tuned by factors like vector general-purpose register (VGPR) usage, with maximum waves per SIMD reaching 10 for low-register kernels (≤24 VGPRs) but dropping to 1 for high-register ones (>128 VGPRs). These features enable diverse applications in GPGPU tasks, such as physics simulations in game engines that leverage async queues for real-time particle effects. In machine learning, GCN facilitates inference workloads through compute shaders, though performance is limited without dedicated tensor cores, relying instead on general matrix multiplications via OpenCL or DirectCompute. Overall, the asynchronous model enhances efficiency in mixed graphics-compute scenarios, allowing seamless integration with CPU-driven systems via programming models like the Heterogeneous System Architecture (HSA).
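The VGPR-bounded occupancy numbers quoted above follow a simple capacity rule: each SIMD has a 256-entry-per-lane VGPR file and can track at most 10 wavefronts. A hedged model (the allocation granularity of 4 is a simplifying assumption):

```python
def waves_per_simd(vgprs_per_thread, vgpr_file=256, max_waves=10, granularity=4):
    """Occupancy limit from vector registers: round the per-thread VGPR
    count up to the allocation granularity, then see how many wavefronts'
    worth fit in the 256-entry file, capped at the hardware's 10."""
    alloc = -(-vgprs_per_thread // granularity) * granularity  # ceil to multiple of 4
    return min(max_waves, vgpr_file // alloc)

print(waves_per_simd(24))   # 10: low-register kernel, full occupancy
print(waves_per_simd(64))   # 4
print(waves_per_simd(129))  # 1: register-heavy kernel, single wave
```

This is why compilers aggressively limit register pressure on GCN: crossing a VGPR threshold can halve the number of resident wavefronts available to hide memory latency.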

Memory and system features

Unified virtual memory

Graphics Core Next (GCN) introduces unified virtual memory (UVM) to enable seamless sharing of a single virtual address space between the CPU and GPU, eliminating the need for explicit data copies in applications. This system allows pointers allocated by the CPU to be directly accessed by GPU kernels, facilitating fine-grained data sharing and improving programmability. Implemented starting with the first-generation GCN architecture, UVM leverages hardware and driver support to manage address translation, supporting a 40-bit virtual address space that accommodates 1 TB of addressable memory for 3D resources and textures. The GPU's memory management unit (MMU) handles page management, using 4 KB pages compatible with x86 addressing for transparent translation of virtual to physical addresses. This setup supports variable page sizes, including optional 4 KB sub-pages within 64 KB frames, ensuring efficient mapping for frame buffers and other resources. Page tables are populated by the driver, with the GPU MMU performing on-demand translations to maintain compatibility with the host system's memory model. Pointer handling is facilitated by the scalar ALU, which processes 64-bit pointer values from registers to enable dynamic address computation during kernel execution. This allows for fine-grained access patterns, where vector memory instructions operate at granularities ranging from 32 bits to 128 bits, supporting gather/scatter operations and variably sized structures without fixed constraints. Such mechanisms ensure that CPU-allocated structures can be directly referenced on the GPU, promoting pointer-is-a-pointer semantics for enhanced efficiency. Cache coherency in GCN's UVM is maintained through the L2 cache hierarchy and integration with the input-output memory management unit (IOMMU), which translates x86 virtual addresses for direct memory access (DMA) transfers between CPU and GPU. The IOMMU ensures consistent visibility of shared memory pools across the system, preventing stale data issues by coordinating cache invalidations and flushes.
This hardware-assisted coherency model supports system-level memory pools, allowing the GPU to access host memory transparently while minimizing synchronization overhead. From GCN 1.0 onward, UVM has been a core feature, and integration with the Heterogeneous System Architecture (HSA) further extends its capabilities for coherent, multi-device environments. The primary benefit of GCN's UVM lies in heterogeneous computing, where it drastically cuts data transfer overhead by enabling direct pointer-based sharing compared to traditional copy-based models. This not only boosts application performance but also simplifies development by abstracting memory management complexities.
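The 40-bit, 64 KB-page address split can be illustrated with a toy translation. The page-table mapping below is hypothetical, and the 4 KB sub-page refinement is omitted for simplicity:

```python
PAGE_SHIFT = 16        # 64 KB pages
ADDR_BITS = 40         # 1 TB virtual space

def translate(vaddr, page_table):
    """Split a virtual address into page number and offset, then swap in
    the physical frame from a (hypothetical) page table."""
    assert vaddr < 1 << ADDR_BITS, "outside the 40-bit space"
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    return (page_table[vpn] << PAGE_SHIFT) | offset

pt = {0x12345: 0x00007}                  # illustrative single mapping
print(hex(translate(0x12345ABCD, pt)))   # 0x7abcd
```

On real hardware the walk is multi-level and cached in TLBs, but the page-number/offset arithmetic is the same.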

Heterogeneous System Architecture

Heterogeneous System Architecture (HSA) serves as the foundational framework for Graphics Core Next (GCN) to enable unified computing between CPUs and GPUs, allowing seamless integration and task orchestration across heterogeneous agents without traditional operating system intervention. Developed by the HSA Foundation, of which AMD is a founding member, this architecture defines specifications for user-mode queuing, memory models, and a portable intermediate language, optimizing GCN for applications requiring tight CPU-GPU collaboration. By abstracting hardware differences, HSA facilitates efficient workload distribution, reducing latency and power overhead in systems like AMD's Accelerated Processing Units (APUs). At the core of HSA's integration model are user-level queues, known as hqueues, which allow direct signaling between CPU and GPU agents in user space, bypassing kernel-mode switches for lower-latency communication. These queues are runtime-allocated structures that hold command packets, enabling applications to enqueue tasks efficiently without OS involvement, as specified in the HSA Platform System Architecture. In GCN implementations, hqueues support priority-based scheduling, from background to real-time tasks, enhancing multi-tasking in heterogeneous environments. Dispatch from the CPU to the GPU occurs through Architected Queuing Language (AQL) packets enqueued on these user-level queues, supporting fine-grained work dispatch for kernels and agents. AQL packets, such as kernel dispatch types, specify launch dimensions, code handles, arguments, and completion signals, allowing agents to build and enqueue their own commands for fast, low-power execution on GCN hardware. This mechanism reduces launch latency by enabling direct enqueuing of tasks to kernel agents, with support for dependencies and out-of-order completion. HSA leverages shared virtual memory with coherent caching to enable data sharing between CPU and GPU, utilizing the unified address space for direct access without data movement.
All agents access global memory coherently, with automatic cache maintenance ensuring consistency across the system, as mandated by HSA specifications. This model, compatible with GCN's virtual addressing, promotes efficient data-parallel computing by allowing pointers to be passed directly between processing elements. The HSA Intermediate Language (HSAIL) provides a portable virtual instruction set that is compiled to the native GCN instruction set architecture (ISA) via a finalizer, enabling hardware-agnostic code for heterogeneous execution. HSAIL, a RISC-like language supporting data-parallel kernels with grids, work-groups, and work-items, translates operations like arithmetic, memory loads/stores, and branches into optimized GCN instructions, with features like relaxed memory ordering and acquire/release semantics. The finalizer handles optimizations such as register allocation and wavefront packing tailored to GCN's SIMD execution model. HSA adoption in GCN-based APUs began with the Kaveri series (GCN 2.0 graphics), the first to implement full HSA features including hqueues for seamless CPU-GPU task assignment. Later generations extended this to APUs with Vega graphics (GCN 5.0), supporting advanced HSA capabilities through the ROCm software stack, which builds on HSA foundations for compute workloads. These implementations enable features like heterogeneous queuing and unified memory in consumer and professional systems, driving applications in compute-intensive domains.
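The shape of user-level queuing can be mocked in a few lines. The class and field names below are illustrative only, not the HSA binary packet layout, but they show the essential flow: an application builds a dispatch packet and appends it to a user-space queue with no kernel-mode transition.

```python
from dataclasses import dataclass, field

@dataclass
class DispatchPacket:
    """Toy stand-in for an AQL kernel-dispatch packet."""
    kernel: str          # would be a code-object handle in HSA
    grid: tuple          # total work-items per dimension
    workgroup: tuple     # work-group size per dimension
    args: tuple = ()

@dataclass
class UserQueue:
    packets: list = field(default_factory=list)

    def enqueue(self, pkt):
        # Append and return the write index; on hardware this index is
        # rung as a "doorbell" the GPU's packet processor watches.
        self.packets.append(pkt)
        return len(self.packets) - 1

q = UserQueue()
idx = q.enqueue(DispatchPacket("vec_add", grid=(1024, 1, 1), workgroup=(64, 1, 1)))
print(idx, q.packets[0].kernel)  # 0 vec_add
```

Notice the 64-wide work-group choice: matching the wavefront size keeps every dispatched wavefront fully populated on GCN hardware.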

Lossless compression and accelerators

Graphics Core Next (GCN) incorporates Delta Color Compression (DCC), a lossless technique specifically designed for color buffers in rendering pipelines. DCC exploits data coherence by dividing color buffers into blocks and encoding one full-precision pixel value per block, with the remaining pixels represented as deltas using fewer bits when colors are similar. This method enables compression ratios that can reduce bandwidth usage by up to 2x in scenarios with coherent data, such as skies or gradients, while remaining fully lossless to preserve rendering accuracy. Introduced with third-generation GCN (also labeled GCN 1.2), DCC allows shader cores to read compressed data directly, bypassing decompression overhead in render-to-texture operations and improving overall efficiency. The Primitive Discard Accelerator (PDA) serves as a hardware mechanism to cull inefficient primitives early in the geometry pipeline, particularly benefiting tessellation-heavy workloads. PDA identifies and discards small or degenerate (zero-area) triangles that do not contribute to the final image, preventing unnecessary work in compute units and reducing cycle waste. This accelerator becomes increasingly effective as triangle density rises, enabling up to 3.5x higher throughput in dense scenes compared to prior implementations. Debuting in GCN 4.0 (Polaris), PDA enhances pre-rasterization efficiency by filtering occluded or irrelevant primitives without impacting visible output. GCN supports standard block-based texture compression formats, including BCn (Block Compression) variants like BC1 through BC7, which reduce memory footprint by encoding 4x4 pixel blocks into fixed-size outputs of 64 or 128 bits. These formats are decompressed on-the-fly within the texture mapping units (TMUs), allowing efficient sampling of up to four texels per clock while minimizing bandwidth demands on main memory. Complementing this, fast clear operations optimize initialization by rapidly setting surfaces to common values like 0.0 or 1.0, leveraging clear metadata to avoid full buffer writes and achieving significantly higher speeds than traditional clears, often orders of magnitude faster in bandwidth-constrained scenarios.
This combination is integral to GCN's render back-ends, where hierarchical Z-testing further aids in discarding occluded pixels post-clear. To enhance power efficiency, GCN implements ZeroCore Power, a power gating technology that aggressively reduces leakage in idle components. When the GPU enters long idle mode, such as during static screen states, ZeroCore gates clocks and powers down compute units, caches, and other blocks, dropping idle power draw from around 15 W to under 3 W. Available from GCN 1.0 (Southern Islands), this feature achieves up to 90% reduction in static power leakage by isolating unused logic, promoting energy efficiency in discrete GPU deployments without compromising resume latency.
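The block-plus-deltas idea behind DCC can be demonstrated with a toy encoder. The 8-bit channel values and 4-bit delta field below are illustrative choices, not the hardware format; the point is that coherent blocks compress while high-variance blocks fall back to raw storage.

```python
def dcc_encode(block, delta_bits=4):
    """Encode a block as one anchor value plus narrow per-pixel deltas.
    Returns (anchor, deltas) if every delta fits, else None (store raw)."""
    anchor = block[0]
    lo, hi = -(1 << delta_bits - 1), (1 << delta_bits - 1) - 1
    deltas = [p - anchor for p in block]
    if all(lo <= d <= hi for d in deltas):
        return anchor, deltas
    return None

sky = [200, 201, 202, 201] * 16     # 64 coherent pixels: compressible
noisy = list(range(0, 128, 2))      # high-variance block: incompressible
print(dcc_encode(sky) is not None)  # True
print(dcc_encode(noisy) is None)    # True
```

Because the fallback path always exists, the scheme stays lossless: compression only changes how bits are stored, never which color is reconstructed.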

Generations

First generation (GCN 1.0)

The first generation of the Graphics Core Next (GCN 1.0) architecture, codenamed Southern Islands, debuted with AMD's Radeon HD 7000 series GPUs in late 2011, marking a shift to a more compute-oriented design compared to prior VLIW-based architectures. Announced on December 22, 2011, and available starting January 9, 2012, these GPUs were fabricated on a 28 nm process node by TSMC, enabling higher density and improved power efficiency. The architecture introduced foundational support for unified virtual memory (UVM), allowing shared virtual address spaces between CPU and GPU for simplified memory management, though limited to 64 KB pages with 4 KB sub-pages in initial implementations. Key innovations included the ZeroCore Power technology, which dynamically powers down idle compute units to reduce leakage power during low-activity periods, a feature of the HD 7900, 7800, and 7700 series. Double-precision floating-point (FP64) performance was configured at 1/4 the rate of single-precision (FP32) on the flagship consumer GPUs, prioritizing graphics workloads over high-end compute tasks and limiting FP64 operations to a quarter of FP32 throughput. The architecture supported DirectX 11 and OpenCL 1.2, enabling advanced tessellation, compute shaders, and general-purpose GPU computing, but lacked full asynchronous compute optimization in early drivers, relying on two asynchronous compute engines (ACEs) for basic concurrent execution. Representative implementations included the flagship Tahiti GPU in the Radeon HD 7970, featuring 32 compute units (CUs), 2048 stream processors, and 3.79 TFLOPS of FP32 performance at a 250 W TDP, paired with 3 GB of GDDR5 memory on a 384-bit bus. Lower-end models used the Cape Verde GPU, as in the Radeon HD 7770 GHz Edition with 10 CUs, 640 stream processors, over 1 TFLOPS FP32 at a 1000 MHz core clock, and an 80 W TDP, targeting mainstream desktops with 1 GB of GDDR5 on a 128-bit bus.
These discrete GPUs powered high-end gaming and early professional visualization, emphasizing PCI Express 3.0 connectivity and features like AMD Eyefinity for multi-display support up to 4K resolutions.
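The peak FP32 figures quoted above follow from a standard formula: stream processors × 2 operations per clock (a fused multiply-add counts as two) × clock speed. A quick check, assuming the HD 7970's well-known 925 MHz reference clock (not stated in this article):

```python
def peak_fp32_tflops(stream_processors, clock_mhz):
    """Peak FP32 = SPs x 2 ops/clock (FMA) x clock."""
    return stream_processors * 2 * clock_mhz * 1e6 / 1e12

hd7970 = peak_fp32_tflops(2048, 925)   # matches the 3.79 TFLOPS quoted above
hd7770 = peak_fp32_tflops(640, 1000)   # the "over 1 TFLOPS" GHz Edition
assert round(hd7970, 2) == 3.79
assert hd7770 > 1.0
```

The same arithmetic applies to every GCN part, since each CU contains 64 stream processors issuing one FMA per clock.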

Second generation (GCN 2.0)

The second generation of Graphics Core Next (GCN 2.0), known as the Sea Islands architecture, was introduced in 2013 with the launch of the AMD Radeon R9 200 series graphics cards. This generation built upon the foundational GCN design by incorporating optimizations for compute workloads, including an expanded implementation of Asynchronous Compute Engines (ACEs), with up to eight engines each managing independent compute queues for concurrent graphics and compute operations. These enhancements allowed for more efficient multi-tasking, with support for advanced instruction sets such as 64-bit floating-point operations (e.g., V_ADD_F64 and V_MUL_F64) and improved memory addressing via unified system and device spaces. Key discrete GPU implementations included the high-end Hawaii chip in the Radeon R9 290X, featuring 44 compute units (2,816 stream processors), a peak single-precision compute performance of up to 5.6 TFLOPS at an engine clock of 1 GHz, and fabrication on a 28 nm process node. Mid-range offerings utilized the Bonaire GPU, as in the Radeon R7 260X, while lower-end models in the series, such as the Oland-based Radeon R7 240, reused first-generation silicon; all leveraged the 28 nm process for improved power efficiency over prior generations through refined power gating and clock management. Additionally, Sea Islands introduced Video Coding Engine (VCE) 2.0 hardware for H.264 encoding, supporting features like B-frames and YUV intra-frame encoding to accelerate video compression tasks. Integrated graphics in APUs previewed Heterogeneous System Architecture (HSA) capabilities, with the Kaveri family (launched in early 2014) incorporating up to eight GCN 2.0 compute units alongside CPU cores for unified memory access and seamless CPU-GPU task offloading. This generation also added support for DirectX 11.2 and OpenCL 2.0, enabling broader compatibility with emerging compute standards while maintaining a 1:8 ratio of double- to single-precision floating-point performance on consumer parts.
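The Hawaii numbers above can be verified with the same peak-throughput formula used throughout this article (SPs × 2 × clock); the 1:8 FP64 rate then follows directly:

```python
def peak_fp32_tflops(stream_processors, clock_mhz):
    """Peak FP32 = SPs x 2 ops/clock (FMA) x clock."""
    return stream_processors * 2 * clock_mhz * 1e6 / 1e12

# R9 290X: 2,816 SPs at the 1 GHz engine clock quoted above.
r9_290x_fp32 = peak_fp32_tflops(2816, 1000)   # ~5.63, i.e. "up to 5.6 TFLOPS"
r9_290x_fp64 = r9_290x_fp32 / 8               # consumer Hawaii runs FP64 at 1:8
assert round(r9_290x_fp32, 2) == 5.63
```

Note that the professional FirePro variant of the same Hawaii die enabled a much higher FP64 rate, a segmentation pattern AMD repeated in later generations.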

Third generation (GCN 3.0)

The third generation of Graphics Core Next (GCN 3.0), codenamed Volcanic Islands, was released beginning in 2014 with the Radeon R9 285 and continued through AMD's 2015 R9 300 series and Fury lineup, introducing refinements aimed at improving efficiency and scaling for mid-range to high-end applications. This iteration built on prior generations by enhancing arithmetic precision and resource management, with key architectural updates including improved fused multiply-add (FMA) handling for FP32 computations to boost floating-point throughput without intermediate rounding errors. Additionally, it introduced lossless delta color compression and a more efficient geometry front end with earlier culling of off-screen primitives, contributing to overall efficiency gains in rasterization workloads. Prominent implementations included the Tonga GPU, used in cards like the Radeon R9 285, fabricated on a 28 nm process with 32 compute units (28 enabled in the R9 285) for mid-range performance scaling, and the flagship Fiji GPU in the Radeon R9 Fury X, featuring 64 compute units, 8.6 TFLOPS of single-precision compute performance, 4 GB of HBM memory, and a 275 W TDP. The Fiji variant, also on 28 nm, emphasized high-bandwidth memory integration for reduced latency in demanding scenarios, while the series as a whole supported partial H.265 (HEVC) video decode acceleration, enabling improved handling of 4K content through enhanced format conversions and buffer operations. These chips delivered notable efficiency improvements, with power-optimized designs allowing sustained performance in 4K gaming environments. GCN 3.0 also extended to accelerated processing units (APUs), notably in the Carrizo family, where up to eight compute units provided discrete-like graphics capabilities integrated with CPU cores on a 28 nm process, supporting DirectX 12 and HSA for mainstream laptops. The Fury X's liquid-cooled thermal solution further exemplified the thermal refinements, maintaining lower temperatures under load compared to air-cooled predecessors, which aided in stable clock speeds and reduced throttling during extended sessions.
Overall, these advancements focused on balancing compute density with power efficiency, enabling broader adoption in gaming and multimedia without significant node shrinks.
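Fiji's headline feature was its HBM interface, whose bandwidth follows from bus width × per-pin data rate. A quick check, assuming the commonly quoted figures of 1 Gbps per pin for Fiji's HBM1 and 5.5 Gbps effective for the HD 7970's GDDR5 (background specs, not stated in this article):

```python
def bandwidth_gbps(bus_bits, gbps_per_pin):
    """Memory bandwidth in GB/s = bus width (bits) x data rate / 8 bits per byte."""
    return bus_bits * gbps_per_pin / 8

fiji   = bandwidth_gbps(4096, 1.0)   # Fury X: 4096-bit HBM1 -> 512 GB/s
tahiti = bandwidth_gbps(384, 5.5)    # HD 7970: 384-bit GDDR5 -> 264 GB/s
assert fiji == 512.0
assert tahiti == 264.0
```

The comparison shows HBM's design point: a modest per-pin rate across a very wide bus, rather than GDDR5's fast, narrow interface.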

Fourth generation (GCN 4.0)

The fourth generation of the Graphics Core Next (GCN 4.0) architecture, codenamed Polaris, was introduced in 2016 with the Radeon RX 400 series graphics cards, emphasizing substantial improvements in power efficiency and mainstream performance. Fabricated on a 14 nm FinFET process by GlobalFoundries, Polaris delivered up to 2.5 times the performance per watt compared to the previous generation, enabling better thermal management and lower power consumption for gaming and compute tasks. Key enhancements included refined clock gating, improved branch handling in compute units, and support for DirectX 12, Vulkan, and asynchronous shaders, alongside FreeSync for adaptive-sync displays and HDR10 for enhanced visuals. The architecture maintained a 1:16 FP64 to FP32 ratio for consumer products, with hardware-accelerated HEVC (H.265) encode and decode up to 4K resolution via its updated UVD and VCE video engines. Prominent discrete implementations featured the Polaris 10 GPU in the Radeon RX 480, with 36 compute units (2,304 stream processors), up to 5.8 TFLOPS of single-precision performance at a boost clock of 1,266 MHz, 8 GB of GDDR5 memory on a 256-bit bus delivering 256 GB/s of bandwidth, and a 150 W TDP. Higher-end variants like the RX 580 (Polaris 20 refresh) achieved 6.17 TFLOPS at a 1,340 MHz boost with similar memory configurations, targeting 1440p gaming. Mid-range options used a cut-down Polaris 10 in the RX 470, with 32 CUs (2,048 SPs) and around 4.9 TFLOPS, while the entry-level Polaris 11 powered the RX 460 with 14 CUs and roughly 2.2 TFLOPS, all supporting PCIe 3.0 and multi-monitor setups of up to five displays. The RX 500 series in 2017 refreshed these designs with higher clocks for modest performance uplifts. Unlike earlier generations, GCN 4.0 did not ship in mainstream APUs; the contemporaneous Bristol Ridge family (launched mid-2016) paired Excavator CPU cores with third-generation GCN graphics of up to eight compute units on 28 nm for laptops and desktops, enabling 1080p gaming without discrete GPUs and HSA-compliant task sharing.
These advancements positioned Polaris as a cost-effective solution for VR-ready computing and 4K video playback, bridging the gap to higher-end architectures.
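Polaris's efficiency claim is easy to sanity-check from the board ratings given in this article (peak TFLOPS and TDP are spec-sheet figures, not measured efficiency):

```python
# FP32 TFLOPS per watt for two flagships, from the figures quoted above.
cards = {
    "R9 290X (GCN 2.0, 28 nm)": (5.6, 290),
    "RX 480  (GCN 4.0, 14 nm)": (5.8, 150),
}
for name, (tflops, tdp_w) in cards.items():
    print(f"{name}: {tflops / tdp_w:.3f} TFLOPS/W")

# The 14 nm FinFET part delivers similar peak throughput at about half the power,
# roughly a 2x perf/W uplift on paper.
assert (5.8 / 150) / (5.6 / 290) > 1.9
```

Real-world efficiency gains vary with workload, but the spec-level ratio is consistent with AMD's "up to 2.5x" marketing claim.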

Fifth generation (GCN 5.0)

The fifth generation of the Graphics Core Next (GCN 5.0) architecture, codenamed Vega, was introduced by AMD in 2017, debuting with the consumer-oriented Vega 10, later professional-grade Vega 20 GPUs on 7 nm, and integrated variants in Ryzen APUs. This generation focused on high-bandwidth memory integration, enhanced compute density for AI and HPC, and compatibility with Heterogeneous System Architecture (HSA), while supporting DirectX 12 and emerging compute workloads. Key implementations spanned 14 nm and 7 nm processes, with FP64 ratios varying: 1:16 for consumer products and up to 1:2 for professional accelerators. The flagship consumer model, the Radeon RX Vega 64 based on Vega 10 (14 nm FinFET), featured 64 compute units, 4,096 stream processors, peak single-precision performance of about 12.7 TFLOPS at a 1,546 MHz boost clock, and a 295 W TDP for air-cooled variants. It utilized 8 GB of second-generation High Bandwidth Memory (HBM2) on a 2,048-bit interface for up to 484 GB/s of bandwidth, addressing data bottlenecks in 1440p and 4K gaming. Innovations like enhanced Delta Color Compression reduced render-target bandwidth by exploiting pixel coherence, while Rapid Packed Math doubled FP16 throughput to over 25 TFLOPS, aiding half-precision tasks without dedicated tensor cores. Vega excelled in bandwidth-limited scenarios but faced thermal challenges under sustained loads. Professional extensions included the 7 nm Vega 20 in the Radeon Instinct MI50 (November 2018), with 60 CUs (3,840 stream processors), 13.3 TFLOPS FP32 and 6.7 TFLOPS FP64 at a 1,725 MHz peak clock, 16 or 32 GB of HBM2 on a 4,096-bit interface (1 TB/s of bandwidth), and a 300 W TDP. The MI60 variant enabled the full 64 CUs for 14.7 TFLOPS FP32 and 7.4 TFLOPS FP64, optimized for datacenter simulations and deep learning with a 1:2 FP64:FP32 ratio. Updated video engines enabled full HEVC/H.265 4K@60 fps encode and decode with 10-bit support, while the High Bandwidth Cache Controller (HBCC) extended unified virtual addressing to 49 bits, giving access to up to 512 TB for large datasets.
Integrated graphics in Ryzen APUs, such as Raven Ridge (2018, 14 nm) with Vega 8 to Vega 11 graphics (8–11 CUs, up to roughly 1.8 TFLOPS FP32 at a 1,250 MHz GPU clock with dual-channel DDR4), and the 12 nm Picasso refresh (2019), provided near-discrete-level performance for mainstream tasks. These solutions highlighted GCN 5.0's versatility in heterogeneous computing, paving the way for the transition to the RDNA architecture while ensuring backward compatibility.
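Two of Vega's headline figures reduce to simple arithmetic. The 512 TB HBCC figure corresponds to a 49-bit virtual address space, and the RX Vega 64's 484 GB/s follows from its 2,048-bit interface at roughly 1.89 Gbps per pin (the pin rate is a background spec, not taken from this article):

```python
# HBCC: a 49-bit virtual address space spans 2^49 bytes = 512 TiB.
virtual_span_tb = 2**49 / 2**40
assert virtual_span_tb == 512.0

# HBM2 on Vega 10: 2048-bit interface at ~1.89 Gbps per pin.
rx_vega_64_gbps = 2048 * 1.89 / 8      # bus bits x pin rate / 8 -> GB/s
assert abs(rx_vega_64_gbps - 484) < 1  # ~484 GB/s, as quoted above
```

The same bus-width arithmetic explains Vega 20's 1 TB/s: doubling the interface to 4,096 bits at a similar pin rate roughly doubles bandwidth.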

Performance and Implementations

Chip implementations across generations

The Graphics Core Next (GCN) architecture powered a wide array of GPU implementations from 2012 to 2021, encompassing discrete graphics cards, integrated graphics processing units (iGPUs) in accelerated processing units (APUs), and professional-grade accelerators. These chips were fabricated primarily at TSMC and GlobalFoundries on process nodes ranging from 28 nm to 7 nm, with memory configurations evolving from GDDR5 to high-bandwidth memory (HBM) and HBM2 for enhanced performance in compute-intensive applications. Over 50 distinct chip variants were released, reflecting AMD's strategy to scale GCN across consumer, professional, and datacenter segments.

Discrete GPUs

Discrete GCN implementations targeted gaming and professional workloads, featuring large die sizes to accommodate numerous compute units (CUs). Key examples include the first-generation Tahiti die, used in the Radeon HD 7970 series, which utilized a 28 nm process node, measured 352 mm² in die area, and contained 4.31 billion transistors while supporting GDDR5 memory. In the third generation, the Fiji die, employed in the Radeon R9 Fury series, represented a significant scale-up on the same 28 nm node with a 596 mm² die and 8.9 billion transistors, paired with 4 GB of HBM for superior bandwidth in professional workloads. The fifth-generation Vega 10, found in the Radeon RX Vega 64, shifted to a 14 nm GlobalFoundries process, achieving a 486 mm² die with 12.5 billion transistors and up to 8 GB of HBM2 to boost compute throughput. Other notable discrete dies spanned the intervening generations, such as Hawaii (GCN 2.0) and Polaris 10 (GCN 4.0, a roughly 230 mm² die on 14 nm with GDDR5).
Generation | Key Die | Process Node | Die Size (mm²) | Transistors (Billions) | Memory Type
GCN 1.0 | Tahiti | 28 nm | 352 | 4.31 | GDDR5
GCN 3.0 | Fiji | 28 nm | 596 | 8.9 | HBM
GCN 5.0 | Vega 10 | 14 nm | 486 | 12.5 | HBM2
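The die figures above imply transistor densities that track the process nodes; a short calculation makes the node shrink visible:

```python
# Transistor density (millions of transistors per mm²) from the die table above.
dies = {
    "Tahiti (28 nm)":  (4.31e9, 352),
    "Fiji (28 nm)":    (8.9e9, 596),
    "Vega 10 (14 nm)": (12.5e9, 486),
}
density = {name: transistors / area / 1e6
           for name, (transistors, area) in dies.items()}
for name, d in density.items():
    print(f"{name}: {d:.1f} Mtransistors/mm^2")

# Both 28 nm dies land in the same ~12-15 M/mm^2 range; the 14 nm part
# is markedly denser, reflecting the FinFET node shrink.
assert density["Vega 10 (14 nm)"] > 1.5 * density["Fiji (28 nm)"]
```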

Integrated APUs

GCN iGPUs were embedded in AMD's A-Series, Ryzen, and other APUs to enable heterogeneous computing on mainstream platforms, typically with fewer CUs than discrete counterparts for power efficiency. Early low-power examples include the Kabini APU (e.g., the A4-5000 series, 2013), integrating a small GCN GPU of two compute units (128 stream processors) on a 28 nm process with shared DDR3 memory. For desktops, the Kaveri APUs, such as the A10-7850K (2014), featured an 8-CU Radeon R7 iGPU on a 28 nm process, supporting up to 2133 MHz DDR3 for improved graphics performance in compact systems. By the fifth generation, Raven Ridge APUs like the Ryzen 5 2400G (2018) incorporated up to 11 CUs in a Vega-based iGPU on a 14 nm process, utilizing dual-channel DDR4 memory to deliver near-discrete graphics for gaming and content creation. These integrated solutions prioritized shared memory access over dedicated VRAM, enabling seamless CPU-GPU collaboration.
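The bandwidth gap between shared system memory and dedicated VRAM explains why integrated GCN parts trail discrete cards built from the same CUs. A quick estimate, assuming dual-channel DDR4-2933 (Raven Ridge's officially supported maximum, stated here as background):

```python
# System memory bandwidth: channels x 64-bit bus x transfer rate / 8 bits per byte.
ddr4_2933_gbps = 2 * 64 * 2933e6 / 8 / 1e9
print(f"Dual-channel DDR4-2933: {ddr4_2933_gbps:.1f} GB/s")   # ~46.9 GB/s

# The discrete RX Vega 64's HBM2 delivers 484 GB/s; the iGPU shares roughly
# a tenth of that with the CPU cores, making bandwidth the dominant limit.
assert ddr4_2933_gbps < 484 / 9
```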

Professional GPUs

AMD extended GCN to workstation and datacenter markets through the FirePro and Radeon Instinct lines, optimizing for stability and reliability. The FirePro W9000, based on the GCN 1.0 Tahiti die, offered 6 GB of GDDR5 on a 28 nm process for CAD and visualization tasks, delivering up to 3.9 TFLOPS of single-precision compute. Later, the Radeon Instinct MI series leveraged GCN 5.0, with the MI25 using a Vega 10 die (16 GB HBM2, 14 nm) for deep-learning acceleration, and the MI50 employing Vega 20 (32 GB HBM2, 7 nm) to support HPC clusters. These professional variants emphasized ECC memory support and multi-GPU scaling, distinct from consumer-focused discrete cards.

Comparison of key specifications

The key specifications of Graphics Core Next (GCN) architectures evolved across generations, with progressive advancements in compute density, memory subsystems, and power efficiency driven by process node shrinks and architectural refinements. Flagship implementations, selected for their representative high-end performance in consumer or compute roles, demonstrate these trends through increased compute units (CUs), higher floating-point throughput, and enhanced memory bandwidth, while maintaining compatibility with the unified GCN instruction set.
Generation | Flagship Chip | CUs | FP32 TFLOPS | FP64 TFLOPS | Memory Bandwidth (GB/s) | Process Node | TDP (W)
GCN 1.0 | Radeon HD 7970 | 32 | 3.79 | 0.95 (1:4 ratio) | 264 | 28 nm | 250
GCN 2.0 | Radeon R9 290X | 44 | 5.63 | 0.70 (1:8 ratio) | 320 | 28 nm | 290
GCN 3.0 | Radeon R9 Fury X | 64 | 8.60 | 0.54 (1:16 ratio) | 512 | 28 nm | 275
GCN 4.0 | Radeon RX 480 | 36 | 5.83 | 0.36 (1:16 ratio) | 256 | 14 nm | 150
GCN 5.0 | Radeon Instinct MI60 | 64 | 14.7 | 7.4 (1:2 ratio) | 1024 | 7 nm | 300
Performance trends in GCN show substantial uplift in single-precision (FP32) compute capability, scaling from approximately 3.8 TFLOPS in the GCN 1.0 HD 7970 to 14.7 TFLOPS in the 7 nm GCN 5.0 Instinct MI60, a nearly fourfold increase enabled by denser CU integration and clock optimizations. Efficiency metrics also improved markedly; for instance, GCN 4.0 (Polaris) roughly doubled peak performance per watt relative to GCN 2.0 through the 14 nm FinFET node and refinements such as improved clock gating, while GCN 5.0 (Vega) added the High Bandwidth Cache Controller for better memory utilization, further raising efficiency in compute workloads relative to GCN 1.0 baselines.

A feature matrix highlights GCN's evolution in concurrent processing and memory technologies. Asynchronous compute (via Asynchronous Compute Engines, or ACEs) was available across all generations, starting with two ACEs in GCN 1.0 and scaling to eight engines in GCN 2.0 and later for better GPU utilization in heterogeneous workloads. High-bandwidth memory support debuted in GCN 3.0 with HBM1 for reduced latency in bandwidth-intensive tasks, followed by HBM2 in GCN 5.0. Precision ratios varied by product segment: consumer GPUs offered 1:4 (FP64:FP32) in the earliest generation for balanced graphics and compute, shifting to 1:8 and then 1:16 in later consumer models to prioritize FP32 throughput, while compute-oriented Vega 20 chips such as the Instinct MI50 and MI60 achieved a full 1:2 ratio for double-precision work.

GCN's strengths lie in its compute scalability, facilitated by the uniform wavefront execution model and support for APIs like OpenCL and DirectX 12, enabling integration in high-performance computing (HPC) and machine-learning pipelines with up to 64 CUs per die in later generations. However, a notable limitation is the absence of dedicated matrix-math units; matrix workloads ran on the general vector ALUs, incurring higher overhead compared to the specialized matrix engines of subsequent architectures such as CDNA.
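The FP32 progression can be tallied from the flagship figures quoted in the generation sections above (the Instinct MI60 stands in as GCN 5.0's peak part):

```python
# Peak FP32 throughput of flagship GCN parts, from the sections above.
flagships = [
    ("GCN 1.0", "HD 7970",   3.79),
    ("GCN 2.0", "R9 290X",   5.63),
    ("GCN 3.0", "R9 Fury X", 8.60),
    ("GCN 5.0", "Instinct MI60 (7 nm)", 14.7),
]
base = flagships[0][2]
for gen, chip, tflops in flagships:
    print(f"{gen} {chip}: {tflops:.2f} TFLOPS ({tflops / base:.1f}x vs GCN 1.0)")

assert flagships[-1][2] / base > 3.5   # ~3.9x peak FP32 over GCN's lifespan
```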
