Tiled rendering

Tiled rendering is a technique used in graphics processing units (GPUs) to divide the rendering target, such as a screen or render texture, into small rectangular regions called tiles, which are processed sequentially to minimize memory bandwidth usage and improve efficiency. This approach involves a two-stage process: first, a binning pass sorts primitives (like triangles) into the tiles they overlap, and second, each tile is rasterized and shaded independently using on-chip memory before being written to the final framebuffer. By confining rendering operations to local tile memory, tiled rendering avoids frequent accesses to slower off-chip DRAM, making it particularly suitable for power-constrained devices like mobile GPUs.

The technique originated in the 1990s with early implementations by companies like PowerVR and Gigapixel, which developed tile-based architectures to address bandwidth limitations in embedded systems. PowerVR's tile-based deferred rendering (TBDR), for instance, deferred shading until visibility was determined per tile, a method that gained prominence in mobile GPUs from vendors such as Arm, Imagination Technologies, and Qualcomm. Over time, variants emerged, including tile-based immediate rendering (TBIR) in desktop GPUs like NVIDIA's Maxwell and Pascal architectures, which rasterize and shade tiles without full deferral but still buffer outputs on-die for efficiency. Apple GPUs, starting with the A11 chip, enhanced TBDR with features like imageblocks for per-pixel data and tile shaders that integrate compute operations, further optimizing for high-performance mobile rendering.

Key benefits of tiled rendering include reduced power consumption (critical for battery-powered devices, where mobile GPUs operate at roughly 3-6 watts compared to hundreds of watts for desktop parts) and higher performance through overdraw reduction, as shaders execute only on visible fragments within a tile. It contrasts with immediate-mode rendering (IMR) architectures, which process primitives across the entire frame without tiling, leading to higher bandwidth demands and less efficiency in memory-limited environments. Today, tiled rendering dominates mobile and XR platforms, such as Meta Quest devices and Samsung Galaxy hardware, enabling complex 3D scenes despite constrained resources like 1-5 MB of on-chip memory.

Fundamentals

Definition and Principles

Tiled rendering, also known as tile-based rendering, is a graphics processing technique that divides the screen space into a grid of small rectangular tiles, typically measuring 16x16 or 32x32 pixels, and renders each tile independently to optimize memory bandwidth usage and efficiency. This approach processes the entire scene geometry once to determine which primitives overlap each tile, avoiding the need for full-framebuffer reads and writes during rasterization. By confining rendering operations to local on-chip memory for each tile, tiled rendering reduces external memory traffic, which is particularly beneficial in power-constrained environments.

The core principles revolve around a two-pass pipeline: first, a binning stage where vertex-shaded primitives are sorted and assigned to the relevant tiles based on their screen-space coverage, creating compact per-tile lists of contributing primitives. In the second stage, each tile is rasterized and shaded in isolation, performing hidden surface removal, such as depth testing, entirely within on-chip buffers to eliminate occluded fragments early and prevent unnecessary shading computations. This deferred aspect ensures that shading only occurs for visible surfaces, further minimizing redundant work and bandwidth demands, as intermediate data like depth and color values remain local until the tile is complete.

Tile size selection balances several factors, including the degree of parallelism across shader cores, the on-chip storage required for tile buffers, and overhead from handling tile boundaries, such as redundant primitive tests for triangles that straddle multiple tiles. Smaller tiles enhance locality and reduce memory per tile but increase binning overhead and boundary computations, while larger tiles improve coherence for complex scenes at the cost of higher local storage needs.
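The binning principle described above can be illustrated with a short sketch. The following Python example assigns triangles to every tile their screen-space bounding box overlaps; the tile size, screen dimensions, and the Triangle type are illustrative assumptions rather than a description of any particular GPU.

```python
# A minimal sketch of the binning stage: triangles are assigned to every tile
# their screen-space axis-aligned bounding box overlaps. All constants and
# types here are illustrative assumptions.
from dataclasses import dataclass

TILE_SIZE = 32                 # assumed tile edge length in pixels
SCREEN_W, SCREEN_H = 1920, 1080

@dataclass
class Triangle:
    # Screen-space vertex positions as (x, y) pairs.
    v0: tuple
    v1: tuple
    v2: tuple

def bin_triangles(triangles):
    """Return a dict mapping (tile_x, tile_y) -> list of overlapping triangles."""
    tiles_x = (SCREEN_W + TILE_SIZE - 1) // TILE_SIZE
    tiles_y = (SCREEN_H + TILE_SIZE - 1) // TILE_SIZE
    bins = {}
    for tri in triangles:
        xs = [tri.v0[0], tri.v1[0], tri.v2[0]]
        ys = [tri.v0[1], tri.v1[1], tri.v2[1]]
        # Conservative bounding-box overlap test, clamped to the screen.
        tx0 = max(0, int(min(xs)) // TILE_SIZE)
        tx1 = min(tiles_x - 1, int(max(xs)) // TILE_SIZE)
        ty0 = max(0, int(min(ys)) // TILE_SIZE)
        ty1 = min(tiles_y - 1, int(max(ys)) // TILE_SIZE)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                # A triangle straddling several tiles is replicated in each bin.
                bins.setdefault((tx, ty), []).append(tri)
    return bins

if __name__ == "__main__":
    tris = [Triangle((100, 100), (200, 120), (150, 300))]
    per_tile = bin_triangles(tris)
    print(f"{len(per_tile)} tiles touched by {len(tris)} triangle(s)")
```

A conservative bounding-box test like this over-assigns slightly compared to exact coverage tests, trading a few extra per-tile entries for a much cheaper binning pass.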

Comparison to Immediate Mode Rendering

Immediate mode rendering, also known as immediate-mode rendering or IMR, processes graphics primitives in the order they are submitted by the application, immediately transforming vertices, rasterizing triangles, and writing fragment data directly to an off-chip framebuffer in main memory. This approach results in high memory bandwidth demands, particularly due to overdraw, where multiple fragments are processed and written for the same pixel, and frequent fetches that traverse the memory bus for each fragment across the entire screen. In contrast, tiled rendering divides the screen into small rectangular tiles (typically 16x16 or 32x32 pixels) and processes each tile independently using on-chip tile buffers for local storage of color, depth, and other fragment data, minimizing external memory accesses until the tile is complete. This enables early depth and stencil testing confined to the tile, rejecting occluded fragments before shading computations, unlike immediate mode's scene-wide processing that applies tests after full rasterization. Tiled rendering thus achieves lower memory bandwidth by localizing operations, while immediate mode relies on global memory for framebuffer updates, exacerbating latency in bandwidth-constrained environments like mobile GPUs.

Bandwidth savings in tiled rendering arise because only tile-local data is kept on-chip before a single write-back to main memory. In immediate mode, framebuffer traffic scales with the full screen resolution times an overdraw factor (often 2-4x in complex scenes), leading to repeated off-chip reads and writes for depth, textures, and colors. Tiled rendering reduces this traffic through on-chip buffering, with the exact savings depending on scene content.

A key trade-off lies in parallelism: tiled rendering supports fine-grained parallelism at the tile level, allowing multiple tiles to be processed concurrently on GPU cores with reduced contention, but it introduces binning overhead to assign primitives to tiles. Immediate mode, conversely, enables coarser-grained parallelism across entire primitives or draw calls without this preprocessing, facilitating simpler driver implementations but at the cost of inefficient resource utilization in overdraw-heavy scenarios.
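A back-of-envelope comparison of framebuffer traffic makes the bandwidth argument concrete. The numbers below (a 1080p target, 8 bytes of color plus depth per fragment, and an assumed overdraw factor of 3) are illustrative only; real traffic depends heavily on caching, framebuffer compression, and scene content.

```python
# Rough, assumption-laden estimate of per-frame framebuffer traffic for an
# immediate-mode pipeline versus a tile-based one. Not measured data.
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_FRAGMENT = 4 + 4         # color + depth, assumed
OVERDRAW = 3                       # assumed average fragments per pixel

pixels = WIDTH * HEIGHT

# Immediate-mode: every shaded fragment may read and write depth/color off-chip.
imr_traffic = pixels * OVERDRAW * BYTES_PER_FRAGMENT * 2   # read + write

# Tile-based: depth/color stay on-chip; one color write-back per pixel at tile flush.
tbr_traffic = pixels * 4

print(f"IMR ~= {imr_traffic / 1e6:.0f} MB per frame")
print(f"TBR ~= {tbr_traffic / 1e6:.0f} MB per frame")
```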

Historical Development

Early Concepts and Research

The Pixel-Planes project, initiated in 1981 at the University of North Carolina at Chapel Hill by Henry Fuchs and John Poulton, marked a foundational effort in developing efficient graphics rendering hardware. This VLSI-oriented design introduced pixel-parallel processing, where computations such as shading and visibility tests occur directly at the pixel level using specialized memory chips, aiming to overcome bandwidth limitations in traditional frame buffers. By distributing processing across pixels, the approach enabled interactive display of three-dimensional images, laying early groundwork for localized rendering strategies that would influence tiled methods.

Building on this, Fuchs and Poulton's 1985 work further advanced deferred techniques within the Pixel-Planes framework, demonstrating algorithms for fast rendering of spheres, shadows, textures, transparencies, and image enhancements. These methods deferred complex shading operations until after visibility resolution, reducing redundant computations and memory accesses in hardware prototypes. This deferred approach highlighted the potential for separating geometric processing from pixel filling, a core principle that would later combine with tiling to optimize bandwidth in resource-constrained systems.

Tiling concepts emerged prominently in the Pixel-Planes 5 architecture, detailed in a 1989 publication by Fuchs, Poulton, and collaborators, which subdivided the screen into 128×128-pixel patches processed by multiple SIMD renderers. This tile-based subdivision allowed independent handling of primitives per patch, with simulations validating high performance, up to 150,000 Phong-shaded triangles per second per renderer, while minimizing global memory bandwidth through on-chip processing and local VRAM operations. Academic prototypes and simulations demonstrated substantial reductions in frame buffer traffic by localizing pixel updates, achieving efficient rendering of complex scenes with up to 1 million triangles per second across multiple renderers.

Earlier subdivision algorithms, such as those developed by Kevin Weiler in the late 1970s and extended through the 1980s, contributed to the evolution toward tiled rendering by employing recursive image subdivision for hidden surface removal. Weiler's area sorting method divided the viewport into smaller windows to resolve visibility, shifting from one-dimensional scan-line traversal to two-dimensional regions that better accommodated complex polygon interactions. This progression from linear scan lines to 2D tiles improved coherence exploitation and reduced overdraw in simulations, paving the way for hardware-efficient tiled pipelines.

Commercial Milestones

The first commercial implementation of tiled rendering in consumer graphics hardware arrived with the PowerVR PCX1 chipset, released in 1996 by VideoLogic (later Imagination Technologies), which introduced full tile-based deferred rendering (TBDR) for personal computers. This architecture divided the screen into tiles to reduce memory bandwidth, enabling efficient 3D rendering on the limited hardware of the era. The PCX1 and its immediate successors powered add-in cards such as VideoLogic's Apocalypse series and the Matrox M3D, marking an early shift toward bandwidth-optimized rendering in desktop GPUs. Concurrently, in the late 1990s, Gigapixel developed tile-based rendering technology, including the GigaMan engine announced in 1999, though it was not released commercially before the company's acquisition by 3dfx in 2000.

In the late 1990s, tiled rendering entered the console market through the Sega Dreamcast, launched in 1998, which utilized the PowerVR2 (CLX2) GPU, a second-generation TBDR design capable of rendering up to 7 million polygons per second at the console's 640×480 output resolution. This console's adoption highlighted tiled rendering's advantages in power-constrained embedded systems, influencing future handheld and mobile designs. The 2000s saw further expansion into mobile devices, with ARM's acquisition of Falanx Microsystems in 2006 leading to the Mali GPU family, which integrated tile-based rendering for low-power embedded applications starting with the Mali-55 and Mali-200 series. Console milestones continued with the PlayStation Vita in 2011, featuring a quad-core PowerVR SGX543MP4+ GPU that advanced TBDR with support for OpenGL ES 2.0 and improved texture handling, delivering up to 28 GFLOPS of performance while maintaining efficiency for portable gaming.

Post-2010, tiled rendering's adoption surged in mobile GPUs driven by stringent power and bandwidth constraints in smartphones, becoming the preferred architecture for optimizing overdraw and memory access in battery-limited environments. By the late 2010s, it dominated mobile GPUs from vendors like Arm (Mali series), Imagination Technologies (PowerVR), and Qualcomm (Adreno), which together captured the majority of the mobile market.

Technical Implementation

Tile-Based Rendering Pipeline

The tile-based rendering pipeline structures graphics processing into sequential stages that partition the screen into small rectangular tiles, typically 16x16 or 32x32 pixels, to enable localized computations and minimize memory traffic. This approach processes input primitives through geometry preparation, spatial organization, and tile-specific rendering, culminating in framebuffer updates. By confining fragment operations to on-chip memory during tile processing, the pipeline enhances efficiency in bandwidth-limited systems.

In the first stage, geometry processing transforms input primitives, such as triangles, by executing vertex shaders to compute screen-space positions and attributes. Primitives are then culled based on view frustum, back-face orientation, or other early rejection criteria to discard irrelevant geometry. The binning substage follows, where each surviving primitive undergoes overlap tests, often using axis-aligned bounding boxes or precise coverage computations, against the grid of screen tiles; primitives overlapping multiple tiles are assigned to all relevant bins, resulting in replication across those tile lists.

The second stage focuses on per-tile rasterization, where the GPU iterates over each tile in parallel or sequentially. For a specific tile, only the primitives from its bin list are loaded and rasterized using techniques like edge equations or hierarchical traversal to generate fragments representing covered pixels within the tile boundaries. This step computes fragment coverage masks and interpolates attributes, ensuring that geometry outside the tile is ignored to avoid unnecessary computations.

In the third stage, shading and blending occur entirely within the tile's on-chip buffer. Generated fragments are shaded via fragment shaders to determine final colors and material properties, followed by depth and stencil tests to resolve visibility among overlapping fragments. Surviving fragments are then blended according to the active rendering state, such as alpha blending, before the tile buffer is resolved (through operations like multisample resolve, if enabled) and merged into the main framebuffer via a single write-back pass.

The binning process incurs overhead from managing bin lists and replicating straddling primitives, which can increase memory usage in scenes with high primitive counts or large tiles; efficient overlap tests and hierarchical bin structures help balance this cost against the benefits of localized processing. Pipeline variants include immediate tiling, which skips off-chip binning by processing tiles directly in a single pass to reduce latency, and fully deferred tiling, which delays fragment shading until after visibility determination to shade only visible surfaces.
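The second and third stages can be sketched as follows. This minimal Python model rasterizes the triangles binned to a tile into local color and depth buffers using edge functions, depth-tests them in those "on-chip" buffers, and performs a single write-back per tile. The flat per-triangle depth, constant-color shading, per-pixel loop, and the assumption that binning has already produced per-tile lists are all simplifications for illustration.

```python
# Minimal sketch of per-tile rasterization, depth testing, and write-back.
# Not a hardware-accurate model; all constants and types are illustrative.
TILE = 16
W, H = 64, 64

def edge(ax, ay, bx, by, px, py):
    # Signed edge function; all three edges give the same sign when the point
    # lies inside a consistently wound triangle.
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def render_tile(tx, ty, tri_list, framebuffer):
    # "On-chip" tile buffers: color and depth stay local until the tile is done.
    color = [[(0, 0, 0)] * TILE for _ in range(TILE)]
    depth = [[float("inf")] * TILE for _ in range(TILE)]
    for (v0, v1, v2, z, rgb) in tri_list:
        for y in range(TILE):
            for x in range(TILE):
                px, py = tx * TILE + x + 0.5, ty * TILE + y + 0.5
                inside = (edge(*v0, *v1, px, py) >= 0 and
                          edge(*v1, *v2, px, py) >= 0 and
                          edge(*v2, *v0, px, py) >= 0)
                if inside and z < depth[y][x]:   # depth test before "shading"
                    depth[y][x] = z
                    color[y][x] = rgb            # shading reduced to a flat color
    # Single write-back of the resolved tile to the external framebuffer.
    for y in range(TILE):
        for x in range(TILE):
            framebuffer[ty * TILE + y][tx * TILE + x] = color[y][x]

if __name__ == "__main__":
    fb = [[(0, 0, 0)] * W for _ in range(H)]
    tri = ((4, 4), (60, 8), (30, 58), 0.5, (255, 0, 0))  # vertices, depth, color
    # Assume binning already ran; here every tile simply receives the one triangle.
    bins = {(tx, ty): [tri] for ty in range(H // TILE) for tx in range(W // TILE)}
    for (tx, ty), tris in bins.items():
        render_tile(tx, ty, tris, fb)
    covered = sum(px != (0, 0, 0) for row in fb for px in row)
    print(f"{covered} pixels covered")
```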

Deferred Shading Techniques

Deferred shading techniques in tiled rendering separate the determination of visible geometry from the computationally intensive shading process, enabling significant efficiency gains, particularly in bandwidth-constrained environments. In tile-based deferred rendering (TBDR), visibility is resolved per tile through hidden surface removal (HSR) using an on-chip depth buffer after rasterizing the binned primitives; fragment shading is then performed only on visible fragments, writing results to an on-chip color buffer. This approach avoids shading hidden surfaces and confines all intermediate operations to fast local memory, reducing external memory accesses.

Tile memory can also support developer-driven deferred shading passes, where on-chip geometry buffers (G-buffers) store attributes like depth, surface normals, and albedo for visible pixels within a tile. Visibility is determined during the HSR stage, and subsequent lighting passes shade only visible fragments using this data, further minimizing bandwidth. Implementations vary by vendor; for example, PowerVR hardware performs fragment shading only after HSR, while Apple GPUs use features like imageblocks to enable flexible G-buffer storage in tile memory for multi-pass deferred techniques. Additionally, hierarchical depth testing is employed during the geometry pass to perform early rejection of occluded fragments at multiple levels, discarding unnecessary data before it reaches the tile buffer. This hierarchical approach builds a pyramid of depth information, enabling rapid visibility tests that reject entire groups of primitives or fragments per tile.

The fundamental algorithm for TBDR can be expressed conceptually as Shade(f) = Material(L, V) × Visibility(f), where f represents a fragment, L denotes lighting parameters, V includes view-dependent factors, and visibility is resolved post-HSR using the tile's depth buffer. Shading computations are deferred until after this visibility determination, ensuring that material evaluations, such as diffuse, specular, or physically based models, are applied solely to fragments that contribute to the final image. This separation allows for flexible lighting integration, where multiple light sources can be processed efficiently per tile without re-rasterizing geometry.

Advanced variants of TBDR extend these principles to anti-aliasing and data efficiency. Multi-sample anti-aliasing (MSAA) is integrated at the tile level by storing multiple samples per pixel in the on-chip tile buffer, resolving coverage masks during the visibility pass to shade only unique visible samples and reduce aliasing artifacts without excessive memory overhead. Compression techniques further optimize tile buffers by exploiting spatial coherence, such as delta encoding of depth values. These enhancements maintain image quality while preserving the bandwidth savings inherent to TBDR. By deferring shading until visibility is fully resolved per tile, TBDR effectively addresses overdraw in scenes with high fragment density, such as those featuring complex geometry or dense foliage, through early rejection and on-chip processing. This makes it particularly suitable for resource-limited hardware, where traditional rendering might incur prohibitive bandwidth costs.
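The per-tile hidden surface removal step can be sketched conceptually as below: depth is resolved for every binned fragment first, and the expensive shading function runs at most once per covered pixel. The fragment tuples and the shade() stand-in are illustrative assumptions, not any vendor's interface.

```python
# Minimal sketch of per-tile HSR followed by deferred shading: pass 1 keeps only
# the nearest fragment per pixel, pass 2 shades only those survivors.
TILE = 16

def shade(material_id):
    # Stand-in for an expensive material/lighting evaluation.
    palette = {0: (200, 30, 30), 1: (30, 200, 30)}
    return palette.get(material_id, (255, 255, 255))

def render_tile_deferred(fragments):
    """fragments: iterable of (x, y, depth, material_id) within one tile."""
    # Pass 1: hidden surface removal using only the on-chip depth buffer.
    depth = [[float("inf")] * TILE for _ in range(TILE)]
    visible = [[None] * TILE for _ in range(TILE)]
    for x, y, z, mat in fragments:
        if z < depth[y][x]:
            depth[y][x] = z
            visible[y][x] = mat           # remember the winning surface, defer shading
    # Pass 2: shade only the surviving (visible) fragment of each pixel.
    color = [[(0, 0, 0)] * TILE for _ in range(TILE)]
    shaded = 0
    for y in range(TILE):
        for x in range(TILE):
            if visible[y][x] is not None:
                color[y][x] = shade(visible[y][x])
                shaded += 1
    return color, shaded

if __name__ == "__main__":
    # Two full-tile layers that overlap: only the nearer one should be shaded.
    frags = [(x, y, 0.8, 0) for y in range(TILE) for x in range(TILE)]
    frags += [(x, y, 0.3, 1) for y in range(TILE) for x in range(TILE)]
    _, shaded = render_tile_deferred(frags)
    print(f"shaded {shaded} pixels for {len(frags)} fragments")  # 256 vs 512
```

In this toy example the shading cost is halved relative to shading every fragment, which is exactly the overdraw elimination that motivates deferring shading until after per-tile visibility is resolved.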

Applications

Desktop and Console GPUs

In desktop and console GPUs, tiled rendering has evolved into hybrid architectures that balance high-throughput rendering with bandwidth efficiency, particularly in power-rich environments where memory access costs remain a bottleneck despite ample compute resources. NVIDIA introduced tiled rasterization in its Maxwell architecture starting in 2014, buffering geometry data on-chip within small screen-space tiles (typically 16x16 pixels) to minimize external memory accesses during the rasterization stage. This approach, carried forward in subsequent architectures like Pascal and beyond, reduces the need for multiple round-trips to DRAM by keeping rasterizer outputs local until tile completion, yielding significant bandwidth savings in geometry-heavy workloads. Prior to full desktop adoption, NVIDIA's Tegra series (pre-2015 models like Tegra 4) employed tiled rendering in mobile-oriented SoCs, combining tile-based processing with immediate-mode elements to handle variable geometry loads while maintaining compatibility with its desktop-class GPU cores.

AMD Radeon GPUs, beginning with the Graphics Core Next (GCN) architecture around 2011, support partial tiling through compute shaders for targeted optimizations, enabling software-based techniques rather than full-pipeline tile-based deferred rendering. In RDNA architectures (introduced 2019 and refined through RDNA 4 in 2025), developers leverage compute shaders to implement tiled light culling and shading passes, dividing the screen into tiles to cull irrelevant lights or shadows per region, which is especially effective for compute-intensive effects like volumetric rendering. This software-driven partial tiling allows flexibility in large scenes, avoiding the overhead of hardware-mandated full tiling while still achieving localized bandwidth reductions by processing tiles independently in compute programs.

Console GPUs, built on custom AMD variants, integrate tiled techniques for high-fidelity rendering under fixed hardware constraints. The Xbox Series X and S (launched 2020) utilize DirectX 12's tiled resource management to enable sparse virtual texturing, where textures are divided into tiles loaded on demand, reducing memory usage and bandwidth for massive open-world environments without compromising visual fidelity. This feature, combined with the GPU's native tiled rasterization, supports efficient handling of high-detail assets in titles emphasizing dynamic lighting. Such binning reduces draw calls and overdraw, as demonstrated in multi-platform engines like those used in Call of Duty, where z-binning against tile boundaries improves volumetric culling performance in tiled setups.

Hybrid models predominate in these platforms, merging tiled rasterization or compute with traditional immediate-mode rendering to accommodate expansive scenes that exceed pure tile-based limits. For instance, tile-based compute shaders handle light culling in isolated passes, while the main raster pipeline processes the full frame immediately, allowing seamless scaling for open-world scenes with millions of primitives. This combination mitigates the geometry sorting overhead of full tiling, enabling higher throughput in desktop and console titles. Performance benefits include notable bandwidth efficiency, with NVIDIA's tiled rasterization delivering reductions in memory traffic for rasterization-bound workloads, as seen in compute-heavy scenarios akin to those in Cyberpunk 2077's ray-traced passes. AMD's compute-based tiling similarly yields bandwidth savings in deferred lighting, enhancing frame rates in bandwidth-limited configurations without altering the core architecture.
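The tiled light culling mentioned above, normally implemented in a compute shader, can be sketched in simplified form: the screen is divided into tiles, and each tile keeps only the lights whose screen-space extents overlap it. The 2D circle-versus-rectangle test and all constants below are illustrative assumptions; real implementations typically test lights against a depth-bounded per-tile frustum on the GPU.

```python
# Simplified sketch of tiled light culling: build a per-tile list of the lights
# that can affect that tile, so later shading only loops over relevant lights.
TILE = 16
SCREEN_W, SCREEN_H = 1280, 720

def cull_lights(lights):
    """lights: list of (center_x, center_y, radius) in screen space.
    Returns dict (tile_x, tile_y) -> list of light indices affecting that tile."""
    tiles_x = (SCREEN_W + TILE - 1) // TILE
    tiles_y = (SCREEN_H + TILE - 1) // TILE
    per_tile = {}
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            x0, y0 = tx * TILE, ty * TILE
            x1, y1 = x0 + TILE, y0 + TILE
            kept = []
            for i, (cx, cy, r) in enumerate(lights):
                # Closest point on the tile rectangle to the light center.
                nx = min(max(cx, x0), x1)
                ny = min(max(cy, y0), y1)
                if (cx - nx) ** 2 + (cy - ny) ** 2 <= r * r:
                    kept.append(i)
            if kept:
                per_tile[(tx, ty)] = kept
    return per_tile

if __name__ == "__main__":
    lights = [(100, 100, 50), (640, 360, 120), (1200, 700, 30)]
    total_tiles = (SCREEN_W // TILE) * (SCREEN_H // TILE)
    touched = cull_lights(lights)
    print(f"{len(touched)} of {total_tiles} tiles touched by lights")
```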
From 2020 to 2025, tiled rendering has increasingly integrated with ray tracing through optimized BVH traversal in hybrid pipelines, where screen-space tiles guide acceleration structure culling to focus ray queries on visible regions. NVIDIA's Ampere and Ada architectures (2020 onward) use tiled raster outputs to inform BVH builds, reducing traversal costs by 20-40% in dynamic scenes via on-chip tile data reuse. This continues in the Blackwell architecture (2024). AMD's RDNA 3 (2022) and RDNA 4 (2025) extend this with compute shaders for tiled BVH refits, enabling real-time updates in ray-traced titles while maintaining compatibility with standard ray tracing APIs. These advancements, highlighted in research such as treelet-based BVH traversal, underscore tiled rendering's role in scaling ray tracing for desktop and console interactivity.

Mobile and Embedded Devices

Tiled rendering has become the dominant architecture in mobile GPUs due to its efficiency in bandwidth-constrained and power-limited environments. Arm's Mali GPUs, introduced in 2008, employ tile-based rendering, dividing the screen into small tiles, typically 16x16 or 32x32 pixels, to process color and depth data locally, minimizing external memory accesses and reducing power draw compared to immediate-mode alternatives. Similarly, Qualcomm's Adreno GPUs, integrated into Snapdragon SoCs since the early 2010s, utilize a tile-based approach with FlexRender technology, which dynamically adjusts tile sizes and switches between binned and direct rendering modes to optimize for varying workloads, enhancing efficiency in devices like smartphones and tablets. Apple's A-series processors have used TBDR GPUs since the A4 in 2010, initially licensed PowerVR designs and later Apple-designed cores tailored to the Metal API, enabling seamless integration of advanced shading techniques while maintaining low latency and power efficiency; this architecture processes tiles on-chip, supporting features like efficient multisample anti-aliasing (MSAA) and contributing to sustained performance in graphics-intensive apps without excessive battery drain. By 2025, tiled rendering dominates smartphone GPUs, facilitating smooth 60 fps gameplay at resolutions up to 4K on external displays while consuming under 5 W, as seen in flagship SoCs like the Snapdragon 8 series and Apple A18.

In embedded systems, tiled rendering supports power-sensitive applications such as automotive displays in NVIDIA's Drive PX platforms, which incorporate tiled rasterization from Pascal-era GPUs to handle instrument and sensor visualizations with minimal overhead. Imagination Technologies' PowerVR Rogue architecture, used in IoT and embedded devices, applies TBDR to deliver scalable graphics performance in constrained environments like sensors and wearables, where on-chip tile buffers reduce data movement. Optimizations like dynamic tile sizing adapt to varying display resolutions, while power-gating mechanisms deactivate idle tile processing units, further lowering energy use in these integrated SoCs.

Advantages and Challenges

Performance and Efficiency Gains

Tiled rendering significantly reduces memory bandwidth usage by eliminating redundant framebuffer fetches due to overdraw, as fragments are processed on-chip within each tile before writing to external memory. In fill-rate limited scenes with high overdraw, this approach can achieve peak bandwidth reductions of up to 90% and average bandwidth reductions of 48% through techniques like early discard of redundant tiles, compared to immediate-mode rendering that requires multiple off-chip accesses per pixel. Measurements across various workloads show an average reduction in total external data traffic by a factor of approximately 2, with write-back traffic (from rasterizer to framebuffer) decreasing by up to 2.71 times in scenes prone to overdraw.

Power efficiency gains stem from minimizing DRAM accesses, which are energy-intensive; on-chip tile processing consumes roughly 10 times less power per access than external memory operations. Tile-based architectures in mobile GPUs demonstrate higher energy efficiency compared to desktop immediate-mode GPUs, enabling longer battery life in graphics-intensive applications. For instance, optimizations built on top of tile-based rendering have been shown to reduce overall energy consumption by 37% in rendering scenarios on mobile GPUs.

Latency improvements arise from parallel tile rendering, which allows independent processing of screen regions and reduces pipeline stalls caused by overdraw in immediate-mode systems; the effective throughput scales with the number of tiles divided by the overdraw ratio, as hidden surfaces are discarded early without external memory intervention. This parallelism is particularly beneficial in complex scenes, where it can lower average frame time by 13.5% and yield up to 1.15x overall speedup in commercial gaming applications. Empirical comparisons highlight power savings in tile-based GPUs, contributing to extended battery life in demanding games under similar workloads.

These benefits scale with increasing resolution and scene complexity, as higher pixel counts amplify overdraw and bandwidth demands; in virtual reality (VR) scenarios, tiled rendering supports up to 4x effective bandwidth gains by efficiently handling the dual high-resolution eye buffers and foveated rendering techniques without proportional memory overhead increases. As of 2024, advancements in APIs like Vulkan have enhanced tiled rendering efficiency on mobile GPUs through better support for render passes and dynamic rendering, reducing overhead in multi-subpass scenarios and improving memory access patterns.
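As a rough illustration of how the roughly tenfold on-chip versus off-chip energy gap translates into per-frame savings, consider N pixels, an overdraw factor d, B bytes of depth and color traffic per fragment, and per-byte access energies e_DRAM ≈ 10 × e_SRAM; these specific values are assumptions for illustration only, not measurements. Framebuffer-related energy is then approximately E_IMR ≈ N × d × B × e_DRAM for an immediate-mode pipeline, versus E_TBR ≈ N × d × B × e_SRAM + N × B_out × e_DRAM for a tiled pipeline, where B_out is the number of bytes written back per pixel when the tile is flushed. With d = 3, B = 8, and B_out = 4, this works out to roughly a 3.75x reduction in framebuffer-related energy under these assumptions, consistent in spirit with the measured savings cited above while being far simpler than any real workload.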

Limitations and Optimizations

One key limitation of tiled rendering is the binning overhead incurred when sorting primitives into tile lists, which can become significant in complex scenes containing many large or overlapping triangles that span multiple tiles, leading to repeated processing and increased geometry throughput demands. This overhead grows with scene complexity, potentially consuming a notable portion of the rendering budget in mobile GPUs. Another challenge arises in handling transparent objects, as alpha blending disrupts the deferred nature of tiled rendering by requiring back-to-front sorting across the entire scene to ensure correct compositing, rather than per-tile processing, which can eliminate bandwidth savings and force full-frame buffer reads and writes. Bandwidth spikes occur during tile buffer flushes to main memory, particularly at tile boundaries or when mid-render access to the framebuffer is needed for effects like post-processing, resulting in sudden high memory traffic that undermines the architecture's efficiency goals. Alpha blending further exacerbates inefficiencies by necessitating frequent framebuffer accesses, which prevent the use of on-chip tile memory and revert to higher-bandwidth, immediate-mode-like behavior in tile-based deferred architectures.

To mitigate binning overhead, adaptive binning techniques employ hierarchical structures, where coarser levels of the tile grid are used for initial assignment before refining to finer tiles, reducing redundant work for large primitives and improving efficiency in complex scenes. Compression algorithms, such as delta encoding applied to tile-local data like depth or color values, enable efficient storage in on-chip buffers; for instance, lightweight integer schemes that encode differences between neighboring values can achieve substantial reductions in data footprint for sorted or semi-sorted tile contents, though exact ratios depend on workload characteristics. Software mitigations include extensions like Vulkan's VK_EXT_shader_tile_image, which grant fragment shaders rasterization-order access to on-chip tile image data, allowing developers to implement custom blending or similar effects without full tile flushes. Hybrid rendering modes address edge cases, such as high-overdraw passes, by dynamically switching between tile-based deferred rendering and direct, immediate-mode-like rendering paths to balance bandwidth savings with flexibility in scenarios like compute-heavy post-effects.
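The delta-compression idea mentioned above can be illustrated with a minimal sketch: tile-local values with strong spatial coherence, such as a row of quantized depth values, are stored as one base value plus small differences. The 16-bit base and 8-bit delta layout is an assumption for illustration, not any vendor's actual scheme.

```python
# Minimal sketch of delta compression for coherent tile-local data.
# Assumed layout: one 16-bit base value followed by signed 8-bit deltas.
def compress_row(values):
    """values: list of non-negative ints (e.g. quantized depth). Returns (base, deltas)
    or None if any delta falls outside the assumed signed 8-bit range."""
    base = values[0]
    deltas = []
    for prev, cur in zip(values, values[1:]):
        d = cur - prev
        if not -128 <= d <= 127:
            return None          # fall back to storing the row uncompressed
        deltas.append(d)
    return base, deltas

def decompress_row(base, deltas):
    out = [base]
    for d in deltas:
        out.append(out[-1] + d)
    return out

if __name__ == "__main__":
    row = [1000 + i * 3 for i in range(16)]      # smoothly varying depth values
    packed = compress_row(row)
    assert packed is not None and decompress_row(*packed) == row
    raw_bytes = len(row) * 2                     # 16 bits per raw value
    packed_bytes = 2 + len(packed[1])            # 16-bit base + 8-bit deltas
    print(f"{raw_bytes} bytes raw vs {packed_bytes} bytes compressed")
```

The fallback path matters in practice: rows that cross depth discontinuities compress poorly, so real schemes keep an uncompressed mode and pick per-block whichever representation is smaller.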
