S3 Texture Compression
S3 Texture Compression (S3TC), also known as DXTC or DXT, is a family of lossy block-based texture compression algorithms designed for efficient storage and rendering of images in 3D computer graphics applications. Developed by S3 Incorporated in the late 1990s, S3TC divides textures into 4×4 texel blocks and encodes them using fixed bit rates of 64 or 128 bits per block, achieving compression ratios of up to 6:1 for RGB data and 4:1 for RGBA data while preserving visual quality suitable for real-time rendering.[1][2] The core formats include DXT1 for opaque or binary-alpha RGB/RGBA textures, DXT3 for explicit 4-bit alpha per texel, and DXT5 for interpolated alpha values, with color data interpolated from two endpoint colors per block to approximate the original image.[2]

Originally introduced for S3's Savage 3D graphics accelerators, S3TC gained prominence when Microsoft licensed the technology in 1998 for integration into DirectX 6.0, enabling developers to quadruple texture memory capacity and bandwidth without significant performance overhead.[1] This adoption addressed key limitations in early 3D hardware, such as constrained video memory, by allowing compressed textures to be decompressed on the fly during rendering via dedicated hardware support.[1] Over time, S3TC evolved into an industry standard, standardized in OpenGL through the EXT_texture_compression_s3tc extension (finalized in 2000) and in Direct3D as the BC1, BC2, and BC3 formats starting with DirectX 10.[2]

Today, S3TC remains widely supported across graphics APIs including Vulkan, Metal, and WebGL, as well as hardware from NVIDIA, AMD, Intel, and mobile platforms like Android, due to its balance of compression efficiency, decoding speed, and compatibility with legacy content. Following the expiration of the related patents in 2018, S3TC became freely implementable without licensing fees, further promoting its use in open-source software.[3] Despite the emergence of newer formats like ASTC and ETC, S3TC's fixed-block design and hardware acceleration continue to make it a foundational choice for game engines and real-time applications, with ongoing use in cross-platform development.[4]

History and Development
Origins at S3 Graphics
S3 Texture Compression (S3TC), also known as DXT compression, originated at S3 Graphics, a company founded in 1989 that specialized in graphics processing technologies. In the mid-1990s, as 3D graphics applications demanded higher texture resolutions, S3 identified the need for efficient compression to alleviate memory-bandwidth constraints in hardware accelerators. This led to the development of a fixed-rate, block-based compression scheme designed specifically for real-time texture rendering, prioritizing low decoding complexity, random pixel access, and compatibility with graphics pipelines.[5]

The core algorithms were invented by Konstantine I. Iourcha, Krishna S. Nayak, and Zhou Hong, who filed the foundational patents on October 2, 1997. These patents describe a method for compressing 4×4 pixel blocks into 64 bits using two color codewords and a bitmap to interpolate pixel values, enabling 4:1 to 6:1 compression ratios while preserving visual quality for typical textures. The approach addressed shortcomings of earlier techniques such as block truncation coding (BTC) and the discrete cosine transform (DCT), which suffered from variable rates or computational overhead unsuitable for hardware implementation. Issued in 2003 and 2004 as U.S. Patents 6,658,146 and 6,683,978, these documents formalized S3TC's inferred pixel value generation, in which intermediate colors are derived linearly from endpoint codewords to represent block palettes efficiently.[5][6]

S3TC was first implemented in hardware with the release of the Savage 3D graphics accelerator in 1998, the first consumer GPU to support on-the-fly texture decompression. This integration allowed the Savage 3D to handle larger textures without proportional increases in memory usage, boosting performance in early 3D games and applications. To promote widespread adoption, S3 Graphics licensed the technology to Microsoft on March 24, 1998, for inclusion in DirectX 6.0, which standardized S3TC as the DXT format and simplified developer integration by endorsing a single compression method. The licensing emphasized S3TC's developer-friendly encoding and hardware efficiency, enabling four to six times more texture storage in accelerators without significant quality loss.[1]

Licensing and Standardization
S3 Texture Compression (S3TC), originally developed by S3 Graphics as a proprietary technology in the mid-1990s, required licensing agreements for implementation in graphics APIs and hardware. In March 1998, Microsoft secured a license from S3 Incorporated to integrate the compression formats into DirectX 6.0, renaming them DirectX Texture Compression (DXTC) and establishing them as a core feature for texture handling in Windows-based 3D graphics applications.[1]

Integration into OpenGL faced significant challenges due to intellectual property restrictions. In 1999, S3 informed the OpenGL Architecture Review Board (ARB) that it would not provide a general license for S3TC use in the API, prompting individual hardware vendors (IHVs) to negotiate separate licenses with S3 or its successors, such as SONICblue and later S3 Graphics (a VIA Technologies joint venture). Despite this, the GL_EXT_texture_compression_s3tc extension—supporting the DXT1, DXT3, and DXT5 formats—was finalized in July 2000 by NVIDIA Corporation contributors, with an explicit warning that Direct3D licenses did not extend to OpenGL implementations.[2]

The formats achieved de facto standardization through widespread adoption. OpenGL 1.3 (August 2001) brought a generic texture compression framework into the core API, but the S3TC formats themselves remained optional and available only through the extension, since support depended on vendor licensing and carried infringement risks. S3 Graphics licensed the technology to major players, including NVIDIA, ATI Technologies, Nintendo (for the GameCube and subsequent consoles), and Sony (for PlayStation systems), ensuring broad hardware compatibility across PCs and gaming platforms.

Licensing fees persisted until the underlying patents expired. The primary U.S. patents, filed in 1997, lapsed on October 2, 2017, after the standard 20-year term, with one continuation patent (US 6,775,417) extended until March 16, 2018. Post-expiration, S3TC—redesignated as BC1 (DXT1), BC2 (DXT3), and BC3 (DXT5) in modern specifications—became freely implementable, facilitating full integration into open-source drivers such as Mesa and reinforcing its status as an industry standard in APIs such as Vulkan and [OpenGL ES](/page/OpenGL ES).[7][8]

Technical Fundamentals
Compression Principles
S3 Texture Compression (S3TC), also known as DirectX Texture Compression (DXTC) or Block Compression (BC), employs a block-based, lossy compression scheme designed for efficient storage and hardware-accelerated decompression in real-time graphics rendering. The fundamental approach divides textures into independent 4×4 pixel blocks, each encoded at a fixed rate to enable random access without dependencies between blocks, which is essential for parallel GPU processing. This method achieves compression ratios of up to 6:1 for RGB data relative to uncompressed 24-bit-per-pixel formats, balancing quality loss against bandwidth savings.[9]

At its core, S3TC builds upon block truncation coding (BTC) by extending the quantization from two grayscale levels to four colors in RGB space, selected to approximate the original block's content with minimal perceptual error. For a basic RGB block, two endpoint colors are each stored in 16-bit RGB565 format, totaling 32 bits. These endpoints define a line in color space, from which two additional colors are interpolated using fixed weights: the first interpolated color is (2/3)·color₀ + (1/3)·color₁, and the second is (1/3)·color₀ + (2/3)·color₁. A 32-bit index map then assigns each of the 16 pixels to one of these four colors using 2 bits per pixel, completing the 64-bit block encoding at 4 bits per pixel (bpp). This linear interpolation assumes the block's dominant color gradient aligns well with a straight line, though it can introduce artifacts at block boundaries.[9]

During compression, the encoder seeks endpoint colors that minimize the sum of squared errors when each pixel is assigned to its nearest interpolated color; an exhaustive search over all RGB565 endpoint pairs is prohibitively expensive, so practical encoders typically use heuristics such as fitting a line through the block's colors and refining the endpoints along it. Decompression reconstructs the block by simply retrieving the endpoints, computing the interpolants, and indexing the colors, a low-complexity operation implemented in fixed-function hardware since the format's introduction. For variants supporting alpha, such as those with 128-bit blocks, alpha is encoded separately using two-endpoint interpolation with eight levels: two endpoints plus six interpolated values when the first endpoint exceeds the second, or the endpoints plus four interpolated values plus fully transparent (0) and fully opaque (255) otherwise.[10] This separation provides flexibility for applications requiring transparency while maintaining the fixed-rate structure.

The design prioritizes perceptual quality over bit-exact fidelity, leveraging human vision's tolerance for minor color shifts in textures, and supports features such as punch-through alpha in some modes, where one index maps to fully transparent black for cutout effects. Overall, S3TC's structure enables seamless integration into graphics pipelines, with decompression costs dominated by simple arithmetic rather than complex decoding, facilitating widespread adoption in 3D rendering.[9]
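As a concrete illustration of these principles, the sketch below derives the four-color palette from two RGB565 endpoints and performs the encoder's nearest-color index assignment. It is a minimal Python sketch, not any vendor's implementation: the helper names (rgb565_to_rgb888, dxt1_palette, assign_indices) are illustrative, the endpoint search is omitted, and the bit-replication expansion and integer rounding are common conventions rather than mandated behavior.

```python
def rgb565_to_rgb888(c):
    """Expand a 16-bit RGB565 value to an (r, g, b) tuple of 8-bit values."""
    r, g, b = (c >> 11) & 0x1F, (c >> 5) & 0x3F, c & 0x1F
    # Replicate high bits into low bits, a common expansion convention.
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))

def dxt1_palette(color0, color1):
    """Four-color palette from two RGB565 endpoints (opaque mode)."""
    c0, c1 = rgb565_to_rgb888(color0), rgb565_to_rgb888(color1)
    c2 = tuple((2 * a + b + 1) // 3 for a, b in zip(c0, c1))  # (2/3)c0 + (1/3)c1
    c3 = tuple((a + 2 * b + 1) // 3 for a, b in zip(c0, c1))  # (1/3)c0 + (2/3)c1
    return [c0, c1, c2, c3]

def assign_indices(pixels, palette):
    """Assign each (r, g, b) pixel the 2-bit index of its nearest palette
    color by squared RGB distance -- the encoder's quantization step."""
    def sq_err(p, q):
        return sum((pa - qa) ** 2 for pa, qa in zip(p, q))
    return [min(range(4), key=lambda i: sq_err(p, palette[i])) for p in pixels]
```

For example, `assign_indices([(255, 0, 0)] * 16, dxt1_palette(0xF800, 0x001F))` returns sixteen zeros, since every pixel is nearest the pure-red endpoint.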
Block Encoding Structure

S3 Texture Compression (S3TC), also known as DirectX Texture Compression or Block Compression (BC), operates by partitioning textures into fixed-size 4×4 texel blocks, with each block encoded independently to achieve a consistent compression ratio across the image.[11] This block-based approach ensures random access to texels during rendering, as hardware can decode individual blocks without dependencies on neighboring data.[11] The encoding typically uses a small set of representative colors or values (endpoints) and indices to interpolate per-texel values, reducing the data from 512 bits (an uncompressed RGBA8 4×4 block) to 64 or 128 bits depending on the format.[11]

In the core S3TC formats, color data is handled via two 16-bit RGB565 endpoints per block (C0 and C1), from which intermediate colors are linearly interpolated for each texel.[12] Indices, packed as 2 bits per texel (totaling 32 bits for 16 texels), select one of four possible colors: C0, C1, or the two interpolated values (2/3)C0 + (1/3)C1 and (1/3)C0 + (2/3)C1.[12] Alpha channels, when present, follow similar principles but use 8-bit endpoints and 3-bit indices for finer granularity, or direct per-texel values in explicit formats.[10] Block layout is row-major, with the 4×4 texels ordered left to right, top to bottom, and indices bit-packed starting from the top-left texel.[11]

The DXT1 (BC1) format exemplifies the basic structure, using 64 bits total: 16 bits for C0, 16 bits for C1, and 32 bits for indices.[12] If C0 > C1, four opaque colors are available; if C0 ≤ C1, a transparent black (alpha = 0) replaces one interpolated color, enabling simple transparency without dedicated alpha bits.[12] For DXT3 (BC2), the block expands to 128 bits, prefixing 64 bits of explicit 4-bit alpha values per texel (allowing 16 discrete levels) before the 64-bit DXT1 color data.[13] This explicit alpha avoids interpolation artifacts, but at the cost of reduced precision compared to interpolated alpha.[13]

DXT5 (BC3) also uses 128 bits but compresses alpha separately with two 8-bit endpoints (A0 and A1) followed by 16 × 3-bit indices (48 bits total).[10] Alpha interpolation uses eight levels, with formulas varying by endpoint order to include full transparency and opacity when needed.[10] The color portion reuses the DXT1 structure, making DXT5 suitable for RGBA textures at 8 bits per pixel (bpp).[10] Later extensions like BC4 and BC5 adapt this scheme for single- or dual-channel data (e.g., signed normals), using 8-bit endpoints and 3-bit indices in one 64-bit block per channel.[14] BC6H and BC7 introduce more flexible modes with variable endpoint counts and selectors, but retain the 4×4 block foundation for compatibility.[11] The table below summarizes the layouts, and a block-addressing sketch follows it.

| Format | Block Size (bits) | Color Structure | Alpha Structure | Use Case |
|---|---|---|---|---|
| DXT1/BC1 | 64 | 2× RGB565 endpoints + 2-bit indices | Implicit (transparent black option) | RGB textures with optional transparency |
| DXT3/BC2 | 128 | As DXT1 | 4-bit explicit per texel | Textures needing precise alpha edges |
| DXT5/BC3 | 128 | As DXT1 | 2× 8-bit endpoints + 3-bit indices | General RGBA compression |
| BC4 | 64 | N/A | 2× 8-bit endpoints + 3-bit indices (one channel) | Grayscale or heightmaps |
| BC5 | 128 (two channels) | N/A | As BC4 per channel (e.g., RG for normals) | Multi-channel data without color |
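Because blocks are fixed-size and independent, locating the compressed data for any texel is simple arithmetic. A minimal sketch, assuming tightly packed blocks in row-major order (the usual in-memory layout); the function names are illustrative:

```python
def block_byte_offset(x, y, width, block_bytes):
    """Byte offset of the 4x4 block containing texel (x, y), where
    block_bytes is 8 for BC1/BC4 and 16 for BC2/BC3/BC5/BC6H/BC7."""
    blocks_per_row = (width + 3) // 4  # width rounded up to whole blocks
    return ((y // 4) * blocks_per_row + (x // 4)) * block_bytes

def bc1_color_index(block, x, y):
    """2-bit color index of texel (x, y) within one 8-byte BC1 block:
    bytes 4-7 hold the indices, packed LSB-first from the top-left texel."""
    index_bits = int.from_bytes(block[4:8], "little")
    texel = (y % 4) * 4 + (x % 4)
    return (index_bits >> (2 * texel)) & 0x3
```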
Original DXT Codecs
DXT1
DXT1, also known as Block Compression 1 (BC1) in later standards, is the foundational format in the S3 Texture Compression (S3TC) family, designed for compressing RGB or RGBA textures with optional 1-bit alpha transparency. It achieves a fixed rate of 4 bits per pixel by encoding 4×4 blocks of texels into 64 bits, making it suitable for real-time graphics applications where memory bandwidth is limited. Developed by S3 Graphics, DXT1 prioritizes opaque textures but supports binary alpha through a special transparent color index.[15][16]

The core of DXT1's encoding revolves around a 4×4 block structure stored in 8 bytes (64 bits). The block begins with two 16-bit color values in RGB 5:6:5 format: Color_0 (bits 0-15) and Color_1 (bits 16-31). These are followed by two 32-bit words forming a 4×4 bitmap of 2-bit indices (bits 32-63), where each pair of bits selects one of up to four derived colors for the corresponding texel. The bitmap is organized row-wise, with the first 16 bits covering the top two rows and the next 16 bits the bottom two. This layout ensures hardware-efficient decoding on graphics pipelines.[16][15]

Color palette derivation depends on the relative magnitudes of Color_0 and Color_1, treated as unsigned 16-bit integers. If Color_0 > Color_1, four opaque colors are generated: Color_0 (index 00), Color_1 (01), an interpolated Color_2 = round((2 × Color_0 + Color_1) / 3) (10), and Color_3 = round((Color_0 + 2 × Color_1) / 3) (11). Interpolation is typically described on the endpoint colors after expansion to full 8-bit RGB, though implementations are permitted slight variations in rounding. Conversely, if Color_0 ≤ Color_1, only three colors are used—Color_0 (00), Color_1 (01), and Color_2 = round((Color_0 + Color_1) / 2) (10)—with index 11 mapping to transparent black (RGB 0:0:0, alpha 0). This conditional alpha mode enables binary transparency without dedicated alpha bits, though it cannot represent transparency gradients.[16][15]

Decoding a texel involves extracting its 2-bit index from the bitmap at bit position 2 × (4 × y + x), where (x, y) ranges from (0, 0) to (3, 3), then selecting and expanding the corresponding color to 8 bits per channel. In the alpha-enabled case (Color_0 ≤ Color_1), index 11 yields alpha = 0 and RGB = (0, 0, 0) for correct blending. For the opaque variant (COMPRESSED_RGB_S3TC_DXT1_EXT), all texels are treated as fully opaque regardless of indices. This format's simplicity allows fast random access but limits color fidelity due to the small palette and fixed interpolation, often resulting in noticeable banding in gradients.[15][16]

DXT1's design trades quality for efficiency, supporting textures with dimensions that are multiples of 4 in each direction to align blocks without padding waste in higher mip levels. It became widely adopted after S3TC's licensing to major GPU vendors, forming the basis for BC1 in Direct3D and OpenGL extensions. While effective for diffuse maps and environment textures, its lack of per-texel alpha precision makes it less ideal for detailed transparency effects compared to later formats like DXT5.[15][17]
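The conditional palette can be illustrated with a minimal decoder sketch for one block. It reuses the rgb565_to_rgb888 helper from the earlier sketch, assumes interpolation on the expanded 8-bit values, and is illustrative rather than bit-exact for any particular GPU:

```python
def decode_dxt1_block(block):
    """Decode one 8-byte DXT1 block to 16 (r, g, b, a) texels, row-major."""
    color0 = int.from_bytes(block[0:2], "little")
    color1 = int.from_bytes(block[2:4], "little")
    c0, c1 = rgb565_to_rgb888(color0), rgb565_to_rgb888(color1)
    if color0 > color1:
        # Opaque mode: endpoints plus two interpolated colors at 1/3 and 2/3.
        palette = [
            c0 + (255,), c1 + (255,),
            tuple((2 * a + b + 1) // 3 for a, b in zip(c0, c1)) + (255,),
            tuple((a + 2 * b + 1) // 3 for a, b in zip(c0, c1)) + (255,),
        ]
    else:
        # Punch-through mode: midpoint color, and index 11 is transparent black.
        palette = [
            c0 + (255,), c1 + (255,),
            tuple((a + b) // 2 for a, b in zip(c0, c1)) + (255,),
            (0, 0, 0, 0),
        ]
    index_bits = int.from_bytes(block[4:8], "little")
    return [palette[(index_bits >> (2 * t)) & 0x3] for t in range(16)]
```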
DXT2 and DXT3

DXT2 and DXT3 are variants of the S3 Texture Compression (S3TC) family that incorporate explicit alpha-channel support, enabling textures with per-pixel transparency while maintaining a fixed 4:1 compression ratio at 8 bits per pixel (bpp) for 4×4 texel blocks.[18] Each 128-bit block consists of 64 bits dedicated to alpha encoding and 64 bits to color encoding, allowing more precise alpha representation than the 1-bit alpha of DXT1.[19] These formats were developed to address the limitations of opaque textures in real-time rendering, particularly in scenarios requiring blending effects like shadows or semi-transparent surfaces.[2]

The color encoding in both DXT2 and DXT3 uses two 16-bit RGB565 endpoint colors (color0 and color1) followed by a 32-bit index map with 2 bits per texel; unlike DXT1, the color block is always decoded in four-color mode regardless of endpoint order, selecting from color0, color1, (2×color0 + color1)/3, and (color0 + 2×color1)/3.[18][19] During decoding, each texel's color is determined by indexing into these interpolated values, ensuring fast hardware decompression suitable for graphics pipelines.[18]

The alpha channel in DXT2 and DXT3 is encoded explicitly with 4 bits per texel, stored as a contiguous 64-bit block that yields 16 distinct alpha levels (0-15), scaled to the full 8-bit range (0-255) by multiplying by 17 during decoding (see the sketch after the table).[19] This direct per-pixel alpha avoids interpolation, providing sharp transitions ideal for fonts, UI elements, or hard-edged transparency, but it can introduce artifacts in smooth gradients due to quantization.[18]

The primary distinction between the formats lies in alpha premultiplication: DXT2 assumes colors are premultiplied by alpha (RGB channels scaled by the alpha value before encoding), which aligns with certain blending models but can lead to darker results if not handled correctly in the renderer.[19] In contrast, DXT3 treats alpha as straight (non-premultiplied), keeping the color channels independent for more intuitive editing and blending.[2] This difference is flagged in modern specifications via the KHR_DF_FLAG_ALPHA_PREMULTIPLIED descriptor, ensuring compatibility in APIs like OpenGL and Vulkan.[18]

| Format | Alpha Encoding | Color Premultiplication | Total Block Size | Compression Ratio |
|---|---|---|---|---|
| DXT2 | 4 bits/texel (explicit) | Yes (RGB × alpha) | 128 bits (4×4 block) | 4:1 (8 bpp) |
| DXT3 | 4 bits/texel (explicit) | No (straight alpha) | 128 bits (4×4 block) | 4:1 (8 bpp) |
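The explicit alpha decode reduces to a few lines. A minimal sketch assuming the standard little-endian block layout, with an illustrative function name:

```python
def decode_bc2_alpha(block):
    """Return the 16 8-bit alpha values of a 16-byte BC2/DXT3 block,
    row-major: 4 bits per texel in bytes 0-7, scaled by 17 so that
    0x0 -> 0 and 0xF -> 255."""
    alpha_bits = int.from_bytes(block[0:8], "little")
    return [((alpha_bits >> (4 * t)) & 0xF) * 17 for t in range(16)]
```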
DXT4 and DXT5
DXT4 and DXT5 are advanced variants in the S3 Texture Compression (S3TC) family, designed to handle textures with alpha channels more efficiently than the explicit-alpha formats DXT2 and DXT3. Both formats achieve a fixed 4:1 compression ratio for 32-bit RGBA textures by encoding 4×4 pixel blocks into 128 bits, combining a 64-bit color block (identical to DXT1) with a dedicated 64-bit interpolated alpha block. This separation allows independent compression of color and opacity, enabling smoother alpha gradients than the explicit 4-bit-per-pixel alpha of DXT3. Introduced as part of the original S3TC suite by S3 Graphics in the late 1990s and integrated into DirectX 6.0, these formats prioritize real-time decompression on GPUs while supporting premultiplied or straight alpha workflows.[20][15]

The primary distinction between DXT4 and DXT5 lies in their handling of alpha premultiplication. In DXT4, the color values in the block are assumed to be premultiplied by the alpha channel (RGB × A) during encoding, requiring shaders to divide the decoded colors by the alpha value post-decompression to recover straight RGB if needed. Conversely, DXT5 stores non-premultiplied (straight) color data, simplifying shader processing for most applications. The premultiplication assumption in DXT4 can introduce artifacts if not handled correctly, contributing to its rarity in practice; modern implementations map both to the BC3 format (equivalent to DXT5) in DirectX 10 and later, deprecating DXT4's distinct behavior. DXT5, however, remains widely adopted for its versatility in representing semi-transparent effects like fog, shadows, or particle systems.[20][21][22]

Both formats share an identical alpha encoding scheme, which uses two 8-bit endpoint values (α₀ and α₁) followed by sixteen 3-bit indices (one per pixel in the 4×4 block) to select interpolated alpha levels. The 64-bit alpha block stores α₀ in byte 0, α₁ in byte 1, and 48 bits (6 bytes) of indices in row-major order. Interpolation depends on the endpoint ordering:

- If α₀ > α₁, eight evenly spaced levels are generated: the endpoints α₀ and α₁ plus six interpolated values, α₂ = (6α₀ + α₁)/7, α₃ = (5α₀ + 2α₁)/7, α₄ = (4α₀ + 3α₁)/7, α₅ = (3α₀ + 4α₁)/7, α₆ = (2α₀ + 5α₁)/7, and α₇ = (α₀ + 6α₁)/7. This mode suits gradual opacity transitions.
- If α₀ ≤ α₁, four interpolated levels plus the two extremes are used: the endpoints α₀ and α₁, the interpolated values α₂ = (4α₀ + α₁)/5, α₃ = (3α₀ + 2α₁)/5, α₄ = (2α₀ + 3α₁)/5, and α₅ = (α₀ + 4α₁)/5, plus α₆ = 0 (fully transparent) and α₇ = 255 (fully opaque). This facilitates encoding binary transparency efficiently.[15][30]
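A sketch of the palette construction described by the two cases above, with integer division standing in for the exact rounding (which implementations may realize slightly differently); the function name is illustrative:

```python
def bc3_alpha_palette(a0, a1):
    """Eight-entry alpha palette addressed by the per-texel 3-bit indices."""
    if a0 > a1:
        # Eight-level mode: endpoints plus six interpolated values.
        return [a0, a1] + [((7 - i) * a0 + i * a1) // 7 for i in range(1, 7)]
    # Six-level mode: endpoints, four interpolated values, then hard 0 and 255.
    return [a0, a1] + [((5 - i) * a0 + i * a1) // 5 for i in range(1, 5)] + [0, 255]
```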
Extended BC Formats
BC4 and BC5
BC4 and BC5, introduced as part of the Block Compression (BC) formats in Direct3D 10, extend the original S3 Texture Compression family by providing efficient encoding for single- and dual-channel data, respectively.[30] These formats were designed to support higher-precision applications, such as normal mapping, where full RGB compression is unnecessary, at 4 bits per pixel (bpp) for BC4 and 8 bpp for BC5.[32] Unlike the earlier DXT formats (BC1–BC3), which primarily target RGB or RGBA data with punch-through alpha options, BC4 and BC5 focus on normalized scalar values, enabling better fidelity for specialized textures without the overhead of unused color channels.[30]

BC4, available in unsigned normalized (DXGI_FORMAT_BC4_UNORM) and signed normalized (DXGI_FORMAT_BC4_SNORM) variants, compresses a single channel of 4×4 texel blocks into 8 bytes. The encoding uses two 8-bit endpoint values to define a gradient, followed by sixteen 3-bit indices selecting from an eight-value palette: the two endpoints plus either six interpolated values (if the first endpoint exceeds the second) or four interpolated values plus fixed minimum (0 for UNORM, -1 for SNORM) and maximum (1) values (if the first is less than or equal to the second).[33] This linear interpolation scheme represents values in [0, 1] for UNORM or [-1, 1] for SNORM, making it suitable for grayscale images, heightmaps, or single-channel displacement data.[34] The block structure places the two 8-bit endpoints in bytes 0-1, followed by bytes 2-7 packing the 48 bits of sixteen 3-bit indices, ensuring hardware-accelerated decoding on Direct3D 10+ compatible GPUs.[30]

BC5 builds directly on BC4 by encoding two independent channels (typically red and green, or X and Y components) within a 4×4 block, using 16 bytes total—effectively two BC4 blocks concatenated.[35] Each channel employs its own pair of 8-bit endpoints and sixteen 3-bit indices, with UNORM ([0, 1] per channel) or SNORM ([-1, 1] per channel) interpretations (DXGI_FORMAT_BC5_UNORM and DXGI_FORMAT_BC5_SNORM).[36] This dual-channel approach is particularly effective for tangent-space normal maps, where the Z component can be derived from X and Y via the unit-length constraint, reducing memory usage while preserving surface detail essential for lighting calculations.[30]

In the broader context of S3 Texture Compression's evolution, BC4 and BC5 represent a shift toward modular, channel-agnostic compression, standardized in Direct3D 10 (2006) and later adopted in OpenGL via the EXT_texture_compression_rgtc extension, which aligns with these formats for cross-API compatibility.[37] Their introduction addressed limitations in earlier DXT codecs by omitting irrelevant channels, resulting in up to 50% memory savings for normal maps compared to BC3, without significant quality loss in targeted applications.[30]
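The normal map use case can be sketched as follows: given the X and Y channels sampled from a BC5 SNORM texture, Z follows from the unit-length constraint. Plain Python here for illustration; in practice this runs in the pixel shader.

```python
import math

def reconstruct_normal(x, y):
    """Rebuild a unit tangent-space normal from two BC5 SNORM channels
    in [-1, 1]; max() guards against small negative values introduced
    by quantization error before the square root."""
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)
```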
BC6H and BC7

BC6H and BC7 are advanced block compression formats introduced in Direct3D 11 (2009) and later standardized in OpenGL through the ARB_texture_compression_bptc extension (core in OpenGL 4.2). They extend the capabilities of the earlier S3TC-derived codecs, targeting high-dynamic-range (HDR) textures and high-quality low-dynamic-range (LDR) images with optional alpha, respectively.[38][39][40] Both formats use a fixed 16-byte (128-bit) block to compress a 4×4 texel tile, an effective 8 bits per pixel (bpp), while supporting hardware-accelerated decoding on compatible GPUs. These formats are stored in the DDS file format and require Direct3D 11 feature-level support for runtime usage.[38][39]

BC6H
The BC6H format is specifically designed for compressing HDR textures, supporting three-channel (RGB) half-precision floating-point data at 16 bits per channel: 1 sign bit, 5 exponent bits, and 10 mantissa bits in the signed variant, or no sign bit and an 11-bit mantissa in the unsigned variant. It lacks native alpha channel support, defaulting alpha to 1.0 during decoding, and is available in unsigned (DXGI_FORMAT_BC6H_UF16) and signed (DXGI_FORMAT_BC6H_SF16) configurations, with a typeless variant (DXGI_FORMAT_BC6H_TYPELESS) for flexible usage. This format enables efficient storage of high-fidelity lighting and environment maps in graphics applications where the dynamic range exceeds 8 bits per channel.[41][38]

BC6H employs 14 encoding modes to balance quality and complexity, divided into one-region (4 modes) and two-region (10 modes) configurations, with mode selection indicated by 2 to 5 bits in the block header. In two-region modes, the 4×4 tile is partitioned into two subsets using one of 32 predefined partition patterns, selected by a 5-bit partition index; each subset has a "fix-up" (anchor) texel, texel 0 for the first subset, whose index is stored with one fewer bit because its most significant bit is implied to be zero. Endpoints for each region are encoded as compressed RGB triplets that undergo delta encoding and bit transformations (e.g., sign extension) to fit within roughly 72-82 bits total, followed by 46 bits of 3-bit indices (one per texel, less the two anchor bits) that select interpolated values between the region's endpoint pair. One-region modes allocate more bits to indices (63 bits total, using 4-bit indices with the single anchor texel dropping one bit) and fewer to endpoints (60-65 bits), using shared exponents across components for efficiency.[41][42]

Decoding BC6H blocks involves extracting the mode, unquantizing the endpoints toward full 16-bit values, and interpolating colors based on the indices. Unquantization first transforms compressed values back to integers (e.g., for unsigned: zero stays 0, a maximal value maps to 0xFFFF, and anything else is shifted left by 16, offset by 0x8000, and right-shifted by the component's bit precision). Interpolation then uses predefined weight tables for the 3- and 4-bit indices, c = (a·(64 − w) + b·w + 32) / 64, where a and b are the unquantized endpoints and w is the weight; the result is scaled by a finishing factor (31/64 for unsigned, 31/32 for signed) to form the half-float bit pattern. This process yields bit-exact results across hardware, though encoders must avoid unsupported values like positive infinity in unsigned mode; denormalized floats are preserved, while infinities and NaNs are clamped or converted during encoding. Key limitations include the lack of alpha handling and potential quality trade-offs in modes with finer partitioning, but BC6H provides far better HDR fidelity than clamping earlier formats to 8-bit ranges.[41][42]
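The unsigned-path arithmetic can be written out as a short sketch following the steps above; the 3-bit weight table is the standard one from the published format description, while the function names are illustrative:

```python
BC6H_WEIGHTS_3BIT = [0, 9, 18, 27, 37, 46, 55, 64]  # two-region index weights

def unquantize_unsigned(x, bits):
    """Expand a bits-wide unsigned endpoint component toward 16 bits."""
    if bits >= 15:
        return x
    if x == 0:
        return 0
    if x == (1 << bits) - 1:
        return 0xFFFF
    return ((x << 16) + 0x8000) >> bits

def interpolate(a, b, index):
    """Blend two unquantized endpoint components via a 3-bit index."""
    w = BC6H_WEIGHTS_3BIT[index]
    return (a * (64 - w) + b * w + 32) >> 6

def finish_unquantize_unsigned(x):
    """Final scaling that yields the unsigned half-float bit pattern."""
    return (x * 31) >> 6
```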
BC7

BC7 extends compression to high-quality LDR textures, supporting RGB or RGBA data with endpoint precision of 4-8 bits per channel (UNORM) and an optional sRGB variant (DXGI_FORMAT_BC7_UNORM_SRGB), making it suitable for detailed surface maps, normal maps, and UI elements where artifact reduction is critical. Like BC6H, it uses 128-bit blocks for 4×4 tiles, but it introduces flexible alpha integration—either combined in a four-component endpoint, separated for independent interpolation, or omitted (alpha = 1.0)—at 8 bpp for RGBA. The format's eight modes (0-7) are selected via a 1-8 bit header prefix, each optimizing subset count, bit depth, and alpha handling to minimize visual artifacts such as color banding or blocking; a mode-extraction sketch follows the table.[43][39]

| Mode | Subsets | Endpoint Format (per subset) | Index Bits/Texel | Partition Bits | Alpha Handling | Key Features |
|---|---|---|---|---|---|---|
| 0 | 3 | RGBP 4.4.4.1 (unique P-bit) | 3 | 4 | None (α=1.0) | High partition variety (16 options) |
| 1 | 2 | RGBP 6.6.6.1 (shared P-bit) | 3 | 6 | None (α=1.0) | Balanced precision, 64 partitions |
| 2 | 3 | RGB 5.5.5 | 2 | 6 | None (α=1.0) | Lower bits for speed, 64 partitions |
| 3 | 2 | RGBP 7.7.7.1 (unique P-bit) | 2 | 6 | None (α=1.0) | Highest RGB precision, 64 partitions |
| 4 | 1 | RGB 5.5.5 + A 6 | 2 (color), 3 (α) | 0 | Separate | 2-bit rotation, 1-bit index selector |
| 5 | 1 | RGB 7.7.7 + A 8 | 2 (color/α) | 0 | Separate | 2-bit rotation for channel remap |
| 6 | 1 | RGBAP 7.7.7.7.1 (unique P-bit) | 4 | 0 | Combined | Full 4-channel, high index precision |
| 7 | 2 | RGBAP 5.5.5.5.1 (unique P-bit) | 2 | 6 | Combined | Partitioned alpha, 64 options |
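As referenced above, the mode is encoded as a unary prefix in the block's least significant bits (mode m is m zero bits followed by a one), so extracting it is a one-liner. A minimal sketch assuming a well-formed 16-byte block:

```python
def bc7_mode(block):
    """Mode (0-7) of a 16-byte BC7 block, or None for the reserved
    all-zero prefix: the mode is the position of the lowest set bit
    in the first byte."""
    byte0 = block[0]
    if byte0 == 0:
        return None
    return (byte0 & -byte0).bit_length() - 1
```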
Comparisons and Applications
Format Performance Comparison
S3 Texture Compression (S3TC) formats, standardized as Block Compression (BC) in modern APIs, exhibit performance characteristics that vary primarily in encoding complexity, visual quality, and memory efficiency, while runtime decoding is hardware-accelerated across all variants with negligible differences in sampling speed.[39] All formats operate on 4×4 texel blocks, achieving fixed compression ratios relative to uncompressed 32-bit RGBA, but trade-offs exist between bitrate, quality, and computational cost during encoding. BC1 and BC4 provide 4 bits per pixel (bpp, an 8:1 ratio), suitable for bandwidth-constrained scenarios, while BC3, BC5, BC6H, and BC7 operate at 8 bpp (4:1) for enhanced fidelity.[45] Decoding is handled by fixed-function hardware or simple shaders, with BC1 the simplest and fastest due to its basic interpolation, followed closely by formats like BC3 and BC7, which incur minimal overhead from additional alpha or mode handling.[45] In practice, all BC formats substantially reduce memory bandwidth (75% or more compared to uncompressed textures), enabling higher resolutions without proportional VRAM increases.

Quality is typically measured using peak signal-to-noise ratio (PSNR), where higher values indicate better fidelity. The original DXT formats (BC1 for RGB, BC3 for RGBA) deliver medium quality at 35-40 dB PSNR, with BC1 excelling on opaque surfaces but introducing artifacts in gradients due to its four-color per-block palette and 2-bit indices. BC3 improves alpha handling over BC1 but maintains similar color PSNR, making it preferable for textures with transparency. The extended formats improve on this: BC4 (signed/unsigned single-channel) and BC5 (two-channel, e.g., for normals) achieve higher per-channel fidelity at their bitrates, often exceeding 40 dB for specialized data like heightmaps or tangent spaces. BC6H targets high-dynamic-range (HDR) content, offering PSNR comparable to BC7 (~42-45 dB) for floating-point RGB without alpha, while BC7 provides the highest quality for general RGBA at 8 bpp, routinely surpassing 42 dB and reaching 45 dB with optimized encoders, minimizing block artifacts through its eight modes and 2- to 4-bit indices.[46][4] BC7 outperforms BC1/BC3 by 5-10 dB in PSNR at equivalent bitrates, though at the cost of far greater encoding complexity.[46]

Encoding performance, critical for asset preparation, shows stark differences due to algorithmic sophistication. BC1 and BC3 encoders are highly efficient, achieving speeds of 600-1000 megapixels per second (Mpix/s) on multi-core CPUs, enabling real-time compression for simple textures.[46] In contrast, BC7 requires exhaustive mode selection and partitioning, resulting in 10-20 Mpix/s for high-quality output (>45 dB PSNR), often taking seconds per 4K texture depending on hardware.[46] BC4 and BC5 fall between, with speeds closer to BC1 due to their fewer channels, while BC6H matches BC7's demands because of HDR endpoint optimization. These benchmarks, measured on Intel Core i9 and AMD Threadripper systems, highlight BC1/BC3's suitability for rapid iteration versus BC7's role in final assets.[46]

| Format | Bitrate (bpp) | Typical PSNR (dB) | Encoding Speed (Mpix/s, approx.) | Primary Use Case |
|---|---|---|---|---|
| BC1 (DXT1) | 4 | 35-40 | 600-1000 | Opaque RGB textures |
| BC3 (DXT5) | 8 | 35-40 | 600-1000 | RGBA with alpha |
| BC4 | 4 | >40 (per channel) | 500-800 | Grayscale/single-channel |
| BC5 | 8 | >40 (per channel) | 300-600 | Normals/two-channel |
| BC6H | 8 | 42-45 (HDR) | 10-30 | HDR RGB (no alpha) |
| BC7 | 8 | >42 (up to 45) | 10-20 | High-quality RGBA |
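To make the table's bitrates concrete, a small sketch computing per-mip memory footprints (the function name is illustrative, and only the formats above are handled):

```python
def mip_level_bytes(width, height, bpp):
    """Size in bytes of one mip level: 4 bytes per texel uncompressed,
    otherwise one 8- or 16-byte block per 4x4 tile."""
    if bpp == 32:                        # uncompressed RGBA8
        return width * height * 4
    block_bytes = 8 if bpp == 4 else 16  # BC1/BC4 vs. BC2/BC3/BC5/BC6H/BC7
    return ((width + 3) // 4) * ((height + 3) // 4) * block_bytes

# A 1024x1024 texture: 4 MiB uncompressed, 1 MiB at 8 bpp (75% saved),
# 512 KiB at 4 bpp (87.5% saved).
assert mip_level_bytes(1024, 1024, 32) == 4 * 1024 * 1024
assert mip_level_bytes(1024, 1024, 8) == 1024 * 1024
assert mip_level_bytes(1024, 1024, 4) == 512 * 1024
```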