High Efficiency Video Coding
High Efficiency Video Coding (HEVC), also known as H.265 and MPEG-H Part 2, is an international video compression standard jointly developed by the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group, providing approximately twice the compression efficiency of its predecessor, H.264/Advanced Video Coding (AVC), for equivalent perceptual quality.[1][2][3] Published initially in April 2013 as ITU-T Recommendation H.265 and ISO/IEC 23008-2, HEVC supports video resolutions up to 8K Ultra HD and, in its first version, bit depths of up to 10 bits per sample, enabling efficient encoding for applications ranging from streaming and broadcasting to storage and mobile devices.[1][3]

The development of HEVC was led by the Joint Collaborative Team on Video Coding (JCT-VC), formed in 2010 by ITU-T and MPEG to address the growing demand for higher-resolution video content, such as 4K and beyond, while limiting bitrate requirements.[3][2] The standard's core goal was to reduce bitrate by about 50% compared to H.264/AVC across various content types, including natural video, graphics, and animations, without compromising visual quality.[2] Since its initial release, HEVC has undergone multiple amendments and updates, with the latest version published in July 2024, incorporating enhancements for scalability, multiview coding, and range extensions that support bit depths up to 16 bits and wider color gamuts.[1]

At its foundation, HEVC introduces advanced coding tools, including flexible quadtree-based partitioning of coding tree units (CTUs) up to 64×64 pixels, 35 intra-prediction modes for better spatial redundancy reduction, and improved motion compensation with advanced motion vector prediction.[2][3] These features, combined with enhanced transform coding using larger discrete sine/cosine transforms and context-adaptive binary arithmetic entropy coding, enable parallel processing and scalability across diverse profiles, such as Main 10 for HDR content and Screen Content Coding for graphics-heavy applications.[2][3] In-loop filtering techniques, such as the sample adaptive offset and deblocking filters, further minimize artifacts, ensuring high fidelity in the compressed output.[2]

HEVC's adoption has been widespread in consumer electronics, with integration into Blu-ray discs, 4K/8K broadcasting standards, and streaming platforms, though its computational complexity—roughly twice that of H.264—has posed encoding challenges, often addressed through hardware acceleration.[3] Performance evaluations show bitrate savings of 22% to 76% over H.264 depending on resolution and content, making it foundational for modern video workflows, including ultra-high-definition television (UHDTV) as specified in ITU-R recommendations.[3][2] Despite licensing complexities under the HEVC Advance patent pool, the standard remains a benchmark for efficiency, paving the way for successors like Versatile Video Coding (VVC).[3]
Development and Standardization
Concept and Goals
High Efficiency Video Coding (HEVC), formally known as ITU-T H.265 and ISO/IEC 23008-2 (MPEG-H Part 2), is a block-based hybrid video compression standard that builds on established techniques such as motion-compensated prediction and transform coding to achieve substantially improved efficiency. Developed as the successor to H.264/Advanced Video Coding (AVC), its core design objective is to double compression performance, enabling equivalent video quality at roughly half the bitrate required by prior standards.[4] This target arose from the growing need for more efficient handling of increasing video data volumes driven by higher resolutions and frame rates in modern applications.

The primary goals of HEVC encompass achieving approximately 50% bitrate reduction for the same perceptual quality across a range of content types, while maintaining or enhancing the subjective visual experience.[4] Key performance targets include support for resolutions up to 8K Ultra High Definition (8192 × 4320 pixels), frame rates reaching 300 frames per second, and bit depths up to 16 bits per sample to accommodate high dynamic range and professional workflows.[5] These objectives were established through rigorous testing under Joint Collaborative Team on Video Coding (JCT-VC) common conditions, demonstrating BD-rate savings of about 50% relative to H.264/AVC for high-definition sequences.[6]

HEVC is tailored for diverse applications, including consumer video storage on devices and media, broadcast television distribution, internet-based streaming services, and professional video production environments. By prioritizing coding efficiency, it facilitates bandwidth savings in transmission and reduced storage requirements without compromising quality, making it particularly suitable for the proliferation of 4K and beyond content in these sectors.[7]
Historical Development
The development of video coding standards began with ITU-T Recommendation H.261 in 1988, which introduced discrete cosine transform (DCT)-based compression for videoconferencing over integrated services digital network (ISDN) lines at low bit rates, but it was limited to resolutions like CIF and QCIF, proving inefficient for higher-definition content due to fixed block sizes and basic motion compensation. Subsequent standards built on this foundation; ISO/IEC MPEG-1, standardized in 1992, targeted storage media like CD-ROMs with bit rates up to 1.5 Mbps for VHS-quality video, yet it struggled with the bandwidth demands of high-definition (HD) formats. In 1994, MPEG-2 (ISO/IEC 13818-2) emerged for digital television broadcasting, supporting interlaced HD up to 1920×1080 but requiring significantly higher bit rates—often 15-20 Mbps for HD—making it impractical for emerging 4K ultra-high-definition (UHD) applications without substantial quality degradation or storage overhead.[8]

Further advancements included H.263 in 1996 from ITU-T, which enhanced low-bit-rate video telephony with variable block sizes and improved motion estimation, though it remained optimized for resolutions below HD and exhibited artifacts in higher-quality scenarios. MPEG-4 Part 2 (ISO/IEC 14496-2), released in 1999, introduced object-based coding and better efficiency for internet streaming and mobile video, but its compression gains were marginal over its predecessors for HD, limiting adoption in bandwidth-constrained environments. The most influential prior standard, H.264/AVC (ITU-T H.264 | ISO/IEC 14496-10), finalized in 2003 through joint ITU-T VCEG and MPEG efforts, achieved about 50% better compression than MPEG-2 via advanced tools like multiple reference frames and integer transforms, enabling efficient HD broadcasting and Blu-ray storage; however, for 4K video, it demanded bit rates exceeding 50 Mbps to maintain quality, posing challenges for transmission and storage as display resolutions escalated.[9]

By the late 2000s, the limitations of H.264/AVC in handling HD and emerging 4K/UHD content—such as increased computational complexity and bitrate inefficiency—prompted the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) to issue a joint call for proposals (CfP) in January 2010.[10][11] In response, 27 complete proposals were submitted and rigorously evaluated at the first joint meeting in April 2010 in Dresden, Germany, where subjective quality assessments and objective metrics confirmed several candidates' potential for substantial efficiency gains.[12] This evaluation led to the formal establishment of the Joint Collaborative Team on Video Coding (JCT-VC) in 2010, uniting experts from VCEG and MPEG to collaboratively develop the next-generation standard.[13] A key early milestone was the creation of the Test Model under Consideration (TMuC) in 2010, which integrated promising tools from the top proposals into a unified framework for further refinement and testing.[11] By 2011, this evolved into the HEVC Test Model (HM), serving as the reference software for ongoing development and achieving initial demonstrations of the targeted efficiency improvements through iterative core experiments.[14]
Standardization Process
The standardization of High Efficiency Video Coding (HEVC) culminated in its formal adoption as ITU-T Recommendation H.265, with Version 1 receiving consent on April 13, 2013, following initial agreement among ITU members in January of that year.[15][16] Concurrently, the ISO/IEC counterpart, International Standard 23008-2 (MPEG-H Part 2), was published in December 2013, establishing the baseline specification for HEVC across both organizations.[17] This dual approval ensured compatibility and broad adoption potential for the standard in telecommunications and multimedia applications.

Subsequent versions expanded HEVC's capabilities while maintaining backward compatibility with the baseline. Version 2, approved in October 2014, introduced the range extensions (RExt) to support higher bit depths (up to 16 bits per component), additional chroma formats (4:2:2 and 4:4:4), and enhanced color representation for professional and high-fidelity applications, alongside the scalable (SHVC) and multiview (MV-HEVC) extensions.[18] Version 3, finalized in April 2015, added the 3D-HEVC extensions for depth-enhanced stereoscopic coding, and Version 4, approved in December 2016, added the screen content coding (SCC) extensions, including intra block copy and palette modes, to improve efficiency for mixed-content video such as desktop sharing and graphics-heavy streams.[19] On the ISO/IEC side, Edition 4 of 23008-2, published in August 2020, consolidated these profiles and tools, with subsequent amendments addressing refinements in syntax and semantics. ITU-T H.265 Version 7, approved in November 2019, integrated additional supplemental enhancement information (SEI) messages and minor enhancements, and the latest ISO edition, Edition 6, was published in March 2025.[20][21][22]

Recent updates from 2023 to 2025 have focused on amendments enhancing scalability features, such as improved layered coding support for multi-resolution and multi-view scenarios, building on the scalable extensions from Version 2.[23] These changes align with integration into broadcast systems, notably the ATSC 3.0 standard, where A/341 ("Video – HEVC") was approved on July 17, 2025, specifying constraints for HEVC in next-generation terrestrial television, including support for high dynamic range and wide color gamut.[24][25]

Maintenance of the HEVC standard is handled through ongoing collaboration under the Joint Video Experts Team (JVET), which succeeded the Joint Collaborative Team on Video Coding (JCT-VC) responsible for initial development.[13] JVET conducts regular meetings to process errata, verify conformance, and incorporate minor tools; for instance, ITU-T H.265 Version 10, approved in July 2024, consolidated recent errata and clarifications, ensuring robustness for deployments in streaming, broadcasting, and storage.[26][3][27] This iterative process supports the standard's evolution without major overhauls.
Patent Pools and Licensing
The intellectual property framework for High Efficiency Video Coding (HEVC), standardized jointly by ITU-T and MPEG, is managed primarily through two major patent pools established in 2015: HEVC Advance (administered by Access Advance LLC) and MPEG LA (now under Via Licensing Alliance). HEVC Advance licenses over 27,000 essential patents from more than 50 licensors, offering a one-stop solution for implementers worldwide under fair, reasonable, and non-discriminatory (FRAND) terms.[28][29] In contrast, the MPEG LA/Via LA pool covers essential patents from around 25 initial contributors, with rates structured to avoid royalties on content distribution and focusing on device and component implementations.[30][31] Major patent holders include Qualcomm, which leads with the highest number of declared standard-essential patents (SEPs), followed by Ericsson, Nokia, Samsung, LG Electronics, and others such as Huawei, Dolby, and Sony, collectively contributing the bulk of the approximately 27,000 declared HEVC SEPs as of 2025.[32][29]

HEVC Advance's royalty structure applies per end-product, with rates up to $0.20 for mobile and connected devices in Region 2 (e.g., emerging markets), escalating to $0.40-$1.20 in Region 1 for premium categories like 4K UHD televisions based on selling price; annual caps limit total payments, and no royalties apply to content.[33] MPEG LA/Via LA employs a flat $0.20 per unit for end-products after the first 100,000 units annually (waived for free software distributions), with tiered reductions for higher volumes (e.g., $0.125 per unit beyond 10 million) and no resolution-specific differentiation, though extensions cover advanced profiles.[30][34]

In 2020, the Joint Licensing Agreement (JLA) was introduced to unify aspects of the pools, facilitating cross-licensing among participants, with LG Electronics joining HEVC Advance and Xiaomi signing with MPEG LA, while providing exemptions for non-commercial and open-source software implementations to encourage adoption without royalties for freely distributed encoders and decoders.[35][36] These terms include zero royalties for software made available at no charge, provided it does not exceed volume thresholds or involve commercial sales.[30]

The HEVC licensing landscape has faced challenges, including ongoing antitrust scrutiny over potential royalty stacking—where cumulative fees from multiple pools and bilateral licenses exceed reasonable levels—and a series of lawsuits from 2023 to 2025, such as Access Advance licensors suing Roku for infringement in the US and Brazil, NEC and Sun Patent Trust targeting Transsion at the Unified Patent Court, and resolved disputes involving Microsoft with Via LA licensors in Germany.[37][38][39] These actions highlight tensions in enforcing FRAND commitments amid fragmented pools, following the 2022 dissolution of the third pool, Velos Media, which returned patents to individual owners like Ericsson and Qualcomm.[40]
Technical Framework
Coding Efficiency Metrics
High Efficiency Video Coding (HEVC), also known as H.265, achieves significant improvements in compression efficiency over its predecessor, H.264/AVC, as quantified by standardized metrics developed during its standardization process. The primary objective metric used to evaluate coding efficiency is the Bjøntegaard Delta rate (BD-rate), which measures the average bitrate reduction required to achieve the same video quality, typically assessed via peak signal-to-noise ratio (PSNR) in the luma component. This metric aligns with the goal of approximately 50% bitrate savings set by the Joint Collaborative Team on Video Coding (JCT-VC).

The BD-rate is calculated by comparing rate-distortion curves from the codec under test and a reference codec. Third-order polynomials are fitted to each codec's (bitrate, PSNR) operating points, with bitrate represented on a logarithmic scale to emphasize perceptual relevance, and the difference between the fitted curves is integrated over the overlapping quality range. The average log-rate difference E = \frac{1}{D_H - D_L} \int_{D_L}^{D_H} \left[ r_{\text{test}}(D) - r_{\text{ref}}(D) \right] dD, where r(D) = \log_{10} R(D) and [D_L, D_H] is the common PSNR interval, is converted to a percentage bitrate difference as \Delta R = (10^{E} - 1) \times 100\%; negative values indicate savings relative to the reference (H.264/AVC). This method ensures a balanced assessment across operating points rather than a comparison at a single quality level.

Evaluations under the JCT-VC Common Test Conditions (CTC) demonstrate HEVC's efficiency gains, with tests conducted using reference software (HM for HEVC and JM for H.264/AVC) on standardized test sequences across resolutions from 240p to 1080p, in both random access (RA) and low-delay (LD) configurations. In RA scenarios, which support broadcast and streaming applications with periodic keyframes, HEVC achieves average BD-rate savings of 42% to 50% over H.264/AVC for the same luma PSNR, with variations by resolution class: approximately 35% for lower resolutions (e.g., 480p-720p) and up to 45% for HD (1080p). Savings increase with resolution, typically exceeding 50% for 4K ultra-high-definition content under similar conditions, highlighting HEVC's scalability toward higher resolutions. In LD configurations, suited for low-latency applications like video conferencing, gains are slightly lower at around 40-48%, due to constraints on bidirectional prediction.[41]

Beyond objective metrics, subjective quality assessments confirm HEVC's perceptual benefits, showing higher mean opinion scores (MOS) at reduced bitrates compared to H.264/AVC. In JCT-VC verification tests involving double-stimulus continuous quality scale ratings across resolutions from 480p to UHD, HEVC delivered equivalent subjective quality using 52% to 64% less bitrate, with the largest gains (64%) observed at 4K—outperforming objective PSNR predictions in 86% of cases. These results, derived from formal subjective experiments with multiple viewers, underscore HEVC's ability to maintain visual fidelity at half or less the bitrate of H.264/AVC, particularly in complex scenes.[42]
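The calculation can be reproduced in a few lines. The following is a minimal Python sketch of the BD-rate computation as commonly implemented (third-order polynomial fit of log-rate versus PSNR, integrated over the overlapping quality range); the rate-PSNR numbers at the end are illustrative, not measured data.

```python
# Minimal BD-rate sketch: fit log10(rate) as a cubic in PSNR for each codec,
# integrate the difference over the shared quality range, convert to percent.
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Average bitrate difference (%) of the test codec vs. the reference at
    equal PSNR; negative values mean the test codec saves bitrate."""
    p_ref = np.polyfit(psnr_ref, np.log10(rates_ref), 3)
    p_test = np.polyfit(psnr_test, np.log10(rates_test), 3)
    # Overlapping PSNR interval [lo, hi] of the two fitted curves.
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_ref) / (hi - lo)   # mean log-rate difference E
    return (10 ** avg_log_diff - 1) * 100             # percentage bitrate change

# Illustrative operating points: the test codec reaches similar PSNR at ~half the rate.
h264_rates, h264_psnr = [2000, 4000, 8000, 16000], [34.0, 37.0, 40.0, 43.0]
hevc_rates, hevc_psnr = [1000, 2000, 4000, 8000], [34.2, 37.1, 40.2, 43.1]
print(f"BD-rate: {bd_rate(h264_rates, h264_psnr, hevc_rates, hevc_psnr):.1f}%")
```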
Overall Architecture
High Efficiency Video Coding (HEVC), standardized as ITU-T H.265 and ISO/IEC 23008-2, employs a hybrid block-based coding architecture that combines predictive and transform-based techniques to achieve high compression efficiency. This framework integrates spatial prediction (intra-frame) to remove redundancies within a single picture and temporal prediction (inter-frame) to exploit similarities across pictures, followed by transform coding, quantization, entropy coding, and in-loop filtering to refine the reconstructed signal and improve future predictions. The core processing operates on blocks: the encoder subtracts the predicted block from the original to form a residual, which is transformed using an integer approximation of the discrete cosine transform (DCT), quantized to discard less perceptible detail, and entropy-coded using context-adaptive binary arithmetic coding (CABAC) for lossless compression of the symbols. In-loop filters, namely deblocking and sample adaptive offset (SAO), are applied after reconstruction to mitigate blocking artifacts and improve picture quality, ensuring that the reference frames used for prediction are as accurate as possible.[4]

To support parallel processing, error resilience, and flexible bitstream manipulation, HEVC pictures can be partitioned into slices, tiles, or wavefronts. Slices divide a picture into sequences of coding tree units (CTUs) that can be decoded independently, tiles define rectangular, non-overlapping regions that can be processed without cross-tile dependencies, and wavefronts allow CTU rows to be decoded in a staggered, diagonal pattern that balances computational load across threads. The fundamental processing unit, the coding tree unit (CTU), represents the largest possible block size of up to 64×64 luma samples (with corresponding chroma blocks) and can be recursively subdivided into smaller coding units via a quadtree structure for adaptive granularity in prediction and transform application. This partitioning scheme enhances scalability for multi-threaded implementations and low-latency applications compared to prior standards.[4][43]

The HEVC bitstream is structured around Network Abstraction Layer (NAL) units, which provide a modular format for encapsulating coded data, metadata, and supplemental enhancement information, facilitating network transmission and parsing. NAL units include parameter sets such as the Sequence Parameter Set (SPS), which conveys sequence-level parameters like profile, level, and maximum CTU size, and the Picture Parameter Set (PPS), which specifies picture-level settings including reference picture list configuration and partitioning modes. Coded slice NAL units carry the bulk of the video data—the entropy-coded syntax elements for the CTUs within a slice—while other NAL unit types carry video usability information or filler data. This layered organization ensures robust handling of incomplete bitstreams and supports extensions for scalability and multiview coding.[43][4]
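As an illustration of the NAL structure described above, the following Python sketch decodes the two-byte NAL unit header defined by the standard (forbidden_zero_bit, nal_unit_type, nuh_layer_id, nuh_temporal_id_plus1); the function name and dictionary keys are illustrative choices.

```python
# Decode the 16-bit HEVC NAL unit header:
#   1 bit  forbidden_zero_bit | 6 bits nal_unit_type |
#   6 bits nuh_layer_id       | 3 bits nuh_temporal_id_plus1
def parse_nal_header(data: bytes) -> dict:
    hdr = (data[0] << 8) | data[1]
    return {
        "forbidden_zero_bit": (hdr >> 15) & 0x1,   # must be 0 in a valid stream
        "nal_unit_type":      (hdr >> 9) & 0x3F,   # e.g. 32 = VPS, 33 = SPS, 34 = PPS
        "nuh_layer_id":       (hdr >> 3) & 0x3F,   # 0 for single-layer bitstreams
        "temporal_id":        (hdr & 0x7) - 1,     # nuh_temporal_id_plus1 minus 1
    }

# Example: 0x42 0x01 is the header of an SPS NAL unit (type 33, layer 0, TID 0).
print(parse_nal_header(bytes([0x42, 0x01])))
```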
HEVC's architecture emphasizes encoder-decoder symmetry: the decoder mirrors the encoder's core processes (motion-compensated prediction, residual decoding via inverse transform and dequantization, and in-loop filtering) to reconstruct the video sequence faithfully. Motion estimation and compensation occur in the prediction loop before the transform, using fractional-pixel accuracy (up to 1/4-pel for luma) and flexible reference picture management to minimize residuals effectively. A DCT-like core transform (with sizes from 4×4 to 32×32) is applied to the residual in both the encoding and decoding paths, ensuring bitstream interoperability across compliant devices. This symmetric design, refined through the Joint Collaborative Team on Video Coding (JCT-VC) efforts, underpins HEVC's ability to deliver roughly double the compression efficiency of H.264/AVC under equivalent quality constraints.[4][43]
Color Spaces and Formats
High Efficiency Video Coding (HEVC) primarily employs the YCbCr color space with 4:2:0 chroma subsampling for progressive video sequences, in which the luma (Y) component is sampled at full resolution and the chroma components (Cb and Cr) are subsampled by a factor of 2 both horizontally and vertically. In this format, each Cb or Cr sample covers a 2×2 block of luma samples, enabling efficient compression by prioritizing luminance detail while reducing chroma data. This approach aligns with human visual perception, as the eye is more sensitive to brightness variations than to color nuances.[3]

HEVC also supports alternative color representations, including RGB, YCoCg, and monochrome formats, to accommodate diverse applications such as computer graphics and high-fidelity imaging. RGB coding is handled through the 4:4:4 chroma format, optionally with separate_colour_plane_flag enabled so that the red, green, and blue planes are treated as independent monochrome pictures. YCoCg, a reversible integer transform of RGB, can improve coding efficiency in scenarios requiring lossless or near-lossless representation, particularly in the screen content extensions. Monochrome coding, equivalent to 4:0:0 chroma subsampling, discards chroma entirely and codes only the luma component, suiting grayscale content.

Bit depths range from 8 bits per component in the baseline profiles up to 16 bits with the Range extensions, allowing enhanced dynamic range and reduced quantization artifacts compared to prior standards. Bit depths are specified via the sequence parameter set (SPS) syntax elements bit_depth_luma_minus8 and bit_depth_chroma_minus8, with the actual depth computed as 8 plus the signaled value. Higher bit depths support professional workflows and emerging display technologies by preserving subtle gradations in shadows and highlights.[3]

Extended chroma formats—4:2:2 and 4:4:4—were introduced in HEVC Version 2 (Range extensions), enabling higher fidelity for broadcast and professional video production.[3] In 4:2:2, chroma is subsampled only horizontally (SubWidthC = 2, SubHeightC = 1), maintaining full vertical resolution for applications like camera-original footage. The 4:4:4 format carries chroma at full resolution (SubWidthC = 1, SubHeightC = 1), which is ideal for RGB workflows in post-production. The format is signaled via the chroma_format_idc parameter in the SPS: 0 indicates monochrome, 1 indicates 4:2:0, 2 indicates 4:2:2, and 3 indicates 4:4:4.

For high dynamic range (HDR) content, HEVC integrates support for the Hybrid Log-Gamma (HLG) and Perceptual Quantizer (PQ) transfer functions through supplemental enhancement information (SEI) messages and video usability information (VUI) parameters. HLG (transfer_characteristics value 18) enables backward compatibility with standard-dynamic-range displays, while PQ (value 16) is optimized for absolute luminance levels up to 10,000 nits. This metadata is conveyed as sideband signaling in VUI and SEI payloads, allowing decoders to apply the appropriate electro-optical transfer function without altering the core bitstream. This HDR integration enhances HEVC's applicability in modern broadcasting and streaming ecosystems.[3]
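The derivation of plane dimensions and bit depths from these SPS fields is mechanical, as the following Python sketch illustrates (function and table names are illustrative; separate_colour_plane_flag handling is omitted).

```python
# Map chroma_format_idc to (SubWidthC, SubHeightC) and derive plane geometry.
SUBSAMPLING = {0: None, 1: (2, 2), 2: (2, 1), 3: (1, 1)}  # 4:0:0, 4:2:0, 4:2:2, 4:4:4

def sample_layout(chroma_format_idc, bit_depth_luma_minus8, bit_depth_chroma_minus8,
                  width, height):
    bit_depth_luma = 8 + bit_depth_luma_minus8      # per the SPS derivation rule
    bit_depth_chroma = 8 + bit_depth_chroma_minus8
    sub = SUBSAMPLING[chroma_format_idc]
    if sub is None:  # 4:0:0 monochrome: luma plane only
        return {"luma": (width, height, bit_depth_luma), "chroma": None}
    sw, sh = sub
    return {"luma": (width, height, bit_depth_luma),
            "chroma": (width // sw, height // sh, bit_depth_chroma)}

# 1080p Main 10 in 4:2:0: chroma planes are 960x540 at 10 bits per sample.
print(sample_layout(chroma_format_idc=1, bit_depth_luma_minus8=2,
                    bit_depth_chroma_minus8=2, width=1920, height=1080))
```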
Core Coding Tools
Coding Tree Unit and Blocks
In High Efficiency Video Coding (HEVC), the fundamental processing unit is the Coding Tree Unit (CTU), which represents the largest possible block size and consists of up to 64×64 luma samples along with the corresponding chroma samples for color video.[3][4] This structure replaces the fixed 16×16 macroblock of prior standards like H.264/AVC, allowing greater flexibility in handling diverse video content such as high-resolution footage.[4] The CTU, often referred to interchangeably as the Largest Coding Unit (LCU) when at maximum size, serves as the root for hierarchical partitioning and includes associated syntax elements for coding decisions.

The CTU is subdivided into Coding Units (CUs) using a quadtree partitioning scheme, enabling adaptive block sizes ranging from 64×64 down to 8×8 luma samples to better match local content characteristics and improve compression efficiency.[4][3] Each node in the quadtree represents a CU, which can either be split into four equal-sized child CUs or treated as a leaf node for prediction and transform processing; this recursive division continues until the minimum CU size is reached or further splitting no longer improves the rate-distortion cost. The quadtree depth can thus vary from 0 (the full 64×64 CTU as a single CU) to 3 (the smallest 8×8 CUs), balancing granularity against the overhead of signaling the partition structure.[4]

Within each CU, further subdivision occurs into Prediction Units (PUs) for spatial or temporal prediction and Transform Units (TUs) for residual transformation, each governed by a separate structure to decouple these processes.[4] PUs define the regions where prediction is applied and support up to eight partitioning modes for inter-coded CUs, including four asymmetric motion partitioning options that split the CU at one quarter of its width or height (for example, a 2N×2N CU into 2N×(N/2) and 2N×(3N/2) partitions), while intra-coded CUs use square splits only; the smallest luma PUs are 4×4 for intra prediction and 4×8 or 8×4 for inter prediction, as 4×4 inter PUs are prohibited to bound worst-case memory bandwidth. TUs, on the other hand, form a residual quadtree (RQT) with square sizes from 4×4 to 32×32 for efficient transform application.[4] This separation enables prediction accuracy and transform efficiency to be optimized independently.

The selection of CU sizes and partitions is determined through rate-distortion optimization (RDO), where the goal is to minimize the Lagrangian cost function J = D + \lambda R, with D representing distortion (e.g., mean squared error), R the bitrate, and \lambda a Lagrange multiplier tuned to the quantization parameter.[4] This process evaluates multiple partitioning candidates at each quadtree node, comparing their costs to decide splits, ensuring that the block structure adapts to content complexity while controlling bitrate; for example, smoother regions may favor larger CUs to reduce overhead, whereas detailed areas benefit from finer partitions.
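The following Python sketch illustrates the recursive split decision in toy form: each quadtree node compares the Lagrangian cost of coding the block whole against the summed cost of its four children. The cost model (sum of squared deviations from the block mean plus a fixed header cost) is a stand-in for the prediction, transform, and entropy-coding costs a real encoder would evaluate.

```python
import numpy as np

LAMBDA = 10.0          # Lagrange multiplier; in practice derived from QP
MIN_CU, MAX_CU = 8, 64

def cost_leaf(block):
    # Stand-in cost: distortion as SSD around the block mean, plus a fixed
    # rate term for the CU header. A real encoder measures actual R and D.
    return float(((block - block.mean()) ** 2).sum()) + LAMBDA * 32

def best_partition(block, size=MAX_CU):
    """Return (cost, layout): layout is a CU size or a list of four sub-layouts."""
    leaf_cost = cost_leaf(block)
    if size == MIN_CU:
        return leaf_cost, size
    half = size // 2
    children = [best_partition(block[y:y + half, x:x + half], half)
                for y in (0, half) for x in (0, half)]
    split_cost = sum(c for c, _ in children) + LAMBDA * 4  # split-flag signaling
    if leaf_cost <= split_cost:
        return leaf_cost, size
    return split_cost, [layout for _, layout in children]

rng = np.random.default_rng(0)
ctu = rng.normal(scale=2.0, size=(64, 64))      # mild texture everywhere
ctu[:32, 32:] += 60; ctu[32:, :32] += 120; ctu[32:, 32:] += 180  # distinct quadrants
cost, layout = best_partition(ctu)
print(layout)   # expected: one split into four 32x32 CUs -> [32, 32, 32, 32]
```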
Transform and Quantization
In High Efficiency Video Coding (HEVC), the transform process converts spatial-domain prediction residuals into the frequency domain to enable efficient energy compaction and subsequent quantization. It is applied to residuals derived from coding units (CUs) within the coding tree unit structure. HEVC employs separable two-dimensional transforms of square sizes ranging from 4×4 to 32×32, allowing flexibility for different block characteristics and content types. For 4×4 luma transform units (TUs) in intra-predicted blocks, a Discrete Sine Transform type VII (DST-VII) is used, which provides better coding efficiency for the directional nature of intra residuals than cosine-based transforms. Larger blocks, including all inter-predicted TUs and intra TUs beyond 4×4, use Discrete Cosine Transform type II (DCT-II) approximations, which are effective for smooth, low-frequency content.

The core transforms in HEVC are implemented as finite-precision integer approximations to ensure computational efficiency and avoid floating-point operations. They are applied as separable one-dimensional (1D) transforms, first to the rows and then to the columns of the residual block R. The 1D transform matrices use small integer coefficients that closely approximate scaled DCT basis functions, and the matrices are specified so that each smaller transform is embedded in the larger ones (the 4-, 8-, and 16-point matrices are subsets of the 32-point matrix), limiting storage and multiplication complexity while maintaining approximation accuracy. The overall 2D transform output T is computed as T = A R A^T, where A is the N \times N transform matrix for size N and ^T denotes the transpose. Intermediate scaling factors are applied after each 1D stage to normalize the coefficients before quantization, balancing precision and bit-depth requirements.

Following the transform, HEVC applies uniform scalar quantization with a dead zone, which widens the interval mapped to zero so that small coefficients are discarded for better rate-distortion performance. The quantization parameter (QP) ranges from 0 to 51 for 8-bit video and can be varied at the coding-unit level via delta-QP signaling, with the chroma QP offset from the luma QP by a configurable value. The quantization step size controls the coarseness, and during decoding the dequantization step for luma is given by Q_{\text{step}} = 2^{(QP-4)/6}, with scaling matrices optionally applied for frequency-dependent adjustments. This design yields a nonlinear QP scale in which each increment of 6 doubles the step size, providing fine control over bitrate and quality.

To handle high-frequency coefficients efficiently, HEVC codes the position of the last significant coefficient in a block explicitly; once it is known, all coefficients beyond it are inferred to be zero without further flags, reducing overhead for blocks whose energy is concentrated in the low frequencies. This is particularly beneficial for small transforms, where high-frequency components are less likely to carry significant energy.
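The QP-to-step-size relationship and the dead-zone behavior can be illustrated numerically; in the Python sketch below, the rounding offset of 1/6 is a typical encoder-side choice rather than a normative value.

```python
# Step size doubles every 6 QP; a dead-zone quantizer maps small coefficients to zero.
def q_step(qp: int) -> float:
    return 2 ** ((qp - 4) / 6)

def quantize(coeff: float, qp: int, offset: float = 1 / 6) -> int:
    # offset < 0.5 widens the zero interval (the "dead zone") relative to
    # round-to-nearest, biasing small coefficients toward zero.
    level = int(abs(coeff) / q_step(qp) + offset)
    return -level if coeff < 0 else level

def dequantize(level: int, qp: int) -> float:
    return level * q_step(qp)

for qp in (22, 28, 34, 40):
    lvl = quantize(100.0, qp)
    print(f"QP={qp}: step={q_step(qp):6.2f}  coeff 100 -> level {lvl}"
          f" -> reconstructed {dequantize(lvl, qp):.1f}")
```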
Intra and Inter Prediction
High Efficiency Video Coding (HEVC), also known as H.265, employs intra and inter prediction as core mechanisms to exploit spatial and temporal redundancies within video sequences, respectively, generating prediction signals that minimize the residual data to be encoded. These prediction techniques operate on prediction units (PUs) derived from coding tree units (CTUs) through flexible block partitioning. By predicting pixel values from neighboring samples or reference frames, HEVC achieves substantial compression gains over prior standards like H.264/AVC, with reported bitrate reductions of up to 50% for equivalent quality.[4]

Intra prediction in HEVC exploits spatial redundancy within the same frame, supporting up to 35 modes for the luma component to capture diverse local textures and edge directions. These comprise one planar mode for smooth transitions, one DC mode for uniform regions, and 33 angular modes that extrapolate from adjacent reconstructed samples at various angles, enabling much finer directional adaptation than the 9 modes of H.264/AVC. For the chroma components, the prediction mode is chosen from planar, DC, horizontal, vertical, or a derived mode that reuses the luma prediction mode, limiting the signaling overhead for color information. To efficiently signal the selected luma mode using context-adaptive binary arithmetic coding (CABAC), HEVC employs a most probable mode (MPM) mechanism that constructs a three-entry candidate list from neighboring PUs, with a fixed-length fallback code used when the actual mode is absent from the list.[44]

Inter prediction in HEVC leverages temporal correlation across frames by estimating motion between the current block and reference pictures stored in the decoded picture buffer (DPB), which can hold up to 16 pictures organized into reference lists L0 and L1, allowing uni- or bi-prediction for enhanced accuracy in complex scenes. Motion information is coded via two primary modes: advanced motion vector prediction (AMVP), which selects from spatial and temporal candidates to predict the motion vector (MV) and reference index before encoding the difference, and merge mode, which infers the complete motion parameters (MV, reference index, and prediction direction) from one of up to five neighboring or collocated candidates, with a skip variant that additionally omits residual coding. This dual approach balances flexibility and efficiency, with merge mode particularly effective for regions of homogeneous motion.[4]

Fractional-pixel motion compensation refines inter prediction to 1/4-pixel accuracy for luma and 1/8-pixel for chroma (in 4:2:0 video), using separable FIR interpolation filters to generate sub-sample positions from integer samples. Luma interpolation applies an 8-tap filter for the half-pel position and 7-tap filters for the quarter-pel positions, derived from a DCT-based interpolation filter design that approximates an ideal low-pass interpolator while limiting aliasing and ringing artifacts. Chroma uses 4-tap filters for all fractional positions, providing sufficient smoothing for the lower-resolution components. These filters contribute to HEVC's improved prediction quality, yielding about 5-10% bitrate savings over H.264/AVC's 6-tap luma design in motion-heavy sequences.

Weighted prediction extends inter prediction in HEVC to handle luminance variations in fade or dissolve transitions, applicable to P and B slices on a per-slice basis.
It multiplies the reference prediction signal by a scaling factor and adds an offset, both signaled explicitly in the bitstream, with support for uni-prediction and bi-prediction so that weights can be set per reference list. This mechanism, refined from H.264/AVC, can improve coding efficiency by up to 20% in fade scenarios without impacting random access performance.[45]
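A minimal sketch of explicit weighted uni-prediction follows; the fixed-point pattern (weight with a power-of-two denominator, round-to-nearest shift, then offset and clip) mirrors the scheme described above, while the weight, offset, and sample values are illustrative.

```python
# Explicit weighted uni-prediction: pred = ((ref * w + round) >> shift) + o, clipped.
def weighted_pred(ref_sample: int, w: int, o: int, shift: int = 6,
                  bit_depth: int = 8) -> int:
    rounding = 1 << (shift - 1)                    # round-to-nearest for the shift
    pred = ((ref_sample * w + rounding) >> shift) + o
    return max(0, min((1 << bit_depth) - 1, pred))  # clip to the sample range

# Fade-out: scale the reference toward ~75% brightness (w=48, denominator 64)
# with a small negative offset.
print([weighted_pred(s, w=48, o=-2) for s in (40, 128, 235)])
```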
Loop Filters and Post-Processing
In High Efficiency Video Coding (HEVC), loop filters are applied during the reconstruction process to mitigate coding artifacts, enhancing both objective and subjective video quality while improving compression efficiency. The standard specifies two in-loop filters, the deblocking filter and Sample Adaptive Offset (SAO); a third filter, the Adaptive Loop Filter (ALF), was studied extensively during development but was not adopted into the published standard. The in-loop filters operate on reconstructed samples after prediction and inverse transform, reducing distortions such as blocking and ringing before frames are stored in the decoded picture buffer for motion-compensated prediction.[46][47]

The deblocking filter targets discontinuities at block edges caused by quantization, adaptively attenuating artifacts across luma and chroma boundaries. It processes the picture on an 8×8 grid, evaluating boundary segments to determine a boundary strength (Bs) based on coding modes such as intra prediction and the presence of non-zero transform coefficients; Bs ranges from 0 (no filtering) to 2 (at least one adjacent block intra-coded), and chroma edges are filtered only when Bs equals 2. Filtering decisions use an edge-activity threshold β and a clipping threshold tC, derived from lookup tables indexed by the average quantization parameter (QP) of the adjacent blocks—higher QP values increase β and tC, enabling stronger filtering under coarser quantization. For flat regions (|p2 - 2p1 + p0| < β/8, and similarly for the q samples), a strong filter modifies up to three samples per side; otherwise, a normal filter adjusts one or two samples with changes clipped to ±tC, preserving true edges while reducing banding. This adaptive approach contributes bit-rate savings of a few percent at equal quality.[46][48]

Following deblocking, SAO further refines reconstructed samples by adding category-based offsets to counteract residual distortions like ringing and banding. SAO classifies samples into edge offsets (four directional classes: horizontal, vertical, and two diagonals) or band offsets (32 intensity bands spanning the sample range), with offsets signaled per coding tree unit (CTU). Edge offsets are applied according to the local pattern formed by a sample and its two neighbors along the selected direction, while band offsets target smooth intensity regions by signaling offsets for four consecutive bands out of the 32. This non-linear, sample-wise adjustment, estimated via rate-distortion optimization at the encoder, improves subjective quality and coding efficiency without altering prediction references.[47]

The Adaptive Loop Filter (ALF), evaluated in the HEVC test model (HM) during standardization and later adopted in the successor standard VVC, employs Wiener-based filtering to minimize the mean squared error between original and decoded samples, applied after SAO on a per-CTU basis. It classifies luma samples into up to 25 classes using local activity metrics based on Laplacian measures, with separate handling for chroma. Filter coefficients, derived from Wiener-Hopf equations via auto- and cross-correlation of the original and deblocked samples, form diamond-shaped filter supports. In reported experiments, this block-based, adaptive design reduced computational overhead compared to pixel-wise alternatives while achieving BD-rate savings of 3.3–4.1% in high-fidelity configurations such as 4:4:4.[49][50]
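The edge-offset classification lends itself to a compact illustration. The Python sketch below assigns each sample one of the four edge categories along the horizontal direction and adds the corresponding offset; the offset values are illustrative, and real SAO signaling operates per CTU with encoder-chosen offsets.

```python
# SAO edge-offset classification: compare sample c with neighbors a and b
# along the selected direction and assign one of four categories (or none).
def edge_category(a: int, c: int, b: int) -> int:
    if c < a and c < b:                              return 1  # local minimum
    if (c < a and c == b) or (c == a and c < b):     return 2  # concave corner
    if (c > a and c == b) or (c == a and c > b):     return 3  # convex corner
    if c > a and c > b:                              return 4  # local maximum
    return 0                                                    # monotonic: no offset

def apply_sao_row(row, offsets):
    """Horizontal-class SAO over one row; border samples are left untouched."""
    out = list(row)
    for i in range(1, len(row) - 1):
        out[i] = row[i] + offsets[edge_category(row[i - 1], row[i], row[i + 1])]
    return out

row = [10, 14, 12, 12, 9, 15, 15, 11]
offsets = {0: 0, 1: 2, 2: 1, 3: -1, 4: -2}  # pull minima up, push maxima down
print(apply_sao_row(row, offsets))
```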
Inverse transforms in HEVC reconstruction convert quantized coefficients back to spatial residuals, mirroring the forward transforms (DCT-II or DST-VII) with integer approximations and fixed-point arithmetic. After inverse quantization scales the coefficients by a QP-dependent factor, a rounding offset of half the scaling divisor is added before the right shifts so that results are rounded to the nearest integer, and the output is clipped to the coefficient dynamic range. This process, applied separably (horizontal then vertical), enables faithful recovery of the residuals when combined with the prediction signal, supporting block sizes from 4×4 to 32×32.
Advanced Features and Extensions
Parallel Processing Techniques
High Efficiency Video Coding (HEVC) incorporates parallel processing techniques to leverage multi-core processors, addressing the increased computational demands of higher resolutions and frame rates compared to prior standards like H.264/AVC. These methods divide pictures into segments that can be processed concurrently, balancing dependency management against a minimal impact on compression efficiency. The primary tools—slices, tiles, and wavefront parallel processing (WPP)—enable both spatial and data-level parallelism for encoding and decoding, supporting applications from real-time streaming to ultra-high-definition content.[43][51]

Slices segment a picture into one or more independent or dependent sequences of coding tree units (CTUs), primarily for error resilience and low-latency transmission but also to facilitate parallelism. Independent slices contain all data needed for self-contained decoding, with no prediction or entropy-coding dependencies across their boundaries, allowing multiple slices to be processed on separate cores. Dependent slices, in contrast, initialize contexts such as the CABAC probability models from the preceding slice in the same picture, reducing overhead in low-delay scenarios while still permitting concurrent execution once the sequential dependencies are resolved. This structure also supports bitstream packaging constraints, such as maximum transmission unit sizes, without requiring full picture buffering.[51][43]

Tiles enable spatial parallelism by partitioning a picture into rectangular, independently decodable regions aligned to CTU boundaries, eliminating inter-tile dependencies for intra prediction, motion vector prediction, and entropy coding. Each tile operates as a self-contained unit sharing only picture-level parameters, such as resolution and profile, which simplifies synchronization and allows distribution across cores or even devices. Tiles can be combined with slices for hybrid partitioning, providing flexibility for region-of-interest processing or load balancing in multi-threaded environments, though they introduce minor boundary overheads in loop filtering. This independence makes tiles particularly effective for high-throughput decoding in scenarios like tiled streaming or virtual reality.[43][51]

Wavefront parallel processing (WPP) achieves row-wise parallelism within a slice by processing CTU rows in a staggered, diagonal pattern: each row may begin once the first two CTUs of the row above have completed, which satisfies the dependencies for prediction and CABAC context inheritance. Entropy decoding is initialized separately for each row using substreams, with synchronization ensuring that the needed data from the row above is available, thus breaking the strict serial dependency of raster-order processing. WPP incurs a coding-efficiency loss of typically under 1% in bit rate compared to non-parallel operation, as it preserves most inter-row contexts while enabling fine-grained thread allocation. This technique is especially suited to multi-core CPUs, where threads process wavefront segments with limited inter-thread communication.[52][43]
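The two-CTU lag that defines the wavefront can be made concrete with a small scheduling sketch; the following Python code computes the earliest parallel stage at which each CTU can be processed, assuming one CTU per thread per time step.

```python
# WPP dependency: CTU (r, c) may start only after (r, c-1) in the same row and
# (r-1, c+1) in the row above are done, giving each row a two-CTU lag.
def wavefront_stages(rows: int, cols: int):
    stage = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            deps = [-1]
            if c > 0:
                deps.append(stage[r][c - 1])                     # left CTU, same row
            if r > 0:
                deps.append(stage[r - 1][min(c + 1, cols - 1)])  # two-CTU lag above
            stage[r][c] = max(deps) + 1
    return stage

for row in wavefront_stages(rows=4, cols=8):
    print(row)
# Each row starts two stages after the one above, so up to min(rows, cols // 2)
# rows execute concurrently; total latency is cols + 2 * (rows - 1) stages.
```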
These techniques deliver substantial performance gains on multi-core hardware, with speedups scaling with the number of available cores. WPP has demonstrated encoding speedups of up to 5.5× on a 6-core Intel Core i7 processor for 1080p sequences under random access and low-delay configurations, approaching ideal scaling for up to 12 threads. Tiles provide similar or superior decoding efficiency, achieving 4–6× speedups on 4- to 12-core systems when the number of tiles matches the thread count, as seen in tests with 1080p and lower-resolution videos. Combining these methods with block-level parallelism within CTUs enables real-time HEVC processing of 4K video at 30 fps on standard multi-core CPUs, enhancing scalability for emerging high-resolution applications.[53][54]
Range and Screen Content Extensions
The Range Extensions (RExt), introduced in Version 2 of HEVC and finalized in October 2014, expand the standard's capabilities to handle higher bit depths and alternative chroma formats beyond the baseline 8-bit 4:2:0 support.[18] These extensions enable encoding of content with sample bit depths up to 16 bits per component, accommodating professional video workflows requiring greater precision, such as high dynamic range (HDR) production.[55] Additionally, RExt adds support for 4:2:2 and 4:4:4 chroma subsampling, as well as monochrome (4:0:0) formats, and introduces RGB color-space handling, which is particularly useful for computer graphics and non-broadcast applications.

A key tool in RExt is the enhanced transform skip mode, which allows blocks to bypass the discrete cosine transform (DCT) for lossless coding or near-lossless scenarios, improving efficiency for content with sharp edges or synthetic elements by avoiding transform-related artifacts.[56] This mode is especially effective in 4:4:4 RGB sequences, where it can yield bit-rate savings of up to 35% compared to transformed coding without significant quality loss.[57] Overall, RExt maintains backward compatibility with Version 1 while enabling higher-fidelity representations, with typical coding-efficiency losses of less than 5% for the supported formats relative to baseline HEVC.

The Screen Content Coding (SCC) extensions, completed in Version 4 of HEVC (approved in December 2016), address the unique characteristics of non-camera-captured video, such as desktop sharing, remote desktop, and graphics overlays, which feature repeated patterns, sharp transitions, and limited color palettes.[19] Core tools include intra block copy (IBC), which predicts a block by copying a previously coded block within the same picture, exploiting the exact repetitions common in screen material,[58] and palette mode, which represents a block using a small set of representative colors plus escape values for outliers, reducing bit overhead for areas with few distinct hues, such as icons or slides.[19]

Further SCC tools include the adaptive color transform, which switches residual coding between the RGB and YCoCg domains on a per-block basis, and adaptive motion vector resolution, which lets the encoder signal integer-precision motion vectors when sub-pixel accuracy offers no benefit, as is typical for the integer-pixel shifts of scrolling screen content.[59] These tools collectively achieve bit-rate reductions of up to 30% over baseline HEVC for typical screen-content sequences in all-intra configurations, with even greater gains (up to 50%) for mixed graphics-video material when combined with RExt features.[60]
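The palette-mode representation can be sketched compactly; the following Python example builds a palette and index map with escape pixels for a toy block, omitting the palette predictors, run-length coding, and scan order of the actual SCC design.

```python
from collections import Counter

ESCAPE = -1

def palette_encode(block, max_palette=2):
    """Return (palette, index map, escape pixels) for a flat list of samples."""
    palette = [color for color, _ in Counter(block).most_common(max_palette)]
    lookup = {color: i for i, color in enumerate(palette)}
    index_map, escapes = [], []
    for px in block:
        if px in lookup:
            index_map.append(lookup[px])
        else:
            index_map.append(ESCAPE)   # outlier: coded explicitly as an escape value
            escapes.append(px)
    return palette, index_map, escapes

# A text-like block: two dominant colors plus one stray anti-aliased pixel (37).
block = [255, 255, 0, 0, 255, 0, 0, 0, 255, 255, 255, 0, 0, 0, 0, 37]
print(palette_encode(block))   # -> ([0, 255], [1, 1, 0, ...], [37])
```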
Still Picture Profile
The Main Still Picture profile, introduced in the first edition of the High Efficiency Video Coding (HEVC) standard in April 2013, is designed specifically for efficient compression of static images. It conforms to the constraints of the Main profile but restricts coding to intra-picture prediction only, excluding any motion compensation or inter-frame dependencies, so that a conforming bitstream contains a single intra-coded picture. The profile leverages the core intra-coding tools of HEVC while supporting high resolutions, with maximum picture sizes up to 16K × 16K pixels depending on the applied level constraints.[51]

Key tools in the profile include all 35 intra prediction modes available in HEVC for the luma and chroma components, enabling directional and planar predictions to reduce spatial redundancies within the image. Transform coding supports block sizes from 4×4 up to 32×32, using integer discrete cosine transform (DCT)-like approximations for energy compaction, followed by scalar quantization. For 4×4 blocks, a transform skip mode can bypass the transform stage, and a transquant-bypass mode can skip transform, quantization, and in-loop filtering entirely, enabling mathematically lossless reconstruction of the input image while remaining compatible with lossy operation. These features build directly on the intra prediction mechanisms of HEVC's core coding tools.[61]

The profile finds primary application as a modern replacement for legacy still-image formats like JPEG, particularly for high-resolution photography and graphics where superior compression is needed without sacrificing quality. It integrates seamlessly with the High Efficiency Image File Format (HEIF), serving as the basis for HEIC files that store single or burst images with reduced file sizes compared to traditional JPEG containers. This adoption has been prominent in mobile devices and professional workflows for archiving and sharing high-fidelity images.[62][63]

In terms of compression efficiency, the Still Picture profile achieves average bit-rate savings of approximately 25% over JPEG 2000 for 8-bit images at comparable quality levels, with gains increasing to around 50% for 10-bit content, as demonstrated in objective evaluations using peak signal-to-noise ratio (PSNR) and subjective assessments. These improvements stem from HEVC's advanced intra tools, which outperform the wavelet-based methods of JPEG 2000 for natural images, though encoding complexity is higher.[64]
Profiles, Tiers, and Levels
Version 1 Profiles
Version 1 of the High Efficiency Video Coding (HEVC) standard, finalized in April 2013 as ITU-T H.265 and ISO/IEC 23008-2, defined three profiles: Main, Main 10, and Main Still Picture, addressing a range of video and still-image applications, with the Main and Main 10 profiles serving as the primary options for progressive video sequences in the YCbCr 4:2:0 color format.[3][16] These profiles build on core coding tools such as the coding tree unit structure, transform-based residual coding, intra and inter prediction modes, and loop filters, while imposing constraints on bit depth, chroma subsampling, and supported tools to ensure interoperability and manage decoder complexity.[4]

The Main profile supports 8 bits per sample for the luma and chroma components, enabling efficient compression of standard dynamic range (SDR) content up to resolutions of 8192×4320 pixels and frame rates reaching 120 fps at 4K (3840×2160) under Level 6.2 constraints.[4] It mandates context-adaptive binary arithmetic coding (CABAC) for entropy encoding and the in-loop deblocking filter to reduce blocking artifacts, with no support for features like separate color plane coding or higher bit depths.[4] This profile achieves approximately 50% bitrate reduction compared to the H.264/AVC High Profile under similar subjective quality conditions, making it suitable for bandwidth-constrained environments.[4]

The Main 10 profile extends the Main profile by supporting bit depths of 8 to 10 bits per sample, facilitating high dynamic range (HDR) content with enhanced color precision and reduced banding artifacts in gradients.[3] Included in Version 1 alongside the Main profile, it retains the same chroma format and progressive scanning requirements but adds higher-precision internal calculations to maintain coding efficiency at 10-bit depth.[4] Like the Main profile, it requires CABAC and deblocking, and supports the same maximum capabilities under Level 6.2, including 4K at 120 fps.[4]

In practice, the Main profile has been widely adopted for broadcast and consumer video distribution due to its balance of compression efficiency and compatibility with existing 8-bit ecosystems, while the Main 10 profile is mandated for UHD Blu-ray discs to enable HDR10 support with 10-bit color depth.[16][65]
Version 2 and Later Profiles
Version 2 of the High Efficiency Video Coding (HEVC) standard, finalized in October 2014, introduced the range extensions to support higher bit depths and chroma formats beyond the 8-bit 4:2:0 limitation of the Version 1 profiles.[66] These extensions added 21 new profiles, including the Main 4:2:2 10 profile for 10-bit 4:2:2 chroma subsampling, suitable for professional video workflows requiring enhanced color accuracy.[3] Additionally, the Main 4:4:4 10 and Main 4:4:4 12 profiles enable up to 12-bit depth with full 4:4:4 chroma resolution, targeting post-production, medical imaging, and high-end display content where precise color reproduction is essential.[66] Key features in these profiles include separate color plane coding, which treats each color component as an independent monochrome channel to improve handling of non-4:2:0 formats, and cross-component prediction, a block-adaptive tool that exploits statistical dependencies between luma and chroma for better compression of 4:4:4 content. Version 2 also added the Scalable Main and Scalable Main 10 profiles (the SHVC extension), enabling layered coding for spatial, quality, and temporal scalability to support adaptive streaming over varying bandwidths.

Version 4, approved in December 2016, incorporated the screen content coding (SCC) extensions to optimize compression of computer-generated content such as text, graphics, and animation, which exhibits sharp edges and repetitive patterns unlike natural video. The Screen-Extended Main 4:4:4 profile, for instance, supports 8-bit 4:4:4 with palette mode, in which blocks with few distinct colors are represented by a compact palette and an index map rather than individual pixel values, achieving significant bitrate reductions for screen-sharing and remote-desktop applications.[3] Other SCC profiles, such as Screen-Extended Main 10 and Screen-Extended High Throughput 4:4:4 10, extend these tools to higher bit depths and throughput scenarios.

Subsequent versions built on these foundations with additional signaling and immersive-video support. Version 5 (February 2018) introduced supplemental enhancement information (SEI) messages for 360-degree omnidirectional video, allowing efficient packing and projection of spherical content without altering the core coding tools. In July 2024, as part of Version 10, amendments specified six new multiview profiles (Multiview Extended, Multiview Extended 10, Multiview Monochrome, Multiview Monochrome 12, Multiview 4:2:2, and Multiview 4:2:2 12), enhancing support for stereoscopic and multi-view applications such as VR and 3D broadcasting by building on the earlier multiview extensions.[27] These developments ensure HEVC's adaptability to emerging use cases while maintaining backward compatibility with earlier profiles.
Tiers and Level Constraints
The HEVC standard defines two tiers, Main and High, to address varying application needs by imposing different bitrate and buffer-size constraints while sharing the same decoding tools. The Main tier targets consumer applications with moderate bitrates, with maximum bitrates ranging from values such as 20 Mbps at level 4.1 up to 240 Mbps at level 6.2. In contrast, the High tier accommodates demanding scenarios such as contribution, broadcast, and cinema workflows, raising the bitrate ceilings to 50 Mbps at level 4.1 and 800 Mbps at level 6.2 to maintain quality at elevated data rates; it is defined only for levels 4 and above, as the lower levels are restricted to the Main tier. These tiers apply across profiles, and a decoder conforming to the High tier at a given level can also decode Main tier bitstreams at that level and below.[4]

HEVC defines thirteen levels, numbered 1 to 6.2 (with intermediate levels such as 2.1 and 3.1), signaled through the general_level_idc parameter as 30 times the level number (for example, 123 for level 4.1 and 186 for level 6.2). Each level sets bounds on decoder resources and bitstream parameters, including the maximum luma picture size (MaxLumaPs), the maximum luma sample rate per second, the maximum bitrate, and the maximum coded picture buffer size, all tabulated in the standard with tier-specific variations. For instance, level 4.1 allows a maximum luma picture size of 2,228,224 samples, comfortably covering 1920×1080 (2,073,600 samples), and supports 1080p at 60 fps with a maximum bitrate of 20 Mbps in the Main tier or 50 Mbps in the High tier. The limits scale up at higher levels: level 6.2 permits up to 35,651,584 luma samples per picture, sufficient for 8192×4320 video at frame rates up to 120 fps.[4]

These tier and level constraints optimize HEVC for diverse deployments by bounding computational demands and network requirements. Lower levels (e.g., 3.1) suit mobile devices, with constraints around 720p at 30 fps and bitrates under 10 Mbps, enabling efficient battery and bandwidth use. Conversely, the upper levels in the High tier target cinema and professional workflows, supporting 8K at high frame rates with coded picture buffer sizes reaching 800 Mbit for seamless high-fidelity playback. This structure promotes standardized interoperability without mandating support for all combinations.
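In practice, level conformance amounts to checking a candidate configuration against the tabulated limits. The Python sketch below does this for a few representative rows of the level table (values as cited above; only picture size, sample rate, and bitrate are checked, whereas the standard also bounds CPB size, picture dimensions, and tile counts).

```python
# Representative level limits: MaxLumaPs in samples, MaxLumaSr in samples/s,
# and maximum bitrate in kbit/s for the Main and High tiers.
LEVELS = {
    "4.1": {"max_luma_ps": 2_228_224,  "max_luma_sr": 133_693_440,
            "bitrate": {"main": 20_000,  "high": 50_000}},
    "5.1": {"max_luma_ps": 8_912_896,  "max_luma_sr": 534_773_760,
            "bitrate": {"main": 40_000,  "high": 160_000}},
    "6.2": {"max_luma_ps": 35_651_584, "max_luma_sr": 4_278_190_080,
            "bitrate": {"main": 240_000, "high": 800_000}},
}

def fits_level(width, height, fps, bitrate_kbps, level, tier="main"):
    lim = LEVELS[level]
    luma_ps = width * height
    return (luma_ps <= lim["max_luma_ps"]
            and luma_ps * fps <= lim["max_luma_sr"]
            and bitrate_kbps <= lim["bitrate"][tier])

print(fits_level(1920, 1080, 60, 18_000, "4.1"))          # True: 1080p60 fits 4.1 Main
print(fits_level(3840, 2160, 60, 18_000, "4.1"))          # False: picture too large
print(fits_level(3840, 2160, 60, 18_000, "5.1", "main"))  # True: 4K60 fits 5.1
```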
Decoded Picture Buffer Management
The Decoded Picture Buffer (DPB) in High Efficiency Video Coding (HEVC) serves as a storage mechanism for decoded pictures used in inter prediction and output reordering, enabling efficient temporal prediction while constraining memory usage. Unlike its predecessor in H.264/AVC, HEVC's DPB management employs a more flexible reference picture set (RPS) mechanism to explicitly signal which pictures are retained as references, reducing signaling overhead and improving robustness to packet loss. This approach allows the encoder to mark pictures as short-term or long-term references, with the decoder maintaining the buffer according to these signals and level-specific constraints.[67]

The size of the DPB is signaled in the sequence parameter set (SPS) via the parameter sps_max_dec_pic_buffering_minus1[i] for each temporal sub-layer i, whose value plus one gives the maximum number of pictures that can occupy the buffer at any time, typically allowing between 1 and 16 pictures depending on the profile, tier, and level. For instance, at lower levels such as 1 to 3.1 the buffer holds up to 6 pictures at the maximum luma picture size, while higher levels such as 4 to 6.2 allow up to 16 pictures when picture sizes are small relative to the level's maximum luma samples. A companion SPS parameter, sps_max_num_reorder_pics[i], specifies the maximum number of pictures that can precede any picture in decoding order but follow it in output order, ensuring that the DPB can accommodate both reference pictures and pictures awaiting output without exceeding the signaled size. These limits are derived from MaxDpbSize, calculated from the picture size in luma samples and the value maxDpbPicBuf (equal to 6 for the base profiles), using rules such as MaxDpbSize = min(4 × maxDpbPicBuf, 16) when the picture size is one-quarter or less of the level's maximum luma picture size.[67]
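The MaxDpbSize derivation quoted above follows a simple threshold ladder, sketched below in Python (maxDpbPicBuf = 6 per the standard; the function name is an illustrative choice).

```python
# Smaller pictures (relative to the level's MaxLumaPs) allow proportionally
# more pictures in the DPB, capped at 16.
def max_dpb_size(pic_size_in_samples_y: int, max_luma_ps: int,
                 max_dpb_pic_buf: int = 6) -> int:
    if pic_size_in_samples_y <= max_luma_ps >> 2:
        return min(4 * max_dpb_pic_buf, 16)
    if pic_size_in_samples_y <= max_luma_ps >> 1:
        return min(2 * max_dpb_pic_buf, 16)
    if pic_size_in_samples_y <= (3 * max_luma_ps) >> 2:
        return min(4 * max_dpb_pic_buf // 3, 16)
    return max_dpb_pic_buf

# At level 5.1 (MaxLumaPs = 8,912,896): 1080p pictures allow 16, full 4K only 6.
print(max_dpb_size(1920 * 1080, 8_912_896))   # -> 16
print(max_dpb_size(3840 * 2160, 8_912_896))   # -> 6
```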
Reference pictures in the DPB are organized into RPSs, consisting of short-term and long-term sets signaled in the SPS or slice headers to indicate which pictures may be used for prediction of the current and subsequent pictures. Short-term references are identified by explicit picture order count (POC) deltas in the RPS, replacing the sliding-window and memory-management control operations used in H.264/AVC, and are sorted into lists of pictures preceding the current POC (PocStCurrBefore), following it (PocStCurrAfter), and pictures retained only for future pictures (PocStFoll). Long-term references, signaled by long_term_ref_pics_present_flag and up to 32 per SPS via num_long_term_ref_pics_sps, are identified by POC least significant bits (poc_lsb_lt) and optional MSB cycle deltas, and are divided into current (PocLtCurr) and future (PocLtFoll) lists; these persist longer than short-term references, aiding error resilience in applications such as random access. The total number of reference pictures in an RPS may not exceed MaxDpbSize − 1, preventing buffer overflow.[67]
Memory management in the DPB follows the Hypothetical Reference Decoder (HRD) model outlined in Annex C of the HEVC standard, which enforces conformance by simulating buffer operations to avoid underflow or overflow during decoding. Pictures are added to the DPB after decoding all slices, marked as "used for reference" or "unused," and removed either by explicit bumping (when exceeding the maximum size before inserting the current picture) or upon output; the process ensures that the DPB occupancy, calculated as the maximum of short-term and long-term references combined, satisfies NumPicsInDPB ≤ sps_max_dec_pic_buffering_minus1[HighestTid] + 1. This model uses timing parameters like pic_dpb_output_delay to schedule output reordering, with equations such as the DPB output interval DpbOutputInterval[n] = DpbOutputTime[nextAuInOutputOrder] - DpbOutputTime[n] verifying delay constraints across access units. Conformance requires that no more pictures are stored than specified, and operations like "no_output_of_prior_pics_flag" allow flushing the DPB at random access points.
For the scalability extensions, HEVC's scalable profiles (SHVC) support inter-layer reference picture resampling, which allows a lower-layer picture of a different resolution to serve as a reference by applying phase-based interpolation (8-tap filters for luma, 4-tap for chroma), with scaled reference layer offsets signaled in the parameter sets. This technique, enabled by flags such as scaled_ref_layer_offset_present_flag in the multi-layer profiles, reduces memory demands in hierarchical coding by resampling lower-resolution references, supporting resolution ratios of up to 6:1 while maintaining prediction accuracy.