Perceptual quantizer
The Perceptual Quantizer (PQ) is a non-linear electro-optical transfer function (EOTF) designed for high dynamic range (HDR) video and imaging. It maps digital code values to absolute linear light levels in a manner optimized for human visual perception, supporting luminance from 0 to 10,000 candela per square meter (cd/m², or nits) with 10- or 12-bit quantization while minimizing visible banding artifacts.[1][2][3] Developed by Dolby Laboratories and proposed by the United States to the ITU-R Working Party 6C in April 2012, PQ is based on the Barten contrast sensitivity model, which is used to allocate bits efficiently across the luminance range so that just-noticeable differences (JNDs) are preserved without contouring.[2][1] It was standardized by the Society of Motion Picture and Television Engineers (SMPTE) as ST 2084 in 2014 and incorporated into International Telecommunication Union (ITU) Recommendation BT.2100 in 2016, serving as one of the two primary transfer functions for HDR content alongside Hybrid Log-Gamma (HLG).[1][2]

PQ's perceptual uniformity derives from its use of absolute luminance scaling, which relates code values directly to display output light levels rather than to relative scene-referred values, making it suitable for display-referred workflows in professional production, broadcasting, and consumer devices.[1] In practice, the 12-bit PQ curve provides approximately 2,080 code values for the 0-100 nit range, enhancing detail in shadows and mid-tones while allocating fewer codes to highlights, where human sensitivity to small differences is lower.[3][4] PQ underpins HDR formats such as HDR10, HDR10+, and Dolby Vision, enabling the exchange of HDR content across capture, post-production, distribution, and playback ecosystems, with widespread adoption in UHD Blu-ray, streaming services, and HDR-capable displays reaching peak brightnesses of 700-4,000 nits.[1][2]

Introduction
Definition and purpose
The perceptual quantizer (PQ) is a non-linear electro-optical transfer function (EOTF) that maps normalized digital code values in the range 0 to 1 to absolute luminance levels from 0 to 10,000 cd/m² (nits).[1] The function is specifically engineered to minimize visible banding artifacts by aligning quantization steps with just-noticeable differences (JNDs) derived from models of human vision, such as the Barten contrast sensitivity curve.[2] The core purpose of PQ is to enable efficient encoding and representation of high dynamic range (HDR) content within limited bit depths, allowing high-quality imaging without perceptible distortions.[5] By matching code allocation to human perceptual thresholds, PQ allows 12-bit quantization to cover the full luminance range effectively, whereas a traditional power-law encoding such as gamma 2.4 would demand around 15 bits, and linear encoding substantially more, to avoid contouring in HDR scenarios.[2]

Perceptual uniformity in PQ ensures that changes in digital code values correspond to roughly equivalent perceptual steps in brightness as perceived by the human eye, from deep shadows to intense highlights.[1] This is accomplished by distributing quantization levels in proportion to visual sensitivity, dedicating more codes to luminance regions where the eye detects finer differences, thereby maximizing the utility of the available bit depth for natural-looking HDR reproduction.[2] PQ plays a foundational role in HDR standards, including SMPTE ST 2084 and ITU-R BT.2100.[5]
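As an illustrative numerical sketch of this uniformity, the relative luminance step between adjacent code values can be evaluated at a few points on the curve. The sketch below uses the ST 2084 equations given in full under Technical details; the helper name and the full-range 10-bit assumption are illustrative, not part of the specification.

```python
# Sketch: relative luminance step between adjacent full-range 10-bit PQ codes.
# The printed step sizes change only gradually across these points (on the
# order of 1 %), illustrating the "roughly equivalent perceptual steps"
# property described above.

m1, m2 = 2610 / 16384, 2523 / 4096 * 128
c1, c2, c3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32


def pq_to_luminance(n: float, peak: float = 10000.0) -> float:
    """ST 2084 EOTF: normalized signal n in [0, 1] -> absolute luminance in cd/m²."""
    p = n ** (1 / m2)
    return peak * (max(p - c1, 0.0) / (c2 - c3 * p)) ** (1 / m1)


for code in (128, 256, 512, 768, 1000):
    lo = pq_to_luminance(code / 1023)
    hi = pq_to_luminance((code + 1) / 1023)
    print(f"code {code:4d}: {lo:9.3f} cd/m², step to next code = {100 * (hi - lo) / lo:.2f} %")
```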
Historical development
The development of the perceptual quantizer (PQ) originated in high dynamic range (HDR) research conducted by Dolby Laboratories in the late 2000s, driven by the limitations of standard dynamic range (SDR) content and displays capped at around 100 nits of brightness.[2] Researchers at Dolby sought to enable displays exceeding 1,000 nits and content capturing a broader luminance range up to 10,000 nits, addressing the inefficiencies of traditional gamma curves such as ITU-R BT.1886, which required excessive bit depths (e.g., 15 bits) and wasted quantization levels in both bright and dark regions relative to human visual sensitivity.[2] This work built on perceptual studies supported by Dolby, including those at the University of British Columbia under the Dolby Research Chair, which explored viewer preferences for HDR under varying ambient conditions to inform more natural image rendering.[6][7]

In the early 2010s, HDR systems relied on custom drivers and ad-hoc tone-mapping algorithms, which were computationally intensive and prone to artifacts such as contouring in high-luminance scenes.[1] Dolby's team, including engineers such as Timo Kunkel, Scott Daly, and Scott Miller, proposed the PQ transfer function around 2012–2013 as a more efficient alternative, refining it through iterative psychophysical testing and modeling of the human visual system (HVS).[1] The design incorporated established HVS models, notably Peter Barten's contrast sensitivity function and absolute threshold of visibility, to allocate quantization steps based on just-noticeable differences (JNDs) across the luminance range, optimizing for 10- and 12-bit encoding without visible banding.[2][1]

PQ was first published in 2014 as part of Dolby's broader HDR initiatives, marking a shift from proprietary solutions to an open framework that facilitated industry-wide collaboration.[1] This effort involved vetting with professional studios and display manufacturers to ensure robustness across viewing conditions, ultimately establishing PQ as a foundational element of HDR workflows by addressing the shortcomings of earlier log-linear and gamma-based approaches.[1]

Technical details
Transfer function equations
The inverse electro-optical transfer function (inverse EOTF) of the perceptual quantizer maps absolute luminance to a normalized non-linear signal value and is formulated as

N = \left[ \frac{ c_1 + c_2 \cdot Y_p }{ 1 + c_3 \cdot Y_p } \right]^{m_2},

where L represents the absolute luminance in cd/m² (nits), Y_p = \left( \frac{L}{L_p} \right)^{m_1} with peak luminance L_p = 10000 cd/m², and the constants c_1, c_2, c_3, m_1, and m_2 are predefined parameters optimized for perceptual uniformity.[1] This equation encodes luminance into a nonlinear domain that approximates just-noticeable differences in human vision across a wide dynamic range. It provides a continuous nonlinear response that compresses the dynamic range, allocating more code values to lower luminances where human contrast sensitivity is higher, without requiring explicit piecewise definitions for low-light handling.

The electro-optical transfer function (EOTF) itself, used for decoding normalized signal values back to absolute luminance, is given by

L = L_p \cdot \left[ \frac{ \max\left(0, \, N^{1/m_2} - c_1 \right) }{ c_2 - c_3 \cdot N^{1/m_2} } \right]^{1/m_1},

with the same constants and L_p = 10000 cd/m²; the \max(0, \cdot) term ensures numerical stability for low code values by clamping to zero luminance in the underflow region.[1] The derivation of these equations combines power-law exponents for the mid-tone response with a rational function for smooth compression, drawing on contrast sensitivity models such as Barten's threshold-versus-intensity curve. This approach allocates code levels in proportion to perceptual steps (JNDs), optimizing bit-depth efficiency without visible banding over luminance spans from near zero to 10,000 cd/m².[2]
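These two formulas translate directly into code. The following is a minimal Python sketch of both directions of the curve, using the exact ST 2084 constants listed in the next subsection; the function names and the round-trip check at the end are illustrative, not drawn from any particular library.

```python
# Minimal sketch of the SMPTE ST 2084 (PQ) transfer functions.
# Constants are the published ST 2084 values; everything else is illustrative.

m1 = 2610 / 16384          # 0.1593017578125
m2 = 2523 / 4096 * 128     # 78.84375
c1 = 3424 / 4096           # 0.8359375
c2 = 2413 / 4096 * 32      # 18.8515625
c3 = 2392 / 4096 * 32      # 18.6875
L_P = 10000.0              # peak luminance in cd/m²


def pq_inverse_eotf(lum: float) -> float:
    """Encode absolute luminance (cd/m², 0..10000) to a normalized signal N in [0, 1]."""
    y = (max(lum, 0.0) / L_P) ** m1
    return ((c1 + c2 * y) / (1 + c3 * y)) ** m2


def pq_eotf(n: float) -> float:
    """Decode a normalized signal N in [0, 1] back to absolute luminance in cd/m²."""
    p = n ** (1 / m2)
    num = max(p - c1, 0.0)          # clamp prevents negative values near code zero
    return L_P * (num / (c2 - c3 * p)) ** (1 / m1)


if __name__ == "__main__":
    # Round-trip check at a few luminance levels across the supported range.
    for lum in (0.0, 0.1, 1.0, 100.0, 1000.0, 10000.0):
        n = pq_inverse_eotf(lum)
        print(f"{lum:8.1f} cd/m²  ->  N = {n:.6f}  ->  {pq_eotf(n):10.4f} cd/m²")
```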
Key parameters and constants
The perceptual quantizer (PQ) transfer function, as defined in SMPTE ST 2084, relies on a set of fixed numerical constants that shape its non-linear response to align with human visual perception across a wide luminance range. These constants ensure uniform perceptual spacing of quantization steps, prioritizing more codes for lower luminances, where the eye is more sensitive. The core parameters are m1 = 0.1593017578125, m2 = 78.84375, c1 = 0.8359375, c2 = 18.8515625, and c3 = 18.6875, each defined as an exact rational fraction to facilitate precise fixed-point implementations.[8]

The parameter m1 controls the response in the low-luminance region, providing finer granularity near black to match contrast sensitivity thresholds, while m2 governs the overall compression of the nonlinear mapping at higher luminances. The constants c1, c2, and c3 collectively manage the smooth transition and asymptotic behavior at elevated luminances, with c1 acting as an offset, c2 as a scaling factor, and c3 as a roll-off adjustment that maintains perceptual uniformity.[1]

These parameters enable 12-bit encoding to cover the full luminance range from near 0 to 10,000 cd/m² without introducing visible contouring or banding artifacts, because the non-linear allocation matches just-noticeable differences in human vision. By contrast, a conventional gamma-based quantization of the same range would require approximately 15 bits to achieve comparable perceptual fidelity, owing to its inefficient distribution of codes across the luminance range.[2] The constants are fixed to a maximum peak luminance of 10,000 cd/m² in the base specification, reflecting an absolute scale optimized for mastering reference displays, with no provision for user-adjustable scaling to accommodate varying display capabilities. This design integrates into the broader electro-optical transfer function (EOTF) framework without requiring modifications for standard HDR workflows.[9]

| Parameter | Value | Fractional Representation | Role |
|---|---|---|---|
| m1 | 0.1593017578125 | 2610 / (4096 × 4) | Response in low-luminance region |
| m2 | 78.84375 | (2523 / 4096) × 128 | Compression in nonlinear mapping |
| c1 | 0.8359375 | 3424 / 4096 | Offset for curve anchoring |
| c2 | 18.8515625 | (2413 / 4096) × 32 | Scaling factor |
| c3 | 18.6875 | (2392 / 4096) × 32 | Roll-off adjustment |
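As a quick numerical cross-check of the allocation figures quoted earlier in the article (for instance, roughly 2,080 of the 4,096 full-range 12-bit codes falling at or below 100 cd/m²), the short sketch below evaluates the inverse EOTF with these constants. The assumption of full-range integer quantization (codes 0-4095) is an illustrative simplification; broadcast signals normally use narrow-range coding, which shifts the exact counts.

```python
# Illustrative check of how PQ distributes a full-range 12-bit code space
# (codes 0..4095); narrow-range video coding would shift the exact counts.

m1, m2 = 2610 / 16384, 2523 / 4096 * 128
c1, c2, c3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32


def pq_inverse_eotf(lum: float, peak: float = 10000.0) -> float:
    """Encode absolute luminance (cd/m²) to a normalized PQ signal in [0, 1]."""
    y = (lum / peak) ** m1
    return ((c1 + c2 * y) / (1 + c3 * y)) ** m2


# Fraction of the signal range, and of 12-bit codes, spent at or below a few luminance levels.
for lum in (0.01, 1.0, 100.0, 1000.0):
    n = pq_inverse_eotf(lum)
    print(f"<= {lum:7.2f} cd/m²: N = {n:.4f}, ~{round(n * 4095)} of 4096 codes")
```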