Discrete cosine transform

The Discrete Cosine Transform (DCT) is a Fourier-related transform similar to the discrete Fourier transform (DFT) that expresses a finite sequence of equally spaced data samples as a sum of cosine functions oscillating at different frequencies, using only real numbers for computational efficiency.^[1]^[2] Introduced in 1974 by Nasir Ahmed, T. Natarajan, and K. R. Rao, the DCT was developed as a practical alternative to the optimal but computationally intensive Karhunen–Loève transform for applications in signal processing and data compression.^[3] The four most widely used variants (DCT-I, DCT-II, DCT-III, and DCT-IV) are each defined by specific boundary conditions and symmetry properties that make them suitable for different boundary value problems in signal analysis.^[2] The most commonly used is the DCT-II, which serves as the forward transform and is inverted by a suitably scaled DCT-III, giving perfect reconstruction in the absence of quantization.^[2] Its one-dimensional formulation for a sequence x_n of length N is given by X_k = \sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi (2n+1)k}{2N}\right] for k = 0, \dots, N-1, often with scaling factors for normalization.^[4] In two dimensions, it extends separably to block-based processing, such as 8×8 pixel arrays, yielding a matrix of frequency coefficients.^[5]

A key advantage of the DCT lies in its strong energy compaction property, where it concentrates most of a signal's energy into the low-frequency coefficients, particularly for correlated data like natural images, enabling efficient compression by quantizing and discarding higher-frequency terms with minimal perceptual distortion.^[6] This compaction, together with real arithmetic and the reduced boundary artifacts that come from even symmetry, makes the DCT better suited than the DFT to compressing real-valued signals.^[2] Computationally, fast algorithms based on the fast Fourier transform allow the DCT to be evaluated in O(N \log N) time, facilitating real-time processing.^[3]

The DCT has become foundational in digital media standards due to its balance of performance and simplicity.^[7] It forms the core of lossy compression in the JPEG still-image standard, where 8×8 DCT blocks are quantized to achieve high compression ratios while preserving visual quality.^[5]^[7] Similarly, variants like the DCT-II underpin video codecs in MPEG-1, MPEG-2, and H.26x families, contributing to efficient transmission and storage of multimedia content.^[8] Beyond compression, the DCT appears in audio processing (e.g., modified forms in MP3), numerical solutions to partial differential equations, and feature extraction in machine learning.^[2]

Fundamentals

Informal overview

The discrete cosine transform (DCT) represents a sequence of finitely many data points as a sum of cosine functions oscillating at different frequencies, providing an efficient way to analyze the frequency content of real-valued signals.^[3] Introduced as a real-valued alternative to the discrete Fourier transform (DFT), the DCT uses only cosine basis functions rather than the complex exponentials of the DFT, which eliminates imaginary components and simplifies computations for applications involving real data.^[3] Like the Fourier transform, which decomposes a signal into its constituent frequencies to reveal patterns of variation, the DCT breaks down a signal into low-frequency components that capture broad, smooth trends and high-frequency components that highlight fine details or rapid changes.^[1] A key advantage of the DCT arises from its boundary conditions, which assume an even extension of the signal, resulting in symmetric basis functions that better match the typical structure of natural signals and images, leading to more concentrated energy in the lower frequencies compared to the DFT.^[3] This energy compaction property means that most of a signal's information is packed into a few low-frequency coefficients, while higher ones can often be approximated or discarded with minimal loss in perceptual quality.^[1] For a basic one-dimensional example, consider a simple signal like a gradually rising audio waveform or a pixel intensity row in an image, represented as a series of points along a line. Applying the DCT (such as the commonly used type-II variant) yields coefficients where the first few values dominate, illustrating the smooth overall shape, while subsequent values diminish rapidly, reflecting sparse details. This visualization—original signal as a continuous curve versus coefficients as a steeply declining bar graph—highlights how the transform shifts focus from spatial to frequency domain for easier manipulation.^[4]
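The energy compaction described above can be illustrated with a small numerical sketch. It assumes the orthonormal DCT-II and an 8-sample ramp as the "gradually rising" signal; the function name `dct2_ortho` and the test signal are illustrative choices, not taken from any particular library.

```python
import math

def dct2_ortho(x):
    """Orthonormal DCT-II of a real sequence (direct O(N^2) evaluation)."""
    n_pts = len(x)
    out = []
    for k in range(n_pts):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n_pts)
        out.append(scale * sum(
            x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * n_pts))
            for n in range(n_pts)))
    return out

ramp = [float(n) for n in range(8)]           # gradually rising signal
coeffs = dct2_ortho(ramp)
total_energy = sum(c * c for c in coeffs)
low_energy = sum(c * c for c in coeffs[:2])   # DC term plus the first cosine
print([round(c, 3) for c in coeffs])
print("energy share of the first two coefficients:", low_energy / total_energy)
```

For this ramp, more than 99% of the energy lands in the first two coefficients, which is the steeply declining pattern the paragraph describes.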

Relation to discrete Fourier transform

The discrete cosine transform (DCT) can be derived from the discrete Fourier transform (DFT) by applying the DFT to a symmetrically extended signal. For a finite-length sequence x_n of length N, the even extension constructs a 2N-point sequence by mirroring x about its right boundary, so that \tilde{x}_n = x_n for 0 \leq n < N and \tilde{x}_n = x_{2N-1-n} for N \leq n < 2N. Because the mirror point lies half a sample beyond the last data point, the 2N-point DFT of \tilde{x} is not itself real: it equals a real sequence multiplied by the phase factor e^{j\pi k/(2N)}. Removing that half-sample phase shift cancels the sine contributions and leaves only cosine terms, which are (up to a factor of 2) the DCT-II coefficients. This derivation establishes the DCT as a real-valued relative of the Fourier transform tailored to real signals with symmetric boundary conditions.^[9]

The boundary conditions play a crucial role in this relationship: even extensions enforce a symmetry that aligns with cosine functions, because the sine components, which are odd, cancel under this mirroring. In contrast, odd extensions lead to a discrete sine transform (DST) with only sine basis functions. The DFT operates on periodically extended signals without such symmetry constraints, using complex exponential basis functions e^{-j 2\pi k n / M}, where M is the transform length, combining both cosine and sine oscillations. The DCT basis functions, conversely, consist solely of real cosines, such as \cos(\pi k (n + \alpha)/N) for appropriate shifts \alpha depending on the variant, providing a more intuitive and computationally efficient representation for real, stationary signals. This cosine-only basis arises directly from the phase-corrected DFT of the even-extended input.^[9]^[3] Formally, the DCT-II coefficients X_k can be expressed in terms of the DFT as

X_k = \frac{1}{2} \operatorname{Re} \left\{ e^{-j\pi k/(2N)} \operatorname{DFT} \left\{ \tilde{x} \right\}_k \right\}, \quad k = 0, 1, \dots, N-1,

where \tilde{x} is the even-extended signal of length 2N, and the DFT is computed over 2N points. This relation highlights how the DCT avoids the complex arithmetic of the DFT while retaining its frequency decomposition properties.^[9] A key advantage of the DCT over the DFT stems from its energy compaction property, where for typical signals like highly correlated Markov processes, the DCT concentrates more signal energy into the lower-frequency coefficients than the DFT does. This occurs because the cosine basis better matches the smooth, decaying autocorrelation typical of natural signals, reducing the magnitude of higher-frequency terms compared to the oscillatory complex exponentials in the DFT. As a result, fewer coefficients are needed to represent a given energy level, enhancing efficiency in subsequent processing tasks.^[3]
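The link between the DCT-II and the DFT of the mirrored signal can be checked numerically. The sketch below assumes the half-sample mirror extension \tilde{x}_n = x_{2N-1-n}; under that assumption the 2N-point DFT must be multiplied by e^{-j\pi k/(2N)} and halved before taking the real part. The function names are illustrative.

```python
import cmath
import math

def dct2_direct(x):
    """Unnormalized DCT-II evaluated straight from the cosine sum."""
    n_pts = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * n_pts))
                for n in range(n_pts)) for k in range(n_pts)]

def dct2_via_dft(x):
    """Same coefficients obtained from a 2N-point DFT of the mirrored signal."""
    n_pts = len(x)
    ext = list(x) + list(reversed(x))        # x~_n = x_{2N-1-n} for n >= N
    m = 2 * n_pts
    out = []
    for k in range(n_pts):
        y_k = sum(ext[n] * cmath.exp(-2j * math.pi * k * n / m)
                  for n in range(m))
        # undo the half-sample phase shift, then halve
        out.append(0.5 * (cmath.exp(-1j * math.pi * k / m) * y_k).real)
    return out

x = [3.0, 1.0, 4.0, 1.0, 5.0]
print(dct2_direct(x))
print(dct2_via_dft(x))
```

The two printed lists agree to rounding error; a plain DFT is used here for clarity, whereas a practical implementation would substitute an FFT.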

Formal Mathematics

DCT-I

The Discrete Cosine Transform of type I (DCT-I) applies to a finite sequence of N+1 real-valued data points x_0, x_1, \dots, x_N, transforming it into a set of N+1 cosine coefficients X_0, X_1, \dots, X_N. The forward transform is given by

X_k = \sum_{n=0}^{N} x_n \cos\left( \frac{\pi k n}{N} \right), \quad k = 0, 1, \dots, N.

This formulation arises from sampling cosine functions at integer multiples of \pi / N, which corresponds to an even extension of the sequence about both endpoint samples.^[10] Many references halve the two endpoint terms x_0 and x_N in the sum; that weighting is what makes the basis exactly orthogonal. The transform matrix \mathbf{C} with entries C_{k,n} = \cos(\pi k n / N) is real and symmetric, and it becomes orthogonal once its first and last rows and columns are scaled by 1/\sqrt{2} and the whole matrix by \sqrt{2/N}. The transform can be expressed as \mathbf{X} = \mathbf{C} \mathbf{x}. The basis vectors correspond to cosine waves with frequencies that fit neatly within the sequence length, starting from a constant (DC) component at k=0 and increasing to the highest frequency at k=N. For illustration, in small dimensions like N=1, the matrix reduces to \mathbf{C} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} (unnormalized), while for N=2, it is \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & -1 \\ 1 & -1 & 1 \end{pmatrix} (unnormalized).^[9] The first and third rows of the N=2 matrix have inner product 1, which shows that the unweighted matrix is not orthogonal for N > 1.

The DCT-I is intimately connected to the Chebyshev polynomials of the first kind, T_k(x), defined by T_k(\cos \theta) = \cos(k \theta). By mapping the sequence indices to points x_n = \cos(\pi n / N), the DCT-I coefficients (with the endpoint weighting) yield the expansion coefficients of a polynomial interpolated from the data points in the Chebyshev basis on [-1, 1]. This relationship facilitates applications in numerical analysis, such as efficient polynomial multiplication and approximation, where the transform enables fast evaluation via pointwise operations.^[11] The basis functions thus inherit the minimax properties of Chebyshev polynomials, providing near-optimal approximation for smooth functions.^[12]

DCT-I derives from the Fourier cosine series expansion of a function on a finite interval [0, L] with even (Neumann) boundary conditions at both endpoints, where the series is \sum_{k=0}^{\infty} a_k \cos(k \pi x / L). Discretizing the function at N+1 equidistant points x_n = n L / N and truncating the series to N+1 terms yields the DCT-I formula up to the endpoint weights, as the cosine terms align with the sampled basis. This derivation underscores its suitability for problems with reflective symmetries, avoiding artifacts from abrupt boundaries.^[9]

In signal processing, DCT-I finds niche use in scenarios requiring reflective boundaries, such as channel estimation in multicarrier communication systems with symmetric channel responses, where it exploits even symmetry to reduce estimation complexity. It also appears in filter design for one-dimensional signals modeled with Neumann conditions at the edges, enabling accurate representation without edge distortions in finite-length convolutions. Unlike the DCT-II, which dominates in data compression because its half-sample even symmetry at both ends suits block-based coding, DCT-I is preferred for these boundary-aware applications.^[13]
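The Chebyshev connection can be made concrete with a short sketch. It assumes the common DCT-I convention in which the two endpoint samples are weighted by 1/2, and recovers the Chebyshev coefficients of a known polynomial from its samples at x_n = \cos(\pi n / N). The helper name `chebyshev_coeffs` and the test polynomial are illustrative.

```python
import math

def chebyshev_coeffs(samples):
    """Chebyshev coefficients from samples at x_n = cos(pi*n/N), n = 0..N."""
    big_n = len(samples) - 1
    coeffs = []
    for k in range(big_n + 1):
        acc = 0.0
        for n, value in enumerate(samples):
            weight = 0.5 if n in (0, big_n) else 1.0   # halve the endpoints
            acc += weight * value * math.cos(math.pi * k * n / big_n)
        scale = (1.0 if k in (0, big_n) else 2.0) / big_n
        coeffs.append(scale * acc)
    return coeffs

big_n = 4
nodes = [math.cos(math.pi * n / big_n) for n in range(big_n + 1)]
# f(x) = 3*T_0(x) + 0.5*T_1(x) - 2*T_3(x), using T_3(x) = 4x^3 - 3x
samples = [3.0 + 0.5 * x - 2.0 * (4 * x ** 3 - 3 * x) for x in nodes]
print([round(c, 6) for c in chebyshev_coeffs(samples)])
```

The output reproduces the coefficients 3, 0.5, 0, -2, 0 (to rounding), confirming that the endpoint-weighted DCT-I acts as a discrete Chebyshev expansion on these nodes.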

DCT-II

The type-II discrete cosine transform (DCT-II), also known as the forward DCT, is the most commonly used variant and is defined for an input sequence x = (x_0, x_1, \dots, x_{N-1}) of length N as

X_k = \sum_{n=0}^{N-1} x_n \cos\left[ \frac{\pi k (2n + 1)}{2N} \right], \quad k = 0, 1, \dots, N-1.

This unnormalized form was introduced in the seminal work establishing the DCT family.^[3] To achieve orthogonality, the transform incorporates normalization factors, yielding

X_k = \alpha_k \sum_{n=0}^{N-1} x_n \cos\left[ \frac{\pi k (2n + 1)}{2N} \right],

where \alpha_0 = \sqrt{1/N} and \alpha_k = \sqrt{2/N} for k = 1, 2, \dots, N-1.^[9] The basis functions \phi_k(n) = \cos\left[ \pi k (2n + 1)/(2N) \right] represent cosine waves with frequencies increasing from zero (the DC component, a constant function) to nearly N/2 cycles over the interval, providing a complete set of real-valued modes that span the space of length-N sequences.^[9] The orthogonality of the DCT-II basis follows from the basis vectors being eigenvectors of a symmetric second-difference matrix with the corresponding boundary conditions. Before normalization, \sum_{n=0}^{N-1} \phi_j(n) \phi_k(n) equals N for j = k = 0, N/2 for j = k \geq 1, and 0 for j \neq k, which is why \alpha_0 differs from the other \alpha_k.^[9] With the specified \alpha_k, the normalized matrix satisfies C^T C = I, so the transform is exactly orthogonal.^[14]

DCT-II is preferred for signal compression due to its superior energy compaction properties for natural signals, as it concentrates most energy into low-frequency coefficients, approaching the performance of the optimal Karhunen-Loève transform.^[3] This arises from its implicit boundary conditions: the transform assumes the input sequence is extended evenly around both n = -0.5 and n = N - 0.5, so the extension mirrors the block at each edge without introducing a jump. This reduces boundary discontinuities and high-frequency leakage compared to the periodic extension implied by the DFT.^[9] The DCT-II matrix relates to the DCT-I matrix through a shift in indices by half a sample, changing the cosine arguments to \pi k (n + 1/2)/N from the whole-sample symmetry of DCT-I, so that the mirror points fall between samples rather than on them.^[9]
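The role of the normalization factors can be seen by computing the inner products of the basis vectors directly. This is a minimal sketch for N = 4; `dct2_matrix` and `gram` are illustrative helper names.

```python
import math

def dct2_matrix(n_pts, orthonormal=True):
    """Rows are DCT-II basis vectors, optionally scaled by alpha_k."""
    rows = []
    for k in range(n_pts):
        alpha = (math.sqrt((1.0 if k == 0 else 2.0) / n_pts)
                 if orthonormal else 1.0)
        rows.append([alpha * math.cos(math.pi * k * (2 * n + 1) / (2 * n_pts))
                     for n in range(n_pts)])
    return rows

def gram(rows):
    """Matrix of inner products between all pairs of rows."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]

g_raw = gram(dct2_matrix(4, orthonormal=False))
g_unit = gram(dct2_matrix(4))
print([round(g_raw[k][k], 6) for k in range(4)])    # squared norms: N, N/2, ...
print([[round(v, 6) for v in row] for row in g_unit])
```

The unnormalized rows have squared norms N for k = 0 and N/2 otherwise, and the scaled rows produce the identity matrix.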

DCT-III

The Discrete Cosine Transform of type III (DCT-III) serves as the inverse-oriented variant within the DCT family, commonly employed in synthesis operations to reconstruct signals from frequency coefficients using a cosine basis adjusted for even symmetry around half-integer points. This transform is particularly valued for its role in ensuring perfect reconstruction when paired with a forward transform, while accommodating specific boundary symmetries that minimize artifacts in signal processing pipelines. The mathematical definition of the DCT-III is given by

x_n = \frac{1}{2} X_0 + \sum_{k=1}^{N-1} X_k \cos\left( \frac{\pi k (2n + 1)}{2N} \right)

for n = 0, 1, \dots, N-1, where \{x_n\} represents the output time-domain sequence and \{X_k\} the input frequency-domain coefficients. In this unnormalized form, the factor of \frac{1}{2} applies only to the DC component X_0, reflecting the different norm of the constant basis vector. In matrix notation, the DCT-III operation is \mathbf{x} = C^{(III)} \mathbf{X}, where C^{(III)} is an N \times N matrix with entries C^{(III)}_{n,0} = \frac{1}{2} for all n, and C^{(III)}_{n,k} = \cos\left( \frac{\pi k (2n + 1)}{2N} \right) for k = 1, \dots, N-1 and n = 0, \dots, N-1. This matrix is the transpose of the corresponding DCT-II matrix (with appropriate scaling adjustments for normalization), which is why the two form a forward-inverse pair.

The DCT-III finds frequent use in paired forward-inverse configurations, such as in cosine-modulated filter banks for multirate signal processing, where it acts as the synthesis bank complementing a DCT-II analysis bank to achieve near-perfect reconstruction with controlled aliasing cancellation. For instance, in standards like JPEG, it pairs with DCT-II to enable efficient decoding of compressed image data.^[15] The DCT-III implicitly treats its input coefficients X_k as even around k = 0 and odd around k = N, while its output x_n is even around both n = -0.5 and n = N - 0.5, the same half-sample mirror symmetry that the DCT-II assumes of its input.

Orthogonality of the DCT-III is established through direct evaluation of the inner products of its basis vectors. For distinct indices j \neq k, the sum \sum_{n=0}^{N-1} \cos\left( \frac{\pi j (2n + 1)}{2N} \right) \cos\left( \frac{\pi k (2n + 1)}{2N} \right) = 0. This follows from the product-to-sum identity \cos a \cos b = \frac{1}{2} [ \cos(a+b) + \cos(a-b) ], which turns the product into two cosine sums over the half-sample grid, each of which vanishes for a nonzero integer frequency below 2N. For j = k \geq 1 the inner product equals N/2, and for the DC term j = k = 0 it equals N, confirming the basis set's linear independence and the transform's invertibility.
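The forward-inverse pairing can be verified numerically. The sketch below assumes the unnormalized DCT-II and the DCT-III with the 1/2 weight on X_0 as defined in this section; under those conventions the round trip needs an overall factor of 2/N. Function names are illustrative.

```python
import math

def dct2(x):
    """Unnormalized DCT-II."""
    n_pts = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * n_pts))
                for n in range(n_pts)) for k in range(n_pts)]

def dct3(coeffs):
    """Unnormalized DCT-III with the factor 1/2 on the DC coefficient."""
    n_pts = len(coeffs)
    return [0.5 * coeffs[0] + sum(
                coeffs[k] * math.cos(math.pi * k * (2 * n + 1) / (2 * n_pts))
                for k in range(1, n_pts)) for n in range(n_pts)]

signal = [2.0, -1.0, 0.5, 4.0, 3.0, -2.5]
n_pts = len(signal)
recovered = [2.0 / n_pts * v for v in dct3(dct2(signal))]
print([round(v, 9) for v in recovered])
```

The recovered values match the input to rounding error, which is the perfect-reconstruction property in its simplest form.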

DCT-IV

The type-IV discrete cosine transform (DCT-IV) is defined for a finite sequence of N real numbers x_n, n = 0, 1, \dots, N-1, by the transformation

X_k = \sum_{n=0}^{N-1} x_n \cos\left[ \frac{\pi}{N} \left(n + \frac{1}{2}\right) \left(k + \frac{1}{2}\right) \right], \quad k = 0, 1, \dots, N-1.

This form produces N real-valued coefficients X_k. Its implied extension of the input is even around n = -1/2 and odd around n = N - 1/2, so the transform suits block-based schemes that handle the resulting boundary aliasing explicitly.^[10] The basis functions of the DCT-IV are cosine waves given by \cos\left[ \frac{\pi}{N} \left(n + \frac{1}{2}\right) \left(k + \frac{1}{2}\right) \right], sampled at half-integer offsets in both n and k; the k-th basis function completes k + 1/2 half-cycles across the block, so it starts at a crest at the left boundary and crosses zero at the right boundary. These basis vectors form an orthogonal set.^[10] The DCT-IV can be derived from the Fourier series coefficients of a periodically extended signal that is even around n = -0.5 and odd around n = N - 0.5, an extension whose period is 4N. This derivation highlights its suitability for applications involving overlap-add operations in time-domain processing.^[10] To achieve unitarity, which preserves the Euclidean norm of the signal (i.e., \sum |x_n|^2 = \sum |X_k|^2), the transform is normalized by the factor \sqrt{2/N}, yielding the unitary DCT-IV:

X_k = \sqrt{\frac{2}{N}} \sum_{n=0}^{N-1} x_n \cos\left[ \frac{\pi}{N} \left(n + \frac{1}{2}\right) \left(k + \frac{1}{2}\right) \right].

This normalization makes the transform matrix orthogonal, facilitating reversible operations in signal processing.^[10] The DCT-IV serves as the core transform in the modified discrete cosine transform (MDCT), which is widely used for perfect reconstruction in audio coding schemes due to its effective handling of overlapping signal blocks.^[16]
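Because the DCT-IV matrix is symmetric, the unitary form is its own inverse. The following sketch checks this by applying the transform twice; the function name `dct4_unitary` and the sample data are illustrative.

```python
import math

def dct4_unitary(x):
    """Unitary DCT-IV: scale sqrt(2/N), kernel cos(pi/N (n+1/2)(k+1/2))."""
    n_pts = len(x)
    scale = math.sqrt(2.0 / n_pts)
    return [scale * sum(
                x[n] * math.cos(math.pi / n_pts * (n + 0.5) * (k + 0.5))
                for n in range(n_pts)) for k in range(n_pts)]

signal = [1.0, 4.0, -2.0, 0.5, 3.0]
coeffs = dct4_unitary(signal)
twice = dct4_unitary(coeffs)            # applying the transform again inverts it
energy_in = sum(v * v for v in signal)
energy_out = sum(c * c for c in coeffs)
print([round(v, 9) for v in twice])
print(energy_in, energy_out)
```

The second application returns the original samples, and the two energies agree, as unitarity requires.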

Other variants

The less common variants of the discrete cosine transform, types V through VIII, are often called "odd" types because they correspond to real-even DFTs of odd logical length, in contrast to the even-length types I through IV. Each places a whole-sample symmetry point at one boundary and a half-sample symmetry point at the other. These types are rarely employed in mainstream applications like image, video, speech, and audio coding due to their less optimal energy compaction for typical signals.^[17] DCT-V is even at both boundaries, with a whole-sample symmetry point on the left and a half-sample one on the right, and has been utilized in certain quadrature mirror filter designs for subband coding. Its kernel is of the form \cos\left[\frac{2\pi k n}{2N-1}\right], though it sees limited practical adoption.^[10] DCT-VI is likewise even at both boundaries but with the half-sample point on the left and the whole-sample point on the right, and remains niche primarily in theoretical signal processing studies, such as mappings to Fourier transforms for algorithm development.^[18] DCT-VII is even about a whole sample at the left boundary and odd about a half-sample point at the right, with occasional applications in approximation theory for orthogonal expansions.^[17] DCT-VIII is even about a half-sample point at the left boundary and odd about a whole sample at the right; it has been linked to Legendre polynomials in the context of orthogonal polynomial bases for numerical analysis.^[19] The following table compares the eight DCT types by the symmetry of the implied extension at each boundary and by the period of that extension, for an N-point input (N+1 points for DCT-I, as defined above):

Type	Left boundary	Right boundary	Period of extension
DCT-I	Even, whole-sample	Even, whole-sample	2N
DCT-II	Even, half-sample	Even, half-sample	2N
DCT-III	Even, whole-sample	Odd, whole-sample	4N
DCT-IV	Even, half-sample	Odd, half-sample	4N
DCT-V	Even, whole-sample	Even, half-sample	2N - 1
DCT-VI	Even, half-sample	Even, whole-sample	2N - 1
DCT-VII	Even, whole-sample	Odd, half-sample	4N - 2
DCT-VIII	Even, half-sample	Odd, whole-sample	4N + 2

These extensions determine the transform's boundary handling and orthogonality properties.^[10]^[17]

Inverse Transforms

Inverse DCT-I

The inverse discrete cosine transform of type I (IDCT-I) reconstructs a signal \mathbf{x} of length N from its DCT-I coefficients \mathbf{X}. No symmetry of \mathbf{x} itself is required, since the even symmetry belongs to the implied extension beyond the boundaries. In this subsection N counts the samples, so N-1 plays the role that N played in the (N+1)-point definition above. The transformation is represented as \mathbf{X} = T_I \mathbf{x}, where T_I is the N \times N DCT-I matrix with elements T_{I,k,n} = p_k q_n \cos \left( \frac{\pi k n}{N-1} \right) in the unitary form, with p_k = \sqrt{ \frac{2 - \delta_{k,0} - \delta_{k,N-1}}{N-1} } and q_n = \sqrt{ \frac{1}{1 + \delta_{n,0} + \delta_{n,N-1}} }, where \delta is the Kronecker delta. The inverse is then \mathbf{x} = T_I^{-1} \mathbf{X}.^[20] With this scaling, T_I is symmetric and unitary (T_I^T T_I = I), so the inverse simplifies to T_I^{-1} = T_I^T = T_I, making the IDCT-I identical to the forward DCT-I. The reconstruction formula is thus

x_n = q_n \sum_{k=0}^{N-1} p_k X_k \cos \left( \frac{\pi k n}{N-1} \right),

for n = 0, \dots, N-1. This unitary form preserves the \ell_2-norm of the signal.^[20] The invertibility of T_I follows directly from this orthogonality: its rows form an orthonormal basis of \mathbb{R}^N, so the matrix has full rank. The unnormalized matrix differs from T_I only by invertible diagonal scalings of its rows and columns, which preserve rank, so it is invertible as well. As a simple example, consider N=2. The unnormalized T_I = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, with \det(T_I) = -2 \neq 0, confirming full rank. The inverse is T_I^{-1} = \frac{1}{2} T_I = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix}. Applying forward and inverse yields the identity: for \mathbf{x} = \begin{pmatrix} x_0 \\ x_1 \end{pmatrix}, \mathbf{X} = T_I \mathbf{x} = \begin{pmatrix} x_0 + x_1 \\ x_0 - x_1 \end{pmatrix}, and T_I^{-1} \mathbf{X} = \frac{1}{2} \begin{pmatrix} (x_0 + x_1) + (x_0 - x_1) \\ (x_0 + x_1) - (x_0 - x_1) \end{pmatrix} = \begin{pmatrix} x_0 \\ x_1 \end{pmatrix}. With unitary scaling, q_n = 1/\sqrt{2} and p_k = 1 for both indices, so the matrix becomes \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} and the round trip recovers \mathbf{x} without additional factors.^[21]
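The claim that the unitary DCT-I matrix is symmetric and equal to its own inverse can be checked for a larger size than the N=2 example. This sketch builds the matrix from the p_k and q_n weights given above for N = 5; the helper names are illustrative.

```python
import math

def dct1_unitary_matrix(n_pts):
    """Unitary DCT-I matrix T[k][n] = p_k q_n cos(pi k n / (N - 1)), N >= 2."""
    last = n_pts - 1
    def p(k):
        return math.sqrt((2.0 - (k == 0) - (k == last)) / last)
    def q(n):
        return math.sqrt(1.0 / (1.0 + (n == 0) + (n == last)))
    return [[p(k) * q(n) * math.cos(math.pi * k * n / last)
             for n in range(n_pts)] for k in range(n_pts)]

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

size = 5
t = dct1_unitary_matrix(size)
product = matmul(t, t)                      # should be the identity
print([[round(v, 6) for v in row] for row in product])
```

Since T_I times itself gives the identity, the same routine serves as both the forward and the inverse transform.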

Inverse DCT-II

The inverse discrete cosine transform of type II (IDCT-II) reconstructs a finite-length signal x_n from its DCT-II coefficients X_k. In the standard orthogonal normalization, it is given by

x_n = \sum_{k=0}^{N-1} \alpha_k X_k \cos\left( \frac{\pi k (2n + 1)}{2N} \right), \quad n = 0, 1, \dots, N-1,

where \alpha_0 = \sqrt{1/N} and \alpha_k = \sqrt{2/N} for k = 1, \dots, N-1.^[9] This formulation pairs with the corresponding forward DCT-II to yield an orthogonal transform pair. In matrix notation, let \mathbf{C} denote the N \times N DCT-II transformation matrix with entries C_{k,n} = \alpha_k \cos\left( \frac{\pi k (2n + 1)}{2N} \right). The forward transform is then \mathbf{X} = \mathbf{C} \mathbf{x}, and the inverse is \mathbf{x} = \mathbf{C}^T \mathbf{X}, since \mathbf{C} satisfies \mathbf{C}^T \mathbf{C} = \mathbf{I}, confirming its orthogonality.^[9] Alternative normalization conventions adjust the scaling factors for computational efficiency or compatibility with specific standards. For instance, in JPEG image compression, the IDCT-II employs

x_n = \sqrt{\frac{2}{N}} \sum_{k=0}^{N-1} \beta_k X_k \cos\left( \frac{\pi k (2n + 1)}{2N} \right),

with \beta_0 = 1/\sqrt{2} and \beta_k = 1 for k > 0; for N=8, this yields a leading factor of 1/2. This is the same orthonormal weighting with the common factor \sqrt{2/N} pulled out of \alpha_k, a form that suits fixed-point implementations while preserving the core cosine basis. The orthogonality property guarantees a lossless round-trip transformation: applying the forward DCT-II followed by the IDCT-II recovers the input signal \mathbf{x} (up to floating-point rounding), as \mathbf{C}^T \mathbf{C} = \mathbf{I}.^[9] This invertibility underpins its utility in signal reconstruction for applications such as decoding compressed data.
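A round trip through the 8-point transform pair can be sketched as follows. The code assumes the 8-point form with leading factor 1/2 and weights c_0 = 1/\sqrt{2}, c_k = 1, which coincides with the orthonormal weights \alpha_k for N = 8; the sample row and function names are illustrative.

```python
import math

N = 8

def c(k):
    """Weight 1/sqrt(2) for the DC term, 1 otherwise."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

def forward_8(x):
    """8-point DCT-II with factor 1/2 * c(k), i.e. orthonormal for N = 8."""
    return [0.5 * c(k) * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / 16)
                             for n in range(N)) for k in range(N)]

def inverse_8(coeffs):
    """Matching 8-point IDCT-II (the transpose of the forward matrix)."""
    return [0.5 * sum(c(k) * coeffs[k] * math.cos(math.pi * (2 * n + 1) * k / 16)
                      for k in range(N)) for n in range(N)]

row = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
restored = inverse_8(forward_8(row))
print([round(v, 9) for v in restored])
```

The restored row equals the input to rounding error because 0.5 * c(0) = \sqrt{1/8} and 0.5 = \sqrt{2/8}, the orthonormal weights.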

Inverse DCT-III

The inverse discrete cosine transform of type III (IDCT-III) recovers the coefficients X_k that were fed to the DCT-III from its output samples x_n. Because the DCT-III matrix is, up to scaling, the transpose of the DCT-II matrix, the inverse of the DCT-III is a scaled DCT-II. This is why the two transforms appear together as analysis and synthesis stages in filter banks, and the pair gives perfect reconstruction when the scale factors are chosen consistently.^[9] With the DCT-III defined as above (including the factor 1/2 on X_0), the inverse is

X_k = \frac{2}{N} \sum_{n=0}^{N-1} x_n \cos\left( \frac{\pi k (2n + 1)}{2N} \right), \quad k = 0, 1, \dots, N-1,

where x_n are the DCT-III output samples. This expression has the form of the DCT-II, reflecting the transpose relationship between the DCT-II and DCT-III matrices.^[9]^[22] The factor 2/N is what makes the round trip the identity: the product of the unnormalized DCT-II and DCT-III matrices is (N/2) I, so omitting the factor scales the reconstruction by N/2 and distorts the amplitude during synthesis.^[9] To achieve unitarity, the factor is instead split evenly between the two transforms: each basis vector is scaled by \sqrt{2/N}, with the zeroth (DC) vector scaled by \sqrt{1/N}, to make the transform matrix orthonormal. Under these conditions, the matrix C^{\mathrm{III}} satisfies (C^{\mathrm{III}})^T C^{\mathrm{III}} = I, confirming invertibility and energy conservation. The inverse of the unitary DCT-III is precisely the unitary DCT-II.^[9]^[22]

Inverse DCT-IV

The inverse discrete cosine transform of type IV (IDCT-IV) reconstructs a finite-length sequence from its DCT-IV coefficients, assuming a periodic extension of the signal with specific symmetry properties. It is mathematically defined for n = 0, 1, \dots, N-1 as

x_n = \frac{2}{N} \sum_{k=0}^{N-1} X_k \cos\left( \frac{\pi (2k + 1)(2n + 1)}{4N} \right),

where X_k are the DCT-IV coefficients and the normalization factor 2/N ensures exact inversion when paired with the unnormalized forward transform, whose matrix squares to (N/2) I.^[21]^[23] The IDCT-IV handles its boundaries through the symmetry of its basis functions: the implied extension of the reconstructed sequence is even around n = -1/2 and odd around n = N - 1/2, which makes it anti-periodic with x_{n+2N} = -x_n and periodic with period 4N. When the transform is applied to overlapping blocks, this boundary behavior introduces controlled time-domain aliasing, distinguishing it from the IDCT-II; in practice, these effects are managed by treating the output as the central N samples of a 2N-point aliased sequence.^[16]^[24] In modulated filter bank setups, such as the modified discrete cosine transform (MDCT), the IDCT-IV enables perfect reconstruction when combined with 50% block overlap and a time-domain aliasing cancellation (TDAC) window. The aliasing components from adjacent blocks are precisely canceled in the overlap-add operation, provided the window satisfies w^2(n) + w^2(n + N) = 1 for n = 0 to N-1. This property ensures that the overall synthesis reconstructs the original signal without distortion, a cornerstone of critically sampled filter banks. The approach was originally demonstrated for TDAC-based systems, confirming invertibility under these conditions.^[16] The IDCT-IV is integral to audio codecs like AAC, where it supports efficient reconstruction in overlap-add processing.^[25]
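The aliasing cancellation described above can be demonstrated with a small MDCT sketch. It assumes the usual MDCT kernel \cos[\frac{\pi}{N}(n + \frac{1}{2} + \frac{N}{2})(k + \frac{1}{2})], a sine window (which is symmetric and satisfies w^2(n) + w^2(n+N) = 1), and an inverse scaled by 2/N; the function names, block size, and test signal are illustrative choices.

```python
import math

def mdct(block, half):
    """MDCT of a windowed block of 2*half samples -> half coefficients."""
    return [sum(block[n] * math.cos(
                    math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))
                for n in range(2 * half)) for k in range(half)]

def imdct(coeffs, half):
    """Inverse MDCT -> 2*half time-aliased samples (scaled by 2/N)."""
    return [2.0 / half * sum(coeffs[k] * math.cos(
                    math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))
                for k in range(half)) for n in range(2 * half)]

half = 4                                     # N in the text
window = [math.sin(math.pi * (n + 0.5) / (2 * half)) for n in range(2 * half)]
signal = [math.sin(0.3 * n) + 0.1 * n for n in range(4 * half)]

output = [0.0] * len(signal)
for start in range(0, len(signal) - 2 * half + 1, half):   # 50% overlap
    block = [signal[start + n] * window[n] for n in range(2 * half)]
    aliased = imdct(mdct(block, half), half)
    for n in range(2 * half):
        output[start + n] += aliased[n] * window[n]        # overlap-add

# samples covered by two blocks are reconstructed; the edges are not
middle = range(half, len(signal) - half)
print(max(abs(output[n] - signal[n]) for n in middle))
```

Each block on its own returns an aliased result, yet the overlap-added middle section matches the input to rounding error; only the first and last half-blocks, which lack an overlapping neighbor, remain aliased.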

Properties and Computation

Key properties

The discrete cosine transform (DCT) is an orthogonal transform, meaning its basis vectors satisfy the orthogonality condition: the inner product of two distinct basis vectors is zero, while the inner product of a basis vector with itself equals a normalization constant depending on the variant. This property holds for all standard DCT types (I through IV), as the basis functions are derived from cosine sequences that form an orthogonal set. Orthogonality enables the DCT matrix to be inverted simply by its transpose (up to scaling), which is crucial for perfect reconstruction in applications like compression.^[10] A direct consequence of orthogonality is the adaptation of Parseval's theorem to the DCT, which preserves signal energy across domains. For a normalized DCT-II of length N, this takes the form

\sum_{n=0}^{N-1} |x_n|^2 = \sum_{k=0}^{N-1} |X_k|^2,

where x_n are the input samples and X_k are the transform coefficients; unnormalized variants include appropriate scaling factors such as 1/N or boundary adjustments. This energy preservation property quantifies how quantization errors in the transform domain propagate to the spatial domain, aiding in rate-distortion optimization.^[26] The DCT basis functions are real-valued cosines, and the extension they imply is even at the block boundaries. This even symmetry stems from the transform's equivalence to a phase-corrected DFT applied to an even-extended signal, minimizing discontinuities at boundaries and promoting energy concentration in lower frequencies compared to the DFT.^[10] In the DCT domain, a convolution theorem exists via symmetric convolution, where the DCT of the symmetric convolution of two finite sequences equals the pointwise product of their individual DCTs (with boundary extensions to maintain periodicity). This relation, distinct from the circular convolution of the DFT, supports efficient filtering and multiplication operations directly in the transform domain without full inverse transforms.^[27] The DCT also features decimation-in-time and decimation-in-frequency relations, mirroring those of the DFT but adapted to cosine symmetry. These allow recursive decomposition of the transform into smaller subproblems by subsampling inputs or outputs, which is the basis of fast algorithms.
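The statement about quantization error can be illustrated directly. This sketch assumes the orthonormal DCT-II and a uniform quantizer with an arbitrary step of 4; it compares the squared error measured on the coefficients with the squared error measured on the reconstructed samples. All names and values are illustrative.

```python
import math

def dct2_ortho(x):
    """Orthonormal DCT-II."""
    n_pts = len(x)
    return [math.sqrt((1.0 if k == 0 else 2.0) / n_pts) * sum(
                x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * n_pts))
                for n in range(n_pts)) for k in range(n_pts)]

def idct2_ortho(coeffs):
    """Inverse of dct2_ortho (transpose of the same matrix)."""
    n_pts = len(coeffs)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n_pts) * coeffs[k]
                * math.cos(math.pi * k * (2 * n + 1) / (2 * n_pts))
                for k in range(n_pts)) for n in range(n_pts)]

signal = [12.0, 15.0, 19.0, 18.0, 14.0, 9.0, 7.0, 8.0]
coeffs = dct2_ortho(signal)
step = 4.0                                        # hypothetical quantizer step
quantized = [step * round(c / step) for c in coeffs]
decoded = idct2_ortho(quantized)

coeff_error = sum((c - q) ** 2 for c, q in zip(coeffs, quantized))
signal_error = sum((s - d) ** 2 for s, d in zip(signal, decoded))
print(coeff_error, signal_error)
```

The two error energies agree, so a distortion budget can be set in the coefficient domain and will carry over unchanged to the samples.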

Efficient computation methods

The efficient computation of the discrete cosine transform (DCT) relies on fast algorithms that reduce the computational burden from the direct O(N²) matrix-vector multiplication to O(N log N) operations, enabling practical applications in signal processing. Fast cosine transform (FCT) algorithms exploit the symmetries of the DCT basis to factor the transform into sparse stages of simple additions, subtractions, and multiplications by fixed constants, analogous to the butterfly operations in the fast Fourier transform (FFT). For one-dimensional (1D) DCTs, these algorithms are particularly effective for power-of-two lengths like N=8, commonly used in standards such as JPEG. In two dimensions, the separability of the DCT allows efficient computation via successive 1D FCTs along rows and columns, minimizing redundancy while preserving the transform's properties.

A key approach links the DCT-II to the FFT by extending the input sequence to length 2N with even symmetry (\tilde{x}_n = x_n for 0 \leq n < N and \tilde{x}_n = x_{2N-1-n} for N \leq n < 2N), computing a real-input FFT on this extended sequence, and extracting the DCT coefficients from the first N outputs after multiplying by a half-sample phase factor. Pruning techniques eliminate the computations made redundant by the mirror symmetry, bringing the cost close to that of a single N-point FFT. This method, introduced by Makhoul, achieves the O(N log N) complexity inherent to FFT-based decompositions and is adaptable to hardware implementations.

For the widely adopted 8-point DCT-II in JPEG, Chen's algorithm provides a direct factorization into butterfly structures and 4-point rotation stages, requiring only 16 multiplications and 26 additions compared to 64 multiplications and 56 additions in the direct method. The algorithm proceeds in three stages: first, pairwise additions and subtractions to form even and odd indexed groups; second, 4-point butterflies with rotations by angles like π/4 and 3π/16 (using precomputed cosines such as √2/2 ≈ 0.7071 and cos(3π/16) ≈ 0.8315); and third, final combinations to yield the coefficients. This structure leverages the DCT matrix's quarter-wave symmetry, reducing redundant calculations without relying on FFT extensions.

The computational savings are substantial: direct evaluation demands N² = 64 multiplications and N(N-1) = 56 additions for N=8, whereas Chen's algorithm and similar FCTs scale as O(N log N), for which N log₂ N = 8 × 3 = 24 gives the order of magnitude. Later factorizations bring the 8-point count down to 11 multiplications and 29 additions. These efficiencies have made fast DCTs indispensable for real-time image and video processing, where power and speed constraints are critical.

To illustrate an 8-point IDCT-II computation as used in JPEG decoding (focusing on the inverse for pixel reconstruction from quantized coefficients), the IDCT-II reconstructs the spatial samples x_n from the coefficients Y_k using the formula

x_n = \frac{1}{2} \sum_{k=0}^{7} c_k Y_k \cos\left( \frac{\pi (2n+1) k}{16} \right), \quad n = 0, \dots, 7, \quad c_0 = \frac{1}{\sqrt{2}}, \quad c_k = 1 \ (k \geq 1),

but it is implemented efficiently via Chen's inverse flowgraph, which mirrors the forward flowgraph with the stage order and the rotations reversed. The stages are: input butterflies that form the even and odd parts by additions and subtractions of the coefficients Y_k; application of the 4-point rotations; and final combinations with multiplications by fixed cosines and further additions and subtractions. This step-by-step butterfly traversal avoids a full matrix multiplication, and carefully scaled fixed-point implementations can keep the reconstruction error within about 1 LSB.
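The link between the DCT-II and the FFT described in this section can be sketched in a few lines. The example below is an illustration under stated assumptions, not Makhoul's N-point method or Chen's flowgraph: it uses the plain length-2N mirrored extension, a small recursive radix-2 FFT written here for self-containment, and the unnormalized DCT-II definition. All function names are hypothetical.

```python
import cmath
import math

def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2])
    odd = fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def dct2_direct(x):
    """Unnormalized DCT-II by direct O(N^2) summation (reference)."""
    n_len = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_len))
                for n in range(n_len))
            for k in range(n_len)]

def dct2_via_fft(x):
    """Unnormalized DCT-II from a 2N-point FFT of the mirrored sequence."""
    n_len = len(x)
    # Even extension: y[n] = x[n], y[2N-1-n] = x[n].
    extended = list(x) + list(reversed(x))
    spectrum = fft(extended)
    # A half-sample phase shift turns each FFT bin into a real cosine sum.
    return [0.5 * (cmath.exp(-1j * math.pi * k / (2 * n_len)) * spectrum[k]).real
            for k in range(n_len)]

samples = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
fast = dct2_via_fft(samples)
slow = dct2_direct(samples)
print(max(abs(f - s) for f, s in zip(fast, slow)))
```

The printed difference is at rounding level, which confirms that the two routes compute the same coefficients. A production implementation would avoid the doubled length, since half of the extended FFT is redundant.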

Multidimensional Extensions

Two-dimensional DCT

The two-dimensional discrete cosine transform (2D DCT) is a separable extension of the one-dimensional DCT, widely used for representing two-dimensional signals like digital images by decomposing them into frequency components along both spatial dimensions. Among the variants, the 2D DCT-II is the most prevalent, particularly in image compression standards, as it provides excellent energy compaction while being computationally efficient due to its orthogonality and real-valued basis functions. This transform maps an input image block or matrix into a coefficient matrix where low-frequency components capture the majority of the signal energy. The mathematical definition of the 2D DCT-II for an M \times N input matrix x_{m,n} is given by

X_{k,l} = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} x_{m,n} \cos\left[ \frac{\pi k (2m + 1)}{2M} \right] \cos\left[ \frac{\pi l (2n + 1)}{2N} \right],

where k = 0, 1, \dots, M-1 and l = 0, 1, \dots, N-1. This formula arises naturally from the product of two 1D DCT-II basis functions, one applied along each dimension, following the foundational work on the DCT.^[3] The separability of the 2D DCT allows its computation through two successive one-dimensional DCTs: first applying the 1D DCT to each row of the input matrix to obtain an intermediate matrix, then applying the 1D DCT to each column of that intermediate result. This row-column decomposition reduces the complexity from O(M^2 N^2) for direct evaluation of the double sum to O(MN(M + N)) with direct 1D transforms, and further to O(MN \log(MN)) when fast algorithms are used for the 1D transforms.^[28] In practical image processing, the 2D DCT is often applied in a block-based manner to exploit spatial locality and manage computational load for large images. The JPEG standard, for instance, divides images into non-overlapping 8×8 blocks and computes the 2D DCT-II on each, enabling localized frequency analysis that aligns well with human visual perception and compression efficiency. The 2D DCT exhibits strong energy compaction in two dimensions, concentrating most of the signal's energy in the low-frequency coefficients near the top-left of the X_{k,l} matrix, while high-frequency coefficients in the bottom-right are typically small and can be quantized aggressively. To optimize entropy coding, these coefficients are reordered using a zigzag scan that traverses the matrix along its anti-diagonals from lowest to highest frequencies, facilitating run-length encoding of zeros. When applying the 2D DCT to entire images whose dimensions are not multiples of the block size, or to irregular regions, boundary handling is essential to minimize artifacts from finite extents. Common techniques include zero-padding, which extends the image with zeros, and mirroring (symmetric reflection), which reflects the pixels near the edge across the boundary to simulate even extension and reduce discontinuities.^[29]
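The row-column decomposition and the zigzag reordering can be illustrated as follows. This is a minimal sketch, assuming the unnormalized 2D DCT-II as defined above and a small 4×3 block of made-up values; the helper names are hypothetical, and the zigzag convention (starting to the right from the top-left corner) is the one commonly associated with JPEG.

```python
import math

def dct2_1d(x):
    """Unnormalized 1D DCT-II, direct O(N^2) evaluation."""
    n_len = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * n_len))
                for n in range(n_len))
            for k in range(n_len)]

def dct2_2d_rowcol(block):
    """2D DCT-II by 1D transforms along rows, then along columns."""
    rows_done = [dct2_1d(row) for row in block]
    cols_done = [dct2_1d(list(col)) for col in zip(*rows_done)]
    # cols_done is indexed [l][k]; transpose back to [k][l].
    return [list(row) for row in zip(*cols_done)]

def dct2_2d_direct(block):
    """2D DCT-II by evaluating the double sum for every (k, l)."""
    m_len, n_len = len(block), len(block[0])
    return [[sum(block[m][n]
                 * math.cos(math.pi * k * (2 * m + 1) / (2 * m_len))
                 * math.cos(math.pi * l * (2 * n + 1) / (2 * n_len))
                 for m in range(m_len) for n in range(n_len))
             for l in range(n_len)]
            for k in range(m_len)]

def zigzag_indices(size):
    """(row, col) pairs of a size x size block in zigzag order."""
    order = []
    for s in range(2 * size - 1):
        diag = [(r, s - r) for r in range(size) if 0 <= s - r < size]
        # Odd anti-diagonals run downward, even ones run upward.
        order.extend(diag if s % 2 == 1 else reversed(diag))
    return order

block = [[52.0, 55.0, 61.0],
         [63.0, 59.0, 55.0],
         [62.0, 59.0, 68.0],
         [63.0, 58.0, 71.0]]
separable = dct2_2d_rowcol(block)
direct = dct2_2d_direct(block)
print(max(abs(separable[k][l] - direct[k][l])
          for k in range(4) for l in range(3)))
print(zigzag_indices(3))
```

Both routes give the same coefficient matrix up to rounding; only the operation count differs.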

Higher-dimensional DCTs

The discrete cosine transform (DCT) extends naturally to higher dimensions beyond two, enabling the analysis of volumetric and tensor-structured data such as those encountered in medical imaging or multidimensional signal processing. This generalization relies on the separability of the DCT kernel, where the multidimensional transform is computed as a tensor product of one-dimensional DCTs applied successively along each dimension. For an N \times M \times P input array, the separable approach reduces the computational complexity from O((NMP)^2) for the direct non-separable implementation to O(NMP(N + M + P)) by performing successive 1D DCTs along each of the three axes. Further reductions to O(NMP \log(NMP)) are possible using fast 1D algorithms.^[30] Building on the separability principle established for two dimensions, the three-dimensional DCT-II (3D DCT-II) is particularly useful for processing data volumes. The forward 3D DCT-II of an input array x(i,j,n) for i=0,\dots,N-1, j=0,\dots,M-1, n=0,\dots,P-1 is given by

X(k,l,m) = \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} \sum_{n=0}^{P-1} x(i,j,n) \cos\left[\frac{\pi (2i+1) k}{2N}\right] \cos\left[\frac{\pi (2j+1) l}{2M}\right] \cos\left[\frac{\pi (2n+1) m}{2P}\right],

with appropriate normalization factors \alpha(k), \alpha(l), and \alpha(m) multiplying each coefficient, defined as in the 1D case (\alpha(0) = \sqrt{1/N}, \alpha(k) = \sqrt{2/N} for k > 0, and analogously with M and P for the other indices). Efficient computation of the 3D DCT-II can employ the vector-radix decimation-in-frequency (VR DIF) algorithm, which decomposes the transform into stages of smaller subtransforms and has a complexity of approximately O(N^3 \log N) multiplications for cubic sizes N \times N \times N. This method can reduce the multiplication count relative to the row-column approach by factorizing all dimensions jointly.^[31] Such higher-dimensional DCTs find application in processing volumetric datasets, including MRI scans and video volume compression, where they facilitate energy compaction across spatial and temporal dimensions (with detailed uses covered in subsequent sections).^[32] Other variants extend to several dimensions in the same separable way. The multidimensional DCT-IV (MD-DCT-IV), which has been used for data such as hyperspectral imagery, is computed via separable 1D DCT-IV applications, and fast implementations exploit its relationship to the type-II multidimensional DCT.^[33]^[34]
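The separable evaluation of the 3D DCT-II can be sketched by applying a 1D transform along each axis in turn. The example below is an illustration only: it uses the unnormalized definition (no \alpha factors), direct O(N^2) 1D transforms rather than the vector-radix algorithm, and a tiny 2×3×2 volume of made-up values. The function names are hypothetical.

```python
import math

def dct2_1d(x):
    """Unnormalized 1D DCT-II, direct O(N^2) evaluation."""
    n_len = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_len))
                for n in range(n_len))
            for k in range(n_len)]

def dct2_3d_separable(vol):
    """3D DCT-II by successive 1D transforms along the three axes."""
    n_i, n_j, n_n = len(vol), len(vol[0]), len(vol[0][0])
    out = [[[float(v) for v in vol[i][j]] for j in range(n_j)]
           for i in range(n_i)]
    for i in range(n_i):                      # third axis (index n)
        for j in range(n_j):
            out[i][j] = dct2_1d(out[i][j])
    for i in range(n_i):                      # second axis (index j)
        for n in range(n_n):
            line = dct2_1d([out[i][j][n] for j in range(n_j)])
            for j in range(n_j):
                out[i][j][n] = line[j]
    for j in range(n_j):                      # first axis (index i)
        for n in range(n_n):
            line = dct2_1d([out[i][j][n] for i in range(n_i)])
            for i in range(n_i):
                out[i][j][n] = line[i]
    return out

def dct2_3d_direct(vol):
    """3D DCT-II by evaluating the triple sum for every (k, l, m)."""
    n_i, n_j, n_n = len(vol), len(vol[0]), len(vol[0][0])
    def c(idx, freq, length):
        return math.cos(math.pi * (2 * idx + 1) * freq / (2 * length))
    return [[[sum(vol[i][j][n] * c(i, k, n_i) * c(j, l, n_j) * c(n, m, n_n)
                  for i in range(n_i) for j in range(n_j) for n in range(n_n))
              for m in range(n_n)]
             for l in range(n_j)]
            for k in range(n_i)]

volume = [[[1, 2], [3, 4], [5, 6]],
          [[7, 8], [9, 10], [11, 13]]]
sep = dct2_3d_separable(volume)
ref = dct2_3d_direct(volume)
print(max(abs(sep[k][l][m] - ref[k][l][m])
          for k in range(2) for l in range(3) for m in range(2)))
```

The order in which the axes are processed does not matter, since the 1D transforms along different axes commute.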

Applications

Image and video compression

The discrete cosine transform (DCT) plays a central role in image and video compression by exploiting spatial redundancy to achieve high compression ratios while preserving perceptual quality. In still image compression, the JPEG standard divides images into 8×8 pixel blocks and applies the type-II DCT (DCT-II) to each block, transforming spatial data into frequency-domain coefficients where low-frequency components capture most of the visual energy.^[7] This allows subsequent quantization to discard high-frequency details that are less perceptible to the human eye, using predefined or custom quantization tables to scale coefficients based on frequency and perceptual importance. The quantized coefficients are then entropy-coded, typically with Huffman coding, to further reduce bitrate by assigning shorter codes to more frequent values, resulting in compression ratios often exceeding 10:1 for typical images with minimal visible artifacts.^[7] In video compression, the DCT was standardized first in H.261 and then in MPEG-1 and MPEG-2, where it processes 8×8 blocks: in intra-coded (I-)frames it removes spatial correlations, much as in JPEG, while in predicted frames it codes the residual left after motion-compensated prediction, which handles temporal redundancy. This block-based DCT approach enabled efficient encoding of digital video for applications like DVD and broadcast, achieving compression factors of 50:1 or more for standard-definition content by combining intra-DCT with motion-compensated prediction.^[35] The technique evolved in the High Efficiency Video Coding (HEVC, or H.265) standard, which supports transform block sizes from 4×4 to 32×32 for DCT-II application, allowing better adaptation to content structure and improving energy compaction for high-resolution video through larger transforms. Together with its other coding tools, HEVC's flexible transform sizes contribute to bitrate savings of roughly 50% over H.264/AVC at equivalent quality, particularly for 4K content and beyond.
Modern codecs continue to leverage DCT for its near-optimal performance in decorrelating spatial data. The Versatile Video Coding (VVC, or H.266) standard, finalized in 2020, employs DCT-II as the primary transform alongside secondary options like DST-VII for residual blocks, with multiple transform selection (MTS) to choose the best basis per prediction unit, enabling 30-50% efficiency gains over HEVC for ultra-high-definition video.^[36] Similarly, the AV1 codec integrates DCT-II in a hybrid transform scheme for open-source video streaming, supporting block sizes up to 64x64 and achieving comparable or better compression than HEVC through adaptive selection that emphasizes low-frequency dominance in natural scenes. These advancements underscore the DCT's enduring advantage in concentrating signal energy into fewer coefficients, facilitating scalable compression for diverse visual data.^[35]
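The transform, quantize, and reconstruct steps described in this subsection can be sketched in one dimension. This is a simplified illustration, assuming orthonormal DCT-II and DCT-III scaling and a single uniform quantization step (`STEP`, a hypothetical value); real codecs use 2D blocks, frequency-dependent quantization tables, and entropy coding, none of which are modeled here.

```python
import math

def _scale(k, n_len):
    """Orthonormal scaling factor for coefficient index k."""
    return math.sqrt(1.0 / n_len) if k == 0 else math.sqrt(2.0 / n_len)

def dct2_ortho(x):
    """Orthonormal DCT-II (forward transform)."""
    n_len = len(x)
    return [_scale(k, n_len)
            * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_len))
                  for n in range(n_len))
            for k in range(n_len)]

def idct2_ortho(c):
    """Orthonormal DCT-III, the inverse of dct2_ortho."""
    n_len = len(c)
    return [sum(_scale(k, n_len) * c[k]
                * math.cos(math.pi * (2 * n + 1) * k / (2 * n_len))
                for k in range(n_len))
            for n in range(n_len)]

STEP = 10.0  # hypothetical uniform quantizer step, not a JPEG table entry

samples = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
coeffs = dct2_ortho(samples)
levels = [round(c / STEP) for c in coeffs]        # integers to be entropy-coded
recon = idct2_ortho([q * STEP for q in levels])   # decoder-side reconstruction

error_energy = sum((a - b) ** 2 for a, b in zip(samples, recon))
print(levels)
print(error_energy)
```

Because the transform is orthonormal, the squared reconstruction error equals the squared quantization error of the coefficients, so it cannot exceed N·(STEP/2)² for a block of N samples. Small high-frequency coefficients round to zero, which is what makes the subsequent entropy coding effective.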

Audio and speech processing

The modified discrete cosine transform (MDCT), a lapped transform based on the type-IV discrete cosine transform (DCT-IV), employs 50% overlapping windows between adjacent blocks yet remains critically sampled: each new block of N input samples yields N coefficients. It provides an efficient frequency-domain representation while allowing perfect reconstruction through time-domain aliasing cancellation (TDAC).^[37] This overlap-add structure minimizes block boundary artifacts and supports block switching for transient signals, making the MDCT particularly suitable for perceptual audio coding where energy compaction and frequency selectivity are essential.^[37] In audio compression standards, the MDCT forms the core transform for several widely adopted codecs. The MP3 standard (MPEG-1 Layer III) uses a hybrid filter bank that combines a polyphase filter bank with an MDCT applied to each subband, achieving high compression ratios at bitrates around 128 kbps and balancing quality and efficiency for general music playback. Advanced Audio Coding (AAC), defined in MPEG-2 and MPEG-4, relies purely on the MDCT with switchable window lengths (2048 samples for long blocks or 256 samples for short blocks, giving 1024 or 128 spectral coefficients) for improved frequency resolution and perceptual modeling, delivering superior quality at similar bitrates compared to MP3. The Opus codec, standardized in RFC 6716, incorporates the MDCT in its CELT mode for music signals, using short overlaps (as low as 2.5 ms) to support low-latency applications like VoIP while achieving near-transparent quality at 64-128 kbps.^[38] More recently, the Low Complexity Communication Codec (LC3) for Bluetooth LE Audio employs the MDCT with adaptive windowing to enable high-quality, low-power streaming at bitrates from 32-345 kbps, supporting multi-stream audio in devices like hearing aids.
For speech processing, low-delay DCT (LDCT) variants, such as those with reduced window sizes, minimize algorithmic delay to under 10 ms in real-time speech applications, outperforming traditional MDCT in latency-sensitive scenarios like teleconferencing without significant quality loss.^[39] These approaches leverage MDCT's critical sampling and aliasing cancellation to achieve low delay and efficient bandwidth use, critical for speech where perceptual transparency is less demanding than in music.^[37]
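Time-domain aliasing cancellation can be demonstrated with a short analysis and synthesis loop. The sketch below rests on assumptions that are not taken from any particular codec: a sine window (which satisfies the Princen-Bradley condition w[n]² + w[n+N]² = 1), the same window for analysis and synthesis, a 2/N factor in the inverse, and zero padding of one half-block at each end so that every sample is covered by two blocks. Function names are hypothetical.

```python
import math

def sine_window(two_n):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]

def mdct(block, window):
    """Windowed MDCT: 2N input samples -> N coefficients."""
    two_n = len(block)
    n_half = two_n // 2
    return [sum(window[n] * block[n]
                * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                for n in range(two_n))
            for k in range(n_half)]

def imdct(coeffs, window):
    """Windowed inverse MDCT: N coefficients -> 2N aliased samples."""
    n_half = len(coeffs)
    return [window[n] * (2.0 / n_half)
            * sum(coeffs[k]
                  * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                  for k in range(n_half))
            for n in range(2 * n_half)]

def analyze_synthesize(signal, n_half):
    """MDCT each 50%-overlapped block, invert it, and overlap-add."""
    window = sine_window(2 * n_half)
    padded = [0.0] * n_half + list(signal) + [0.0] * n_half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_half + 1, n_half):
        block = padded[start:start + 2 * n_half]
        for n, v in enumerate(imdct(mdct(block, window), window)):
            out[start + n] += v  # aliasing cancels between neighbors
    return out[n_half:n_half + len(signal)]

# Hypothetical test signal; its length is a multiple of the hop size.
signal = [math.sin(0.3 * t) + 0.1 * t for t in range(16)]
rebuilt = analyze_synthesize(signal, 4)
print(max(abs(a - b) for a, b in zip(signal, rebuilt)))
```

Each block on its own reconstructs an aliased signal; only the sum of neighboring blocks recovers the input, which is why the MDCT can be critically sampled despite the 50% overlap.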

Other signal processing uses

The discrete cosine transform (DCT) facilitates efficient convolution operations in the frequency domain, leveraging its multiplication-convolution property to enable low-complexity filtering for multi-rate signal processing systems. This approach is particularly advantageous in adaptive filtering scenarios, where block-based updates in the DCT domain reduce computational overhead compared to time-domain methods, allowing seamless integration with decimation or interpolation processes without full inverse transforms.^[40] In digital watermarking, DCT coefficients serve as a robust embedding domain for hidden information, capitalizing on the transform's energy compaction to place watermarks in mid-frequency bands that withstand common signal distortions like noise addition or filtering. This technique enhances security and imperceptibility by modifying select coefficients, ensuring extraction remains viable under attacks such as compression or geometric transformations, as demonstrated in schemes combining DCT with singular value decomposition for grayscale signals.^[41]^[42] For biomedical signal analysis, DCT enables effective feature extraction from electrocardiogram (ECG) signals by converting time-domain data into frequency components, isolating key morphological traits like QRS complexes through low-order coefficients. Autocorrelation combined with DCT further refines these features, improving classification accuracy for arrhythmia detection by emphasizing periodic elements while suppressing noise, achieving identification rates above 90% in clinical datasets.^[43]^[44] Integration of DCT as a preprocessing step in machine learning pipelines, particularly for convolutional neural networks (CNNs), has gained traction since 2020 for frequency-domain analysis of signals. 
By transforming inputs into DCT representations, models can perform efficient augmentation via coefficient manipulation, which enhances robustness to distortions and reduces dimensionality. For instance, truncating high-frequency components yields up to 50% parameter compression in harmonic networks without significant accuracy loss. This preprocessing aids in tasks like time-series classification, where the DCT captures spectral patterns that are hard to see in the original sample domain.^[45] Multidimensional extensions of the DCT find application in radar signal processing for beamforming and target detection, where 2D or 3D transforms process array data to separate spatial frequencies, and low-complexity, multiplierless approximations of the transform keep the computational cost down. In spectroscopy, DCT-based spectral estimation improves resolution in wideband signals by reducing bias in magnitude-squared periodograms, facilitating accurate peak detection in noisy environments like chemical analysis or cognitive radio spectrum sensing.^[46]^[47]

History and Development

Origins and early work

The Discrete Cosine Transform (DCT) traces its conceptual roots to the Fourier cosine series, a mathematical tool developed in the early 19th century by Joseph Fourier to represent even periodic functions using cosines, providing a foundation for frequency-domain analysis of real-valued signals. While the continuous Fourier cosine transform had long been used in signal processing, the discrete variant gained prominence in the 1970s amid advances in digital computing and the widespread adoption of the Discrete Fourier Transform (DFT) following the 1965 Cooley-Tukey fast Fourier transform algorithm. The DCT emerged as a refinement tailored for discrete real-valued data, such as digital images, by extending the signal evenly to eliminate the discontinuities inherent in the DFT's periodic extension, thereby improving spectral concentration for practical compression tasks. In January 1974, Nasir Ahmed, T. Natarajan, and K. R. Rao formally introduced the DCT in their paper published in IEEE Transactions on Computers, defining what is now known as the type-II DCT, optimized for one-sided sequences common in block-based processing, along with its inverse (type-III DCT).^[3] They developed an efficient computation algorithm leveraging the fast Fourier transform and established key properties, including orthogonality and symmetry. The core motivation was to address limitations of the DFT in handling real-valued image data, where the DCT demonstrated superior energy compaction—concentrating over 90% of the signal's energy in the lowest-frequency coefficients for typical images, compared to the DFT's less efficient distribution due to its complex-valued nature. 
This property was verified through experiments on sample images, showing the DCT's potential for reducing data redundancy in compression without significant loss of perceptual quality.^[3] These initial developments sparked early experimental applications in image coding, where the DCT was integrated with differential pulse code modulation (DPCM) to form hybrid schemes that combined prediction for temporal or spatial redundancy with transform-based frequency decorrelation. Pioneering work by Ali Habibi in 1974 explored such hybrid approaches, applying the DCT alongside DPCM to pictorial data, achieving notable bitrate reductions in experimental setups with monochrome images. At Bell Laboratories, concurrent studies on video coding for systems like Picturephone incorporated similar transform-predictive techniques, evaluating the DCT's performance in DPCM frameworks to optimize bandwidth for transmitted images in the mid-1970s, laying groundwork for more efficient digital visual communication.

Standardization and evolution

The Discrete Cosine Transform (DCT), introduced in a 1974 paper by Nasir Ahmed, T. Natarajan, and K. R. Rao, represented a significant advancement in transform coding for signal compression due to its ability to approximate the optimal Karhunen–Loève transform with real-valued basis functions.^[3] Initially explored for image processing, the DCT's efficiency in energy compaction and decorrelation properties led to its evaluation in early experimental systems during the late 1970s and 1980s, paving the way for broader adoption.^[48] The DCT's formal standardization began in the late 1980s, culminating in its integration into the ITU-T H.261 video coding recommendation, approved in 1990 for low-bitrate videoconferencing over integrated services digital network (ISDN) lines. H.261 employed an 8×8 type-II DCT on luminance and chrominance blocks following motion compensation, marking the first international standard to leverage the transform for hybrid video compression and establishing a template for future codecs. This adoption was driven by the DCT's superior performance over alternatives like the discrete sine transform in block-based coding.^[49] Building on H.261, the DCT became central to still-image and video standards in the early 1990s. The Joint Photographic Experts Group (JPEG) standard, ISO/IEC 10918-1, published in 1992, specified the DCT for lossy compression of continuous-tone images, applying it to 8×8 pixel blocks after level shifting and followed by quantization and Huffman coding.^[50] Concurrently, the Moving Picture Experts Group (MPEG-1) standard, ISO/IEC 11172, finalized in 1992 and published in 1993, incorporated DCT-based residual coding in its video part for bit rates up to 1.5 Mbit/s, extending H.261 with bidirectional prediction.^[51] These standards solidified the DCT's role, with JPEG enabling widespread digital photography and MPEG-1 supporting compact disc-based video. 
The evolution of the DCT continued through refinements in subsequent standards, adapting to higher resolutions and bit rates while preserving its core framework. MPEG-2 (ISO/IEC 13818, 1995) enhanced MPEG-1 for broadcast and DVD applications, retaining the 8×8 DCT with added scalability modes. In audio, the modified DCT (MDCT), a lapped variant for better frequency resolution, was standardized in MPEG-1 Layer III (MP3, 1993) for perceptual coding at low bit rates.^[52] Later, ITU-T H.263 (1996) refined H.261 for low-bitrate video communication while retaining the 8×8 DCT, and H.264/AVC (2003) shifted to separable 4×4 integer DCT approximations for exact integer arithmetic and reduced complexity. Despite challenges from wavelets in JPEG 2000 (ISO/IEC 15444, 2000), the DCT's computational maturity and performance ensured its persistence in hybrid forms across generations of codecs, influencing over 90% of digital media formats by the 2010s.

References

[1]
[PDF] Discrete Cosine Transform - Semantic Scholar
Apr 11, 1996 · The discrete cosine transform (DCT) is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real ...
[2]
Introduction to Discrete Cosine Transforms (DCT)
[3]
Discrete Cosine Transform | IEEE Journals & Magazine
A discrete cosine transform (DCT) is defined and an algorithm to compute it using the fast Fourier transform is developed.
[4]
DCT - Lossy Data Compression: JPEG
The key to the JPEG baseline compression process is a mathematical transformation known as the Discrete Cosine Transform (DCT).
[5]
DCT
The Discrete Cosine Transform (DCT) is an example of transform coding. The current JPEG standard uses the DCT as its basis.
[6]
[PDF] Discrete Cosine Transform
energy compaction. – is the ability to pack the energy of the spatial sequence into as few frequency coefficients as possible.
[7]
The JPEG still picture compression standard - IEEE Xplore
... JPEG standard includes two basic compression methods, each with various modes of operation. A DCT (discrete cosine transform)-based method is specified for ...
[8]
Video compression using the three dimensional discrete cosine ...
Widely used image and video compression standards, such as JPEG and MPEG, use the two-dimensional discrete cosine transform (DCT) to achieve near-optimal ...
[9]
The Discrete Cosine Transform | SIAM Review
The purpose of this note is to consider real transforms that involve cosines. Each matrix of cosines yields a Discrete Cosine Transform (DCT).
[10]
[PDF] Discrete Cosine Transform - MIT Mathematics
Each Discrete Cosine Transform uses N real basis vectors whose components are cosines. In the DCT-4, for example, the jth component of v_k is cos(j + 1/2)(k + 1/2) ...
[11]
[PDF] On Polynomial Multiplication in Chebyshev Basis - arXiv
Sep 9, 2013 · The idea is to transform the input polynomials by using forward DCT-I, then perform a pointwise multiplication and finally transform the result ...
[12]
[PDF] arXiv:2412.06242v1 [math.NA] 9 Dec 2024
Dec 9, 2024 · Since the Chebyshev polynomials of the first kind are defined as Tk ... inverse transformation to DCT-I is again DCT-I. Inspecting the ...
[13]
Estimation of Symmetric Channels for Discrete Cosine Transform ...
Oct 18, 2015 · We focus on systems employing the Discrete Cosine Transform Type-I (DCT1) even at both the transmitter and the receiver, presenting an algorithm ...
[14]
[PDF] On discrete cosine transform - arXiv
Sep 2, 2011 · Abstract: The discrete cosine transform (DCT), introduced by Ahmed, Natarajan and Rao, has been used in many applications.
[15]
Discrete Cosine Transform - MATLAB & Simulink - MathWorks
DCT-2 and DCT-3 are inverses of each other: Inverse of DCT-1: x ( n ) = 2 N ... The function idct computes the inverse DCT for an input sequence ...
[16]
[PDF] Type-IV DCT, DST, and MDCT algorithms with reduced numbers of ...
Jan 29, 2009 · Abstract—We present algorithms for the type-IV discrete cosine transform (DCT-IV) and discrete sine transform (DST-.
[17]
[PDF] Mapping between Discrete Cosine Transform of Type-VI/VII and ...
The so-called “odd-type” DCTs of types V, VI, VII and VIII, however, have not been widely used for purposes of coding image, video, speech and audio data.
[18]
Relationship between DCT-II, DCT-VI, and DST-VII transforms
Aug 6, 2025 · The popularity of the discrete cosine transform is based on the fact ... antisymmetric. The filtering will be efficient because fast ...
[19]
[PDF] Algebraic Signal Processing: Modeling and Subband Analysis
... (DCT-VIII):. Pb,α. = hVℓ(αk)i0≤k,ℓ<n= hcos. (k + 1/2)(ℓ + 1/2)π. (n + 1/2) ... Legendre polynomials discussed in Appendix A. Parameter ω corresponds to the ...
[20]
dct — SciPy v1.16.2 Manual
'The' DCT generally refers to DCT type 2, and 'the' Inverse DCT generally refers to DCT type 3. Type I. There are several definitions of the DCT-I; we use the ...
[21]
discrete cosine transform - PlanetMath
Mar 22, 2013 · The inverse of DCT-III is DCT-II. 1.4 DCT-IV. C ...
[22]
[PDF] Type-II/III DCT/DST algorithms with reduced number of arithmetic ...
Jan 29, 2009 · DCT-III, and also DST-II and DST-III), of power-of-two sizes, that ... II (which is the inverse, for the unitary normalization). In.
[23]
https://ieeexplore.ieee.org/document/544742
[24]
https://ieeexplore.ieee.org/document/685705
[25]
[PDF] Audio Coding based on Integer Transforms
The Modified Discrete Cosine Transform (MDCT) is widely used in modern audio coding schemes. It provides critical sampling, overlapping of blocks and good ...
[26]
[PDF] The Discrete Cosine Transform (DCT): Theory and Application
Like other transforms, the Discrete Cosine Transform (DCT) attempts to decorrelate the image data. After decorrelation each transform coefficient can be encoded ...
[27]
https://ieeexplore.ieee.org/document/295213
[28]
https://ieeexplore.ieee.org/document/5235989
[29]
[PDF] Arbitrary Shape Wavelet Transform with Phase Alignment - Microsoft
Kauff et al [6] proposed a shape-adaptive DCT. (SA-DCT) which avoided the padding. When applied to a block not fully occupied by the object, SA-DCT first moved ...
[30]
[PDF] Low-complexity Multidimensional DCT Approximations - arXiv
Jun 20, 2023 · In this paper, we introduce low-complexity multidimensional discrete cosine transform (DCT) approximations. Three dimensional DCT (3D DCT) ...
[31]
Low-Complexity Real-Time Light Field Compression using 4-D ...
Motivated by the partial separability of the multidimensional spectrum of LFs, the proposed 4-D ADCT is obtained by cascading 2-D inter-view and 2-D intra-view ...
[32]
https://arxiv.org/pdf/2306.11724
[33]
Fast Computation of MD-DCT-IV/MD-DST-IV by MD-DWT ... - SIAM.org
This paper reveals relationships between the type-IV multidimensional discrete cosine transform (MD-DCT-IV) and the type-II multidimensional discrete cosine ...
[34]
Compression of Hyperspectral Imagery Using the 3–D DCT and ...
Two systems are presented for compression of hyperspectral imagery which utilize trellis coded quantization (TCQ). Specifically, the first system uses TCQ ...
[35]
Nasir Ahmed Pioneered Digital Compression Algorithms
Aug 19, 2024 · DCT was the compression technique of choice when ISO and the international Electrotechnical Commission (IEC) established the Moving Picture ...
[36]
https://ieeexplore.ieee.org/document/9449858
[37]
Subband/Transform coding using filter bank designs based on time ...
Abstract: A new, oddly stacked, critically sampled, single side-band (SSB) [7] analysis/synthesis system based on Time Domain Aliasing Cancellation (TDAC) ...
[38]
RFC 6716 - Definition of the Opus Audio Codec - IETF Datatracker
... MDCT mode, Opus uses a primarily implicit bit allocation. The available bitstream capacity is known in advance to both the encoder and decoder without ...
[39]
A Discrete Cosine Transform Scheme for Low-Delay Wideband ...
This technique is applied to the coding of wideband speech at 32 kbps with low delays, simulation results showing a very high speech quality comparable to that ...
[40]
Design and analysis of an image resizing filter in the block-DCT ...
The proposed image-resizing method is performed in the DCT domain and exploits the multiplication-convolution property of the DCT. Filter characteristics of ...
[41]
A Simplified and Robust DCT-based Watermarking Algorithm
In this paper, we proposed a new method that simplify the previous DCT-based methods while more robust to common attacks than conventional methods.
[42]
Enhanced Invisibility and Robustness of Digital Image Watermarking ...
The proposed system was shown to have high invisibility and robustness against various types of attacks on watermarked images.
[43]
ECG features extraction using AC/DCT for biometric - IEEE Xplore
It based on autocorrelation (AC) in conjunction with the discrete cosine transform (DCT) proposed for feature extractions from the pre-processed ECG signal.
[44]
Identification of Premature Ventricular Cycles of Electrocardiogram ...
DCT and Application to ECG. Discrete cosine transform (DCT) can linearly transform the data in time domain to the frequency domain by a set of DCT coefficients ...
[45]
Harmonic convolutional networks based on discrete cosine transform
Apr 12, 2022 · Using DCT energy compaction properties, we demonstrate how the harmonic networks can be efficiently compressed by truncating high-frequency information in ...
[46]
2D-spectral estimation based on DCT and modified magnitude ...
Jan 10, 2012 · The analytic 2D-DCT preserves the desirable properties of the DCT (like, improved frequency resolution, leakage and detectability) and is ...
[47]
Compressed Wideband Spectrum Sensing Based on Discrete ... - NIH
Discrete cosine transform (DCT) is a special type of transform which is widely used for compression of speech and image. However, its use for spectrum ...
[48]
[PDF] How I Came Up with the Discrete Cosine Transform - CSE, IIT Delhi
How I Came Up with the Discrete Cosine Transform. Nasir Ahmed. Electrical and ... the paper was then published in the January 1974 issue. I recall that ...
[49]
Video Coding History — Vcodex BV
Manfred Schroeder of Bell Labs applied the prediction concept specifically to video signals “preferably… ... The Discrete Cosine Transform. In their classic ...
[50]
JPEG 1
Specifies the core coding system, consisting of the well-known Huffman-coded DCT based lossy image format, but also including the arithmetic coding option, ...
[51]
Video | MPEG
The basic principle of MPEG-1 Video is hybrid coding, a combination of block-wise motion-compensated prediction and scalar-quantized DCT-based coding of the ...
[52]
Audio | MPEG
MPEG-1 Audio. MPEG doc#: N7703. Date: October 2005. Authors: B. Grill, S. Quackenbush. MPEG-1 Layer I or II Audio is a generic subband coder operating at ...