
Fast Fourier transform

The Fast Fourier Transform (FFT) is an algorithm that computes the discrete Fourier transform (DFT) of a finite sequence of equally spaced data points, reducing the cost from the direct O(N²) operations required by the naive DFT to O(N log N) for N data points, where N is typically a power of 2. This efficiency arises from the Cooley-Tukey algorithm's divide-and-conquer approach, which recursively breaks down the DFT into smaller DFTs of even- and odd-indexed terms, leveraging symmetries in the transform to avoid redundant calculations. The FFT was popularized by James W. Cooley and John W. Tukey in their 1965 paper, which described the algorithm for machine computation of complex Fourier series, though similar ideas had appeared earlier in works by Gauss and others. Their formulation enabled practical implementation on early computers, transforming the DFT from a theoretical tool into a cornerstone of digital signal processing. The Cooley-Tukey radix-2 variant, applicable when N is a power of 2, is the most common, but generalizations exist for arbitrary N via mixed-radix or prime-factor methods. The FFT's impact spans numerous fields, including audio and image processing, where it enables frequency-domain analysis for filtering, compression, and feature extraction; telecommunications, for modulation and error correction; and scientific computing, for solving partial differential equations via spectral methods. In biomedical engineering, it supports MRI reconstruction and ECG analysis, while in finance, it aids in time-series forecasting and option pricing. Recognized as one of the most influential algorithms of the 20th century, the FFT underpins modern technologies like wireless communication and multimedia, with ongoing research focusing on optimized implementations for parallel processors and specialized hardware.

Fundamentals

Definition

The Fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform (DFT) of a finite sequence of equally spaced data points in a computationally efficient manner. The DFT of a complex-valued sequence x_n for n = 0, \dots, N-1 produces coefficients X_k = \sum_{n=0}^{N-1} x_n \exp\left( -2 \pi i \frac{kn}{N} \right), \quad k = 0, \dots, N-1, which represent the frequency-domain representation of the input. Direct evaluation of this summation requires \mathcal{O}(N^2) operations, but the FFT reduces the complexity to \mathcal{O}(N \log N), enabling practical computation for large N. The FFT serves a fundamental role in digital signal processing by converting time-domain sequences into frequency-domain representations, allowing decomposition into constituent sinusoidal components for analysis and manipulation. The transform output exhibits periodicity with period N, such that X_{k+N} = X_k, and conjugate symmetry for real-valued input sequences, where X_{N-k} = \overline{X_k} for k = 1, \dots, N-1. As an introductory example, consider the real-valued sequence of length N=4 given by x = [1, 2, 3, 4]. The FFT yields the output X = [10, -2 + 2i, -2, -2 - 2i], illustrating the DC component in X_0, the conjugate symmetry between X_1 and X_3, and the Nyquist frequency at X_2.
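The worked N = 4 case can be checked directly with NumPy's FFT routine (a minimal verification, not part of the original example beyond the values quoted above):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.fft.fft(x)
print(X)                                 # [10.+0.j  -2.+2.j  -2.+0.j  -2.-2.j]
assert np.isclose(X[1], np.conj(X[3]))   # conjugate symmetry for real input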

Relationship to the Discrete Fourier Transform

The Discrete Fourier Transform (DFT) provides a discrete-frequency representation of a finite-length sequence of equally spaced samples of a function, serving as the primary computational tool for frequency analysis in digital signal processing. For a sequence x_n of length N, defined for n = 0, 1, \dots, N-1, the forward DFT is given by X_k = \sum_{n=0}^{N-1} x_n e^{-j 2 \pi k n / N}, \quad k = 0, 1, \dots, N-1, where j = \sqrt{-1} and X_k represents the frequency-domain coefficient at discrete frequency k / N cycles per sample. The inverse DFT recovers the original sequence via x_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k e^{j 2 \pi k n / N}, \quad n = 0, 1, \dots, N-1. This pair of transforms is unitary up to scaling, preserving the structure of the signal in the frequency domain. Key properties of the DFT include linearity, time- and frequency-shift theorems, and symmetry properties, which mirror those of the continuous Fourier transform but adapted to finite discrete sequences. A fundamental property is Parseval's theorem, which states that the energy of the sequence is preserved between domains: \sum_{n=0}^{N-1} |x_n|^2 = \frac{1}{N} \sum_{k=0}^{N-1} |X_k|^2. This relation quantifies the power distribution across frequencies, enabling energy-based analyses without loss of information. The DFT arises as a sampled version of the continuous-time Fourier transform under specific assumptions. For a bandlimited continuous signal x(t) sampled at rate 1/T to yield x_n = x(nT), the DFT coefficients correspond to samples of the Fourier transform X(f) of the periodic extension \tilde{x}(t), where \tilde{x}(t) repeats every NT seconds. This periodic assumption implies that the sequence x_n is treated as one period of an infinite periodic train, with zero-padding if necessary to length N, and requires N to be at least the signal duration to avoid time-domain aliasing. Direct computation of the DFT involves evaluating N sums, each with N terms, leading to an overall complexity of O(N^2) arithmetic operations. This can be interpreted as a matrix-vector multiplication, where the DFT matrix \mathbf{F} has entries F_{k,n} = e^{-j 2 \pi k n / N}, a dense Vandermonde structure requiring N^2 multiplications and additions for \mathbf{X} = \mathbf{F} \mathbf{x}.
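The matrix view is easy to verify numerically: building \mathbf{F} explicitly and multiplying costs O(N^2), yet reproduces the FFT output exactly (a small NumPy check):

import numpy as np

N = 8
n = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(n, n) / N)   # F[k, n] = e^{-j 2 pi k n / N}
x = np.random.default_rng(9).standard_normal(N)
assert np.allclose(F @ x, np.fft.fft(x))       # O(N^2) product matches the FFT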

Historical Development

Early Concepts and Precursors

The foundations of the Fast Fourier Transform (FFT) trace back to the continuous Fourier analysis introduced by Joseph Fourier in his 1822 treatise Théorie analytique de la chaleur, where he demonstrated that arbitrary periodic functions could be represented as infinite sums of sines and cosines, laying the groundwork for harmonic analysis in physical systems like heat conduction. This continuous framework motivated later discretizations as computational needs arose in the early 20th century, particularly with the advent of numerical methods for solving differential equations, where sampled data required finite approximations of Fourier integrals to enable practical calculations. A pivotal precursor emerged in Carl Friedrich Gauss's unpublished 1805 work on least-squares interpolation of asteroid orbits, in which he developed a discrete summation akin to the modern discrete Fourier transform (DFT) using a divide-and-conquer approach to exploit symmetries in trigonometric sums, though this remained obscure until its posthumous publication in 1866. Building on such ideas, G. C. Danielson and Cornelius Lanczos advanced efficient computation in 1942 by deriving a recursive lemma that decomposes the DFT into smaller subproblems for spectrum analysis in X-ray crystallography, achieving an N log N complexity through repeated halvings of the data length, motivated by the need for faster processing of experimental data. That same year, Ralph V. L. Hartley proposed a real-valued transform alternative to the complex Fourier transform in his study of transmission problems, emphasizing symmetrical kernel functions to simplify computations for real signals without imaginary components. In the 1950s, these concepts gained traction amid growing demands in early digital computing, particularly for radar signal analysis and subsequent seismological applications, where direct DFT evaluation proved computationally prohibitive on vacuum-tube computers, prompting explorations of recursive and symmetry-based accelerations. A notable hint toward formalized fast methods appeared in I. J. Good's 1958 paper on the interaction algorithm and practical Fourier analysis, which briefly alluded to efficient multidimensional techniques via prime-factor decompositions for factorial designs, without providing a complete algorithmic description.

Cooley-Tukey Breakthrough and Evolution

James W. Cooley, working at IBM's Thomas J. Watson Research Center, and John W. Tukey, a statistician at Princeton University and Bell Labs, developed the divide-and-conquer approach to computing the discrete Fourier transform, with the algorithm first demonstrated in 1964, motivated by the need for efficient processing of seismic data to detect underground nuclear tests. Their algorithm reduced computation time dramatically compared to direct methods, enabling practical applications on early computers. Cooley and Tukey published their findings in 1965 as "An Algorithm for the Machine Calculation of Complex Fourier Series" in Mathematics of Computation, building upon scattered precursors like Carl Friedrich Gauss's 1805 work on least squares and I. J. Good's 1958 suggestions for efficient DFT computation. Tukey's interest in Fourier methods stemmed from his earlier contributions to spectrum analysis during the 1940s at Bell Telephone Laboratories, where he developed techniques for estimating power spectra from time series data, including applications in radar and communications engineering. The 1965 paper quickly gained traction, with initial implementations appearing soon after. In 1967, Norman Brenner at MIT's Lincoln Laboratory published Fortran implementations of the algorithm, building on interest sparked by Thomas G. Stockham, facilitating its use in analyzing seismic and audio data and broadening interest among researchers. The algorithm's evolution accelerated through optimizations and dissemination in the late 1960s. Glenn D. Bergland published radix-2 and higher-radix variants in 1969, including a radix-8 subroutine for real-valued series that improved efficiency for specific hardware and data types. Key events included the Arden House Workshops on Fast Fourier Transform Processing, organized by the IEEE Audio and Electroacoustics Committee in 1967 and 1969, which brought together researchers from industry and academia to share implementations and applications, significantly popularizing the FFT. These developments enabled real-time signal processing in fields like audio engineering and speech analysis, transforming the feasibility of large-scale Fourier computations on computers of the era.

Core Algorithms

Cooley-Tukey Algorithm

The Cooley-Tukey algorithm is a divide-and-conquer approach to computing the discrete Fourier transform (DFT) by recursively decomposing an N-point transform into smaller transforms of sizes N₁ and N₂, where N = N₁ × N₂. This factorization reduces the computational complexity from O(N²) to O(N log N) operations when N has many factors, particularly powers of two. The general recursive step reindexes the input and output arrays to separate the DFT sum into independent subproblems, with twiddle factors, powers of W = exp(-2πi / N), applied to combine results. For the common radix-2 case where N = 2ᵐ, the decomposition splits the input into even- and odd-indexed subsequences of length N/2. In the radix-2 formulation, the DFT coefficients X[k] are computed as follows for 0 ≤ k < N/2: \begin{align} X[k] &= X_{\text{even}}[k] + W^k \cdot X_{\text{odd}}[k], \\ X[k + N/2] &= X_{\text{even}}[k] - W^k \cdot X_{\text{odd}}[k], \end{align} where X_even and X_odd are the (N/2)-point DFTs of the even- and odd-indexed inputs, respectively, and W = exp(-2πi / N). This even-odd split is applied recursively until base cases of 1- or 2-point DFTs are reached, forming a recursion tree with log₂ N levels. The algorithm supports both decimation-in-time (DIT), which processes input in bit-reversed order, and decimation-in-frequency (DIF), which produces bit-reversed output. The radix-2 butterfly operation is the core computational unit, combining two inputs a and b into outputs a + W^j · b and a - W^j · b, where j is the index determining the twiddle factor. In a signal-flow graph, butterflies connect inputs to outputs across stages, with each stage halving the transform size and applying twiddles. For in-place computation, the array is overwritten stage by stage: initialize with bit-reversed input (for DIT), then for each of log₂ N stages, iterate over groups of size 2^s (s from 1 to log₂ N), computing butterflies within each group using strides that double per stage, and twiddles W^{m \cdot 2^{\log_2 N - s}} for offset m. This requires N log₂ N / 2 complex multiplications and N log₂ N complex additions overall. A runnable Python sketch of the recursive DIT variant (assuming N is a power of 2 and a complex input list x of length N) is:
import cmath

def dit_fft(x):
    N = len(x)
    if N == 1:
        return x[:]
    even = dit_fft(x[0::2])  # (N/2)-point DFT of even-indexed inputs
    odd = dit_fft(x[1::2])   # (N/2)-point DFT of odd-indexed inputs
    X = [0] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / N)  # W_N^k
        X[k] = even[k] + twiddle * odd[k]            # upper butterfly output
        X[k + N // 2] = even[k] - twiddle * odd[k]   # lower butterfly output
    return X
In this recursive form, the even-odd slicing performs the input reordering implicitly, so no explicit bit-reversal pass is needed; the iterative in-place DIT variant instead requires the input to be permuted into bit-reversed order first, e.g., by swapping indices whose binary representations are reverses. A corresponding Python sketch of the recursive DIF variant is:
def dif_fft(x):
    N = len(x)
    if N == 1:
        return x[:]
    half = N // 2
    top, bottom = [0] * half, [0] * half
    for n in range(half):
        a, b = x[n], x[n + half]
        top[n] = a + b                                           # feeds even outputs
        bottom[n] = (a - b) * cmath.exp(-2j * cmath.pi * n / N)  # feeds odd outputs
    even = dif_fft(top)       # (N/2)-point DFT of the sum half
    odd = dif_fft(bottom)     # (N/2)-point DFT of the twiddled difference half
    X = [0] * N
    for k in range(half):
        X[2 * k] = even[k]
        X[2 * k + 1] = odd[k]
    return X
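A quick check of both routines against NumPy's reference implementation (assuming the two definitions above):

import numpy as np

x = list(np.random.default_rng(8).standard_normal(8).astype(complex))
assert np.allclose(dit_fft(x), np.fft.fft(x))
assert np.allclose(dif_fft(x), np.fft.fft(x))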
With the final interleaving loop, this recursive DIF variant returns its output in natural order; the classical in-place iterative DIF instead produces bit-reversed output that requires reordering afterward. For an illustrative N=8 example with input x = [x₀, x₁, x₂, x₃, x₄, x₅, x₆, x₇], first apply bit-reversal for the in-place DIT variant: permute to [x₀, x₄, x₂, x₆, x₁, x₅, x₃, x₇]. The three stages use twiddle factors that are powers of W₈ = exp(-2πi / 8) = exp(-πi / 4). Stage 1 (groups of 2): butterflies with W₈⁰ = 1, yielding sums and differences. Stage 2 (groups of 4, stride 2): twiddles W₈⁰ = 1 and W₈² = -i for the offsets. Stage 3 (groups of 8, stride 4): twiddles W₈⁰ = 1, W₈¹ = (1 - i)/√2, W₈² = -i, and W₈³ = -(1 + i)/√2. The final output X matches the direct DFT, with 12 complex multiplications total versus 64 for direct computation.

Prime-factor algorithms provide an alternative to radix-based decompositions for computing the discrete Fourier transform (DFT) when the transform length N factors into coprime integers, leveraging number-theoretic mappings to simplify the computation. The prime-factor algorithm (PFA), pioneered by Good in 1958 and refined by Thomas in 1963, reindexes the input and output sequences using the Chinese remainder theorem (CRT) to express the N-point DFT as a set of smaller DFTs without intermediate twiddle factors or bit-reversal permutations. This row-column-like decomposition treats the data as a virtual two-dimensional array of dimensions N_1 \times N_2 where N = N_1 N_2 and \gcd(N_1, N_2) = 1, computing N_2 DFTs of length N_1 followed by N_1 DFTs of length N_2. The indexing in the PFA is defined by the CRT mapping: for input index n = \langle n_1 N_2 + n_2 N_1 \rangle_N and output index k = \langle k_1 N_2 + k_2 N_1 \rangle_N, where 0 \leq n_1, k_1 < N_1, 0 \leq n_2, k_2 < N_2, and \langle \cdot \rangle_N denotes reduction modulo N. The resulting DFT relation simplifies to X_k = \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} x_n \, \omega_N^{(n_1 N_2 + n_2 N_1)(k_1 N_2 + k_2 N_1)} = \sum_{n_1} \left( \sum_{n_2} x_n \, \omega_{N_2}^{n_2 k_2} \right) \omega_{N_1}^{n_1 k_1}, which separates into independent smaller DFTs along each dimension (with the residual factors N_2 \bmod N_1 and N_1 \bmod N_2 absorbed into permutations of the sub-DFT outputs). For multiple factors, the algorithm extends recursively or iteratively over the prime factorization of N. An illustrative example is the 15-point DFT (N = 15 = 3 \times 5), where the input sequence x[0] to x[14] is rearranged into a 3 \times 5 matrix via the mapping n = (5 n_1 + 3 n_2) \mod 15. Three 5-point DFTs are computed on the rows, followed by five 3-point DFTs on the columns of the result; the output is then read out using the same mapping on the indices to yield X[0] to X[14]. This requires 34 real multiplications and 82 real additions in total, compared to higher counts in non-factorized methods for this length. Related factorization approaches, such as the Winograd Fourier transform algorithm introduced in 1976, further optimize small prime-length DFTs by minimizing multiplications through sparse polynomial convolutions derived from the cyclic structure of the transform. For a prime length N = p, Rader's algorithm (1968) maps the non-trivial indices (excluding the DC term) to a cyclic convolution of length p-1 using a primitive root g modulo p, where the index permutation is k = g^j \mod p for j = 0 to p-2, reducing the problem to an efficient convolution computable via smaller FFTs.
Winograd's method generalizes this to composite short lengths (e.g., N = 2, 3, 5) by factoring into minimal-arithmetic kernels, expressing the DFT in the bilinear form \mathbf{X} = \mathbf{C} \left( \mathbf{A} \mathbf{x} \odot \mathbf{d} \right), where \odot denotes element-wise multiplication, \mathbf{d} is a vector of precomputed constants, and \mathbf{A}, \mathbf{C} are sparse matrices with few non-zero entries. Winograd algorithms achieve lower multiplication counts for short transforms by exploiting algebraic identities to replace general multiplications with pre- and post-additions around fewer scalar multiplies. For instance:
Length N | Winograd real multiplications | Winograd real additions | Cooley-Tukey baseline (real multiplications)
2 | 0 | 2 | 4
3 | 2 | 10 | 12
4 | 0 | 10 | 16
5 | 4 | 34 | 40
These counts establish the scale of savings for kernel sizes used in larger factorized FFTs, though Winograd increases addition counts and complicates in-place implementation. For N=15, combining Good-Thomas mapping with Winograd kernels yields 24 real multiplications and 244 real additions, highlighting the complementary reduction in arithmetic complexity over pure PFA or radix methods for factorable lengths.
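To make the Good-Thomas indexing concrete, the following NumPy sketch computes the 15-point example from above, delegating the short DFTs to np.fft.fft; the output-side permutations (3k₂ mod 5 and 2k₁ mod 3) absorb the residual factors, so no twiddle multiplications appear:

import numpy as np

def pfa_fft15(x):
    # Good-Thomas PFA for N = 15 = 3 x 5 using the CRT index maps
    # n = (5 n1 + 3 n2) mod 15 and k = (5 k1 + 3 k2) mod 15.
    x = np.asarray(x, dtype=complex)
    n1, n2 = np.meshgrid(np.arange(3), np.arange(5), indexing="ij")
    A = x[(5 * n1 + 3 * n2) % 15]                          # 3 x 5 input array
    B = np.fft.fft(A, axis=1)[:, (3 * np.arange(5)) % 5]   # three 5-point row DFTs
    C = np.fft.fft(B, axis=0)[(2 * np.arange(3)) % 3, :]   # five 3-point column DFTs
    X = np.empty(15, dtype=complex)
    k1, k2 = np.meshgrid(np.arange(3), np.arange(5), indexing="ij")
    X[(5 * k1 + 3 * k2) % 15] = C                          # CRT output map
    return X

x = np.random.default_rng(7).standard_normal(15)
assert np.allclose(pfa_fft15(x), np.fft.fft(x))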

Specialized Algorithms

For Real-Valued Inputs

When the input sequence to the discrete Fourier transform (DFT) consists entirely of real-valued numbers, specialized fast Fourier transform (FFT) algorithms can exploit the inherent symmetry of the resulting transform to reduce computational requirements by approximately half compared to the general complex case. This optimization is particularly valuable in applications such as signal processing, where inputs like audio or sensor data are often real-valued. The core approach involves treating the real sequence as a complex sequence with zero imaginary parts and then post-processing the output to extract the real DFT coefficients using the Hermitian symmetry property. One fundamental method is complexification followed by output separation. Given a real-valued sequence x(n) for n = 0, 1, \dots, N-1, form the complex input z(n) = x(n) + i \cdot 0 and compute its N-point complex FFT Z(k). Due to the reality of the input, Z(k) = Z^*(N - k), where ^* denotes the complex conjugate. The real-valued coefficients are then recovered as X_r(k) = \frac{Z(k) + Z^*(N - k)}{2}, \quad k = 0, 1, \dots, \frac{N}{2}, with X_r(0) and X_r(N/2) (if N even) being purely real. The imaginary parts follow an odd symmetry but are typically discarded if only the magnitude spectrum is needed. This approach requires one full complex FFT plus O(N) post-processing operations. For further efficiency, dedicated real-input FFT algorithms avoid the full complex computation by directly incorporating the symmetry into the factorization. A prominent example is the real-valued split-radix FFT, which adapts the split-radix decomposition to real data, eliminating redundant operations on imaginary components. This results in approximately N \log_2 N / 2 real multiplications and 3N \log_2 N / 2 real additions for power-of-two lengths N, roughly halving the arithmetic compared to a complex split-radix FFT. The algorithm proceeds by splitting the real input into even- and odd-indexed parts, applying recursive real FFTs, and using sine-cosine symmetries to combine results without complex arithmetic throughout. To compute multiple real FFTs efficiently, techniques pack two or more real sequences into a single complex input of the same length, leveraging phase shifts or direct interleaving. For two real sequences x_1(n) and x_2(n), form the complex input z(n) = x_1(n) + i x_2(n) and compute its N-point complex FFT Z(k). The individual DFTs are separated using: X_1(k) = \frac{Z(k) + Z^*(N - k)}{2}, \quad X_2(k) = \frac{Z(k) - Z^*(N - k)}{2i}, for k = 0, 1, \dots, N/2, with adjustments for DC and Nyquist terms. This method computes two N-point real FFTs using one N-point complex FFT plus O(N) separation steps, effectively halving the cost per transform. For example, in audio processing, this packing allows simultaneous transformation of stereo channels (left and right) with minimal overhead. Such optimizations are widely implemented in libraries like FFTW, building on these foundational techniques.
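The two-at-once packing trick translates directly into NumPy (two_real_ffts is an illustrative name; the separation formulas are exactly those above):

import numpy as np

def two_real_ffts(x1, x2):
    # Pack two real sequences into one complex FFT, then separate using
    # Z*(N - k), which conjugate symmetry makes available for free.
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    N = len(x1)
    Z = np.fft.fft(x1 + 1j * x2)
    Zr = np.conj(Z[(-np.arange(N)) % N])   # Z*(N - k), with k = 0 wrapping to 0
    return (Z + Zr) / 2, (Z - Zr) / 2j

rng = np.random.default_rng(0)
a, b = rng.standard_normal(8), rng.standard_normal(8)
X1, X2 = two_real_ffts(a, b)
assert np.allclose(X1, np.fft.fft(a)) and np.allclose(X2, np.fft.fft(b))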

For Symmetric or Structured Data

When the input sequence to the discrete Fourier transform (DFT) possesses symmetries such as even or odd structure, specialized fast Fourier transform (FFT) algorithms can exploit these properties to reduce the computational size below that of the standard N-point FFT. For an even-symmetric input, where x(n) = x(N - n), the DFT computation reduces to an N/2-point FFT after appropriate preprocessing of the data into a real-valued sequence that captures the symmetric components. This approach eliminates redundant calculations inherent in the symmetry, achieving approximately half the operations of a full complex FFT while maintaining the same output. Similarly, for odd-symmetric inputs, x(n) = -x(N - n), a parallel reduction to an N/2-point FFT is possible, with modifications to the post-processing steps to recover the imaginary parts of the transform. These techniques build on the Cooley-Tukey radix-2 decomposition but prune unnecessary branches due to the symmetry constraints. Further savings arise when the input exhibits combined symmetries, such as quarter-wave even symmetry (even around both n=0 and n=N/4), allowing reduction to an N/4-point FFT. In this case, the input satisfies x(n) = x(N/2 - n) = x(N/2 + n), and preprocessing involves weighting and folding the data to form a smaller transform that encodes the full DFT. For example, consider an N=4 quarter-wave even-symmetric input x = [a, b, a, b]. The algorithm first forms a 2-point real sequence y(0) = a + b, y(1) = (a - b)/\sqrt{2}, computes its 2-point FFT, and then reconstructs the 4-point DFT via simple additions and multiplications by sine/cosine factors, requiring only 4 real multiplications and 6 additions total, far fewer than the 24 operations of a general 4-point FFT. Such reductions are particularly valuable in applications where data naturally arises with these symmetries, like certain filter designs or periodic extensions. A prominent application of symmetry exploitation occurs in computing the Discrete Cosine Transform (DCT) and Discrete Sine Transform (DST), which are tailored for real-valued inputs with even or odd extensions, yielding real outputs and serving as real, symmetric counterparts to the DFT. The Type-II DCT, widely adopted in compression standards like JPEG and MPEG, is formulated as X_k = \sum_{n=0}^{N-1} x_n \cos\left( \frac{\pi (n + 0.5) k}{N} \right), \quad k = 0, 1, \dots, N-1, and can be derived by taking the real part of the DFT of an augmented 2N-point sequence formed by even extension and zero-padding of the input. This embedding allows the DCT to be computed via a single 2N-point FFT followed by O(N) post-processing, achieving the same O(N \log N) complexity as the FFT while benefiting from the input's implicit symmetry. The DST, particularly Type-III, follows analogously using odd extensions. These transforms are Hermitian in the sense that their outputs are real and symmetric, enabling storage and computation savings comparable to real-input FFTs. In structured data contexts, such as symmetric Toeplitz matrices, which arise in autoregressive modeling and linear prediction, the FFT exploits the constant-diagonal structure to enable fast matrix-vector multiplications. A symmetric N × N Toeplitz matrix can be embedded into a larger (2N-1) × (2N-1) circulant matrix, whose eigendecomposition is given by the DFT and hence computable via the FFT, reducing the multiplication from O(N^2) to O(N log N) operations.
For an N=4 example, a symmetric Toeplitz matrix like T = \begin{bmatrix} t_0 & t_1 & t_2 & t_3 \\ t_1 & t_0 & t_1 & t_2 \\ t_2 & t_1 & t_0 & t_1 \\ t_3 & t_2 & t_1 & t_0 \end{bmatrix} is extended to a 7×7 circulant matrix, and the product T \mathbf{v} is obtained by forward and inverse FFTs on the zero-padded vectors, with the first N elements of the length-7 result extracted as the answer. This method underpins iterative solvers for Toeplitz systems, leveraging the matrix's symmetry for numerical stability and efficiency.
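A minimal NumPy sketch of this circulant-embedding product (the first column t and the test vector are illustrative values, not from the original text):

import numpy as np

def symmetric_toeplitz_matvec(t, v):
    # Embed the symmetric Toeplitz matrix with first column t into a
    # (2N-1) x (2N-1) circulant, diagonalize it with the FFT, and keep
    # the first N entries of the product.
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    N = len(t)
    c = np.concatenate([t, t[-1:0:-1]])            # circulant's first column
    v_pad = np.concatenate([v, np.zeros(N - 1)])   # zero-pad v to length 2N-1
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(v_pad))
    return y[:N].real

t = np.array([4.0, 1.0, 0.5, 0.25])                # t_0 .. t_3
v = np.array([1.0, 2.0, 3.0, 4.0])
T = np.array([[t[abs(i - j)] for j in range(4)] for i in range(4)])
assert np.allclose(T @ v, symmetric_toeplitz_matvec(t, v))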

Computational Analysis

Complexity and Operation Counts

The direct computation of the discrete Fourier transform requires \mathcal{O}(N^2) arithmetic operations for an input sequence of length N. In contrast, fast Fourier transform algorithms reduce this to \mathcal{O}(N \log N) operations through recursive decomposition. This \mathcal{O}(N \log N) bound follows from the divide-and-conquer structure of algorithms like Cooley-Tukey, where the transform is recursively split into smaller subtransforms. The recursion forms a tree with \log_2 N levels for N = 2^k, and each level performs \mathcal{O}(N) operations (such as \frac{N}{2} butterflies, each involving a few arithmetic steps), summing to \mathcal{O}(N \log N) total work. For the radix-2 Cooley-Tukey algorithm, the operation count is approximately \frac{N}{2} \log_2 N complex multiplications and N \log_2 N complex additions, assuming trivial multiplications by 1 or -1 are excluded. Lower bounds establish that any linear algorithm for the DFT requires at least \Omega(N \log N) complex additions, with no tight bound known for general arithmetic operations. Split-radix variants achieve improved counts, such as approximately 4N \log_2 N real operations. In-place implementations of the FFT, which overwrite the input array, require \mathcal{O}(N) space overall. Recursive formulations incur an additional \mathcal{O}(\log N) space for the call stack due to the recursion depth.
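The bound can be read off from the divide-and-conquer recurrence; a worked version, with c absorbing the per-butterfly constant: T(N) = 2\,T(N/2) + cN, \quad T(1) = \Theta(1). Unrolling over the \log_2 N levels of the recursion tree gives T(N) = cN \log_2 N + N\,T(1) = \Theta(N \log N).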
Algorithm | N | Complex multiplications | Complex additions
Cooley-Tukey (radix-2) | 4 | 0 | 8
Cooley-Tukey (radix-2) | 8 | 4 | 24
Cooley-Tukey (radix-2) | 16 | 16 | 64
Winograd | 4 | 0 | 10
Winograd | 5 | 4 | 19
Winograd | 7 | 12 | 42
The table above compares representative operation counts for small N, where Winograd algorithms minimize multiplications for prime or short lengths at the cost of more additions.

Numerical Accuracy and Stability

The numerical stability of the Fast Fourier Transform (FFT) in finite-precision floating-point arithmetic has been extensively analyzed, revealing that accumulated rounding errors remain well-controlled despite the recursive structure of the algorithm. Backward stability proofs demonstrate that the computed output \hat{y} satisfies \hat{y} = \mathrm{FFT}(x + \Delta x), where the perturbation obeys \|\Delta x\| \leq c \epsilon \log N \|x\| for some constant c > 0; here \epsilon is the machine unit roundoff (typically 2^{-53} \approx 1.1 \times 10^{-16} in double precision), N is the transform length, and \| \cdot \| denotes a suitable vector norm such as the Euclidean norm. This bound arises from the logarithmic depth of the computation tree in radix-2 Cooley-Tukey FFTs, where each level introduces rounding errors of order \epsilon times the current intermediate norm, and errors propagate through the stages without significant growth due to the unitary nature of the transform. Although the FFT computation itself is stable, the underlying DFT can exhibit ill-conditioning for specific inputs that lead to destructive interference in the output, amplifying forward errors beyond the backward-error bound. For instance, inputs with components aligned such that certain output bins experience near-total cancellation can result in relative forward errors up to O(\epsilon \log N \cdot \kappa), where \kappa is the condition number of the effective submatrix, potentially reaching O(N) in pathological cases like highly oscillatory signals near the Nyquist frequency. However, since the DFT matrix is unitary up to scaling and thus well-conditioned overall (\kappa = 1), such amplification is rare in practice and typically bounded by the same O(\epsilon \log N) for generic inputs. Near-zero twiddle factors do not contribute to ill-conditioning, as all twiddles have unit magnitude, but precomputation errors in approximating these complex exponentials can introduce additional bias if not handled carefully. Specific implementation details can further impact accuracy, including the bit-reversal stage, which is sensitive to indexing errors that may scramble input elements and lead to complete output corruption if the reversal is imprecise. Twiddle-factor precomputation introduces roundoff errors during sine and cosine evaluations, with worst-case analyses showing that certain decomposition methods (e.g., argument reduction via multiple-angle formulas) can accumulate up to O(\epsilon \log \log N) extra error per factor, propagating through the butterflies to affect the overall bound. Numerical experiments for large N, such as N = 2^{20}, illustrate this growth: in double precision, the observed relative error for random inputs reaches approximately 3 \times 10^{-15}, closely matching the theoretical O(\log N \cdot \epsilon) \approx 20 \epsilon \approx 2.2 \times 10^{-15}, confirming the sharpness of the bound without exceeding it significantly. To enhance stability, particularly for high-precision requirements, techniques such as compensated summation can be integrated into the butterfly additions, where an error-compensation term tracks lost low-order bits, reducing summation errors from O(\epsilon n) to O(\epsilon) for n terms, as in Kahan's algorithm adapted to FFT stages. For applications demanding precision beyond standard floating-point, quadratic-time direct DFT evaluation avoids recursive error accumulation, though at higher cost; alternatively, mixed-precision schemes precompute twiddles in higher precision to minimize propagation.
These methods ensure robustness, with compensated variants observed to halve error growth in large-scale transforms compared to naive implementations.
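A simple way to observe the O(\epsilon \log N) growth empirically is to measure the relative round-trip error of a forward-inverse transform pair in double precision (illustrative NumPy sketch):

import numpy as np

rng = np.random.default_rng(1)
for exp in (10, 14, 18, 20):
    N = 2 ** exp
    x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    err = np.linalg.norm(np.fft.ifft(np.fft.fft(x)) - x) / np.linalg.norm(x)
    print(f"N = 2^{exp:2d}: relative round-trip error = {err:.2e}")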

Extensions and Generalizations

Multidimensional Transforms

The fast Fourier transform extends naturally to multidimensional arrays through a separable decomposition, leveraging the one-dimensional FFT along each dimension independently. This approach, a generalization of the Cooley-Tukey algorithm, enables efficient computation of the multidimensional discrete Fourier transform (DFT) for data such as images or volumetric signals. For a d-dimensional array of dimensions N_1 \times N_2 \times \cdots \times N_d, the separable multidimensional FFT applies a one-dimensional FFT sequentially to all elements along the first dimension, then the second, and so on. The total number of operations scales as O(N \log N), where N = \prod_{i=1}^d N_i is the total array size, because the cost along each dimension i is O(N \log N_i). This efficiency holds assuming each N_i admits a fast one-dimensional factorization, such as powers of two. In the two-dimensional case, the DFT of an M \times N array x[m,n] is defined as X[k,l] = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} x[m,n] \exp\left(-2\pi i \left( \frac{km}{M} + \frac{ln}{N} \right)\right), for k = 0, \dots, M-1 and l = 0, \dots, N-1. The separability of the exponential term allows this double sum to be rewritten as a product of one-dimensional transforms. The row-column algorithm implements this by first computing an N-point FFT along each of the M rows of the input array, producing an intermediate array of size M \times N. Then, an M-point FFT is applied along each of the N columns of the intermediate array to yield the final X[k,l]. For arrays with power-of-two dimensions, the algorithm supports in-place computation by reusing the input storage and applying bit-reversal permutations or similar indexing schemes from the one-dimensional Cooley-Tukey radix-2 method along each dimension, minimizing memory overhead. As an illustrative example, consider a 4×4 grayscale image represented as the matrix x = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \\ 13 & 14 & 15 & 16 \end{pmatrix}. Applying the row-column algorithm with 4-point FFTs (using Cooley-Tukey radix-2) first transforms each row, capturing horizontal frequency content, such as low-frequency trends from left to right. The subsequent column transforms incorporate vertical frequencies, revealing overall patterns like the linear increase in intensity. The resulting frequency-domain matrix X[k,l] encodes the image's spectral components, where low-frequency coefficients near (0,0) dominate due to the smooth gradient, while higher indices capture finer details or noise; shifting the zero-frequency term to the center via fftshift aids visual interpretation of the spectrum.
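A short NumPy sketch of the row-column algorithm on the 4×4 example, with np.fft.fft2 as the reference:

import numpy as np

x = np.arange(1.0, 17.0).reshape(4, 4)   # the 4x4 gradient image above
rows = np.fft.fft(x, axis=1)             # 4-point FFTs along each row
X = np.fft.fft(rows, axis=0)             # 4-point FFTs along each column
assert np.allclose(X, np.fft.fft2(x))    # matches the library 2D FFT
print(np.round(np.fft.fftshift(X), 1))   # zero frequency moved to the center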

Non-Standard Lengths and Variants

FFT algorithms like the Cooley-Tukey method are highly efficient for input lengths that are powers of two, but many applications require transforms of arbitrary or non-composite lengths, such as primes or other irregular sizes. To address this, specialized variants adapt the FFT framework to maintain near-optimal complexity, often by reformulating the discrete Fourier transform (DFT) into forms computable via standard FFTs of convenient lengths. These adaptations are crucial for scenarios where data lengths are dictated by physical constraints, such as sensor-array geometries or cryptographic parameters. One prominent method for arbitrary lengths is Bluestein's chirp z-transform algorithm, which converts the DFT of length N into a linear convolution that can be efficiently evaluated using FFTs of padded power-of-two sizes. Developed in 1968, this approach exploits the quadratic phase structure of the DFT, rewriting it via the identity 2kn = n^2 + k^2 - (k-n)^2 as X_k = e^{-\pi i k^2 / N} \sum_{n=0}^{N-1} \left( x_n e^{-\pi i n^2 / N} \right) e^{\pi i (k-n)^2 / N}, \quad k = 0, \dots, N-1. By pre- and post-multiplying the input and output with chirp signals (quadratic phase factors) and padding the resulting convolution to a power of two of length at least 2N-1, the transform achieves O(N \log N) complexity regardless of N's factorization, making it suitable for prime or composite lengths. This method is widely implemented in libraries like FFTW for non-standard sizes. For prime lengths specifically, Rader's algorithm provides an alternative by mapping the prime-length DFT to a cyclic convolution of length N-1, which is then computed using an FFT of that size. Introduced in 1968, it leverages properties of primitive roots modulo N to reorder the transform into a form where the convolution can be accelerated, again yielding O(N \log N) operations. This is particularly effective when N-1 has favorable factors for standard FFTs, though it incurs overhead from index permutations. Rader's method complements Bluestein's for primes, with implementations appearing in high-performance libraries for exact prime transforms. Beyond these, variants extend the FFT to specialized data structures. The sparse FFT targets compressible signals whose Fourier spectrum is k-sparse (with only k \ll N significant coefficients), enabling sublinear-time recovery via compressive sensing techniques that sample and hash the input to isolate non-zero frequencies. Recent integrations with deep learning have improved reconstruction accuracy for signals like structural vibrations. In quantum computing, the quantum Fourier transform (QFT) generalizes the FFT to superposition states, running in O((\log N)^2) time on quantum hardware and serving as a core subroutine in Shor's algorithm for integer factorization, where it extracts periods from modular exponentials. As an illustrative example, consider computing the DFT for N=17 (a prime) using Bluestein's algorithm: the convolution must have length at least 2N-1 = 33, so it is padded to the next power of two, M=64. The input x_n is multiplied by the chirp e^{-\pi i n^2 / N}, convolved with a precomputed chirp filter via two 64-point FFTs and one inverse FFT, then adjusted by the output chirp e^{-\pi i k^2 / N}; this yields the exact 17-point transform in O(M \log M) operations, far cheaper than direct evaluation as N grows.
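A compact NumPy sketch of Bluestein's method as just described (bluestein_dft is an illustrative name; np.fft supplies the power-of-two FFTs):

import numpy as np

def bluestein_dft(x):
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    chirp = np.exp(-1j * np.pi * n**2 / N)        # quadratic-phase chirp
    M = 1
    while M < 2 * N - 1:                          # pad so circular = linear conv
        M *= 2
    a = np.zeros(M, dtype=complex)
    a[:N] = x * chirp                             # chirp-premultiplied input
    b = np.zeros(M, dtype=complex)
    b[:N] = np.conj(chirp)                        # filter b_m = e^{+i pi m^2 / N}
    b[M - N + 1:] = np.conj(chirp[1:])[::-1]      # negative indices, wrapped
    conv = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))
    return chirp * conv[:N]                       # chirp-postmultiply

x = np.random.default_rng(2).standard_normal(17)  # prime length N = 17
assert np.allclose(bluestein_dft(x), np.fft.fft(x))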

Applications

Signal and Audio Processing

In signal and audio processing, the fast Fourier transform (FFT) enables efficient spectral analysis of one-dimensional time-series data, allowing the estimation of frequency content in signals such as audio waveforms. A fundamental application is the computation of the power spectral density (PSD) via the periodogram method, where the squared magnitude of the FFT output provides an estimate of the signal's power distribution across frequencies. Specifically, for a discrete-time signal x[n] of length N, the periodogram is given by P_{xx}(k) = \frac{1}{N} |X(k)|^2, \quad k = 0, 1, \dots, N-1, where X(k) is the k-th DFT coefficient obtained via FFT. This nonparametric estimator reveals dominant frequencies in non-stationary signals like speech or music, though it suffers from high variance that can be mitigated by averaging multiple periodograms. The FFT also facilitates fast convolution, essential for implementing finite impulse response (FIR) filters in digital signal processing. Circular convolution of an input signal x with a filter h, both of length N, is computed as y = \text{IFFT}( \text{FFT}(x) \odot \text{FFT}(h) ), where \odot denotes element-wise multiplication; this exploits the convolution theorem to reduce complexity from O(N^2) to O(N \log N). For linear convolution of long signals, the overlap-add method segments the input into blocks, applies FFT-based convolution to each, and sums the overlapping portions of the outputs after appropriate shifting to avoid artifacts. This technique is widely used for filtering in audio systems, enabling efficient removal of noise or unwanted frequency bands. In audio processing, the short-time Fourier transform (STFT), which applies the FFT to windowed segments of the signal, produces spectrograms that visualize time-varying frequency content. The STFT of x with a window w[n] of length M is X(m, \omega_k) = \sum_{n=0}^{M-1} x[n+mH] w[n] e^{-j \omega_k n}, where H is the hop size and \omega_k = 2\pi k / M; the magnitude squared yields the spectrogram for applications like speech analysis and audio effects. For pitch detection in speech or music, autocorrelation can be efficiently computed using the FFT: the autocorrelation sequence r[\tau] is the IFFT of |X(k)|^2, with the period identified as the lag \tau of the first significant peak beyond the zero lag. As an illustrative example, consider designing an FFT-based low-pass FIR filter. The filter's impulse response h[n] is derived from the inverse FFT of an ideal low-pass frequency response, such as a rectangular function in the frequency domain, truncated to pass frequencies below a cutoff f_c. The resulting filter's frequency response, obtained by taking the FFT of h[n], exhibits a passband with near-unity gain up to f_c, a transition band with sidelobe ripples due to windowing, and attenuation in the stopband, enabling effective high-frequency suppression in audio signals while preserving lower components.
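A sketch of FFT-based fast linear convolution via the convolution theorem (zero-padding to a power of two; np.convolve serves as the direct reference):

import numpy as np

def fft_convolve(x, h):
    L = len(x) + len(h) - 1            # linear-convolution length
    M = 1
    while M < L:                       # pad so circular conv equals linear
        M *= 2
    X = np.fft.rfft(x, M)              # real-input FFTs of the padded signals
    H = np.fft.rfft(h, M)
    return np.fft.irfft(X * H, M)[:L]  # pointwise product, then inverse FFT

x = np.random.default_rng(3).standard_normal(1000)
h = np.ones(32) / 32                   # simple moving-average FIR filter
assert np.allclose(fft_convolve(x, h), np.convolve(x, h))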

Image Analysis and Scientific Computing

In image processing, the two-dimensional Fast Fourier Transform (2D FFT) enables efficient computation of convolutions, which are fundamental for operations such as blurring and sharpening. Blurring is achieved by convolving the image with a smoothing kernel, like a Gaussian kernel, in the frequency domain by multiplying the image's transform with the filter's transform, leveraging the convolution theorem to reduce computational cost from O(N^4) to O(N^2 \log N) for an N × N image. Similarly, sharpening enhances high-frequency components through high-emphasis filters, such as adding a scaled Laplacian to the original after frequency-domain multiplication, preserving edges while amplifying details. Frequency-domain filtering further utilizes the 2D FFT for tasks like edge detection, where high-pass filters isolate abrupt intensity changes; for instance, a derivative filter approximates gradients by emphasizing high frequencies perpendicular to edges, followed by magnitude computation and thresholding to delineate boundaries. In scientific computing, the FFT facilitates solving partial differential equations (PDEs) in spectral space, particularly Poisson's equation. A seminal approach solves ∇²φ = -ρ by transforming to Fourier space, where it becomes -k² Φ(k) = -ρ̂(k), allowing direct division and inverse transform to obtain φ with O(N log N) complexity per dimension, ideal for periodic boundary conditions in simulations. In molecular dynamics, the FFT accelerates computation of correlation functions, such as the pair distribution function g(r), obtained by transforming the structure factor S(k) computed from particle positions, enabling efficient analysis of liquid structure and dynamics in large systems. Parallel and multigrid variants of the FFT enhance scalability in multidimensional problems. Multigrid methods combine FFT solvers on coarse grids with iterative refinement for non-periodic PDEs, reducing iteration counts in large simulations. As of 2025, GPU acceleration of the FFT has been integrated into atmospheric models, such as the Meso-NH atmospheric code (version 5.5), where the FFT solves the pressure equation via pencil decomposition on up to 256 MI250X GPUs of the Adastra supercomputer, achieving up to 6× speedup for high-resolution convection-permitting forecasts with horizontal grid spacing down to 100 m. As a precursor to modern compression, the 2D FFT relates to the discrete cosine transform (DCT) in JPEG, where fast DCT algorithms exploit FFT-like butterfly structures to compute block-wise transforms, concentrating energy in low frequencies for quantization and lossy encoding.
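A frequency-domain Gaussian blur along the lines described above, as a NumPy sketch (the random image and the blur radius sigma are arbitrary stand-ins, and the Gaussian transfer function exp(-2π²σ²f²) is the continuous-domain approximation):

import numpy as np

rng = np.random.default_rng(4)
img = rng.random((256, 256))                 # placeholder grayscale image

fy = np.fft.fftfreq(img.shape[0])[:, None]   # vertical spatial frequencies
fx = np.fft.fftfreq(img.shape[1])[None, :]   # horizontal spatial frequencies
sigma = 3.0                                  # assumed blur radius in pixels
H = np.exp(-2 * np.pi**2 * sigma**2 * (fx**2 + fy**2))   # Gaussian transfer function

blurred = np.fft.ifft2(np.fft.fft2(img) * H).real        # filtered image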

Implementations and Alternatives

Software Libraries and Performance

Several prominent software libraries implement the Fast Fourier Transform (FFT), optimized for various languages, platforms, and hardware. FFTPACK, a Fortran package developed at the National Center for Atmospheric Research, provides efficient subroutines for 1D and multidimensional FFTs of periodic, real, and symmetric sequences, serving as a foundational reference implementation. FFTW (Fastest Fourier Transform in the West), a widely used C library, employs an adaptive architecture that generates platform-specific code through a planner routine, which benchmarks and selects from multiple algorithm variants to maximize performance. In the Python ecosystem, NumPy's fft module offers basic DFT routines, while SciPy's scipy.fft submodule extends this with advanced features like multidimensional transforms and real-to-complex optimizations, often leveraging backends such as PocketFFT for portability and speed. For GPU computing, NVIDIA's cuFFT library integrates seamlessly with CUDA, supporting batched and multidimensional FFTs on NVIDIA hardware, with optimizations for power-of-two sizes like N=2^{20}. Key performance factors in these libraries include cache efficiency, which reduces memory traffic by improving data locality during the divide-and-conquer stages of the Cooley-Tukey algorithm; SIMD vectorization, enabling parallel execution of operations across multiple data elements using CPU extensions like AVX; and autotuning, as exemplified by FFTW's planner, which tests codelets to identify the fastest execution path for given hardware and problem size. As of 2025, the FFT landscape has evolved with quantum computing tools, such as IBM's Qiskit library, which includes implementations of the quantum Fourier transform (QFT) for hybrid classical-quantum workflows, allowing FFT-like decompositions in quantum circuits integrated with classical post-processing. Additionally, sparse FFT capabilities have advanced in the scientific computing stack, enabling sublinear-time approximations for signals with few dominant frequencies. Benchmarks highlight the throughput advantages of GPU acceleration over CPUs for large-scale FFTs. The table below presents representative single-precision complex-to-complex FFT performance for N=2^{20} (1,048,576 points), measured in GFLOPS (assuming ~5N \log_2 N operations), on 2025-era hardware; these establish the scale where GPUs excel for data-parallel workloads.
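For a rough sense of scale on commodity hardware, the ~5N \log_2 N flop model can be applied to a timed transform; this sketch uses NumPy's PocketFFT backend rather than FFTW or cuFFT, so absolute numbers are illustrative only:

import time
import numpy as np

N = 2 ** 20
rng = np.random.default_rng(5)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

np.fft.fft(x)                          # warm-up run
reps = 20
t0 = time.perf_counter()
for _ in range(reps):
    np.fft.fft(x)
elapsed = (time.perf_counter() - t0) / reps
gflops = 5 * N * np.log2(N) / elapsed / 1e9  # ~5 N log2 N flop model
print(f"{elapsed * 1e3:.2f} ms per transform, ~{gflops:.1f} GFLOPS")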
Library | Hardware platform | Throughput (GFLOPS)
 | Intel Xeon Platinum 8592+ (64-core CPU) | High on multi-core CPUs
cuFFT | NVIDIA GPU (single) | Orders of magnitude higher than CPUs

Competing Transform Methods

While the Fast Fourier Transform (FFT) excels in efficient computation of the discrete Fourier transform for stationary signals, wavelet transforms offer a compelling alternative for non-stationary signals by providing localized time-frequency representations. The discrete wavelet transform (DWT), introduced by Mallat, decomposes signals using scalable and translatable basis functions, contrasting with the FFT's global sinusoidal basis, which assumes uniform frequency content across the signal. This multiresolution approach allows wavelets to capture transient features, such as sudden changes in seismic or biomedical signals, where FFT artifacts like spectral leakage degrade analysis. For instance, the Morlet wavelet, developed for geophysical applications, combines a Gaussian envelope with a complex exponential to yield a balanced time-frequency resolution suitable for continuous wavelet analysis of oscillatory non-stationary processes. The number-theoretic transform (NTT) serves as another FFT alternative in domains requiring exact integer arithmetic, particularly cryptography, by evaluating polynomial multiplications modulo a prime via roots of unity in finite fields. Pioneered by Pollard, the NTT mirrors the FFT's divide-and-conquer structure but operates over rings like \mathbb{Z}/p\mathbb{Z}, eliminating floating-point precision issues inherent in FFT implementations. In lattice-based cryptosystems, such as those built on ring learning-with-errors problems, the NTT accelerates key operations like polynomial multiplication, outperforming the FFT in modular settings by ensuring error-free computations and leveraging hardware-optimized integer operations. Other specialized transforms address limitations of the FFT for particular signal classes. Chirplet transforms, generalized from Gabor and wavelet bases, parameterize basis functions with linear frequency sweeps to model chirped signals, such as radar returns or echolocation, where the FFT assumes constant frequencies and thus requires excessive basis elements for accurate representation. Introduced by Mann and Haykin, chirplets adaptively track instantaneous frequency variations, improving resolution for frequency-modulated waveforms compared to the FFT's fixed grid. For sparse signals dominated by a few exponentials, Prony's method provides a parametric technique that recovers parameters like frequencies and amplitudes directly, bypassing the FFT's full spectral computation and enabling super-resolution beyond the Nyquist limit in low-complexity scenarios. Comparisons highlight contexts where these methods surpass the FFT. In modular arithmetic for cryptography, the NTT avoids the FFT's approximation errors, achieving identical results to direct multiplication but with O(n \log n) complexity for large polynomials. For compression, wavelet transforms in JPEG2000 yield superior rate-distortion performance over the FFT-related discrete cosine transform in JPEG, with gains of 20-30% in compression ratio or 2-4 dB in PSNR for natural images due to better energy compaction in subbands.
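A toy number-theoretic transform over Z/pZ with p = 17 and primitive root g = 3 illustrates the exact-arithmetic analogue of the DFT (a naive O(N²) sketch for clarity; the Cooley-Tukey recursion applies to it unchanged):

def ntt(a, p=17, g=3):
    # Forward NTT: the DFT formula with the complex root of unity replaced
    # by w = g^((p-1)/N) mod p; N must divide p - 1.
    N = len(a)
    assert (p - 1) % N == 0
    w = pow(g, (p - 1) // N, p)
    return [sum(a[n] * pow(w, k * n, p) for n in range(N)) % p
            for k in range(N)]

def intt(A, p=17, g=3):
    # Inverse NTT: use w^{-1} and scale by N^{-1} mod p (Fermat inverses).
    N = len(A)
    w_inv = pow(pow(g, (p - 1) // N, p), p - 2, p)
    n_inv = pow(N, p - 2, p)
    return [n_inv * sum(A[k] * pow(w_inv, k * n, p) for k in range(N)) % p
            for n in range(N)]

a = [1, 2, 3, 4]           # N = 4 divides p - 1 = 16
assert intt(ntt(a)) == a   # exact integer round trip, no rounding error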
Transform | Key advantage over FFT | Typical application | Performance edge example
Wavelet (DWT) | Localized time-frequency analysis for non-stationarity | Signal denoising, feature extraction | 20-30% better compression in JPEG2000 vs. JPEG for images
NTT | Exact integer computation in finite fields | Polynomial multiplication in cryptography | Error-free vs. FFT rounding in lattice schemes, same O(n log n) speed
Chirplet | Handles linear frequency modulation | Radar, audio chirps | Improved resolution for swept signals, fewer basis functions needed
Prony | Sparse exponential recovery | Super-resolution spectral estimation | Recovers frequencies beyond FFT's grid without full transform
