FFTW
FFTW (Fastest Fourier Transform in the West) is a free and open-source C subroutine library designed for high-performance computation of the discrete Fourier transform (DFT) in one or more dimensions, supporting arbitrary input sizes for both real and complex data types, as well as discrete cosine and sine transforms (DCT/DST).[1] Developed at the Massachusetts Institute of Technology (MIT) by Matteo Frigo and Steven G. Johnson, FFTW employs advanced code-generation and runtime optimization techniques, including a planner that automatically selects and tunes algorithms based on the hardware and input characteristics to achieve near-optimal performance.[1][2] First released in 1997, the library underwent a major rewrite for version 3 in 2003, which redesigned the API and added SIMD (single instruction, multiple data) optimizations, with AVX and ARM Neon extensions following in later releases.[1][3] The current stable version, 3.3.10, includes distributed-memory parallelization via MPI and a Fortran 2003 interface, making it suitable for scientific computing, signal processing, and high-performance applications on everything from desktops to supercomputers.[4][5] FFTW has been widely recognized for its impact; it received the 1999 J. H. Wilkinson Prize for Numerical Software from the Society for Industrial and Applied Mathematics (SIAM) for excellence in design, analysis, and implementation, and the 1999 paper "A Fast Fourier Transform Compiler" earned the 2009 Most Influential PLDI Paper Award from ACM SIGPLAN.[1][6]

Overview
Introduction
FFTW, known as the "Fastest Fourier Transform in the West," is a free, open-source C library designed for computing the discrete Fourier transform (DFT) in one or more dimensions, supporting arbitrary input sizes and both real and complex data types.[1] The DFT serves as the core mathematical operation underlying FFTW, transforming a sequence of N complex numbers x_0, \dots, x_{N-1} into another sequence \hat{x}_0, \dots, \hat{x}_{N-1} via the formula \hat{x}_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i k n / N}, \quad k = 0, \dots, N-1. This unnormalized forward transform is the standard convention employed by the library, enabling efficient frequency-domain analysis in various applications.[7] The library was developed beginning in 1997 at the Massachusetts Institute of Technology (MIT) by Matteo Frigo and Steven G. Johnson, who introduced innovative adaptive techniques to optimize FFT performance across hardware platforms.[8] As of November 2025, the current stable version remains 3.3.10, released on September 15, 2021, with ongoing maintenance through the project's Git repository but no major updates since.[3] FFTW's significance lies in its exceptional speed and portability, making it a preferred choice for scientific computing, signal processing, and numerical simulations on diverse systems including Unix, Windows, and macOS, where the same code delivers high performance without modifications.[1]

Core Functionality
FFTW serves as a comprehensive library for computing the discrete Fourier transform (DFT) and its variants, enabling both forward and inverse operations essential for signal processing and numerical analysis. It supports complex-to-complex DFTs, which perform full-spectrum computations on complex input data using functions such as fftw_plan_dft_1d for one-dimensional cases and fftw_plan_dft for multi-dimensional arrays.[9] For real-valued data, FFTW provides optimized real-to-complex forward transforms that output half the spectrum due to Hermitian symmetry, and corresponding complex-to-real inverse transforms, accessible via planners like fftw_plan_dft_r2c_1d and fftw_plan_dft_c2r_1d.[10] These variants reduce memory and computational requirements by exploiting the symmetry properties of real DFTs.[2]
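As an illustration, a minimal sketch of a one-dimensional real-to-complex and complex-to-real round trip might look as follows; the transform length and the FFTW_ESTIMATE planning flag are arbitrary choices for the example:

```c
#include <fftw3.h>

int main(void) {
    const int n = 256;
    double *in = fftw_alloc_real(n);                    /* real input/output */
    fftw_complex *spec = fftw_alloc_complex(n / 2 + 1); /* non-redundant half spectrum */

    fftw_plan fwd = fftw_plan_dft_r2c_1d(n, in, spec, FFTW_ESTIMATE);
    fftw_plan inv = fftw_plan_dft_c2r_1d(n, spec, in, FFTW_ESTIMATE);

    /* ... fill in[] with real data ... */
    fftw_execute(fwd);  /* real -> half spectrum */
    fftw_execute(inv);  /* back to real; note: c2r may overwrite spec, and the
                           result is scaled by n (transforms are unnormalized) */

    fftw_destroy_plan(fwd);
    fftw_destroy_plan(inv);
    fftw_free(in);
    fftw_free(spec);
    return 0;
}
```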
The library extends DFT computations to multi-dimensional arrays, supporting transforms from one dimension up to arbitrary ranks, such as 2D for images or 3D for volumetric data, through generalized planning functions that specify array ranks and sizes.[9] FFTW accommodates non-contiguous data layouts by allowing users to define strides and offsets in the planning process, facilitating seamless integration with strided memory arrangements like those in Fortran or C arrays without necessitating data rearrangement.[11] This flexibility ensures compatibility with diverse application contexts, including scientific simulations where data may reside in non-linear memory configurations.
In addition to standard DFTs, FFTW offers specialized transforms tailored to specific symmetries and applications. It includes real even/odd DFTs, which correspond to discrete cosine transforms (DCT) and discrete sine transforms (DST) of types I through IV, computed via the r2r interface for efficient handling of symmetric real data in areas like compression and solving partial differential equations.[12] For prime-sized inputs, where factorization is not possible, FFTW employs the Bluestein algorithm, which enables fast computation by re-expressing the DFT as a chirp-modulated convolution of larger, composite size.[3]
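For instance, a DCT-II can be requested through the r2r interface by passing the FFTW_REDFT10 transform kind (FFTW_REDFT01 gives the unnormalized inverse, the DCT-III); the length below is an arbitrary example value:

```c
#include <fftw3.h>

int main(void) {
    const int n = 64;
    double *in  = fftw_alloc_real(n);
    double *out = fftw_alloc_real(n);

    /* FFTW_REDFT10 selects the type-II discrete cosine transform (DCT-II). */
    fftw_plan p = fftw_plan_r2r_1d(n, in, out, FFTW_REDFT10, FFTW_ESTIMATE);
    /* ... fill in[] with real data ... */
    fftw_execute(p);

    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
    return 0;
}
```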
A key feature enhancing usability is the wisdom system, which allows FFTW to export and import optimized execution plans to and from files, reusing them across program invocations to bypass repeated planning overhead.[13] Functions like fftw_export_wisdom_to_filename and fftw_import_wisdom_from_filename manage this process, storing plan details derived from prior optimizations. Regarding precision, FFTW supports double-precision (default, using double), single-precision (float), and long-double precision, with installations providing separate libraries for each to match application needs.[14] Error handling involves returning NULL for failed plans and printing diagnostic messages for issues like memory allocation failures, with customizable hooks available for advanced control.[15]
Design Philosophy
FFTW's design philosophy centers on creating an adaptive software architecture that maximizes performance by tailoring computations to specific hardware environments at runtime, rather than relying on static, precompiled kernels. This approach combines automated code generation, in which a special-purpose compiler called genfft produces optimized codelets offline from high-level discrete Fourier transform (DFT) algorithms, with runtime selection among those codelets, enabling adaptation to varying data sizes and processor characteristics without manual intervention. By assembling plans from machine-tuned generated code at runtime, FFTW achieves high efficiency while maintaining flexibility, as described by its creators Matteo Frigo and Steven G. Johnson in their overview of the library's implementation.[2]

A core principle is the separation of the planning phase, which analyzes the input size, data type, and hardware to generate an optimized execution plan, from the actual execution phase, where the transform is computed using that reusable plan. This two-stage process allows users to amortize the planning cost over multiple transforms with the same parameters, significantly improving overall performance in applications requiring repeated FFT computations. The planning stage employs techniques like dynamic programming to select the fastest combination of codelets, ensuring that the execution remains efficient and cache-friendly across invocations.[2][16]

Portability is a foundational goal, achieved through implementation in standard ANSI C, which ensures compatibility across diverse compilers such as GCC, Intel, and Microsoft Visual C++, as well as architectures including x86, ARM, and PowerPC, without compromising speed. FFTW avoids vendor-specific intrinsics in its core code, instead leveraging generic optimizations that the compiler can translate effectively for the target platform, allowing the same source code to deliver competitive performance on varied systems. This design enables seamless deployment in heterogeneous environments, outperforming many vendor-tuned libraries while remaining fully portable.[2][16][1]

Underpinning this adaptability is the philosophy of "write once, optimize everywhere," realized through the domain-specific language and compiler framework genfft, which automates the generation of portable yet highly tuned FFT code from abstract algorithm descriptions. The generator, implemented in Objective Caml (OCaml), produces C code that schedules operations to minimize register spills and exploit hardware features implicitly, allowing developers to specify algorithms once and have them optimized for any supported platform. This approach not only simplifies maintenance but also facilitates the discovery of novel, efficient FFT variants tailored to modern processors.[17]

History and Development
Origins and Initial Release
Development of FFTW began in 1997 as part of Matteo Frigo's PhD research at the Massachusetts Institute of Technology (MIT), driven by the need for a high-performance, portable library to compute discrete Fourier transforms (DFTs) that could outperform existing tools amid their architectural limitations. Existing libraries such as FFTPACK, a widely used Fortran package for scientific computing, and vendor-specific offerings like IMSL suffered from poor adaptability to diverse hardware platforms and suboptimal performance on modern processors, often requiring manual tuning or restricting users to power-of-two input sizes.[18] Frigo aimed to address these shortcomings by designing a self-optimizing system that automatically generates and selects efficient code sequences at runtime, ensuring portability across Unix-like systems without sacrificing speed.[17]

The initial version, FFTW 1.0, was released on March 24, 1997, marking the library's public debut and quickly gaining traction with over 600 downloads in its first month from academic and research users seeking a free alternative to proprietary software.[3][18] This release focused on one-dimensional complex-to-complex transforms, establishing FFTW's reputation for superior benchmarks against public-domain codes like FFTPACK and even rivaling vendor libraries such as IMSL in execution time on contemporary hardware.[18] Version 2.0 followed on September 11, 1998, introducing significant enhancements including completely rewritten real-to-complex transforms using specialized codelets and native support for multi-dimensional real-complex operations, which expanded its applicability to a broader range of scientific simulations and signal processing tasks.[3]

Funding for the early development came primarily from MIT, supplemented by external grants, while Steven G. Johnson joined Frigo as a key collaborator starting with version 2.x to refine the library's architecture and implement advanced optimization techniques.[18] This partnership, built on shared expertise in high-performance computing at MIT's Laboratory for Computer Science, propelled FFTW's evolution from a doctoral project into a cornerstone tool for the research community, later earning recognition through awards like the 1999 J. H. Wilkinson Prize for Numerical Software.[17]

Key Contributors and Milestones
The primary developers of FFTW are Matteo Frigo, who serves as the lead architect, and Steven G. Johnson, the co-developer and principal maintainer as of 2025.[19] Both initiated the project during their time at MIT, with Frigo focusing on the core algorithmic innovations and Johnson contributing to implementation, optimization, and ongoing stewardship.[20] Their collaboration has sustained FFTW's evolution as a high-performance library, earning recognition such as the 1999 J. H. Wilkinson Prize for Numerical Software from the Society for Industrial and Applied Mathematics.[19]

Key milestones in FFTW's development include the release of version 3.0 on April 20, 2003, which introduced the innovative planner system for runtime optimization of transform plans and the "wisdom" mechanism for saving and reusing these plans across executions.[3] Version 3.3, first released on July 26, 2011, brought support for advanced vectorization including AVX extensions, with further enhancements in version 3.3.7 (October 29, 2017) adding experimental AVX-512 capabilities to leverage modern CPU instruction sets.[3] The most recent stable release, 3.3.10, arrived on September 15, 2021, incorporating bug fixes for SIMD operations and other refinements.[3] Since 3.3.10, no major versions have been issued, but maintenance continues through the project's Git repository, where bug fixes and minor updates are applied, including a recent commit in late 2025 addressing build dependencies.[21]

Community contributions have extended FFTW's reach, notably through the pyFFTW Python wrapper, which integrates FFTW with NumPy and SciPy ecosystems and received its latest update in version 0.15.1 on October 22, 2025.[22] Additionally, AMD's AOCL-FFTW fork, based on FFTW 3.3.10, introduced EPYC-specific optimizations in its 3.0.1 release on October 10, 2024, enhancing performance for AMD hardware via new parallel algorithms. Frigo and Johnson remain the principal maintainers, ensuring open-source continuity under the project's established licensing framework.[19]

Awards and Recognition
In 1999, FFTW received the James H. Wilkinson Prize for Numerical Software from the Society for Industrial and Applied Mathematics (SIAM), awarded to developers Matteo Frigo and Steven G. Johnson for their innovative code-generation approach that optimizes discrete Fourier transforms for diverse hardware platforms while addressing all phases of mathematical software development, including design, implementation, and performance.[23][1] FFTW's impact is further evidenced by its inclusion in prominent benchmarks, such as the SPEC workstation suites, where it powers Fourier transform workloads to assess CPU and system performance in real-world computing scenarios.[24] It has also garnered recognition in high-performance computing literature, including papers presented at SC (Supercomputing) conferences, which highlight its efficiency in large-scale simulations and parallel environments.[2]

Major software ecosystems have adopted FFTW as a core component; for instance, MATLAB utilizes it as the backend for its FFT functions to deliver optimized transform computations.[25] Similarly, it is integrated into scientific libraries like PETSc, providing matrix types for FFT operations, and Trilinos, supporting advanced solver frameworks in high-performance applications.[26][27] As of 2025, FFTW has been cited in over 10,000 academic papers per Google Scholar metrics, demonstrating its enduring influence across domains such as MRI imaging for signal processing and climate modeling for spectral analysis.[28][29]

Technical Architecture
Algorithms Employed
FFTW primarily employs the Cooley-Tukey algorithm for computing discrete Fourier transforms (DFTs) of composite sizes, utilizing both radix-2 and mixed-radix variants to recursively decompose the problem.[2] The core DFT is defined as Y[k] = \sum_{j=0}^{n-1} X[j] \, \omega_n^{jk}, where \omega_n = e^{-2\pi i / n}. For a transform size n = n_1 n_2, the algorithm splits the DFT into n_1 smaller DFTs of size n_2 (or vice versa), expressed as

Y[k_1 + k_2 n_1] = \sum_{j_2=0}^{n_2-1} \left[ \sum_{j_1=0}^{n_1-1} X[j_1 n_2 + j_2] \omega_{n_1}^{j_1 k_1} \right] \omega_n^{j_2 k_1} \omega_{n_2}^{j_2 k_2},
enabling a divide-and-conquer approach that reduces computational complexity from O(n^2) to O(n \log n).[2] This recursive decomposition supports radices such as 2, 3, 4, 5, and higher (up to 32 or more), with mixed-radix combinations for arbitrary composite n, optimizing for factors that align with hardware characteristics.[2] For prime transform sizes, where Cooley-Tukey decomposition is inapplicable, FFTW uses Rader's algorithm and Bluestein's chirp z-transform to reduce the DFT to a convolution computable via smaller DFTs.[3] Rader's method re-expresses the prime-length DFT as a cyclic convolution of length n-1, leveraging the properties of primitive roots modulo n to map indices, followed by a discrete Hartley transform (DHT) for the real symmetric convolution.[2] A zero-padded variant of Rader's algorithm avoids deep recursion when n - 1 itself contains large prime factors.[3] Bluestein's algorithm, implemented for enhanced performance on larger primes, reformulates the DFT as a convolution by exploiting the quadratic phase:
\hat{x}_k = e^{-\pi i k^2 / N} \sum_{n=0}^{N-1} \left( x_n e^{-\pi i n^2 / N} \right) e^{\pi i (k-n)^2 / N},

which recasts the DFT as a linear convolution that can be evaluated with zero-padded DFTs of any convenient composite length M \ge 2N - 1.[3] FFTW also supports discrete cosine transforms (DCTs) and discrete sine transforms (DSTs) of types I through IV, derived from symmetry-reduced DFTs to exploit real-input properties and reduce computation.[2] These transforms are generated programmatically as specialized DFTs; for example, DCT-II, commonly used in applications like JPEG compression, is given by
y_k = \sum_{n=0}^{N-1} x_n \cos\left( \pi k (n + 0.5) / N \right),
and can be computed via a real-input DFT of size N.[2] DCT-I and DST-I use a size-2N real DFT, while DCT-IV and DST-IV employ a size-4N complex DFT or optimized variants for even/odd N.[2] Transforms in FFTW are available in both out-of-place and in-place formats to accommodate memory constraints and access patterns. Out-of-place transforms use array strides to implicitly handle bit-reversal permutations without a dedicated pass, improving cache locality by processing data in natural order.[2] In-place variants employ decimation-in-time (DIT) or decimation-in-frequency (DIF) butterflies interleaved with matrix transposes (e.g., for sizes p \times q \times m), further avoiding explicit permutations and minimizing temporary storage.[2] Bit-reversal, traditionally required in naive radix-2 implementations, is integrated into the recursion or butterflies to enhance efficiency on modern architectures.[2]
Planner System
FFTW's planner system generates optimized execution plans for discrete Fourier transforms at runtime by recursively decomposing the problem into subproblems and selecting the most efficient combination of codelets based on measured performance. This adaptive approach allows FFTW to tailor transforms to specific hardware, dimensions, and data types without relying on compile-time decisions. The planner uses dynamic programming with memoization, hashing problem specifications to avoid redundant computations during the search.[2]

The planning process unfolds in phases determined by user-specified flags that balance optimization thoroughness against planning time. With the FFTW_ESTIMATE flag, the planner applies a heuristic cost model—estimating operations, loads, and stores—to quickly generate a suboptimal but fast plan without executing or overwriting the input array. The FFTW_MEASURE flag advances to timing several candidate plans on small synthetic inputs, selecting the one with the lowest runtime and potentially overwriting the input array. For greater thoroughness, FFTW_PATIENT extends measurement by exhaustively exploring more plan variants, including alternative factorizations and codelet combinations, to identify highly optimized sequences that can yield significant speedups on complex transforms. An even more rigorous FFTW_EXHAUSTIVE mode tests all possible plans, though it is rarely used due to its computational intensity.[30][2]

At the core of planning are codelets: compact, automatically generated kernels that compute small DFTs of fixed sizes (typically 2 to 64) and types, such as complex, real-input, or trigonometric transforms. These are produced offline by genfft, a domain-specific compiler written in OCaml that translates high-level recursive algorithms—like Cooley-Tukey radix decompositions—into unrolled, heavily optimized C routines (including SIMD variants). The genfft pipeline involves building a directed acyclic graph (DAG) of operations, simplifying it through algebraic optimizations (e.g., constant folding and common subexpression elimination), scheduling for register efficiency, and emitting the final code, ensuring codelets are tailored for minimal memory access and maximal instruction-level parallelism. The planner then assembles these pre-generated codelets into a full execution plan, akin to composing a factorization tree for the transform size.[2][15]

To facilitate reuse and reduce repeated planning overhead, FFTW captures successful plans in "wisdom," a binary-encoded database of transform specifications and their optimal codelet sequences. Wisdom is accumulated globally during planning and can be exported to files using fftw_export_wisdom_to_filename, allowing import via fftw_import_wisdom_from_filename in future sessions for instant plan recreation. This mechanism is particularly valuable in production environments, where wisdom from exhaustive planning on representative hardware can accelerate initialization without sacrificing performance.[13]
While plan creation is not itself thread-safe and calls to planning functions must therefore be serialized by the caller, FFTW integrates built-in multi-threading for plan execution via OpenMP or POSIX threads (pthreads), configurable with fftw_init_threads and fftw_plan_with_nthreads to scale transforms across multiple cores without altering the planning phase.[31][32]
Supported Data Types and Dimensions
FFTW primarily operates on complex data represented by the fftw_complex type, which is defined as an array of two double values storing the real and imaginary parts of a complex number.[33] For real-valued inputs and outputs, FFTW uses arrays of double elements, enabling efficient storage for transforms that exploit Hermitian symmetry.[33] In real-to-complex (r2c) and complex-to-real (c2r) transforms, the complex side occupies only n/2 + 1 fftw_complex elements, where (for even n) the first and last elements are purely real and the remaining entries represent the non-redundant half of the conjugate-symmetric spectrum.[10] For the real-to-halfcomplex (r2hc) and halfcomplex-to-real (hc2r) variants, the half-complex format uses a contiguous array of n double values, with real parts from index 0 to n/2 followed by imaginary parts from n/2 - 1 down to 1 (the imaginary parts at DC and Nyquist are identically zero and are not stored).[34]
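The sketch below illustrates how the half-complex packing can be read back for an even transform length; the length n = 8 and the ramp input are arbitrary example choices:

```c
#include <stdio.h>
#include <fftw3.h>

int main(void) {
    const int n = 8;  /* even length */
    double *in  = fftw_alloc_real(n);
    double *out = fftw_alloc_real(n);
    for (int j = 0; j < n; ++j) in[j] = j;  /* arbitrary real input */

    fftw_plan p = fftw_plan_r2r_1d(n, in, out, FFTW_R2HC, FFTW_ESTIMATE);
    fftw_execute(p);

    /* Half-complex layout for even n: {r0, r1, ..., r_{n/2}, i_{n/2-1}, ..., i_1} */
    printf("DC: %g, Nyquist: %g\n", out[0], out[n / 2]);
    for (int k = 1; k < n / 2; ++k)
        printf("bin %d: %g + %gi\n", k, out[k], out[n - k]);

    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
    return 0;
}
```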
FFTW supports multiple precision levels to accommodate varying numerical requirements. Single-precision operations use float and fftwf_complex (prefixed with f), double-precision uses double and fftw_complex, and long-double precision uses long double and fftwl_complex (prefixed with l), enabled during compilation with the --enable-long-double option.[14] Quadruple precision is available on supported hardware like x86 with recent GCC compilers, utilizing __float128 and fftwq_complex (prefixed with q), requiring linkage to libquadmath and the --enable-quad-precision flag.[14]
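Switching precision changes only the type and function prefix and the library linked against; a minimal single-precision sketch (linked with -lfftw3f) might read:

```c
#include <fftw3.h>

int main(void) {
    const int n = 1024;
    /* Single-precision API: the same calls with the fftwf_ prefix and float storage. */
    fftwf_complex *in  = fftwf_alloc_complex(n);
    fftwf_complex *out = fftwf_alloc_complex(n);

    fftwf_plan p = fftwf_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    /* ... fill in[] ... */
    fftwf_execute(p);

    fftwf_destroy_plan(p);
    fftwf_free(in);
    fftwf_free(out);
    return 0;
}
```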
The library handles transforms across one to multiple dimensions, supporting arrays from 1D up to arbitrary ranks without a fixed upper limit, provided memory allows.[11] Data is stored in row-major order, but FFTW accommodates non-contiguous memory layouts through strides, which specify the distance in elements between array indices along each dimension, facilitating operations on subarrays or matrix rows.[11]
The advanced "guru" interface provides fine-grained control over array configurations, allowing specification of arbitrary ranks, offsets (starting positions in memory), and strides for both input and output arrays, as in fftw_plan_dft_r2c for real-to-complex transforms in custom layouts.[11]
To optimize for SIMD instructions, FFTW requires data arrays to be aligned to at least 16 bytes, typically achieved by allocating memory with fftw_malloc, which guarantees compatibility and pairs with fftw_free for deallocation.[35] Convenience allocators like fftw_alloc_real and fftw_alloc_complex ensure proper alignment for real and complex arrays, respectively.[36]
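A brief sketch of these allocators (sizes are arbitrary example values):

```c
#include <fftw3.h>

int main(void) {
    const size_t n = 1024;

    /* fftw_malloc guarantees the alignment that FFTW's SIMD codelets expect. */
    fftw_complex *a = (fftw_complex *) fftw_malloc(sizeof(fftw_complex) * n);

    /* The typed allocators wrap fftw_malloc with the correct element size. */
    double       *r = fftw_alloc_real(n);
    fftw_complex *c = fftw_alloc_complex(n);

    /* ... create plans and execute transforms using a, r, and c ... */

    fftw_free(a);
    fftw_free(r);
    fftw_free(c);
    return 0;
}
```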
Usage and Integration
API Overview
FFTW's application programming interface (API) follows a two-phase paradigm consisting of a planning stage, where an optimized execution plan is generated based on the transform parameters and hardware characteristics, and an execution stage, where the plan is applied to the data. This separation allows plans to be reused for multiple identical transforms, improving efficiency. The API is designed in C and emphasizes portability, with no reliance on exceptions for error handling; instead, planner functions return a NULL plan on failure and the wisdom routines return integer status codes.[15] The basic workflow involves three primary steps: creating a plan with functions such as fftw_plan_dft_1d for one-dimensional complex DFTs, executing the transform via fftw_execute, and destroying the plan using fftw_destroy_plan to free associated resources. For example, fftw_plan_dft_1d takes parameters including the transform size, input and output arrays, the sign of the transform, and planning flags that control optimization effort. The fftw_execute function applies the plan to the specified arrays without recomputing the strategy, making it thread-safe for concurrent calls. Cleanup is managed explicitly through fftw_destroy_plan and, optionally, fftw_cleanup to reset internal state after all plans are destroyed.[15]
FFTW offers three levels of interface complexity to accommodate varying user needs. The basic interface provides straightforward functions for single transforms on contiguous data, including complex-to-complex DFTs via fftw_plan_dft_1d/2d/3d and real-input variants like fftw_plan_dft_r2c_1d for real-to-complex (r2c) and fftw_plan_dft_c2r_1d for complex-to-real (c2r) transforms, which exploit Hermitian symmetry to halve storage for real data. The advanced interface extends this for multiple transforms or strided arrays, using planners such as fftw_plan_many_dft and fftw_plan_many_dft_r2c, allowing efficient handling of batched operations on non-contiguous memory layouts. The guru interface grants full control over dimensions, strides, and multiplicities through functions like fftw_plan_guru_dft, enabling custom data layouts for advanced applications such as in-place transforms or irregular arrays.[15]
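For example, a batch of contiguous one-dimensional transforms can be planned with a single advanced-interface call; the batch size and transform length below are arbitrary example values:

```c
#include <fftw3.h>

int main(void) {
    const int N = 256, howmany = 100;
    int n[] = { N };
    fftw_complex *in  = fftw_alloc_complex((size_t)N * howmany);
    fftw_complex *out = fftw_alloc_complex((size_t)N * howmany);

    /* howmany transforms of length N stored back to back:
       stride 1 within each transform, distance N between consecutive transforms. */
    fftw_plan p = fftw_plan_many_dft(1, n, howmany,
                                     in,  NULL, 1, N,
                                     out, NULL, 1, N,
                                     FFTW_FORWARD, FFTW_ESTIMATE);
    /* ... fill in[] ... */
    fftw_execute(p);  /* computes all howmany transforms in one call */

    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
    return 0;
}
```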
For parallel computing, FFTW includes a multi-threaded API that requires initialization with fftw_init_threads before creating plans. Subsequent plans can specify the number of threads using fftw_plan_with_nthreads, after which fftw_execute automatically distributes the computation across threads for supported operations, leveraging shared-memory parallelism without altering the core workflow. This approach integrates seamlessly with the planning system, where the planner selects thread-friendly codelets.[15]
To enhance reusability across program runs, FFTW supports "wisdom" mechanisms for exporting and importing optimized plans. Functions like fftw_export_wisdom_to_filename save planning knowledge to a file, while fftw_import_wisdom_from_filename or fftw_import_wisdom_from_file loads it, allowing rapid plan creation by applying precomputed strategies; these return a non-zero value on success or zero on failure. Wisdom is particularly useful for repeated executions on similar hardware, as it avoids redundant planning overhead.[15]
Error handling in the API relies on return values rather than exceptions, promoting deterministic behavior in performance-critical code. Planner functions return a plan handle that is NULL on failure, while wisdom import and export functions return a non-zero value on success and zero on failure, often due to invalid parameters or memory issues; users must check these values explicitly, as the library does not throw exceptions or use global error states.[15]
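A short sketch of this style of error checking (the wisdom filename is a placeholder):

```c
#include <stdio.h>
#include <fftw3.h>

int main(void) {
    /* Wisdom import returns non-zero on success, 0 on failure (e.g., missing file). */
    if (!fftw_import_wisdom_from_filename("wisdom.dat"))
        fprintf(stderr, "no wisdom loaded; planning from scratch\n");

    const int n = 1024;
    fftw_complex *buf = fftw_alloc_complex(n);

    /* Planner functions return NULL if the requested transform cannot be planned. */
    fftw_plan p = fftw_plan_dft_1d(n, buf, buf, FFTW_FORWARD, FFTW_ESTIMATE);
    if (p == NULL) {
        fprintf(stderr, "planning failed\n");
        fftw_free(buf);
        return 1;
    }

    fftw_execute(p);
    fftw_destroy_plan(p);
    fftw_free(buf);
    return 0;
}
```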
Installation Process
FFTW is typically installed by downloading the source code from the official website and building it manually, ensuring compatibility with specific hardware and software configurations. The latest stable release, version 3.3.10 as of 2025, is available as a compressed tarball from https://www.fftw.org/download.html. After unpacking the tarball, users run the GNU Autotools-based configure script to customize the build; common options include --enable-threads for multithreaded support, --enable-sse2 for SSE2 SIMD instructions, and --enable-avx for AVX vectorization on x86 architectures.[4][37][15]
The build process proceeds with make to compile the library, which produces a static library by default (a shared library can be requested with --enable-shared), followed by make install to place the libraries, headers, and executables in system directories, often requiring elevated privileges.[37][15] No external dependencies are mandatory beyond an ANSI C compiler (such as GCC) and the standard math library (-lm), though optional features like OpenMP for threading (via --enable-openmp) enhance functionality if the compiler supports it.[15]
Pre-built packages simplify installation on common platforms without manual compilation. On Ubuntu and Debian-based systems, the development headers and libraries are installed via sudo apt install libfftw3-dev. For macOS, Homebrew users execute brew install fftw to obtain the library. On Windows, the vcpkg package manager supports installation with vcpkg install fftw3, providing binaries compatible with Visual Studio and CMake workflows.[38][39]
Cross-compilation for embedded systems or non-standard architectures requires specifying a cross-compiler and host triple to the configure script (e.g., ./configure --host=arm-none-eabi CC=arm-none-eabi-gcc) and may involve manual edits to configuration files for unsupported platforms, as detailed in the FFTW manual. For GPU acceleration, NVIDIA's cuFFTW interface offers a drop-in compatibility layer that ports FFTW calls to CUDA-enabled hardware.[37][40]
Basic Usage Examples
FFTW provides a straightforward interface for computing discrete Fourier transforms (DFTs) in C, separating the planning phase—where an optimized execution strategy is determined—from the execution phase, where the transform is applied to data.[20] Basic operations require including the header <fftw3.h> and using aligned memory allocation via fftw_malloc to ensure compatibility with SIMD instructions.[41] Plans are created with functions like fftw_plan_dft_1d for one-dimensional complex transforms, specifying the transform size, input/output arrays, direction (forward or backward), and flags such as FFTW_ESTIMATE for quick planning without runtime measurement or FFTW_MEASURE for more accurate optimization via timing.[41] The forward transform is unnormalized, so the inverse requires manual scaling by the transform size N to recover the original data.[41]
A simple one-dimensional complex DFT example allocates input and output arrays of size N, creates a forward plan, executes it, and handles cleanup as follows:

```c
#include <fftw3.h>

int main() {
    int N = 16; // Transform size
    fftw_complex *in  = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
    fftw_complex *out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
    // Initialize input data here, e.g., in[0][0] = 1.0, in[0][1] = 0.0, etc.
    fftw_plan p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(p); // Compute the forward transform
    // For the inverse: fftw_plan q = fftw_plan_dft_1d(N, out, in, FFTW_BACKWARD, FFTW_ESTIMATE);
    // fftw_execute(q); then scale each element of in by 1.0 / N
    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
    return 0;
}
```

This code performs an unnormalized forward DFT; repeating with the backward direction and dividing by N yields the original input.[41] For real-input one-dimensional transforms, such as the real-to-half-complex (R2HC) format that exploits Hermitian symmetry to halve storage for real data, use fftw_plan_r2r_1d with real (double) arrays:
```c
#include <fftw3.h>

int main() {
    int N = 8; // Transform size
    double *in  = (double*) fftw_malloc(sizeof(double) * N);
    double *out = (double*) fftw_malloc(sizeof(double) * N);
    // Initialize real input data here
    fftw_plan p = fftw_plan_r2r_1d(N, in, out, FFTW_R2HC, FFTW_ESTIMATE);
    fftw_execute(p); // Compute the R2HC transform
    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
    return 0;
}
```

The non-redundant Hermitian-symmetric spectrum is packed into a real output array of size N, enabling compact representation for real signals.[42] Two-dimensional complex DFTs are planned similarly using fftw_plan_dft_2d, treating the array as a separable product of 1D transforms along rows and columns:
```c
#include <fftw3.h>

int main() {
    int rows = 64, cols = 64; // Dimensions
    fftw_complex *in  = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * rows * cols);
    fftw_complex *out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * rows * cols);
    fftw_plan p = fftw_plan_dft_2d(rows, cols, in, out, FFTW_FORWARD, FFTW_MEASURE);
    // Initialize 2D input data here (row-major order); with FFTW_MEASURE this must
    // happen after planning, since the planner may overwrite the arrays
    fftw_execute(p); // Compute the 2D forward transform
    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
    return 0;
}
```

The FFTW_MEASURE flag measures execution times to select an optimal plan, suitable for repeated use on the same dimensions.[43]
To enable multi-threading for parallel execution across multiple cores, initialize the threading support before planning and specify the thread count:
```c
#include <fftw3.h>

int main() {
    fftw_init_threads();        // Initialize multi-threaded FFTW
    fftw_plan_with_nthreads(4); // Use 4 threads for subsequent plans
    int N = 1024;
    fftw_complex *in  = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
    fftw_complex *out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
    // Initialize input
    fftw_plan p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    // Execute repeatedly; new data may be supplied through the same plan
    for (int i = 0; i < 10; ++i) {
        // Update in with new data
        fftw_execute_dft(p, in, out); // Thread-safe execution
    }
    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
    fftw_cleanup_threads(); // Final cleanup
    return 0;
}
```

Only the execution function fftw_execute (or variants like fftw_execute_dft) is thread-safe; planning must occur from a single thread.[32]
FFTW's "wisdom" feature allows saving optimized plans to disk for reuse in future runs, avoiding recomputation of planner measurements. After creating and executing plans, export wisdom to a file; in subsequent invocations, import it before planning to apply the saved optimizations:
```c
#include <fftw3.h>

int main() {
    // Try to load wisdom saved by a previous run (returns 0 if none is available)
    fftw_import_wisdom_from_filename("my_fftw_wisdom.fftw");

    int N = 1024;
    fftw_complex *in  = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
    fftw_complex *out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);

    // Planning is nearly instantaneous if wisdom already covers this transform;
    // otherwise FFTW_MEASURE times candidate plans and records the result as wisdom
    fftw_plan p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_MEASURE);

    // Initialize input here (after planning, since FFTW_MEASURE may overwrite arrays)
    fftw_execute(p);

    // Save the accumulated wisdom for future runs
    fftw_export_wisdom_to_filename("my_fftw_wisdom.fftw");

    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
    return 0;
}
```

Wisdom files are portable across compilations but may require regeneration if hardware or FFTW version changes significantly.[44] For more advanced control over strides and alignments, the guru interface offers extended planning options, as detailed in the API overview.[20]
Performance Characteristics
Optimization Techniques
FFTW employs cache-oblivious algorithms to optimize memory access patterns, ensuring performance independence from specific cache sizes or hierarchies. These algorithms utilize recursive, depth-first traversal in the Cooley-Tukey decomposition, which naturally blocks data in a way that exploits cache locality across multiple levels without explicit tuning to hardware parameters. Additionally, for in-place multi-dimensional transforms, FFTW implements a cache-oblivious matrix transposition that minimizes cache misses during data rearrangement, achieving asymptotically optimal cache complexity.[2]

Vectorization in FFTW is achieved through automated code generation via the genfft tool, which produces highly optimized codelets incorporating loop unrolling to reduce overhead from branches and increments. This process also enables operation fusion, where multiple transform steps—such as twiddle factor multiplications and butterflies—are combined into a single computational kernel, eliminating intermediate stores and loads to enhance instruction-level parallelism. These techniques allow FFTW to leverage SIMD instructions indirectly by generating vector-friendly code, though specific intrinsics are handled during compilation.[2]

The planner system in FFTW features adaptive radix selection, dynamically choosing the most efficient prime-factor decomposition for a given transform size based on empirical profiling during plan creation. For instance, for sizes like 15, it may prefer a mixed-radix factorization such as 3 × 5 over a single radix-15 approach if the former yields better performance due to smaller, more cache-efficient subtransforms. This selection considers a repertoire of radices (e.g., 2, 3, 4, 5, 7, 8, 11, 13, 16, 32) and balances arithmetic complexity with memory access costs.[2]

In-place transforms form a core optimization in FFTW, computing the DFT directly within the input/output array to minimize memory allocation and bandwidth usage. This is realized through decimation-in-time (DIT) or decimation-in-frequency (DIF) variants combined with transpositions, requiring temporary buffers only for unavoidable strides or alignments, such as in non-contiguous multi-dimensional arrays. For example, a 1D transform of size N = 2^k can be executed in-place using a radix-4 DIT followed by a DIF-transpose, reducing the overall footprint to essentially that of the input data.[15][2]

Multi-threaded execution in FFTW provides load-balanced parallelism for large transform sizes N, distributing work across threads via OpenMP or Pthreads to parallelize outer vector loops and codelet invocations. This approach ensures scalable performance on multi-core CPUs by minimizing synchronization overhead and balancing computational load, achieving high efficiency—often approaching 80-90% of ideal scaling—for sufficiently large problems where memory bandwidth is not the bottleneck. The planner's role is crucial here, as it generates thread-safe plans that adapt these optimizations to the available core count.[15][2]

Benchmark Results
In benchmarks of 3D FFT workloads around 2020, Intel MKL demonstrated approximately 1.5 to 2 times lower execution times than FFTW on x86 architectures, with similar relative performance in community evaluations.[45] For a representative timing example, a one-dimensional double-precision FFT of size N = 2^{20} (1,048,576 points) on an Intel Core i9 processor completes in approximately 1 ms using FFTW with exhaustive planning.[46][47] FFTW exhibits the expected linearithmic O(N \log N) scaling for DFT computations, with empirical constants 20-50% lower than those in its predecessor FFTW2 or in simpler libraries like KissFFT, as measured across power-of-two sizes up to N = 2^{22} on modern x86 hardware.[2][48]

In direct comparisons, calling FFTW directly from C outperforms NumPy's FFT module (which interfaces with PocketFFT or MKL depending on the build) by up to 5 times for large one-dimensional transforms, largely because it avoids Python overhead, while against Intel MKL, FFTW achieves competitive results on non-x86 platforms or when using its patient planning mode for non-standard sizes.[47][49] As of 2025, AMD's AOCL-FFTW variant includes optimizations for Zen-based architectures like EPYC and Ryzen, particularly for in-place multi-dimensional and MPI-distributed FFTs.[50][51]

Hardware-Specific Adaptations
FFTW incorporates hardware-specific adaptations primarily through support for single instruction, multiple data (SIMD) instruction sets across various CPU architectures, enabling vectorized computations for improved performance on compatible processors. On x86 and x86-64 architectures, the library supports SSE and SSE2 extensions, which are enabled via the --enable-sse (single precision) and --enable-sse2 (single and double precision) configure flags during compilation.[5] Advanced vectorization is provided by AVX and AVX2, activated with --enable-avx and --enable-avx2 respectively, both supporting single and double precision operations.[5] AVX-512 support, enabled by --enable-avx512, extends this to 512-bit vectors, processing up to 8 double-precision elements or 16 single-precision elements per instruction, with codelets generated specifically for these wider registers.[5]
For ARM-based systems, FFTW offers runtime detection and utilization of the NEON SIMD extension, introduced in version 3.3.1, targeting 128-bit vectors for both single and double precision.[1] Codelets are selected dynamically at runtime to leverage NEON instructions without requiring explicit configuration flags, ensuring portability across ARM processors. Arm's Scalable Vector Extension (SVE) is not yet natively supported as of 2025, though the library's runtime probing mechanism leaves room for wider-vector support, such as 256-bit operations, to be added in the future.
On PowerPC architectures, FFTW includes conditional compilation for AltiVec (introduced in version 3.0) and VSX extensions, enabled via --enable-altivec (single precision) and --enable-vsx (single and double precision on Power8 and later).[3] These adaptations generate specialized codelets for 128-bit vectors in AltiVec and extended double-precision support in VSX, with runtime checks to select the optimal path based on the processor's features.
FFTW does not provide native GPU acceleration but integrates with accelerator hardware through third-party wrappers that maintain API compatibility. For NVIDIA GPUs, cuFFTW serves as a drop-in replacement, wrapping the cuFFT library to execute FFTW plans on CUDA-enabled devices while preserving the original interface for seamless migration. Similarly, for AMD GPUs under ROCm, hipFFTW (built on rocFFT) exports FFTW-compatible symbols, allowing existing FFTW code to offload computations to HIP-accelerated hardware without modification.[52]
A key aspect of these adaptations is FFTW's runtime CPU feature detection, performed during the planning phase (e.g., via fftw_plan_* functions), which probes the host processor for supported SIMD extensions and selects the corresponding pre-generated codelets dynamically.[5] This approach ensures binary portability: a single library compiled with multiple SIMD options runs efficiently on diverse hardware, falling back to scalar code if advanced features are unavailable, without crashing or requiring recompilation.
As of November 2025, no major performance updates beyond version 3.3.10 have been reported in official releases.