
Spectrogram

A spectrogram is a two-dimensional visual representation depicting the spectrum of frequencies in a signal as it evolves over time, with intensity or color encoding the signal's amplitude or power at each frequency-time coordinate. Typically computed as the squared magnitude of the short-time Fourier transform (STFT), it divides the signal into short, overlapping windows, applies the Fourier transform to each, and plots the results to capture local spectral characteristics of non-stationary signals. This method trades off time and frequency resolution due to the fixed window size inherent in the STFT, though alternatives like wavelet transforms offer variable resolution for specific applications. Originating from the sound spectrograph developed in the early 1940s at Bell Laboratories under Ralph K. Potter and described in a 1946 paper by W. Koenig, H. K. Dunn, and L. Y. Lacy, spectrograms initially supported phonetic research and military communications during World War II. They have since become essential in diverse domains, including audio engineering for identifying harmonics and formants, vibration analysis for fault detection, and radar for signal classification, providing intuitive insights into transient spectral events that waveforms or static spectra alone obscure.

Fundamentals

Definition and Mathematical Foundation

A spectrogram provides a visual depiction of a signal's spectrum evolving over time, with the horizontal axis representing time, the vertical axis representing frequency, and color or intensity encoding the magnitude of spectral components, often on a logarithmic scale such as decibels. Mathematically, the spectrogram of a signal x(t) is the squared magnitude of its short-time Fourier transform (STFT), yielding a time-frequency representation:
\mathrm{spectrogram}(t, \omega) = \left| \mathrm{STFT}(t, \omega) \right|^2.
For a continuous-time signal x(t), the STFT is defined as
\mathrm{STFT}(t, f) = \int_{-\infty}^{\infty} x(\tau) \, w(t - \tau) \, e^{-j 2\pi f \tau} \, d\tau,
where w(\cdot) is a window function—typically real-valued and concentrated near zero—that restricts the analysis to a short interval around time t, and f denotes frequency in hertz. Variations may include a complex conjugate on the window for analytic representations or the angular frequency \omega = 2\pi f.
This formulation arises from applying the Fourier transform locally in time, balancing the global frequency resolution of the full Fourier transform with temporal localization. The window w(t) determines the trade-off: its duration inversely affects frequency resolution via the Fourier uncertainty principle, as narrower windows yield broader spectral spreads. In discrete implementations, the integral becomes a summation over samples, with the exponential evaluated at discrete frequencies via the discrete Fourier transform. The resulting spectrogram thus quantifies local power spectral density, enabling analysis of non-stationary signals whose frequency content varies with time.
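As a concrete illustration of the discrete formulation, the following minimal sketch computes a power spectrogram as the squared magnitude of a Hann-windowed STFT; the function name, parameters, and test signal are illustrative assumptions, and library routines such as scipy.signal.spectrogram provide tuned equivalents.

```python
import numpy as np

def power_spectrogram(x, fs, n_fft=256, hop=64, eps=1e-12):
    """Power spectrogram via a Hann-windowed discrete STFT (sketch)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([x[m * hop : m * hop + n_fft] * window
                       for m in range(n_frames)])
    X = np.fft.rfft(frames, axis=1)        # one-sided DFT of each frame
    S = np.abs(X) ** 2                     # power at each (time, frequency) bin
    times = (np.arange(n_frames) * hop + n_fft / 2) / fs
    freqs = np.fft.rfftfreq(n_fft, d=1 / fs)
    return times, freqs, 10 * np.log10(S + eps)   # dB scale for display

# Example: a pure 1 kHz tone sampled at 8 kHz appears as one horizontal band.
fs = 8000
t = np.arange(fs) / fs
times, freqs, S_db = power_spectrogram(np.sin(2 * np.pi * 1000 * t), fs)
```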

Physical and Causal Interpretation

The spectrogram physically represents the local energy density of a signal in the time-frequency plane, where the horizontal axis denotes time, the vertical axis denotes frequency (in hertz, corresponding to oscillation cycles per second), and the color or intensity at each point quantifies the signal's power or squared amplitude at that frequency around that time. For acoustic signals, this maps to the distribution of kinetic and potential energy in air pressure oscillations, with brighter regions indicating higher-intensity vibrations at specific rates driven by the sound source. The underlying short-time Fourier transform (STFT) decomposes the signal into overlapping windowed segments, each analyzed for sinusoidal components, yielding a physically interpretable approximation of how frequency-specific energy evolves, limited by the Heisenberg-Gabor uncertainty principle that trades time resolution for frequency resolution based on window length. Causally, spectrogram features arise from the physical mechanisms generating the signal, such as periodic forcing in oscillatory systems producing sustained energy concentrations at resonant frequencies. In string instruments, for example, horizontal bands at integer multiples of the fundamental frequency reflect standing wave modes excited by the string's vibration, where the fundamental is determined by length, tension, and mass density through the wave speed v = \sqrt{T/\mu} (giving f_1 = v / 2L for a string of length L), and overtones emerge from boundary conditions enforcing nodal points. Transient vertical streaks often signal impulsive causes like plucking or impact, releasing broadband energy that decays according to damping physics. This causal mapping enables inference of source dynamics: formant structures in speech, for instance, trace to vocal tract resonances shaped by anatomical configurations, while chirp-like sweeps in radar returns indicate accelerating targets via Doppler shifts proportional to relative velocity. Limitations include windowing artifacts that smear causal events, as non-stationarities (e.g., sudden frequency shifts from mode coupling) violate the stationarity assumption implicit in each analysis window, necessitating validation against first-principles models of wave propagation and energy transfer.
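A brief worked example of the string relation just quoted, with arbitrary illustrative (not measured) parameter values:

```python
import math

# Idealized string: wave speed v = sqrt(T / mu), fundamental f1 = v / (2L).
# Tension, linear density, and length below are illustrative values only.
T = 60.0        # tension (N)
mu = 0.6e-3     # linear mass density (kg/m)
L = 0.65        # vibrating length (m)

v = math.sqrt(T / mu)                        # ~316 m/s transverse wave speed
f1 = v / (2 * L)                             # ~243 Hz fundamental
harmonics = [round(n * f1) for n in range(1, 5)]
print(v, f1, harmonics)   # harmonics appear as horizontal spectrogram bands
```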

Historical Development

Pre-20th Century Precursors

The phonautograph, invented by Édouard-Léon Scott de Martinville and patented on March 25, 1857, represented an early attempt to visually capture airborne sound waves by tracing their vibrations onto soot-covered paper or glass using a diaphragm-connected stylus. This device produced phonautograms—graphical representations of sound amplitude over time—but lacked frequency decomposition or playback capability, serving primarily for acoustic study rather than reproduction. Scott's motivation stemmed from mimicking the human ear's structure to "write sound" for scientific study, predating Edison's phonograph by two decades and establishing a precedent for temporal visualization of acoustic signals. In parallel, mid-19th-century advancements in sound visualization emerged through Hermann von Helmholtz's vibration microscope, developed around the 1850s, which magnified diaphragm oscillations driven by sound to reveal vibrational patterns and interactions. Helmholtz's 1863 treatise Die Lehre von den Tonempfindungen als physiologische Grundlage für die Theorie der Musik theoretically decomposed complex tones into sinusoidal components via Fourier principles, influencing empirical tools for harmonic breakdown without direct time-frequency plotting. Rudolph Koenig, building on these foundations from the 1860s, engineered the manometric flame apparatus circa 1862, employing gas flames modulated by acoustic pressure and viewed via a rotating mirror to visualize wave harmonics as patterned light, enabling qualitative observation of harmonic content in steady tones. Koenig further refined this into a resonator-based analyzer by 1865, featuring tunable Helmholtz resonators to isolate specific frequencies from a composite sound, functioning as an analog precursor to spectrum analysis by selectively amplifying and detecting partials across a range of about 65 notes. These devices, while static in frequency display and limited to continuous or quasi-steady signals, provided the causal insight that complex sounds could be dissected into frequency components for visual scrutiny, bridging amplitude-time traces and dynamic spectrographic methods.

World War II Origins and Early Devices

The sound spectrograph, the first practical device for generating spectrograms, was developed at Bell Laboratories by Ralph K. Potter and his team starting in early 1941, with the aim of producing visual representations of speech interpretable by the human eye. A rough laboratory prototype was completed by the end of 1941, just prior to the United States' entry into World War II. This instrument functioned as a specialized wave analyzer, converting audio input into a permanent graphic record displaying the distribution of acoustic energy across frequency and time dimensions, thereby enabling detailed analysis of phonetic structure. During World War II, the spectrograph's development accelerated under military auspices, with the first operational models deployed for cryptanalytic purposes to decode and identify speech patterns in intercepted communications. Bell engineers adapted the device to support Allied efforts in voice identification, allowing acoustic analysts to distinguish individual speakers from telephone and radio transmissions by revealing unique spectral signatures resistant to verbal disguise. The U.S. military, in collaboration with agencies like the FBI, leveraged these early spectrographs to counter enemy radio traffic, marking the technology's initial real-world application in signals intelligence rather than its original civilian motivations of telephone transmission improvement and oral education. These wartime devices operated by recording sound onto a rotating magnetic drum, filtering it through a bank of bandpass filters spanning approximately 0 to 8000 Hz, and plotting intensity as darkness on electrosensitive paper, with time advancing horizontally and frequency vertically. Typical analysis windows were short, on the order of 0.0025 to 0.064 seconds, to capture rapid phonetic transients, though resolution trade-offs between time and frequency were inherent due to the analog filtering constraints. Post-war declassification in 1945–1946 revealed the spectrograph's efficacy, as documented in technical papers by Potter and colleagues, confirming its role in advancing empirical speech analysis amid the era's secrecy.

Post-War Advancements

In the immediate post-World War II period, the sound spectrograph transitioned from classified military use to commercial availability, enabling broader scientific application. In 1951, Kay Electric Company, under license from Bell Laboratories, introduced the first commercial model, known as the Sona-Graph, which produced two-dimensional visualizations of sound spectra with time on the horizontal axis and frequency on the vertical axis, where darkness indicated intensity. This device facilitated detailed analysis of speech formants and acoustic patterns, supplanting earlier impressionistic phonetic notations with empirical spectrographic data in linguistic research. Advancements in the 1950s included integration with speech synthesis tools, such as the Pattern Playback developed at Haskins Laboratories around 1950, which converted spectrographic patterns back into audible sound, advancing synthetic speech production. The Sona-Graph's portability relative to wartime prototypes and its adoption in fields like phonetics and bioacoustics—exemplified by its use in visualizing bird vocalizations—expanded spectrographic analysis beyond wartime cryptanalysis to civilian studies of animal communication and human audition training for the hearing impaired. By the 1960s, early digital implementations emerged alongside analog refinements, with three-dimensional sonagrams providing volumetric representations of frequency, time, and amplitude to capture signal strength more intuitively. Military adaptations persisted, with modified spectrographic techniques supporting the Sound Surveillance System (SOSUS) during the Cold War, processing hydrophone data to track submarines via time-frequency displays. These developments laid the groundwork for computational spectrography, though analog devices like the Kay Sona-Graph dominated until efficient digital algorithms proliferated later.

Generation Techniques

Short-Time Fourier Transform

The short-time Fourier transform (STFT) generates a time-frequency representation by computing the discrete Fourier transform of short, overlapping segments of a non-stationary signal, enabling analysis of how spectral content evolves over time. In practice, the signal is divided into frames using a sliding window, each frame is multiplied by a window function to minimize spectral leakage, and the discrete Fourier transform (DFT), usually computed via the fast Fourier transform (FFT), is applied to yield complex-valued coefficients for each time step and frequency bin. The resulting two-dimensional array, when taking the squared magnitude, produces the spectrogram, which displays signal power as a function of time and frequency. For a discrete-time signal s, the STFT at time index m and frequency index k is given by X[m, k] = \sum_{n=0}^{N-1} s[n + m H] \, w[n] \, e^{-j 2\pi k n / N}, where N is the window length, w[n] is the window function (e.g., a Hamming or Hann window of length N), and H is the hop size determining overlap (typically H = N/2 to N/4 for 50–75% overlap to enhance temporal smoothness and reconstruction fidelity). Overlap reduces artifacts from abrupt frame transitions and improves spectrogram continuity, as non-overlapping windows can introduce discontinuities between frames that manifest as streaking in the display. Window choice trades off frequency resolution (longer windows yield narrower main lobes in the frequency domain) against time localization; for instance, a 256-sample Hann window provides moderate resolution suitable for audio signals sampled at 44.1 kHz, balancing leakage suppression with computational efficiency via the FFT. Implementation often involves zero-padding frames to the next power of two for efficient FFT computation, with the spectrogram plotted using logarithmic scaling of |X[m,k]|^2 to emphasize low-level components. Parameter selection—window lengths from 20–100 ms for speech, hop sizes of 10 ms—depends on the signal's characteristics, as shorter windows capture transients better but broaden frequency estimates due to the inherent time-frequency uncertainty. Libraries such as MATLAB's stft function default to tapered windows with substantial overlap, ensuring invertibility under the constant overlap-add (COLA) condition, in which the shifted windows satisfy \sum_{m} w[n + m H] = \text{constant}.
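The COLA condition can be checked numerically; the sketch below uses SciPy's existing check_COLA routine on a periodic Hann window at 50% overlap, a combination that satisfies the condition and therefore supports exact overlap-add reconstruction.

```python
from scipy.signal import check_COLA, get_window

# Periodic Hann at 50% overlap (hop H = N/2) satisfies COLA:
# the shifted windows sum to a constant, so the STFT is invertible.
N, H = 256, 128
window = get_window("hann", N)          # periodic Hann by default
print(check_COLA(window, N, N - H))     # True
```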

Windowing and Parameter Choices

Windowing is applied to signal segments in the short-time Fourier transform (STFT) to mitigate spectral leakage, which occurs when abrupt truncation of finite segments introduces discontinuities that broaden the frequency response. The window function multiplies the segment, tapering its edges to reduce sidelobe amplitudes in the frequency domain, though this widens the main lobe and thus decreases frequency resolution compared to a rectangular window. Rectangular windows maximize frequency resolution but exhibit high sidelobes (-13 dB), leading to significant leakage; tapered alternatives like the Hann window achieve sidelobe suppression of about -31 dB at the cost of roughly doubling the main lobe width. Common window types for spectrogram generation include the Hann (raised cosine), the Hamming (similar but with different sidelobe decay), and the Kaiser (adjustable via a parameter β to balance main-lobe width against leakage). The Hann window is frequently selected for audio spectrograms due to its effective suppression of leakage while maintaining reasonable resolution, avoiding the end-point discontinuities of rectangular windows. Parameter choices depend on application: for transient signals, narrower windows (e.g., 10-20 ms) prioritize time localization, while broader windows (e.g., 40-60 ms) enhance frequency detail in stationary tones. Overlap between consecutive windows, controlled by hop size (typically 25-50% of window length), increases temporal density and smoothness in the spectrogram by providing redundancy that interpolates between frames, reducing aliasing artifacts from non-overlapping analysis. A 50% overlap doubles effective time resolution without excessive computation, whereas 75-90% overlaps yield visually refined displays but demand more processing resources. Optimal selections balance the Heisenberg-like time-frequency trade-off, with empirical tuning often required; for instance, excessive overlap in long signals inflates memory use without proportional gains in accuracy. In practice, libraries like MATLAB's spectrogram function default to Hamming windows with 50% overlap for general signals, adjustable via user parameters to suit specific resolution needs.
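The sketch below contrasts a Hann window with a rectangular (boxcar) window on a two-tone test signal using scipy.signal.spectrogram; the sampling rate, tone frequencies, and leakage-floor measurement are illustrative assumptions chosen to expose sidelobe behavior.

```python
import numpy as np
from scipy.signal import spectrogram

# Strong 440 Hz tone plus a weak 1 kHz tone: leakage from the strong tone
# can mask the weak one under a rectangular window.
fs = 8000
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 1e-3 * np.sin(2 * np.pi * 1000 * t)

for win in ("hann", "boxcar"):
    f, tt, Sxx = spectrogram(x, fs=fs, window=win, nperseg=512, noverlap=256)
    # Average leakage floor in a band away from both tones (1.5-2 kHz).
    band = (f > 1500) & (f < 2000)
    floor_db = 10 * np.log10(Sxx[band].mean())
    print(f"{win}: leakage floor ~ {floor_db:.1f} dB")   # Hann is far lower
```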

Alternative Time-Frequency Methods

Alternative methods to the short-time Fourier transform (STFT) for generating time-frequency representations include the continuous wavelet transform (CWT), which yields a scalogram defined as the squared magnitude of the CWT coefficients plotted against time and scale (inversely related to frequency). Unlike the fixed-resolution STFT, the CWT employs scaled, translated wavelets, providing higher time resolution at high frequencies and higher frequency resolution at low frequencies, making it suitable for analyzing non-stationary signals with transient components. Quadratic time-frequency distributions, such as the Wigner-Ville distribution (WVD), offer theoretically optimal joint time-frequency resolution for linear frequency-modulated signals by computing the Fourier transform of the signal's instantaneous autocorrelation. The discrete WVD for a signal x is given by W(n, \omega) = \sum_{m} x[n+m] \, x^*[n-m] \, e^{-j 4 \pi \omega m}, preserving energy and time-frequency marginals but introducing oscillatory cross-terms between signal components that can obscure interpretation. To address cross-term artifacts in the WVD, kernel-modified variants like the Choi-Williams distribution (CWD) apply an exponential kernel \phi(\tau, \nu) = e^{-\sigma \tau^2 \nu^2} to suppress interferences while retaining desirable auto-term concentration. The CWD demonstrates advantages over the STFT in resolving closely spaced frequency components in non-stationary signals, such as fluctuation measurements in fusion plasmas, due to reduced smearing and better localization, though it requires careful selection of the kernel spread \sigma. The smoothed pseudo WVD further mitigates aliasing and cross-terms via independent windowing in the time and lag domains, balancing resolution and artifact reduction for practical signal analysis.
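A direct numpy sketch of the discrete WVD formula above; real implementations typically use the analytic signal and lag windows to tame aliasing and cross-terms, and here frequency bin k corresponds to normalized frequency k/(2N) because of the factor 2 in the lag exponent.

```python
import numpy as np

def wigner_ville(x):
    """Discrete Wigner-Ville distribution of a 1-D signal (sketch)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    W = np.zeros((N, N))
    for n in range(N):
        L = min(n, N - 1 - n)                 # largest symmetric lag at n
        lags = np.arange(-L, L + 1)
        # Instantaneous autocorrelation r[m] = x[n+m] * conj(x[n-m]).
        r = x[n + lags] * np.conj(x[n - lags])
        R = np.zeros(N, dtype=complex)
        R[lags % N] = r                       # wrap negative lags for the DFT
        # r is conjugate-symmetric in m, so its DFT is real-valued.
        W[n] = np.fft.fft(R).real
    return W                                  # rows: time, columns: frequency
```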

Representations and Formats

Axis Conventions and Scales

In standard spectrogram representations derived from the short-time Fourier transform (STFT), the horizontal axis denotes time, progressing from left to right in linear seconds or milliseconds, reflecting the sequential progression of the signal. The vertical axis represents frequency, typically in hertz (Hz), with conventions placing lower frequencies at the bottom and higher frequencies ascending upward to match perceptual intuition in audio and speech analysis. This orientation aligns with common plotting practices in analysis software, where frequency on the y-axis and time on the x-axis facilitate intuitive reading of temporal evolution across spectral content. Alternative orientations, such as frequency on the x-axis and time on the y-axis, exist in specialized tools like MATLAB's spectrogram function but are less prevalent for general visualization. Frequency scales are predominantly linear in raw hertz for precise applications, such as vibration testing or radar analysis, ensuring uniform bin spacing that corresponds directly to the Fourier transform's output. However, logarithmic scales, which compress higher frequencies and expand lower ones, are favored in audio and music analysis to approximate human auditory perception, where pitch intervals are roughly logarithmic; this is evident in tools emphasizing psychoacoustic relevance over uniform spectral resolution. The mel scale, a perceptually warped quasi-logarithmic variant, further refines this for speech processing by mimicking critical-band spacing in the cochlea, though it deviates from physical frequency linearity. Time scales remain consistently linear to preserve causal ordering without perceptual distortion. The third dimension, spectral intensity or power, is mapped to color, grayscale, or height in pseudocolor plots, with scales often logarithmic in decibels (dB) to handle the wide dynamic range of signals—typically spanning 60-100 dB—avoiding visual dominance by peaks and revealing subtle features. Linear intensity scales are rarer due to their suppression of low-level detail, while dB scaling (e.g., 20 \log_{10} |STFT|) provides perceptual uniformity akin to loudness perception. Customizable options in analysis software allow switching between linear, logarithmic, and dB scales for intensity, balancing resolution and visibility based on application needs like transient detection or noise assessment.
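The conventional layout (time on x, frequency on y, dB-scaled intensity as color) can be produced with standard plotting tools; the chirp test signal and STFT parameters below are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import spectrogram

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * (200 * t + 150 * t**2))   # chirp sweeping 200 -> 500 Hz

f, tt, Sxx = spectrogram(x, fs=fs, nperseg=256, noverlap=192)
plt.pcolormesh(tt, f, 10 * np.log10(Sxx + 1e-12), shading="gouraud")
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.colorbar(label="Power (dB)")
plt.show()
```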

Visualization and Color Mapping

Spectrograms are visualized as two-dimensional heatmaps, with time along the horizontal axis, frequency along the vertical axis, and signal power or amplitude encoded via color or shading at each time-frequency coordinate. This representation leverages the STFT output, where each pixel's value derives from the squared magnitude of the complex STFT coefficients, scaled logarithmically and often expressed in decibels to match human auditory perception. Grayscale mappings predominate in traditional displays, such as those in phonetic-analysis software, where darker shades denote higher energy levels, offering monotonic perceptual scaling and accessibility for color-deficient viewers. Colored colormaps extend this by assigning hues to intensity gradients; for instance, Audacity's default scheme transitions from white at low energy through intermediate hues to saturated colors at high energy, with adjustable range settings to optimize contrast for specific signals. Warmer colors like red or orange typically signify elevated amplitudes, while cooler blues or greens indicate lower ones, facilitating rapid visual identification of spectral features in applications like audio editing. However, rainbow-like colormaps, such as MATLAB's jet, introduce disadvantages including non-uniform perceptual steps and illusory contours, which can distort quantitative interpretations by implying false data gradients. Perceptually uniform alternatives, like viridis or turbo, mitigate these by ensuring consistent lightness progression across the spectrum, enhancing accuracy in scientific analysis without sacrificing hue-based segmentation. In hardware analyzers, such as Keysight's, discrete color counts (e.g., 16-256) define the mapping resolution, balancing detail with computational efficiency. Alternative visualizations include waterfall plots, which accumulate sequential spectra vertically for a pseudo-3D effect, with color indicating persistence over time, useful for detecting transient signals in spectrum analysis. Surface renders treat the spectrogram as a height field, emphasizing topography, though they risk occlusion of underlying details. Selection of a mapping depends on context: grayscale for precision, sequential colormaps for qualitative overview, prioritizing uniformity to avoid misperception in quantitative tasks.
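In practice, clipping the displayed dynamic range matters as much as the colormap choice; the snippet below (reusing tt, f, and Sxx from the previous sketch) windows the display to the top 80 dB under a perceptually uniform colormap.

```python
import numpy as np
import matplotlib.pyplot as plt

# Clip to the top 80 dB so a single strong peak cannot wash out detail.
S_db = 10 * np.log10(Sxx + 1e-12)      # Sxx from the previous sketch
vmax = S_db.max()
plt.pcolormesh(tt, f, S_db, cmap="viridis", vmin=vmax - 80, vmax=vmax)
plt.colorbar(label="Power (dB)")
plt.show()
```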

Common Variants (e.g., Mel-Spectrogram)

A Mel-spectrogram represents the short-time power spectrum of a signal with frequencies remapped to the mel scale, which approximates the nonlinear frequency resolution of human auditory perception, emphasizing lower frequencies where discrimination is finer. The transformation from linear frequency f (in Hz) to mel scale m follows m = 2595 \log_{10}(1 + f/700), derived from psychophysical experiments on perceived pitch equality. Computation involves applying a bank of overlapping triangular filters—typically 40 to 128—spaced linearly in the mel domain to the magnitude-squared STFT output, yielding filterbank energies that are often logarithmically compressed for dynamic-range handling akin to human loudness perception. This variant reduces dimensionality compared to linear-frequency spectrograms while preserving perceptually salient features, making it computationally efficient for tasks like speech recognition, where linear scales underrepresent low-frequency formants critical for identification. In contrast to the uniform frequency bins of standard spectrograms, Mel-spectrograms exhibit denser binning below 1 kHz and sparser binning above, aligning with Bark or equivalent rectangular bandwidth (ERB) scales that model critical bands of auditory filtering. Empirical evaluations in audio classification show Mel-spectrograms outperforming linear alternatives in convolutional neural networks for event detection, as the perceptual scaling de-emphasizes fine high-frequency detail irrelevant to human-like processing. However, this warping introduces artifacts at high frequencies and assumes stationarity within frames, potentially distorting transients. Other common variants include the constant-Q spectrogram, generated via the constant-Q transform (CQT), which employs logarithmically spaced bins with constant relative bandwidth Q = f/\Delta f, ideal for analyzing harmonic structures in music where octave intervals are perceptually equidistant. Unlike the fixed-resolution STFT, the CQT's window length adapts inversely with frequency, enabling efficient sparse representations for polyphonic signals, though at higher computational cost due to variable window lengths per bin. Bark-spectrograms use the Bark scale, dividing the spectrum into 24 critical bands up to about 16 kHz, offering a physiologically grounded alternative to mel for bioacoustic analysis, with similar nonlinear compression but tied to cochlear models. Log-spectrograms, while not scale-warped, apply logarithmic scaling to power estimates universally across variants to match perceived intensity, reducing sensitivity to amplitude variations in applications like noise-robust feature extraction. These adaptations prioritize domain-specific trade-offs between perceptual fidelity, computational cost, and invertibility, with selection guided by signal characteristics and task requirements.
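A compact numpy construction of the mel filterbank from the formula above (a sketch; libraries such as librosa provide tuned implementations with additional normalization options):

```python
import numpy as np

def mel_filterbank(n_mels, n_fft, fs, fmin=0.0, fmax=None):
    """Triangular filters spaced uniformly on the mel scale."""
    fmax = fmax or fs / 2
    mel = lambda f: 2595 * np.log10(1 + f / 700)        # Hz -> mel
    inv = lambda m: 700 * (10 ** (m / 2595) - 1)        # mel -> Hz
    # n_mels triangles require n_mels + 2 edge frequencies.
    edges = inv(np.linspace(mel(fmin), mel(fmax), n_mels + 2))
    bins = np.fft.rfftfreq(n_fft, 1 / fs)               # STFT bin centers
    fb = np.zeros((n_mels, len(bins)))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bins - lo) / (mid - lo)
        falling = (hi - bins) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

# Applied to a power spectrogram S of shape (n_fft // 2 + 1, n_frames):
# mel_spec = mel_filterbank(64, 512, 16000) @ S
```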

Theoretical Limitations and Criticisms

Uncertainty Principle and Resolution Trade-offs

In time-frequency analysis, the uncertainty principle, analogous to Heisenberg's principle in quantum mechanics, imposes a fundamental limit on the joint resolvability of a signal's temporal and spectral features in the short-time Fourier transform (STFT), from which spectrograms are derived as the squared magnitude. Specifically, the product of the standard deviations of the time and frequency localizations, \sigma_t \sigma_f, satisfies \sigma_t \sigma_f \geq \frac{1}{4\pi}, with equality achieved by a Gaussian window function; this bound is known as the Gabor limit. It arises from the mathematical properties of the Fourier transform, ensuring that no windowing scheme can arbitrarily sharpen both resolutions without trade-offs. For spectrogram generation, the window duration T directly governs this trade-off: a shorter T yields finer time resolution (\Delta t \approx T), enabling precise localization of transient events like onsets or impulses, but coarser frequency resolution (\Delta f \approx 1/T), resulting in smeared spectral lines and reduced ability to distinguish closely spaced frequencies. Conversely, extending T improves frequency discrimination, as narrower spectral lobes emerge from the longer Fourier analysis, but at the cost of temporal smearing, where rapid signal variations appear blurred across the window. This reciprocity is evident in applications such as audio processing, where short windows (e.g., 10-20 ms) suit percussive sounds but fail for harmonic stability, while longer windows (e.g., 50-100 ms) favor pitched tones yet obscure attacks. Window shape further modulates the effective resolutions, with functions like the Hann or Hamming reducing sidelobe leakage to mitigate some effects, though the core \Delta t \Delta f product remains bounded. In implementations, factors such as overlap ratio and FFT length influence the practical display resolution but cannot circumvent the principle; for instance, excessive overlap computationally approximates a continuous transform without alleviating the inherent limit. These constraints highlight why spectrograms often require adaptive or hybrid methods for signals with varying stationarity, as fixed parameters inevitably compromise one domain to favor the other.
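The Gabor bound can be verified numerically for a Gaussian window, which attains it with equality; the window spread below is an arbitrary illustrative value.

```python
import numpy as np

fs, sigma = 1000, 0.05                    # sample rate (Hz), window spread (s)
t = np.arange(-2, 2, 1 / fs)
w = np.exp(-t**2 / (2 * sigma**2))        # Gaussian window

dt = 1 / fs
p_t = w**2 / (np.sum(w**2) * dt)          # normalized time energy density
sigma_t = np.sqrt(np.sum(t**2 * p_t) * dt)

W = np.fft.fftshift(np.fft.fft(w))
f = np.fft.fftshift(np.fft.fftfreq(len(t), dt))
df = f[1] - f[0]
p_f = np.abs(W)**2 / (np.sum(np.abs(W)**2) * df)   # frequency energy density
sigma_f = np.sqrt(np.sum(f**2 * p_f) * df)

print(sigma_t * sigma_f, 1 / (4 * np.pi))   # both ~ 0.0796
```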

Phase Information Loss and Ambiguities

The spectrogram is computed as the squared magnitude of the short-time Fourier transform (STFT) coefficients, inherently discarding the phase information contained in the complex-valued STFT. The phase encodes relative timing and alignment details across frequencies, which are critical for accurate signal reconstruction and perceptual fidelity. Without it, the magnitude-only representation loses the capacity to distinguish signals that differ solely in phase relationships, leading to inherent ambiguities in interpreting or inverting the spectrogram back to a time-domain waveform. One fundamental ambiguity arises from the non-uniqueness of magnitude-only representations: multiple distinct signals can produce identical STFT magnitude spectrograms, as the mapping from time-domain signals to spectrograms is many-to-one. For instance, a signal and its time-reversed counterpart yield the same magnitude spectrum, because time reversal conjugates the Fourier transform without altering magnitudes, though STFT windowing introduces window-specific variations that may partially mitigate but not eliminate this issue. For real-valued signals, an additional sign ambiguity exists, where the reconstructed signal could be the negative of the original, equivalent to a global phase shift of π. These trivial ambiguities (global phase or sign) represent the minimal indeterminacy under ideal conditions, but practical phase retrieval often encounters more severe non-uniqueness, particularly for sampled or bandlimited functions without supportive constraints such as careful window design. Perceptual consequences underscore the phase loss: naive reconstructions from magnitude-only spectrograms result in garbled audio lacking intelligibility, whereas phase-only reconstructions preserve speech comprehension despite noise-like artifacts, highlighting phase's role in carrying essential temporal structure. Addressing these ambiguities requires iterative algorithms like Griffin-Lim or advanced optimization techniques, which estimate phase via consistency constraints on the STFT but remain susceptible to local minima and imperfect recovery, especially in real-time applications where global phase shifts propagate as sign flips. In general, while certain analytic conditions (e.g., Gaussian windows) enable uniqueness up to a global phase for bandlimited signals, empirical reconstruction fidelity depends heavily on signal sparsity and noise levels, with no universal guarantee of invertibility from magnitude alone.
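The sign ambiguity is easy to demonstrate: a signal and its negation produce identical magnitude spectrograms, since the global sign lives entirely in the phase.

```python
import numpy as np
from scipy.signal import stft

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)

_, _, Zx = stft(x, nperseg=256)
_, _, Zn = stft(-x, nperseg=256)
print(np.allclose(np.abs(Zx), np.abs(Zn)))   # True: |STFT| cannot tell them apart
```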

Parametric vs. Non-Parametric Estimation Issues

Non-parametric estimation in spectrograms relies on direct computation of the power spectral density (PSD) for each time-localized window, typically via the periodogram or smoothed variants like Welch's method within the short-time Fourier transform (STFT) framework. These approaches make no assumptions about the underlying signal model, offering robustness against misspecified processes but exhibiting high variance that manifests as noise-like fluctuations in the spectrogram, especially for short windows constrained by the need for temporal resolution. Frequency resolution is fundamentally limited by the window length, leading to spectral leakage and broadened peaks that obscure fine structure in sparse or harmonic signals. Parametric estimation, in contrast, posits a model such as an autoregressive (AR) process for the signal segment in each window, deriving the PSD from estimated coefficients via methods like Yule-Walker or Burg's method. This enables higher effective resolution and smoother estimates with reduced variance, as the model extrapolates beyond the data length, proving advantageous for detecting components like formants in speech or echoes in radar returns with limited observations. However, performance hinges on accurate model order selection (e.g., using the Akaike or Bayesian information criterion), which demands iterative fitting and can falter in noisy or non-stationary conditions, introducing bias if the AR assumption mismatches the true dynamics. Overfitting at high orders amplifies artifacts, such as spurious peaks, while underfitting smooths valid features, compromising the spectrogram's fidelity for transient events. A core trade-off arises in time-frequency analysis: non-parametric methods preserve distributional flexibility but demand ensemble averaging or longer windows to curb variance, degrading time localization and exacerbating the Heisenberg principle's resolution limits. Parametric approaches mitigate this by leveraging prior structure but risk systematic errors in heterogeneous signals, such as biomedical recordings, where time-varying AR models have shown improved peak detection over the STFT yet sensitivity to parameter initialization. Empirical comparisons, including those for acoustic spectrograms of dolphin vocalizations, indicate parametric methods yield crisper representations when the model fits but underperform non-parametric ones when deviations from the assumed model occur, underscoring the need for diagnostic checks like residual whiteness tests. Hybrid strategies, blending model-based refinement with non-parametric safeguards, address these issues but increase complexity without guaranteed universality.
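A per-window AR (Yule-Walker) PSD estimate can be sketched in a few lines for contrast with the periodogram; the model order and frequency-grid size are assumptions to be tuned per signal.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def ar_psd(x, order=8, n_freq=256):
    """Parametric PSD of one analysis window via Yule-Walker (sketch)."""
    x = np.asarray(x, float) - np.mean(x)
    # Biased autocorrelation estimates r[0..order].
    r = np.array([np.dot(x[:len(x) - k], x[k:]) / len(x)
                  for k in range(order + 1)])
    a = solve_toeplitz(r[:-1], r[1:])          # AR coefficients a_1..a_p
    noise_var = r[0] - np.dot(a, r[1:])        # driving-noise variance
    w = np.linspace(0, np.pi, n_freq)          # normalized frequency grid
    # PSD(w) = sigma^2 / |1 - sum_k a_k e^{-jwk}|^2
    E = np.exp(-1j * np.outer(w, np.arange(1, order + 1)))
    return w, noise_var / np.abs(1 - E @ a) ** 2
```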

Resynthesis and Inversion Challenges

Inversion Algorithms

The inversion of a spectrogram to reconstruct the original time-domain signal from its magnitude-only representation constitutes an ill-posed problem, as the phase of the short-time Fourier transform (STFT) is discarded, leaving the reconstruction underdetermined: infinitely many signals can produce the same spectrogram. This arises from the nonlinearity of the magnitude operation and the redundancy in the STFT, which overlaps windows to capture local stationarity but does not uniquely specify the signal without phase. The Griffin-Lim algorithm, introduced in 1984, provides a foundational iterative method by exploiting STFT redundancy to estimate a consistent phase. It minimizes the squared error between the magnitude of the reconstructed STFT and the target spectrogram through alternating projections: starting from an initial phase estimate (often random or zero), it computes the inverse STFT (iSTFT) to yield a time-domain signal, applies the forward STFT, replaces the computed magnitude with the given spectrogram while retaining the phase, and iterates until convergence. Typically, 20–100 iterations suffice for acceptable quality, with higher overlap (e.g., 75%) improving consistency via greater redundancy, though computational cost scales with iterations and STFT size. The algorithm yields a signal whose spectrogram approximates the input but may introduce artifacts like blurred onsets due to suboptimal local phase estimates. For real-time inversion, the Real-Time Iterative Spectrogram Inversion (RTISI) algorithm adapts Griffin-Lim principles to process frames sequentially, initializing each new frame's phase from the previous frame's overlap region and performing 2–5 iterations per frame with minimal look-ahead (e.g., one frame) to minimize latency. RTISI enforces overlap-add consistency across frames, enabling applications like live audio processing, and has been shown to produce perceptual quality comparable to offline Griffin-Lim for hop sizes around 25–50% of the window length. Enhancements like phase gradient heap integration further refine such methods by propagating instantaneous-frequency estimates, reducing phase unwrapping errors in non-stationary signals. Other approaches include single-pass methods that avoid full iterations by direct phase propagation via instantaneous-frequency integration, suitable for low-latency scenarios but prone to accumulation errors over long durations. These algorithms generally assume a real-valued signal and rectangular or Hann windows, with performance degrading for low redundancy or highly transient content, necessitating hybrid techniques or constraints like sparsity for improved fidelity.
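A minimal Griffin-Lim sketch built on SciPy's stft/istft pair; the iteration count and STFT parameters are illustrative, and the magnitude array is assumed to come from an STFT with the same settings so that frame counts round-trip.

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_iter=50, nperseg=256, noverlap=192, seed=0):
    """Iterative spectrogram inversion via alternating projections (sketch)."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))   # random initial phase
    for _ in range(n_iter):
        # Project onto time-domain consistency (overlap-add)...
        _, x = istft(mag * phase, nperseg=nperseg, noverlap=noverlap)
        # ...then back, keeping the implied phase and the target magnitude.
        _, _, Z = stft(x, nperseg=nperseg, noverlap=noverlap)
        Z = Z[:, : mag.shape[1]]                  # guard against off-by-one frames
        phase[:, : Z.shape[1]] = np.exp(1j * np.angle(Z))
    _, x = istft(mag * phase, nperseg=nperseg, noverlap=noverlap)
    return x
```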

Fidelity and Artifacts in Reconstruction

Reconstructing the time-domain signal from a spectrogram is fundamentally underdetermined, as infinitely many signals can produce the same short-time Fourier transform (STFT) magnitude due to the omission of phase information, which encodes the temporal alignments essential for unique recovery. This arises from the non-invertibility of the magnitude operation, where phase distortions or permutations can yield identical spectrograms, necessitating approximate inversion methods that prioritize consistency over exact recovery. The Griffin-Lim algorithm, introduced in 1984, addresses this by iteratively alternating between enforcing spectrogram magnitude consistency via phase adjustment and time-domain consistency via overlap-add reconstruction, progressively minimizing the STFT magnitude mean squared error (MSE). Despite monotonic MSE convergence, practical fidelity remains limited; for instance, hundreds of iterations may yield signal-to-noise ratios (SNR) of only 10-15 dB for speech signals, far below the near-perfect reconstruction possible with full STFT data. Perceptual evaluations, such as mean opinion scores in vocoding tasks, often reveal discrepancies, with reconstructed signals exhibiting muffled transients and reduced naturalness compared to originals. Common artifacts in such reconstructions include temporal smearing, where sharp onsets blur across frames due to inconsistent phase estimates, and amplitude modulation artifacts resembling buzzing or fluttering, stemming from suboptimal projections onto the consistency constraints. In source separation applications, assigning mixture phases to isolated estimates exacerbates interference, manifesting as ghostly echoes or distortions. These issues persist even in real-time variants, where truncated iterations amplify artifacts like coarse spectral errors detectable across multiple STFT resolutions. Fidelity metrics such as the structural similarity index (SSIM) on reconstructed spectrograms highlight these shortcomings, with deep learning enhancements sometimes improving SSIM but introducing new parametric biases. Overall, while algorithmic refinements mitigate some distortions, inherent information loss precludes artifact-free, high-fidelity resynthesis without auxiliary data like prior models or multi-resolution constraints.
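Because global phase and time shifts make direct waveform SNR misleading, fidelity is often measured as spectrogram consistency; the check below reuses the griffin_lim sketch from the previous subsection on an illustrative test tone.

```python
import numpy as np
from scipy.signal import stft

# Target magnitude from a test tone, then measure how closely the
# reconstruction's magnitude matches it (a consistency error, not SNR).
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)
_, _, Z = stft(x, nperseg=256, noverlap=192)

x_rec = griffin_lim(np.abs(Z))                   # sketch from the previous subsection
_, _, Zr = stft(x_rec, nperseg=256, noverlap=192)
n = min(Z.shape[1], Zr.shape[1])                 # guard frame-count drift
err = (np.linalg.norm(np.abs(Zr[:, :n]) - np.abs(Z[:, :n]))
       / np.linalg.norm(np.abs(Z[:, :n])))
print(f"relative spectrogram error: {err:.3f}")  # decreases with iterations
```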

Applications

Audio and Speech Processing

![Spectrogram of a male voice saying 'ta ta ta'](./assets/Praat-spectrogram-tatata.png)

Spectrograms provide a visual representation of the frequency spectrum of audio signals over time, revealing essential characteristics such as harmonic structure, formant trajectories, and temporal events in speech. In speech processing, dark horizontal bands indicate formants—resonant frequencies that distinguish vowels—while vertical striations mark glottal pulses from voiced sounds and bursts from plosives. This time-frequency depiction facilitates phonetic analysis, enabling researchers to identify phonemes and prosodic features like pitch contours. In automatic speech recognition (ASR) systems, spectrograms form the basis for feature extraction, where short-time Fourier transform (STFT) computations generate the underlying data, often processed further into mel-scale variants for perceptual relevance. They support segmentation of continuous speech into phonemes, syllables, and words by highlighting spectral patterns unique to linguistic units. For instance, AI-driven voice assistants from major tech firms rely on spectrogram-derived features to interpret spoken commands with accuracies exceeding 95% in controlled environments, as of 2023 benchmarks. Beyond recognition, spectrograms aid audio editing and restoration by visually isolating frequency-specific artifacts, such as hums or clicks, allowing precise filtering without auditory trial-and-error. In forensic audio analysis, they enable speaker identification through characteristic spectral envelopes and modulation patterns. Modulation spectrograms, an extension, enhance robustness in reverberant conditions, improving word error rates by up to 20% in challenging acoustic settings, as demonstrated in 1998 studies. Recent advancements integrate spectrograms with deep learning for end-to-end processing, where convolutional neural networks treat them as images for tasks like detecting speaker states from vocal variations. A 2024 guide details practical spectrogram analysis for transient signals like finger snaps, underscoring their utility in real-time audio work with low computational overhead. These applications underscore spectrograms' role in bridging signal processing with perceptual modeling, though limitations in phase preservation necessitate complementary techniques for full reconstruction.

Biomedical and Acoustic Analysis

Spectrograms provide a time-frequency representation essential for analyzing acoustic signals in bioacoustics, revealing patterns such as frequency contours, harmonics, and temporal structure in animal vocalizations. In studies of marine mammals, they distinguish chirps as inverted V-shapes, clicks as vertical lines, and whistles as horizontal bands, aiding species identification and communication analysis. Similarly, bird songs exhibit distinct overtones and sequences when visualized spectrographically, facilitating behavioral and ecological studies. In biomedical contexts, spectrograms enable detection of voice disorders by capturing deviations in spectral structure, such as irregular harmonics or formant shifts indicative of pathologies like vocal nodules or neurological impairments. Machine learning models applied to mel-spectrograms achieve high accuracy in classifying disordered versus healthy voices, with features extracted from spectrographic images supporting laryngological assessments. For cardiopulmonary analysis, spectrograms help separate heart sounds from lung sound recordings through processing in the time-frequency domain, improving detection of adventitious sounds like wheezes, which appear as prolonged high-frequency bands. This approach enhances classification of respiratory pathologies, with deep learning on spectrograms yielding robust performance in identifying murmurs or crackles. Beyond auditory signals, spectrograms of biomedical time series like EEG detect conditions such as epileptic seizures through automated feature extraction from frequency-time patterns, demonstrating efficacy in distinguishing clinical groups. In depression screening, fusion of EEG spectrograms with audio representations supports classification, highlighting spectral asymmetries linked to affective states. These applications underscore spectrograms' utility in non-invasive physiological monitoring, though resolution limits necessitate complementary techniques for precise diagnosis.

Machine Surveillance and Identification

Spectrograms facilitate machine surveillance by transforming audio or vibration signals into time-frequency representations amenable to automated pattern recognition, enabling the detection and identification of specific acoustic events or sources in noisy environments. In acoustic surveillance systems, spectrogram-derived features have been employed to classify sounds robustly, such as distinguishing environmental noises from targeted events like footsteps or machinery operation, with compression techniques reducing image dimensionality to enhance computational efficiency for real-time processing. For speaker identification in surveillance contexts, spectrograms provide visual cues of vocal tract resonances and harmonic structures unique to individuals, supporting forensic and security applications through aural-visual comparison methods developed since the mid-20th century. Modern implementations integrate spectrograms with neural networks, achieving accuracies of 92.96% using classic spectrograms and 93.75% with Mel spectrograms on benchmark datasets, even under noise, by leveraging logarithmic frequency scaling that approximates human auditory perception. In industrial and mechanical surveillance, spectrograms from sound or vibration data enable condition monitoring for machine health, where convolutional neural networks trained on spectrogram images identify faults in rotating machinery by highlighting spectral shifts indicative of wear or imbalance, as demonstrated in studies on robot arms and bearings with high detection rates in post-2019 advancements. Audio fingerprinting techniques further support identification by extracting salient peaks from spectrograms to create robust hashes for matching specific signals, such as verifying known sounds or detecting unauthorized audio in secure perimeters, building on landmark algorithms that prioritize prominent time-frequency peaks for invariance to distortions. Underwater and environmental surveillance applications extend spectrogram use to target recognition, such as extracting salient spectral lines for classifying vessel noises or animal vocalizations in protected areas, aiding detection of illegal activity via TinyML-integrated systems reported in 2024. These methods underscore spectrograms' role in causal signal decomposition, though performance degrades in extreme noise without preprocessing, necessitating hybrid approaches with denoising for reliable identification.
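The landmark idea—keep only prominent local maxima of the spectrogram, then hash constellations of them—can be sketched in a few lines; the filter size and threshold below are illustrative tuning parameters.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def spectral_peaks(S_db, size=15, rel_threshold_db=-40):
    """Landmark-style peak picking on a dB spectrogram (sketch).

    Returns (time_index, frequency_index) pairs; real fingerprinting
    systems additionally hash pairs of nearby peaks for fast lookup.
    """
    local_max = maximum_filter(S_db, size=size) == S_db
    strong = S_db > S_db.max() + rel_threshold_db   # within 40 dB of the peak
    freq_idx, time_idx = np.nonzero(local_max & strong)
    return list(zip(time_idx, freq_idx))
```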

Recent Advances (2020–2025)

Deep Learning Integrations

Deep learning architectures have adopted spectrograms as primary inputs by treating them as two-dimensional time-frequency images, enabling convolutional neural networks (CNNs) and transformers to perform tasks such as audio classification, sound event detection, and anomaly identification with high efficacy. This integration leverages the spectrogram's visual structure for feature extraction, often outperforming traditional signal-domain methods in scalability and accuracy on large datasets. For instance, in audio classification, spectrogram-based models process raw audio via the short-time Fourier transform (STFT) to generate inputs compatible with image-oriented deep networks, achieving state-of-the-art results on benchmarks like AudioSet through end-to-end training. A pivotal advancement is the Audio Spectrogram Transformer (AST), proposed in 2021, which discards convolutions entirely in favor of pure self-attention mechanisms applied directly to spectrogram patches, akin to Vision Transformers but optimized for audio. AST demonstrates superior generalization on variable-length inputs and captures long-range dependencies in the time-frequency domain, attaining a mean average precision of 0.4593 on AudioSet, surpassing prior CNN-based spectrogram models. Extensions like ElasticAST, introduced in 2024, further adapt this framework to handle diverse audio lengths and resolutions without retraining, enhancing applicability in real-world scenarios such as egocentric video sound analysis. In speech processing, deep learning has advanced the handling of complex-valued spectrograms to mitigate phase reconstruction losses inherent in magnitude-only representations. Neural architectures, including generative adversarial networks and diffusion models, estimate both magnitude and phase for applications like speech enhancement and source separation, with training strategies emphasizing multi-resolution losses and consistency constraints to improve perceptual quality. A 2025 survey notes that these methods achieve lower word error rates in automatic speech recognition by directly modeling complex STFT outputs, though challenges persist in phase ambiguity resolution without ground-truth audio supervision. Spectrogram inversion, the process of reconstructing time-domain signals from spectrograms, has benefited from hybrid deep learning techniques combining neural phase prediction with numerical optimization. In 2025, online speech inversion methods integrate deep networks with numerical phase integration to compute phase derivatives iteratively, enabling low-latency reconstruction with minimal artifacts, as evaluated on synthesis tasks where perceptual scores exceed those of classical Griffin-Lim algorithms by up to 20% in mean opinion scores. These integrations underscore deep learning's role in overcoming spectrogram ambiguities, facilitating applications in generative audio and forensic analysis.
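A minimal PyTorch sketch of the "spectrogram as image" pattern described above: a toy CNN classifier over a single-channel mel-spectrogram batch (an illustrative architecture, not any of the published models named here).

```python
import torch
import torch.nn as nn

class SpecCNN(nn.Module):
    """Toy CNN treating a (1, n_mels, n_frames) spectrogram as an image."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),        # global pool over time and frequency
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, spec):                # spec: (batch, 1, mels, frames)
        return self.classifier(self.features(spec).flatten(1))

logits = SpecCNN()(torch.randn(4, 1, 64, 128))   # e.g. 64 mel bins, 128 frames
```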

Generative and Augmented Spectrogram Techniques

Generative spectrogram techniques leverage models to synthesize time-frequency representations from latent distributions or conditional inputs, enabling applications in audio synthesis, data augmentation, and signal simulation. Diffusion-based models, introduced in works around 2023–2025, progressively denoise random noise to produce realistic spectrograms, as demonstrated in unconditional generation for radio-frequency (RF) signals, where models trained on LTE datasets yield diverse spectrograms mimicking real emissions. Similarly, masked generative modeling, such as SpecMaskGIT proposed in 2024, applies iterative masking and infilling on spectrograms using transformer architectures, achieving efficient text-to-audio (TTA) synthesis by reconstructing masked regions conditioned on textual prompts. These methods outperform traditional autoregressive approaches by handling long-range dependencies in 2D spectrogram structures, with evaluations showing improved perceptual quality in generated audio after Griffin-Lim inversion. Conditional generative adversarial networks (GANs) extend this paradigm by incorporating priors like music genre or speech attributes. For instance, cMelGAN, developed in 2022, conditions spectrogram generation on genre labels using a conditional GAN framework, producing genre-specific audio clips with fidelity metrics surpassing baselines like WaveGAN in Fréchet Audio Distance scores. Diffusion models have also been adapted for spectrogram up-sampling in text-to-speech systems, where a 2024 boosting technique enhances low-resolution inputs to high-fidelity outputs, reducing artifacts in streaming synthesis by iteratively refining frequency bins. Such generative approaches facilitate scalable dataset expansion, particularly in domains with scarce training data, though they require careful hyperparameter tuning to mitigate mode collapse in GAN variants. Augmented spectrogram techniques focus on modifying existing representations to enhance model robustness in machine learning pipelines, often through domain-specific transformations. Source-filter warping, detailed in a 2022 study, decomposes speech spectrograms into source excitation and vocal tract filters, then recombines augmented components to simulate prosodic variations, improving speech recognition accuracy by 5–10% on augmented datasets without altering temporal alignment. Frame-level augmentation methods, like FrameAugment introduced in 2022 for encoder-decoder architectures, apply localized perturbations such as frequency masking or time stretching directly to spectrogram frames, yielding augmented inputs that boost denoising performance in speech enhancement tasks (a SpecAugment-style masking sketch follows below). In self-supervised pre-training, efficient audio transformers (EAT) from 2024 employ bootstrap frameworks on paired augmented spectrograms—generated via mixing or SpecAugment-style masking—to learn invariant representations, achieving state-of-the-art results on downstream audio classification with reduced computational overhead compared to contrastive methods. Hybrid generative-augmented pipelines combine synthesis with augmentation for tasks like transcription or sound separation. A 2025 physics-aware deep learning approach reconstructs augmented spectrograms by embedding physical structure into dictionary learning, enabling hit detection in polyphonic audio mixtures with signal-to-distortion ratios exceeding 20 dB. These techniques underscore a shift toward causal, invertible augmentations that preserve physical signal properties, contrasting with non-parametric methods prone to perceptual distortions, though empirical validation remains dataset-dependent.
Ongoing challenges include ensuring phase consistency in generated spectrograms for high-fidelity waveform inversion, addressed in part by multi-channel autoregressive models operating on complex-valued representations.
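A minimal SpecAugment-style masking sketch (the function name and mask counts are illustrative assumptions; the published method additionally applies time warping):

```python
import numpy as np

def spec_augment(S, n_freq_masks=2, n_time_masks=2, max_width=16, rng=None):
    """Zero out random frequency bands and time spans of a spectrogram."""
    rng = rng or np.random.default_rng()
    S = S.copy()
    n_freq, n_time = S.shape
    for _ in range(n_freq_masks):
        w = int(rng.integers(1, max_width))
        f0 = int(rng.integers(0, n_freq - w))
        S[f0:f0 + w, :] = 0.0                 # frequency mask
    for _ in range(n_time_masks):
        w = int(rng.integers(1, max_width))
        t0 = int(rng.integers(0, n_time - w))
        S[:, t0:t0 + w] = 0.0                 # time mask
    return S
```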

    Jul 29, 2022 · More overlap is useful when one wanted a smoother (higher correlated) spectrogram. Note: to estimate power spectral density (PSD), window shape ...Missing: generation | Show results with:generation
  48. [48]
  49. [49]
    Scalogram Computation in Signal Analyzer - MATLAB & Simulink
    The scalogram is the absolute value of the continuous wavelet transform (CWT) of a signal, plotted as a function of time and frequency.
  50. [50]
    Spectrograms and Scalograms: visualizing signal data - Medium
    Nov 19, 2020 · A spectrogram is a visual way of representing the signal strength, or “loudness”, of a signal over time at various frequencies present in a particular waveform.
  51. [51]
    wvd - Wigner-Ville distribution and smoothed pseudo ... - MathWorks
    The Wigner-Ville distribution provides a high-resolution time-frequency representation of a signal. The distribution has applications in signal visualization, ...
  52. [52]
  53. [53]
    (PDF) Choi-Williams transform and atomic functions in digital signal ...
    Aug 7, 2025 · The Choi-Williams transform in signal analysis has various advantages ... This is an improvement over the Wigner-Ville distribution, which ...
  54. [54]
    Time–frequency analysis of nonstationary fusion plasma signals
    Oct 1, 2004 · Here, a comparison is made with real fusion plasma signals that shows the advantages of the Choi–Williams distribution over wavelets as a ...
  55. [55]
    [PDF] Speech Spectra and Spectrograms
    It follows the normal convention of having frequency (in Hertz) on the vertical axis and time on the horizontal axis. Intensity is denoted by the darkness of ...
  56. [56]
  57. [57]
    Log-Frequency Spectrogram and Chromagram
    The vertical stripes (along the frequency axis) shown by the spectrogram indicate that some of the signal's energy is spread over large parts of the spectrum.
  58. [58]
    Do You Understand How To Use Spectrograms? - Production Expert
    A linear scale spaces frequencies equally, while a logarithmic scale enlarges the lower frequencies and compresses the display of higher frequencies. Below ...
  59. [59]
    Spectrogram Display Options - Faber Acoustical
    Three scale types are available for viewing spectral magnitude data: Linear, Logarithmic, and dB. The desired scale type is determined by the selection of this ...
  60. [60]
    [PDF] Audio Engineering Society - Fulcrum Acoustic
    A spectrogram is a two-dimensional depiction of a waveform or transfer function in which frequency is depicted on one axis and time is depicted on the other.<|separator|>
  61. [61]
    Understanding the Spectrogram/Waveform display - Amazon S3
    LINEAR: Displays frequencies spread out in a uniform way. · LOGARITHMIC: this scale puts more attention on lower frequencies. · MEL: the Mel scale (derived from ...Overview# · Spectrogram Settings# · Rulers#
  62. [62]
    What is spectrogram? Definition and examples - earth.fm
    Nov 9, 2022 · A spectrogram visually represents the frequency and amplitude (the distance between the top and the bottom of a wave) of a nonstationary signal over time.
  63. [63]
    1.5.5. The Spectrogram Viewer
    The default color scheme (Gray) for the spectrogram image is grayscale with higher color intensities (darker parts) corresponding to higher amplitudes. The ...
  64. [64]
    Spectrogram View - Audacity Manual
    The Spectrogram View of an audio track provides a visual indication of how the energy in different frequency bands changes over time. The Spectrogram can ...Missing: physical | Show results with:physical<|separator|>
  65. [65]
    On the importance of color in mass spectrometry imaging - PMC - NIH
    These two disadvantages of rainbow colormaps can confuse the reader, increase the difficulty of data interpretation, and are not an accurate visualization of ...
  66. [66]
    [PDF] The Importance of Colormaps - Sci-Hub
    Aug 17, 2020 · ... & THE COLORS CHOSEN to represent data in a col- ormap have a large influence on accurate data interpretation.
  67. [67]
    Color Map Advice for Scientific Visualization - Kenneth Moreland
    This page provides color maps that you can use while using pseudocoloring of a scalar field. The color maps are organized by how and where they are best used.
  68. [68]
    [PDF] Colormaps
    • Some fields (e.g. meteorology) have long used rainbow-like colormaps. • Argument is that segments are more easily located. • Turbo post claims that hue is ...
  69. [69]
    Map Color Scheme (Spectrogram / 3D Map)
    The color map determines the spectrum of colors used for the spectrogram display. The number of colors used by a 3D Map trace is set by Color Count.Missing: visualization techniques
  70. [70]
    Spectrogram Graph - REW
    The spectrogram is like a waterfall viewed from above, with the level indicated by colour. The scale showing how colour relates to level is optionally displayed ...
  71. [71]
    A Survey of Colormaps in Visualization - PMC - PubMed Central
    In this survey, we attempt to provide readers with a comprehensive review of colormap generation techniques and provide readers a taxonomy.Missing: spectrogram | Show results with:spectrogram
  72. [72]
    MelSpectrogram - Fon.Hum.Uva.Nl.
    An object of type MelSpectrogram represents an acoustic time-frequency representation of a sound: the power spectral density P(f, t).
  73. [73]
    [PDF] Mel-Spectrogram Enhancement for Improving Both Speech Quality ...
    Feb 22, 2024 · Compared to linear-frequency spectrogram or time-domain waveform, Mel-frequency presents speech in a more compact way (but still perceptually ...
  74. [74]
  75. [75]
    [PDF] A Mel Spectrogram Enhancement Paradigm Based on CWT ... - arXiv
    Jul 9, 2024 · The proposed method enhances Mel spectrograms using CWT to improve speech clarity, as the original Mel spectrogram has fine-grained loss.
  76. [76]
    Spectral and Rhythm Features for Audio Classification with Deep ...
    Oct 9, 2024 · The main advantage of Mel-scaled spectrograms is their effectiveness in capturing relevant features of an audio signal from a human perception ...
  77. [77]
  78. [78]
    Mel-Spectrogram Enhancement for Improving Both Speech Quality ...
    Compared to linear-frequency spectrogram or time-domain waveform, Mel-frequency spectrogram presents speech in a more compact and less-detailed way (but still ...
  79. [79]
    Representing Audio — Open-Source Tools & Data for Music Source ...
    Four types of spectrograms with different scaling for function. Fig. 13 A visual comparison of the four types of spectrograms discussed in this section.
  80. [80]
  81. [81]
    mel-frequency spectrum - learnius
    The mel spectrum is a frequency representation where the frequencies are scaled to better match the human perception of sound.
  82. [82]
  83. [83]
    [PDF] The Fourier uncertainty principles - UChicago Math
    The most popular use of Fourier uncertainty principles is as a description of the natural tradeoff between the stability and measurability of a system, ...
  84. [84]
    [PDF] B3. Short Time Fourier Transform (STFT) - Faculty
    However a longer window length N is going to give more uncertainty on the time the signal changes . This leads to what is called the “uncertainty principle” in ...
  85. [85]
    [PDF] Digital Signal Processing Lecture Outline Spectral Analysis Using ...
    ▫ The window can produce spectral leakage. ▫ Side lobes of the DTFT of the window. ❑ These two are always a tradeoff. ▫ time-frequency uncertainty principle.
  86. [86]
    [PDF] Lecture 16 Limitations of the Fourier Transform: STFT
    The reason for this is that the topic of the lecture, the Short Time Fourier Transform (STFT), is named after the time-domain case. However, note that this ...
  87. [87]
    [PDF] signal reconstruction from stft magnitude: a state of the art
    Sep 23, 2011 · Signal reconstruction from STFT magnitude involves techniques without phase, using the spectrogram (squared magnitude of STFT) to estimate the  ...Missing: loss | Show results with:loss
  88. [88]
    [PDF] The Importance of Phase in Signals
    As suggested by the spectrograms and confiied by listening, intelligibility is lost in the magnitude-only reconstruction but not in the phase-only ...
  89. [89]
    [PDF] Uniqueness of STFT phase retrieval for bandlimited functions - arXiv
    Jun 18, 2020 · It is well-known that signals are uniquely determined (up to global phase) by their STFT magnitude when the underlying window has an ambiguity ...
  90. [90]
    [PDF] A Non-iterative Method for Reconstruction of Phase from STFT ... - ltfat
    In this paper, we consider a particular case of the phase retrieval problem; the reconstruction from the magnitude of the Gabor transform coefficients obtained ...Missing: loss | Show results with:loss
  91. [91]
    Non-uniqueness theory in sampled STFT phase retrieval - arXiv
    Jul 12, 2022 · In the present paper, we answer this question in the negative by providing general non-identifiability results which lead to a non-uniqueness theory for the ...
  92. [92]
  93. [93]
    [2108.06154] $L^2$-stability analysis for Gabor phase retrieval - arXiv
    Aug 13, 2021 · Abstract:We consider the problem of reconstructing the missing phase information from spectrogram data |\mathcal{G} f|, with \mathcal{G}f(x ...
  94. [94]
    [PDF] Spectral Estimation - Electrical and Computer Engineering
    Spectral estimators may be classified as either nonparametric or parametric. The nonparametric ones such as the periodogram, Blackman-Tukey, and minimum vari-.
  95. [95]
    Spectral Estimation - an overview | ScienceDirect Topics
    Spectral estimation techniques are categorized into parametric and non-parametric approaches. · Model order selection, denoted by p, is crucial in parametric ...Theoretical Foundations and... · Parametric and Non... · Applications of Spectral...
  96. [96]
    Parametric Spectral Estimation - MATLAB & Simulink - MathWorks
    Parametric spectral estimation uses Burg, Yule-Walker, covariance, and modified covariance methods based on autoregressive models.
  97. [97]
    What are some of the disadvantages of parametric PSD estimation ...
    May 10, 2019 · A non-parameteric estimation method will be more robust than a parameteric model when the underlying parametric assumptions are not true.
  98. [98]
    Comparison of nonparametric and parametric methods for time ...
    Nov 9, 2020 · The objectives of the current study were to (1) compare how the nonparametric STFT and a parametric time-varying AR model influenced the time- ...
  99. [99]
    Comparison of parametric and nonparametric spectral estimation of ...
    Comparison of parametric and nonparametric spectral estimation of ... spectrograms and to find the best parametric model suitable for this investigation.
  100. [100]
    [PDF] Nonparametric and parametric methods of spectral analysis
    The parametric methods are about rational spectrum estimation with AR time series model driven by white noise.
  101. [101]
  102. [102]
    5.9. The Griffin-Lim algorithm: Signal estimation from modified short ...
    For reconstruction, the Griffin-Lim uses the leakage between time-frequency components. ... The basic algorithm is an iteration which takes the prior estimate of ...
  103. [103]
    [PDF] Signal Estimation from Modified Short-Time Fourier Transform
    Abstract-In this paper, we present an algorithm to estimate a signal from its modified short-time Fourier transform (STFT). This algorithm.
  104. [104]
    librosa.griffinlim — librosa 0.11.0 documentation
    Approximate magnitude spectrogram inversion using the “fast” Griffin-Lim algorithm. Given a short-time Fourier transform magnitude matrix ( S ), the algorithm ...
  105. [105]
    [PDF] An Incremental Algorithm for Signal Reconstruction from Short-Time ...
    Griffin and Lim's algorithm attempts to estimate a signal that is consistent with a given spectrogram by inverting the full STFT at each iteration ...
  106. [106]
    [PDF] AN EFFICIENT ALGORITHM FOR REAL-TIME SPECTROGRAM ...
    The Real-Time Iterative Spectrogram Inversion (RTISI) algorithm for constructing real audio signals from a sequence of magnitude spectra was presented. The ...
  107. [107]
    [PDF] REAL-TIME ITERATIVE SPECTRUM INVERSION WITH LOOK-AHEAD
    Based on G&L, the RTISI algorithm [1] was developed to invert spectrograms in real-time. RTISI generates the initial phase estimation for a new frame from the ...
  108. [108]
    [PDF] Real-Time Spectrogram Inversion Using Phase Gradient Heap ... - ltfat
    In case of real signals, the global phase shift turns into the reconstructed signal sign ambiguity. In practice however, and in the real-time setting in particu ...
  109. [109]
    Single Pass Spectrogram Inversion | IEEE Conference Publication
    Abstract: We present a computationally efficient real-time algorithm for constructing time-domain audio signals from spectrograms.
  110. [110]
    Signal Reconstruction from Mel-spectrogram Based on Bi-level ...
    Jul 23, 2023 · The Griffin-Lim algorithm (GLA) has been widely used because it relies only on the redundancy of STFT and is applicable to various audio signals ...<|separator|>
  111. [111]
    [PDF] Spectrogram inversion and potential applications for hearing research
    Griffin and Lim. (1984) proved that the mean squared error of the STFT magnitude of the generated time-domain signal monotonically decreases with each iteration ...<|separator|>
  112. [112]
    Phase reconstruction from amplitude spectrograms based on ...
    We propose phase reconstruction methods from amplitude spectrograms using directional statistics deep neural networks (DNNs).
  113. [113]
    Back to Ear: Perceptually Driven High Fidelity Music Reconstruction
    Sep 18, 2025 · This approach assesses the signal across various STFT resolutions, enabling it to detect a wide range of artifacts from coarse spectral errors ...
  114. [114]
    [PDF] Real Time Spectrogram Inversion
    Luckily, Griffin and Lim developed an algorithm that accomplishes signal reconstruction from STFT-magnitude [Sue11]. However, Griffin and Lim's algorithm ...
  115. [115]
    [PDF] Spectrogram Inversion for Audio Source Separation via Consistency ...
    In this paper, we design a general framework for deriving spectrogram inversion algorithms, which is based on formulating optimization problems by combining ...
  116. [116]
    [PDF] Experimental Study on Deep Learning for Spectrum Reconstruction ...
    Mar 5, 2025 · SPECTROGRAM FIDELITY SSIM is useful in evaluating spectrogram reconstruction be- ... reconstruction artifacts and improve perceptual ...
  117. [117]
    Multi-band Rectified Flow for Audio Waveform Reconstruction - arXiv
    Jun 2, 2024 · In this study, we introduce RFWave, a cutting-edge multi-band Rectified Flow approach designed to reconstruct high-fidelity audio waveforms from ...
  118. [118]
    Introduction to Spectrograms for Speech Visualization
    A spectrogram turns an audio signal into an image, allowing us to see the distinct patterns of speech. Let's look at a spectrogram for the spoken word "speech" ...
  119. [119]
    What are spectrograms, and how are they used in speech recognition?
    A spectrogram is a visual representation of how the frequencies in a sound signal change over time. It is created by breaking an audio signal into short ...
  120. [120]
    Spectrograms in Speech AI: Visualization and Applications
    Efficient Speech Segmentation: By identifying phonemes, syllables, and words, spectrograms aid in segmenting continuous speech into discrete units. This ...<|separator|>
  121. [121]
    What Is A Spectrogram? Understanding ... - Tomarok Engineering
    Below are some common types: 1. Audio Spectrograms. One of the most familiar applications of spectrogram analysis is in audio processing. When sound is ...Missing: variants | Show results with:variants
  122. [122]
    Robust speech recognition using the modulation spectrogram
    Using the modulation spectrogram as a front end for ASR provides a significant improvement in performance on highly reverberant speech.
  123. [123]
    Audio classification using spectrograms - GeeksforGeeks
    Jul 23, 2025 · The spectrogram, or time-frequency representation of an audio signal, helps us to understand valuable insights about the audio content, like ...
  124. [124]
    A Practical Guide to Spectrogram Analysis for Audio Signal Processing
    Mar 14, 2024 · The paper summarizes spectrogram and gives practical application of spectrogram in signal processing. For analysis, finger-snapping is recorded.
  125. [125]
    Spectrogram Analysis of Animal Sound Production
    Spectrograms visualise the time-frequency content of a signal. They are commonly used to analyse animal vocalisations. Here, we analyse how far we can ...
  126. [126]
    A practical guide for generating unsupervised, spectrogram‐based ...
    Jun 3, 2022 · Hence, each vocalization is first transformed into a spectrogram, a visual representation of the frequency content of the signal over time. Then ...
  127. [127]
    Decoding phonation with artificial intelligence (DeP AI): Proof of ...
    We hypothesize that applying an image‐based neural network approach to classify voice disorders may result in similar advancements in laryngology. ... spectrogram ...
  128. [128]
    Voice pathology identification using mel spectrogram features and ...
    Jul 24, 2025 · [9] proposed a machine learning model for identifying voice disorders using a dataset of 150 voice disorders. ... phonation had just started, to ...
  129. [129]
    Heart Sounds Separation From Lung Sounds Using Independent ...
    Removing heart sounds (HS) from lung sound recordings or vice ... This method applies an ICA algorithm to the spectrograms of two simultaneous lung sound ...
  130. [130]
    The use of spectrograms improves the classification of wheezes and ...
    May 21, 2020 · A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. In spectrograms, adventitious lung ...
  131. [131]
    Deep Learning in Heart Sound Analysis: From Techniques to ...
    These vibrations, i.e., heart sounds, are audible on the chest wall, and their graphical time series representation is known as phonocardiogram (PCG). Four ...
  132. [132]
    A spectrogram image based intelligent technique for automatic ... - NIH
    Jun 25, 2021 · This study intends to develop an efficient diagnostic framework based on time-frequency spectrogram images of EEG signals to automatically identify ASD.
  133. [133]
    Multimodal Fusion of EEG and Audio Spectrogram for Major ... - NIH
    Oct 15, 2024 · This study investigates the classification of Major Depressive Disorder (MDD) using electroencephalography (EEG) Short-Time Fourier-Transform (STFT) ...
  134. [134]
    Pattern Recognition in Vital Signs Using Spectrograms - PMC - NIH
    A.​​ Spectrograms are typically applied to audio signals as they are best suited for frequency domain analysis. They can also be applied to time-series signals ...
  135. [135]
    Robust audio surveillance using spectrogram image texture feature
    Aug 6, 2015 · A sound signal produces a unique texture which can be visualized using a spectrogram image and analyzed for automatic sound recognition.Missing: acoustic | Show results with:acoustic
  136. [136]
    Noise robust audio surveillance using reduced spectrogram image ...
    This paper builds on the technique of feature extraction from the spectrogram image of sound signals for automatic sound recognition.
  137. [137]
    THREE METHODS LISTENING, MACHINE, AND AURAL-VISUAL
    SPEAKER IDENTIFICATION METHODS FALL INTO THREE GROUPS--A LISTENING PROCESS, MACHINE ANALYSIS, AND AURAL-VISUAL COMPARISON USING SPEECH SPECTROGRAMS; EACH ...
  138. [138]
    Proposal and Implementation of Neural Network-Based Approach ...
    The Classic Spectrograms predicted the speakers with 92.96% accuracy. Mel Spectrograms were comparatively more efficient with 93.75% testing accuracy. The ...
  139. [139]
    Analyzing Noise Robustness of Cochleogram and Mel Spectrogram ...
    Dec 31, 2022 · In this study, analysis of noise robustness of Cochleogram and Mel Spectrogram features in speaker recognition using deep learning model is conducted
  140. [140]
    Machine Anomaly Detection using Sound Spectrogram Images and ...
    Aug 14, 2019 · Machine Anomaly Detection using Sound Spectrogram Images and Neural Networks. Sound and vibration analysis is a prominent tool used for ...Missing: surveillance | Show results with:surveillance
  141. [141]
    A robust audio fingerprinting method using spectrograms saliency ...
    In this paper, an audio fingerprinting method is proposed, it uses the spectrogram representation of an audio signal, combined with a global fingerprint ...
  142. [142]
    Efficient music identification using ORB descriptors of the ...
    Jul 11, 2017 · The Shazam algorithm was developed a fingerprinting technique which involved the use of the highest amplitude peaks of the spectrogram (robust ...3 Proposed Method · 4.1 Robustness · 4.2 Other Factors
  143. [143]
    Underwater Acoustic Signal Recognition Based on Salient Features
    Jan 5, 2024 · Through the Demon spectrogram line extraction method, the comprehensive time-frequency characteristics of underwater acoustic signals are ...
  144. [144]
    An Acoustic Surveillance and TinyML-Based for Detecting Illegal ...
    This study presents an innovative acoustic surveillance system utilizing TinyML to detect illegal logging in the rainforests of the Philippines.
  145. [145]
    [2104.01778] AST: Audio Spectrogram Transformer - arXiv
    Apr 5, 2021 · In this paper, we answer the question by introducing the Audio Spectrogram Transformer (AST), the first convolution-free, purely attention-based model for ...
  146. [146]
    A Survey of Deep Learning for Complex Speech Spectrograms - arXiv
    May 13, 2025 · Abstract page for arXiv paper 2505.08694: A Survey of Deep Learning for Complex Speech Spectrograms. ... spectrogram processing. As recent studies ...
  147. [147]
    [PDF] Efficient Neural and Numerical Methods for High-Quality Online ...
    Aug 17, 2025 · Recent work in online speech spectrogram inversion effectively combines Deep Learning with the Gradient Theorem to predict.
  148. [148]
    Unconditional Spectrogram Generation using Diffusion Architectures
    Oct 11, 2025 · To address this critical issue, we propose a diffusion-based generative model for synthesizing realistic and diverse spectrograms ... spectrograms ...<|control11|><|separator|>
  149. [149]
    SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms ...
    Jun 25, 2024 · We propose SpecMaskGIT, a light-weighted, efficient yet effective TTA model based on the masked generative modeling of spectrograms.
  150. [150]
    cMelGAN: An Efficient Conditional Generative Model Based on Mel ...
    May 15, 2022 · The goal for this project was to develop a genre-conditional generative model of music based on Mel spectrograms and evaluate its performance.
  151. [151]
    Boosting Diffusion Model for Spectrogram Up-sampling in Text-to ...
    Jun 7, 2024 · In this paper, we systematically investigate varied diffusion models for up sampling stage, which is the main bottleneck for streaming synthesis ...
  152. [152]
    [PDF] arXiv:2206.09396v1 [eess.AS] 19 Jun 2022
    Jun 19, 2022 · construct the augmented spectrogram by multiplying the source and filter component. Figure 1 illustrates the effect of source- filter warping with varying ...
  153. [153]
    A Simple Data Augmentation Method for Encoder–Decoder Speech ...
    Jul 28, 2022 · Figure 9 shows the original spectrogram and the augmented spectrogram when FrameAugment is applied. ... ESPnet: End-to-End Speech Processing Toolkit. arXiv ...
  154. [154]
    Hit detection in audio mixtures by means of a physics-aware Deep ...
    Feb 1, 2025 · ... in the reconstruction of the (estimated) augmented spectrogram S = H T ( S ̃ X ̃ ) . Indeed, thanks to the Hankel structure of each dictionary sub-block W S ...