
Spectral flatness

Spectral flatness, also known as the tonality coefficient or Wiener entropy, is a metric in digital signal processing that quantifies the uniformity of a signal's power spectral density across a frequency band, distinguishing between noise-like (flat) and tonal (peaked) characteristics.^[1] It is computed as the ratio of the geometric mean to the arithmetic mean of the power spectrum values, yielding a value between 0 and 1, where 1 represents perfect flatness akin to white noise and 0 indicates a highly tonal spectrum with concentrated energy in few frequencies.^[1] This measure was formalized by James D. Johnston in 1988 as part of perceptual models for audio coding, where it helps estimate signal tonality to optimize noise shaping and masking thresholds.^[2]

In practice, spectral flatness is often expressed in decibels as 10 log₁₀ of the ratio for finer granularity, with typical ranges from -60 dB (tonal) to 0 dB (noisy). This enables its use in deriving a tonality coefficient α (ranging from 0 for noise-like to 1 for tonal) that scales perceptual masking levels: tonal signals (higher α) apply higher masking offsets (e.g., 14.5 dB) than noise-like ones (lower α, e.g., 5.5 dB).^[2] The computation typically involves the fast Fourier transform (FFT) to obtain the power spectrum, followed by mean calculations over critical bands or the full spectrum, making it efficient for real-time analysis.^[1] Originally developed for transform coding in audio compression, such as achieving transparent quality at 128 kbit/s, it has since been generalized for non-Gaussian processes to detect excessive structure beyond simple tonality.^[2]^[3]

Beyond audio, spectral flatness finds applications in robust signal matching, where it aids identification under distortions by comparing spectral uniformity; in filter design to evaluate passband deviations; and in acoustic analysis for segmentation, such as distinguishing speech from music based on tonal content.^[4]^[5]^[6] Its perceptual relevance stems from human hearing's sensitivity to spectral structure, influencing standards like MPEG audio layers, and it remains a key feature in modern tools for audio processing and machine learning-based sound classification.^[2]^[7]
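The mapping from flatness in decibels to the tonality coefficient α and a masking offset can be sketched as follows. This is a minimal illustration, assuming the commonly quoted linear mapping α = min(SF_dB / -60, 1) and a simple interpolation between the 14.5 dB and 5.5 dB offsets mentioned above; the function names are hypothetical, and any band-dependent terms used in a full psychoacoustic model are omitted.

```python
def tonality_coefficient(sf_db: float, sf_db_max: float = -60.0) -> float:
    """Map spectral flatness in dB to a tonality coefficient in [0, 1].

    Assumption: linear mapping with -60 dB treated as fully tonal.
    """
    ratio = sf_db / sf_db_max  # both values are <= 0, so the ratio is >= 0
    return min(max(ratio, 0.0), 1.0)


def masking_offset_db(alpha: float,
                      tonal_offset_db: float = 14.5,
                      noise_offset_db: float = 5.5) -> float:
    """Interpolate between the noise-like and tonal masking offsets."""
    return alpha * tonal_offset_db + (1.0 - alpha) * noise_offset_db


# A frame measured at -30 dB flatness sits halfway between the two extremes.
alpha_mid = tonality_coefficient(-30.0)
offset_mid = masking_offset_db(alpha_mid)
```

Clamping keeps α at 1 for frames that are more tonal than the -60 dB reference and guards against small positive dB values caused by rounding.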

Fundamentals

Definition

Spectral flatness, also known as the tonality coefficient or Wiener entropy, is a measure in digital signal processing that assesses the uniformity of a signal's power spectral density (PSD). It quantifies the degree to which the signal's frequency content approximates the even distribution characteristic of white noise, where power is equally spread across all frequencies.^[8] This metric was first introduced by James D. Johnston in 1988 as part of developing perceptual models for audio coding, enabling the differentiation between structured and random spectral components in sound signals.^[2] At its core, spectral flatness highlights the contrast between tonal signals, such as pure sinusoids with concentrated energy at discrete frequencies leading to a peaked spectrum, and noise-like signals featuring a broad, even power distribution that mimics randomness.^[2] In audio analysis, it serves as a foundational tool for evaluating signal characteristics relevant to perception and processing.^[8]

Interpretation

Spectral flatness quantifies the uniformity of a signal's power spectral density, with values approaching 1 indicating a nearly flat spectrum typical of white noise or uncorrelated random processes, where energy is evenly distributed across frequencies.^[2] Conversely, values near 0 reflect a spectrum with energy concentrated in a limited number of frequency bins, characteristic of tonal or harmonic signals such as pure sinusoids or periodic waveforms.^[9] This distinction arises because the measure compares the geometric mean to the arithmetic mean of the power spectrum values, yielding a normalized ratio that highlights deviations from ideal noise-like behavior.^[2] In psychoacoustics, spectral flatness serves as an indicator of perceived sound quality, linking spectral characteristics to human auditory perception. Higher flatness values correspond to sounds perceived as noisier due to their broadband, unstructured nature, while lower values evoke a sense of tonality, akin to pitched or musical elements that align with harmonic structures in hearing.^[8] This relevance stems from its use in models that differentiate noise-like maskers from tonal ones in auditory masking experiments.^[10] From an information-theoretic perspective, spectral flatness is related to the Wiener entropy of the power spectrum, offering a measure of signal predictability; flatter spectra imply higher entropy and thus greater unpredictability, while peaked spectra suggest lower entropy and more deterministic structure. This connection underscores its role in assessing stochasticity in signals. In standards like MPEG-7, it functions as an audio descriptor for characterizing spectral tonality in content analysis.^[11]

Formulation

Mathematical Expression

Spectral flatness, denoted as SF, is mathematically defined as the ratio of the geometric mean to the arithmetic mean of the power spectral density (PSD) values across N frequency bins. Let x(n) for n = 0, 1, \dots, N-1 represent the PSD values. The arithmetic mean is given by \frac{1}{N} \sum_{n=0}^{N-1} x(n), while the geometric mean is \left( \prod_{n=0}^{N-1} x(n) \right)^{1/N}, which is equivalently expressed as \exp\left( \frac{1}{N} \sum_{n=0}^{N-1} \ln x(n) \right) to enhance numerical stability in computation. Thus,

SF = \frac{\exp\left( \frac{1}{N} \sum_{n=0}^{N-1} \ln x(n) \right)}{\frac{1}{N} \sum_{n=0}^{N-1} x(n)}.

Under the name Wiener entropy, the same ratio is sometimes reported on a logarithmic scale.^[12] The formulation originates from early work on linear prediction in speech analysis, where the measure quantifies spectral uniformity.^[13]

The bounds follow directly from the inequality between arithmetic and geometric means, which states that the arithmetic mean is always greater than or equal to the geometric mean for positive real numbers, with equality holding if and only if all values are identical. For a constant PSD, where x(n) = c for all n and some constant c > 0, both means equal c, yielding SF = 1, indicating perfect flatness akin to white noise. Conversely, for a delta-like spectrum, in which one x(k) > 0 and all others are zero, the product contains zero terms, so the geometric mean is zero (in the logarithmic form, \ln 0 requires careful handling, typically by excluding zeros or using limits), while the arithmetic mean remains positive. The result is SF = 0, reflecting high tonality or peaked energy concentration.

In sub-band analysis, the formula is applied analogously to the PSD values restricted to a specific frequency partition, allowing localized assessment of flatness without altering the core expression. This measure inherently normalizes to the interval [0, 1], providing a bounded indicator of spectral uniformity.
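The ratio above can be sketched in a few lines of NumPy. This is an illustrative implementation rather than a reference one: the function name is hypothetical, and returning 0 whenever any bin is exactly zero is a convention chosen here to match the limiting case described in the text.

```python
import numpy as np


def spectral_flatness(psd):
    """Geometric mean over arithmetic mean of non-negative PSD values."""
    x = np.asarray(psd, dtype=float)
    if x.ndim != 1 or x.size == 0:
        raise ValueError("psd must be a non-empty 1-D sequence")
    if np.any(x < 0):
        raise ValueError("psd values must be non-negative")
    if np.any(x == 0):
        # A single zero makes the product, and so the geometric mean, zero.
        return 0.0
    # Averaging logarithms avoids overflow or underflow in the raw product.
    geometric_mean = np.exp(np.mean(np.log(x)))
    arithmetic_mean = np.mean(x)
    return float(geometric_mean / arithmetic_mean)


flat_case = spectral_flatness([3.0, 3.0, 3.0, 3.0])   # constant PSD
two_bin_case = spectral_flatness([1.0, 4.0])          # GM = 2, AM = 2.5
delta_case = spectral_flatness([0.0, 0.0, 5.0, 0.0])  # one non-zero bin
```

For the two-bin case the geometric mean is 2 and the arithmetic mean is 2.5, so the ratio is 0.8, strictly below the value of 1 obtained for the constant spectrum.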

Normalization and Units

Spectral flatness is typically expressed on a linear scale, where values range from 0 to 1, with 1 indicating a perfectly flat spectrum akin to white noise and values approaching 0 signifying a highly tonal or peaked spectrum.^[14] In audio engineering, it is often converted to a decibel (dB) scale for perceptual analysis, defined as

SF_{\text{dB}} = 10 \log_{10} (SF),

where SF is the linear spectral flatness value; this yields 0 dB for perfect flatness and approaches -\infty dB for highly tonal signals.^[15] To handle numerical issues such as zero-valued frequency bins that could lead to undefined logarithms in the geometric mean computation, a small positive constant (e.g., 10^{-10}) is commonly added to the power spectrum values before calculation.^[16] For multi-resolution analysis, spectral flatness can be computed separately for each sub-band, using the ratio of that band's geometric to arithmetic mean, and then averaged across bands to obtain an overall value, enabling localized assessments of spectral uniformity.^[17]
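The decibel conversion, the small additive constant, and the sub-band averaging can be illustrated together. The sketch below is only one plausible arrangement: the function names, the `band_edges` parameter (bin indices marking band boundaries), and the default constant of 1e-10 are assumptions made for the example.

```python
import numpy as np

EPS = 1e-10  # small constant added to every bin so that log(0) never occurs


def flatness_db(psd, eps=EPS):
    """Spectral flatness in dB, computed as 10*log10(GM/AM) in the log domain."""
    x = np.asarray(psd, dtype=float) + eps
    log_gm = np.mean(np.log(x))
    log_am = np.log(np.mean(x))
    return float(10.0 / np.log(10.0) * (log_gm - log_am))


def mean_subband_flatness(psd, band_edges, eps=EPS):
    """Average of the linear flatness computed separately in each sub-band.

    band_edges holds bin indices; band i covers bins
    band_edges[i] up to (but not including) band_edges[i + 1].
    """
    x = np.asarray(psd, dtype=float) + eps
    values = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = x[lo:hi]
        values.append(np.exp(np.mean(np.log(band))) / np.mean(band))
    return float(np.mean(values))


# Two internally flat bands at very different levels.
stepped = np.array([1.0, 1.0, 1.0, 1.0, 100.0, 100.0, 100.0, 100.0])
full_band_db = flatness_db(stepped)
per_band = mean_subband_flatness(stepped, [0, 4, 8])
```

In this example each band is flat on its own, so the sub-band average is close to 1 even though the full-band value is well below 0 dB, which is the kind of localized assessment the text describes.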

Properties

Range and Bounds

Spectral flatness (SF), also known as the spectral flatness measure, is bounded in the linear scale between 0 and 1, with values approaching 0 for highly tonal signals and 1 for perfectly noise-like signals.^[3] The upper bound of exactly 1 is attained when the power spectral density (PSD) is uniform across all frequencies, as in the case of white noise.^[18] Conversely, the lower bound of 0 is reached in the limiting case of an ideal Dirac delta function in the frequency domain, representing energy concentrated at a single frequency.^[19] In decibel scale, defined as

\text{SF}_\text{dB} = 10 \log_{10} (\text{SF}),

the measure ranges from -\infty dB, corresponding to the tonal extreme, to 0 dB for white noise.^[20] This logarithmic representation is commonly used for practical reporting due to its alignment with perceptual scales in audio processing.^[20] SF demonstrates monotonicity with respect to spectral concentration: it decreases as peaks in the PSD sharpen, indicating a transition from noise-like to more tonal characteristics.^[19] The measure is invariant to multiplicative scaling of the PSD, since it relies on the ratio of the geometric mean to the arithmetic mean, but it is sensitive to frequency bin resolution in discrete Fourier transform-based implementations, where insufficient resolution can introduce empty bins or distort peak representations.^[21]
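Two of these properties, invariance to scaling and the decrease in flatness as a peak sharpens, can be checked numerically. The sketch below assumes a synthetic PSD made of a unit noise floor plus a Gaussian-shaped peak whose total energy is held fixed while its width shrinks; the helper names and parameter values are hypothetical.

```python
import numpy as np


def spectral_flatness(psd):
    """Geometric mean over arithmetic mean of strictly positive PSD values."""
    x = np.asarray(psd, dtype=float)
    return float(np.exp(np.mean(np.log(x))) / np.mean(x))


def peaked_psd(n_bins, sigma_bins, peak_energy):
    """Unit noise floor plus a Gaussian peak carrying a fixed total energy."""
    k = np.arange(n_bins)
    shape = np.exp(-0.5 * ((k - n_bins // 2) / sigma_bins) ** 2)
    return 1.0 + peak_energy * shape / shape.sum()


# Multiplying the whole PSD by a constant leaves the ratio unchanged.
psd = peaked_psd(1024, 8.0, 1024.0)
scaled_matches = bool(np.isclose(spectral_flatness(psd),
                                 spectral_flatness(1000.0 * psd)))

# Same peak energy, progressively narrower (sharper) peak.
flatness_by_width = [spectral_flatness(peaked_psd(1024, s, 1024.0))
                     for s in (32.0, 8.0, 2.0)]
```

Holding the peak energy constant matters here: if only the width were reduced at a fixed peak height, the spectrum would approach the flat noise floor and the flatness would rise instead.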

Relation to Other Measures

Spectral flatness, also known as the Wiener entropy, can be read as an information-theoretic measure that assesses the randomness or predictability of a signal's power spectral density (PSD). It is closely related to the Shannon entropy of the normalized PSD, H = -\sum_k p_k \log_2 p_k, where p_k is the fraction of total power in frequency bin k; both increase with greater spectral uniformity, with maximum values corresponding to a flat, white-noise-like spectrum. This connection highlights spectral flatness's role in quantifying the informational uniformity of spectral energy distribution.^[8]

In contrast to measures like spectral centroid and spectral flux, spectral flatness provides a complementary perspective on spectral characteristics by emphasizing flatness over location or dynamics. The spectral centroid computes the weighted average frequency, often interpreted as the "center of mass" or perceptual brightness of the spectrum, focusing on the distribution's central tendency rather than its uniformity. Spectral flux, meanwhile, quantifies the magnitude of changes between consecutive spectral frames, capturing temporal evolution and supporting onset detection in signals. Together these metrics offer a multifaceted analysis of tonality: spectral flatness highlights noise-like versus harmonic content through global evenness, whereas centroid and flux address positional and transitional aspects, respectively, enabling more robust signal classification in audio processing tasks.^[22]

Spectral flatness also relates to broader information-theoretic constructs, particularly multi-information, which measures statistical dependence among a set of variables. As established by Dubnov (2004), for Gaussian processes spectral flatness is equivalent to the rate of growth of multi-information (the multi-information rate), reflecting the redundancy or structure imposed by linear dependencies in the signal.
This equivalence extends the measure's interpretive power, portraying deviations from flatness as indicators of correlated, non-independent frequency behaviors, and has been generalized to non-Gaussian linear processes to account for higher-order dependencies. Such ties position spectral flatness as a bridge between classical signal processing and multivariate information theory.^[23]
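The contrast between flatness, Shannon entropy of the normalized PSD, and spectral centroid can be made concrete with a small numerical comparison. The sketch below is illustrative only: the function name, the dictionary keys, and the toy eight-bin spectra are assumptions, and spectral flux is left out because it needs consecutive frames.

```python
import numpy as np


def spectrum_descriptors(psd, freqs_hz):
    """Flatness, Shannon entropy (bits) and centroid (Hz) of one positive PSD."""
    x = np.asarray(psd, dtype=float)
    f = np.asarray(freqs_hz, dtype=float)
    if x.shape != f.shape or np.any(x <= 0):
        raise ValueError("psd and freqs_hz must match and psd must be positive")
    p = x / x.sum()  # fraction of total power per bin
    return {
        "flatness": float(np.exp(np.mean(np.log(x))) / np.mean(x)),
        "entropy_bits": float(-np.sum(p * np.log2(p))),
        "centroid_hz": float(np.sum(f * p)),
    }


freqs = np.arange(8) * 1000.0
flat = spectrum_descriptors(np.ones(8), freqs)
low_peak = spectrum_descriptors([100, 1, 1, 1, 1, 1, 1, 1], freqs)
high_peak = spectrum_descriptors([1, 1, 1, 1, 1, 1, 1, 100], freqs)
```

Moving the peak from the lowest to the highest bin changes the centroid but leaves flatness and entropy unchanged, since both depend only on how evenly power is distributed and not on where it sits.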

Computation

Estimation Techniques

To estimate spectral flatness from a discrete-time signal, the power spectral density (PSD) is first obtained, typically through the discrete Fourier transform (DFT) applied to windowed segments of the signal or, for time-varying analysis, via the short-time Fourier transform (STFT). The STFT involves dividing the signal into short, overlapping frames to capture local spectral characteristics, with common frame lengths of 20 to 50 milliseconds for audio signals sampled at rates like 22.05 kHz or 44.1 kHz.^[24]^[16]^[6]

The computation then follows these steps on each frame or spectral slice. First, a window function such as the Hann or Hamming window is applied to the frame to mitigate spectral leakage caused by finite-length segmentation. Overlaps between frames, often 50% or more (e.g., 10 ms shift for 20 ms frames), ensure smooth transitions and reduce artifacts. Next, the DFT or fast Fourier transform (FFT) is computed on the windowed frame, with the magnitude squared yielding the periodogram-based PSD estimate; typical FFT sizes range from 512 to 2048 points to balance resolution and efficiency, giving frequency bins spaced roughly 10 to 90 Hz apart at these sampling rates.^[16]^[6]^[24] The arithmetic mean of the PSD values is then calculated across the relevant frequency bins, often limited to a perceptually important band (e.g., 500 Hz to 4 kHz) rather than the full spectrum. For the geometric mean, the logarithms of the PSD values are averaged, and the result is exponentiated; to prevent undefined logarithms from zero-valued bins, a small positive constant such as 10^{-10} is added to all PSD values for numerical stability. Finally, spectral flatness is obtained as the ratio of this geometric mean to the arithmetic mean, yielding a value between 0 and 1 that quantifies spectral uniformity.^[16]^[24]^[6]
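The steps described above can be sketched as a frame-by-frame NumPy routine. This is a minimal illustration under stated assumptions, not a reference implementation: the function name and all default values (25 ms frames, 10 ms hop, 1024-point FFT, a 500 Hz to 4 kHz band, and a constant of 1e-10) are choices made for the example.

```python
import numpy as np


def framewise_flatness(signal, sample_rate, frame_ms=25.0, hop_ms=10.0,
                       n_fft=1024, band_hz=(500.0, 4000.0), eps=1e-10):
    """Spectral flatness of each Hann-windowed frame within a frequency band."""
    x = np.asarray(signal, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if frame_len > n_fft:
        raise ValueError("n_fft must be at least the frame length in samples")
    if x.size < frame_len:
        raise ValueError("signal is shorter than one frame")

    window = np.hanning(frame_len)  # reduces spectral leakage
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    in_band = (freqs >= band_hz[0]) & (freqs <= band_hz[1])

    n_frames = 1 + (x.size - frame_len) // hop_len
    flatness = np.empty(n_frames)
    for i in range(n_frames):
        start = i * hop_len
        frame = x[start:start + frame_len] * window
        # Periodogram estimate: squared magnitude of the zero-padded FFT.
        power = np.abs(np.fft.rfft(frame, n=n_fft)) ** 2
        band = power[in_band] + eps  # eps keeps the logarithm finite
        flatness[i] = np.exp(np.mean(np.log(band))) / np.mean(band)
    return flatness


# One second of white noise versus a 1 kHz sinusoid at 22.05 kHz.
sr = 22050
rng = np.random.default_rng(0)
t = np.arange(sr) / sr
noise_flatness = framewise_flatness(rng.standard_normal(sr), sr).mean()
tone_flatness = framewise_flatness(np.sin(2 * np.pi * 1000.0 * t), sr).mean()
```

Even for white noise the per-frame value stays well below 1, because a single periodogram fluctuates strongly from bin to bin; averaging over frames or smoothing the PSD brings the estimate closer to the ideal.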

Practical Implementation

In practical implementations of spectral flatness computation, numerical stability is a key concern due to the involvement of logarithmic operations on power spectral density (PSD) values, which can include zeros or near-zeros leading to undefined or infinite results. To mitigate division by zero or log(0) errors, a common approach is to apply a small positive threshold, such as \epsilon = 10^{-10}, by thresholding the PSD magnitudes before computing means; for instance, replacing values below this threshold ensures finite logarithms without significantly altering the measure for typical audio signals.^[25] Software libraries facilitate efficient PSD estimation and flatness calculation. In Python, NumPy provides vectorized array operations for mean computations, while Librosa offers a dedicated spectral_flatness function that internally handles short-time Fourier transform (STFT) via FFT, applies the necessary thresholding for stability, and returns frame-wise flatness values, making it suitable for audio analysis pipelines.^[16] Similarly, MATLAB's Signal Processing Toolbox includes the spectralFlatness function, which computes the measure directly from signals or spectrograms generated by spectrogram, incorporating built-in handling for edge cases in PSD estimation.^[1] For real-time or large-scale processing, efficiency optimizations are essential. 
The overlap-add (OLA) method in STFT implementations allows continuous analysis by overlapping frames (typically 50-75% overlap with Hann windows), enabling low-latency updates of spectral flatness without full signal buffering, as used in audio streaming applications.^[26] Vectorized FFT routines further accelerate PSD computation; for example, the FFTPACK library, a Fortran package for fast Fourier transforms, supports efficient real and complex transforms and is integrated into tools like SciPy for high-performance numerical arrays.^[27] Non-stationary signals, common in audio, require averaging spectral flatness across multiple STFT frames to capture temporal variations robustly, such as in long-term spectral flatness measures that aggregate over extended windows for stable estimates in voice activity detection.^[24]
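Averaging flatness over recent frames, as mentioned for non-stationary signals, can be done incrementally so that no full signal buffer is needed. The sketch below assumes a plain moving average of per-frame flatness over a fixed number of frames; the class name and parameters are hypothetical, and this is not the specific long-term measure defined in the cited voice activity detection work.

```python
from collections import deque

import numpy as np


class RunningFlatness:
    """Moving average of per-frame spectral flatness over the last n frames."""

    def __init__(self, n_frames=30, eps=1e-10):
        self._history = deque(maxlen=n_frames)  # oldest values drop out
        self._eps = eps

    def update(self, frame_power):
        """Add one frame's power spectrum and return the current average."""
        band = np.asarray(frame_power, dtype=float) + self._eps
        flatness = float(np.exp(np.mean(np.log(band))) / np.mean(band))
        self._history.append(flatness)
        return sum(self._history) / len(self._history)


# A flat frame followed by two strongly peaked frames, averaged over 2 frames.
tracker = RunningFlatness(n_frames=2)
peaked = np.zeros(16)
peaked[3] = 1.0
after_flat = tracker.update(np.ones(16))
after_one_peak = tracker.update(peaked)
after_two_peaks = tracker.update(peaked)
```

The window length trades responsiveness against stability: a longer history gives a steadier estimate but reacts more slowly when the signal switches between tonal and noise-like segments.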

Applications

Audio and Music Processing

In perceptual audio coding, spectral flatness plays a key role in estimating the tonality of audio signals to model psychoacoustic masking thresholds and optimize bit allocation. Introduced by Johnston in 1988, the spectral flatness measure (SFM) quantifies how noise-like or tonal a signal is by comparing the geometric and arithmetic means of its power spectral density, enabling efficient compression in standards like MP3 while minimizing perceptual distortion.^[2] This approach was foundational for subsequent codecs, including Advanced Audio Coding (AAC), where SFM informs noise-to-mask ratio calculations in filter bank-based psychoacoustic models to allocate fewer bits to noise-like components. The MPEG-7 multimedia content description standard incorporates AudioSpectralFlatness as a low-level audio descriptor to characterize the flatness of a signal's short-term power spectrum density function, facilitating content-based retrieval and similarity matching in music databases.^[28] This descriptor, computed over frequency bands, supports applications like audio indexing by providing a normalized measure (0 to 1) of spectral randomness, aiding in the differentiation of tonal music from percussive or noisy segments.^[29] Recent machine learning applications from 2020 to 2025 have leveraged spectral flatness as a feature in audio processing tasks. 
In singing voice detection, it contributes to spectral descriptors that enhance classification accuracy in polyphonic music, as seen in surveys and models distinguishing vocals from instruments via tonality cues in audio scene recognition.^[30] For speech enhancement, spectral flatness aids noise profiling in machine learning frameworks by assessing the noise-like quality of interfering signals, improving denoising performance in real-time systems.^[31] Additionally, a 2025 approach encodes audio features into images for voice characteristic representation, incorporating spectral flatness in the red channel to capture tonal versus noisy attributes, enabling better multimodal analysis of speaker traits.^[32]

Biomedical and Other Fields

In biomedical signal processing, spectral flatness serves as a key feature for analyzing electroencephalogram (EEG) signals to detect epileptic seizures, where lower values indicate tonal or structured activity typical of ictal states, contrasting with higher flatness in noisy interictal periods. For instance, in noise-robust seizure detection algorithms, spectral flatness is combined with bandwidth and entropy measures to quantify signal irregularities under Gaussian noise, achieving improved classification accuracy on benchmark EEG datasets.^[33] Similarly, subband spectral analysis incorporating spectral flatness has been employed in automated EEG seizure detection systems, enhancing computational efficiency by distinguishing seizure onsets through flatness variations across frequency bands.^[34] In ethology, particularly for assessing birdsong complexity, spectral flatness quantifies the tonality-to-noisiness gradient in avian vocalizations, with values near 0 indicating pure tones and higher values reflecting noisy, complex spectra associated with behavioral diversity. This measure has been integrated into bioacoustic analyses to evaluate phylogenetic signals in vocal learning species, revealing how spectral flatness correlates with evolutionary adaptations in song structure.^[35] In scoping reviews of bioacoustics for animal behavior, spectral flatness is listed among standard metrics for measuring vocal features in animals, including birds, aiding studies on communication and environmental influences without requiring exhaustive feature sets.^[36] Beyond biomedicine, spectral flatness contributes to psychoacoustic models of tonality perception, where partial variants like the partial spectral flatness measure (PSFM) estimate perceived tonal content by assessing spectrum uniformity, outperforming traditional metrics in perceptual audio coding tasks. 
Perceptual evaluations confirm that spectral flatness variants predict listener judgments of spectral variance, with higher flatness linked to noise-like sensations in controlled listening experiments.^[8] In emerging quantum signal processing, quantum-adapted spectral flatness, computed via quantum Fourier transforms, enhances audio steganalysis by detecting hidden embeddings in quantum-secure channels, integrated with neural networks for improved detection rates in machine learning frameworks.^[37] In virtual reality (VR) applications, spectral flatness informs machine learning-based spatial audio rendering by serving as an input feature for estimating directional parameters, enabling realistic sound localization in immersive environments as reviewed in data-driven audio processing surveys. This application underscores its role in perceptual audio synthesis for VR, where flatness helps balance tonal and diffuse components in binaural rendering.^[38]

References

[1]
spectralFlatness - Spectral flatness for signals and spectrograms
flatness = spectralFlatness(x,f) returns the spectral flatness of the signal, x, over time. How the function interprets x depends on the shape of f.
[2]
[PDF] Transform coding of audio signals using perceptual noise criteria
Johnston, “A method of estimating the perceptual entropy of an audio signal,” submitted to ICASSP '88. [7] -, “Digital coding of musical sound-Some statistics ...
[3]
[PDF] Generalization of Spectral Flatness Measure for Non-Gaussian ...
The Spectral Flatness Measure (SFM) quantifies how much tone-like a sound is, and is equivalent to the rate of growth of multi-information (MIR) for Gaussian ...
[4]
Robust matching of audio signals using spectral flatness features
This paper discusses the problem of robust identification of audio signals by matching them to a known reference. In order to perform well under realworld ...
[5]
Spectral Flatness - Crystal Instruments
Aug 28, 2021 · Spectral flatness is a way to quantify the deviation of a passband from being perfectly flat across the frequency spectrum.
[6]
[PDF] Using a Spectral Flatness Based Feature for Audio Segmentation ...
The Spectral Flatness Measure. (SFM) and the corresponding tonality coefficient (Johnston 1988) are used to quantify the tonal quality, i.e. how much tone ...
[7]
Note on measures for spectral flatness | Electronics Letters
Spectral flatness is a feature of acoustic signals that has been useful in many audio signal processing applications. The traditional definition of spectral ...
[8]
Perceptual evaluation of measures of spectral variance
Jun 5, 2018 · Some of the common measures of whiteness include the Wiener Entropy or Spectral Flatness Measure (SFM),5 Ljung-Box test,6 and Drouiche Test.7.
[9]
Spectral Flatness - an overview | ScienceDirect Topics
It serves to distinguish between noise-like signals, which exhibit a flat spectrum, and tone-like signals, which have a peaked spectrum. 1. Formally, spectral ...
[10]
[PDF] a psychoacoustic model with partial spectral flatness measure for ...
[10] J.D. Johnston, “Estimation of perceptual entropy using noise masking criteria,” in Acoustics, Speech, and Signal Processing, 1988. ICASSP-88., 1988 ...
[11]
Note on measures for spectral flatness - ResearchGate
Aug 6, 2025 · This is confirmed by the values of spectral flatness determined using Wiener entropy described by the following formula [37] : ...
[12]
[PDF] Content-based Identification of Audio Material Using MPEG-7 Low ...
The so-called SFM (Spectral Flatness Measure) [16] is a function which is related to the tonality aspect of the audio signal and can therefore be used as a ...
[13]
https://ieeexplore.ieee.org/document/1162572
[14]
[PDF] Linear prediction of audio signals - ISCA Archive
SFMR(dB) = 10 log10 exp 1. M. ˜ fs f=0ln |R(f)|2. 1. M. ˜ fs f=0 |R(f)|2. , (2) where ... sacrifices part of its zeros to achieve spectral flatness in the high-.
[15]
librosa.feature.spectral_flatness — librosa 0.11.0 documentation
A high spectral flatness (closer to 1.0) indicates the ... spectral flatness for each frame. The returned value is in [0, 1] and often converted to dB scale.
[16]
[PDF] Feature Vectors
Spectral flatness. • It reflects the flatness properties of the power ... the average of the sub-band flatness values. SFb = " kb. X(kb ). 2. Nb. 1. Nb. X(kb ).
[17]
[PDF] A Segmental Spectral Flatness Measure for Harmonic-Percussive ...
Knowing if an audio signal originates from a harmonic or a percussive source can be very helpful for further processing in a lot of audio signal processing ...
[18]
[PDF] Modified Spectral Flatness Approach for Robust Train Localisation
[5] N. Madhu “Note on measures for spectral flatness” in ELECTRONICS LET-. TERS 5th November 2009 Vol. 45 No. 23. [6] Localisation Working Group (LWG) ...
[19]
[PDF] Speech Enhancement Using Spectral Flatness Measure Based ...
[6]. GRAY, A.H., and MARKEL, J.D. “A spectral-flatness measure for studying the autocorrelation method of linear prediction of speech analysis,” IEEE ...
[20]
https://www.iosrjournals.org/iosr-jvlsi/papers/vol7-issue2/Version-1/F0702014146.pdf
[21]
https://digital-library.theiet.org/doi/pdf/10.1049/el.2009.1977
[22]
Generalization of spectral flatness measure for non-Gaussian linear processes
[23]
Efficient voice activity detection algorithm using long-term spectral ...
Jul 16, 2013 · A low spectral flatness indicates that the spectral power is less uniform in frequency structure, and this would typically sound like speech.
[24]
Source code for librosa.feature.spectral
Compute the spectral centroid. Each frame of a magnitude spectrogram is normalized and treated as a distribution over frequency bins.
[25]
Overlap-Add STFT Processing - Stanford CCRMA
This chapter discusses use of the Short-Time Fourier Transform (STFT) to implement linear filtering in the frequency domain.
[26]
FFTPACK - NetLib.org
FFTPACK is a package of Fortran subprograms for the fast Fourier transform of periodic and other symmetric sequences. It includes complex, real, sine, cosine, ...
[27]
[DOC] ISO/IEC JTC 1/SC 29 N
Description of the audio spectral flatness of the audio signal. loEdge ... MPEG-7 description. There are additional descriptive tags such as key that ...
[28]
(PDF) Streaming Audio Using MPEG–7 Audio Spectrum Envelope to ...
Apr 13, 2017 · 60-74. [19] MPEG–7. MPEG 7 Library: A Complete API to Manipulate ... Audio spectral flatness (the flatness properties of the short-term ...
[29]
Singing Voice Detection: A Survey - PMC - NIH
Jan 12, 2022 · Singing voice detection or vocal detection is a classification task that ... spectral flatness) as well as special features such as fluctograms [15].
[30]
Noise profiling for speech enhancement employing machine ...
Dec 16, 2022 · Noise profiling for speech enhancement employing machine learning models ... Spectral flatness is a measure of an audio sound spec- trum that provides ...
[31]
Audio-to-Image Encoding for Improved Voice Characteristic ... - arXiv
Mar 7, 2025 · ... spectral flatness, spectral contrast, chroma, and harmonic-to-noise ratio), and the blue channel comprises subframes representing these ...
[32]
Investigating the effects of Gaussian noise on epileptic seizure ...
Investigating the effects of Gaussian noise on epileptic seizure detection: The role of spectral flatness, bandwidth, and entropy ... frequency resolution, and ...
[33]
A computationally efficient automated seizure detection method ...
... EEG segments into sub-bands, a total of four different spectral features including spectral centroid, spectral flatness, spectral spread, and spectral slope ...
[34]
Phylogenetic signal in the vocalizations of vocal learning and vocal ...
Spectral flatness indicates the tonality versus noisiness of a signal, on a gradient from 0 for white noise (equal energy at all frequencies) to 1.0 for a ...
[35]
A scoping review of the use of bioacoustics to assess various ...
Spectral flatness of a sound, calculated as the ratio of a power spectrum's geometric mean to its arithmetic mean measured on a logarithmic scale (higher ...
[36]
Towards quantum audio steganalysis using synergy of quantum ...
The statistical analysis of these features includes the quantum spectral center (QSC), quantum spectral bandwidth (QSB), quantum spectral flatness measurement ( ...
[37]
An overview of machine learning and other data-based methods for ...
May 16, 2022 · The input to the network were well-known hand-crafted audio features such as spectral centroid, spectral flatness, or spectral flux. More ...