
Mel scale

The Mel scale is a perceptual scale of pitches that approximates the nonlinear way humans perceive differences in sound frequency, such that equal intervals on the scale correspond to subjectively equal steps in pitch height. Developed through psychophysical experiments, it maps physical frequencies in hertz (Hz) to a unit called the mel, with a reference point of 1000 mels assigned to a 1000 Hz tone at moderate loudness. Unlike linear frequency scales, the Mel scale is roughly linear for low frequencies (below about 1000 Hz) and logarithmic for higher frequencies, reflecting the human auditory system's greater resolution at lower frequencies and compression at higher ones. The scale originated in 1937 from experiments by S. S. Stevens, J. Volkmann, and E. B. Newman, who used the method of fractionation—asking listeners to identify tones that halved the perceived interval between two reference tones—to construct a subjective measure of pitch magnitude across frequencies from 200 Hz to 8000 Hz. Their work, published in the Journal of the Acoustical Society of America, revealed that perceived pitch differences align with the accumulation of just-noticeable differences (difference thresholds) in frequency, suggesting a uniform psychological scaling along the basilar membrane in the cochlea. Although the original data lacked a simple closed-form equation, later approximations facilitated practical use; a widely adopted formula to convert frequency f (in Hz) to mels m is
m = 2595 \log_{10} \left(1 + \frac{f}{700}\right),
with the inverse
f = 700 \left(10^{m/2595} - 1\right).
This approximation, refined in subsequent psychoacoustic studies, ensures that 1000 mels corresponds closely to 1000 Hz and captures the scale's quasi-logarithmic behavior.
In speech processing and machine hearing, the Mel scale is foundational for features like Mel-frequency cepstral coefficients (MFCCs), introduced by Davis and Mermelstein in 1980 as compact representations of speech spectra that mimic human hearing. MFCCs apply Mel-scaled filter banks to short-time Fourier transforms of audio, followed by logarithmic compression and a discrete cosine transform, enabling robust applications in automatic speech recognition, speaker identification, emotion recognition, and sound classification systems. By prioritizing perceptually salient regions, the scale improves model performance in tasks where linear frequency representations fall short, such as distinguishing vowels or detecting audio anomalies.

Perceptual Foundations

Human Pitch Perception

Pitch is a perceptual attribute of sound that arises from the auditory system's processing of acoustic stimuli, distinct from the physical property of frequency, which measures the number of vibrations per second in hertz (Hz). Human listeners perceive pitch changes nonlinearly with respect to frequency, such that equal intervals in perceived pitch correspond to multiplicative rather than additive changes in frequency. For instance, the octave interval between 100 Hz and 200 Hz is readily distinguishable as a major perceptual shift, whereas a similar absolute difference of 100 Hz at higher frequencies, such as between 9000 Hz and 9100 Hz, produces a much subtler pitch variation relative to the overall scale. Psychophysical experiments have demonstrated this logarithmic-like perception of pitch for frequencies above approximately 500 Hz, where the just noticeable difference (JND) in frequency—the smallest detectable change—scales proportionally to the base frequency, aligning with Weber's law in audition. According to Weber's law, the JND (Δf) is a constant fraction (k) of the stimulus frequency (f), expressed as Δf / f = k, with k around 0.006 for frequencies above 1 kHz at moderate sensation levels. Below 500 Hz, the JND remains relatively constant at about 3 Hz, indicating higher absolute sensitivity in the low-frequency range. This pattern reflects the Weber-Fechner law's application to auditory pitch, where perceived pitch magnitude grows logarithmically with frequency, as evidenced in discrimination tasks using pure tones. Seminal measurements by Wier, Jesteadt, and Green confirmed these thresholds through adaptive forced-choice procedures, showing that frequency discrimination improves relatively at higher frequencies but requires larger absolute changes to achieve equivalent perceptual differences. The basilar membrane in the cochlea plays a central role in this frequency-to-place mapping, exhibiting tonotopic organization where high frequencies stimulate the base and low frequencies the apex, resulting in a nonlinear transformation of frequency into spatial excitation patterns. This place coding contributes to the perceptual nonlinearity, as the membrane's mechanical properties cause broader displacement envelopes at low frequencies, enhancing resolution there. Critical bands, frequency ranges of about 100-200 Hz width that increase with center frequency, further define the auditory filters underlying pitch perception; sounds within the same critical band interact strongly, limiting independent resolution and influencing discrimination. Zwicker's foundational work established these bands through masking experiments, revealing that perceptual judgments depend on integrated activity across these cochlear filters rather than precise frequency tuning alone.
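To make these thresholds concrete, here is a minimal sketch of the piecewise JND behavior described above; the ~3 Hz floor and the Weber fraction k ≈ 0.006 are the values quoted in this section, and the 500 Hz crossover is a simplification at which the two regimes roughly coincide:

def approx_frequency_jnd(f_hz, k=0.006, floor_hz=3.0):
    # Roughly constant JND (~3 Hz) below 500 Hz; Weber-fraction
    # scaling (k * f) above, so absolute thresholds grow with frequency.
    return floor_hz if f_hz < 500.0 else k * f_hz

print(approx_frequency_jnd(250.0))   # 3.0 Hz
print(approx_frequency_jnd(2000.0))  # 12.0 Hz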

Motivation for the Mel Scale

Human auditory perception of pitch does not align linearly with physical frequency in hertz (Hz), as equal frequency intervals produce unequal perceived pitch differences. For example, a 100 Hz increase from 100 Hz to 200 Hz is heard as a substantially larger shift than the same 100 Hz increase from 3000 Hz to 3100 Hz, reflecting the compressed sensitivity of the auditory system at higher frequencies. This nonlinearity stems from cochlear mechanics, where low frequencies elicit broader neural activation than high ones, making linear scales misleading for perceptual tasks. To rectify this, the mel scale introduces a perceptual unit called the "mel," designed such that equal intervals in mels correspond to equal subjective pitch distances as judged by listeners. It is anchored by defining a 1000 Hz tone presented 40 dB above the hearing threshold as exactly 1000 mels, providing a reference for scaling other frequencies. The scale's core objective is to remap physical frequencies onto a perceptually uniform domain, enabling precise quantification of pitch sensations for psychoacoustic research and engineering designs like audio systems. This transformation supports applications where perceptual equivalence must align with physical measurements, such as in speech and hearing models.

Mathematical Definition

Standard Formula

The standard formula for converting an acoustic frequency f (in hertz) to its corresponding value m on the Mel scale (in mels) is given by m = 2595 \log_{10}\left(1 + \frac{f}{700}\right). This expression approximates the nonlinear relationship between physical frequency and perceived pitch in human audition, exhibiting approximately linear behavior for low frequencies below around 1000 Hz—where \log_{10}(1 + x) \approx x / \ln(10) for small x, yielding m \approx (2595 / 700 / \ln(10)) f \approx 1.61 f—and transitioning to logarithmic scaling at higher frequencies, which aligns with the compressive nature of pitch perception in that range. The inverse formula, which maps a Mel value back to frequency, is f = 700 \left(10^{m / 2595} - 1\right). This bidirectional mapping facilitates applications requiring perceptual frequency warping while preserving the scale's empirical foundations. The constants in the formula arise from a least-squares fit to psychophysical data on pitch judgments, ensuring close alignment with listener-reported equal-pitch intervals; specifically, the knee point at 700 Hz reflects the frequency where auditory resolution begins to exhibit logarithmic compression, as observed in experiments using fractionation and equisection methods on tones from 20 Hz to several kHz. The prefactor 2595 normalizes the scale such that m = 1000 at f = 1000 Hz, establishing a convenient reference tied to the original experiment's anchoring of a 1000-Hz tone judged as 1000 mels. This particular approximation, among several possible fits to the data, has become the most widely adopted due to its simplicity and accuracy across the audible range.
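As a quick check of the normalization, substituting f = 1000 Hz recovers the reference point:
m(1000) = 2595 \log_{10}\left(1 + \frac{1000}{700}\right) = 2595 \log_{10}(2.4286) \approx 2595 \times 0.3854 \approx 1000 \text{ mels}.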

Approximations and Implementations

Practical approximations of the Mel scale for computational use often adopt a piecewise structure, applying a linear transformation for frequencies below 1000 Hz with spacing of approximately 66.7 Hz per mel (corresponding to a slope of ≈0.015 mels/Hz), and a logarithmic transformation above this to capture the nonlinear perception of higher pitches. This design, as in the Slaney-style implementation, enhances efficiency in digital signal processing by avoiding more complex continuous functions while maintaining perceptual fidelity, though the low-frequency scaling differs from the psychophysical slope of ≈1.61 mels/Hz. A widely adopted logarithmic variant, derived from empirical fits to psychophysical data and equivalent to the standard formula, is given by m = 1127 \ln\left(1 + \frac{f}{700}\right), where f is the frequency in Hz and m is the corresponding Mel value; this formula uses the natural logarithm and approximates linear behavior at low frequencies, since \ln(1 + x) \approx x for small x, yielding m \approx 1.61 f.
In implementations, the Mel scale is typically realized through a filter bank consisting of overlapping triangular filters centered at frequencies spaced uniformly in the Mel domain. These filters, often 20 to 40 in number, weight the power spectrum to produce Mel-frequency coefficients, as introduced in the seminal MFCC framework. The lower edge of the filter bank starts near 0 Hz, while the upper limit is set to approximately 8000 Hz for standard speech processing (corresponding to a 16 kHz sampling rate) or up to 11000 Hz for wider-band audio, ensuring the analysis stays within the Nyquist limit of half the sampling rate. To handle varying sampling rates, implementations normalize the frequency range by scaling the maximum frequency proportionally (e.g., f_{\max} = 0.5 \times sr, where sr is the sample rate), and adjust filter bandwidths accordingly for consistent perceptual coverage. A concrete example appears in Python's librosa library, where the mel_frequencies function computes Mel-spaced center frequencies using Slaney's piecewise approximation: linear below 1000 Hz (with ≈66.7 Hz per mel) and logarithmic above, with parameters tuned to replicate the Auditory Toolbox behavior, defaulting to 128 bins spanning 0 to 11025 Hz. This facilitates efficient computation of Mel spectrograms via librosa.feature.melspectrogram. For illustration, Python code for the common logarithmic approximation is:
import math

def hz_to_mel(f):
    # Natural-log form of the standard mapping: m = 1127 ln(1 + f/700)
    return 1127 * math.log(1 + f / 700)

def mel_to_hz(m):
    # Inverse mapping: f = 700 (e^(m/1127) - 1)
    return 700 * (math.exp(m / 1127) - 1)
Such functions are inverted for generating filter center frequencies and can be adapted to Slaney-style variants, which apply linear spacing of ≈66.7 Hz per mel below 1000 Hz rather than the logarithmic curve, as sketched below.
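A minimal sketch of that piecewise mapping, using the Auditory Toolbox constants (a linear slope of 200/3 Hz per mel below 1 kHz, then logarithmic steps of ln(6.4)/27 per mel); this mirrors librosa's htk=False convention, but the functions below are an illustrative reimplementation rather than library code:

import math

# Slaney / Auditory Toolbox constants
F_SP = 200.0 / 3.0                  # ~66.67 Hz per mel in the linear region
MIN_LOG_HZ = 1000.0                 # crossover to the logarithmic region
MIN_LOG_MEL = MIN_LOG_HZ / F_SP     # = 15.0 in this convention
LOG_STEP = math.log(6.4) / 27.0     # logarithmic spacing above the crossover

def hz_to_mel_slaney(f):
    # Linear below 1 kHz, logarithmic above; continuous at the crossover
    if f < MIN_LOG_HZ:
        return f / F_SP
    return MIN_LOG_MEL + math.log(f / MIN_LOG_HZ) / LOG_STEP

def mel_to_hz_slaney(m):
    if m < MIN_LOG_MEL:
        return m * F_SP
    return MIN_LOG_HZ * math.exp(LOG_STEP * (m - MIN_LOG_MEL))
Note that 1000 Hz maps to 15 rather than 1000 in this convention; only the relative spacing of band centers matters when constructing a filter bank. For production use, librosa.hz_to_mel(f, htk=False) implements the same mapping.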

Historical Development

Early Experiments

The foundational empirical basis for the Mel scale emerged from psychophysical experiments in the 1930s and 1940s, focusing on how humans perceive equal pitch intervals across frequencies. In 1937, Stevens, Volkmann, and Newman conducted a study in which five observers fractionated tones at 10 different frequencies to determine the "half-value" of pitches, with loudness held constant at 60 dB above threshold. This method of bisection involved listeners judging tones that bisected the perceived interval between a standard tone and silence or a low-frequency reference, establishing an initial subjective scale in units later termed mels, with a 1000 Hz tone arbitrarily assigned 1000 mels. The experiment covered frequencies from approximately 125 Hz to 12,000 Hz, yielding early mel values up to around 3000 mels for the highest frequencies, and revealed that perceived intervals, such as octaves, expand in subjective size at higher frequencies. During the 1940s, wartime research at the Harvard Psycho-Acoustic Laboratory, established in 1940 to enhance communication systems for military applications, expanded the dataset underlying the Mel scale. These efforts addressed challenges in communication quality assessment and speech intelligibility in noisy environments, requiring detailed mappings of pitch perception to improve signal intelligibility. Building on the 1937 work, a 1940 revision incorporated additional psychophysical data from equisection tasks, where listeners divided pitch intervals into equal perceptual segments, refining the scale across a broader range of intensities and frequencies. Key methods in these early studies included pitch matching, where observers adjusted variable tones to match the perceived pitch of standards, and magnitude estimation, in which listeners assigned numerical values to the subjective size of pitch differences. Fractionation and equisection judgments provided consistent data points, such as equating a 1000 Hz tone at 40 dB to exactly 1000 mels, anchoring the scale to a perceptually uniform reference. These approaches prioritized direct listener reports to quantify nonlinear pitch perception, laying the groundwork for the Mel scale's empirical validity without relying on musical training.

Key Contributors and Publications

The Mel scale was primarily developed by psychologist Stanley Smith Stevens, who led the foundational research on perceptual scaling of pitch. Stevens, along with collaborators John Volkmann and Edwin B. Newman, introduced the scale in their seminal 1937 paper published in the Journal of the Acoustical Society of America, where they proposed a unit called the "mel" to quantify subjective pitch intervals based on listener judgments. The unit was named "mel" after the word "melody." This work established the scale's core empirical basis, defining one mel as one-thousandth of the pitch span from 0 to 1000 Hz at a 1000 Hz reference tone. In the 1940s, Stevens further refined his theoretical framework for measurement scales, including perceptual ones like the mel scale, in his influential 1946 paper in Science, which categorized scales into nominal, ordinal, interval, and ratio types and emphasized ratio scales for sensory magnitudes such as pitch. A key practical refinement to the mel scale itself came in 1940 through Stevens and Volkmann's collaborative paper in The American Journal of Psychology, which revised the original formulation to better align frequency-to-pitch mappings across a wider auditory range, incorporating adjustments for low-frequency tones. During the 1950s, Stevens continued to refine psychophysical methods applicable to the mel scale through his broader work on sensory scaling, including direct magnitude estimation techniques that reinforced the scale's ratio properties. By the 1960s, the mel scale had become integrated into emerging models of auditory perception, with Stevens and contemporaries citing it in publications within the Journal of the Acoustical Society of America to link pitch to physiological and computational auditory processes. These efforts by Stevens, Volkmann, and Newman solidified the mel scale as a standard tool in psychoacoustics.

Alternative Formulations

Early psychophysical studies on pitch perception, such as those by S. S. Stevens and colleagues in the 1930s and 1940s, showed that direct magnitude estimation of pitch height follows a power-law relation m \propto f^{0.3} to f^{0.4}, reflecting compressive nonlinearity. However, the mel scale was constructed using interval-based methods, leading to a quasi-logarithmic form rather than a power law. A widely used approximation for the mel scale, proposed by Douglas O'Shaughnessy in his 1987 book Speech Communication: Human and Machine, is the formula
m = 2595 \log_{10} \left(1 + \frac{f}{700}\right),
which applies across the audible frequency range and ensures 1000 mels at 1000 Hz. This logarithmic mapping captures the scale's characteristic behavior, linear at low frequencies and compressive at high ones, based on fits to historical psychophysical data.
Another common variant, often used in implementations, is the natural-logarithm form
m = 1127 \ln \left(1 + \frac{f}{700}\right),
which is mathematically equivalent to the base-10 formula but more convenient in some numerical code. A distinct alternative is Slaney's (1998) piecewise formulation from the Auditory Toolbox for MATLAB, linear below 1000 Hz and logarithmic above, which differs slightly in high-frequency scaling due to fitting choices. Other variants include adjustments for specific listener data or cochlear models, such as inverse mappings for low-frequency emphasis.
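The equivalence of the two logarithmic forms follows from the change-of-base identity:
2595 \log_{10}\left(1 + \frac{f}{700}\right) = \frac{2595}{\ln 10}\,\ln\left(1 + \frac{f}{700}\right) \approx 1127\,\ln\left(1 + \frac{f}{700}\right),
since 2595 / \ln 10 \approx 1127.0.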
These formulations highlight the mel scale's evolution toward precise, computationally efficient models prioritizing perceptual linearity.

Comparisons to Other Perceptual Scales

The Bark scale, introduced by Eberhard Zwicker in 1961, divides the audible range into approximately 24-25 critical bands, each one Bark wide, based on psychoacoustic masking experiments. Sounds within the same band interact strongly. Unlike the mel scale's focus on subjective pitch intervals, the Bark scale is grounded in critical bandwidths, suiting spectral masking and loudness models. The conversion from frequency f in Hz to Bark z uses:
z = 13 \arctan(0.00076 f) + 3.5 \arctan\left(\left(\frac{f}{7500}\right)^2\right).
The equivalent rectangular bandwidth (ERB) scale, developed by Brian R. Glasberg and Brian C. J. Moore in 1990, approximates cochlear filter bandwidths from notched-noise masking experiments, with the audible range spanning roughly 40 ERB units. The bandwidth at center frequency f (in Hz) is
\text{ERB}(f) = 24.7 + 0.108 f,
with narrower filters at low frequencies. The ERB-rate scale (the number of ERBs below a given frequency), obtained by integrating the reciprocal of the bandwidth function, is commonly approximated as
\text{ERB-rate}(f) = 21.4 \log_{10}\left(0.00437 f + 1\right).
It emphasizes frequency selectivity, differing from Bark's discrete band structure.
These scales all warp frequency nonlinearly, expanding low frequencies and compressing high ones. Mel prioritizes equal pitch intervals, Bark critical bands for spectral integration, and ERB filter bandwidths for frequency resolution. Bark and ERB align on bandwidth phenomena, while mel suits pitch-oriented tasks. Example mappings using the standard formulas (Mel: m(f) = 2595 \log_{10}(1 + f/700), Bark as above, ERB-rate via the Glasberg-Moore approximation, yielding ~15.6 at 1000 Hz):
Frequency (Hz)    Mel scale    Bark scale    ERB-rate scale
100               150          1.0           3.4
1000              1000         8.5           15.6
4000              2146         17.3          27.1
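The table values can be reproduced with a short script (a sketch implementing the three formulas above; outputs are rounded):

import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def hz_to_bark(f):
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

def hz_to_erb_rate(f):
    return 21.4 * math.log10(0.00437 * f + 1.0)

for f in (100.0, 1000.0, 4000.0):
    print(f, round(hz_to_mel(f)), round(hz_to_bark(f), 1), round(hz_to_erb_rate(f), 1))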

Applications

Speech and Audio Processing

The Mel scale plays a central role in speech and audio processing through its integration into Mel-frequency cepstral coefficients (MFCCs), which extract perceptually relevant features from speech signals for tasks like recognition and analysis. MFCCs approximate the auditory system's nonlinear frequency resolution, enabling compact representations that capture the essential spectral envelopes of speech. The MFCC extraction process begins with preprocessing the speech signal, including pre-emphasis to boost higher frequencies and segmentation into short overlapping frames (typically 20-30 ms). Each frame undergoes windowing (e.g., Hamming) followed by a fast Fourier transform (FFT) to compute the power spectrum. This spectrum is then filtered using 20-40 triangular bandpass filters spaced linearly on the Mel scale, which maps linear frequency to perceptual pitch via the approximation m = 2595 \log_{10}(1 + f/700), where f is in Hz. The filter outputs are logarithmically compressed to model human loudness perception, and a discrete cosine transform (DCT) is applied to obtain the cepstral coefficients, decorrelating the log-energies and yielding low-order features (usually the first 12-13) that represent vocal tract resonances.
In automatic speech recognition (ASR), MFCCs serve as primary inputs to acoustic models, enhancing phonetic discrimination in systems processing continuous speech. They are integral to commercial ASR implementations, supporting real-time transcription by providing robust invariance to variations in speaking rate and noise. For speaker identification, MFCCs encode unique timbral and vocal tract patterns, enabling systems to verify identities with accuracies often exceeding 90% on clean data using classifiers like Gaussian mixture models. In emotion detection from speech, MFCCs highlight prosodic cues like pitch modulation and spectral tilt, achieving recognition rates around 80% for basic emotions (e.g., happy, sad) across datasets. The Mel scale's advantages stem from its emulation of cochlear filtering, where frequency resolution is finer at low frequencies and broader at high ones, aligning features with auditory perception for superior speech separation over linear scales. This perceptual alignment reduces dimensionality while preserving discriminative power, as seen in Mel spectrograms—a time-frequency representation using Mel-binned power spectra—that intuitively depict formants and transients in speech waveforms for diagnostic and modeling purposes.
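As a concrete illustration of this pipeline, here is a minimal sketch using librosa; the example clip, 13 coefficients, 40 filters, and 25 ms / 10 ms framing are common illustrative choices rather than requirements:

import librosa

# Load audio (librosa's bundled example clip; any mono file works)
y, sr = librosa.load(librosa.ex("trumpet"), sr=16000)

# 13 MFCCs from 40 Mel filters over 25 ms frames with 10 ms hops;
# librosa internally applies the STFT, Mel filter bank, log, and DCT
mfccs = librosa.feature.mfcc(
    y=y, sr=sr, n_mfcc=13, n_mels=40,
    n_fft=int(0.025 * sr), hop_length=int(0.010 * sr),
)
print(mfccs.shape)  # (13, n_frames)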

Music and Acoustics

In music information retrieval (MIR), the Mel scale plays a key role in processing audio features that align with human pitch perception, enabling tasks such as pitch tracking and automatic music transcription. Mel-scaled spectrograms, which warp frequency axes to the Mel scale, serve as input representations for deep learning models that detect pitch contours in polyphonic music, improving accuracy over linear frequency scales by emphasizing perceptually relevant bands. For instance, convolutional neural networks trained on Mel spectrograms achieve robust pitch estimation in complex mixtures, as demonstrated in transcription systems that process raw audio into log-Mel representations with 229 frequency bins. Chord recognition in MIR similarly benefits from Mel-scaled features, where chroma profiles—projections of spectral energy onto pitch classes—are often derived from Mel-filtered spectrograms to capture harmonic structures invariant to octave shifts. These features enhance recognition of chord progressions in polyphonic recordings by prioritizing mid-frequency ranges critical for tonal perception, with evaluations showing superior performance for automated labeling. Timbre analysis leverages Mel-warped representations to model instrumental textures and source separation, as Mel-frequency cepstral coefficients derived from the scale distinguish subtle spectral envelopes in music signals, supporting tasks like genre classification and similarity retrieval. In beat detection algorithms, Mel spectrograms facilitate onset detection through spectral flux computation, where perceptual frequency weighting aids in identifying rhythmic events across diverse musical genres, as integrated in dynamic programming and neural network-based trackers.
In room acoustics, Mel-warped filters simulate human hearing for modeling impulse responses and reverberation, providing a perceptually accurate basis for environmental audio simulation. These filters, which apply all-pass transformations to mimic the Mel scale's nonlinearity, are used to design equalizers that compensate for room resonances in ways that align with auditory sensitivity, enhancing reproduction fidelity in reverberant spaces. For binaural sound localization, models employ Mel filterbanks to process interaural time and level differences, with log-scaled channels from 50 Hz to 8 kHz enabling precise direction estimation in simulated environments, as validated in multi-stage neural architectures. The Mel scale's approximation of the auditory system's logarithmic frequency resolution underpins these applications, ensuring simulations reflect natural spatial hearing cues. Practical implementations appear in software tools like MATLAB's Audio Toolbox, which includes functions for generating Mel spectrograms and designing parametric equalizers with Mel-spaced frequency bands to achieve equal-perceived-pitch adjustments in audio systems. These tools support virtual acoustics by simulating room effects through warped filter designs, allowing engineers to prototype binaural renderings and equalization for immersive environments without physical measurements.
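For a concrete example of Mel-based onset and beat analysis, librosa computes its onset strength envelope from a Mel spectrogram by default and feeds it to a dynamic-programming beat tracker; a minimal sketch using a bundled example clip:

import librosa

y, sr = librosa.load(librosa.ex("choice"))

# Spectral flux computed on a Mel spectrogram (librosa's default onset feature)
onset_env = librosa.onset.onset_strength(y=y, sr=sr)

# Dynamic-programming beat tracking over the onset envelope
tempo, beats = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
print(tempo, len(beats))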

Limitations and Criticisms

Empirical Challenges

The empirical foundations of the Mel scale have faced significant scrutiny, particularly regarding methodological biases in its foundational experiments. The 1956 study commissioned by S. S. Stevens to refine the scale involved listeners equisecting pitch differences between tones, but this approach introduced a systematic bias due to uncontrolled order effects in stimulus presentation, leading to an overestimation of perceived pitch intervals at higher frequencies. This flaw, identified through reanalysis of the raw data, suggests that the resulting scale deviates from true perceptual equidistance, aligning more closely with equal cochlear distances than with subjective pitch judgments. Listener variability further challenges the universality of the Mel scale, as pitch judgments are influenced by factors such as age, hearing status, and cultural background. Aging and age-related hearing loss can alter pitch discrimination thresholds, with older listeners exhibiting reduced sensitivity that causes deviations from Mel scale predictions in pitch matching tasks. Similarly, cultural differences manifest in aspects like octave equivalence, where non-Western groups such as the Tsimane' show weaker octave perception compared to Western listeners, resulting in interval reproductions that better fit logarithmic scales than the Mel formulation. Studies on individual differences report substantial inter-subject variability in pitch discrimination, with thresholds differing by factors of up to 6-7 between trained musicians and non-musicians. Pitch perception also exhibits some dependence on intensity, with small shifts in perceived pitch occurring as loudness changes, though these effects are minimal above approximately 40 dB sensation level.

Modern Alternatives and Revisions

In the 2010s and 2020s, alternatives to the fixed Mel scale have emerged to better accommodate machine learning applications in audio processing, particularly through learnable refinements that enhance feature extraction in neural models. These approaches aim to improve alignment with nonlinear auditory perception by allowing filterbanks to adapt during training, rather than relying solely on fixed triangular filters. For instance, learnable frontends like LEAF parameterize the spectrogram generation process, outperforming traditional Mel filterbanks on tasks such as speech and environmental sound classification by optimizing the frequency warping end-to-end. Prominent alternatives to the fixed Mel scale include Gammatone filterbanks, which model the cochlear response more biologically accurately through cascading resonators and provide finer resolution at lower frequencies. Gammatone Frequency Cepstral Coefficients (GFCCs), derived from these filters, have demonstrated superior performance over Mel-Frequency Cepstral Coefficients (MFCCs) in speech emotion recognition. In deep learning contexts, learned perceptual mappings bypass predefined formulas entirely by learning nonlinear frequency warpings from data; for example, autoregressive architectures such as WaveNet generate raw audio waveforms sample by sample, implicitly capturing perceptual nonlinearities without explicit Mel-scale conditioning in their core generative process, leading to higher-fidelity synthesis in text-to-speech systems. Recent developments include inverse-Mel scale spectrograms, which address limitations in capturing high-frequency components for applications like machine condition monitoring, with reported improvements of up to 37% in specific benchmarks as of 2025. Future directions include developing individualized perceptual models that account for inter-subject variability in auditory processing. Recent studies highlight individual differences in the perception of phonetic category boundaries, which predict speech-in-noise performance and suggest potential for personalized applications in hearing aids. Ongoing research in the Journal of the Acoustical Society of America (JASA) from the 2020s explores these variations, advocating for data-driven models informed by behavioral measures to surpass population-averaged scales.
