Articulatory phonetics
Articulatory phonetics is the branch of phonetics that investigates the physiological mechanisms involved in producing speech sounds, focusing on how the articulators (such as the tongue, lips, jaw, and vocal folds) shape airflow from the lungs to create audible phones.[1] This subfield emphasizes the coordination of the vocal tract to generate consonants and vowels, distinguishing it from acoustic phonetics, which examines sound waves, and auditory phonetics, which studies perception.[2] Speech production typically begins with pulmonic egressive airflow, in which air is expelled from the lungs through the larynx and the supralaryngeal vocal tract and is modified by voicing (vibration of the vocal folds, as in [b] or [z]) or voicelessness (no vibration, as in [p] or [s]).[1][3]
The vocal tract, extending from the glottis to the lips and including the nasal passages, comprises key structures such as the pharynx, oral cavity, alveolar ridge, hard and soft palates, teeth, and lips. Some of these are movable (active) articulators, while others are fixed (passive) surfaces, and together they constrict or open the airway.[1] For instance, the tongue's position relative to the palate determines the place of articulation, while the velum controls nasal versus oral airflow.[1] These configurations give speakers precise control over a wide range of sounds across languages, though individual differences in anatomy influence how sounds are realized.[2]
Consonants are classified primarily by place of articulation (e.g., bilabial [p, b], made with both lips, or alveolar [t, d], made at the ridge behind the upper teeth) and manner of articulation (e.g., stops with complete closure such as [p], fricatives with turbulent airflow such as [s], and nasals with a lowered velum such as [m]).[1] Voicing further differentiates pairs, such as voiced [b] versus voiceless [p].[2] Vowels, in contrast, involve a relatively open vocal tract and are defined by tongue height (high [i], low [a]), frontness or backness (front [i], back [u]), and lip rounding (rounded [ʊ], unrounded [ɪ]), which together determine the resonant quality of each vowel.[1]
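The three vowel dimensions above can be treated as a small feature matrix. The sketch below is a hypothetical illustration rather than a standard tool: the `VOWELS` table covers only a few cardinal vowels and uses the IPA height labels (close, close-mid, open) in place of "high" and "low", and the helper `single_feature_contrast` is an invented name that reports the one feature on which two vowels differ, if there is exactly one.

```python
from typing import NamedTuple, Optional


class Vowel(NamedTuple):
    height: str    # tongue height: "close" (high) ... "open" (low)
    backness: str  # "front" or "back"
    rounding: str  # "rounded" or "unrounded"


# Hypothetical, deliberately small feature table for a few IPA vowels.
VOWELS = {
    "i": Vowel("close", "front", "unrounded"),
    "y": Vowel("close", "front", "rounded"),
    "ɯ": Vowel("close", "back", "unrounded"),
    "u": Vowel("close", "back", "rounded"),
    "e": Vowel("close-mid", "front", "unrounded"),
    "o": Vowel("close-mid", "back", "rounded"),
    "a": Vowel("open", "front", "unrounded"),
    "ɑ": Vowel("open", "back", "unrounded"),
}


def single_feature_contrast(v1: str, v2: str) -> Optional[str]:
    """Return the only feature on which two vowels differ,
    or None if they differ on zero features or on several."""
    a, b = VOWELS[v1], VOWELS[v2]
    diffs = [name for name in Vowel._fields if getattr(a, name) != getattr(b, name)]
    return diffs[0] if len(diffs) == 1 else None


if __name__ == "__main__":
    print(single_feature_contrast("i", "y"))  # rounding
    print(single_feature_contrast("i", "e"))  # height
    print(single_feature_contrast("i", "u"))  # None (backness and rounding)
```

Pairs such as [i] and [y] differ only in rounding, whereas [i] and [u] differ in both backness and rounding, so the helper returns None for them.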
Articulatory phonetics underpins phonological analysis by revealing how abstract sound units (phonemes) are physically realized as allophones, informing fields such as speech therapy, language acquisition, and computational linguistics.[1] Advances in imaging techniques such as MRI and ultrasound have enhanced empirical study of these processes by providing dynamic, real-time views of articulation.[4]
Vocal Tract Anatomy
Articulators
The articulators are the movable and fixed structures within the vocal tract that interact to modify airflow and produce speech sounds. They include active components, which move to create constrictions or openings, and passive ones, which serve as points of contact. The primary articulators extend from the lips to the glottis and shape the acoustic properties of sounds by altering the vocal tract's configuration.[5] The jaw (mandible) is a primary active articulator: the movable lower jawbone, hinged at the temporomandibular joint. It raises and lowers to control the vertical dimension of the oral cavity, opening the mouth for vowels and helping form closures for stops, and because it supports the tongue base it also influences tongue position.[6] The lips are the outermost active articulators, soft muscular tissue capable of protrusion, rounding, and spreading. They modify airflow by closing completely for bilabial closures or by approximating to produce fricative turbulence, and their shape affects vowel resonance through rounding or spreading.[3] The tongue is the most versatile active articulator, a muscular hydrostat conventionally divided into parts: the tip (apex), used for precise contact at forward points; the blade, the upper surface just behind the tip, used for broader approximations; the front, which rises toward the hard palate; the back, which contacts the soft palate and other rear structures; and the root, which faces the pharynx.
The tongue raises, lowers, advances, or retracts to adjust cavity sizes, producing high vowels through elevation and consonant constrictions through approximation to fixed structures.[6] The teeth, particularly the upper incisors, act as passive articulators, providing a fixed edge for the tongue tip or lower lip to contact; they take part in fricatives formed through narrow channels and in stops formed through complete blockage.[5] The alveolar ridge, a bony prominence just behind the upper teeth, serves as a passive target for the tongue tip or blade, allowing alveolar constrictions that narrow the front of the oral cavity.[3] The hard palate, the bony roof of the mouth extending back from the alveolar ridge, is a passive articulator for the tongue front or blade; palatal contacts against it modify airflow and raise higher resonances such as the second formant.[6] The soft palate (velum) is a movable muscular flap at the rear of the hard palate. It rises to seal off the nasal cavity for oral sounds or lowers to allow nasal airflow, directing the resonance path and producing the distinction between nasal and oral sounds. The uvula, a small projection from the velum's trailing edge, can vibrate or be contacted by the tongue back in some languages, producing uvular trills and stops.[5] The pharyngeal wall, the flexible rear wall of the throat above the larynx, serves mainly as a passive target for the tongue root; constricting the pharynx shifts the formant frequencies of certain vowels and consonants, notably raising the first formant.
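Formant frequencies, mentioned above for palatal and pharyngeal constrictions, are easiest to see in the standard textbook approximation of a neutral (schwa-like) vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n − 1)c / 4L. This is a minimal sketch under stated assumptions: a speed of sound of about 35,000 cm/s and a tract length of about 17.5 cm, round values often used for an adult male. The function name `tube_formants` is hypothetical, and real constrictions perturb these baseline values.

```python
def tube_formants(length_cm: float, n_formants: int = 3,
                  speed_of_sound_cm_s: float = 35_000.0) -> list[float]:
    """Resonances (Hz) of a uniform tube closed at one end, open at the other.

    Quarter-wavelength model: F_n = (2n - 1) * c / (4 * L). It only roughly
    approximates a neutral vocal tract; the default speed of sound (about
    35,000 cm/s in warm, humid air) is an assumed round value.
    """
    if length_cm <= 0:
        raise ValueError("length_cm must be positive")
    return [(2 * n - 1) * speed_of_sound_cm_s / (4 * length_cm)
            for n in range(1, n_formants + 1)]


if __name__ == "__main__":
    print(tube_formants(17.5))  # [500.0, 1500.0, 2500.0]
    print(tube_formants(14.0))  # [625.0, 1875.0, 3125.0]
```

Because every formant scales with 1/L, a shorter vocal tract raises all formants proportionally, which is one way anatomical differences change acoustic output.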
The epiglottis, a leaf-shaped cartilage at the entrance to the laryngopharynx, contributes little to most speech sounds (epiglottal consonants are an exception); its main role is protecting the airway during swallowing, which indirectly supports articulation.[3] The glottis, the space between the vocal folds within the larynx, regulates airflow for phonation: the folds are brought together (adducted) so that they vibrate in voiced sounds and are held apart (abducted) in voiceless ones, which fundamentally changes the sound source.[6] Anatomical variations in the articulators, such as differences in tongue length, palate concavity, or pharyngeal curvature, occur across individuals and populations and influence articulatory precision and acoustic output; for instance, reduced palatal concavity can shift vowel formants and require compensatory adjustments in tongue position. These variations, reported to explain up to 46% of differences in palatal shape and 78.5% of the variance in pharyngeal curvature, arise from genetic and developmental factors and affect sound production across languages.[7] A sagittal view of the vocal tract shows these structures in a midline cross-section: the lips at the front, the tongue body curving upward, the alveolar ridge and hard palate forming the roof of the mouth, the velum and uvula at the rear, the pharyngeal wall descending to the epiglottis, and the glottis at the base above the trachea. This perspective highlights how articulator movements reshape the tract from lips to larynx to filter sound.[6]
Larynx and Pharynx
The larynx, often called the voice box, is a cartilaginous structure in the anterior neck at the level of the C3 to C6 vertebrae; it is the site of voicing and a key point at which airflow is regulated in speech production.[8] Several cartilages provide its structural support: the thyroid cartilage, which forms the anterior and lateral walls and projects as the laryngeal prominence (Adam's apple); the cricoid cartilage, a ring-shaped structure below the thyroid that anchors the larynx and connects it to the trachea; and the paired arytenoid cartilages, which sit atop the cricoid and move the vocal folds through rotating and rocking motions.[8] The vocal folds, central to the larynx, are paired structures comprising the vocal ligaments (elastic bands of fibrous tissue) covered by mucous membrane and supported by the vocalis muscle, which allows their tension to be adjusted during sound production.[8] Intrinsic laryngeal muscles such as the cricothyroid tilt the thyroid cartilage forward relative to the cricoid, stretching and tensing the vocal folds and thereby controlling pitch in speech.[8]
The pharynx, a muscular tube extending from the base of the skull to the level of the C6 vertebra, lies behind the nasal cavity, mouth, and larynx and forms the lower portion of the vocal tract, contributing to airflow shaping and resonance.[9] Its walls include a fibromuscular layer lined with mucosa, and the posterior pharyngeal wall can constrict to modify the acoustic properties of sound.[9] The pharyngeal constrictor muscles (superior, middle, and inferior) encircle the pharynx and narrow its lumen. The superior constrictor originates from the pterygoid plate and mandible and inserts into the pharyngeal raphe, while the middle and inferior constrictors arise from the hyoid bone and the laryngeal cartilages, respectively; their sequential contraction adjusts pharyngeal width for articulatory purposes.[9]
Besides regulating airflow, the larynx protects the airway: during swallowing, the epiglottis, a leaf-shaped elastic cartilage, folds over the laryngeal inlet to close the airway and prevent aspiration of food or liquid.[8] Together, the pharynx and larynx form a foundational resonance chamber, with the pharyngeal cavity acting as an acoustic resonator that shapes and filters the sound generated at the glottis.[9] Cross-linguistically, the pharynx is the place of articulation for pharyngeal sounds. In Arabic, for example, the pharyngeal fricatives /ħ/ (voiceless) and /ʕ/ (voiced) involve retraction of the tongue root and epiglottis toward the posterior pharyngeal wall, narrowing the lower pharynx to create turbulent airflow.[10] Arabic emphatic consonants such as /tˤ/ and /sˤ/ have a secondary constriction produced by tongue body retraction and larynx raising, variously described as pharyngealization or velarization, which distinguishes them from their non-emphatic counterparts.[10] These articulations highlight the pharynx's capacity for precise narrowing, which is less exploited in languages without such phonemes.[10]
Airstream Mechanisms
Pulmonic Initiation
Pulmonic initiation refers to the airstream mechanism in which airflow for speech sounds is generated by the lungs through the coordinated action of the respiratory muscles.[11] In the predominant pulmonic egressive variant, air is expelled outward from the lungs: after inhalation, the diaphragm relaxes and the elastic recoil of the lungs and rib cage, aided as needed by expiratory muscles such as the internal intercostals and abdominals, decreases thoracic volume and builds positive pressure below the glottis.[12] This combination of elastic recoil and muscular effort initiates and sustains airflow through the vocal tract and forms the basis for producing consonants and vowels in spoken language.[13] While pulmonic egressive is the default mechanism, a pulmonic ingressive airstream, in which air is drawn inward by contracting the diaphragm and expanding the rib cage, is far rarer and typically serves auxiliary roles rather than primary phonation.[14] Examples include paralinguistic responses such as "yes" and "no" spoken on an inward breath in Scandinavian languages such as Swedish and Norwegian, and occasional accompaniments to click consonants in certain African languages, where it supplements the main velaric ingressive airflow.[15] These ingressive instances are not contrastive in most languages and are constrained by physiological limits on sustained inward airflow. Pulmonic initiation depends on subglottal pressure, the pressure built up below the glottis relative to the atmosphere, which typically ranges from 5 to 10 cm H₂O during modal phonation in conversational speech.[16] This pressure difference directly influences sound intensity: higher subglottal pressure increases airflow and the amplitude of vocal fold vibration, producing louder speech, while lower pressure produces softer sounds.[17] Pulmonic mechanisms power approximately 99% of consonants and vowels across the world's languages, underscoring their universal role in human speech production.[15]
Non-Pulmonic Initiation
Non-pulmonic initiation refers to the production of speech sounds using airstream mechanisms that do not rely on the lungs; instead, air enclosed in a cavity in the oral or laryngeal region is set in motion by expanding or compressing that cavity. These mechanisms are less common than pulmonic ones, are typically associated with consonants in a minority of the world's languages, and enable distinctive phonetic contrasts.[18] Glottalic mechanisms use movement of the larynx to create pressure differences in the air above it. Ejectives are produced with a glottalic egressive airstream: with the glottis closed, the larynx is raised to compress the air trapped behind an oral closure; the oral closure is released first, followed shortly by the glottal closure, yielding a voiceless consonant (most often a stop) with no pulmonic involvement. For example, the velar ejective [kʼ] occurs in Quechua.[19] Implosives, in contrast, use a glottalic ingressive airstream: the larynx is lowered to rarefy the air behind the oral closure, usually while the vocal folds continue to vibrate, so most implosives are voiced; an example is the bilabial implosive [ɓ] in Sindhi. In the International Phonetic Alphabet (IPA), ejectives are marked with an apostrophe diacritic (as in [kʼ]), while implosives have dedicated hooked letters such as [ɓ] and [ɗ]. Acoustically, ejectives show abrupt pressure releases that appear as high-intensity bursts, and implosives can show a lowered fundamental frequency associated with laryngeal lowering.[20] The velaric mechanism, also known as the lingual ingressive airstream, uses the tongue to enclose and manipulate air in the oral cavity, independently of the larynx and lungs. Clicks are the primary sounds produced this way. They involve two lingual closures, an anterior one (e.g., at the lips, teeth, or alveolar ridge) and a posterior one (typically velar); lowering the tongue body between them creates suction, and releasing the anterior closure produces an ingressive "pop."
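The mechanisms described in this section can be summarized along two dimensions: the initiator (lungs, larynx, or tongue) and the direction of airflow. The lookup below is a hypothetical sketch of that classification, not an established library; the sound-type labels, the `AIRSTREAMS` table, and the helper names are illustrative assumptions covering only the categories named here, and the pressure labels describe the idealized case (implosives, for instance, often leak pulmonic air through the vibrating glottis).

```python
from enum import Enum
from typing import NamedTuple


class Initiator(Enum):
    PULMONIC = "pulmonic"    # lungs and respiratory muscles
    GLOTTALIC = "glottalic"  # raising or lowering of the larynx
    VELARIC = "velaric"      # tongue body with a posterior (velar) closure


class Direction(Enum):
    EGRESSIVE = "egressive"    # air pushed outward: cavity pressure rises
    INGRESSIVE = "ingressive"  # air drawn inward: cavity pressure falls


class Airstream(NamedTuple):
    initiator: Initiator
    direction: Direction


# Hypothetical, illustrative mapping from sound type to airstream mechanism.
AIRSTREAMS = {
    "plain stop": Airstream(Initiator.PULMONIC, Direction.EGRESSIVE),
    "vowel": Airstream(Initiator.PULMONIC, Direction.EGRESSIVE),
    "ejective": Airstream(Initiator.GLOTTALIC, Direction.EGRESSIVE),
    "implosive": Airstream(Initiator.GLOTTALIC, Direction.INGRESSIVE),
    "click": Airstream(Initiator.VELARIC, Direction.INGRESSIVE),
}


def describe(sound_type: str) -> str:
    """Return a label such as 'glottalic egressive' for a sound type."""
    mech = AIRSTREAMS[sound_type]
    return f"{mech.initiator.value} {mech.direction.value}"


def cavity_pressure_change(sound_type: str) -> str:
    """'compression' for egressive mechanisms, 'rarefaction' for ingressive ones."""
    if AIRSTREAMS[sound_type].direction is Direction.EGRESSIVE:
        return "compression"
    return "rarefaction"


if __name__ == "__main__":
    for sound in AIRSTREAMS:
        print(f"{sound}: {describe(sound)} ({cavity_pressure_change(sound)})")
```

Keeping initiator and direction as separate fields mirrors the usual two-way classification, so a new category only needs one extra table entry.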
In Khoisan languages such as !Xóõ, the alveolar click is transcribed as [ǃ] in the IPA, often combined with a second symbol indicating the accompanying airstream or phonation (e.g., tenuis [ǃ], voiced [ɡǃ], or nasal [ŋǃ]). Acoustically, clicks exhibit rarefaction bursts, a sharp negative pressure spike followed by transient noise, which distinguish them from pulmonic stops.[21] Rarely attested epiglottal or pharyngeal initiation involves constriction or movement in the lower pharynx or epiglottal region to generate localized airflow, often in conjunction with glottalic features. Epiglottal plosives such as [ʡʔ] occur in languages like Agul (a Lezgic language), where the epiglottis forms a posterior closure together with a glottal seal for egressive or ingressive effects. These sounds have acoustic profiles marked by intense frication and formant shifts due to pharyngeal narrowing (notably a raised first formant), but they remain marginal in global phoneme inventories.
Consonant Articulation
Places of Articulation
Places of articulation refer to the locations in the vocal tract where a consonant's primary constriction or closure occurs, typically along the midline from the lips to the glottis. This classification is central to the International Phonetic Alphabet (IPA), which organizes consonants horizontally by place in its chart, distinguishing the point of articulatory contact or narrowing that shapes the sound's acoustic properties.[22] The active articulator is the movable organ, such as the tongue tip or lower lip, that approaches or contacts the passive articulator, a stationary structure such as the upper teeth or alveolar ridge.[3] Sagittal diagrams, which show midsagittal cross-sections of the vocal tract, illustrate these positions and degrees of constriction, from complete closure in stops to narrow channels in fricatives.[23] The major places of articulation, from front to back, are as follows:
- Bilabial: The lips come together, with the lower lip (active) contacting the upper lip (passive); examples include /p/ and /b/ as in English "pat" and "bat."[24]
- Labiodental: The lower lip (active) approaches the upper teeth (passive); examples are /f/ and /v/ as in "fan" and "van."[24]
- Dental: The tongue tip or blade (active) contacts or nears the upper teeth (passive); examples include /θ/ and /ð/ as in "think" and "this."[24]
- Alveolar: The tongue tip or blade (active) touches the alveolar ridge behind the upper teeth (passive); common in English with /t/, /d/, /n/, /s/, /z/, /l/, and /ɹ/ as in "top," "dog," "no," "see," "zoo," "let," and "red."[3]
- Postalveolar: The tongue blade (active) contacts the area just behind the alveolar ridge (passive); examples are /ʃ/ and /ʒ/ as in "ship" and "measure."[24]
- Retroflex: The tongue tip (active) curls back toward the postalveolar or palatal region (passive); examples include /ʈ/ and /ɖ/ found in languages like Hindi.[24]
- Palatal: The tongue front or body (active) raises to the hard palate (passive); examples are /c/, /ɟ/, and /j/ as in some Romance languages or English "yes" (/j/).[24]
- Velar: The tongue back (active) contacts the soft palate or velum (passive); examples include /k/, /g/, and /ŋ/ as in "cat," "go," and "sing."[3]
- Uvular: The back of the tongue (active) approaches or contacts the uvula (passive); examples include the stops /q/ and /ɢ/ and the fricative /ʁ/, as in Arabic /q/ and French /ʁ/.[24]
- Pharyngeal: The tongue root (active) constricts against the pharyngeal wall (passive); examples include /ħ/ and /ʕ/ in Arabic.[24]
- Glottal: Constriction occurs at the glottis between the vocal folds (both active and passive); examples are /ʔ/ (glottal stop) as in English "uh-oh" and /h/ as in "hat."[3]
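The list above, together with manner and voicing, yields the conventional voice-place-manner description of a consonant. The sketch below encodes the places in the same front-to-back order plus a small sample of consonants drawn from the examples; the `CONSONANTS` table and the helpers `vpm_label` and `front_to_back` are hypothetical names for illustration, not part of any standard tool.

```python
from typing import NamedTuple

# Places of articulation in the front-to-back order used in the list above.
PLACES = [
    "bilabial", "labiodental", "dental", "alveolar", "postalveolar",
    "retroflex", "palatal", "velar", "uvular", "pharyngeal", "glottal",
]


class Consonant(NamedTuple):
    voiced: bool
    place: str
    manner: str


# Hypothetical, deliberately small sample of consonants from the examples.
CONSONANTS = {
    "p": Consonant(False, "bilabial", "stop"),
    "b": Consonant(True, "bilabial", "stop"),
    "f": Consonant(False, "labiodental", "fricative"),
    "θ": Consonant(False, "dental", "fricative"),
    "s": Consonant(False, "alveolar", "fricative"),
    "n": Consonant(True, "alveolar", "nasal"),
    "ʃ": Consonant(False, "postalveolar", "fricative"),
    "ɖ": Consonant(True, "retroflex", "stop"),
    "j": Consonant(True, "palatal", "approximant"),
    "k": Consonant(False, "velar", "stop"),
    "ŋ": Consonant(True, "velar", "nasal"),
    "q": Consonant(False, "uvular", "stop"),
    "ħ": Consonant(False, "pharyngeal", "fricative"),
    "ʔ": Consonant(False, "glottal", "stop"),
}


def vpm_label(symbol: str) -> str:
    """Voice-place-manner label, e.g. 'voiced bilabial stop'."""
    c = CONSONANTS[symbol]
    voicing = "voiced" if c.voiced else "voiceless"
    return f"{voicing} {c.place} {c.manner}"


def front_to_back(symbols: list[str]) -> list[str]:
    """Sort consonant symbols by place of articulation, lips first."""
    return sorted(symbols, key=lambda s: PLACES.index(CONSONANTS[s].place))


if __name__ == "__main__":
    print(vpm_label("b"))                       # voiced bilabial stop
    print(front_to_back(["ʔ", "k", "p", "s"]))  # ['p', 's', 'k', 'ʔ']
```

Storing place as an index into an ordered list makes the IPA chart's left-to-right (front-to-back) ordering directly sortable.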