
Latin Extended-G

Latin Extended-G is a block of the Unicode character encoding standard located in the Supplementary Multilingual Plane (Plane 1), designed to provide additional Latin-script characters for advanced phonetic transcription and specialized linguistic notations. It spans the code point range U+1DF00 to U+1DFFF, 256 positions in total. The block was introduced in Unicode version 14.0, released in September 2021, initially encoding 31 characters focused on extensions to the International Phonetic Alphabet (IPA) for representing disordered speech patterns, such as atypical articulations in clinical linguistics. Unicode 15.0 (September 2022) added 6 characters for Malayalam transliteration; as of Unicode 17.0 (September 2025), a total of 37 characters are assigned across categories including IPA extensions, clicks, laterals, letters with palatal or retroflex hooks, and characters for Malayalam transliteration. Examples include 𝼀 (U+1DF00, Latin small letter feng digraph with trill) for disordered speech trills and 𝼝 (U+1DF1D, Latin small letter c with retroflex hook) for retroflex consonants in phonetic analysis. These characters support precise documentation in fields such as clinical linguistics and speech therapy, filling gaps in earlier Latin extension blocks by accommodating rare letterforms and digraphs that cannot be built reliably from combining marks. Pending proposals request further additions to the block for emerging needs, such as modifier letters for implosives and symbols from historical alphabets like the Initial Teaching Alphabet.

Overview

Description

Latin Extended-G is a Unicode block that provides additional Latin characters primarily for use in phonetic transcription systems. It occupies the code point range from U+1DF00 to U+1DFFF in the Supplementary Multilingual Plane (SMP), making it one of the first extensions of the Latin script beyond the Basic Multilingual Plane (BMP). The block's key purpose is to support specialized phonetic notations, including symbols for click consonants, extensions to the International Phonetic Alphabet (IPA) for disordered speech, and characters for Malayalam transliteration. These characters enable precise representation of sounds that are not adequately covered by earlier Latin blocks or combining diacritics. The expansion matters for linguistic and academic transcription because standalone precomposed characters improve compatibility and readability in digital texts without depending on complex combining sequences. Along with the Latin Extended-F block, it contains the first Latin characters encoded in the SMP, broadening Unicode's coverage for diverse phonetic needs. As of Unicode 17.0, the block allocates 256 code points, with 37 assigned to specific characters.

Unicode Block Details

The Latin Extended-G block is allocated the code point range U+1DF00 to U+1DFFF, encompassing 256 consecutive positions in the Unicode character standard. This range is situated in the Supplementary Multilingual Plane (SMP), designated as Plane 1, which extends beyond the initial 65,536 code points of the Basic Multilingual Plane (BMP). The block is categorized under the Latin script, serving as an extension to earlier Latin blocks primarily located within the BMP. As of Unicode version 17.0, 37 characters within this block have been formally assigned, leaving 219 positions unassigned and reserved for potential future allocations. The official Unicode block name is "Latin Extended-G," with no widely recognized aliases in use. Unlike preceding Latin Extended blocks such as Latin Extended-A (U+0100–U+017F) and Latin Extended-D (U+A720–U+A7FF), which reside in the 16-bit BMP and can be represented in a single UTF-16 code unit, Latin Extended-G is, together with Latin Extended-F, among the first such blocks to lie outside the BMP. Its characters are therefore encoded as surrogate pairs in UTF-16, as four-byte sequences in UTF-8, or directly in UTF-32. This placement reflects the evolving need for expanded Latin-based character sets in supplementary planes.
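The encoding arithmetic described above can be illustrated with a minimal Python sketch. The function name `utf16_surrogates` and the constants are illustrative choices, not part of any standard API; the surrogate calculation itself follows the standard UTF-16 definition.

```python
# Block boundaries as stated in the text.
BLOCK_START, BLOCK_END = 0x1DF00, 0x1DFFF


def utf16_surrogates(cp: int):
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = cp - 0x10000               # 20-bit offset above the BMP
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low surrogate
    return high, low


print(BLOCK_END - BLOCK_START + 1)                 # 256 positions
high, low = utf16_surrogates(BLOCK_START)
print(f"{high:04X} {low:04X}")                     # D837 DF00
print(chr(BLOCK_START).encode("utf-8").hex(" "))   # f0 9d bc 80
```

Every code point in the block shares the high surrogate D837, so only the low surrogate varies across the range.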

Characters

Assigned Characters

The Latin Extended-G block assigns 37 characters to specific code points in the range U+1DF00–U+1DFFF, leaving U+1DF1F–U+1DF24 and U+1DF2B–U+1DFFF unassigned. These characters consist of small letters with distinctive modifications such as hooks, curls, belts, and reversals, designed for precise phonetic notation in linguistic applications. The primary sub-range U+1DF00–U+1DF1E encompasses 31 characters, grouped thematically by phonetic function: those for extended IPA in disordered speech (U+1DF00–U+1DF07), IPA extensions and click notations (U+1DF08–U+1DF10), laterals (U+1DF11), palatal hook modifications (U+1DF12–U+1DF18), and retroflex hook modifications (U+1DF19–U+1DF1D), plus an IPA extension (U+1DF1E). A secondary sub-range U+1DF25–U+1DF2A includes six characters with mid-height left hooks for Malayalam transliteration. Below is the complete list, with brief descriptions of glyph appearance and primary phonetic role based on their official categories.
Code point | Glyph | Name | Description
U+1DF00 | 𝼀 | LATIN SMALL LETTER FENG DIGRAPH WITH TRILL | A ligature-like small letter combining f and ng with a trill mark; used in extended IPA for disordered speech to denote fricative trills.
U+1DF01 | 𝼁 | LATIN SMALL LETTER REVERSED SCRIPT G | A small reversed script-style g; represents specific velar or uvular articulations in disordered speech per extended IPA.
U+1DF02 | 𝼂 | LATIN LETTER SMALL CAPITAL TURNED G | A small capital turned g; denotes backed or uvular fricatives in extended IPA for disordered speech.
U+1DF03 | 𝼃 | LATIN SMALL LETTER REVERSED K | A small reversed k; used for retroflex or uvular stops in extended IPA disordered speech notation.
U+1DF04 | 𝼄 | LATIN LETTER SMALL CAPITAL L WITH BELT | A small capital l crossed by a horizontal belt; indicates lateral fricatives in extended IPA for disordered speech.
U+1DF05 | 𝼅 | LATIN SMALL LETTER LEZH WITH RETROFLEX HOOK | A small ɮ (lezh) with a retroflex hook; represents retroflex lateral fricatives in IPA extensions.
U+1DF06 | 𝼆 | LATIN SMALL LETTER TURNED Y WITH BELT | A small turned y crossed by a belt; used for lateral approximants in extended IPA disordered speech.
U+1DF07 | 𝼇 | LATIN SMALL LETTER REVERSED ENG | A small reversed ŋ (eng); denotes nasal sounds in extended IPA for disordered speech.
U+1DF08 | 𝼈 | LATIN SMALL LETTER TURNED R WITH LONG LEG AND RETROFLEX HOOK | A small turned r with extended leg and retroflex hook; for retroflex rhotics in IPA extensions.
U+1DF09 | 𝼉 | LATIN SMALL LETTER T WITH HOOK AND RETROFLEX HOOK | A small t with retroflex and hook diacritics; represents retroflex dentals in IPA extensions.
U+1DF0A | 𝼊 | LATIN LETTER RETROFLEX CLICK WITH RETROFLEX HOOK | A small retroflex click symbol with hook; used for retroflex click consonants.
U+1DF0B | 𝼋 | LATIN SMALL LETTER ESH WITH DOUBLE BAR | A small ʃ (esh) with double vertical bar; denotes fricative clicks.
U+1DF0C | 𝼌 | LATIN SMALL LETTER ESH WITH DOUBLE BAR AND CURL | A small ʃ with double bar and rightward curl; for ejective or delayed release fricatives in clicks.
U+1DF0D | 𝼍 | LATIN SMALL LETTER TURNED T WITH CURL | A small turned t with curl; represents alveolar clicks.
U+1DF0E | 𝼎 | LATIN LETTER INVERTED GLOTTAL STOP WITH CURL | An inverted small glottal stop with curl; used for glottalized clicks.
U+1DF0F | 𝼏 | LATIN LETTER STRETCHED C WITH CURL | A stretched small c with curl; denotes palatal or alveolar lateral clicks.
U+1DF10 | 𝼐 | LATIN LETTER SMALL CAPITAL TURNED K | A small capital turned k; for voiceless lateral clicks.
U+1DF11 | 𝼑 | LATIN SMALL LETTER L WITH FISHHOOK | A small l with a fishhook curl; represents alveolar lateral approximant.
U+1DF12 | 𝼒 | LATIN SMALL LETTER DEZH DIGRAPH WITH PALATAL HOOK | A small dʒ (dezh) digraph with palatal hook; for palatalized postalveolar affricates.
U+1DF13 | 𝼓 | LATIN SMALL LETTER L WITH BELT AND PALATAL HOOK | A small l with belt and palatal hook; denotes palatalized lateral fricatives.
U+1DF14 | 𝼔 | LATIN SMALL LETTER ENG WITH PALATAL HOOK | A small ŋ with palatal hook; for palatalized velar nasals.
U+1DF15 | 𝼕 | LATIN SMALL LETTER TURNED R WITH PALATAL HOOK | A small turned r with palatal hook; represents palatalized rhotics.
U+1DF16 | 𝼖 | LATIN SMALL LETTER R WITH FISHHOOK AND PALATAL HOOK | A small r with fishhook and palatal hook; for palatalized alveolar trills.
U+1DF17 | 𝼗 | LATIN SMALL LETTER TESH DIGRAPH WITH PALATAL HOOK | A small tʃ (tesh) digraph with palatal hook; denotes palatalized postalveolar affricates.
U+1DF18 | 𝼘 | LATIN SMALL LETTER EZH WITH PALATAL HOOK | A small ʒ with palatal hook; for palatalized postalveolar fricatives.
U+1DF19 | 𝼙 | LATIN SMALL LETTER DEZH DIGRAPH WITH RETROFLEX HOOK | A small dʒ with retroflex hook; represents retroflex postalveolar affricates.
U+1DF1A | 𝼚 | LATIN SMALL LETTER I WITH STROKE AND RETROFLEX HOOK | A small i with stroke and retroflex hook; for retroflex vowels or approximants.
U+1DF1B | 𝼛 | LATIN SMALL LETTER O WITH RETROFLEX HOOK | A small o with retroflex hook; denotes retroflex rounded vowels.
U+1DF1C | 𝼜 | LATIN SMALL LETTER TESH DIGRAPH WITH RETROFLEX HOOK | A small tʃ with retroflex hook; for retroflex postalveolar affricates.
U+1DF1D | 𝼝 | LATIN SMALL LETTER C WITH RETROFLEX HOOK | A small c with retroflex hook; represents retroflex alveolo-palatal fricatives as an IPA extension.
U+1DF1E | 𝼞 | LATIN SMALL LETTER S WITH CURL | A small s with rightward curl; used for sibilant fricatives in Malayalam transliteration and as an IPA extension.
U+1DF25 | 𝼥 | LATIN SMALL LETTER D WITH MID-HEIGHT LEFT HOOK | A small d with mid-height left-pointing hook; for voiced dental stops in Malayalam transliteration.
U+1DF26 | 𝼦 | LATIN SMALL LETTER L WITH MID-HEIGHT LEFT HOOK | A small l with mid-height left hook; represents retroflex laterals in Malayalam transliteration.
U+1DF27 | 𝼧 | LATIN SMALL LETTER N WITH MID-HEIGHT LEFT HOOK | A small n with mid-height left hook; for retroflex nasals in Malayalam transliteration.
U+1DF28 | 𝼨 | LATIN SMALL LETTER R WITH MID-HEIGHT LEFT HOOK | A small r with mid-height left hook; denotes retroflex flaps in Malayalam transliteration.
U+1DF29 | 𝼩 | LATIN SMALL LETTER S WITH MID-HEIGHT LEFT HOOK | A small s with mid-height left hook; for retroflex sibilants in Malayalam transliteration.
U+1DF2A | 𝼪 | LATIN SMALL LETTER T WITH MID-HEIGHT LEFT HOOK | A small t with mid-height left hook; represents retroflex stops in Malayalam transliteration.
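The assigned and unassigned counts quoted for the block follow directly from the two sub-ranges listed above. The following Python sketch checks that arithmetic; the names `ASSIGNED_RANGES` and `is_assigned` are hypothetical helpers introduced here for illustration, and the ranges are hard-coded from the list rather than read from the Unicode Character Database.

```python
# The whole block, U+1DF00..U+1DFFF inclusive.
BLOCK = range(0x1DF00, 0x1DFFF + 1)

# Assigned sub-ranges as given in the list (inclusive bounds).
ASSIGNED_RANGES = [(0x1DF00, 0x1DF1E), (0x1DF25, 0x1DF2A)]


def is_assigned(cp: int) -> bool:
    """Return True if cp falls in one of the assigned sub-ranges."""
    return any(lo <= cp <= hi for lo, hi in ASSIGNED_RANGES)


assigned = [cp for cp in BLOCK if is_assigned(cp)]
unassigned = len(BLOCK) - len(assigned)

print(len(BLOCK))      # 256
print(len(assigned))   # 37  (31 + 6)
print(unassigned)      # 219
```

Hard-coding the ranges keeps the check independent of the Unicode data bundled with a particular Python release, which may predate Unicode 14.0 or 15.0.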

Character Categories

The characters in the Latin Extended-G block can be grouped by typographic design and intended phonetic role, extending the Latin script's utility for precise linguistic transcription in the Supplementary Multilingual Plane. The block favors precomposed forms over combining-diacritic sequences, which keeps it compatible with existing Latin encoding practice and avoids the stacking problems common in earlier phonetic blocks.

Small capital letters represent one key category, employed for phonetic precision in denoting sounds not adequately distinguished in prior extensions. For example, the small capital turned G (U+1DF02) facilitates notation for devoiced velar stops, while the small capital turned K (U+1DF10) supports uvular or back articulations. Three such characters (U+1DF02, U+1DF04, and U+1DF10) address gaps in the small capital repertoire of earlier blocks, providing unified forms for advanced usage.

Reversed or modified letters form another category, tailored for retroflex, uvular, or inverted articulations to capture sounds beyond standard Latin orientations. Representative instances include the reversed K (U+1DF03) for retroflex stops and the turned R with long leg and retroflex hook (U+1DF08) for retroflex rhotics. These modifications, numbering around five, grew out of proposals to fill deficiencies in earlier blocks, where similar turned and reversed forms were limited, and they allow better representation of non-European phonologies without combining sequences.

Hook and stroke variants constitute the predominant category, incorporating hooks to denote retroflex and palatalized consonants and other secondary articulations, in line with Douglas Beach's notation for Khoisan languages. Letters with retroflex hooks, such as the lezh with retroflex hook (U+1DF05) and the retroflex click with retroflex hook (U+1DF0A), support transcription of clicks in Khoisan and Bantu languages, while palatal hooks on digraphs like the dezh digraph with palatal hook (U+1DF12) extend the notation to palatalized affricates. Over 25 characters fall here, including more than 10 dedicated to non-pulmonic sounds such as clicks and ejectives, with the remainder serving general extensions such as the letters with mid-height left hooks (e.g., U+1DF25 D with mid-height left hook) used for Malayalam transliteration. This grouping addresses limitations in earlier blocks, where hooked forms often had to be approximated with combining sequences, by offering stable precomposed glyphs.

History

Proposal and Addition

The Latin Extended-G block originated from proposals submitted to the Unicode Technical Committee (UTC) addressing deficiencies in existing Latin and phonetic character sets for transcribing complex sounds in linguistics, particularly those used in African languages and disordered speech analysis. Key proponents included phoneticians such as Kirk Miller and Martin Ball, with support from organizations like SIL International, which has long advocated for Unicode expansions to facilitate minority language documentation and phonetic notation. These efforts built on revisions to the Extensions to the International Phonetic Alphabet (extIPA) and Voice Quality Symbols (VoQS) systems, approved by the International Clinical Phonetics and Linguistics Association (ICPLA) in 2016, highlighting the need for dedicated characters beyond the Basic Multilingual Plane (BMP).

In 2020, multiple documents outlined the initial request for 31 precomposed characters, focusing on digraphs, modifier letters, and symbols for clicks, trills, and other non-European phonemes that could not be adequately represented through combining marks due to rendering instability and glyph complexity. The primary proposal, L2/20-266R, consolidated these requests for stable forms in the new Latin Extended-G block for academic and clinical use, while L2/20-115R targeted additional click letters essential for Khoisan and Bantu language transcription. These submissions emphasized the limitations of existing blocks, proposing placement in the Supplementary Multilingual Plane (SMP) to accommodate intricate shapes without compromising legibility in publishing tools.

The UTC reviewed these proposals during meetings in 2020, including UTC #163 and subsequent sessions, debating the SMP allocation to ensure compatibility with complex font rendering for phonetic diacritics and digraphs. Approval was granted in January 2021 (UTC #166), stabilizing 31 characters in the new Latin Extended-G block (U+1DF00–U+1DF1E) for Unicode 14.0, released in September 2021. This decision provided linguists and language experts with reliable precomposed forms for clicks (e.g., alveolar and retroflex variants), small capitals (e.g., turned G and L with belt), and related symbols, enabling precise transcription in scholarly works without reliance on unstable combinations. Minor additions followed in a later version, but the core block addressed critical gaps in phonetic support.

Versions and Updates

The Latin Extended-G block was officially created in Unicode 14.0, released on September 14, 2021, with the addition of its initial 31 characters, primarily for extended IPA representations of disordered speech and other phonetic notations. Unicode 15.0, released on September 13, 2022, expanded the block by adding six more characters, U+1DF25 LATIN SMALL LETTER D WITH MID-HEIGHT LEFT HOOK through U+1DF2A LATIN SMALL LETTER T WITH MID-HEIGHT LEFT HOOK, which support Malayalam transliteration; this brought the total number of assigned characters to 37. Following Unicode 15.0, the block has remained stable with no further character assignments as of Unicode 17.0, released on September 9, 2025; the range U+1DF00–U+1DFFF reserves 256 code points overall for potential future extensions in phonetic and linguistic applications. Updates to the Latin Extended-G block adhere to the Unicode Consortium's stability policies, which ensure that text encoded in one version of the standard remains valid and unchanged in later versions; in particular, an encoded character keeps its code point and name. These version-specific additions and stability measures have facilitated backward compatibility, allowing existing phonetic fonts and transcription systems to integrate the new characters without disrupting prior encodings.

Usage and Support

Phonetic Applications

The Latin Extended-G block supports the Extensions to the International Phonetic Alphabet (extIPA) by providing precomposed characters for transcribing disordered speech, including symbols for atypical articulations and voice qualities not covered in the standard IPA chart. These characters enable precise representation of phonetic phenomena in clinical speech analysis, such as unusual releases or fricatives. Small capital letters in the block denote voiceless versions of voiced sounds, a convention in extIPA for disordered speech transcription where the small capital form indicates lack of voicing. For instance, U+1DF02 LATIN LETTER SMALL CAPITAL TURNED G represents the voiceless uvular stop [ɢ̥], used to transcribe devoiced uvular articulations in pathological speech or certain dialects. Reversed letters, such as U+1DF03 LATIN SMALL LETTER REVERSED K, facilitate notation of retroflex and uvular articulations, aiding conceptual clarity in phonetic analysis over composite forms.

Hooked letters in the block draw from historical systems like Douglas Beach's notation for click consonants, providing dedicated glyphs for non-pulmonic sounds in general phonetics. U+1DF0A LATIN LETTER RETROFLEX CLICK WITH RETROFLEX HOOK, for example, transcribes retroflex clicks in languages such as !Kung, where such sounds are phonemic. These symbols integrate into IPA-based charts for academic publications, enhancing readability in studies of African languages with complex consonant inventories, including click systems.

The precomposed nature of these characters in the Supplementary Multilingual Plane minimizes rendering inconsistencies associated with diacritic stacking, promoting stability in digital phonetic tools. They are particularly valuable in software like SIL FieldWorks, which relies on Unicode support for fieldwork transcription of endangered languages, allowing linguists to document non-standard phonetics without fallback approximations. However, the block's characters are restricted to specialized phonetic and clinical applications and are unsuitable for standard orthographies or general text processing. As of Unicode 17.0 (September 2025), no characters beyond the 37 already assigned have been added to the block, although proposals for further additions covering disordered speech notation and historical systems remain under consideration.

Font and System Support

Support for the Latin Extended-G Unicode block (U+1DF00–U+1DFFF) in fonts remains limited due to its specialized nature for phonetic transcription, but several free and open-source fonts provide comprehensive coverage. Noto Sans and Noto Serif from Google offer full glyph support for all assigned characters in the block, ensuring consistent rendering across diverse text environments. Similarly, SIL International's Gentium Plus, Andika, and Charis SIL fonts include complete or near-complete support for Latin Extended-G, designed specifically for linguistic and phonetic applications with broad Unicode compliance. Other free options like Symbola and Unifont provide full bitmap-based coverage, making them suitable for fallback rendering in systems lacking advanced font features. In contrast, widely used fonts such as DejaVu Sans offer only partial support, covering select characters while omitting others.

At the operating system level, native support for Latin Extended-G is available in modern versions that incorporate Unicode 14.0 and later. Windows (version 22H2 and subsequent updates) fully handles the block through its built-in Segoe UI font family and system libraries, supporting rendering via DirectWrite. macOS 13.0 and later provide native integration, with Unicode 15.0 compatibility for display in native applications. On Linux distributions, support is facilitated by the HarfBuzz shaping engine (version 3.0+), which includes Unicode 14.0 and beyond, enabling proper glyph positioning in common desktop environments.

Handling characters in the Supplementary Multilingual Plane (SMP) requires encodings such as 4-byte UTF-8 sequences or surrogate pairs in UTF-16 to avoid truncation or misinterpretation. Implementation challenges arise from the block's SMP location, where incomplete font coverage often results in fallback to empty boxes or placeholder glyphs for unassigned or unsupported code points. Additionally, legacy systems or applications without full SMP awareness may fail to render these characters correctly without explicit configuration.

Tools for viewing and testing Latin Extended-G characters include BabelMap, a Windows utility that displays glyphs across planes and allows inspection of individual code points. The official Unicode code charts provide reference renderings for verification. For document preparation, integration in LaTeX is achieved via XeLaTeX, which loads compatible fonts like Charis SIL to support phonetic typesetting. As adoption of Unicode 14.0 and subsequent versions expands in academic and publishing software, font and system support for Latin Extended-G is expected to increase, driven by growing needs in linguistic research.
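The truncation and fallback hazards mentioned above come from code that treats one UTF-16 code unit or one byte as one character. As a rough Python sketch, with `in_latin_extended_g` and `utf16_code_units` being hypothetical helper names introduced here, software can count units explicitly and flag the characters that will need a font covering this block:

```python
def in_latin_extended_g(ch: str) -> bool:
    """True if the single character ch lies in U+1DF00..U+1DFFF."""
    return 0x1DF00 <= ord(ch) <= 0x1DFFF


def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units the text occupies in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


# 'a', U+1DF00 (feng digraph with trill), 'b'
sample = "a\U0001DF00b"

print(len(sample))                   # 3 code points
print(utf16_code_units(sample))      # 4 code units (one surrogate pair)
print(len(sample.encode("utf-8")))   # 6 bytes (1 + 4 + 1)

# Characters that require a font with Latin Extended-G coverage.
print([f"U+{ord(c):04X}" for c in sample if in_latin_extended_g(c)])
# ['U+1DF00']
```

A buffer sized by code point count would be one unit too short for this sample in UTF-16, which is the kind of mismatch that leads to truncated or split surrogate pairs.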
