
Latin Extended-A

Latin Extended-A is a block of the Unicode Standard comprising 128 characters in the code point range U+0100 to U+017F. It extends the Basic Latin and Latin-1 Supplement blocks with precomposed Latin letters carrying diacritical marks such as macrons, breves, ogoneks, and carons, along with ligatures like Œ (U+0152) and special letters like Đ (U+0110). The block originated from standards including ISO/IEC 8859 parts 2, 3, 4, and 9, as well as ISO/IEC 6937:1984, to support text in the many languages whose Latin-based alphabets go beyond the basic set. It includes compatibility characters such as the ligature IJ (U+0132) for Dutch and ŀ (U+0140) for Catalan; the Croatian digraph LJ (U+01C7), by contrast, is encoded in the separate Latin Extended-B block. The characters in Latin Extended-A are used for languages including Croatian, Danish, Galician, Latvian, Lithuanian, Maltese, Romany, Sámi, Slovak, Slovenian, Sorbian, Turkish, Welsh, and others, enabling proper encoding of accented letters like Ā (U+0100) for Latvian and Ć (U+0106) for Polish. As part of the Basic Multilingual Plane, the block ensures compatibility with legacy systems while supporting modern multilingual computing.

Overview

Block Specifications

The Latin Extended-A Unicode block occupies the code point range U+0100 to U+017F, encompassing 128 consecutive positions. This range follows immediately after the Latin-1 Supplement block (U+0080 to U+00FF), extending support for Latin-based characters beyond the ISO Latin-1 set. The block was introduced in Unicode 1.0 in October 1991 and became fully allocated in version 1.1 in June 1993, when its 128th character, U+017F LATIN SMALL LETTER LONG S, was added. All 128 code points in the block are assigned, with no reserved positions, and every character has the general category Letter, either uppercase (Lu) or lowercase (Ll). Because the block lies in the Basic Multilingual Plane (BMP), plane 0 of the Unicode code space (U+0000 to U+FFFF), each of its characters is a single 16-bit code unit in UTF-16 and a two-byte sequence in UTF-8, which keeps it simple to handle in legacy 16-bit systems. For a visual overview of the characters and their glyphs, refer to the official code chart U0100.pdf.
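These encoding properties can be checked with a minimal Python sketch built on the standard `unicodedata` module. The helper name `block_profile` is hypothetical, and the property values come from whatever Unicode version the interpreter bundles; since this block has not changed since version 1.1, any modern Python should agree.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0x0100, 0x017F  # Latin Extended-A, inclusive


def block_profile() -> dict:
    """Summarize assignment, properties, and encoded sizes for the whole block."""
    chars = [chr(cp) for cp in range(BLOCK_START, BLOCK_END + 1)]
    return {
        "count": len(chars),
        # unicodedata.name() returns the default for unassigned code points.
        "unassigned": [c for c in chars if unicodedata.name(c, None) is None],
        "categories": sorted({unicodedata.category(c) for c in chars}),
        "bidi_classes": sorted({unicodedata.bidirectional(c) for c in chars}),
        "utf8_lengths": sorted({len(c.encode("utf-8")) for c in chars}),
        # UTF-16LE without a BOM: two bytes per 16-bit code unit.
        "utf16_units": sorted({len(c.encode("utf-16-le")) // 2 for c in chars}),
    }


print(block_profile())
```

Every character should report the Lu or Ll category, bidirectional class L, two UTF-8 bytes, and one UTF-16 code unit.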

Purpose and Scope

The Latin Extended-A block, spanning U+0100–U+017F, encodes Latin letters drawn from the ISO/IEC 8859 series (excluding the Latin-1 repertoire of Part 1) and from the ISO 6937 standard, providing support for extended European alphabets. It complements the Basic Latin (U+0000–U+007F) and Latin-1 Supplement (U+0080–U+00FF) blocks with precomposed characters carrying diacritical marks, ensuring compatibility with the legacy 8-bit encodings used in European text processing. Its primary aim is to represent the accented and modified letters required by languages whose orthographies go beyond the ASCII and Latin-1 repertoires. The block contains 63 uppercase letters with lowercase partners (for Ÿ the partner ÿ lies in Latin-1 Supplement), including the ligatures IJ and Œ, plus a few lowercase-only letters such as kra (ĸ), ʼn, and long s (ſ). It deliberately excludes phonetic symbols, which are addressed in separate blocks such as IPA Extensions (U+0250–U+02AF). As of Unicode 17.0, released in 2025, the block has received no new assignments since it was completed in version 1.1, and all 128 code points remain allocated to maintain consistency with legacy data.
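The count of 63 case pairs can be cross-checked by tallying general categories, assuming each pair is identified by its uppercase member. In this Python sketch, `case_counts` is a hypothetical helper.

```python
import unicodedata


def case_counts(start: int = 0x0100, end: int = 0x017F) -> tuple[int, int]:
    """Return (uppercase, lowercase) letter counts for an inclusive code point range."""
    categories = [unicodedata.category(chr(cp)) for cp in range(start, end + 1)]
    return categories.count("Lu"), categories.count("Ll")


upper, lower = case_counts()
print(upper, lower)  # 63 65
```

The two extra lowercase letters reflect kra (ĸ), ʼn, and long s (ſ), which have no uppercase partner inside the block, offset by Ÿ, whose lowercase ÿ lies in Latin-1 Supplement.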

Character Categories

Diacritic-Equipped Letters

The Latin Extended-A block (U+0100–U+017F) encompasses numerous uppercase and lowercase letter pairs modified by diacritics, enabling precise representation of phonetic distinctions in various languages. These letters support orthographies that must indicate vowel length, nasalization, or palatalization. Diacritics in this block build on older traditions while extending to modern usage, with forms like the macron and breve deriving from ancient prosodic marks. Letters with macrons feature a horizontal bar (¯) above the base letter, a name originating from Greek makrón ("long"); the mark was first used in Greco-Roman metrics to denote syllable length and later adopted for vowel length in languages such as Latvian. In this block, macrons appear on A, E, I, O, and U, distinguishing long vowels from their short counterparts. Examples include Ā/ā (U+0100/U+0101), Ē/ē (U+0112/U+0113), Ī/ī (U+012A/U+012B), Ō/ō (U+014C/U+014D), and Ū/ū (U+016A/U+016B). The acute accent (´), from Latin acūtus ("sharp"), a translation of Greek oxús, originally marked high pitch in Greek prosody and now marks stress, length, or palatalization depending on the language. Basic forms like Á/á are in the Latin-1 Supplement (U+00C1/U+00E1), while Extended-A extends the acute to consonants for languages such as Polish and Croatian. Here it appears on C, L, N, R, S, and Z, usually indicating palatalized (soft) consonants, as in Ć/ć (U+0106/U+0107) for the palatal affricate /tɕ/ in Polish; in Slovak, Ĺ and Ŕ instead mark long syllabic consonants. Other instances include Ĺ/ĺ (U+0139/U+013A), Ń/ń (U+0143/U+0144), Ŕ/ŕ (U+0154/U+0155), Ś/ś (U+015A/U+015B), and Ź/ź (U+0179/U+017A). Breves (˘), small U-shaped arcs named from Latin brevis ("short") in contrast to the macron, indicate short vowels or reduced sounds and originated in Latin grammatical texts.
In Extended-A, they modify A, E, G, I, O, and U: Romanian uses ă, Turkish and Azerbaijani use ğ, Esperanto uses ŭ, and Latin dictionaries mark short vowels with forms such as ĕ and ĭ. Examples are Ă/ă (U+0102/U+0103), Ĕ/ĕ (U+0114/U+0115), Ğ/ğ (U+011E/U+011F), Ĭ/ĭ (U+012C/U+012D), Ŏ/ŏ (U+014E/U+014F), and Ŭ/ŭ (U+016C/U+016D). Letters with a dot above (˙) denote distinct consonants or vowels, a convention tracing to medieval scribal practices for emphasis or disambiguation. In this block the dot appears on C, E, G, I, and Z, for languages including Maltese, Lithuanian, Polish, and Turkish, as in Ċ/ċ (U+010A/U+010B), Ė/ė (U+0116/U+0117), Ġ/ġ (U+0120/U+0121), İ (U+0130), which Turkish pairs with dotless ı (U+0131), and Ż/ż (U+017B/U+017C). The ogonek (˛), a hook under the letter whose name means "little tail" in Polish, emerged in 15th-century Polish orthography to represent nasal vowels and was later adopted in Lithuanian. It attaches to A, E, I, and U in Extended-A, as in Polish Ą/ą (U+0104/U+0105) and Ę/ę (U+0118/U+0119), or Lithuanian Į/į (U+012E/U+012F) and Ų/ų (U+0172/U+0173). The caron (ˇ), also known as háček ("little hook" in Czech), evolved from a supralinear dot introduced by Jan Hus in early 15th-century Czech orthography to replace digraphs and indicate palatalization in Slavic languages. In Extended-A, it appears on C, D, E, L, N, R, S, T, and Z, as in Č/č (U+010C/U+010D), Ď/ď (U+010E/U+010F), Ě/ě (U+011A/U+011B), Ľ/ľ (U+013D/U+013E), Ň/ň (U+0147/U+0148), Ř/ř (U+0158/U+0159), Š/š (U+0160/U+0161), Ť/ť (U+0164/U+0165), and Ž/ž (U+017D/U+017E). Letters without a diacritic also belong to these pairs, notably the ligatures Œ/œ (U+0152/U+0153), a fusion of O and E from the Latin oe digraph that denotes /œ/ in French, and IJ/ij (U+0132/U+0133), used for the Dutch digraph ij (/ɛi/). The following table lists all 63 case pairs in the block, including these ligatures and the eng, with code points, forms, primary diacritic (or letter type), and an example language.
Code Point (Upper / Lower) | Uppercase | Lowercase | Primary Diacritic | Example Language
U+0100 / U+0101 | Ā | ā | Macron | Latvian
U+0102 / U+0103 | Ă | ă | Breve | Romanian
U+0104 / U+0105 | Ą | ą | Ogonek | Polish
U+0106 / U+0107 | Ć | ć | Acute | Polish
U+0108 / U+0109 | Ĉ | ĉ | Circumflex | Esperanto
U+010A / U+010B | Ċ | ċ | Dot Above | Maltese
U+010C / U+010D | Č | č | Caron | Czech
U+010E / U+010F | Ď | ď | Caron | Czech
U+0110 / U+0111 | Đ | đ | Stroke | Croatian
U+0112 / U+0113 | Ē | ē | Macron | Latvian
U+0114 / U+0115 | Ĕ | ĕ | Breve | Latin (prosodic notation)
U+0116 / U+0117 | Ė | ė | Dot Above | Lithuanian
U+0118 / U+0119 | Ę | ę | Ogonek | Polish
U+011A / U+011B | Ě | ě | Caron | Czech
U+011C / U+011D | Ĝ | ĝ | Circumflex | Esperanto
U+011E / U+011F | Ğ | ğ | Breve | Azerbaijani
U+0120 / U+0121 | Ġ | ġ | Dot Above | Maltese
U+0122 / U+0123 | Ģ | ģ | Cedilla | Latvian
U+0124 / U+0125 | Ĥ | ĥ | Circumflex | Esperanto
U+0126 / U+0127 | Ħ | ħ | Stroke | Maltese
U+0128 / U+0129 | Ĩ | ĩ | Tilde | Vietnamese
U+012A / U+012B | Ī | ī | Macron | Latvian
U+012C / U+012D | Ĭ | ĭ | Breve | Latin (prosodic notation)
U+012E / U+012F | Į | į | Ogonek | Lithuanian
U+0130 / U+0131 | İ | ı | Dot Above (lowercase is dotless) | Turkish
U+0132 / U+0133 | IJ | ij | Ligature (IJ) | Dutch
U+0134 / U+0135 | Ĵ | ĵ | Circumflex | Esperanto
U+0136 / U+0137 | Ķ | ķ | Cedilla | Latvian
U+0139 / U+013A | Ĺ | ĺ | Acute | Slovak
U+013B / U+013C | Ļ | ļ | Cedilla | Latvian
U+013D / U+013E | Ľ | ľ | Caron | Slovak
U+013F / U+0140 | Ŀ | ŀ | Middle Dot | Catalan
U+0141 / U+0142 | Ł | ł | Stroke | Polish
U+0143 / U+0144 | Ń | ń | Acute | Polish
U+0145 / U+0146 | Ņ | ņ | Cedilla | Latvian
U+0147 / U+0148 | Ň | ň | Caron | Czech
U+014A / U+014B | Ŋ | ŋ | None (distinct letter eng) | Northern Sámi
U+014C / U+014D | Ō | ō | Macron | Livonian
U+014E / U+014F | Ŏ | ŏ | Breve | Korean (McCune–Reischauer romanization)
U+0150 / U+0151 | Ő | ő | Double Acute | Hungarian
U+0152 / U+0153 | Œ | œ | Ligature (OE) | French
U+0154 / U+0155 | Ŕ | ŕ | Acute | Slovak
U+0156 / U+0157 | Ŗ | ŗ | Cedilla | Livonian
U+0158 / U+0159 | Ř | ř | Caron | Czech
U+015A / U+015B | Ś | ś | Acute | Polish
U+015C / U+015D | Ŝ | ŝ | Circumflex | Esperanto
U+015E / U+015F | Ş | ş | Cedilla | Turkish
U+0160 / U+0161 | Š | š | Caron | Czech
U+0162 / U+0163 | Ţ | ţ | Cedilla | Romanian (legacy; comma-below Ț U+021A preferred)
U+0164 / U+0165 | Ť | ť | Caron | Slovak
U+0166 / U+0167 | Ŧ | ŧ | Stroke | Northern Sámi
U+0168 / U+0169 | Ũ | ũ | Tilde | Vietnamese
U+016A / U+016B | Ū | ū | Macron | Latvian
U+016C / U+016D | Ŭ | ŭ | Breve | Esperanto
U+016E / U+016F | Ů | ů | Ring Above | Czech
U+0170 / U+0171 | Ű | ű | Double Acute | Hungarian
U+0172 / U+0173 | Ų | ų | Ogonek | Lithuanian
U+0174 / U+0175 | Ŵ | ŵ | Circumflex | Welsh
U+0176 / U+0177 | Ŷ | ŷ | Circumflex | Welsh
U+0178 (lower: U+00FF) | Ÿ | ÿ | Diaeresis | French
U+0179 / U+017A | Ź | ź | Acute | Polish
U+017B / U+017C | Ż | ż | Dot Above | Polish
U+017D / U+017E | Ž | ž | Caron | Czech
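The code-point, form, and diacritic columns can be regenerated from the Unicode Character Database, assuming (as a simplification) that the diacritic is whatever combining mark appears in a letter's canonical decomposition. In this Python sketch, `diacritics` and `case_pair_rows` are hypothetical helpers. Stroke letters (Đ, Ħ, Ł, Ŧ), Ŀ, the eng, and the ligatures have no canonical decomposition, so their table entries come from the character names instead. The cedilla letters decompose to COMBINING CEDILLA even where the glyph is drawn as a comma.

```python
import unicodedata


def diacritics(ch: str) -> list[str]:
    """Names of the nonspacing marks in a character's canonical decomposition (NFD)."""
    return [unicodedata.name(m) for m in unicodedata.normalize("NFD", ch)
            if unicodedata.category(m) == "Mn"]


def case_pair_rows(start: int = 0x0100, end: int = 0x017F) -> list[tuple[str, str, str, str]]:
    """One row per uppercase letter: code point, uppercase, lowercase, marks."""
    rows = []
    for cp in range(start, end + 1):
        upper = chr(cp)
        if unicodedata.category(upper) != "Lu":
            continue
        marks = ", ".join(diacritics(upper)) or "(no canonical decomposition)"
        # str.lower() applies full case mapping, so İ becomes "i" + U+0307.
        rows.append((f"U+{cp:04X}", upper, upper.lower(), marks))
    return rows


for row in case_pair_rows()[:3]:
    print(row)
```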

Ligatures and Special Forms

The Latin Extended-A block includes several ligatures and special character forms that represent fused or variant glyphs required by specific languages and by historical typography. These characters address orthographic needs beyond simple diacritic additions, such as combining letters into single units for phonetic or aesthetic reasons, or providing modified letter shapes for distinct sounds. The ligatures in this block are the IJ and OE combinations. The capital ligature IJ (U+0132) and its lowercase counterpart ij (U+0133) are used in Dutch for the digraph "ij," which functions as a single vowel sound and is often treated as a distinct letter; graphically, ij renders as a fused i and j. Similarly, Œ (U+0152) and œ (U+0153) form the OE ligature, used in French in words like "œuvre" for the /œ/ sound; visually, Œ joins the o and e into a single glyph, a form inherited from medieval scribal practice. Special forms include stroked letters, dotless variants, and historical shapes tailored to linguistic requirements. D with stroke, Đ (U+0110) and đ (U+0111), is essential in Serbo-Croatian (Croatian and Serbian Latin) for the sound /dʑ/, and also appears in Vietnamese and the Sámi languages; the stroke through the stem distinguishes it from d without altering letter height. In Polish, L with stroke, Ł (U+0141) and ł (U+0142), denotes the /w/ sound, with a diagonal stroke crossing the l's stem.
The dotted and dotless i letters, İ (U+0130, I with dot above) and ı (U+0131, dotless i), support Turkish and Azerbaijani, which distinguish two separate letters: dotted i/İ and dotless ı/I. Uppercase İ keeps the dot, as in "İstanbul", so case mapping must be language-aware: uppercasing i should yield İ rather than I, and lowercasing I should yield ı. Additional special forms include the historical long s ſ (U+017F), a variant lowercase s used in early modern printing until about the end of the 18th century, shaped like an f without the full crossbar and still relevant in Fraktur and Gaelic typography. For Sámi languages, T with stroke, Ŧ (U+0166) and ŧ (U+0167), represents /θ/, with a horizontal bar through the stem, while eng, Ŋ (U+014A) and ŋ (U+014B), encodes the velar nasal /ŋ/ and is shaped like an n with a descending hook. The Catalan legacy form ŀ (U+0140), L with middle dot, combines l and a centered dot (·) to write the geminate /lː/ (ela geminada) in words like "col·lecció", though modern text usually uses l followed by U+00B7 MIDDLE DOT. A deprecated special form is ʼn (U+0149, LATIN SMALL LETTER N PRECEDED BY APOSTROPHE), encoded for the Afrikaans indefinite article but now discouraged in favor of the sequence U+02BC MODIFIER LETTER APOSTROPHE followed by n.
Code Point | Character | Name | Primary Usage
U+0132 | IJ | LATIN CAPITAL LIGATURE IJ | Dutch digraph
U+0133 | ij | LATIN SMALL LIGATURE IJ | Dutch digraph
U+0152 | Œ | LATIN CAPITAL LIGATURE OE | French, Occitan
U+0153 | œ | LATIN SMALL LIGATURE OE | French, Occitan
U+0110 | Đ | LATIN CAPITAL LETTER D WITH STROKE | Serbo-Croatian, Vietnamese, Sami
U+0111 | đ | LATIN SMALL LETTER D WITH STROKE | Serbo-Croatian, Vietnamese, Sami
U+0141 | Ł | LATIN CAPITAL LETTER L WITH STROKE | Polish
U+0142 | ł | LATIN SMALL LETTER L WITH STROKE | Polish
U+0130 | İ | LATIN CAPITAL LETTER I WITH DOT ABOVE | Turkish
U+0131 | ı | LATIN SMALL LETTER DOTLESS I | Turkish
U+017F | ſ | LATIN SMALL LETTER LONG S | Historical typography
U+0166 | Ŧ | LATIN CAPITAL LETTER T WITH STROKE | Sami
U+0167 | ŧ | LATIN SMALL LETTER T WITH STROKE | Sami
U+014A | Ŋ | LATIN CAPITAL LETTER ENG | Sami
U+014B | ŋ | LATIN SMALL LETTER ENG | Sami
U+0140 | ŀ | LATIN SMALL LETTER L WITH MIDDLE DOT | Catalan (legacy)
U+0149 | ʼn | LATIN SMALL LETTER N PRECEDED BY APOSTROPHE | Afrikaans (deprecated)
This table highlights key examples, emphasizing graphical fusion or modification for orthographic efficiency.
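Python's built-in `str.upper()` and `str.lower()` apply Unicode's default, language-independent case mappings, so they do not follow the Turkish rules for dotted and dotless i. A minimal sketch of Turkish-aware casing, assuming plain precomposed text, can pre-map the two special letters before calling the built-ins. `turkish_upper` and `turkish_lower` are hypothetical helpers; a production system would more likely rely on a locale-aware library such as ICU, which also handles sequences like I followed by a combining dot.

```python
def turkish_upper(text: str) -> str:
    """Uppercase with Turkish/Azerbaijani dotted and dotless i rules (hypothetical helper)."""
    # Pre-map the two letters whose default mappings differ, then use str.upper().
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


def turkish_lower(text: str) -> str:
    """Lowercase with Turkish/Azerbaijani dotted and dotless i rules (hypothetical helper)."""
    # "I" is mapped to dotless ı first; İ is a different character, so it is unaffected.
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


print("istanbul".upper(), turkish_upper("istanbul"))  # ISTANBUL İSTANBUL
print(len("İ".lower()), turkish_lower("IŞIK"))        # 2 ışık
```

The default lowercase of İ has length 2 because Unicode maps it to i followed by U+0307 COMBINING DOT ABOVE, which is the outcome the Turkish-specific helpers avoid.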

Usage

European Language Support

The Latin Extended-A block provides essential characters for adapting the basic Latin alphabet to the phonetic and orthographic needs of numerous European languages, particularly those of Central, Eastern, and Northern Europe. These extensions often involve diacritics such as ogoneks, acute accents, carons (háčeks), and cedillas to represent palatalized consonants, nasal vowels, and length distinctions, enabling spellings that distinguish sounds absent from the standard 26-letter alphabet. In Polish orthography, for instance, ą (U+0105) and ę (U+0119) denote nasal vowels and ł (U+0142) denotes /w/ (historically a velarized lateral), as in the city name "Łódź" (literally "boat"), where ł sounds like English "w". Polish ó, by contrast, is U+00F3 from the Latin-1 Supplement; it reflects a historical vowel distinction and is now pronounced /u/, while acute-marked consonants such as ś (U+015B) come from Extended-A and denote palatal sibilants. These elements were standardized in the Polish alphabet during the 19th-century orthographic reforms to unify regional variations. In Czech and Slovak, the block supports the widespread use of the caron for palatalization and affrication, with characters including č (U+010D), ď (U+010F), ě (U+011B), ň (U+0148), ř (U+0159), š (U+0161), ť (U+0165), and ž (U+017E). This diacritic, known as háček, alters consonant articulation (č, for example, represents /tʃ/ as in "český", "Czech") and descends from the early 15th-century reform attributed to Jan Hus, which aimed to represent the language phonemically while avoiding digraphs; the modern rules were consolidated in the 19th century. Slovak orthography mirrors Czech closely, also marking vowel length with acute accents and adding the long syllabic consonants ĺ (U+013A) and ŕ (U+0155); in words like "šťastie" (happiness), š and ť denote /ʃ/ and the palatal stop /c/.
These conventions facilitate consistent spelling in literature and official documents across both languages. Croatian and Serbian Latin scripts draw heavily on the block for their shared South Slavic orthography, employing č (U+010D), ć (U+0107), đ (U+0111), š (U+0161), and ž (U+017E), alongside the digraphs dž (written as d + ž), lj, and nj. The letter đ represents the voiced alveolo-palatal affricate /dʑ/, as in "đak" (pupil), and is part of the phonemic Latin orthography (Gaj's Latin alphabet) standardized in the 19th century. In Croatian, these letters also serve ijekavian forms such as "čovjek" (person), where č and j reflect historical palatal shifts; Serbian Latin usage aligns closely, though Cyrillic predominates in Serbia. This shared repertoire supports bilingual contexts. Latvian orthography uses macrons for vowel length and cedillas (drawn as commas) for palatal consonants, incorporating ā (U+0101), ē (U+0113), ģ (U+0123), ī (U+012B), ķ (U+0137), ļ (U+013C), ņ (U+0146), and ū (U+016B), along with the caron letters č (U+010D), š (U+0161), and ž (U+017E); ō (U+014D) and ŗ (U+0157) appear in older Latvian spelling but are not part of the modern 33-letter alphabet. Adopted in the 1922 reform to replace older German-based spellings and digraphs, these rules emphasize phonemic accuracy: ā in "māsa" (sister), for example, indicates a long /aː/. Lithuanian, the other living Baltic language, uses the ogonek letters ą (U+0105), ę (U+0119), į (U+012F), and ų (U+0173), which historically marked nasal vowels, along with ė (U+0117) for a long close-mid /eː/ and ū (U+016B) for long /uː/, following 20th-century orthographic reforms. In modern standard Lithuanian the ogonek vowels are pronounced long but no longer nasal; ą in "mąstymas" (thinking), for example, is /aː/.
Hungarian uses double acute accents on ő (U+0151) and ű (U+0171) for the long counterparts of ö and ü, as in "tő" (stem). Northern Sámi, a Uralic language, uses đ (U+0111), ŋ (U+014B), and ŧ (U+0167), among other letters, for its fricatives and velar nasal under the orthography standardized in 1979. Western European examples include Dutch ij (U+0133), a ligature for the /ɛi/ diphthong in "ijzer" (iron), treated as a single letter and capitalized as a unit (IJ); French œ (U+0153) in words like "cœur" (heart), a medieval ligature standing for /œ/; and Catalan ŀ (U+0140) for the geminate /lː/ in words like "col·lecció", though this is usually written as l followed by the middle dot U+00B7. Additional languages benefiting include Sorbian (ł U+0142 for /w/), Livonian (ŗ U+0157), Welsh (ŵ U+0175 and ŷ U+0177 for long vowels), Slovenian (č, š, ž), and Turkish and Azerbaijani (dotless ı U+0131 and ğ U+011F, the "soft g"). These characters collectively enable over 20 European orthographies to maintain phonological fidelity without resorting to digraphs.
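Because a single word often mixes blocks, as with Polish ó from Latin-1 Supplement next to Extended-A letters, a small Python sketch can label each character with its block. The block boundaries are the published Unicode ranges; `latin_block` and `block_breakdown` are hypothetical helpers, and the text is NFC-normalized first on the assumption that decomposed input (base letter plus combining mark) should be judged by its precomposed form.

```python
import unicodedata

# Inclusive upper bounds of the Latin blocks discussed in this article.
LATIN_BLOCKS = [
    (0x007F, "Basic Latin"),
    (0x00FF, "Latin-1 Supplement"),
    (0x017F, "Latin Extended-A"),
    (0x024F, "Latin Extended-B"),
]


def latin_block(ch: str) -> str:
    """Name the Latin block containing a single character, or 'other'."""
    cp = ord(ch)
    for upper_bound, name in LATIN_BLOCKS:
        if cp <= upper_bound:
            return name
    return "other"


def block_breakdown(word: str) -> list[tuple[str, str]]:
    """Pair each character of an NFC-normalized word with its block."""
    return [(ch, latin_block(ch)) for ch in unicodedata.normalize("NFC", word)]


print(block_breakdown("Łódź"))
```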

Non-European and Transliterative Applications

The Latin Extended-A block also supports languages outside Europe, including several African languages, through characters that represent sounds not covered by the basic Latin letters. Afrikaans, a language descended from Dutch and spoken in southern Africa, writes its indefinite article as ʼn, a reduced form of "een"; the precomposed character U+0149 (LATIN SMALL LETTER N PRECEDED BY APOSTROPHE) was encoded for it but is now deprecated in favor of a separate apostrophe and n. Likewise, ŋ (U+014B) appears in African languages such as Mende for the velar nasal, and in phonetic transcription. Greenlandic (Kalaallisut) used characters from this block in its older orthography, including Ĩ (U+0128) and Ũ (U+0168) to mark a long vowel followed by a geminate consonant and kra ĸ (U+0138) for /q/; modern usage favors basic Latin letters, with length indicated by doubling. The block's precomposed forms provide only limited support for such orthographies, which often require combining sequences for full representation. In transliteration systems for non-European languages, Latin Extended-A characters help romanize tonal and phonetic features. In Hanyu Pinyin, the standard romanization of Mandarin Chinese, the macron letters ā (U+0101), ē (U+0113), ī (U+012B), ō (U+014D), and ū (U+016B) mark the first (high level) tone, while third-tone letters are split between Extended-A (ě U+011B) and Latin Extended-B (ǎ U+01CE); in digital text these are frequently composed from base letters and combining diacritics. Vietnamese orthography uses letters such as ă (U+0103), đ (U+0111), ĩ (U+0129), and ũ (U+0169) from this block, though most of its tone-marked vowels are encoded in the Latin Extended Additional block. Esperanto, an international auxiliary language, relies on the circumflex letters Ĉ (U+0108), Ĝ (U+011C), Ĥ (U+0124), Ĵ (U+0134), and Ŝ (U+015C) and the breve letter Ŭ (U+016C) for its distinctive sounds.
Phonetic applications extend to linguistic transcription, where ogonek letters such as Ą (U+0104) occasionally appear, though such uses are niche. Caron letters such as Č (U+010C) also support transliterations of Arabic names into Latin script for scholarly or administrative purposes. However, the block has limitations for complex non-European phonologies; click consonant letters (e.g., ǂ U+01C2), for instance, are encoded in the following Latin Extended-B block to accommodate the Khoisan languages.
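The way Pinyin tone letters are scattered across blocks can be illustrated by composing them from a base vowel and a combining mark. This Python sketch assumes a single, already-chosen vowel and does not implement Pinyin's rules for which vowel of a syllable carries the mark; `add_tone` and `TONE_MARKS` are hypothetical names.

```python
import unicodedata

# Combining marks for Pinyin tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030C", 4: "\u0300"}


def add_tone(vowel: str, tone: int) -> str:
    """Attach a tone mark to one vowel and compose it into one code point with NFC."""
    mark = TONE_MARKS.get(tone)
    if mark is None:  # neutral tone: no mark
        return vowel
    return unicodedata.normalize("NFC", vowel + mark)


for vowel, tone in [("e", 1), ("e", 3), ("a", 3), ("o", 4)]:
    marked = add_tone(vowel, tone)
    print(marked, f"U+{ord(marked):04X}")
```

The four results land in three blocks: ē (U+0113) and ě (U+011B) in Latin Extended-A, ǎ (U+01CE) in Latin Extended-B, and ò (U+00F2) in Latin-1 Supplement.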

History and Development

Origins in ISO Standards

The Latin Extended-A Unicode block traces its origins to several pre-Unicode international standards developed by the International Organization for Standardization (ISO) in the 1980s, which extended the 7-bit ASCII set to support additional Latin-script characters for European languages. Primarily, it incorporates characters from ISO/IEC 8859-2, published in 1987 as "Latin Alphabet No. 2" for Central and Eastern European languages such as Polish and Czech, including ogonek letters such as ą and acute-accented consonants such as ć. It also incorporates characters from ISO/IEC 8859-3 (Latin Alphabet No. 3, 1988) for Southern European languages including Maltese, Esperanto, and Turkish, and from ISO/IEC 8859-9 (Latin Alphabet No. 5, first published in 1989 and revised in 1999), which adapted Latin-1 for Turkish. These standards addressed the need for 8-bit encodings beyond the Western European focus of ISO/IEC 8859-1 (Latin-1), whose characters were allocated to the separate Latin-1 Supplement block rather than to Latin Extended-A to avoid overlap. Further contributions came from ISO/IEC 8859-4, first published in 1988 and revised in 1998, which supported Baltic languages such as Latvian and Lithuanian with macron letters such as ā and other diacritics. Additionally, ISO 6937, a 1983 standard for coded character sets in text communication, influenced the inclusion of special forms such as ʼn (U+0149), preserved in Unicode for compatibility with that standard's variable-length encoding, which combined spacing characters with non-spacing diacritic prefixes. These ISO standards collectively formed the basis for the European Latin extensions, prioritizing characters not covered in earlier 7-bit or basic 8-bit sets.
The Unicode Consortium's selection process for Latin Extended-A involved mapping these 8-bit ISO code pages into the 16-bit Unicode space during the late 1980s and early 1990s, harmonizing the European extensions into a unified repertoire. ISO drafts from the 1980s, such as those for the 8859 series developed by ECMA and ISO/IEC JTC 1/SC 2, directly informed the Unicode 1.0 proposal in 1990 and its release in 1991, ensuring compatibility while expanding coverage of accented letters and special forms. This mapping effort excluded characters already covered by Latin-1 and emphasized those vital for accurate representation in information interchange.
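The mapping from 8-bit code pages can be explored with Python's built-in codecs for these ISO 8859 parts. This sketch decodes the upper half (0xA0–0xFF) of each code page and keeps whatever lands in U+0100–U+017F; `extended_a_from` is a hypothetical helper, and the codec tables reflect Python's current mappings rather than the historical drafts. Characters found in none of these four parts, such as Œ, presumably reached the block from other sources such as ISO 6937.

```python
# Standard-library codec names for the ISO 8859 parts discussed above.
LEGACY_PARTS = ["iso8859_2", "iso8859_3", "iso8859_4", "iso8859_9"]


def extended_a_from(codec: str) -> set[str]:
    """Latin Extended-A characters reachable from bytes 0xA0-0xFF of a legacy codec."""
    # errors="ignore" skips byte values a part leaves undefined (ISO 8859-3 has several).
    decoded = bytes(range(0xA0, 0x100)).decode(codec, errors="ignore")
    return {ch for ch in decoded if 0x0100 <= ord(ch) <= 0x017F}


coverage = {codec: extended_a_from(codec) for codec in LEGACY_PARTS}
union = set().union(*coverage.values())
for codec, chars in coverage.items():
    print(codec, len(chars), "".join(sorted(chars)))
print("block characters in none of these parts:", 128 - len(union))
```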

Unicode Evolution and Changes

The Latin Extended-A block was introduced in Unicode 1.0 in 1991. In Unicode 1.1 (1993), U+017F LATIN SMALL LETTER LONG S was added to support historical texts, particularly early modern printing, where the long s appeared in roman type until about the end of the 18th century and in Fraktur for longer. In Unicode 3.0 (1999), the standard clarified key properties for characters in this block, including canonical decomposition mappings for accented letters such as U+0100 Ā, which decomposes to U+0041 A followed by U+0304 COMBINING MACRON, enabling consistent normalization in text processing. Compatibility characters carry compatibility decompositions instead: U+0132 LATIN CAPITAL LIGATURE IJ, for example, decomposes to U+0049 I and U+004A J, whereas U+0152 LATIN CAPITAL LIGATURE OE has no decomposition at all. Unicode 5.2 (2009) gave U+0149 ʼn LATIN SMALL LETTER N PRECEDED BY APOSTROPHE the Deprecated property, discouraging its use and recommending the sequence U+02BC MODIFIER LETTER APOSTROPHE followed by U+006E LATIN SMALL LETTER N for the Afrikaans article ʼn. The block has remained unchanged through Unicode 17.0 (2025), with no new code points since version 1.1; because all 128 positions are assigned and Unicode's stability policy forbids removing or moving encoded characters, its repertoire cannot change. Property updates have been limited to refinements in the Unicode Character Database, which assigns the bidirectional class L (Left-to-Right) to every character in the block, canonical decompositions to the diacritic-equipped letters, and compatibility decompositions to characters such as IJ/ij, Ŀ/ŀ, ʼn, and ſ.
Looking forward, the block is considered frozen for compatibility, with potential future adjustments limited to aliasing or property annotations rather than structural changes, preserving its role in encoding legacy Latin extensions derived from ISO standards.
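The distinction between canonical and compatibility decompositions can be inspected with Python's `unicodedata` module. In this sketch, `decomposition_report` is a hypothetical helper; `unicodedata.decomposition` returns the raw decomposition field from the Unicode Character Database, where a `<compat>` tag marks a compatibility mapping. The module does not expose the Deprecated property, so the deprecation of U+0149 is not checked here.

```python
import unicodedata


def decomposition_report(ch: str) -> dict[str, str]:
    """Raw UCD decomposition field plus the NFD and NFKD forms of one character."""
    return {
        "decomposition": unicodedata.decomposition(ch),
        "NFD": unicodedata.normalize("NFD", ch),
        "NFKD": unicodedata.normalize("NFKD", ch),
    }


# Ā, IJ, Œ, ŀ, ʼn, ſ
for ch in ["\u0100", "\u0132", "\u0152", "\u0140", "\u0149", "\u017F"]:
    print(f"U+{ord(ch):04X}", decomposition_report(ch))
```

Canonical normalization (NFD) splits Ā into A plus a combining macron but leaves IJ, ŀ, ʼn, and ſ untouched; compatibility normalization (NFKD) additionally turns them into I + J, l + middle dot, modifier apostrophe + n, and s, while Œ stays a single character under both.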
