Two dots
The two dots (¨), also known as the diaeresis or umlaut, is a diacritical mark placed above a letter, typically a vowel, in various writing systems. It serves different functions depending on the language: as a diaeresis, it indicates that two adjacent vowels are pronounced separately (e.g., in English "naïve" or "coöperate"), preventing them from forming a diphthong. In Germanic languages like German, it represents an umlaut, denoting a specific vowel mutation or fronting (e.g., German ⟨ä⟩, ⟨ö⟩, ⟨ü⟩).[1][2] The mark has origins in medieval scribal abbreviations and evolved into its modern forms across European and other scripts.[3]
Orthographic Uses
Diaeresis
The diaeresis is a diacritical mark consisting of two dots (¨) placed over the second of two adjacent vowels to indicate that they are pronounced as separate syllables rather than forming a diphthong.[4] This mark, also known as a trema in some contexts, serves to clarify vowel hiatus in writing.[5] The term "diaeresis" derives from the Late Latin diaeresis, borrowed from Ancient Greek diaíresis (διαίρεσις), meaning "division" or "separation," reflecting its function to divide syllables.[4] In Classical Greek, the diaeresis—often called a trema, from the Greek word for "perforation"—appeared as two dots over vowels like iota (ι) or upsilon (υ) to denote separate pronunciation in adjacent vowel pairs, aiding readers in ancient manuscripts and papyri.[6] This usage emerged in Hellenistic Greek orthography to resolve ambiguities in polytonic script, where it combined with accents for precise prosody.[5] The mark was later adopted into Latin-based orthographies during the Renaissance, as European scholars integrated Greek influences into vernacular languages, using it to adapt the Roman alphabet for similar phonetic separations in words borrowed from Greek or requiring hiatus indication.[5] In English, the diaeresis has been employed in loanwords and compounds to prevent misreading of vowel sequences, such as in "naïve" (pronounced /naɪˈiːv/, with distinct /aɪ/ and /iː/ sounds, rather than /neɪv/ as a single diphthong), "coöperate," and "reëlect."[2] These examples highlight its role in signaling syllable breaks, as in "coöperate" (/koʊˈɒpəreɪt/), where it avoids confusion with "cooperate" implying a blended sound.[7] However, its use in modern English has declined significantly since the early 20th century, largely supplanted by hyphens (e.g., "co-operate") or omission due to simplified spelling conventions, typesetting limitations, and autocorrect software that removes the dots.[7] Today, it persists mainly in stylistic publications like The New Yorker or proper 
names, but is considered obsolescent in general prose.[7] Unlike the umlaut, which modifies a vowel's sound for phonetic or grammatical purposes, the diaeresis strictly denotes separation without altering individual vowel quality.[4]
Umlaut
The umlaut is a diacritic mark consisting of two dots placed over vowels such as a, o, or u to produce ä, ö, or ü, primarily in Germanic languages, to denote vowel sounds resulting from assimilation to a following high vowel or glide. This process, known as regressive assimilation, fronted or raised the preceding vowel, creating distinct phonemes like /ɛ/ for ä, /ø/ for ö, and /y/ for ü.[8][9]
Historically, the umlaut originated from the i-umlaut or j-umlaut sound shifts in Proto-Germanic and Old High German, where back or central vowels were altered before /i/, /iː/, or /j/ in the following syllable. For instance, German Apfel ("apple"), from Proto-Germanic aplaz, has the plural Äpfel, in which the stem vowel /a/ is fronted to /ɛ/, a pattern that goes back to the influence of an /i/ in a following syllable. This phonetic change became orthographically represented by the two dots, distinguishing the mutated vowels from the original ones.[10][11]
In modern German, the umlaut fulfills key grammatical roles beyond mere phonetic indication, often marking morphological alternations. It commonly signals plural forms of nouns, as in Mann ("man") becoming Männer ("men"); diminutives, such as Haus ("house") to Häuschen ("little house"); and comparative adjectives, like lang ("long") to länger ("longer"). These functions highlight the umlaut's integration into the language's inflectional system, where it triggers vowel modification to convey grammatical meaning. Unlike the trema (or diaeresis), a visually identical mark that separates adjacent vowels into distinct syllables without altering their quality, the umlaut specifically signals a change in the vowel's articulation for assimilation or morphological purposes.[12][1]
Similar principles apply in other Germanic languages like Swedish and Danish, though with variations in notation and phonetics. In Swedish, ä and ö represent vowels such as /ɛː/ and /øː/, used in alternations such as man ("man") to män ("men"), reflecting historical fronting. Danish employs æ (akin to ä for /æ/) and ø (/ø/), as in fod ("foot") to fødder ("feet"), where umlaut indicates plural or other inflections, supplemented by unique letters like å for additional vowel distinctions.[8][13]
Similar Diacritics in Other Scripts
In the Cyrillic script, the letter ё (U+0451 CYRILLIC SMALL LETTER IO) features two dots (a diaeresis) placed above the base letter е to denote the sound /jo/, distinguishing it from е, which typically represents /je/ or /e/.[14] This diacritic ensures clarity in pronunciation for Russian and related languages, where the rounded vowel quality of /o/ after a palatal /j/ requires explicit marking.[14] In Arabic script and its extensions, diacritics resembling two dots appear in forms like the fatha with two dots (U+065E ARABIC FATHA WITH TWO DOTS), used in the Kalami language to indicate a specific short open vowel sound, often /a/ with additional phonetic nuance in Dardic contexts.[15] In historical Turkic orthographies, front rounded vowels (/ø/ and /y/) were later standardized as ö and ü in the modern Latin-based Turkish alphabet following the 1928 script reform.

| Script | Diacritic Form | Visual Representation | Phonetic Role |
|---|---|---|---|
| Cyrillic | Diaeresis on е | ё (/jo/) | Indicates palatalized rounded vowel /jo/, distinct from /je/ in е.[14] |
| Arabic (Kalami) | Fatha with two dots | ٞ (/a/ variant) | Modifies short /a/ for specific open vowel in Dardic phonology.[15] |
| Greek | Dialytika | Ϊ Ϋ, ϊ ϋ | Separates adjacent vowels (diaeresis) so that they are not read as a digraph or diphthong, e.g., αϊ read as two vowels rather than as the digraph αι. |
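
The relationship between the marks in the table can be inspected at the code point level. The following is a minimal sketch using Python's standard unicodedata module; the helper name describe is an illustrative assumption. It shows that the Cyrillic and Greek letters decompose into a base letter plus the same combining diaeresis (U+0308), whereas the Arabic fatha with two dots is a separate combining mark with no such decomposition.

```python
import unicodedata

def describe(char: str) -> tuple[str, str]:
    """Return the Unicode name of a character and its NFD code points.

    Hypothetical helper for illustration; it only wraps unicodedata calls.
    """
    decomposed = unicodedata.normalize("NFD", char)
    points = " ".join(f"U+{ord(c):04X}" for c in decomposed)
    return unicodedata.name(char), points

# Characters from the table: Cyrillic io, Greek iota and upsilon with
# dialytika, and the Arabic fatha with two dots.
for ch in ["\u0451", "\u03ca", "\u03cb", "\u065e"]:
    name, points = describe(ch)
    print(f"{name}: {points}")
# First line printed: CYRILLIC SMALL LETTER IO: U+0435 U+0308
```

The Arabic mark prints only its own code point, since it is already a combining character rather than a precomposed letter.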
Historical Development
Origins in Ancient Scripts
The trema, or διαίρεσις (diaíresis), denoting "division" or "separation," emerged in ancient Greek writing during the Hellenistic period around the 3rd century BCE as a diacritic to indicate vowel hiatus. Placed as two dots over the second of two adjacent vowels, it signaled that they should be pronounced separately rather than forming a diphthong, aiding clarity in poetic meter and recitation. This mark appeared primarily over iota (ϊ) and upsilon (ϋ), especially at word beginnings following another vowel, and is attested in Hellenistic literary texts, including poetry by Callimachus, where it preserved the rhythmic integrity of verses like those in his Aetia.[16]
Possible precursors to such two-dot configurations may trace back to Egyptian hieroglyphs circa 2000 BCE, where small dot-like determinatives occasionally denoted duality or paired concepts in non-phonetic roles, such as clarifying semantic plurality without alphabetic intent. However, these hieroglyphic elements, part of the broader system of ideograms and classifiers, bear no direct evolutionary link to later alphabetic diacritics, serving instead as visual classifiers in a logographic script.[17][18]
Semitic writing systems, including Phoenician and early Aramaic inscriptions from the 1st millennium BCE, employed dot clusters for rudimentary vowel indication and word separation, influencing the development of pointing systems. These evolved into the niqqud of Hebrew by the 7th–10th centuries CE, with two-dot variants like the shva (two vertical dots below a letter) marking reduced or epenthetic vowels, and combinations such as hiriq yod (a sublinear dot with yod) approximating short i-sounds in certain contexts.[19][20]
The diaeresis entered Latin script in medieval times, used in manuscripts to clarify pronunciation in Greek-derived terms for ecclesiastical and scholarly texts. Earlier Greek examples include occasional diaeresis marks in the 4th-century Codex Vaticanus by the original scribe, though their purpose remains uncertain; most such diacritics were added later by medieval correctors to denote separation over initial iotas and upsilons, reflecting Hellenistic textual traditions. Early attestations in Latin include 9th-century manuscripts of grammatical works, such as those copying Priscian, where diaeresis marked hiatus in Greek-derived vocabulary. By the 10th century, it appeared more frequently in glossed Bibles and patristic texts.[21][22][23]
Evolution in European Languages
The two dots diacritic, functioning as a diaeresis to separate vowels or as an umlaut to indicate vowel mutation, evolved significantly in European orthographies from medieval times onward, influenced by phonetic needs and technological advances in printing. Drawing from ancient Greek precedents where the diaeresis marked hiatus to prevent diphthong formation, the mark entered medieval Latin scripts to clarify pronunciation in Greek-derived terms. Diaeresis marks appear in some Carolingian minuscule manuscripts from the late 8th century onward, aiding the reading of classical and ecclesiastical texts with foreign phonetic elements.[24] The advent of the printing press in the 15th century accelerated the adoption and refinement of the two dots. Johannes Gutenberg's 42-line Bible of 1455, printed in Mainz with financial support from Johann Fust, represented a milestone in typographic precision for Latin texts, though it primarily used basic diacritics; Fust's subsequent partnership with Peter Schöffer in 1457 introduced more sophisticated type designs that facilitated the inclusion of accents for vernacular languages. As printing spread, the umlaut in German texts transitioned from a superscript e (e.g., a^e for ä) — a medieval convention to denote vowel fronting — to the more efficient two dots, first appearing consistently in printed works by the late 15th and early 16th centuries to save space and improve legibility. This shift was driven by phonetic accuracy, as the dots visually approximated the superscript e's loops while allowing for faster typesetting.[1][25] In the 19th century, linguistic scholarship and orthographic reforms further distinguished the functions of the two dots. 
Jacob Grimm documented the umlaut as a systematic vowel change in Germanic languages in his Deutsche Grammatik (1819–1837), coining the term "Umlaut" (meaning "change of sound") to describe the phonetic process underlying forms like Hand to Hände, influencing its standardized use as a distinct marker from the diaeresis. The 1901 German orthographic reform (Rechtschreibung), convened in Berlin, codified rules for the umlaut in official spelling while clarifying its separation from the diaeresis in cases of vowel separation, promoting consistency across education and publishing. Meanwhile, the French Academy promoted the tréma (diaeresis) in 17th-century dictionaries, such as the 1694 Dictionnaire de l'Académie française, to indicate separate pronunciation in words like aiguë, where it prevents fusion with preceding vowels, a practice that persisted through subsequent editions to reflect evolving phonetic norms.[26]
Language-Specific Applications
In Germanic Languages
In German, the two dots, known as umlaut, play a mandatory role in inflectional morphology, particularly for marking plurals of nouns with susceptible back vowels such as a, o, u, or au. For instance, singular Haus (house) becomes plural Häuser, while Buch (book) forms Bücher.[27][28] Umlaut also appears in certain feminine forms derived from masculine nouns, as in Arzt (doctor) yielding Ärztin (female doctor).[28] In verbs, umlaut is required in certain conjugations of strong verbs, such as the second-person singular fährst from fahren (to drive) or läufst from laufen (to run); in modal verbs such as können (to be able), by contrast, the infinitive and plural forms carry the umlaut while singular forms like kannst do not.[27] When umlaut characters (ä, ö, ü) are unavailable, especially on non-German keyboards or in Swiss German contexts, digraph substitutions like ae, oe, and ue are used systematically: Häuser becomes Haeuser, though the umlaut form remains preferred in standard orthography.[28]
Among Scandinavian languages, the two dots appear primarily as umlaut in Swedish for ä and ö, which evolved from superscript e notations simplified into diaeresis marks, while å (with a ring) historically derived from aa but shares typographic roots with umlaut developments in related vowels.[1] In Norwegian, the diaeresis (as in ë or ï) is employed sparingly in loanwords to indicate vowel separation, such as in adaptations of French or English terms like naïve.
Danish orthography largely avoids two dots, favoring slashed letters like ø in place names such as Sønderborg, though ä and ö may appear rarely in foreign-derived names or domains.[29]
In Dutch and Afrikaans, the two dots function as diaeresis to separate adjacent vowels and prevent diphthongization, as in Dutch reëel (real) or Afrikaans reën (rain).[30][31] Following the 1995 Dutch spelling reform, which primarily addressed compound nouns and suffixes, diaeresis usage has declined in favor of hyphenation or unmarked forms in some compounds, though it persists in words like coöperatie (cooperation) for clarity.[32] Afrikaans retains diaeresis more consistently in native words, such as reënboë (rainbows), to mark syllable breaks.[31]
Phonologically, the two dots interact with ablaut (vowel gradation) in Germanic strong verbs, where umlaut modifies ablaut-derived forms for grammatical distinction; for example, in singen (to sing), ablaut shifts produce present sing (I), preterite sang (A), and past participle gesungen (U), with umlaut further altering the subjunctive to sänge via fronting of the ablaut vowel.[33] This pattern, rooted in i-umlaut phonetics, underscores the grammatical integration of the diacritic beyond mere sound shifts.[33]
Dialectal variations in Germanic languages highlight retention of umlaut patterns, particularly in Bavarian, where the diacritic is preserved in informal speech to differentiate moods, such as indicative schlug versus subjunctive schlüge from schlagen (to hit), reflecting historical Upper German umlaut regularity.[11] In Bavarian corpora, umlaut alternants like ä from a appear at rates around 42% in Middle High German texts, maintaining phonological consistency into modern informal usage.[11]
In Romance and Other Indo-European Languages
In Romance languages, the two dots, known as the diaeresis or tréma, serve primarily as a phonetic aid to indicate vowel hiatus, preventing the fusion of adjacent vowels into a diphthong or digraph and ensuring separate pronunciation. Unlike the Germanic umlaut, the mark does not signal a different vowel quality; its job is clarity in native words and loanword adaptations.
In French, the tréma is placed on a vowel to mark it as belonging to a separate syllable, so that it is not read as part of a digraph with the neighboring vowel; for example, in naïf (pronounced /na.if/, rather than /nɛf/ as the digraph ai would otherwise suggest) and aiguë (/ɛ.ɡy/, not /ɛɡ/). The 1990 orthographic rectifications, approved by the Académie française, clarified its placement, moving it onto the pronounced u in sequences such as -güe- and -güi- (e.g., aigüe as a permitted variant of aiguë) and adding it in words like argüer. The same rectifications also adjusted hyphenation in compound words without altering pronunciation.[34][35]
In Spanish, the diaeresis (diéresis) is mandatory on the u in the sequences güe and güi to signal that the u is pronounced as /w/, overriding the default silence of u in gue and gui; representative examples include pingüino (/piŋˈɡwino/) and vergüenza (/beɾˈɡwensa/). According to the Real Academia Española, this rule applies uniformly in modern Castilian orthography, distinguishing it from cases where u remains mute, such as guerra (/ˈɡera/).[36]
Portuguese, chiefly in Brazilian usage, historically employed the trema (a diaeresis variant) on u after g or q to indicate that the u is pronounced, as in lingüística.
The 1990 Orthographic Agreement, implemented across Portuguese-speaking countries by 2009, abolished the trema entirely in Portuguese or Portuguesized words, replacing forms like lingüística with linguística to promote orthographic unity; it is now restricted to foreign terms only when required by their original spelling.[37]
Among other Indo-European languages, Albanian uses the diaeresis on e as ë to represent the schwa sound /ə/, a central vowel essential to its phonology, as in këndon ("sings"); this marks a key distinction from plain e (/ɛ/), with ë comprising about 7.74% of letters in standard Albanian texts. In Welsh, while the circumflex primarily denotes long vowels (e.g., ŵ and ŷ), historical manuscripts occasionally feature dot variants over vowels to indicate hiatus in diphthongs, though modern orthography favors separation by other means. English borrowings from Romance languages often retain the diaeresis for phonetic fidelity, such as naïve (from French naïf, pronounced /naɪˈiːv/) and Noël (/noʊˈɛl/), preserving the original hiatus in educated or formal usage.[38]
In Non-Indo-European Languages
In Turkic languages, particularly Turkish, the two dots diacritic (known as umlaut or diaeresis) was introduced as part of the 1928 alphabet reform led by Mustafa Kemal Atatürk, which replaced the Ottoman Arabic script with a modified Latin alphabet to enhance literacy and align with Western modernization efforts.[39] The letters ö and ü represent the front rounded vowels /ø/ and /y/, respectively, and play a crucial role in Turkish vowel harmony, a phonological process where vowels in a word must share features of backness, rounding, and height to maintain phonetic cohesion.[40] This system ensures that suffixes and affixes adapt their vowels to match the root word's harmony, as seen in examples like ev-ler (houses, with front unrounded vowels) versus kapı-lar (doors, with back unrounded vowels), preventing disharmonic forms. Similar adaptations appear in other Turkic languages like Azerbaijani, which also employ ö and ü for vowel harmony following Soviet-era Latinization influences.[41] Hungarian, a Finno-Ugric language unrelated to Indo-European families, incorporates the two dots diacritic on ö and ü to denote the front rounded vowels /ø/ and /y/, borrowed from Germanic orthographic traditions during the language's standardization in the 19th century.[16] Historically, early printed texts from the 16th to 18th centuries often rendered these as u or o with two dots above, before the modern double acute accents (ő and ű) were formalized in the 20th century to distinguish long versions of the same vowels, such as /øː/ and /yː/. 
This usage supports Hungarian's vowel harmony, inherited from its Uralic roots, where front vowels trigger front harmony in suffixes, exemplified by ház-ban (in the house, back harmony) contrasting with kéz-ben (in the hand, front harmony).[42]
In Vietnamese, an Austroasiatic language, the Quốc Ngữ romanization system, developed by Portuguese and French missionaries in the 17th century and officially adopted in the early 20th century, employs diacritics for its six tones, but none of them takes the form of two dots; the hỏi tone (rising, questioning intonation), for example, is marked by a hook above. This system, finalized during French colonial standardization, integrates Latin elements to capture tonal distinctions essential for meaning, such as ma (ghost) versus mả (tomb, hỏi tone, marked by the hook above).[43][44]
Among Indigenous American languages, Nahuatl (Uto-Aztecan family) adaptations in Latin-based orthographies sometimes employ the diaeresis to separate adjacent vowels and clarify pronunciation, particularly in contexts influenced by Mexican Spanish, while the glottal stop (/ʔ/, or "saltillo") is typically represented by an apostrophe or 'h' rather than two dots.[45] Modern standardized orthographies, developed since the 20th century by institutions like SIL International, prioritize consistent vowel length and glottal marking, as in hua (to eat) versus huā (long vowel), to preserve the language's phonological structure amid colonial Latin script imposition.[46]
Computing and Typography
Encoding Standards
The two dots diacritic, known as diaeresis or umlaut, is encoded in Unicode primarily through precomposed characters in the Latin-1 Supplement block (U+0080 to U+00FF), which includes code points such as U+00E4 for lowercase a with diaeresis (ä), U+00EB for lowercase e with diaeresis (ë), and U+00EF for lowercase i with diaeresis (ï).[47] These precomposed forms allow direct representation of letters with the diacritic attached. Additionally, Unicode provides the combining diaeresis (U+0308), a non-spacing mark that can be applied to any base character for flexible placement, enabling custom compositions like ä. Prior to the 1980s, the 7-bit ASCII standard (ANSI X3.4-1968) lacked code points for diacritics, limiting support to basic Latin letters and necessitating workarounds such as digraphs (e.g., "ae" substituting for ä in German text). This constraint persisted in early computing environments, where non-English characters were often transliterated or omitted. The introduction of ISO/IEC 8859-1 in 1987 addressed these gaps by extending the ASCII range to 8 bits, incorporating 191 characters including umlaut forms like ä (0xE4) in its Latin alphabet no. 
1 subset, facilitating broader Western European language support.[48] In web and HTML contexts, these characters are referenced via named entities, such as &auml; for ä, ensuring compatibility across browsers and documents.[49] Unicode normalization forms further enhance interoperability: NFC (Normalization Form C) composes a base letter and diacritic into a precomposed character such as ä, while NFD (Normalization Form D) decomposes such characters (e.g., a + U+0308), aiding searches, sorting, and legacy system migrations without altering semantic meaning.[50]
Some variants of IBM's EBCDIC encoding, particularly the English-focused ones prevalent on mainframe systems from the 1960s, reflected a punched-card legacy and either omitted diacritics or mapped them inadequately, so that umlauts were dropped or replaced with base letters; language-specific code pages provided support. The shift to UTF-8 from the 1990s onward marked a pivotal change, as this variable-length encoding fully supports Unicode's diacritic characters (e.g., ä as 0xC3 0xA4) and became the dominant standard for internet and software applications during the 2000s, superseding single-byte sets like ISO 8859-1.
The International Organization for Standardization (ISO) and the Unicode Consortium have played central roles in standardizing these encodings since 1991, when the Consortium was incorporated and began aligning with ISO/IEC 10646; the alignment of the two standards, completed with the 1993 publication of ISO/IEC 10646-1, left diaeresis and umlaut sharing the same code points, resolving prior discrepancies in international character sets.[51][52]
Rendering and Input Challenges
The rendering of two-dot diacritics, such as the umlaut or diaeresis, exhibits variability across fonts, particularly in sans-serif typefaces where dot positioning may shift to avoid overlap with letter ascenders or descenders. For instance, in some designs, the dots are placed slightly offset or adjusted in height relative to the baseline, as seen in typefaces like Linux Libertine where they are positioned beside the stems for letters like Ä and Ö to maintain legibility.[53] This can lead to differences in appearance between desktop and mobile environments, where mobile browsers like Chrome on Android may fail to display umlauts entirely, rendering them as blank spaces due to font substitution or incomplete glyph support.[54] Inputting two-dot diacritics varies by platform and device. On Windows systems, users can employ keyboard shortcuts such as pressing CTRL+SHIFT+ followed by the colon key (:) and then the desired vowel to produce characters like ä, ö, or ü.[55] For macOS, holding the Option key and pressing 'u' activates a dead key, after which typing the vowel inserts the umlaut, such as Option+u followed by 'o' for ö. On mobile devices, including iOS and Android, virtual keyboards allow input by long-pressing a vowel key to reveal a popup menu with umlaut variants, enabling selection of ä, ö, or ü.[56] Cross-platform inconsistencies often arise when handling two-dot diacritics, especially in legacy software and file formats. 
In older applications, exporting text with umlauts to PDFs can result in dropped or garbled diacritics due to non-standard font encoding or lack of Unicode support, leading to characters appearing as placeholders or incorrect glyphs upon copying and pasting.[57] In informal texting environments, users without easy access to umlauts may approximate them using digraphs like "ae" for ä, "oe" for ö, and "ue" for ü, a practice rooted in historical transcription methods for electronic communication.[58]
Localization challenges emerge in right-to-left (RTL) scripts, where combining diacritics like the two dots may misalign or detach from base characters due to bidirectional text rendering. To mitigate this, Unicode encoding is recommended over legacy ISO standards, ensuring proper attachment of marks in RTL scripts that use combining diacritics.[59] Solutions include CSS properties such as font-feature-settings, which can activate OpenType features to fine-tune mark positioning and attachment in supported fonts, though it is intended for specialized cases beyond standard font-variant controls.[60]
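Because a letter with two dots can be stored either as one precomposed code point or as a base letter followed by the combining diaeresis (U+0308), two strings that look identical on screen may compare unequal when they come from different platforms. The following is a minimal sketch of comparing such strings after normalization, using Python's standard unicodedata module; the helper name same_text is an illustrative assumption, not part of any library.

```python
import unicodedata

precomposed = "\u00e4"   # ä as a single code point (U+00E4)
combining = "a\u0308"    # a followed by COMBINING DIAERESIS (U+0308)

def same_text(left: str, right: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", left) == unicodedata.normalize("NFC", right)

# A raw comparison sees different code point sequences.
raw_equal = precomposed == combining

# After NFC normalization both sides become the precomposed form.
normalized_equal = same_text(precomposed, combining)

# NFD splits the precomposed letter into base letter plus combining mark.
nfd_length = len(unicodedata.normalize("NFD", precomposed))

# The same character as bytes in UTF-8 and in ISO 8859-1 (Latin-1).
utf8_bytes = precomposed.encode("utf-8")
latin1_bytes = precomposed.encode("latin-1")

print(raw_equal, normalized_equal, nfd_length, utf8_bytes, latin1_bytes)
```

Normalizing to a single form at input boundaries, rather than at every comparison, is a common design choice; which form to prefer depends on the system that consumes the text.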
Modern tools address these issues through assistive features. Word processors like Microsoft Word allow customization of autocorrect rules to replace common digraphs—such as "oe" with ö—streamlining input for users without dedicated keyboards.[61] For accessibility, screen readers including JAWS, NVDA, and VoiceOver leverage Unicode and language tagging (e.g., lang="de") to pronounce umlauts correctly in German text, with options to adjust symbol pronunciation via built-in utilities for accurate phonetic rendering.[62]
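The digraph convention mentioned above (ae, oe, ue) is straightforward to apply in the forward direction. The following is a minimal sketch, assuming German text and covering only the three umlaut vowels; the mapping and the function name to_digraphs are illustrative assumptions rather than any standard API.

```python
import unicodedata

# Hypothetical mapping limited to the German umlaut vowels discussed above.
DIGRAPHS = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}
_TABLE = str.maketrans(DIGRAPHS)

def to_digraphs(text: str) -> str:
    """Replace umlaut vowels with digraphs for ASCII-only channels.

    The text is normalized to NFC first so that a base letter followed by
    U+0308 is handled the same way as the precomposed letter.
    """
    return unicodedata.normalize("NFC", text).translate(_TABLE)

print(to_digraphs("Häuser"))   # Haeuser
print(to_digraphs("Ärztin"))   # Aerztin
```

The reverse direction, which autocorrect rules attempt, is ambiguous: a word such as Poet contains "oe" without any underlying ö, so a blanket replacement would corrupt it. Letters carrying a diaeresis rather than an umlaut, such as ë in Dutch reëel, are deliberately left untouched by this mapping.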