Arabic script
The Arabic script is an abjad writing system comprising 28 letters, primarily used to write the Arabic language and adapted for numerous others such as Persian, Urdu, Pashto, and Malay. It is characterized by its right-to-left cursive flow and by positional variations in letter forms.[1] It originated from the Nabataean Aramaic cursive script, with additional influences from Syriac, and passed through transitional "Nabataeo-Arabic" phases, attested in early inscriptions such as the Namāra epitaph of 328 CE, before taking its standard form in the 5th and 6th centuries.[2][3] The script's development accelerated with the rise of Islam in the 7th century, particularly through its role in transcribing the Quran. It moved from rudimentary Hijazi styles (marked by slanted, largely undotted letters on parchment) to more angular Kufic forms by the 9th century and to fluid, rounded Naskh scripts by the 11th century for improved readability.[4][2] Key structural features include letters that connect in cursive chains, with most adopting 2 to 4 allographic shapes depending on their position in a word (isolated, initial, medial, or final), and diacritical dots that differentiate otherwise similar forms; short vowels are optionally indicated by diacritics and long vowels by specific letters.[1] The script has no uppercase or lowercase distinction, and its flexibility allowed it to accommodate non-Semitic languages through added or modified letters, such as the four extra letters of Persian.[5][6] Historically, it served not only linguistic but also artistic and mystical purposes, with letters assigned numerical values linked to lunar stations and talismanic significance in Islamic tradition.[5] In modern usage, the Arabic script is the third-most widely used writing system in the world, serving more than 400 million Arabic speakers and additional users from the Middle East to South and Southeast Asia, though contemporary texts often omit vowel diacritics for brevity.[7][1] Its calligraphic styles, including the everyday Ruqʿa and the ornate Thuluth, continue to influence typography, digital fonts, and cultural expression, while adaptations persist in languages like Kurdish (Sorani) and Uyghur, reflecting the script's adaptability across diverse linguistic and technological contexts.[2][5]
History
Origins and early development
The Arabic script originated around the 4th century CE in the Arabian Peninsula and surrounding regions as a derivative of the Nabataean variant of the late Aramaic alphabet, which itself descended from the earlier Phoenician script.[8] This development occurred among Arab tribes in northern Arabia and the Levant, where the Nabataean kingdom (c. 320 BCE–106 CE) facilitated cultural and linguistic exchanges through trade routes connecting Petra, Syria, and the Hijaz.[9] The script's emergence reflects a gradual adaptation of Aramaic's 22-letter consonantal system to Arabic phonetics; the inventory grew to 28 letters, with the additional Arabic sounds written using existing shapes that were later distinguished by dots.[10] Key evidence of proto-Arabic forms appears in early inscriptions, such as the Namara inscription of 328 CE, found in southern Syria, southeast of Damascus. This funerary stele for the Lakhmid king Imru' al-Qays is written in the Nabataean script but in an early form of the Arabic language, making it one of the oldest dated attestations of written Arabic and demonstrating the script's transitional use for Arabic texts.[11] Similarly, the Zabad inscription of 512 CE, found near Aleppo, is a trilingual dedication in Greek, Syriac, and Arabic on a church lintel; its Arabic portion shows emerging cursive tendencies and terms like "al-ilah" (the God), highlighting the script's role in pre-Islamic religious and communal contexts.[12] These artifacts, mostly funerary and dedicatory, illustrate the script's initial use in northern Arabian and Syrian territories inhabited by nomadic and settled Arab communities.[9] The transition from angular, monumental forms to more fluid, cursive styles was driven by practical needs in trade, administration, and everyday writing on perishable materials like papyrus during the pre-Islamic era.
Nabataean inscriptions, often chiseled in stone for durability, featured rigid lines suited to lapidary work, but as Arabic speakers adopted the script for broader commercial exchanges along caravan routes, ligatures and connections between letters began to appear, smoothing the forms for quicker inscription.[8] This evolution is evident in the semi-cursive Nabataean examples from the 2nd century BCE onward, which prefigure the interconnected nature of mature Arabic writing.[9] A representative example of letter evolution is the Arabic alif (ا), which developed from the Phoenician aleph (𐤀), an ox-head symbol simplified in Aramaic to a vertical stroke and further adapted in Nabataean to a slanted or hooked form before straightening in proto-Arabic by the 4th century CE.[9] Such changes preserved the consonantal value while aligning with Arabic's phonetic requirements, laying the groundwork for the script's later refinements.[10]
Spread and evolution in the Islamic era
The rise of Islam in the 7th century spread the Arabic script rapidly across vast territories through military conquest, transforming it from a regional writing system into a vehicle for religious, administrative, and cultural expression. After the death of the Prophet Muhammad in 632 CE, Arab Muslim armies of the Rashidun Caliphate conquered the Sasanian Empire of Persia by 651 CE, introducing the Arabic script for Quranic dissemination and, under the later Umayyads, for governance. Egypt fell to Rashidun forces by 642 CE, and Umayyad armies had advanced across North Africa by the late 7th century, establishing the script in official documents and Islamic texts and laying the groundwork for regional linguistic integration. These expansions, which reached the Iberian Peninsula and Central Asia by the 8th century, prompted initial adaptations, such as the incorporation of Persian sounds into the script for administrative purposes in conquered regions.[13][14][15] Under the Umayyad Caliphate (661–750 CE), efforts to standardize the Arabic script intensified to meet the empire's administrative needs and to ensure accurate transmission of the Quran. Caliph ʿAbd al-Malik (r. 685–705 CE) played a pivotal role: he made Arabic the language of imperial administration, and the inscriptions of his Dome of the Rock in Jerusalem (completed 691–692 CE) are an early monumental example of the angular, geometric style later called Kufic. This script, derived from earlier Hijazi forms, became the primary medium for Quranic manuscripts, emphasizing clarity and aesthetic rigidity, without vowels or extensive diacritics in early versions.
The Abbasid Caliphate (750–1258 CE) further refined this standardization in the 8th and 9th centuries, with Baghdad emerging as a center of script development; Kufic remained in use for religious texts while more cursive styles gained ground to improve legibility in expanding literary and bureaucratic contexts.[16][17] To resolve ambiguities between letters of identical shape, particularly for non-Arabic speakers reciting the Quran, diacritical dots were systematized in the late 7th century. Tradition credits the grammarian Abu al-Aswad al-Du'ali (d. 688 CE), a companion of Ali ibn Abi Talib, working around 684 CE at the request of the Umayyad governor of Basra, with a system of dots placed above, below, or beside letters; some accounts instead attribute the consonant-distinguishing dots, known as i'jam, to his students Naṣr ibn ʿĀṣim and Yaḥyā ibn Yaʿmar. I'jam separates letters that share a base shape, such as ب (bāʾ), ت (tāʾ), ث (thāʾ), ن (nūn), and ي (yāʾ), whose initial and medial forms differ only in their dots, and distinguishes ي from the dotless ى (alif maqṣūrah), thereby preventing misreadings of sacred texts. This innovation, first applied to Quranic codices, was a crucial step toward legibility during the early Islamic expansions.[18][19] Building on this foundation, the Basran scholar Khalil ibn Ahmad al-Farahidi (d. 786 CE) developed vowel pointing, or tashkil, in the 8th century to indicate short vowels and other phonetic features more precisely. Al-Farahidi replaced the earlier colored dots with symbols derived from letter shapes: fatḥah (a small oblique stroke above the letter, for /a/), kasrah (the same stroke below the letter, for /i/), and ḍammah (a small wāw-like curl above the letter, for /u/), enabling accurate recitation across diverse linguistic communities. His system, integrated into Quranic codices and into grammatical and lexical scholarship (al-Farahidi also compiled the early dictionary Kitāb al-ʿAyn), became standard by the 11th century and supported the script's use in Persian and North African contexts, where vowel systems differed from Arabic's.[20][21]
Modern standardization and reforms
The Tanzimat reforms in the Ottoman Empire (1839–1876) marked a pivotal era for adapting the Arabic script to modern printing technology: the spread of printing presses allowed official edicts and educational materials in Arabic script to circulate widely, overcoming earlier restrictions on Muslim use of movable type.[22] The period also saw early proposals for orthographic simplification to address the script's difficulty in representing Turkish phonetics, notably the Iranian intellectual Mirza Malkum Khan's reform plan of the 1860s, which advocated adding diacritics and new letters to improve readability and literacy.[23] These efforts laid the groundwork for later script reforms by highlighting the need for standardization amid a growing print culture. In Egypt, intellectual and journalistic movements of the 1920s pushed for consistency in written Arabic to support national identity and the expansion of print media, culminating in the 1932 founding of the Academy of the Arabic Language (Majmaʿ al-Lughah al-ʿArabīyah) by King Fuad I, which focused on unifying orthography, coining modern terms, and resolving ambiguities in diacritic usage.[24] Similar standardization initiatives followed across Arab states, aiming to bridge classical Arabic and contemporary needs in education and administration. Reforms in non-Arab countries diverged more radically. In 1928, Turkey under Mustafa Kemal Atatürk enacted Law No. 1353, mandating a switch from the Arabic script to a Latin-based alphabet in order to raise literacy, then around 10%, and to align the country with Western modernization; literacy exceeded 20% within a decade, and the change effectively severed ties to Ottoman-Islamic written traditions.[25] In Iran, Reza Shah Pahlavi's cultural policies of the 1930s included orthographic adjustments to the Perso-Arabic alphabet, such as standardizing short-vowel marking and reducing optional ligatures through the Academy of Iran founded in 1935, to better suit Persian phonology without a full change of script.[26] Contemporary efforts have emphasized digital adaptation and cultural preservation. In the 1980s, Saudi Arabia advanced Arabic script standards for computing, with institutions such as King Saud University developing early font encoding systems that accommodated the script's cursive forms in word processors and databases, paving the way for global digital typography.[27] Since the 2000s, UNESCO has supported projects to safeguard endangered Arabic-script variants, including the digitization of Ajami manuscripts (modified Arabic scripts for African languages such as Wolof and Hausa), through partnerships such as the British Library's Endangered Archives Programme (EAP); its pilot project EAP915, for example, identified and cataloged 807 endangered Arabic manuscripts in regions including Ivory Coast. As of 2023, the EAP had digitized over 100,000 pages of Ajami materials across West Africa, including projects for the Mandinka and Wolof languages.[28][29]
Core Features
Alphabet and basic letters
The Arabic script is fundamentally an abjad, a writing system that primarily represents consonants, while vowels are either omitted in standard orthography or indicated through optional diacritics; this sets it apart from full alphabets like the Latin script, which write vowels as letters. The consonantal focus keeps writing compact, but readers must supply the unwritten vowels from context, something fluent readers do routinely. The standard Arabic alphabet comprises 28 letters, each with a distinct name and phonemic value, derived from the ancient Nabataean and Aramaic scripts and refined over centuries. Arabic is unicase: there are no uppercase or lowercase forms. Instead, letters adapt to the cursive flow of connected writing through up to four positional variants: initial (at the start of a word), medial (within a word), final (at the end of a word), and isolated (standing alone or after a non-connecting letter). This positional flexibility is a core feature, giving the script its flowing appearance without altering each letter's essential identity. The following table lists the 28 core letters in their isolated forms, in the traditional hijāʾī (alphabetical) order, with their conventional names and approximate International Phonetic Alphabet (IPA) values. These are standard Modern Standard Arabic pronunciations; regional dialects may vary.
| Isolated Form | Name | IPA Phoneme |
|---|---|---|
| ا | alif | /ʔ/ or /aː/ |
| ب | bāʾ | /b/ |
| ت | tāʾ | /t/ |
| ث | thāʾ | /θ/ |
| ج | jīm | /dʒ/ |
| ح | ḥāʾ | /ħ/ |
| خ | khāʾ | /x/ |
| د | dāl | /d/ |
| ذ | dhāl | /ð/ |
| ر | rāʾ | /r/ |
| ز | zāy | /z/ |
| س | sīn | /s/ |
| ش | shīn | /ʃ/ |
| ص | ṣād | /sˤ/ |
| ض | ḍād | /dˤ/ |
| ط | ṭāʾ | /tˤ/ |
| ظ | ẓāʾ | /ðˤ/ |
| ع | ʿayn | /ʕ/ |
| غ | ghayn | /ɣ/ |
| ف | fāʾ | /f/ |
| ق | qāf | /q/ |
| ك | kāf | /k/ |
| ل | lām | /l/ |
| م | mīm | /m/ |
| ن | nūn | /n/ |
| ه | hāʾ | /h/ |
| و | wāw | /w/ or /uː/ |
| ي | yāʾ | /j/ or /iː/ |
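As a minimal illustration in Python (the use of Unicode here is an assumption of this sketch, not a claim from the text above), each letter in the table corresponds to a single code point in Unicode's Arabic block, case mapping leaves every letter unchanged because the script is unicase, and the optional vowel diacritics are separate combining marks that can be dropped to recover the consonantal skeleton an abjad always writes. The helper names `describe_letters` and `consonant_skeleton` are hypothetical; only the standard-library `unicodedata` module is real.

```python
import unicodedata

# The 28 letters from the table above, in the same (hijāʾī) order.
ALPHABET = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"


def describe_letters(letters: str) -> list[tuple[str, str, str]]:
    """Return (letter, code point, Unicode character name) for each letter."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in letters]


def consonant_skeleton(text: str) -> str:
    """Drop nonspacing combining marks (Unicode category "Mn"), which include
    the optional vowel signs, keeping only the letters an abjad always writes."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


for letter, code, name in describe_letters(ALPHABET):
    print(letter, code, name)  # the second line is "ب U+0628 ARABIC LETTER BEH"

# Unicase: case mapping leaves every letter unchanged.
print(all(ch.upper() == ch and ch.lower() == ch for ch in ALPHABET))  # True

# The fully vowelled kitābun reduces to its bare consonantal spelling.
print(consonant_skeleton("كِتَابٌ"))  # كتاب
```

Filtering on the general category rather than a hard-coded list of marks is a design shortcut: it removes any nonspacing mark, which suits this illustration but may be broader than a production tool would want.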
Contextual forms and cursive nature
The Arabic script is fundamentally cursive, with letters designed to connect to one another within words, creating a continuous flow even in printed text. This connectivity distinguishes it from many other writing systems, facilitating efficient writing and aesthetic harmony in calligraphy. The script is written and read from right to left, with letters aligned along a horizontal baseline that anchors glyph placement and keeps the visual structure uniform across words and lines.[30][31] Letters assume one of four contextual forms depending on their position within a word: isolated (standing alone or not connecting), initial (at the beginning of a word, connecting only to the following letter), medial (in the middle, connecting to both the preceding and the following letter), and final (at the end of a word, connecting only to the preceding letter). Dual-joining letters use all four forms, often with marked visual differences from the isolated shape; for instance, beh appears as ب in isolated form, ﺑ in initial, ﺒ in medial, and ﺐ in final position. In Unicode, each letter is encoded once and assigned a joining type, and the font's shaping engine selects the appropriate positional glyph from the letter's joining behavior and that of its neighbors.[30][31] The joining rules are specific: 22 letters are dual-joining, connecting to both the preceding (right) and the following (left) letter, while 6 letters (ا, د, ذ, ر, ز, و) are right-joining only, connecting solely to the preceding letter and leaving a gap to their left. These right-joining letters have only isolated and final forms; for example, dal appears as ﺩ or ﺪ and reh as ﺭ or ﺮ, neither linking leftward. Non-joining elements, such as the hamza (ء), do not connect at all.
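The joining rules above can be sketched as a short procedure: a letter links to its predecessor if that predecessor can connect leftward, and to its successor if the letter itself can connect leftward. This Python sketch is simplified under stated assumptions: words are given in logical (reading) order without diacritics, every letter not listed as right-joining or non-joining is treated as dual-joining, and the function name `contextual_forms` is hypothetical. Real shaping engines use the Joining_Type property from Unicode's ArabicShaping.txt and also handle transparent marks and ligatures such as lām-alif.

```python
# Simplified joining classes for this sketch (an assumption): any letter not
# listed below is treated as dual-joining. Unicode's ArabicShaping.txt gives
# the authoritative Joining_Type for every character.
RIGHT_JOINING = set("اأإآدذرزوؤة")  # connect only to the preceding letter
NON_JOINING = set("ء")              # hamza on the line connects to nothing


def joins_to_next(ch: str) -> bool:
    """True if the letter can connect leftward to the letter after it."""
    return ch not in RIGHT_JOINING and ch not in NON_JOINING


def joins_to_previous(ch: str) -> bool:
    """True if the letter can connect rightward to the letter before it."""
    return ch not in NON_JOINING


def contextual_forms(word: str) -> list[tuple[str, str]]:
    """Label each letter of an undiacritized word (logical order) with its form."""
    labels = []
    for i, ch in enumerate(word):
        prev_ch = word[i - 1] if i > 0 else None
        next_ch = word[i + 1] if i + 1 < len(word) else None
        linked_before = (prev_ch is not None and joins_to_next(prev_ch)
                         and joins_to_previous(ch))
        linked_after = (next_ch is not None and joins_to_next(ch)
                        and joins_to_previous(next_ch))
        if linked_before and linked_after:
            form = "medial"
        elif linked_before:
            form = "final"
        elif linked_after:
            form = "initial"
        else:
            form = "isolated"
        labels.append((ch, form))
    return labels


for word in ("كتاب", "دار"):
    print(word, contextual_forms(word))
```

With these rules, كتاب yields an initial kāf, a medial tāʾ, a final alif, and an isolated bāʾ, while دار (dār) yields three isolated letters, because both د and ا refuse a leftward connection.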
These rules ensure readability by preventing ambiguous merges, particularly between letters with similar shapes.[30][31] In word formation, these principles combine to create fluid sequences. In the word كتاب (kitāb, "book"), read from right to left, the initial kāf (ﻛ) connects to the medial tāʾ (ﺘ), which connects to the final alif (ﺎ); because alif is right-joining, it does not link leftward, so the bāʾ that follows stands in its isolated form (ب). The example shows how dual-joining letters like kāf and tāʾ change shape by position, while the right-joining alif breaks the chain before the bāʾ without disturbing the baseline alignment or the word's cursive integrity.[30][31]
Diacritics, vowels, and orthographic conventions
The Arabic script employs a system of diacritical marks known as harakat (حَرَكَاتْ) to indicate short vowels and other phonetic features; they are optional in most modern writing but essential for precise pronunciation. These marks are placed above or below consonants and include the fatha (ـَ), a short diagonal line above the letter representing the vowel /a/ as in "father"; the damma (ـُ), a small curl above the letter for /u/ as in "put"; and the kasra (ـِ), a short diagonal line below the letter for /i/ as in "bit". For example, the consonant ب (bāʾ) becomes بَ (ba), بُ (bu), and بِ (bi) respectively.[32][31] The absence of a vowel is marked by the sukun (ـْ), a small circle placed above the consonant, indicating that no short vowel follows it, as in بْ (b, closing a syllable). The shadda (ـّ), resembling a small "w" above the letter, signifies gemination, or consonant doubling: the consonant is held longer, as if written twice, first with a sukun and then with a vowel. For instance, بّ represents bb, and in the word شَدَّة (shadda itself) the shadda doubles the dāl. Tanwin, or nunation, marks indefinite nouns with a doubled short-vowel sign at the end, using fathatan (ـً) for /an/, dammatan (ـٌ) for /un/, and kasratan (ـٍ) for /in/, as in كِتَابٌ (kitābun, "a book"). These diacritics are encoded in Unicode as combining characters, such as U+064E for fatha and U+0651 for shadda, ensuring consistent digital representation.[32][31] Orthographic conventions also govern the placement of the hamza (ء), a glottal-stop consonant whose seat depends on its position and the surrounding vowels, keeping the writing visually and phonetically clear. In initial position, hamza is always seated on an alif (ا), forming أ (with fatha or damma) or إ (with kasra), as in أَب (ʾab, "father").
Medially, its seat is generally decided by the stronger of its own vowel and the vowel (or sukun) before it, with kasra outranking damma, damma outranking fatha, and fatha outranking sukun: it sits on a dotless yāʾ (ئ) for kasra, on wāw (ؤ) for damma, and on alif (أ) for fatha, as in سُؤَال (suʾāl, "question"), where the preceding damma places it on wāw. At the end of a word, it stands on the line (ء) after a long vowel or sukun and otherwise takes the carrier matching the preceding short vowel, as in نَشَأَ (nashaʾa, "he arose"). These rules prevent ambiguity and fit the script's cursive flow, though the pronunciation of hamza may vary by dialect. Tajweed rules such as assimilation (idgham) and clear pronunciation (izhar) apply to the recitation of tanwin or nun sakinah before certain letters: idgham merges the sound before ي, ر, م, ل, و, ن, while izhar pronounces it clearly before the throat letters ء, ه, ع, ح, غ, خ. These rules affect pronunciation only; they do not alter the written orthography or diacritic placement.[33][34] In modern usage, diacritics are fully applied in religious texts like the Quran, for accurate tajweed recitation, and in educational materials, where they aid learners' comprehension and reduce ambiguity in a script that otherwise relies on context for vowels. In everyday print media, newspapers, and literature, however, they are largely omitted to save space and because fluent readers do not need them, appearing only sporadically for disambiguation or in poetry and proper names. Since the 2010s, social media and apps have seen increased optional use for clarity among non-native speakers, though full diacritization remains rare outside formal contexts; scholarly analyses note significant variation across genres, with religious and pedagogical texts showing near-complete marking compared with under 5% in general prose.[35]
Styles and Variants
Calligraphic and typographic styles
The Arabic script has evolved through a rich tradition of calligraphic styles, each reflecting cultural, religious, and artistic influences across centuries. These styles originated in the early Islamic period and adapted to various media, from manuscripts to architecture, emphasizing the script's inherent cursive and contextual forms. Major styles like Kufic, Naskh, and Nastaliq emerged as foundational, balancing aesthetic elegance with readability, and later influenced typographic developments in printing and digital media. Kufic, one of the earliest formal styles, developed in the late 7th century in Kufa, Iraq, and is characterized by angular, geometric letterforms with thick strokes and minimal curves. It flourished from the 7th to 10th centuries, primarily used for Qur'anic manuscripts and architectural inscriptions because its bold, monumental appearance suited stone, metal, and coinage. Examples include early Qur'ans on vellum and decorations on monuments like the Dome of the Rock in Jerusalem. Naskh emerged in the 10th century as a more fluid, cursive alternative to angular scripts, designed for legibility in everyday and scholarly writing. It became the predominant bookhand by the 11th century, serving as the basis for copying Qur'ans, administrative documents, and literature across the Islamic world, with regional variations in Egypt, Iraq, and Syria. Its rounded, proportional letters facilitated widespread adoption and laid the groundwork for modern printed Arabic. Nastaliq, a highly stylized and flowing script, originated in 14th-century Persia, blending elements of Naskh and Ta'liq for poetic expression. It gained dominance in the 15th century for Persian literature, poetry, and official documents, and later for Urdu, prized for its diagonal slant, elongated horizontals, and rhythmic curves that evoke motion. Prominent in regions like Iran, Pakistan, and India, it remains a staple in South Asian manuscript traditions.
Typographic printing in Arabic script spread through the Arabic-speaking world in the early 19th century, notably at Egypt's Bulaq Press, founded around 1820 under the Ottoman governor Muhammad Ali, where naskh and nasta'liq typefaces were cast for books such as dictionaries and Qur'ans. In the Ottoman Empire proper, widespread adoption came in the 1860s with innovations by the typefounder Ohanis Mühendisoğlu, who adapted calligraphic proportions to metal type for naskh-based printing. This evolution addressed the script's cursive joining and contextual variants, enabling mass production of texts. In the early 20th century, refined typefaces such as the one designed by Mohamed Bek Ja‘far in 1906 for Bulaq Press set standards for clarity in printed Qur'ans. Digital typography advanced in the 2010s with open-source fonts such as Amiri, a 2011 revival by Khaled Hosny of early 20th-century Bulaq naskh, optimized for book typesetting and Qur'anic text in software like LaTeX and web browsers.
| Style | Period | Key Characteristics | Primary Regions | Notable Uses and Examples |
|---|---|---|---|---|
| Kufic | 7th–10th centuries | Angular, geometric, bold strokes | Iraq, Syria, Arabia | Qur'anic manuscripts; architectural inscriptions (e.g., Dome of the Rock) |
| Naskh | 10th century onward | Cursive, rounded, legible proportions | Egypt, Iraq, broader Islamic world | Books, documents; basis for modern print (e.g., medieval Qur'ans) |
| Nastaliq | 14th–15th centuries onward | Flowing, slanted, rhythmic curves | Persia (Iran), South Asia (Pakistan, India) | Poetry, literature (e.g., Persian divans, Urdu manuscripts) |
Regional and language-specific variants
The Arabic script exhibits notable regional variations in letter forms and orthographic conventions, primarily distinguishing between the Western (Maghribi) and Eastern (Mashriqi) traditions. The Maghribi script, prevalent in North Africa including Morocco, Algeria, and Tunisia, features rounded letterforms with curved vertical strokes for letters such as alif, lam, and ta, along with exaggerated horizontal extensions and open final curves that descend below the baseline.[36] These characteristics, which evolved from early Kufic influences softened by sweeping curves, give the script a fluid, cursive appearance adapted to local manuscript traditions, and it is still used in Moroccan Arabic texts today.[37] In contrast, the Eastern or Mashriqi script, dominant in the Middle East, including the Gulf states and the Levant, employs sharper angles and more angular proportions in letters like dal and dhal, with less pronounced recurves and a straighter posture that aligns with styles like Naskh. This distinction, attested in medieval Arabic sources, reflects geographical and cultural divergences in scribal practices, where Mashriqi forms prioritize precision in angular connections.[37] Language-specific adaptations further diversify the script, particularly in Southeast Asia.
The Jawi script, used for Malay in regions like Malaysia and Brunei, modifies the standard Arabic alphabet by adding diacritics to six letters to accommodate Malay phonemes, such as extra dots for sounds absent in Arabic.[38] This adaptation, introduced by Muslim traders and refined over centuries, incorporates all 31 Arabic letters plus six constructed ones, enabling representation of Malay's vowel and consonant inventory while maintaining right-to-left cursive flow.[39] Similarly, the Pegon script adapts Arabic for Javanese in Indonesia, employing 28 graphemes to denote 23 consonants through minimal modifications, such as added diacritics for Javanese-specific sounds like the ng phoneme, and requiring harakat for vowels to suit the language's syllabic structure.[40] These tweaks, developed in Islamic scholarly contexts, preserve the script's core while aligning with local phonetics, though Pegon usage has waned with the rise of Latin-based orthographies.[41] Historical variants in the Indian subcontinent illustrate further evolution and decline. The Bihari script, a regional Arabic calligraphy style from Bihar and surrounding areas, emerged in the 13th century as a blend of angular and cursive elements, featuring elongated horizontals and distinctive loops in letters like sin and sad.[42] Primarily used for Qur'anic manuscripts, it persisted into the 19th century with over 137 known examples, many dated to the 15th century, but gradually declined following the Mughal promotion of Nastaliq, which standardized more fluid Persian-influenced forms across northern India.[43] This shift marked the end of Bihari's prominence, confining it to a niche in pre-modern Islamic textual production.[44]
Usage and Distribution
Current geographical and linguistic use
The Arabic script is the official writing system in the 22 member states of the Arab League, spanning North Africa and the Middle East, where it is used for government, education, media, and daily communication.[45] These countries are home to approximately 420 million speakers of Arabic, the world's fifth-most spoken language, making the script essential for literacy and cultural expression.[46] Beyond the Arab world, the script is adapted for several major languages, supporting diverse linguistic communities. In Iran and Afghanistan, the Perso-Arabic variant is used for Persian (Farsi) and Dari, with around 80 million speakers, primarily in official documents, literature, and signage.[47] In Pakistan and Afghanistan, the script writes Urdu, conventionally in the Nastaʿlīq style, and Pashto, with approximately 230 million and 40 million speakers respectively; Urdu is Pakistan's national language and Pashto an official language of Afghanistan, and both are used in administration, education, and print media.[47][48] In Iraq and Iran, a modified Arabic script is used for Sorani Kurdish, serving about 6–7 million speakers in official and cultural contexts.[49] In Southeast Asia, the Jawi variant persists in Indonesia and Malaysia, particularly for religious texts, Islamic education, and regional signage in areas such as Riau province, where it coexists with the Latin script. As of 2025, digital use of the Arabic script continues to expand in Central Asia, notably for Uyghur in China's Xinjiang region, where the Uyghur Arabic alphabet remains the official standard for government publications, social media, and signage, reflecting efforts to integrate it into modern technology platforms.[50] This growth underscores the script's resilience in bilingual digital environments despite competition from Latin and Cyrillic systems.
UNESCO and World Bank data highlight the script's role in literacy across Arabic-script regions, with adult literacy rates in Arab states averaging around 75% in recent assessments, though rates vary (for example, 98% in Bahrain and 79% in Egypt) with educational access and script-based instruction.[51] In non-Arab contexts such as Persian- and Urdu-speaking areas, literacy in the script exceeds 80% in urban centers, supported by widespread schooling in the adapted forms.[52]
Adaptations for non-Arabic languages
The Arabic script has been adapted for various non-Arabic languages by incorporating additional characters or diacritics to represent phonemes absent in Classical Arabic, particularly to accommodate Indo-Iranian and Bantu linguistic features.[53][54] Persian (Farsi) adds four letters, چ (che), پ (pe), ژ (zhe), and گ (gaf), for the sounds /tʃ/, /p/, /ʒ/, and /g/, which the standard Arabic alphabet lacks.[53] These modifications emerged during the Islamicization of Persia in the 7th–9th centuries, enabling the script to represent Persian phonology fully while retaining its cursive, right-to-left structure.[55] Urdu, an Indo-Aryan language, extends the Perso-Arabic script with three letters for its retroflex consonants, ٹ (ṭe), ڈ (ḍāl), and ڑ (ṛe), representing /ʈ/, /ɖ/, and /ɽ/, which are characteristic of a phonology shaped by Prakrit and Sanskrit substrates.[56] Each is formed by writing a small ṭāʾ (ط) above the corresponding base letter; the related letter ڻ (ṇūn, /ɳ/) belongs to Sindhi rather than standard Urdu. The Urdu letters were standardized in the 19th century during the development of modern Urdu literature in British India.[57] Swahili (Kiswahili), a Bantu language, historically employed the Arabic script (known as Ajami) from the 10th century onward, adapting it with extra diacritics and vowel notations to capture its five-vowel system and syllable-timed structure, which differ from Arabic's consonant-heavy phonology.[58] This adaptation supported religious and trade-related writing along the East African coast until the 1930s, when colonial authorities mandated a shift to the Latin alphabet for standardization and education.[59] The following table illustrates key phoneme-to-grapheme mappings in these adaptations:
| Language | Phoneme | Grapheme | Description |
|---|---|---|---|
| Persian | /p/ | پ | Modified bāʾ with three dots below for the voiceless labial stop.[53] |
| Persian | /tʃ/ | چ | Modified jīm with three dots below for the affricate.[53] |
| Persian | /ʒ/ | ژ | Modified rāʾ with three dots above for the voiced fricative.[53] |
| Persian | /g/ | گ | Modified kāf with an additional oblique stroke for the voiced velar stop. |
| Urdu | /ʈ/ | ٹ | Ṭe: dental tāʾ with a small ṭāʾ above for the retroflex stop.[56] |
| Urdu | /ɖ/ | ڈ | Ḍāl: dental dāl with a small ṭāʾ above for the retroflex stop.[56] |
| Urdu | /ɽ/ | ڑ | Ṛe: rāʾ with a small ṭāʾ above for the retroflex flap.[57] |
| Sindhi | /ɳ/ | ڻ | Ṇūn: nūn with a small ṭāʾ above in place of its dot, for the retroflex nasal.[56] |
| Swahili (Ajami) | /ɪ/, /ʊ/ | ِ, ُ (with extensions) | Short vowels marked by kasra and damma, often with additional dots or lines for Bantu contrasts.[58] |
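In digital text, each added letter in the table above is encoded as its own Unicode character rather than as a base letter plus a combining dot or stroke. The Python sketch below uses only the standard `unicodedata` module; the `EXTENDED_LETTERS` mapping is a hypothetical structure covering a selection of the table's Persian and Urdu rows. It prints each letter's code point, character name, and canonical decomposition.

```python
import unicodedata

# A selection of letters from the table above, mapped to (language, phoneme).
EXTENDED_LETTERS = {
    "پ": ("Persian", "/p/"),
    "چ": ("Persian", "/tʃ/"),
    "ژ": ("Persian", "/ʒ/"),
    "گ": ("Persian", "/g/"),
    "ٹ": ("Urdu", "/ʈ/"),
    "ڈ": ("Urdu", "/ɖ/"),
    "ڑ": ("Urdu", "/ɽ/"),
}

for letter, (language, phoneme) in EXTENDED_LETTERS.items():
    # An empty decomposition means the letter is an atomic character,
    # not a base letter followed by a combining mark.
    decomposition = unicodedata.decomposition(letter) or "none"
    print(f"{language:<8} {phoneme:<5} {letter} U+{ord(letter):04X} "
          f"{unicodedata.name(letter)} (decomposition: {decomposition})")
```

Because these letters have no canonical decomposition, Unicode normalization never folds, say, پ into ب; software that wants such folding (for example, a loose search) has to define it explicitly.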
Historical and discontinued uses
In Central Asia, the Sogdian language and its Aramaic-derived script, widely used from the 4th to 8th centuries CE, began transitioning to the Arabic script following the Islamic conquests of the region in the 8th century, as Sogdian speakers adopted Islam and incorporated Arabic linguistic elements into their writings.[60] This shift marked the integration of Sogdian into the broader Islamic scholarly tradition, with Arabic script facilitating religious and administrative texts among Sogdian communities in areas like Transoxiana. By the early 20th century, under Soviet influence, Arabic-based scripts for Turkic languages in Central Asia, including remnants of Sogdian-influenced systems, were phased out; a Latin alphabet was introduced in the late 1920s, followed by a full replacement with Cyrillic scripts in the late 1930s to promote Russification and literacy in Russian.[61] This discontinuation effectively ended the use of Arabic script for local languages across Soviet Central Asia by the 1940s.[62] In Europe, the Arabic script served as the basis for Ottoman Turkish, which was written in a Perso-Arabic variant from the 14th century until the 1928 alphabet reform under Mustafa Kemal Atatürk, when it was replaced by a Latin-based script to modernize education and reduce Islamic cultural ties.[63] This reform discontinued the Ottoman script's use in official and literary contexts, leading to widespread literacy campaigns that rendered Arabic-script Turkish obsolete within a decade. 
Additionally, in the Iberian Peninsula, Aljamiado (the practice of writing Romance languages such as Spanish and Aragonese in Arabic script) emerged among Muslim communities (Mudejars and Moriscos) and was used from the 15th to the 18th centuries for literature on religious, poetic, and moral themes that helped preserve Islamic identity under Christian rule.[64] Following the expulsion of the Moriscos in 1609–1614 and subsequent cultural suppression, Aljamiado ceased as a living tradition by the early 18th century.[65]
In Africa, Ajami, an adaptation of the Arabic script for local languages, was used for Hausa in West Africa and Swahili in East Africa from the 16th century onward for poetry, religious texts, and correspondence among Muslim populations.[66] Among Hausa speakers, who numbered in the tens of millions by the 19th century, Ajami supported widespread literacy outside formal education, with proficiency reaching up to 80% in some communities. Swahili Ajami likewise carried literary epics and Islamic scholarship across coastal and inland regions. European colonial powers in the 19th and 20th centuries imposed Latin-based orthographies, such as "Boko" for Hausa under British rule in Nigeria, viewing Ajami as primitive and as a barrier to administrative control; school curricula, book burnings, and a redefinition of literacy that excluded non-Latin systems drove its decline.[66] By the mid-20th century, Ajami had largely given way to the colonial scripts for these languages.
In South Asia, the Perso-Arabic script (Shahmukhi) for Punjabi, used primarily by Muslim communities, experienced a sharp decline in India following the 1947 partition, as mass migrations shifted the Muslim-majority population to Pakistan, leaving Gurmukhi as the dominant script in Indian Punjab.[67] Prior to partition, Shahmukhi coexisted with Gurmukhi in the region, but post-1947 demographic changes and the promotion of Gurmukhi for official Punjabi use in India marginalized Arabic-script Punjabi literature and education.[68] This discontinuation aligned with broader linguistic standardization efforts, reducing Shahmukhi's role in Indian Sikh and Hindu Punjabi contexts to near obsolescence.
Technical Implementation
Unicode encoding and digital representation
The Arabic script is encoded in the Unicode Standard primarily within the Basic Multilingual Plane (BMP), with characters distributed across several dedicated blocks for core letters, diacritics, variants, and historical forms. The primary Arabic block, U+0600–U+06FF, spans 256 code points and includes the 28 basic Arabic letters, common diacritical marks such as the fatha (U+064E), kasra (U+0650), and shadda (U+0651), Qur'anic annotation signs, and the Arabic-Indic digits ٠ (U+0660) through ٩ (U+0669).[31] This block forms the foundation for Modern Standard Arabic orthography, and its letters carry the right-to-left (RTL) directionality inherent to the script.[31] For extended letter variants used in specific languages or historical contexts, the Arabic Supplement block (U+0750–U+077F), introduced in Unicode 4.1 (2005), provides 48 additional code points.[69] Examples include Arabic Letter Beh with Three Dots Horizontally Below (U+0750, ݐ) for certain African languages and Arabic Letter Kaf with Two Dots Above (U+077F, ݿ) for historical notations.[69] Further blocks address historical and regional adaptations: the Arabic Extended-A block (U+08A0–U+08FF), introduced in Unicode 6.1 (2012), adds letters and diacritics for languages including African and Caucasian orthographies, and the Arabic Extended-B block (U+0870–U+089F), introduced in Unicode 14.0 (2021), spans 48 code points for obsolete or variant forms, such as Arabic Letter Alef with Attached Fatha (U+0870), used in certain non-Arabic orthographies including those of African languages. Additions through Unicode 17.0 (2025) have extended coverage of Arabic-derived scripts without altering the core encoding principles.
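Because the Arabic-Indic digits occupy a contiguous range, converting between them and ASCII digits is a simple mapping. The following is a minimal Python sketch using only the standard library; the function names are illustrative, not part of any standard API. It also handles the Extended Arabic-Indic (Persian and Urdu) digits at U+06F0–U+06F9.

```python
import unicodedata

# The ten Arabic-Indic digits U+0660-U+0669, in value order.
ARABIC_INDIC = "".join(chr(0x0660 + d) for d in range(10))


def to_western_digits(text: str) -> str:
    """Replace Arabic-Indic and Extended Arabic-Indic digits with ASCII 0-9."""
    out = []
    for ch in text:
        if "\u0660" <= ch <= "\u0669" or "\u06F0" <= ch <= "\u06F9":
            out.append(str(unicodedata.digit(ch)))  # digit value from the UCD
        else:
            out.append(ch)  # letters, spaces, ASCII digits pass through
    return "".join(out)


def to_arabic_indic(text: str) -> str:
    """Replace ASCII 0-9 with the Arabic-Indic digits U+0660-U+0669."""
    return text.translate(str.maketrans("0123456789", ARABIC_INDIC))


print(to_western_digits("٢٠٢٥"))  # → 2025
print(to_arabic_indic("1447"))    # → ١٤٤٧
print(int("۱۴۰۴") + 1)            # → 1405 (int() accepts Unicode decimal digits)
```

Python's built-in `int()` already parses any Unicode decimal digits. An explicit mapping like this one is mainly useful when digit forms need to be normalized inside mixed text.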
Core Arabic letters are each assigned a single code point in the primary block, independent of their contextual forms: alif is encoded as U+0627 (ا) and bāʾ as U+0628 (ب), and at display time the rendering engine selects the appropriate isolated, initial, medial, or final glyph according to the cursive joining rules.[31] Unicode thus separates logical encoding from visual presentation, leaving joining behavior to fonts and rendering engines. The right-to-left direction of Arabic text is handled by the Unicode Bidirectional Algorithm (UAX #9), which determines the display order of text that mixes directions.[70] The algorithm resolves embedding levels for RTL segments, such as Arabic words interspersed with left-to-right elements like numbers or Latin text, so that they are reordered correctly in environments like HTML and CSS, where the direction: rtl property and unicode-bidi controls apply.[70] Complex layouts still pose rendering challenges, but these are addressed through standardized font features rather than changes to the encoding.[70]
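The separation between one code point per letter and several displayed forms can be illustrated with a simplified Python sketch. The positional-form logic below is an assumption-laden approximation. It hand-codes joining types for only the basic letters U+0620–U+064A, whereas the authoritative data is ArabicShaping.txt in the Unicode Character Database. It ignores join-causing characters such as tatweel and the extended letters. Production shapers such as HarfBuzz perform a full version of this analysis and then apply the OpenType `isol`/`init`/`medi`/`fina` features. The bidi classes come from Python's standard `unicodedata` module.

```python
import unicodedata

# Illustrative subset: these basic letters are right-joining ("R"), meaning
# they connect only to the letter before them (alif, dal, reh, waw, ...).
RIGHT_JOINING = set("\u0622\u0623\u0624\u0625\u0627\u0629\u062F\u0630\u0631\u0632\u0648")


def joining_type(ch: str) -> str:
    """Approximate joining type, covering only the basic letters U+0620-U+064A."""
    if unicodedata.category(ch) == "Mn":
        return "T"  # transparent: harakat and shadda do not break joining
    if ch in RIGHT_JOINING:
        return "R"
    if "\u0620" <= ch <= "\u064A" and ch != "\u0621" and unicodedata.category(ch) == "Lo":
        return "D"  # dual-joining; hamza (U+0621) joins on neither side
    return "U"  # treated as non-joining: spaces, digits, Latin, tatweel, ...


def positional_forms(word: str) -> list[tuple[str, str]]:
    """Choose isol/init/medi/fina for each letter of `word` in logical order."""
    letters = [ch for ch in word if joining_type(ch) != "T"]
    forms = []
    for i, ch in enumerate(letters):
        jt = joining_type(ch)
        prev_jt = joining_type(letters[i - 1]) if i > 0 else "U"
        next_jt = joining_type(letters[i + 1]) if i + 1 < len(letters) else "U"
        joins_prev = jt in ("D", "R") and prev_jt == "D"  # previous letter reaches forward
        joins_next = jt == "D" and next_jt in ("D", "R")  # this letter reaches forward
        if joins_prev and joins_next:
            form = "medi"
        elif joins_prev:
            form = "fina"
        elif joins_next:
            form = "init"
        else:
            form = "isol"
        forms.append((ch, form))
    return forms


def char_report(text: str) -> list[tuple[str, str, str]]:
    """Code point, formal name, and bidi class for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "?"), unicodedata.bidirectional(ch))
        for ch in text
    ]


print(positional_forms("كتاب"))  # → [('ك', 'init'), ('ت', 'medi'), ('ا', 'fina'), ('ب', 'isol')]
print(char_report("ب1"))         # → [('U+0628', 'ARABIC LETTER BEH', 'AL'), ('U+0031', 'DIGIT ONE', 'EN')]
```

In kitāb ("book"), the final bāʾ is isolated because alif is right-joining and cannot connect to the letter after it. The bidi class "AL" (Arabic letter) is what UAX #9 uses to treat the letter as right-to-left.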
Unicode's encoding for Arabic is synchronized with the International Standard ISO/IEC 10646, which defines the Universal Coded Character Set (UCS) and incorporates all Unicode characters, including the Arabic blocks, to support global interoperability. Because ISO/IEC 10646 assigns the same code points and character names to Arabic as the Unicode Standard, Arabic-script data can be exchanged between systems built on either standard without loss.
| Unicode Block | Range | Key Contents | Version Introduced |
|---|---|---|---|
| Arabic | U+0600–U+06FF | Basic letters (e.g., U+0627 ا alif), diacritics, digits | 1.0 (1991) |
| Arabic Supplement | U+0750–U+077F | Letter variants (e.g., U+0750 ݐ Beh variant) | 4.1 (2005) |
| Arabic Extended-B | U+0870–U+089F | Historical forms (e.g., U+0870 Alef variant) | 14.0 (2021) |
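Python's `unicodedata` module does not expose block names, so checking which of these blocks a piece of text draws on requires a hand-coded range table. The sketch below is a minimal example under that assumption. It covers only the Arabic blocks named in this article, with ranges as given in the Unicode Standard, and omits the presentation-form blocks. The helper names are illustrative.

```python
from typing import Optional

# Arabic blocks named in this article (start, end, name); not exhaustive.
ARABIC_BLOCKS = [
    (0x0600, 0x06FF, "Arabic"),
    (0x0750, 0x077F, "Arabic Supplement"),
    (0x0870, 0x089F, "Arabic Extended-B"),
    (0x08A0, 0x08FF, "Arabic Extended-A"),
    (0x10EC0, 0x10EFF, "Arabic Extended-C"),
]


def arabic_block(ch: str) -> Optional[str]:
    """Return the Arabic block containing `ch`, or None if it is outside them."""
    cp = ord(ch)
    for start, end, name in ARABIC_BLOCKS:
        if start <= cp <= end:
            return name
    return None


def block_histogram(text: str) -> dict:
    """Count how many characters of `text` fall in each Arabic block."""
    counts = {}
    for ch in text:
        block = arabic_block(ch)
        if block is not None:
            counts[block] = counts.get(block, 0) + 1
    return counts


print(arabic_block("\u0627"))                  # → Arabic
print(block_histogram("\u067E\u0750\u08A1x"))  # → {'Arabic': 1, 'Arabic Supplement': 1, 'Arabic Extended-A': 1}
```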
Challenges in digital typography and rendering
In the early days of digital document creation, particularly with PDF before the 2000s, Arabic text often failed to render its cursive joining, producing disjointed letters that broke the script's natural connectivity and aesthetic integrity.[71] These failures stemmed from limited support in early typesetting systems, which treated Arabic characters as isolated forms rather than contextually linked glyphs, leading to poor legibility in printed and digital output.[72] Solutions emerged with OpenType font technology, specifically the Glyph Substitution (GSUB) table, which enables contextual substitution of the initial, medial, final, and isolated forms of Arabic letters so that they connect properly.[73] By the mid-2000s, GSUB implementations in fonts such as those developed for Microsoft Windows allowed more accurate rendering across applications, a pivotal advance in Arabic digital typography.[74]
Despite these improvements, font availability remains a persistent issue, especially for specialized styles like Nastaliq, which is widely used for Urdu and Persian but has few high-quality digital implementations because of its complex, slanted cursive structure.[75] This scarcity has historically limited web and print design in regions where Nastaliq is preferred, often forcing designers to fall back on Naskh-based alternatives that alter the text's visual authenticity.[76] Projects like Google's Noto Arabic, launched in the 2010s as part of the broader Noto font family initiative, addressed this gap by providing open-source fonts for multiple Arabic styles, including Nastaliq variants, within a family intended to render over 800 languages consistently and eliminate "tofu" placeholders for unsupported characters.[77] Noto's development, in collaboration with Monotype, emphasized comprehensive glyph coverage for cursive scripts and significantly boosted digital adoption in Android and web environments by the late 2010s.[78]
Input methods for Arabic script continue to pose practical hurdles. Keyboard layouts such as the standard Arabic 101-key configuration map the 28 letters plus diacritics onto a QWERTY base, but frequent shifting for common characters and inconsistent support across operating systems make typing inefficient. On mobile devices, autocorrect systems handle diacritics (harakat) poorly, frequently misplacing or omitting vowel marks in predictive text, although these marks are critical for precise Quranic or poetic text.[79] The result is error-prone input, especially for learners and non-native users, as mobile keyboards on iOS and Android struggle with the script's right-to-left directionality and contextual shaping.[80]
As of 2025, emerging challenges include reviving endangered Arabic styles, such as regional calligraphic variants at risk of digital obsolescence; machine learning models are being explored to automate glyph design while preserving cultural nuances.[81] Accessibility for screen readers remains a key concern: tools often fail to interpret Arabic's cursive joins and diacritics correctly, producing fragmented audio output that hinders navigation for visually impaired users of web and eBook content.[82] Ongoing W3C efforts document gaps in Arabic layout requirements and prioritize solutions such as text-to-speech engines tailored to Arabic script to improve inclusivity in digital interfaces.[30]
Extensions
Additional letters and characters
The Arabic script has been extended beyond its core 28 letters to meet the phonetic needs of numerous non-Arabic languages, through added diacritics, new letter forms, and contextual variants encoded in Unicode blocks such as Arabic (U+0600–U+06FF), Arabic Supplement (U+0750–U+077F), and Arabic Extended-A (U+08A0–U+08FF).[31] These extensions represent sounds absent from Classical Arabic, such as implosives, retroflexes, and specific vowels, and support languages from South Asia to Africa and Southeast Asia. Common additions include letters for Persian, Urdu, Pashto, and Kurdish, while rarer forms appear in African Ajami scripts for languages like Hausa and Fulfulde.[83] Among the most widely used extensions are those for Indo-Iranian languages. For instance, the Urdu nūn ghunna (ں, U+06BA) marks nasalization of the preceding vowel, as in kitābẽ ("books").[31] In Kurdish, the letter oe (ۆ, U+06C6) denotes the vowel /o/, distinguishing it from plain waw, which writes /u/ or /w/ in Sorani orthography.[31] Persian adds peh (پ, U+067E, /p/), cheh (چ, U+0686, /tʃ/), zhe (ژ, U+0698, /ʒ/), and gaf (گ, U+06AF, /ɡ/) for native phonemes not present in Arabic.[31] Pashto uses further variants such as x̌īn (ښ, U+069A) and ǵe (ږ, U+0696), which represent the retroflex fricatives /ʂ/ and /ʐ/ in southern dialects.[31] Rarer extensions are prominent in African Ajami scripts, where the Arabic script was adapted for indigenous languages during Islamic expansion.
In Hausa, additional forms such as keheh with three dots above (ݣ, U+0763) represent labialized /kʷ/ or palatalized /kʲ/, while ghain with three dots above (ࣃ, U+08C3) denotes /ɡʷ/ or /ɡʲ/ in emphatic contexts.[84][85] For implosives and glottalized consonants in Hausa and related languages, characters such as beh with hamza above (ࢡ, U+08A1) indicate the implosive bilabial stop /ɓ/, and yeh with two dots below and hamza above (ࢨ, U+08A8) indicates the glottalized palatal approximant /ʔʲ/.[83][85] These Ajami innovations, often built from stacked diacritics or modified base letters, express the tonal and ejective sounds of West and East African phonologies, as seen in Fulfulde and Wolof orthographies.[85] Since the 2010s, international bodies have advanced the standardization of these extensions, particularly through the Unicode Consortium's encoding proposals and the W3C Arabic Layout Task Force, which addresses rendering challenges for diverse variants in digital environments.[86][87] The Task Force, established in 2015, works with linguists and draws on input from language communities in Asia and Africa to ensure consistent support for over 50 extended characters across browsers and fonts.[88] Efforts such as the 2018 Unicode proposal for Hausa-specific letters reflect ongoing work to encode underrepresented Ajami forms without disrupting existing Arabic typography.[83] More recently, Unicode 17.0 (September 2025) expanded the Arabic Extended-C block (U+10EC0–U+10EFF), first introduced in Unicode 15.0 (2022), with additional Qur'anic annotation signs used in Turkey and Libya and letters for the Pegon script used for Indonesian languages.[86] The following table catalogs over 50 representative additional letters and characters, selected from the Unicode encodings for key languages. For each it gives the character glyph (isolated form where possible), Unicode code point, formal name, approximate phoneme(s), and primary language(s) of use.
This list is not exhaustive but illustrates the diversity of extensions.[31][85]
| Character | Unicode | Name | Phoneme(s) | Language(s) |
|---|---|---|---|---|
| ٱ | U+0671 | Arabic Letter Alef Wasla | /a/ (elided) | Quranic Arabic |
| ٲ | U+0672 | Arabic Letter Alef with Wavy Hamza Above | /ʔa/ | Baluchi, Kashmiri |
| ٴ | U+0674 | Arabic Letter High Hamza | /ʔ/ (high) | Kazakh, Jawi |
| ٹ | U+0679 | Arabic Letter Tteh | /ʈ/ | Urdu, Sindhi |
| ٺ | U+067A | Arabic Letter Tteheh | /ʈʰ/ | Sindhi |
| ٻ | U+067B | Arabic Letter Beeh | /ɓ/ | Sindhi |
| ټ | U+067C | Arabic Letter Teh with Ring | /ʈ/ | Pashto |
| ٽ | U+067D | Arabic Letter Teh with Three Dots Above Downwards | /ʈ/ | Sindhi |
| پ | U+067E | Arabic Letter Peh | /p/ | Persian, Urdu |
| ٿ | U+067F | Arabic Letter Teheh | /t̪ʰ/ | Sindhi |
| ݐ | U+0750 | Arabic Letter Beh with Three Dots Horizontally Below | /ɓ/ | African languages (e.g., Hausa) |
| ݑ | U+0751 | Arabic Letter Beh with Dot Below and Three Dots Above | /bʷ/ | Hausa |
| ݒ | U+0752 | Arabic Letter Beh with Three Dots Pointing Upwards Below | /ɓ/ | African Ajami |
| ݓ | U+0753 | Arabic Letter Beh with Three Dots Pointing Upwards Below and Two Dots Above | /bʲ/ | African languages |
| ݔ | U+0754 | Arabic Letter Beh with Two Dots Below and Dot Above | /ɗ/ | Saraiki |
| ݕ | U+0755 | Arabic Letter Beh with Inverted Small V Below | /ɓ/ | African Ajami |
| ݖ | U+0756 | Arabic Letter Beh with Small V | /v/ | Shina |
| ݗ | U+0757 | Arabic Letter Hah with Two Dots Above | /ħ/ | African languages |
| ݘ | U+0758 | Arabic Letter Hah with Three Dots Pointing Upwards Below | /ɣ/ | African Ajami |
| ݙ | U+0759 | Arabic Letter Dal with Two Dots Vertically Below and Small Tah | /d̪/ | Saraiki |
| ݚ | U+075A | Arabic Letter Dal with Inverted Small V Below | /ɖ/ | African languages |
| ݛ | U+075B | Arabic Letter Reh with Stroke | /ɽ/ | African Ajami |
| ݜ | U+075C | Arabic Letter Seen with Four Dots Above | /s/ (emphatic) | Shina |
| ݝ | U+075D | Arabic Letter Ain with Two Dots Above | /ʕ/ | African languages |
| ݞ | U+075E | Arabic Letter Ain with Three Dots Pointing Downwards Above | /ʕʷ/ | African Ajami |
| ݟ | U+075F | Arabic Letter Ain with Two Dots Vertically Above | /ʕʲ/ | African languages |
| ݠ | U+0760 | Arabic Letter Feh with Two Dots Below | /v/ | African Ajami |
| ݡ | U+0761 | Arabic Letter Feh with Three Dots Pointing Upwards Below | /ɸ/ | African languages |
| ݢ | U+0762 | Arabic Letter Keheh with Dot Above | /ɡ/ | Jawi |
| ݣ | U+0763 | Arabic Letter Keheh with Three Dots Above | /kʷ/, /kʲ/ | Hausa, Amazigh |
| ݤ | U+0764 | Arabic Letter Keheh with Three Dots Pointing Upwards Below | /q/ | African Ajami |
| ݥ | U+0765 | Arabic Letter Meem with Dot Above | /mʲ/ | African languages |
| ݦ | U+0766 | Arabic Letter Meem with Dot Below | /ɱ/ | Maba |
| ݧ | U+0767 | Arabic Letter Noon with Two Dots Below | /ɲ/ | Arwi |
| ݨ | U+0768 | Arabic Letter Noon with Small Tah | /ɳ/ | Saraiki |
| ݩ | U+0769 | Arabic Letter Noon with Small V | /ɲ/ | Gojri |
| ݪ | U+076A | Arabic Letter Lam with Bar | /ɭ/ | African languages |
| ݫ | U+076B | Arabic Letter Reh with Two Dots Vertically Above | /ɽʒ/ | Torwali |
| ݬ | U+076C | Arabic Letter Reh with Hamza Above | /ʑ/ | Ormuri |
| ݭ | U+076D | Arabic Letter Seen with Two Dots Vertically Above | /ʃ/ | Kalami |
| ݮ | U+076E | Arabic Letter Hah with Small Arabic Letter Tah Below | /χ/ | Khowar |
| ݯ | U+076F | Arabic Letter Hah with Small Arabic Letter Tah and Two Dots | /ʁ/ | Khowar |
| ݰ | U+0770 | Arabic Letter Seen with Small Arabic Letter Tah and Two Dots | /sˤ/ | Khowar |
| ݱ | U+0771 | Arabic Letter Reh with Small Arabic Letter Tah and Two Dots | /ɹˤ/ | Khowar |
| ݲ | U+0772 | Arabic Letter Hah with Small Arabic Letter Tah Above | /ħʷ/ | Torwali |
| ࢠ | U+08A0 | Arabic Letter Beh with Small V Below | /bʷ/ | African languages |
| ࢡ | U+08A1 | Arabic Letter Beh with Hamza Above | /ɓ/ | Adamawa Fulfulde |
| ࢢ | U+08A2 | Arabic Letter Jeem with Two Dots Above | /d͡ʒ/ | African Ajami |
| ࢣ | U+08A3 | Arabic Letter Tah with Two Dots Above | /tʰ/ | African languages |
| ࢤ | U+08A4 | Arabic Letter Feh with Dot Below and Three Dots Above | /ɸ/ | African Ajami |
| ࢥ | U+08A5 | Arabic Letter Qaf with Dot Below | /ɢ/ | African languages |
| ࢦ | U+08A6 | Arabic Letter Lam with Double Bar | /ʎ/ | African Ajami |
| ࢨ | U+08A8 | Arabic Letter Yeh with Two Dots Below and Hamza Above | /ʔʲ/ | Adamawa Fulfulde |
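The code points and formal names in tables like this one can be cross-checked against the Unicode Character Database that ships with Python's standard `unicodedata` module. The following is a minimal sketch with illustrative helper names. Results depend on the Unicode version bundled with the Python build, and characters added in recent versions, such as U+08C3 from Unicode 14.0, are only known to sufficiently new Python releases.

```python
import unicodedata


def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME' using Python's bundled Unicode data."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unknown to this Python build>')}"


def names_containing(start: int, end: int, fragment: str) -> list:
    """List code points in [start, end] whose formal name contains `fragment`."""
    hits = []
    for cp in range(start, end + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unassigned code points
        if fragment in name:
            hits.append(f"U+{cp:04X} {name}")
    return hits


print(describe("\u067E"))                                    # → U+067E ARABIC LETTER PEH
print(unicodedata.lookup("ARABIC LETTER TTEH") == "\u0679")  # → True
for line in names_containing(0x0750, 0x077F, "THREE DOTS"):
    print(line)
```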
Numerals and their evolution
The Arabic numerals, also known as the Hindu-Arabic numeral system, originated from the Brahmi numerals of ancient India, which are attested by the 3rd century BCE and later developed into a decimal place-value system with nine symbols and a zero.[89] These Indian numerals reached the Islamic world through trade and scholarly exchanges and were known in regions under Arab influence as early as 662 CE, as recorded by the Syriac scholar Severus Sebokht.[89] In the 9th century, the Persian mathematician Muhammad ibn Musa al-Khwarizmi played a pivotal role in their adoption and dissemination with a treatise on Indian calculation methods, On the Calculation with Hindu Numerals (c. 825 CE), which explained the system's arithmetic operations and positional notation.[90] Although the original Arabic text is lost, a 12th-century Latin translation, Algoritmi de numero Indorum, preserved its content and helped spread the system to Europe.[89] By the 10th century, the numeral forms had diverged into Eastern Arabic variants (٠١٢٣٤٥٦٧٨٩), used in the eastern Islamic world including the Arabian Peninsula, Egypt, and Persia, and Western Arabic variants (closer to the modern 0–9), which emerged in the Maghreb and Al-Andalus through local scribal adaptations.[89] This split arose from independent developments in handwriting and regional mathematical texts; the Western forms, often called Gubar numerals after the Arabic word ghubār ("dust", referring to dust-board calculations), gained prominence in North Africa.[89] Regional variants further diversified the system.
In the Maghreb, Western Arabic numerals had developed distinctive rounded shapes in manuscripts by the 10th century and were transmitted to Europe via Spain, shaping the global standard.[89] Persian numerals, a variant of the Eastern Arabic set, differ in the digits 4 (۴), 5 (۵), and 6 (۶), reflecting calligraphic conventions in Iran and Afghanistan, while keeping the same positional values.[91] In modern computing, the Eastern Arabic numerals are encoded in Unicode as U+0660–U+0669 (the Arabic-Indic digits ٠ through ٩) for use in Arabic-script text, with the Extended Arabic-Indic digits U+06F0–U+06F9 covering Persian and related forms.[31] This standardization draws on ISO/IEC 8859-6 (Arabic) and ensures compatibility in digital typography and international software, where Eastern and Western forms coexist in multilingual applications.[31]
Components
Basic graphemes and radicals
The Arabic script is built upon 18 basic graphemes, known as basic shapes or letter-shapes, which form the core structures from which the 28 letters of the alphabet are derived by adding diacritical marks such as dots. These fundamental elements originated in early forms of the script and comprise a variety of strokes that provide the skeletal framework for letter construction.[92][93][94]
The basic shapes encompass diverse stroke types, including verticals, horizontals, diagonals, and curves, each contributing to the distinctive cursive flow of the script. Vertical strokes, for example, appear as tall, straight lines in letters like alif (ا), serving as standalone elements or stems. Horizontal strokes form extensions along the baseline, as in the body of bāʾ (ب), where they create a flat, connecting bar. Curves add fluidity, as in sīn (س), whose rounded arcs form its serpentine shape. In traditional calligraphy these strokes are executed at consistent pen angles to maintain proportional harmony.[95]
A key basic shape is the dot (nuqṭah), a small mark placed above or below a base shape to distinguish letters that would otherwise be identical. For example, a single nuqṭah below the base distinguishes bāʾ (ب) from tāʾ (ت) and thāʾ (ث), which carry two and three dots above, and a single nuqṭah above distinguishes dhāl (ذ) from dāl (د). Other basic shapes include loops, tails, and notches, which combine into more complex shapes while keeping to the baseline.[96]
| Basic Shape/Stroke Type | Description | Example(s) |
|---|---|---|
| Vertical stroke | Straight downward line, often tall and isolated | Alif (ا) |
| Horizontal stroke | Right-to-left line along the baseline | Bāʾ (ب) |
| Curve (open/closed) | Arced or looped path for fluidity | Sin (س), mīm (م) |
| Diagonal/notch | Slanted line or angled cut | Jīm (ج) initial |
| Dot (nuqṭah) | Small circle for differentiation | Bāʾ (ب) below; dhāl (ذ) above |
| Tail (returning) | Extending curve folding under baseline | Final yā’ (ى) |
| Bowl/loop | Rounded enclosure below or above baseline | Final nūn (ن), ṣād (ص) |
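The derivation of the 28 letters from fewer basic shapes by adding dots can be illustrated in reverse by stripping the dots to recover a letter's dotless skeleton. The Python sketch below is hypothetical and simplified. It maps only the basic dotted letters to isolated-form skeletons, using Unicode's dotless letters (such as U+066E dotless beh) where they exist. It ignores the fact that in initial and medial position nūn, yāʾ, and bāʾ share one tooth shape, and fāʾ and qāf share one loop.

```python
# Hypothetical mapping: dotted letter -> dotless isolated-form skeleton.
SKELETON = {
    "\u0628": "\u066E", "\u062A": "\u066E", "\u062B": "\u066E",  # ب ت ث -> ٮ (dotless beh)
    "\u062C": "\u062D", "\u062E": "\u062D",                      # ج خ -> ح
    "\u0630": "\u062F",                                          # ذ -> د
    "\u0632": "\u0631",                                          # ز -> ر
    "\u0634": "\u0633",                                          # ش -> س
    "\u0636": "\u0635",                                          # ض -> ص
    "\u0638": "\u0637",                                          # ظ -> ط
    "\u063A": "\u0639",                                          # غ -> ع
    "\u0641": "\u06A1",                                          # ف -> ڡ (dotless feh)
    "\u0642": "\u066F",                                          # ق -> ٯ (dotless qaf)
    "\u0646": "\u06BA",                                          # ن -> ں (dotless noon)
    "\u064A": "\u0649",                                          # ي -> ى (dotless yeh shape)
}


def to_skeleton(text: str) -> str:
    """Replace each dotted letter with its dotless skeleton; others are unchanged."""
    return "".join(SKELETON.get(ch, ch) for ch in text)


def same_skeleton(a: str, b: str) -> bool:
    """True if two strings differ only in the dots of the letters mapped above."""
    return to_skeleton(a) == to_skeleton(b)


print(to_skeleton("ثبت"))       # → ٮٮٮ
print(same_skeleton("د", "ذ"))  # → True
print(same_skeleton("د", "ر"))  # → False
```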