Internationalized domain name
An Internationalized Domain Name (IDN) is a domain name that contains non-ASCII characters from Unicode in one or more of its labels, allowing domains at any level, including top-level domains (TLDs), to be registered and used in scripts and languages beyond the basic Latin alphabet, such as Arabic, Chinese, Cyrillic, or Devanagari.[1] These names enable internet users to access websites using familiar local scripts, promoting a more inclusive and multilingual global internet.[2] IDNs are stored and transmitted in the Domain Name System (DNS) via an ASCII-compatible encoding called Punycode, prefixed with "xn--", while applications display the original Unicode form to users.[3]

The technical foundation for IDNs is the Internationalizing Domain Names in Applications (IDNA) protocol, first standardized by the Internet Engineering Task Force (IETF) in 2003 as IDNA2003 (RFC 3490) to handle non-ASCII domain names without modifying the underlying DNS infrastructure.[4] It was superseded by IDNA2008 (RFCs 5890–5894, published in 2010), which improved security against homograph attacks, supported newer Unicode versions, and refined character validation rules, including bidirectional text handling and context-specific restrictions.[5] Punycode (RFC 3492) serves as the core encoding mechanism, reversibly transforming Unicode strings into ASCII for DNS compatibility, ensuring seamless resolution across legacy systems.
Development of IDNs began in the late 1990s through IETF working groups addressing the limitations of ASCII-only domain names, with initial guidelines emerging in 2003.[6] The Internet Corporation for Assigned Names and Numbers (ICANN) endorsed these standards in March 2003 and published its first IDN Implementation Guidelines in June 2003, authorizing generic TLD (gTLD) registries to offer IDNs at the second level.[6] ICANN's IDN ccTLD Fast Track Process, approved in October 2009, led to the delegation of the first IDN country-code TLDs (ccTLDs) into the DNS root zone in 2010, such as .рф for Russia and .الاردن for Jordan.[7] By 2013, IDNs were integrated into the New gTLD Program, expanding availability to generic TLDs like .みんな (Japanese for "everyone").[6]

ICANN continues to oversee IDN implementation through community-driven processes, including the Root Zone Label Generation Rules (RZ-LGR), which define permissible scripts and variants to prevent conflicts, with Version 6 released in September 2025 covering 27 scripts.[8] As of June 2025, 61 IDN ccTLDs and 90 IDN gTLDs are operational, totaling 151 IDN TLDs, though adoption varies by region due to factors like script complexity and localization efforts.[9] Ongoing IETF updates, such as RFC 8753 in 2020, ensure IDNA compatibility with evolving Unicode standards, maintaining stability and security.[10]

Overview and Purpose
Definition and Technical Scope
An Internationalized Domain Name (IDN) is a domain name that permits the use of a wider range of characters than the traditional ASCII set, specifically incorporating Unicode characters from various scripts to represent labels in languages other than English.[3] These non-ASCII characters are encoded into an ASCII-compatible encoding (ACE) format using Punycode, which transforms the Unicode string into a valid ASCII domain name label prefixed with "xn--", ensuring compatibility with the existing Domain Name System (DNS) infrastructure. This encoding allows IDNs to be registered, resolved, and used without modifications to the core DNS protocols.

IDNs support a diverse array of scripts beyond the basic Latin alphabet used in ASCII-only domains, including extended Latin characters (e.g., with diacritics), Cyrillic (e.g., for Russian and Bulgarian), Arabic (e.g., for right-to-left languages), Chinese (e.g., Han ideographs), Devanagari (e.g., for Hindi), and many others defined in the Unicode standard.[1] In contrast, ASCII-only domains are limited to the 26 letters, 10 digits, and hyphen from the US-ASCII repertoire, excluding scripts that require non-Latin glyphs. The scope of supported scripts is determined by the Unicode Consortium's character properties and ICANN's guidelines, which evaluate scripts for stability, confusability, and linguistic viability in domain labels.[1]

IDNs integrate seamlessly into the hierarchical structure of the DNS, where domain names are parsed from right to left across zones (e.g., top-level domains, second-level domains), with each label encoded as an A-label in the DNS zone files and wire format.[3] This approach preserves the DNS's reliance on ASCII for transmission and storage, mapping U-labels (the user-facing Unicode form) bidirectionally to A-labels during resolution without requiring protocol changes. Applications handle the conversion transparently, displaying U-labels to users while querying the DNS with A-labels.
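
The U-label/A-label mapping can be illustrated with Python's built-in `punycode` codec. This is a minimal sketch that performs only the encoding step: it applies none of the IDNA validity checks, and the helper names `to_a_label`, `to_u_label`, and `convert_name` are hypothetical, chosen for this example.

```python
ACE_PREFIX = "xn--"

def to_a_label(u_label: str) -> str:
    """Encode one Unicode label as an ASCII-compatible (ACE) label."""
    if u_label.isascii():
        return u_label  # already DNS-safe, nothing to encode
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")

def to_u_label(a_label: str) -> str:
    """Decode an 'xn--' label back to Unicode; leave other labels unchanged."""
    if a_label.lower().startswith(ACE_PREFIX):
        return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return a_label

def convert_name(name: str, convert) -> str:
    """Apply a per-label conversion to each dot-separated label of a name."""
    return ".".join(convert(label) for label in name.split("."))

print(convert_name("café.com", to_a_label))                 # xn--caf-dma.com
print(convert_name("xn--mgbh0fb.xn--mgbh0fb", to_u_label))  # the Arabic U-labels
```

Each label is converted independently, which mirrors how the DNS stores one A-label per level of the hierarchy.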
A critical aspect of IDN validity involves Unicode normalization, particularly Normalization Form C (NFC), which canonically decomposes and recomposes characters to ensure consistent representation (e.g., combining diacritics into precomposed forms like "é" instead of separate "e" and acute accent).[3] Strings must be normalized to NFC before processing to prevent equivalence issues, such as multiple encodings of the same label (e.g., NFC vs. NFD forms), thereby maintaining uniqueness and security in registrations. Normalization Form D (NFD) may appear in input but is converted to NFC for IDNA compliance.[3]

Historical Context and Benefits
In the pre-IDN era, the Domain Name System (DNS) was limited to the ASCII character set, which supported only Latin-based scripts and effectively excluded non-English languages such as Arabic, Chinese, Cyrillic, and Devanagari from domain names.[11] This restriction forced users in non-Latin script regions to rely on transliteration, approximating native terms with Roman characters, which often resulted in ambiguities, misspellings, and challenges for accurate domain entry, particularly in the 1990s for languages like Japanese and Arabic.[12] For instance, Japanese users faced difficulties with romanized addresses that did not intuitively match their native script, hindering effective internet navigation.[12]

The drive for IDNs accelerated with the explosive growth of internet adoption in non-English speaking regions after 1995, including Asia and the Middle East, where billions of potential users encountered barriers due to the English-centric DNS.[12] Early multilingual initiatives, such as the Tamil Internet project in 1995 and Chinese script-based email systems that same year, underscored the urgency for native script support in web addressing.[12] In response, the Internet Engineering Task Force (IETF) and the Internet Corporation for Assigned Names and Numbers (ICANN) began addressing these multilingual needs; the IETF formed its IDN Working Group in 2000 to standardize approaches, while ICANN collaborated through the Multilingual Internet Names Consortium (MINC), founded that July, to advocate for global inclusivity.[12]

Pioneering proposals emerged in 1998, including an IETF draft from researchers at the National University of Singapore that outlined internationalizing host names via UTF-5 encoding to enable non-ASCII characters in domains.[12] Concurrently, the Asia Pacific Networking Group (APNG) launched a testbed in the second half of 1998, providing one of the first practical demonstrations of IDN functionality and paving the way for broader experimentation.[12]

IDNs offer key benefits by enhancing usability for native speakers, allowing them to enter and recall domain names in familiar scripts like Cyrillic or Hangul, which simplifies online navigation compared to ASCII transliterations.[11] They promote cultural relevance by enabling domains that reflect local identities and languages, such as .рф for Russian content or .中国 for Chinese users, fostering a more representative digital presence.[13] Furthermore, IDNs reduce entry errors for non-English users by eliminating the need to approximate scripts, and they advance digital inclusion by empowering approximately 75% of the global internet population, who primarily use non-English languages (as of 2024), to participate fully in the online ecosystem.[14][11]

Technical Standards
IDNA Protocol Fundamentals
The Internationalized Domain Names in Applications (IDNA) protocol suite provides the foundational standards for incorporating non-ASCII characters into domain names, enabling their use across the internet while preserving compatibility with the existing Domain Name System (DNS), which is limited to ASCII characters.[4] Developed by the Internet Engineering Task Force (IETF), IDNA operates at the application layer, converting internationalized labels into an ASCII-Compatible Encoding (ACE) form that can be processed by DNS resolvers without requiring modifications to the DNS protocol itself.[3] This approach ensures that domain names in scripts such as Arabic, Chinese, or Cyrillic can be registered, resolved, and displayed seamlessly in user-facing applications.[5]

A key prerequisite for IDNA is the Unicode standard, which defines a universal character repertoire encompassing over 140,000 characters from various writing systems, serving as the basis for representing internationalized labels.[5] IDNA relies on Unicode code points (ranging from 0 to 0x10FFFF in IDNA2008) to identify valid characters, with normalization to Unicode Normalization Form C (NFC) often applied in applications to ensure consistent representation.[3] Without this Unicode foundation, the protocol could not systematically map diverse scripts to DNS-compatible formats.[4]

The initial IDNA2003 specification, outlined in RFC 3490 along with supporting documents RFC 3491 (Nameprep), RFC 3492 (Punycode), and RFC 3454 (Stringprep), introduced the core mechanism for encoding Unicode labels into Punycode-based ACE strings prefixed with "xn--", allowing backward compatibility with ASCII-only systems.[4] In contrast, IDNA2008, detailed in RFC 5890 (definitions and framework), RFC 5891 (protocol), RFC 5892 (Unicode code points and their IDNA properties), RFC 5893 (right-to-left scripts), and RFC 5894 (background, explanation, and rationale), obsoletes much of IDNA2003 by removing dependencies like Stringprep, rejecting unassigned Unicode code points, and introducing stricter rules for context-dependent characters such as zero-width joiners.[5] Transitioning between versions posed challenges, including interoperability issues for existing registrations, as IDNA2008 alters the validity of certain labels (e.g., disallowing some symbols previously permitted) and shifts normalization responsibilities to applications rather than the protocol core.[3]

At its heart, IDNA's principles revolve around bidirectional conversion between user-readable Unicode (U-labels) and DNS-transmittable ACE (A-labels), ensuring that only valid, non-problematic characters are processed by excluding disallowed code points such as private-use characters or those causing visual confusion.[15] These exclusions, derived from the code point tables of IDNA2008 (RFC 5892), prevent invalid or ambiguous strings from entering the DNS, promoting stability and security.[5] In applications such as web browsers, email clients, and DNS resolvers, IDNA facilitates this by processing input strings to generate ACE for queries and reversing the process for output, thereby supporting IDNs without exposing users to the underlying encoding.[3] This application-centric design allows global deployment of IDNs while minimizing disruptions to the ASCII-dominated internet infrastructure.[4]

ToASCII and ToUnicode Processes
The ToASCII and ToUnicode processes form the core bidirectional conversion mechanisms in the Internationalized Domain Names in Applications (IDNA) protocol, enabling the transformation of Unicode-based domain labels (U-labels) into ASCII-Compatible Encoding (ACE) format for DNS compatibility and vice versa.[3] These algorithms ensure that non-ASCII labels can be registered and resolved in the DNS while maintaining reversibility, with ToASCII producing an A-label from a U-label and ToUnicode performing the inverse operation.[5] The processes differ between IDNA2003 (RFC 3490) and IDNA2008 (RFC 5891), with the latter introducing stricter validity rules and eliminating certain mappings present in the former.[16]

In IDNA2003, the ToASCII algorithm processes an input sequence of Unicode code points with optional flags for allowing unassigned code points and enforcing STD3 ASCII rules.[17] First, if the input consists entirely of ASCII characters (code points 0x00-0x7F), it skips normalization; otherwise, it applies the Nameprep profile (RFC 3491), which includes case mapping, normalization, and prohibition of certain characters, failing if errors occur.[17] Next, if the UseSTD3ASCIIRules flag is set, it verifies compliance with STD3 restrictions: the label must contain no non-LDH (Letter-Digit-Hyphen) ASCII characters (for example, code points 0x00-0x2C) and must not begin or end with a hyphen (U+002D).[17] For inputs that still contain non-ASCII code points, it confirms the sequence does not begin with the ACE prefix "xn--", then encodes the normalized sequence using Punycode (RFC 3492) and prepends the "xn--" prefix. Finally, it checks that the resulting label is between 1 and 63 characters long.[17] Failure at any step results in an error, preventing invalid labels from proceeding.[17]

IDNA2008 replaces the single ToASCII operation with registration and lookup procedures (RFC 5891) that play the same role but are organized into preparation, validity, and encoding phases, expecting input in Unicode Normalization Form C (NFC) and removing the mapping steps from Nameprep.[18] Preparation ensures the input is in NFC and identifies whether it is a U-label (containing non-ASCII characters) or A-label.[19] Validity checks (Section 4.2) are stricter than in IDNA2003: the label must contain only permitted code points from the Protocol Valid categories (excluding DISALLOWED and UNASSIGNED per RFC 5892), with no leading or trailing hyphens, no "--" in the third and fourth positions, and no leading combining marks.[20] Additionally, it enforces contextual rules defined in RFC 5892: CONTEXTJ, which permits the join controls U+200C ZERO WIDTH NON-JOINER and U+200D ZERO WIDTH JOINER only in specific surrounding contexts, and CONTEXTO, which covers other characters that are valid only in particular contexts.[21] For domain names that contain right-to-left scripts (e.g., Arabic or Hebrew), it applies the Bidi rule from RFC 5893: the first character of a label must have Bidi class L, R, or AL; a right-to-left label may contain only characters of classes R, AL, AN, EN, ES, CS, ET, ON, BN, and NSM; it must end with an R, AL, EN, or AN character, optionally followed by nonspacing marks; and it must not mix European (EN) and Arabic-Indic (AN) digits, among other criteria intended to prevent visual spoofing.[22] Upon passing validity, non-ASCII U-labels are encoded via Punycode to produce an A-label, prefixed with "xn--".[23] Unlike IDNA2003, which allowed mappings to normalize inputs (e.g., case folding or canonical equivalents), IDNA2008 rejects invalid inputs outright without mapping, ensuring greater consistency but potentially higher rejection rates.[16]

The ToUnicode algorithm reverses ToASCII, converting an A-label back to a U-label while validating inputs to maintain protocol integrity.[24] It first checks whether the input is an A-label by verifying the "xn--" prefix and Punycode validity; if the label is not prefixed or decoding fails, it leaves the input, typically an ordinary ASCII label, unaltered.[25] For valid A-labels, Punycode decoding yields a Unicode string in NFC, which then undergoes the same validity checks as ToASCII, including code point categories, hyphen rules, contextual (CONTEXTJ/CONTEXTO), and Bidi rules.[26] Invalid inputs, such as non-NFC forms, prohibited code points, or labels failing contextual or Bidi tests, result in failure, returning the original input unchanged as a fallback to avoid breaking legacy ASCII domains.[26] In IDNA2003, ToUnicode was simpler: it decoded the Punycode, confirmed the result by re-applying ToASCII, and returned the original input on any failure, without the code point, contextual, and NFC validity rules introduced in IDNA2008.[27] This symmetric design in both versions ensures that ToUnicode(ToASCII(U-label)) recovers the original U-label if valid, supporting reliable DNS operations.[5]

Encoding and Normalization Examples
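
As a first illustration, the IDNA2003-style ToASCII steps described in the preceding section can be sketched in Python using only the standard library. This is a simplified sketch, not a conforming implementation: the function name `to_ascii_sketch` is hypothetical, and NFKC normalization plus lowercasing is an assumed stand-in for the full Nameprep profile, which also maps and prohibits many other code points.

```python
import unicodedata

ACE_PREFIX = "xn--"
LDH = set("abcdefghijklmnopqrstuvwxyz0123456789-")

def to_ascii_sketch(label: str, use_std3_rules: bool = True) -> str:
    """Walk through simplified IDNA2003-style ToASCII steps for one label."""
    if not label.isascii():
        # Assumption: NFKC + lowercasing approximates Nameprep for this demo.
        label = unicodedata.normalize("NFKC", label).lower()
    if use_std3_rules:
        # STD3 rules: only letters, digits, hyphen among ASCII characters,
        # and no hyphen at either end of the label.
        if any(ch.isascii() and ch.lower() not in LDH for ch in label):
            raise ValueError("non-LDH ASCII character in label")
        if label.startswith("-") or label.endswith("-"):
            raise ValueError("label starts or ends with a hyphen")
    if not label.isascii():
        if label.startswith(ACE_PREFIX):
            raise ValueError("non-ASCII label must not start with the ACE prefix")
        label = ACE_PREFIX + label.encode("punycode").decode("ascii")
    if not 1 <= len(label) <= 63:
        raise ValueError("encoded label must be 1 to 63 characters long")
    return label

print(to_ascii_sketch("Café"))     # xn--caf-dma
print(to_ascii_sketch("example"))  # example
```

The ASCII check is repeated after normalization because compatibility mapping can turn a non-ASCII input (for example, fullwidth Latin letters) into plain ASCII, in which case no Punycode encoding is needed.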
Internationalized domain names (IDNs) require normalization to ensure consistency across different input methods and systems, typically using Unicode Normalization Form C (NFC), which composes characters where possible to create a canonical representation.[28] This step precedes encoding into Punycode, the ASCII-Compatible Encoding (ACE) format prefixed with "xn--", allowing non-ASCII characters to be represented in the Domain Name System (DNS). For instance, the domain "café.com", where "é" is the precomposed Latin small letter e with acute (U+00E9), is already in NFC and encodes to "xn--caf-dma.com".[28] If entered in decomposed form as "caf\u0065\u0301.com" (e followed by combining acute accent), NFC recombines it to the same precomposed "é", yielding identical Punycode output and preventing duplicate registrations.[28]

For scripts with bidirectional text like Arabic, normalization and encoding must also adhere to bidirectional rules to maintain readability and security. The domain "مثال.مثال" (meaning "example.example" in Arabic) normalizes to NFC, ensuring consistent character composition, and encodes to "xn--mgbh0fb.xn--mgbh0fb".[29] Arabic labels, being right-to-left (RTL), must begin with an RTL character (Bidi class R or AL) and end with an RTL character or a digit, optionally followed by nonspacing marks, and they may not contain left-to-right letters, as required by the Bidi Rule of RFC 5893. This prevents visual spoofing attacks in which reordered characters could confuse users.

Edge cases highlight normalization's role in handling variations. Combining characters are permitted when they are valid code points and the label is in NFC, but a label may not begin with a combining mark; for example, "résumé.com" normalizes to the precomposed "é" (U+00E9) and encodes to "xn--rsum-bpad.com".[28] Case mapping, performed by Nameprep in IDNA2003 and by application-level mapping such as UTS #46 under IDNA2008, converts uppercase to lowercase (e.g., "Café.com" becomes "café.com" before encoding), keeping domain names case-insensitive. Invalid sequences, such as disallowed characters (e.g., spaces or emojis) or misplaced combining marks, trigger errors in the ToASCII process, rejecting the label; for instance, a label that begins with a combining mark such as U+0301 fails validation.

Standard libraries facilitate verification and implementation of these processes. In Python, the idna module (implementing IDNA2008 with optional UTS #46 compatibility mapping) can encode "café.com" using idna.encode('café.com'), returning b'xn--caf-dma.com'; it rejects input that is not already in NFC unless UTS #46 mapping is enabled with uts46=True.[30] Similarly, idna.decode(b'xn--mgbh0fb.xn--mgbh0fb') recovers the Arabic "مثال.مثال", demonstrating round-trip consistency, and the library raises an error for labels that violate the Bidi rule.[30]
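
The normalization and Bidi behaviour discussed in this section can also be explored with Python's standard library alone. The sketch below is illustrative rather than a full IDNA2008 validator: `encode_label` and `satisfies_bidi_rule` are hypothetical helper names, the code point category and contextual rules are omitted, and the Bidi check follows the six conditions of the RFC 5893 Bidi rule for a single label.

```python
import unicodedata

def encode_label(label: str) -> str:
    """Normalize to NFC, reject a leading combining mark, then Punycode-encode."""
    label = unicodedata.normalize("NFC", label)
    if label and unicodedata.category(label[0]).startswith("M"):
        raise ValueError("label must not begin with a combining mark")
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")

def satisfies_bidi_rule(label: str) -> bool:
    """Check one label against the RFC 5893 Bidi rule (relevant when the
    domain name contains at least one right-to-left label)."""
    classes = [unicodedata.bidirectional(ch) for ch in label]
    if not classes or classes[0] not in {"L", "R", "AL"}:
        return False  # condition 1: must start with L, R, or AL
    trimmed = list(classes)
    while trimmed[-1] == "NSM":  # trailing nonspacing marks are ignored
        trimmed.pop()
    if classes[0] in {"R", "AL"}:  # right-to-left label
        allowed = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
        ends_ok = trimmed[-1] in {"R", "AL", "EN", "AN"}
        digits_ok = not ("EN" in classes and "AN" in classes)
    else:  # left-to-right label
        allowed = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
        ends_ok = trimmed[-1] in {"L", "EN"}
        digits_ok = True
    return set(classes) <= allowed and ends_ok and digits_ok

precomposed = "caf\u00e9"             # é as a single code point (NFC)
decomposed = "cafe\u0301"             # e + combining acute accent (NFD)
arabic = "\u0645\u062b\u0627\u0644"   # the Arabic label for "example"

print(encode_label(precomposed) == encode_label(decomposed))  # True
print(encode_label(decomposed))                               # xn--caf-dma
print(satisfies_bidi_rule(arabic))                            # True
print(satisfies_bidi_rule(arabic + "abc"))                    # False
```

Normalizing before encoding is what makes the two spellings of "café" collapse to one A-label; without that step the decomposed form would produce a different Punycode string.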
Implementation Frameworks
ICANN Guidelines and Updates
ICANN's Internationalized Domain Name (IDN) Implementation Guidelines were first published in June 2003 as version 1.0, establishing initial standards for second-level IDNs in generic top-level domains (gTLDs) that emphasized compliance with IETF protocols and measures to prevent script mixing unless linguistically justified; later updates included version 2.1 in February 2006. Over the subsequent years, the guidelines underwent iterative updates to address emerging challenges in global deployment, culminating in version 4.0 proposals that led to version 4.1, approved on 22 September 2022 and published in November 2022. Version 4.1, which defers certain elements from 4.0 such as specific variant allocation rules (guidelines 11, 12, 13), became effective with full compliance required from registry operators by 30 April 2025, as announced on 28 October 2024. This evolution prioritizes enhanced variant handling to mitigate user confusion and ensures operational stability across diverse scripts.[31]

Key components of the guidelines impose specific requirements on TLD registries to maintain integrity and security. Registries must validate IDN labels in strict adherence to the IETF's IDNA 2008 protocol (RFCs 5890–5893), prohibiting labels with hyphens in both the third and fourth positions unless they are valid A-labels, and publish their supported Unicode code point repertoires in the IANA repository.[31][32] For variant bundling, registries are mandated to allocate variant labels only to the same registrant or block them entirely, promoting the "same entity" principle to avoid fragmentation and abuse.[31] Display rules further require that all code points within a label belong to the same Unicode script per Unicode Standard Annex #24, with limited exceptions for established orthographies, while minimizing risks from homoglyphs and whole-script confusables as defined in Unicode Technical Reports #36 and #39.[31][33][34]

In October 2024, the Expedited Policy Development Process (EPDP) on IDNs Phase 2 released its final report, adopted by the GNSO Council on 13 November 2024, integrating rights protection mechanisms tailored to IDN contexts.[35] This update aligns existing tools like the Uniform Domain-Name Dispute-Resolution Policy (UDRP), Uniform Rapid Suspension System (URS), and Trademark Clearinghouse (TMCH) with IDN variants, ensuring that suspensions or transfers under these mechanisms encompass entire variant sets while upholding the "same entity" principle, without expanding TMCH matching to include variants beyond exact matches.[35] It also mandates harmonized IDN tables across variant gTLDs and outreach to educate stakeholders on variant impacts in dispute resolution.[35] In October 2025, ICANN launched a public comment on string similarity evaluation data for the next gTLD round, focusing on IDN variant assessments to enhance security and usability.[36]

Recent advancements include ICANN's Universal Acceptance (UA) initiatives, which aim to guarantee seamless IDN compatibility across software applications and systems. By July 2025, ICANN had reached a UA milestone by enabling its account systems to fully support internationalized email addresses (Email Address Internationalization, or EAI), allowing sending and receiving of emails with non-ASCII domains.[37] These efforts, ongoing through 2025, involve evaluating software readiness and forming expert working groups to develop implementation guidelines, ensuring that IDNs and new TLDs are treated equally in global digital infrastructure.[38] Complementing these guidelines, Root Zone Label Generation Rules (LGRs) provide script-specific tools for consistent label validation.[39]

Root Zone Label Generation Rules (LGRs)
The Root Zone Label Generation Rules (RZ-LGRs) serve as standardized rulesets that define the permissible code points, variants, and validation criteria for Internationalized Domain Name (IDN) labels in the DNS root zone. These rules ensure a secure and stable operation of the root zone by specifying which characters from various scripts can form valid top-level domains (TLDs) and their associated variants, thereby minimizing risks of label confusion across different writing systems. For instance, in the Chinese (Han) script, LGRs generate variants that account for differences between simplified and traditional forms, allowing related labels to be treated as equivalents or blocked to prevent conflicts.[40]

The development of RZ-LGRs follows a structured procedure involving script-specific Generation Panels composed of experts from relevant linguistic communities, who propose rules tailored to each writing system based on Unicode standards. These proposals are then reviewed and integrated by a centralized Integration Panel, appointed by ICANN, which ensures consistency across scripts while adhering to core principles such as stability, inclusion, and conservatism. The process includes public comment periods for transparency, culminating in ICANN's approval and publication of the unified ruleset. A notable example is the release of RZ-LGR-6 in September 2025, following a public comment proceeding initiated in June 2025, which integrated the Thaana script and provided updates for Bangla (Bengali), Japanese, and Khmer scripts to refine variant handling and code point repertoires.[41][8]

Reference LGRs, which provide baseline rules adaptable for both root and second-level domains, have expanded to include new scripts and languages, such as the additions of Balinese, Thaana, and Inuktitut in November 2024, bringing the total to 27 script-based and 32 language-based reference LGRs.
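
The two core ideas of an LGR, a repertoire of permitted code points and a set of variant mappings, can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: the three-character repertoire, the single simplified/traditional variant pair, and the function names `is_valid_label` and `variant_labels` are hypothetical and are not taken from the published RZ-LGR, which also assigns dispositions (such as allocatable or blocked) to each variant label.

```python
from itertools import product

# Toy data for illustration only; NOT the published RZ-LGR repertoire.
REPERTOIRE = {"中", "国", "國"}
VARIANTS = {"国": {"國"}, "國": {"国"}}  # simplified <-> traditional forms

def is_valid_label(label):
    """A label is valid here if every code point is in the toy repertoire."""
    return bool(label) and all(ch in REPERTOIRE for ch in label)

def variant_labels(label):
    """Return all variant labels reachable by substituting variant code points."""
    if not is_valid_label(label):
        raise ValueError("label contains a code point outside the repertoire")
    # For each position, the original code point plus its variants.
    choices = [sorted({ch} | VARIANTS.get(ch, set())) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}

print(variant_labels("中国"))  # {'中國'}
```

A registry applying the "same entity" principle would then either allocate every label in this set to the same registrant or block the variants outright.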
As of June 2025, over 11,000 IDN tables, each representing permitted code points for specific scripts or languages, have been published in the IANA Repository, reflecting the cumulative output of these rulesets and supporting global IDN deployment.[42][43]

Developing LGRs presents challenges in balancing inclusivity, which promotes broad representation of scripts and languages to foster a multilingual Internet, with the need for stability in multi-script environments to avoid usability issues or security vulnerabilities like visual similarity attacks. The Integration Panel's methodology applies principles of inclusion alongside stability and conservatism to reconcile community-driven proposals, ensuring that expansions do not compromise DNS integrity.[44]

Global Deployment
IDN Top-Level Domains (TLDs)
Internationalized country code top-level domains (IDN ccTLDs) are managed through ICANN's Fast Track Process, which was launched on November 16, 2009, to enable eligible countries and territories to request and deploy non-Latin script TLDs representing their names in local languages.[45] This process involves rigorous string evaluation to ensure stability and security, including checks for visual similarity to existing TLDs and adherence to script-specific guidelines. Examples delegated through this process include .рф for Russia (delegated in 2010 to represent "RF" in Cyrillic) and .中国 for China (denoting the country in Simplified Chinese characters). As of October 2024, Libya's Arabic-script IDN ccTLD .ليبيا had successfully completed string evaluation and become eligible for delegation, marking progress toward supporting Arabic-speaking users in the region.[46]

For generic top-level domains (gTLDs), IDN TLDs were introduced as part of ICANN's 2012 New gTLD Program, allowing applicants to propose TLDs in scripts beyond Latin, such as Cyrillic and Chinese. By 2025, 90 IDN gTLDs had been delegated, including examples like .みんな (Japanese for "everyone").[47] These delegations expand the global DNS to better serve non-English-speaking communities. Overall, as of June 2025, there are 151 IDN TLDs in the root zone, comprising 61 IDN ccTLDs and 90 IDN gTLDs, out of a total of 1,440 TLDs, covering 37 languages and 23 scripts.

Registry operators for these IDN TLDs must comply with ICANN's operational requirements, including standardized registry agreements that mandate support for Internationalized Domain Names (IDNs) at the second level and proper handling of code point variants to prevent conflicts and ensure interoperability.[48] Variant management involves harmonizing IDN tables across related TLDs and integrating bundling mechanisms where applicable, as outlined in ICANN's IDN Variant TLD Implementation recommendations.[49]

Registration Statistics and Adoption Trends
As of June 2025, there were approximately 4.4 million Internationalized Domain Name (IDN) registrations across all top-level domains (TLDs) worldwide.[9] In generic TLDs (gTLDs), second-level IDN registrations stood at 1.396 million as of March 2025, reflecting a decline of 4.84% from 1.467 million the previous year.[9] The distribution of IDN registrations in gTLDs highlights dominance by certain scripts, with Chinese accounting for 49% (about 681,000 registrations), followed by Latin script extensions at 28% (393,000 registrations), Cyrillic at roughly 65,000, and Arabic at 14,000.[9] Regionally, adoption is concentrated in Asia, with seven root server providers supporting IDN services, and Europe, hosting ten such providers, underscoring these areas as key hotspots for multilingual domain deployment.[9]

Adoption trends reveal a contrast between country-code TLDs (ccTLDs) and gTLDs: while 61 IDN ccTLDs demonstrate slow but steady growth, gTLD registrations continue to decline amid broader market shifts.[9] The push for Universal Acceptance, which ensures that systems handle non-ASCII characters seamlessly, has played a pivotal role in bolstering IDN uptake by addressing compatibility barriers in applications and networks.[9] Looking ahead, the ICANN IDN Annual Report 2025 projects continued multilingual expansion through ongoing development of Label Generation Rules (LGRs) for additional scripts and languages, aiming to further integrate IDNs into the global domain ecosystem.[9]

| Script | Percentage (gTLDs) | Approximate Registrations (March 2025) |
|---|---|---|
| Chinese | 49% | 681,000 |
| Latin extensions | 28% | 393,000 |
| Cyrillic | ~5% | 65,000 |
| Arabic | ~1% | 14,000 |
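
The percentages in the table and the year-over-year decline follow directly from the registration counts quoted above. As a quick arithmetic check, the short Python sketch below recomputes them; the variable names are illustrative, and the inputs are the rounded figures from this section rather than exact registry data.

```python
# Rounded figures quoted in this section (gTLD second-level IDN registrations).
total_2024 = 1_467_000
total_2025 = 1_396_000
by_script = {
    "Chinese": 681_000,
    "Latin extensions": 393_000,
    "Cyrillic": 65_000,
    "Arabic": 14_000,
}

# Relative change from the previous year, expressed as a percentage.
decline_pct = (total_2024 - total_2025) / total_2024 * 100

# Share of each script in the March 2025 total, rounded to one decimal place.
shares = {script: round(count / total_2025 * 100, 1)
          for script, count in by_script.items()}

print(f"Year-over-year decline: {decline_pct:.2f}%")
for script, share in shares.items():
    print(f"{script}: {share}%")
```

The computed shares (about 48.8%, 28.2%, 4.7%, and 1.0%) agree with the rounded values in the table.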