
Unicode block

A Unicode block is a contiguous range of code points in the Unicode codespace, normatively defined to organize characters associated with a particular script, symbol set, or functional category. Examples include the Basic Latin block (U+0000–U+007F) for the ASCII characters and the CJK Unified Ideographs block (U+4E00–U+9FFF) for Han ideographs. Blocks are the primary structural units for grouping the 159,801 characters encoded across 346 blocks in Unicode 17.0, which makes text processing, rendering, and implementation in software and fonts easier. Unicode blocks were introduced as part of the Unicode Standard's design to divide the 1,114,112-code-point codespace into manageable, non-overlapping ranges, each with a unique name that is matched loosely, ignoring case, whitespace, hyphens, and underscores. Most blocks align with a single script or symbol collection, such as the Arabic block (U+0600–U+06FF) for right-to-left text or the Musical Symbols block (U+1D100–U+1D1FF) for notation. Some scripts, like Latin, span multiple blocks to accommodate extensions for diacritics, historical forms, and regional variants. Blocks may include unassigned code points reserved for future use. Each block starts at a multiple of 16 and contains a multiple of 16 code points, matching the 16-code-point columns of the Unicode code charts. The exact ranges and names are specified in the Unicode Character Database's Blocks.txt file, which supports interoperability across versions of the standard. In practice, Unicode blocks are used in font design, character pickers, and input methods. They let developers reference character groups programmatically via the normative Block property. This property alone does not determine semantic or behavioral traits such as directionality (used by the Unicode Bidirectional Algorithm) or canonical combining class; those require consulting other properties in the Unicode Character Database.
Code points that lie outside every defined block have the default Block value "No_Block", leaving room for future allocations. This organization supports 172 scripts and diverse symbol sets, from ancient systems like Anatolian Hieroglyphs (U+14400–U+1467F) to modern emoji in the Emoticons block (U+1F600–U+1F64F), promoting global digital communication.

Overview

Definition

A Unicode block is a named grouping of contiguous code points within the Unicode codespace, defined by the Unicode Consortium to organize characters for reference and documentation, most visibly in the code charts. Blocks are non-overlapping ranges, but they do not cover the whole codespace: code points outside every block have the Block value No_Block. The surrogate range U+D800–U+DFFF is reserved for UTF-16 and never assigned to characters, yet even it is covered by blocks (High Surrogates, High Private Use Surrogates, and Low Surrogates). Every block starts at a code point that is a multiple of 16 and contains a multiple of 16 code points. Sizes range from a minimum of 16 up to 65,536, a full plane, as in the two Supplementary Private Use Area blocks. For instance, the Basic Latin block occupies the range U+0000–U+007F, encompassing 128 code points. Each block receives a descriptive name reflecting its typical contents, such as "Basic Latin". The grouping is informal and non-semantic, however, and implies no specific properties or behaviors for the characters it contains. Unicode 17.0 defines 346 blocks. Planes 4 through 13 contain none, so all of their code points are No_Block.
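The two structural rules above (a start on a multiple of 16, and a size that is a multiple of 16) are easy to check mechanically. The following Java sketch is illustrative only. The class and method names are hypothetical and not part of any Unicode API. The sample ranges come from this article, plus one deliberately malformed range for contrast.

```java
/** Hypothetical helper: checks the structural rules that every block range satisfies. */
public class BlockShape {

    /** True if [start, end] starts on a multiple of 16 and spans a multiple of 16 code points. */
    static boolean isWellFormedBlock(int start, int end) {
        return start >= 0 && end <= 0x10FFFF && start <= end
                && start % 16 == 0        // first code point is a multiple of 16
                && (end + 1) % 16 == 0;   // so the size (end - start + 1) is a multiple of 16
    }

    /** Number of code points in the inclusive range [start, end]. */
    static int size(int start, int end) {
        return end - start + 1;
    }

    public static void main(String[] args) {
        // Basic Latin, CJK Unified Ideographs, Syriac Supplement, and one malformed range.
        int[][] ranges = { {0x0000, 0x007F}, {0x4E00, 0x9FFF}, {0x0860, 0x086F}, {0x0000, 0x0064} };
        for (int[] r : ranges) {
            System.out.printf("U+%04X..U+%04X size=%d wellFormed=%b%n",
                    r[0], r[1], size(r[0], r[1]), isWellFormedBlock(r[0], r[1]));
        }
    }
}
```

The first three ranges report sizes of 128, 20,992, and 16 and pass the check. The last fails because it ends at U+0064 rather than one position before a multiple of 16.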

Purpose

Unicode blocks provide a structured mechanism for grouping related characters within the Unicode codespace, organizing them intuitively by script, symbol type, or thematic category. For instance, the "CJK Unified Ideographs" block encompasses ideographs shared across the Chinese, Japanese, and Korean writing systems, which makes the standard's large character set easier to reference and understand. This grouping establishes a coarse but practical framework for managing the 159,801 characters encoded as of Unicode 17.0, aiding both documentation and implementation. Blocks also support practical tasks in software and user interfaces. In font design, they let developers prioritize glyph creation for specific ranges, adding support for particular scripts and symbols without covering the entire repertoire. Similarly, character maps and input tools can offer categorized browsing by block, so users select from predefined groups rather than scanning individual code points. The Block property, detailed in the Unicode Character Database, also supports text processing. Databases and text processors can filter characters by block to narrow queries, and regular expressions can use block-based patterns as a quick way to match sets of characters. Such patterns are imprecise, however. A block may contain characters from several scripts as well as unassigned code points, while a script may extend into other blocks. Because block membership carries no semantics, blocks do not enforce rules on characters; they serve mainly as organizational aids for features like character pickers. Blocks frequently correspond to scripts, but the correspondence is loose, and some scripts span multiple blocks.
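Java's `java.util.regex` package illustrates the precision caveat for block-based patterns because it supports both block classes (prefix `In`) and script classes (prefix `Is`). This is a minimal sketch assuming a reasonably recent JDK. Which characters match depends on the Unicode version bundled with that JDK, and the helper method names are hypothetical.

```java
import java.util.regex.Pattern;

/** Compares a block-based and a script-based regex class for Greek. */
public class BlockVsScriptRegex {

    // \p{InGreek}: any code point in the Greek and Coptic block (U+0370..U+03FF).
    static final Pattern IN_GREEK_BLOCK = Pattern.compile("\\p{InGreek}");
    // \p{IsGreek}: any code point whose Script property is Greek, in whatever block.
    static final Pattern IS_GREEK_SCRIPT = Pattern.compile("\\p{IsGreek}");

    static boolean inGreekBlock(String s) {
        return IN_GREEK_BLOCK.matcher(s).matches();
    }

    static boolean isGreekScript(String s) {
        return IS_GREEK_SCRIPT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String alpha = "\u03B1";       // GREEK SMALL LETTER ALPHA
        String shei = "\u03E3";        // COPTIC SMALL LETTER SHEI, inside the Greek and Coptic block
        String psiliAlpha = "\u1F00";  // GREEK SMALL LETTER ALPHA WITH PSILI, in Greek Extended
        for (String s : new String[] {alpha, shei, psiliAlpha}) {
            System.out.printf("U+%04X block=%b script=%b%n",
                    s.codePointAt(0), inGreekBlock(s), isGreekScript(s));
        }
    }
}
```

For U+03E3, a Coptic letter stored in the Greek and Coptic block, the block test succeeds and the script test fails. For U+1F00 in Greek Extended, the results are reversed.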

History

Early development

The Unicode Standard version 1.0, released in October 1991, introduced the concept of character blocks as a means to organize code points within its 16-bit encoding space, U+0000 to U+FFFF. This structure was directly inspired by the design of plane 0 in the emerging ISO/IEC 10646 standard, aligning Unicode's initial repertoire with international efforts to create a universal character encoding. The 16-bit space was intended to hold the most commonly used characters, facilitating efficient implementation in software and hardware while leaving room for future growth. Unicode 1.0 defined an initial set of blocks emphasizing major world scripts, providing broad multilingual support from the outset. These included blocks for Latin (Basic Latin, Latin-1 Supplement, and extensions), Greek, Cyrillic, Armenian, Hebrew, Arabic, Indic scripts such as Devanagari, Thai, Georgian, and Hangul Jamo, as well as provisions for CJK ideographs, which were detailed in a supplementary volume. This selection reflected a strategic focus on scripts with significant global usage, enabling the encoding of text from diverse languages without immediate fragmentation. The rationale behind this block-based organization stemmed from the Unicode Consortium's foundational objectives: compatibility with prevalent legacy encodings and extensibility for unforeseen needs. For instance, the Basic Latin block (U+0000–U+007F) directly mirrored ASCII to ensure seamless integration with existing Western computing systems. At the same time, the modular block design allowed logical grouping of related characters. This simplified font development and searching and provided a framework for adding new scripts without disrupting established allocations. A pivotal event shaping this early framework was the 1991 merger between the Unicode initiative and the ISO/IEC JTC1/SC2 work on 10646, following negotiations that resolved discrepancies in character repertoires and encoding forms.
Meetings in October 1991 produced mutually acceptable adjustments, resulting in unified planning that synchronized Unicode's code space with ISO 10646's plane 0 and prevented divergent standards. This collaboration ensured that block definitions would evolve cohesively, laying the groundwork for international adoption. Subsequent versions, beginning with 2.0, expanded this structure significantly, incorporating additional planes and scripts.

Evolution through versions

The Unicode Standard 2.0, released in July 1996, significantly expanded the encoding space. It formalized support for code points beyond the Basic Multilingual Plane (BMP), reachable in UTF-16 through the surrogate mechanism, and added the Hangul Syllables block (U+AC00–U+D7AF) to enable precomposed representation of 11,172 Korean syllables. This version established a foundational structure of approximately 68 blocks, facilitating broader script coverage while maintaining compatibility with prior encodings. Subsequent versions built on this framework with targeted expansions. Unicode 3.0, published in September 1999, added 18 new blocks supporting 12 scripts. These included historic scripts such as Ogham (U+1680–U+169F) and Runic (U+16A0–U+16FF), alongside modern scripts like Cherokee (U+13A0–U+13FF) and Mongolian (U+1800–U+18AF), and the version contributed over 10,000 new characters overall. Unicode 5.0, released in July 2006, further diversified symbol support with new symbol blocks and added five new scripts, including Phoenician (U+10900–U+1091F), across nine new blocks in total. A pivotal shift came in Unicode 6.0 (October 2010), which introduced the first emoji-oriented blocks, Emoticons (U+1F600–U+1F64F) and Miscellaneous Symbols and Pictographs (U+1F300–U+1F5FF). These blocks added hundreds of pictographic characters for digital communication, reflecting growing needs in mobile and web technologies. By Unicode 8.0 (June 2015), the Unicode Consortium had moved to an annual release cadence, accelerating the inclusion of underrepresented scripts such as Ahom (U+11700–U+1173F) and Multani (U+11280–U+112AF), both added in version 8.0 to prioritize linguistic diversity. Unicode 17.0, released in September 2025, defines 346 blocks and adds 4,803 new characters, including four new scripts such as Sidetic and Tolong Siki, underscoring ongoing efforts to document endangered writing systems.
Overall growth has been substantial, from the initial set of blocks in version 1.0 to 346 today, with all but 14 of them located in the BMP and the Supplementary Multilingual Plane. This progression shows a pattern of incremental, script-focused additions. Apart from early reorganizations such as the relocation of Hangul syllables in version 2.0, new blocks have been added without disrupting existing assignments.

Design and properties

Code point allocation

Unicode blocks are fixed, non-overlapping ranges of code points within the Unicode codespace, which spans U+0000 to U+10FFFF. Each block begins at a code point that is a multiple of 16 (such as U+0000 or U+0100) and ends at a code point one less than a multiple of 16, so every block fills whole 16-code-point columns of the code charts. For instance, the Basic Latin block occupies U+0000–U+007F, encompassing 128 code points for fundamental Latin characters. Blocks never overlap, but they leave gaps, and code points not covered by any block have the Block value "No_Block". The allocation of code points into blocks is managed by the Unicode Consortium through a rigorous proposal process. Proposals for new scripts or characters are submitted using a standardized form and evaluated initially by technical officers. They are then reviewed by the Unicode Technical Committee (UTC) to assess compatibility with existing encodings, script requirements, and stability policies. Approved allocations meet the needs of specific writing systems while leaving unassigned positions within blocks for future additions, so that new characters do not disrupt prior assignments. New blocks are assigned as contiguous ranges aligned to the 16-code-point boundary, preserving the overall organization of the standard. Block sizes vary with the repertoire. A block can be as small as 16 code points (Syriac Supplement, U+0860–U+086F) or fill 256 positions (a 16×16 grid, as with Cyrillic at U+0400–U+04FF). Larger ranges exist where content is dense, such as the 20,992 code points of the CJK Unified Ideographs block (U+4E00–U+9FFF) for East Asian logographic scripts. The surrogate range U+D800–U+DFFF (2,048 code points) is reserved for the UTF-16 encoding form and never assigned to characters, although it is covered by the High Surrogates, High Private Use Surrogates, and Low Surrogates blocks.
Additionally, 66 noncharacter code points are permanently reserved for internal use and never assigned to characters: U+FDD0–U+FDEF and the last two code points of every plane, such as U+FFFE–U+FFFF. The codespace is divided into 17 planes of 65,536 code points each, which shapes block placement. Plane 0, the Basic Multilingual Plane (BMP, U+0000–U+FFFF), hosts blocks for widely used modern scripts like Latin, Greek, and Cyrillic, which aids compatibility with legacy systems. Planes 1 through 16 are supplementary. Plane 1 (Supplementary Multilingual Plane, U+10000–U+1FFFF) holds historic scripts and symbols, Planes 2 and 3 hold additional CJK ideographs, Plane 14 holds special-purpose characters, and Planes 15–16 are reserved for private use. This plane-based organization guides block allocation, prioritizing the BMP for accessibility while reserving higher planes for less common or emerging content. The Block property, part of the Unicode Character Database, normatively identifies the block containing any given code point, aiding implementation and analysis.
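The exclusions described above follow simple arithmetic rules. In particular, the plane-final noncharacters are exactly the code points whose low 16 bits are FFFE or FFFF. This Java sketch encodes those rules. The class and method names are hypothetical, and "other" lumps together assigned, unassigned, and private-use code points.

```java
/** Sketch: flags code points that are never assigned to ordinary characters. */
public class CodeSpaceExclusions {

    /** U+D800..U+DFFF: 2,048 code points reserved for the UTF-16 surrogate mechanism. */
    static boolean isSurrogateCodePoint(int cp) {
        return cp >= 0xD800 && cp <= 0xDFFF;
    }

    /** The 66 noncharacters: U+FDD0..U+FDEF plus the last two code points of each plane. */
    static boolean isNoncharacter(int cp) {
        if (cp < 0 || cp > 0x10FFFF) return false;
        boolean bmpRange = cp >= 0xFDD0 && cp <= 0xFDEF;  // 32 code points
        boolean planeEnd = (cp & 0xFFFE) == 0xFFFE;         // nFFFE or nFFFF: 2 per plane x 17 planes
        return bmpRange || planeEnd;
    }

    static String classify(int cp) {
        if (cp < 0 || cp > 0x10FFFF) return "outside codespace";
        if (isSurrogateCodePoint(cp)) return "surrogate";
        if (isNoncharacter(cp)) return "noncharacter";
        return "other";  // assigned, unassigned, or private use
    }

    public static void main(String[] args) {
        int[] samples = {0x0041, 0xD800, 0xFDD0, 0xFFFF, 0x1FFFE, 0x10FFFF, 0x110000};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %s%n", cp, classify(cp));
        }
        int count = 0;
        for (int cp = 0; cp <= 0x10FFFF; cp++) {
            if (isNoncharacter(cp)) count++;
        }
        System.out.println("noncharacters: " + count);  // 32 + 34 = 66
    }
}
```

The bit mask avoids listing all 34 plane-final code points explicitly. The final loop confirms that the two rules together yield 66 noncharacters.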

Block metadata

The Block property is a catalog property in the Unicode Character Database (UCD) that assigns a string value to each code point, identifying the named block to which it belongs, such as "Basic Latin" for code points U+0000 to U+007F. This property supports organizing and querying characters by contiguous range, independent of other properties like the General Category. Block metadata is documented in the Blocks.txt file, which lists each block's hexadecimal range and official name in the form "start..end; Block Name". For instance, the Basic Latin block appears as "0000..007F; Basic Latin". Code points outside every listed range have the default value "No_Block", while unassigned code points inside a range still belong to that block. Because software, fonts, and data processing rely on block assignments, block ranges and names change only rarely. The few changes that have occurred are preserved through aliases. For example, the block at U+0370–U+03FF, originally named "Greek", is now "Greek and Coptic", and PropertyValueAliases.txt keeps "Greek" as an alias. Block names are descriptive, typically based on the primary script, symbol set, or functional category they represent, such as "Ethiopic" for the Ge'ez script or "Mathematical Operators" for mathematical symbols. Names are unique and written in ASCII with spaces and hyphens for readability. They are compared using loose matching that ignores case, whitespace, hyphens, and underscores, and PropertyValueAliases.txt also provides short aliases, such as "ASCII" for Basic_Latin. Blocks.txt does not record when a block was introduced.
Instead, the Age property in DerivedAge.txt records, for each code point, the Unicode version in which it was assigned (for example, "7.0" for characters added in Unicode 7.0). This lets developers track the evolution of the encoding space without affecting the stability of existing assignments.
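A minimal Java sketch of reading Blocks.txt-style data and applying loose name matching follows. It assumes Java 16 or later (for records) and well-formed "start..end; Name" lines, and it embeds a three-line excerpt instead of downloading the real file. The class, record, and method names are hypothetical. For real lookups, the JDK's own `Character.UnicodeBlock.of` and `Character.UnicodeBlock.forName` cover the blocks known to the installed JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Minimal sketch of a Blocks.txt-style parser with loose name matching. */
public class BlocksTxt {

    record Block(int start, int end, String name) {}

    /** Parses lines of the form "0000..007F; Basic Latin", skipping comments and blank lines. */
    static List<Block> parse(List<String> lines) {
        List<Block> blocks = new ArrayList<>();
        for (String line : lines) {
            int hash = line.indexOf('#');
            String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (data.isEmpty()) continue;                 // comment-only or blank line
            String[] fields = data.split(";", 2);         // assumes a well-formed line
            String[] range = fields[0].trim().split("\\.\\.");
            blocks.add(new Block(Integer.parseInt(range[0], 16),
                                 Integer.parseInt(range[1], 16),
                                 fields[1].trim()));
        }
        return blocks;
    }

    /** Returns the block name for cp, or "No_Block" if no range contains it (linear scan for brevity). */
    static String blockOf(List<Block> blocks, int cp) {
        for (Block b : blocks) {
            if (cp >= b.start() && cp <= b.end()) return b.name();
        }
        return "No_Block";
    }

    /** Loose matching key: ignore case, whitespace, hyphens, and underscores. */
    static String looseKey(String name) {
        return name.replaceAll("[\\s_-]", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        List<String> excerpt = List.of(
                "# excerpt in Blocks.txt format, not the full file",
                "0000..007F; Basic Latin",
                "0370..03FF; Greek and Coptic",
                "4E00..9FFF; CJK Unified Ideographs");
        List<Block> blocks = parse(excerpt);
        System.out.println(blockOf(blocks, 0x03B1));  // Greek and Coptic
        System.out.println(blockOf(blocks, 0x2FE0));  // No_Block
        System.out.println(looseKey("Greek_And_Coptic").equals(looseKey("greek and coptic")));  // true
    }
}
```

A production lookup would sort the ranges and use binary search. The linear scan keeps the sketch short.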

Classifications and relations

Connection to scripts and categories

Unicode blocks serve as organizational ranges of code points within the Unicode standard, but they are distinct from character properties such as the Script property and the General Category property, which are assigned to each character individually rather than to entire blocks. This independence lets blocks provide a coarse grouping of code points, while scripts and general categories enable the finer-grained semantic analysis that text processing requires. For instance, the Script property identifies the writing system associated with a character, such as "Latin" or "Greek", and is defined in the Scripts.txt data file. In contrast, blocks like Basic Latin (U+0000–U+007F) are ranges specified in Blocks.txt, without inherent ties to these properties. Scripts often span multiple blocks, which shows how independent the two properties are. The Latin script, for example, extends across several blocks, including Basic Latin, Latin-1 Supplement (U+0080–U+00FF), and Latin Extended-A (U+0100–U+017F), which together cover characters used in many Latin-based languages. The Greek script aligns more closely with its block: most Greek letters sit in the Greek and Coptic block (U+0370–U+03FF). Polytonic forms, however, are in Greek Extended (U+1F00–U+1FFF), and the Greek and Coptic block itself also contains Coptic letters and characters with the Common script value. Blocks dedicated to symbols and punctuation can mix scripts; CJK Symbols and Punctuation (U+3000–U+303F), for example, combines Common punctuation with Han characters such as U+3007 IDEOGRAPHIC NUMBER ZERO. Conversely, every character in the CJK Unified Ideographs block (U+4E00–U+9FFF) has the Script value Han, even though the ideographs are shared by Chinese, Japanese, and Korean writing. The General Category property further underscores this per-character granularity. It classifies individual code points into values such as "Lu" (uppercase letter), "Ll" (lowercase letter), "Sm" (math symbol), or "So" (other symbol), as listed in the UnicodeData.txt file.
Blocks do not impose uniform categories, and symbol-related blocks like Miscellaneous Symbols (U+2600–U+26FF) mix them. That block contains mostly "So" pictographs but also "Sm" characters such as U+266F MUSIC SHARP SIGN, reflecting diverse semantic roles within a single range. This mixing rules out block-wide generalizations and calls for character-level evaluation. In practice, blocks help with broad navigation and allocation in implementations, while character-level properties support precise operations in Unicode algorithms. The Unicode Normalization Forms, for example, rely on each character's decomposition mapping and canonical combining class, independent of block structure. Collation, as defined by the Default Unicode Collation Element Table (DUCET), assigns weights to individual characters, largely grouping them by script, without reference to blocks. These properties thus complement blocks by enabling semantic processing in applications like text rendering and search.
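The JDK exposes all three properties for a code point, which makes their independence easy to see. This sketch assumes Java 14 or later (for switch expressions). Its output reflects the Unicode version bundled with the JDK. The `category` helper is hypothetical and maps only the few General_Category values used here to their short aliases.

```java
/** Sketch: block, script, and general category are separate per-code-point properties. */
public class PropertyComparison {

    /** Maps the General_Category values used below to their short aliases. */
    static String category(int cp) {
        int type = Character.getType(cp);
        return switch (type) {
            case Character.UPPERCASE_LETTER -> "Lu";
            case Character.LOWERCASE_LETTER -> "Ll";
            case Character.LETTER_NUMBER -> "Nl";
            case Character.MATH_SYMBOL -> "Sm";
            case Character.OTHER_SYMBOL -> "So";
            default -> "other (" + type + ")";
        };
    }

    static String describe(int cp) {
        return String.format("U+%04X block=%s script=%s gc=%s",
                cp, Character.UnicodeBlock.of(cp), Character.UnicodeScript.of(cp), category(cp));
    }

    public static void main(String[] args) {
        int[] samples = {
            0x0041,  // LATIN CAPITAL LETTER A
            0x03E3,  // COPTIC SMALL LETTER SHEI, stored in the Greek and Coptic block
            0x1F00,  // a Greek letter stored in the Greek Extended block
            0x3007,  // IDEOGRAPHIC NUMBER ZERO, Han script in CJK Symbols and Punctuation
            0x266F,  // MUSIC SHARP SIGN, a math symbol (Sm) in Miscellaneous Symbols
            0x2600   // BLACK SUN WITH RAYS, an other symbol (So) in the same block
        };
        for (int cp : samples) {
            System.out.println(describe(cp));
        }
    }
}
```

U+266F and U+2600 share a block but differ in General Category. U+03E3 belongs to a block named for Greek while its script is Coptic.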

Organization across planes

The Unicode Standard organizes its code points into 17 planes, each comprising 65,536 consecutive positions, to provide a structured framework for character allocation. Plane 0, known as the Basic Multilingual Plane (BMP) and spanning U+0000 to U+FFFF, serves as the foundational layer. It hosts over 160 blocks covering legacy encodings and the widely used scripts essential for everyday text processing. These blocks include core sets like Basic Latin (U+0000–U+007F) for English and Western European languages, CJK Unified Ideographs (U+4E00–U+9FFF) for Chinese hanzi, Japanese kanji, and Korean hanja, and Hangul Syllables (U+AC00–U+D7AF) for Korean. Together they accommodate approximately 63,951 assigned characters that form the bulk of common global communication needs. Plane 1, the Supplementary Multilingual Plane (SMP) from U+10000 to U+1FFFF, extends this structure with well over a hundred blocks for historic, rare, and supplementary scripts as well as symbols and emoji, enabling representation of less frequently used writing systems. Notable examples include Egyptian Hieroglyphs (U+13000–U+1342F) for ancient Egyptian texts and Linear B Syllabary (U+10000–U+1007F) for Mycenaean Greek inscriptions. These blocks support scholarly and cultural preservation without crowding the BMP, which remains focused on modern usage. Blocks in this plane are often grouped thematically by script type, such as ancient scripts or minority languages, to facilitate efficient implementation in software. Higher planes address specialized requirements beyond primary scripts. Plane 2, the Supplementary Ideographic Plane (SIP), covering U+20000 to U+2FFFF, contains 7 blocks, primarily CJK ideograph extensions such as CJK Unified Ideographs Extension B (U+20000–U+2A6DF), which greatly expand East Asian character repertoires with rare and historical forms. Plane 3, the Tertiary Ideographic Plane (TIP), from U+30000 to U+3FFFF, holds 3 blocks for further CJK extensions, including CJK Unified Ideographs Extension G (U+30000–U+3134F).
Together, Planes 2 and 3 devote substantial code space to ideographs, with large blocks reflecting the scale of these repertoires. Planes 4 through 13 contain no blocks and are reserved for future expansion, ensuring long-term scalability of the encoding model. Plane 14, the Supplementary Special-purpose Plane (U+E0000–U+EFFFF), has 2 blocks. Tags (U+E0000–U+E007F) was introduced for language tagging and is now used mainly in emoji tag sequences, and Variation Selectors Supplement (U+E0100–U+E01EF) is used to select glyph variants. Planes 15 and 16 (U+F0000–U+10FFFF) are designated for private use, each filled by a single block: Supplementary Private Use Area-A (U+F0000–U+FFFFF) and Supplementary Private Use Area-B (U+100000–U+10FFFF). These let vendors and users define custom characters without conflicting with standardized assignments. Overall, all but 14 of Unicode 17.0's 346 blocks reside in Planes 0 and 1, concentrating most script diversity there while leaving room across the planes for future growth. This distribution balances immediate compatibility with forward-looking extensibility, and thematic groupings within planes help developers handle diverse linguistic data.
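Because each plane holds exactly 0x10000 code points, the plane number is simply the code point shifted right by 16 bits. This Java sketch (Java 14 or later, for switch expressions) pairs that calculation with the plane names used above. The class and method names are hypothetical, and the labels reflect the situation described for Unicode 17.0.

```java
/** Sketch: derives the plane of a code point and a descriptive label. */
public class Planes {

    static int planeOf(int cp) {
        if (cp < 0 || cp > 0x10FFFF) {
            throw new IllegalArgumentException("not a code point: " + cp);
        }
        return cp >>> 16;  // each plane holds 0x10000 = 65,536 code points
    }

    static String planeLabel(int plane) {
        return switch (plane) {
            case 0 -> "Basic Multilingual Plane (BMP)";
            case 1 -> "Supplementary Multilingual Plane (SMP)";
            case 2 -> "Supplementary Ideographic Plane (SIP)";
            case 3 -> "Tertiary Ideographic Plane (TIP)";
            case 14 -> "Supplementary Special-purpose Plane (SSP)";
            case 15 -> "Supplementary Private Use Area-A";
            case 16 -> "Supplementary Private Use Area-B";
            default -> "unassigned plane (no blocks)";
        };
    }

    public static void main(String[] args) {
        int[] samples = {0x0041, 0x13000, 0x20000, 0x30000, 0x50000, 0xE0001, 0x10FFFD};
        for (int cp : samples) {
            int p = planeOf(cp);
            System.out.printf("U+%04X plane %d: %s%n", cp, p, planeLabel(p));
        }
    }
}
```

For example, U+13000 (Egyptian Hieroglyphs) falls in plane 1, and U+50000 falls in plane 5, which has no blocks.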

Implementation

Software handling

Software systems query the Block property of a code point to determine its grouping for processing purposes. In Java, the Character.UnicodeBlock class provides the static methods of(char c) and of(int codePoint). They return the block containing a given character or code point, or null if it belongs to no block, and the block constants follow the Unicode standard's Blocks.txt file. Python's standard unicodedata module, by contrast, does not expose the Block property, so Python programs typically rely on third-party libraries or parse Blocks.txt directly. In text processing applications, Unicode blocks facilitate filtering and validation tasks, such as restricting input to specific scripts for localization. For instance, developers can filter text to include only characters from selected blocks, ensuring compatibility in multilingual interfaces or validation routines. This approach is particularly useful in corpus-processing pipelines, where block-based scoring helps filter corpora by excluding unwanted scripts or symbols. Blocks can also help reduce data size in storage and transmission. In font subsetting, tools can generate smaller font files containing only glyphs for relevant code point ranges, such as Basic Latin for English text, which improves load times and memory usage in web and application contexts. In databases, block-aware indexing can support efficient querying of internationalized text. A key challenge in software handling of Unicode blocks is managing unassigned code points, which are reserved within blocks for future allocation. Implementations must give these code points default property values. These include the General_Category value Cn (Unassigned) and range-specific defaults for properties like Bidi_Class and Line_Break, so that algorithms like bidirectional reordering or line breaking behave predictably as Unicode evolves.
The Blocks.txt file from the Unicode Character Database is the authoritative source for these ranges and guides software updates as new assignments are made.
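The filtering and unassigned-code-point issues described above can be sketched with the JDK's own `Character.UnicodeBlock.of` and `Character.getType`. This is a minimal illustration, not a validation library. The allow-list and helper names are hypothetical, and the results depend on the Unicode version bundled with the JDK. Note that `of` returns null for No_Block code points, and immutable sets from `Set.of` reject `contains(null)`, so the null check comes first.

```java
import java.util.Set;

/** Sketch: keep only code points from an allow-list of blocks, and detect unassigned code points. */
public class BlockFilter {

    static String keepBlocks(String text, Set<Character.UnicodeBlock> allowed) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            Character.UnicodeBlock block = Character.UnicodeBlock.of(cp);  // null means No_Block
            if (block != null && allowed.contains(block)) {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    /** True if the JDK's Unicode data gives cp the General_Category Cn (Unassigned). */
    static boolean isUnassigned(int cp) {
        return Character.getType(cp) == Character.UNASSIGNED;
    }

    public static void main(String[] args) {
        Set<Character.UnicodeBlock> latin = Set.of(
                Character.UnicodeBlock.BASIC_LATIN,
                Character.UnicodeBlock.LATIN_1_SUPPLEMENT);
        // Greek letters are dropped; ASCII and Latin-1 characters (including spaces) are kept.
        System.out.println(keepBlocks("Caf\u00E9 \u03B1\u03B2\u03B3 ok", latin));

        // U+0378 is unassigned yet still inside the Greek and Coptic block.
        System.out.println(Character.UnicodeBlock.of(0x0378) + " unassigned=" + isUnassigned(0x0378));
        // U+50000 is in Plane 5, which has no blocks, so of() returns null.
        System.out.println(Character.UnicodeBlock.of(0x50000) + " unassigned=" + isUnassigned(0x50000));
    }
}
```

Keeping Latin-1 Supplement in the allow-list preserves accented letters such as é. Filtering on Basic Latin alone would strip them.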

Font and rendering support

Font formats such as TrueType and OpenType rely on the 'cmap' table to map code points to glyph indices, using subtable formats that store mappings as contiguous ranges of code points for efficient lookup. Format 13 subtables provide many-to-one range mappings, in which every code point in a range, such as an entire block, maps to a single glyph. They are commonly used in last-resort fonts to give unsupported ranges a basic visual representation. Rendering pipelines in modern systems use Unicode properties, particularly the Script property, for script detection and text shaping. HarfBuzz, an open-source shaping engine, uses the Unicode Script property to itemize text by script and apply OpenType features for complex layouts. It resolves characters with the 'COMMON' script value heuristically for accurate glyph positioning and ligature formation. Blocks help group code points, but script assignment comes from Unicode data files. Similarly, Apple's Core Text framework uses Unicode properties, particularly the Script property, for script-specific shaping, integrating with font tables for bidirectional text and contextual forms. Font coverage for Unicode blocks is usually partial. A single OpenType font can hold at most 65,535 glyphs, far fewer than the characters spread across the 346 blocks of Unicode 17.0. The Noto font family, developed by Google, addresses this with a modular design of separate fonts per script, providing glyphs for characters in over 150 blocks and covering more than 77,000 characters while prioritizing commonly used scripts. For rare or specialized blocks, such as historical scripts or symbols, rendering engines implement fallback strategies, substituting glyphs from secondary fonts to avoid blank boxes ("tofu"). Tools for debugging font and rendering support include Unicode block viewers integrated into development environments.
In some code editors, extensions such as Unicode Code Point show details for the character under the cursor, including its block, in the status bar, which helps identify missing glyphs and test rendering across fonts.
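A common first step in choosing fonts or font subsets is to tally which blocks a piece of text actually uses. This Java sketch does that with `Character.UnicodeBlock.of`. It is an illustration rather than a subsetting tool. The class and method names are hypothetical, and the "No_Block" label for unknown code points is a convention chosen here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: tally code points per block to see which block ranges a font or subset must cover. */
public class BlockCoverage {

    static Map<String, Integer> blocksUsed(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();  // keeps first-seen order
        text.codePoints().forEach(cp -> {
            Character.UnicodeBlock b = Character.UnicodeBlock.of(cp);
            String key = (b == null) ? "No_Block" : b.toString();
            counts.merge(key, 1, Integer::sum);
        });
        return counts;
    }

    public static void main(String[] args) {
        // Latin letters and spaces, two CJK ideographs, and the emoji U+1F600 (a surrogate pair).
        String sample = "Hi \u4F60\u597D \uD83D\uDE00";
        blocksUsed(sample).forEach((block, n) -> System.out.println(block + ": " + n));
    }
}
```

For this sample, the tally is BASIC_LATIN: 4, CJK_UNIFIED_IDEOGRAPHS: 2, and EMOTICONS: 1. A font stack for the text would therefore need coverage of those three blocks, or a fallback for them.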

Current and historical blocks

List of blocks in Unicode 17.0

Unicode 17.0, released on September 9, 2025, defines 346 contiguous blocks within the Unicode codespace of 1,114,112 possible code points, each dedicated to characters from specific scripts, symbol categories, or reserved areas. These blocks facilitate implementation in software and fonts by grouping related characters. Their sizes range from 16 code points (e.g., for small supplements) to 65,536 (full planes for private use). Blocks are found in Planes 0–3 and 14–16, overwhelmingly in Planes 0 and 1, and cover modern and historical scripts, emoji, and technical symbols. This version adds 4,803 new characters for a total of 159,801 encoded characters. It introduces eight new blocks, including four for newly encoded scripts: Sidetic (U+10940–U+1095F, 32 code points, Plane 1, Sidetic script), Tolong Siki (U+11DB0–U+11DEF, 64 code points, Plane 1, Tolong Siki script), Beria Erfe (U+16EA0–U+16EDF, 64 code points, Plane 1, Beria Erfe script), and Tai Yo (U+1E6C0–U+1E6FF, 64 code points, Plane 1, Tai Yo script). The other new blocks are Sharada Supplement (U+11B60–U+11B7F, 32 code points, Plane 1, Sharada), Tangut Components Supplement (U+18D80–U+18DFF, 128 code points, Plane 1, Tangut), Miscellaneous Symbols Supplement (U+1CEC0–U+1CEFF, 64 code points, Plane 1, symbols including asteroids), and CJK Unified Ideographs Extension J (U+323B0–U+3347F, 4,304 code points, Plane 3, CJK). Existing blocks also received new characters in previously unassigned positions, including new emoji for Emoji 17.0. Key examples illustrate the diversity. The Basic Latin block (U+0000..U+007F, 128 code points, Plane 0) contains the ASCII repertoire, including the basic Latin alphabet. The CJK Unified Ideographs block (U+4E00..U+9FFF, 20,992 code points, Plane 0) encodes ideographs shared by Chinese, Japanese, and Korean. Emoji-bearing blocks like Miscellaneous Symbols and Pictographs (U+1F300..U+1F5FF, 768 code points, Plane 1) contain hundreds of pictographic symbols.
Within blocks, gaps exist where code points remain unassigned, as recorded in the official data files, and a few blocks also contain noncharacters. The table below lists blocks in ascending code point order. The "Primary Script/Association" column indicates the main category, such as a script name or symbol type; many blocks serve multiple uses, but only the primary one is noted. The table includes selected blocks for brevity; the full list of 346 blocks is available in the official Unicode Character Database.
| Block Name | Hex Range | Size (code points) | Plane | Primary Script/Association |
|---|---|---|---|---|
| Basic Latin | U+0000..U+007F | 128 | 0 | Latin |
| Latin-1 Supplement | U+0080..U+00FF | 128 | 0 | Latin |
| Latin Extended-A | U+0100..U+017F | 128 | 0 | Latin |
| Latin Extended-B | U+0180..U+024F | 208 | 0 | Latin |
| IPA Extensions | U+0250..U+02AF | 96 | 0 | Phonetic (IPA) |
| Spacing Modifier Letters | U+02B0..U+02FF | 80 | 0 | Modifier letters |
| Combining Diacritical Marks | U+0300..U+036F | 112 | 0 | Diacritics |
| Greek and Coptic | U+0370..U+03FF | 144 | 0 | Greek |
| Cyrillic | U+0400..U+04FF | 256 | 0 | Cyrillic |
| Cyrillic Supplement | U+0500..U+052F | 48 | 0 | Cyrillic |
| Armenian | U+0530..U+058F | 96 | 0 | Armenian |
| Hebrew | U+0590..U+05FF | 112 | 0 | Hebrew |
| Arabic | U+0600..U+06FF | 256 | 0 | Arabic |
| Syriac | U+0700..U+074F | 80 | 0 | Syriac |
| Arabic Supplement | U+0750..U+077F | 48 | 0 | Arabic |
| Thaana | U+0780..U+07BF | 64 | 0 | Thaana |
| NKo | U+07C0..U+07FF | 64 | 0 | N'Ko |
| Samaritan | U+0800..U+083F | 64 | 0 | Samaritan |
| Mandaic | U+0840..U+085F | 32 | 0 | Mandaic |
| Syriac Supplement | U+0860..U+086F | 16 | 0 | Syriac |
| Arabic Extended-B | U+0870..U+089F | 48 | 0 | Arabic |
| Arabic Extended-A | U+08A0..U+08FF | 96 | 0 | Arabic |
| Devanagari | U+0900..U+097F | 128 | 0 | Devanagari |
| Bengali | U+0980..U+09FF | 128 | 0 | Bengali |
| Gurmukhi | U+0A00..U+0A7F | 128 | 0 | Gurmukhi |
| Gujarati | U+0A80..U+0AFF | 128 | 0 | Gujarati |
| Oriya | U+0B00..U+0B7F | 128 | 0 | Odia |
| Tamil | U+0B80..U+0BFF | 128 | 0 | Tamil |
| Telugu | U+0C00..U+0C7F | 128 | 0 | Telugu |
| Kannada | U+0C80..U+0CFF | 128 | 0 | Kannada |
| Malayalam | U+0D00..U+0D7F | 128 | 0 | Malayalam |
| Sinhala | U+0D80..U+0DFF | 128 | 0 | Sinhala |
| Thai | U+0E00..U+0E7F | 128 | 0 | Thai |
| Lao | U+0E80..U+0EFF | 128 | 0 | Lao |
| Tibetan | U+0F00..U+0FFF | 256 | 0 | Tibetan |
| Myanmar | U+1000..U+109F | 160 | 0 | Myanmar |
| Georgian | U+10A0..U+10FF | 96 | 0 | Georgian |
| Hangul Jamo | U+1100..U+11FF | 256 | 0 | Hangul |
| Ethiopic | U+1200..U+137F | 384 | 0 | Ethiopic |
| Ethiopic Supplement | U+1380..U+139F | 32 | 0 | Ethiopic |
| Cherokee | U+13A0..U+13FF | 96 | 0 | Cherokee |
| Unified Canadian Aboriginal Syllabics | U+1400..U+167F | 640 | 0 | Canadian Aboriginal |
| Ogham | U+1680..U+169F | 32 | 0 | Ogham |
| Runic | U+16A0..U+16FF | 96 | 0 | Runic |
| Tagalog | U+1700..U+171F | 32 | 0 | Tagalog |
| Hanunoo | U+1720..U+173F | 32 | 0 | Hanunoo |
| Buhid | U+1740..U+175F | 32 | 0 | Buhid |
| Tagbanwa | U+1760..U+177F | 32 | 0 | Tagbanwa |
| Khmer | U+1780..U+17FF | 128 | 0 | Khmer |
| Mongolian | U+1800..U+18AF | 176 | 0 | Mongolian |
| Unified Canadian Aboriginal Syllabics Extended | U+18B0..U+18FF | 80 | 0 | Canadian Aboriginal |
| Limbu | U+1900..U+194F | 80 | 0 | Limbu |
| Tai Le | U+1950..U+197F | 48 | 0 | Tai Le |
| New Tai Lue | U+1980..U+19DF | 96 | 0 | New Tai Lue |
| Khmer Symbols | U+19E0..U+19FF | 32 | 0 | Khmer |
| Buginese | U+1A00..U+1A1F | 32 | 0 | Buginese |
| Tai Tham | U+1A20..U+1AAF | 144 | 0 | Tai Tham |
| Combining Diacritical Marks Extended | U+1AB0..U+1AFF | 80 | 0 | Diacritics |
| Balinese | U+1B00..U+1B7F | 128 | 0 | Balinese |
| Sundanese | U+1B80..U+1BBF | 64 | 0 | Sundanese |
| Batak | U+1BC0..U+1BFF | 64 | 0 | Batak |
| Lepcha | U+1C00..U+1C4F | 80 | 0 | Lepcha |
| Ol Chiki | U+1C50..U+1C7F | 48 | 0 | Ol Chiki |
| Cyrillic Extended-C | U+1C80..U+1C8F | 16 | 0 | Cyrillic |
| Georgian Extended | U+1C90..U+1CBF | 48 | 0 | Georgian |
| Sundanese Supplement | U+1CC0..U+1CCF | 16 | 0 | Sundanese |
| Vedic Extensions | U+1CD0..U+1CFF | 48 | 0 | Devanagari (Vedic) |
| Phonetic Extensions | U+1D00..U+1D7F | 128 | 0 | Phonetic |
| Phonetic Extensions Supplement | U+1D80..U+1DBF | 64 | 0 | Phonetic |
| Combining Diacritical Marks Supplement | U+1DC0..U+1DFF | 64 | 0 | Diacritics |
| Latin Extended Additional | U+1E00..U+1EFF | 256 | 0 | Latin |
| Greek Extended | U+1F00..U+1FFF | 256 | 0 | Greek |
| General Punctuation | U+2000..U+206F | 112 | 0 | Punctuation |
| Superscripts and Subscripts | U+2070..U+209F | 48 | 0 | Symbols |
| Currency Symbols | U+20A0..U+20CF | 48 | 0 | Currency |
| Combining Diacritical Marks for Symbols | U+20D0..U+20FF | 48 | 0 | Diacritics |
| Letterlike Symbols | U+2100..U+214F | 80 | 0 | Symbols |
| Number Forms | U+2150..U+218F | 64 | 0 | Numbers |
| Arrows | U+2190..U+21FF | 112 | 0 | Arrows |
| Mathematical Operators | U+2200..U+22FF | 256 | 0 | Math |
| Miscellaneous Technical | U+2300..U+23FF | 256 | 0 | Technical |
| Control Pictures | U+2400..U+243F | 64 | 0 | Controls |
| Optical Character Recognition | U+2440..U+245F | 32 | 0 | OCR |
| Enclosed Alphanumerics | U+2460..U+24FF | 160 | 0 | Enclosed |
| Box Drawing | U+2500..U+257F | 128 | 0 | Box drawing |
| Block Elements | U+2580..U+259F | 32 | 0 | Blocks |
| Geometric Shapes | U+25A0..U+25FF | 96 | 0 | Geometric |
| Miscellaneous Symbols | U+2600..U+26FF | 256 | 0 | Symbols |
| Dingbats | U+2700..U+27BF | 192 | 0 | Dingbats |
| Miscellaneous Mathematical Symbols-A | U+27C0..U+27EF | 48 | 0 | Math |
| Supplemental Arrows-A | U+27F0..U+27FF | 16 | 0 | Arrows |
| Braille Patterns | U+2800..U+28FF | 256 | 0 | Braille |
| Supplemental Arrows-B | U+2900..U+297F | 128 | 0 | Arrows |
| Miscellaneous Mathematical Symbols-B | U+2980..U+29FF | 128 | 0 | Math |
| Supplemental Mathematical Operators | U+2A00..U+2AFF | 256 | 0 | Math |
| Miscellaneous Symbols and Arrows | U+2B00..U+2BFF | 256 | 0 | Symbols/Arrows |
| Glagolitic | U+2C00..U+2C5F | 96 | 0 | Glagolitic |
| Latin Extended-C | U+2C60..U+2C7F | 32 | 0 | Latin |
| Coptic | U+2C80..U+2CFF | 128 | 0 | Coptic |
| Georgian Supplement | U+2D00..U+2D2F | 48 | 0 | Georgian |
| Tifinagh | U+2D30..U+2D7F | 80 | 0 | Tifinagh |
| Ethiopic Extended | U+2D80..U+2DDF | 96 | 0 | Ethiopic |
| Cyrillic Extended-A | U+2DE0..U+2DFF | 32 | 0 | Cyrillic |
| Supplemental Punctuation | U+2E00..U+2E7F | 128 | 0 | Punctuation |
| CJK Radicals Supplement | U+2E80..U+2EFF | 128 | 0 | CJK |
| Kangxi Radicals | U+2F00..U+2FDF | 224 | 0 | CJK |
| Ideographic Description Characters | U+2FF0..U+2FFF | 16 | 0 | CJK |
| CJK Symbols and Punctuation | U+3000..U+303F | 64 | 0 | CJK |
| Hiragana | U+3040..U+309F | 96 | 0 | Japanese (Hiragana) |
| Katakana | U+30A0..U+30FF | 96 | 0 | Japanese (Katakana) |
| Bopomofo | U+3100..U+312F | 48 | 0 | Bopomofo |
| Hangul Compatibility Jamo | U+3130..U+318F | 96 | 0 | Hangul |
| Kanbun | U+3190..U+319F | 16 | 0 | Kanbun |
| Bopomofo Extended | U+31A0..U+31BF | 32 | 0 | Bopomofo |
| CJK Strokes | U+31C0..U+31EF | 48 | 0 | CJK |
| Katakana Phonetic Extensions | U+31F0..U+31FF | 16 | 0 | Katakana |
| Enclosed CJK Letters and Months | U+3200..U+32FF | 256 | 0 | Enclosed CJK |
| CJK Compatibility | U+3300..U+33FF | 256 | 0 | CJK compatibility |
| CJK Unified Ideographs Extension A | U+3400..U+4DBF | 6,592 | 0 | CJK |
| Yijing Hexagram Symbols | U+4DC0..U+4DFF | 64 | 0 | Yijing |
| CJK Unified Ideographs | U+4E00..U+9FFF | 20,992 | 0 | CJK |
| Yi Syllables | U+A000..U+A48F | 1,168 | 0 | Yi |
| Yi Radicals | U+A490..U+A4CF | 64 | 0 | Yi |
| Lisu | U+A4D0..U+A4FF | 48 | 0 | Lisu |
| Vai | U+A500..U+A63F | 320 | 0 | Vai |
| Cyrillic Extended-B | U+A640..U+A69F | 96 | 0 | Cyrillic |
| Bamum | U+A6A0..U+A6FF | 96 | 0 | Bamum |
| Modifier Tone Letters | U+A700..U+A71F | 32 | 0 | Phonetic |
| Latin Extended-D | U+A720..U+A7FF | 224 | 0 | Latin |
| Syloti Nagri | U+A800..U+A82F | 48 | 0 | Syloti Nagri |
| Common Indic Number Forms | U+A830..U+A83F | 16 | 0 | Indic numbers |
| Phags-pa | U+A840..U+A87F | 64 | 0 | Phags-pa |
| Saurashtra | U+A880..U+A8DF | 96 | 0 | Saurashtra |
| Devanagari Extended | U+A8E0..U+A8FF | 32 | 0 | Devanagari |
| Kayah Li | U+A900..U+A92F | 48 | 0 | Kayah Li |
| Rejang | U+A930..U+A95F | 48 | 0 | Rejang |
| Hangul Jamo Extended-A | U+A960..U+A97F | 32 | 0 | Hangul |
| Javanese | U+A980..U+A9DF | 96 | 0 | Javanese |
| Myanmar Extended-B | U+A9E0..U+A9FF | 32 | 0 | Myanmar |
| Cham | U+AA00..U+AA5F | 96 | 0 | Cham |
| Myanmar Extended-A | U+AA60..U+AA7F | 32 | 0 | Myanmar |
| Tai Viet | U+AA80..U+AADF | 96 | 0 | Tai Viet |
| Meetei Mayek Extensions | U+AAE0..U+AAFF | 32 | 0 | Meetei Mayek |
| Ethiopic Extended-A | U+AB00..U+AB2F | 48 | 0 | Ethiopic |
| Latin Extended-E | U+AB30..U+AB6F | 64 | 0 | Latin |
Meetei MayekU+ABC0..U+ABFF640Meetei Mayek
Hangul SyllablesU+AC00..U+D7AF11,1720Hangul
Hangul Jamo Extended-BU+D7B0..U+D7FF800Hangul
High SurrogatesU+D800..U+DB7F20480Surrogates
High Private Use SurrogatesU+DB80..U+DBFF1280Surrogates
Low SurrogatesU+DC00..U+DFFF20480Surrogates
Private Use AreaU+E000..U+F8FF6,4000Private use
CJK Compatibility IdeographsU+F900..U+FAFF2560CJK compatibility
Alphabetic Presentation FormsU+FB00..U+FB4F800Presentation forms
Arabic Presentation Forms-AU+FB50..U+FDFF6880Arabic
Variation SelectorsU+FE00..U+FE0F160Variation selectors
Vertical FormsU+FE10..U+FE1F160Vertical
Combining Half MarksU+FE20..U+FE2F160Half marks
CJK Compatibility FormsU+FE30..U+FE4F320CJK compatibility
Small Form VariantsU+FE50..U+FE6F320Variants
Arabic Presentation Forms-BU+FE70..U+FEFF1440Arabic
Halfwidth and Fullwidth FormsU+FF00..U+FFEF2400Halfwidth/fullwidth
SpecialsU+FFF0..U+FFFF160Special
Linear B Syllabary | U+10000..U+1007F | 128 | 1 | Linear B
Linear B Ideograms | U+10080..U+100FF | 128 | 1 | Linear B
Aegean Numbers | U+10100..U+1013F | 64 | 1 | Aegean
Ancient Greek Numbers | U+10140..U+1018F | 80 | 1 | Greek
Ancient Symbols | U+10190..U+101CF | 64 | 1 | Ancient
Phaistos Disc | U+101D0..U+101FF | 48 | 1 | Phaistos
Lycian | U+10280..U+1029F | 32 | 1 | Lycian
Carian | U+102A0..U+102DF | 64 | 1 | Carian
Coptic Epact Numbers | U+102E0..U+102FF | 32 | 1 | Coptic
Old Italic | U+10300..U+1032F | 48 | 1 | Old Italic
Gothic | U+10330..U+1034F | 32 | 1 | Gothic
Old Permic | U+10350..U+1037F | 48 | 1 | Permic
Ugaritic | U+10380..U+1039F | 32 | 1 | Ugaritic
Old Persian | U+103A0..U+103DF | 64 | 1 | Old Persian
Deseret | U+10400..U+1044F | 80 | 1 | Deseret
Shavian | U+10450..U+1047F | 48 | 1 | Shavian
Osmanya | U+10480..U+104AF | 48 | 1 | Osmanya
Osage | U+104B0..U+104FF | 80 | 1 | Osage
Elbasan | U+10500..U+1052F | 48 | 1 | Elbasan
Caucasian Albanian | U+10530..U+1056F | 64 | 1 | Caucasian Albanian
Vithkuqi | U+10570..U+105BF | 80 | 1 | Vithkuqi
Linear A | U+10600..U+1077F | 384 | 1 | Linear A
Cypriot Syllabary | U+10800..U+1083F | 64 | 1 | Cypriot
Imperial Aramaic | U+10840..U+1085F | 32 | 1 | Aramaic
Palmyrene | U+10860..U+1087F | 32 | 1 | Palmyrene
Nabataean | U+10880..U+108AF | 48 | 1 | Nabataean
Hatran | U+108E0..U+108FF | 32 | 1 | Hatran
Phoenician | U+10900..U+1091F | 32 | 1 | Phoenician
Lydian | U+10920..U+1093F | 32 | 1 | Lydian
Sidetic | U+10940..U+1095F | 32 | 1 | Sidetic
Meroitic Hieroglyphs | U+10980..U+1099F | 32 | 1 | Meroitic
Meroitic Cursive | U+109A0..U+109FF | 96 | 1 | Meroitic
Kharoshthi | U+10A00..U+10A5F | 96 | 1 | Kharoshthi
Old South Arabian | U+10A60..U+10A7F | 32 | 1 | South Arabian
Old North Arabian | U+10A80..U+10A9F | 32 | 1 | North Arabian
Manichaean | U+10AC0..U+10AFF | 64 | 1 | Manichaean
Avestan | U+10B00..U+10B3F | 64 | 1 | Avestan
Inscriptional Parthian | U+10B40..U+10B5F | 32 | 1 | Parthian
Inscriptional Pahlavi | U+10B60..U+10B7F | 32 | 1 | Pahlavi
Psalter Pahlavi | U+10B80..U+10BAF | 48 | 1 | Pahlavi
Old Turkic | U+10C00..U+10C4F | 80 | 1 | Old Turkic
Old Hungarian | U+10C80..U+10CFF | 128 | 1 | Hungarian
Hanifi Rohingya | U+10D00..U+10D3F | 64 | 1 | Hanifi Rohingya
Yezidi | U+10E80..U+10EBF | 64 | 1 | Yezidi
Nandinagari | U+119A0..U+119FF | 96 | 1 | Nandinagari
Sharada Supplement | U+11B60..U+11B7F | 32 | 1 | Sharada
Cypro-Minoan | U+12F90..U+12FFF | 112 | 1 | Cypro-Minoan
Egyptian Hieroglyphs | U+13000..U+1342F | 1,072 | 1 | Egyptian Hieroglyphs
Anatolian Hieroglyphs | U+14400..U+1467F | 640 | 1 | Anatolian Hieroglyphs
Bamum Supplement | U+16800..U+16A3F | 576 | 1 | Bamum
Mro | U+16A40..U+16A6F | 48 | 1 | Mro
Tangsa | U+16A70..U+16ACF | 96 | 1 | Tangsa
Bassa Vah | U+16AD0..U+16AFF | 48 | 1 | Bassa Vah
Pahawh Hmong | U+16B00..U+16B8F | 144 | 1 | Pahawh Hmong
Medefaidrin | U+16E40..U+16E9F | 96 | 1 | Medefaidrin
Beria Erfe | U+16EA0..U+16EDF | 64 | 1 | Beria Erfe
Miao | U+16F00..U+16F9F | 160 | 1 | Miao
Ideographic Symbols and Punctuation | U+16FE0..U+16FFF | 32 | 1 | CJK symbols
Tangut | U+17000..U+187FF | 6,144 | 1 | Tangut
Tangut Components | U+18800..U+18AFF | 768 | 1 | Tangut
Khitan Small Script | U+18B00..U+18CFF | 512 | 1 | Khitan
Tangut Components Supplement | U+18D80..U+18DFF | 128 | 1 | Tangut
Kana Supplement | U+1B000..U+1B0FF | 256 | 1 | Kana
Kana Extended-A | U+1B100..U+1B12F | 48 | 1 | Kana
Nushu | U+1B170..U+1B2FF | 400 | 1 | Nushu
Duployan | U+1BC00..U+1BC9F | 160 | 1 | Duployan
Shorthand Format Controls | U+1BCA0..U+1BCAF | 16 | 1 | Shorthand
Miscellaneous Symbols Supplement | U+1CEC0..U+1CEFF | 64 | 1 | Symbols
Znamenny Musical Notation | U+1CF00..U+1CFCF | 208 | 1 | Musical notation
Byzantine Musical Symbols | U+1D000..U+1D0FF | 256 | 1 | Musical symbols
Musical Symbols | U+1D100..U+1D1FF | 256 | 1 | Musical symbols
Ancient Greek Musical Notation | U+1D200..U+1D24F | 80 | 1 | Musical notation
Kaktovik Numerals | U+1D2C0..U+1D2DF | 32 | 1 | Numerals
Mayan Numerals | U+1D2E0..U+1D2FF | 32 | 1 | Mayan
Tai Xuan Jing Symbols | U+1D300..U+1D35F | 96 | 1 | Symbols
Counting Rod Numerals | U+1D360..U+1D37F | 32 | 1 | Numerals
Mathematical Alphanumeric Symbols | U+1D400..U+1D7FF | 1,024 | 1 | Math
Sutton SignWriting | U+1D800..U+1DAAF | 688 | 1 | SignWriting
Latin Extended-G | U+1DF00..U+1DFFF | 256 | 1 | Latin
Wancho | U+1E2C0..U+1E2FF | 64 | 1 | Wancho
Tai Yo | U+1E6C0..U+1E6FF | 64 | 1 | Tai Yo
Ethiopic Extended-B | U+1E7E0..U+1E7FF | 32 | 1 | Ethiopic
Mende Kikakui | U+1E800..U+1E8DF | 224 | 1 | Mende Kikakui
Adlam | U+1E900..U+1E95F | 96 | 1 | Adlam
Indic Siyaq Numbers | U+1EC70..U+1ECBF | 80 | 1 | Indic numbers
Ottoman Siyaq Numbers | U+1ED00..U+1ED4F | 80 | 1 | Ottoman numbers
Arabic Mathematical Alphabetic Symbols | U+1EE00..U+1EEFF | 256 | 1 | Arabic math
Mahjong Tiles | U+1F000..U+1F02F | 48 | 1 | Mahjong
Domino Tiles | U+1F030..U+1F09F | 112 | 1 | Dominoes
Playing Cards | U+1F0A0..U+1F0FF | 96 | 1 | Cards
Enclosed Alphanumeric Supplement | U+1F100..U+1F1FF | 256 | 1 | Enclosed
Enclosed Ideographic Supplement | U+1F200..U+1F2FF | 256 | 1 | Enclosed CJK
Miscellaneous Symbols and Pictographs | U+1F300..U+1F5FF | 768 | 1 | Emojis/Symbols
Emoticons | U+1F600..U+1F64F | 80 | 1 | Emojis
Ornamental Dingbats | U+1F650..U+1F67F | 48 | 1 | Dingbats
Transport and Map Symbols | U+1F680..U+1F6FF | 128 | 1 | Symbols
Alchemical Symbols | U+1F700..U+1F77F | 128 | 1 | Alchemical
Geometric Shapes Extended | U+1F780..U+1F7FF | 128 | 1 | Geometric
Supplemental Arrows-C | U+1F800..U+1F8FF | 256 | 1 | Arrows
Supplemental Symbols and Pictographs | U+1F900..U+1F9FF | 256 | 1 | Emojis/Symbols
Chess Symbols | U+1FA00..U+1FA6F | 112 | 1 | Chess
Symbols and Pictographs Extended-A | U+1FA70..U+1FAFF | 144 | 1 | Emojis/Symbols
CJK Unified Ideographs Extension B | U+20000..U+2A6DF | 42,720 | 2 | CJK
CJK Unified Ideographs Extension C | U+2A700..U+2B73F | 4,160 | 2 | CJK
CJK Unified Ideographs Extension D | U+2B740..U+2B81F | 224 | 2 | CJK
CJK Unified Ideographs Extension E | U+2B820..U+2CEAF | 5,776 | 2 | CJK
CJK Unified Ideographs Extension F | U+2CEB0..U+2EBEF | 7,488 | 2 | CJK
CJK Compatibility Ideographs Supplement | U+2F800..U+2FA1F | 544 | 2 | CJK compatibility
CJK Unified Ideographs Extension G | U+30000..U+3134F | 4,944 | 3 | CJK
CJK Unified Ideographs Extension J | U+323B0..U+3347F | 4,304 | 3 | CJK
Tags | U+E0000..U+E007F | 128 | 14 | Tags
Variation Selectors Supplement | U+E0100..U+E01EF | 240 | 14 | Variation selectors
Supplementary Private Use Area-A | U+F0000..U+FFFFF | 65,536 | 15 | Private use
Supplementary Private Use Area-B | U+100000..U+10FFFF | 65,536 | 16 | Private use
No Block | (unallocated ranges throughout the codespace; Planes 4–13 contain no blocks at all) | Varies | Varies | Unassigned
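Each row's size and plane follow directly from its range: the size is end − start + 1, and the plane is the code point shifted right by 16 bits (that is, divided by 0x10000). The minimal Python sketch below assumes a hand-copied subset of the rows above rather than the full Blocks.txt file. It shows how a block lookup can work with a sorted list and binary search, with code points outside every listed range receiving the UCD default value No_Block. Because only a few blocks are loaded, code points in omitted blocks also report No_Block here. The names `Block`, `block_of`, `block_size` and `plane` are illustrative, not part of any library.

```python
import bisect
from typing import NamedTuple


class Block(NamedTuple):
    start: int  # first code point in the block
    end: int    # last code point in the block (inclusive)
    name: str


# Illustrative subset of rows from the table above, sorted by start.
# A real implementation would parse every line of Blocks.txt instead.
BLOCKS = [
    Block(0x1900, 0x194F, "Limbu"),
    Block(0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    Block(0xAC00, 0xD7AF, "Hangul Syllables"),
    Block(0xE000, 0xF8FF, "Private Use Area"),
    Block(0x1D100, 0x1D1FF, "Musical Symbols"),
    Block(0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
    Block(0x100000, 0x10FFFF, "Supplementary Private Use Area-B"),
]
STARTS = [b.start for b in BLOCKS]

# Block boundaries fall on multiples of 16, as the table's ranges show.
for b in BLOCKS:
    assert b.start % 16 == 0 and (b.end + 1) % 16 == 0, b.name


def block_size(b: Block) -> int:
    """Number of code points in the block (the table's size column)."""
    return b.end - b.start + 1


def plane(cp: int) -> int:
    """Plane number of a code point (the table's plane column)."""
    return cp >> 16


def block_of(cp: int) -> str:
    """Name of the block containing cp, or "No_Block" if none is listed."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    # Index of the last block whose start is <= cp.
    i = bisect.bisect_right(STARTS, cp) - 1
    if i >= 0 and cp <= BLOCKS[i].end:
        return BLOCKS[i].name
    return "No_Block"


print(block_of(0x1D11E))  # a character in the Musical Symbols range
```

For example, `block_of(0xAC00)` returns "Hangul Syllables", the size of that block is 11,184, `plane(0x20000)` is 2, and `block_of(0x40000)` falls in empty Plane 4 and returns "No_Block". Python's standard `unicodedata` module does not expose the Block property, so a table like this is usually built from the `start..end; Block Name` lines of Blocks.txt.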

Moved and reallocated blocks

In the early development of the Unicode Standard, prior to version 2.0, certain character blocks were relocated to better accommodate encoding models and compatibility requirements. One notable example is the Tibetan script, which was initially allocated in Unicode 1.0 to the range U+1000–U+105F using a virama-based model that proved inadequate for distinguishing certain linguistic features. This block was removed in Unicode 1.1 and reintroduced in Unicode 2.0 at U+0F00–U+0FFF with a revised encoding approach that supported stacked consonants and improved the representation of Tibetan orthography. Similarly, the Hangul syllables encoded in Unicode 1.1 at U+3400–U+4DFF (including the supplementary areas) were relocated in Unicode 2.0 to the Hangul Syllables block at U+AC00–U+D7AF, which holds all 11,172 precomposed syllables in an algorithmically generated order and improves compatibility with Korean standards such as KS X 1001. Other changes in this period included adjustments to the Private Use Area, which was refined between Unicode 1.0 and 2.0 to the standard range U+E000–U+F8FF while supplementary planes were reserved for future expansion, ensuring interoperability without disrupting legacy private assignments. Compatibility concerns also shaped early CJK allocations; for instance, the CJK Compatibility Ideographs block (U+F900–U+FAFF) was introduced in Unicode 1.0.1 with 302 characters to provide round-trip mappings to variant forms in legacy East Asian encodings, preventing data loss during conversion. To maintain long-term stability, the Unicode Consortium adopted a policy, applicable from Unicode 2.0 onward, that prohibits moving or removing encoded characters. Under this encoding stability policy, once a character is assigned, its code point never changes, which supports reliable software development and data preservation across versions.
These historical shifts require compatibility mappings in implementations that handle legacy data; for example, software needs to map pre-2.0 Hangul syllables to their modern equivalents using tables such as those in Unicode Technical Note #60 (Legacy Hangul Syllables Mappings), which gives historical mappings for the relocated syllables so that mixed-version data does not render incorrectly. Such measures preserve the integrity of archived texts while adhering to the post-2.0 framework.
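The move to U+AC00 is what allows the 11,172 precomposed syllables to be computed rather than looked up. Each syllable combines a leading-consonant index (19 values), a vowel index (21 values) and an optional trailing-consonant index (28 values, counting "none"), and 19 × 21 × 28 = 11,172. The Python sketch below follows the composition arithmetic described in the Unicode Standard's section on conjoining jamo behavior (Section 3.12); the function and constant names are this sketch's own. It also shows why the block ends at U+D7AF while the last syllable is U+D7A3: the remaining 12 code points of the block are unassigned.

```python
# Constants of the Hangul syllable composition arithmetic (Unicode Standard, Section 3.12).
S_BASE = 0xAC00                                   # first precomposed syllable
L_BASE, V_BASE, T_BASE = 0x1100, 0x1161, 0x11A7   # conjoining jamo bases; T_BASE is one below the first trailing jamo
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT                       # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT                       # 11,172 syllables in total


def compose(l: int, v: int, t: int = 0) -> int:
    """Code point of the syllable with leading index l, vowel index v,
    and trailing index t (t == 0 means no trailing consonant)."""
    if not (0 <= l < L_COUNT and 0 <= v < V_COUNT and 0 <= t < T_COUNT):
        raise ValueError("jamo index out of range")
    return S_BASE + (l * V_COUNT + v) * T_COUNT + t


def decompose(cp: int) -> tuple[int, int, int]:
    """Inverse of compose(): recover (l, v, t) from a precomposed syllable."""
    s_index = cp - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError(f"U+{cp:04X} is not a precomposed Hangul syllable")
    return s_index // N_COUNT, (s_index % N_COUNT) // T_COUNT, s_index % T_COUNT


def to_jamo(cp: int) -> str:
    """Spell a syllable as conjoining jamo (its canonical decomposition)."""
    l, v, t = decompose(cp)
    jamo = chr(L_BASE + l) + chr(V_BASE + v)
    return jamo + chr(T_BASE + t) if t else jamo


last = compose(L_COUNT - 1, V_COUNT - 1, T_COUNT - 1)
print(f"U+{S_BASE:04X}..U+{last:04X}: {S_COUNT} syllables")  # U+AC00..U+D7A3: 11172 syllables
```

For example, `decompose(0xD55C)` for 한 gives `(18, 0, 4)`, and `to_jamo(0xD55C)` returns the jamo sequence U+1112 U+1161 U+11AB, which is also what NFD normalization produces for that syllable.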
