Unicode block
A Unicode block is a contiguous range of code points in the Unicode codespace, normatively defined to organize characters associated with a particular script, symbol set, or functional category, such as the Basic Latin block (U+0000–U+007F) for standard ASCII characters or the CJK Unified Ideographs block (U+4E00–U+9FFF) for Han characters.[1] These blocks serve as the primary structural units for grouping the 159,801 assigned characters across 346 blocks in Unicode 17.0, facilitating efficient text processing, rendering, and implementation in software and fonts.[1][2]
Unicode blocks were introduced as part of the Unicode Standard's design to divide the 1,114,112-code-point codespace into manageable, non-overlapping ranges, each with a unique name that is matched case-insensitively and without regard to whitespace, hyphens, or underscores.[1] While most blocks align with a single script or symbol collection, such as the Arabic block (U+0600–U+06FF) for right-to-left text or the Musical Symbols block (U+1D100–U+1D1FF) for notation, some scripts like Latin span multiple blocks to accommodate extensions for diacritics, historical forms, and regional variants.[1] Blocks may include unassigned or reserved code points for future expansions, and their boundaries typically align with multiples of 16 in the hexadecimal code point system to optimize storage and lookup.[1] The exact allocation and names are specified in the Unicode Character Database's Blocks.txt file, ensuring stability and interoperability across versions of the standard.[2]
In practice, Unicode blocks underpin key aspects of internationalization, including font design, input methods, and the handling of mixed-script text.[1] They enable developers to reference character groups programmatically via the normative Block property, though this property alone does not determine semantic or behavioral traits, such as the directionality used by the Unicode Bidirectional Algorithm or the combining class, which require consultation of the full Unicode Character Database.[3] For code points outside every defined block, the default Block value is "No_Block," highlighting the standard's forward-compatible design.[1] This organization supports 172 scripts and diverse symbol sets, from ancient systems like Anatolian Hieroglyphs (U+14400–U+1467F) to modern emoji in the Emoticons block (U+1F600–U+1F64F), promoting global digital communication.[1]
Overview
Definition
A Unicode block is a named grouping of contiguous code points within the Unicode codespace, designed to organize characters for reference and documentation purposes by the Unicode Consortium. These blocks are non-overlapping ranges. They do not cover the entire codespace: code points outside every block take the value "No_Block". The surrogate range U+D800–U+DFFF is reserved for encoding mechanisms like UTF-16 and never holds assigned characters.[4]
The size of a Unicode block is always a multiple of 16 code points and can vary from the minimum of 16 to the maximum of 65,536 code points, the size of one Unicode plane. For instance, the Basic Latin block occupies the range U+0000–U+007F, encompassing 128 code points for fundamental Latin script characters.[2]
Each block receives a descriptive name reflecting its typical contents, such as "Latin Extended Additional" or "Enclosed Alphanumerics," though the grouping is organizational rather than semantic and does not imply specific properties or behaviors for the characters it contains. In Unicode 17.0, 346 blocks are defined within a codespace of 1,114,112 code points (U+0000 to U+10FFFF across the 17 planes), of which 1,112,064 remain once the 2,048 surrogate code points are excluded.[4][2]
Purpose
Unicode blocks provide a structured mechanism for grouping related characters within the Unicode codespace, enabling intuitive organization by script, symbol type, or thematic category. For instance, the "CJK Unified Ideographs" block encompasses characters shared across Chinese, Japanese, and Korean writing systems, facilitating easier reference and comprehension of the standard's expansive character set. This grouping establishes a coarse but practical framework for managing 159,801 assigned characters as of Unicode 17.0, promoting accessibility in documentation and implementation.[5][3]
These blocks support key practical utilities in software development and user interfaces. In font design, thematic chunks allow developers to prioritize glyph creation for specific categories, streamlining the support for diverse scripts and symbols without requiring comprehensive coverage of the entire repertoire. Similarly, blocks enhance searching functionalities and input methods by enabling categorized browsing in tools like character maps, where users can select from predefined groups rather than scanning individual code points.[6][3]
The normative Block property, detailed in the Unicode Character Database, further supports efficient text processing and storage. In databases and text processors, it permits filtering operations to isolate characters by block, optimizing queries and segmentation for applications such as internationalization and data analysis. For regular expressions, block-based patterns offer a quick means to match sets of characters, though with limited precision, because a block may contain unassigned code points and characters of more than one script. Notably, the Block property carries no semantic rules of its own; blocks instead serve user-oriented features like character pickers as organizational aids independent of script assignment. Blocks frequently correspond to scripts, although many scripts span several blocks.[3][7]
History
Early development
The Unicode Standard version 1.0, released in October 1991, introduced the concept of character blocks as a means to organize code points within its 16-bit encoding space, specifically the Basic Multilingual Plane (BMP) spanning U+0000 to U+FFFF.[8] This structure was directly inspired by the design of plane 0 in the emerging ISO/IEC 10646 standard, aligning Unicode's initial repertoire with international efforts to create a universal character encoding.[9] The BMP was prioritized to contain the most commonly used characters, facilitating efficient implementation in software and hardware while reserving space for future growth.[10]
Unicode 1.0 defined an initial set of 15 blocks, emphasizing major world scripts to provide broad multilingual support from the outset. These included blocks for Latin (Basic Latin, Latin-1 Supplement, and extensions), Greek, Cyrillic, Armenian, Hebrew, Arabic, Thai, Georgian, Hangul Jamo, and provisions for CJK ideographs, which were detailed in a supplementary volume.[8] This selection reflected a strategic focus on scripts with significant global usage, enabling the encoding of text from diverse languages without immediate fragmentation.[11]
The rationale behind this block-based organization stemmed from the Unicode Consortium's foundational objectives, established in 1991, to achieve compatibility with prevalent legacy encodings while supporting extensibility for unforeseen needs. For instance, the Basic Latin block (U+0000–U+007F) directly mirrored ASCII to ensure seamless integration with existing Western computing systems.[12] At the same time, the modular block design allowed for logical grouping of related characters, simplifying font development, searching, and collation processes, and providing a framework for adding new scripts without disrupting established allocations.[13]
A pivotal event shaping this early framework was the 1991 merger between the Unicode initiative and ISO/IEC JTC1/SC2 efforts on 10646, following negotiations that resolved discrepancies in character repertoires and encoding forms. Meetings in October 1991 produced mutually acceptable adjustments, resulting in unified block planning that synchronized Unicode's BMP with ISO 10646's plane 0 and prevented divergent standards.[9][14] This collaboration ensured that block definitions would evolve cohesively, laying the groundwork for international adoption. Subsequent versions after 2.0 expanded this structure significantly, incorporating additional planes and scripts.[15]
Evolution through versions
The Unicode Standard version 2.0, released in July 1996, significantly expanded the encoding space by formalizing support for multiple planes beyond the Basic Multilingual Plane (BMP), including the introduction of supplementary planes via surrogate mechanisms in UTF-16, and added the Hangul Syllables block (U+AC00–U+D7AF) to enable precomposed representation of 11,172 Korean syllables (U+AC00–U+D7A3).[16] This version established a foundational structure of approximately 68 blocks, facilitating broader script coverage while maintaining compatibility with prior encodings.[17]
Subsequent versions built on this framework with targeted expansions. Unicode 3.0, published in September 1999, added 18 new blocks supporting 12 scripts, including historic ones such as Ogham (U+1680–U+169F) and Runic (U+16A0–U+16FF), alongside modern scripts like Cherokee (U+13A0–U+13FF) and Mongolian (U+1800–U+18AF), contributing to over 10,000 new characters overall.[18] Unicode 5.0, released in July 2006, further diversified symbol support by introducing new symbol blocks, while adding five new scripts including Phoenician (U+10900–U+1091F) and nine new blocks in total.[19]
A pivotal shift occurred in Unicode 6.0 (October 2010), which incorporated the first dedicated emoji blocks, Emoticons (U+1F600–U+1F64F) and Miscellaneous Symbols and Pictographs (U+1F300–U+1F5FF), to accommodate over 1,000 symbolic characters for digital communication, reflecting growing needs in mobile and web technologies.[20] From Unicode 8.0 (June 2015) onward, the Consortium adopted an annual release cadence to accelerate inclusion of underrepresented scripts, such as Ahom (U+11700–U+1173F) and Multani (U+11280–U+112AF) in version 8.0, prioritizing linguistic diversity in regions like South Asia and Africa.[21]
By Unicode 17.0, released in September 2025, the standard encompasses 346 blocks and incorporates 4,803 new characters across four additional scripts, including Sidetic and Tolong Siki, underscoring ongoing efforts to document endangered writing systems.[22] Overall growth has been substantial, evolving from roughly 15 blocks in version 1.0 to 346 today, with the large majority allocated to the BMP and Supplementary Multilingual Plane (SMP) to optimize common usage while reserving space for future extensions.[23] This progression highlights a pattern of incremental, script-focused additions, with occasional relocations of blocks to refine organization without disrupting compatibility.[3]
Design and properties
Code point allocation
Unicode blocks are defined as fixed, non-overlapping ranges of code points within the Unicode codespace, which spans from U+0000 to U+10FFFF. Each block begins at a code point that is a multiple of 16 (such as U+0000 or U+0100) and ends at a code point one less than a multiple of 16 (its last hexadecimal digit is F), ensuring compatibility with code chart layouts that organize characters in 16-column grids. For instance, the Basic Latin block occupies U+0000–U+007F, encompassing 128 code points for fundamental Latin characters. Blocks never overlap, but they do not cover the whole codespace: code points not explicitly covered by any block take the value "No_Block".[4][2]
The allocation of code points into blocks is managed by the Unicode Consortium through a rigorous proposal process. Proposals for new scripts or characters are submitted using a standardized form, evaluated initially by technical officers, and then reviewed by the Unicode Technical Committee (UTC) to assess compatibility with existing encodings, script requirements, and stability policies. Approved allocations prioritize needs for specific writing systems while incorporating reservations, such as unassigned gaps within blocks, to accommodate future expansions without disrupting prior assignments. This process ensures that blocks are assigned in contiguous ranges aligned to the 16-code-point boundary, maintaining the overall organization of the standard.[24][4]
Block sizes vary to accommodate diverse character repertoires: a common size is 256 code points (a full 16×16 grid), but smaller or larger ranges exist based on content density, such as 128 code points for several Latin blocks or 20,992 code points for the CJK Unified Ideographs block (U+4E00–U+9FFF) to support East Asian logographic scripts.
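The alignment rule and the size figures quoted here can be checked with a little arithmetic. The following is a minimal sketch; the helper names `block_size` and `is_aligned` are illustrative and not part of any Unicode library, and the ranges are the ones cited in this article.

```python
def block_size(start: int, end: int) -> int:
    """Number of code points in an inclusive range start..end."""
    return end - start + 1


def is_aligned(start: int, end: int) -> bool:
    """True when the range starts on a multiple of 16 and ends just before one."""
    return start % 16 == 0 and (end + 1) % 16 == 0


# Ranges as quoted in this article (hypothetical lookup table for the demo).
EXAMPLES = {
    "Basic Latin": (0x0000, 0x007F),
    "Latin Extended-B": (0x0180, 0x024F),
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
}

for name, (start, end) in EXAMPLES.items():
    print(
        f"{name}: U+{start:04X}..U+{end:04X} "
        f"size={block_size(start, end)} aligned={is_aligned(start, end)}"
    )
```

A range such as U+AC00–U+D7A3, which describes assigned characters rather than a whole block, fails the alignment check, which is a quick way to tell the two kinds of range apart.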
The total allocatable codespace excludes the surrogate range U+D800–U+DFFF (2,048 code points), which is reserved for UTF-16 encoding mechanics and not assigned to characters. Additionally, certain noncharacter code points, like U+FDD0–U+FDEF and the last two code points of each plane (such as U+FFFE–U+FFFF), are permanently unassigned for interoperability purposes.[2][4][12]
The codespace is divided into 17 planes of 65,536 code points each, influencing block placement: Plane 0, the Basic Multilingual Plane (BMP, U+0000–U+FFFF), hosts blocks for widely used modern scripts like Latin, Greek, and Cyrillic to facilitate legacy system compatibility. Planes 1 through 16 cover supplementary areas, with Plane 1 (Supplementary Multilingual Plane, U+10000–U+1FFFF) for historic scripts and symbols, Planes 2 and 3 for additional CJK ideographs, Plane 14 for special-purpose characters, and Planes 15–16 reserved for private use. This plane-based organization guides block allocation, prioritizing the BMP for accessibility while reserving higher planes for less common or emerging content.[12][4] The Block property, part of the Unicode Character Database, normatively identifies the block containing any given code point, aiding in implementation and analysis.[4]
Block metadata
The Block property is a catalog property in the Unicode Character Database (UCD) that assigns a string value to each code point, identifying the named block to which it belongs, such as "Basic Latin" for code points U+0000 to U+007F.[3] This property facilitates the organization and querying of characters by grouping them into contiguous ranges, independent of other properties like the General Category.[3]
Metadata for Unicode blocks is primarily documented in the Blocks.txt file, which enumerates each block's hexadecimal range and official name, one block per line, formatted as "start..end; Block Name".[3] For instance, the entry for the Basic Latin block appears as "0000..007F; Basic Latin".[2] Code points not covered by any block default to the value "No_Block".[3]
Unicode enforces a stability policy for blocks, guaranteeing that once a range is assigned to a block in a published version, it will not be changed, reallocated, or removed in future versions to ensure backward compatibility for implementations.[3] This policy, detailed in Section 2.3 of UAX #44, applies to both the ranges and the associated block names, preventing disruptions in software, fonts, and data processing that rely on these assignments.[3]
Block names follow descriptive naming conventions, typically based on the primary script, symbol set, or functional category they represent, such as "Ethiopic" for the Ge'ez script or "Mathematical Operators" for symbolic characters.[3] Names are unique, use title case with hyphens or spaces for readability, and are ASCII-compatible to support legacy systems; for compatibility, abbreviations and long names are provided as aliases in the PropertyValueAliases.txt file.[25]
The versioning of blocks, or "block age," is tied to the Unicode Standard version in which the block was first defined, denoted in formats like "7.0" for blocks added in Unicode 7.0.[3] Blocks.txt itself carries no version field; the Age property recorded per code point in DerivedAge.txt allows developers to track the temporal evolution of the encoding space without affecting the stability of existing assignments.[3]
Classifications and relations
Connection to scripts and categories
Unicode blocks serve as organizational ranges of code points within the Unicode standard, but they are distinct from other character properties such as the Script property and the General Category property, which are assigned on a per-character basis rather than applying to entire blocks.[3] This independence allows blocks to provide a coarse grouping mechanism for code points, while scripts and general categories enable finer-grained semantic analysis essential for text processing. For instance, the Script property identifies the writing system associated with a character, such as "Latin" or "Greek," and is defined in the Scripts.txt data file as a normative property.[26] In contrast, blocks like Basic Latin (U+0000–U+007F) are normative ranges specified in Blocks.txt, without inherent ties to these properties.[3]
A key distinction arises in how scripts often span multiple blocks, illustrating their orthogonality to block boundaries. The Latin script, for example, extends across several blocks including Basic Latin, Latin-1 Supplement (U+0080–U+00FF), and Latin Extended-A (U+0100–U+017F), encompassing characters used in various Latin-based languages.[26] Similarly, the Greek script is largely contained within the Greek and Coptic block (U+0370–U+03FF), with further characters in Greek Extended (U+1F00–U+1FFF), providing a closer alignment in this case, though Greek text also draws on characters with the Common script value for shared elements like punctuation.[3] However, exceptions occur in blocks dedicated to symbols or ideographs, where multiple scripts may intermingle; for instance, the CJK Unified Ideographs block (U+4E00–U+9FFF) includes characters from the Han script but also supports extensions for compatibility with other East Asian scripts.[26]
The General Category property further underscores this per-character granularity, classifying individual code points into values such as "Lu" (uppercase letter), "Ll" (lowercase letter), "Sm" (math symbol), or "So" (other symbol), as detailed in the UnicodeData.txt file.[3]
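The per-character nature of the General Category can be observed by tallying categories over a block's range. This is a minimal sketch using Python's standard `unicodedata.category`; the helper `categories_in_range` is an illustrative name, the ranges are the ones quoted in this article, and the exact tallies depend on the Unicode version bundled with the interpreter.

```python
import unicodedata
from collections import Counter


def categories_in_range(start: int, end: int) -> Counter:
    """Count General Category values for every code point in start..end."""
    return Counter(
        unicodedata.category(chr(cp)) for cp in range(start, end + 1)
    )


# Basic Latin mixes controls, digits, letters, punctuation, and symbols.
basic_latin = categories_in_range(0x0000, 0x007F)
# Miscellaneous Symbols is mostly "So" but also contains "Sm" characters.
misc_symbols = categories_in_range(0x2600, 0x26FF)

print(sorted(basic_latin.items()))
print(sorted(misc_symbols.items()))
```

Even a block as small as Basic Latin yields many distinct categories, so a block name alone says little about how any one character behaves.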
Blocks do not impose a uniform category: symbol-related blocks like Miscellaneous Symbols (U+2600–U+26FF) mix categories, including "So" for pictorial symbols and "Sm" for mathematical ones, reflecting diverse semantic roles within a single range.[3] This mixing prevents block-wide generalizations and emphasizes the need for character-level evaluation.
In practice, blocks facilitate broad navigation and allocation in implementations, while scripts and general categories support precise operations in Unicode algorithms. For normalization, the Unicode Normalization Forms rely on per-character decomposition mappings and canonical combining classes to decompose or compose characters, independent of block structure. Collation processes, such as those in the Default Unicode Collation Element Table (DUCET), group and order characters by script for accurate sorting across languages, using script extensions to handle overlaps without reference to blocks. Thus, these properties complement blocks by enabling semantic processing in applications like text rendering and internationalization.[26]
Organization across planes
The Unicode Standard organizes its code points into 17 planes, each comprising 65,536 consecutive positions, to provide a structured framework for character allocation.[22] Plane 0, known as the Basic Multilingual Plane (BMP) and spanning U+0000 to U+FFFF, serves as the foundational layer, hosting 163 blocks that encompass legacy encodings and widely used scripts essential for everyday text processing.[2] These blocks include core sets like Basic Latin (U+0000–U+007F) for English and Western European languages, CJK Unified Ideographs (U+4E00–U+9FFF) for Chinese, Japanese, and Korean hanzi/kanji/hanja, and Hangul Syllables (U+AC00–U+D7AF) for Korean, accommodating approximately 63,951 assigned characters that form the bulk of common global communication needs.[27]
Plane 1, the Supplementary Multilingual Plane (SMP) from U+10000 to U+1FFFF, extends this structure with 134 blocks dedicated to historic, rare, and supplementary scripts, enabling representation of less frequently used writing systems.[28] Notable examples include Egyptian Hieroglyphs (U+13000–U+1342F) for ancient Egyptian texts and Linear B Syllabary (U+10000–U+1007F) for Mycenaean Greek inscriptions, supporting scholarly and cultural preservation efforts without disrupting the BMP's focus on modern usage. This plane's blocks often group thematically by script type, such as ancient symbols or minority languages, to facilitate efficient implementation in software.[2]
Higher planes address specialized requirements beyond primary scripts. Plane 2, the Supplementary Ideographic Plane (SIP) covering U+20000 to U+2FFFF, contains 7 blocks primarily for extensive CJK ideograph extensions, such as CJK Unified Ideographs Extension B (U+20000–U+2A6DF), which vastly expands East Asian character repertoires for rare or historical variants.
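Because every plane holds exactly 65,536 (2^16) code points, the plane of a code point is simply its value shifted right by 16 bits. The sketch below illustrates this; `plane_of` and `plane_range` are illustrative helper names, not functions from any standard library.

```python
MAX_CODE_POINT = 0x10FFFF


def plane_of(code_point: int) -> int:
    """Return the plane number (0-16) that contains code_point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point >> 16  # each plane spans 0x10000 code points


def plane_range(plane: int) -> tuple[int, int]:
    """Return the inclusive (first, last) code points of a plane."""
    if not 0 <= plane <= 16:
        raise ValueError(f"plane out of range: {plane}")
    first = plane << 16
    return first, first + 0xFFFF


for cp in (0x4E00, 0x13000, 0x20000, 0x10FFFF):
    first, last = plane_range(plane_of(cp))
    print(f"U+{cp:04X} is in plane {plane_of(cp)} (U+{first:04X}..U+{last:04X})")
```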
Plane 3, the Tertiary Ideographic Plane (TIP) from U+30000 to U+3FFFF, holds 3 blocks for further CJK expansions, including CJK Unified Ideographs Extension G (U+30000–U+3134F), continuing the ideographic repertoire of Plane 2. Planes 2 and 3 together allocate substantial code space to ideographs, with large blocks reflecting the scale of the Han script.[2]
Planes 4 through 13 remain unallocated, serving as reserved areas for potential future expansions in scripts or symbol sets, ensuring long-term scalability of the encoding model.[22] Plane 14 (U+E0000–U+EFFFF) features 2 blocks for supplementary features: Tags (U+E0000–U+E007F), used in language tagging, and Variation Selectors Supplement (U+E0100–U+E01EF), used for glyph customization. Planes 15 and 16 (U+F0000–U+10FFFF) are designated for private use, with a single large block each, Supplementary Private Use Area-A (U+F0000–U+FFFFF) and Supplementary Private Use Area-B (U+100000–U+10FFFF), allowing vendors and users to define custom characters without conflicting with standardized assignments.[2]
Overall, the large majority of Unicode 17.0's 346 blocks reside in Planes 0 and 1, concentrating most script diversity there while leaving intentional gaps across all planes for anticipated growth in global writing systems.[22] This hierarchical distribution balances immediate compatibility with forward-looking extensibility, with thematic groupings within planes aiding developers in handling diverse linguistic data.[3]
Implementation
Software handling
Software systems query the Block property of a code point to determine its grouping for processing purposes. In Java, the Character.UnicodeBlock class provides the static methods of(char c) and of(int codePoint) to retrieve the block containing a given character or code point, and forName(String blockName) to look up a block by name, with block names derived from the Unicode standard's Blocks.txt file.[29] In Python, a block lookup is not available in every version of the standard unicodedata module, in which case the block can be derived from the ranges in Blocks.txt.[30]
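Where a runtime offers no built-in block lookup, the same query can be approximated with a sorted range table and a binary search. The sketch below assumes a small hand-copied excerpt of block ranges; `BLOCKS` and `block_of` are illustrative names rather than part of any library, and a real implementation would load the complete Blocks.txt.

```python
from bisect import bisect_right

# (start, end, name) rows sorted by start; an excerpt, NOT the full table.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x1F600, 0x1F64F, "Emoticons"),
]
_STARTS = [start for start, _, _ in BLOCKS]


def block_of(char: str) -> str:
    """Return the block name for a single character, or "No_Block"."""
    cp = ord(char)
    # Find the last row whose start is <= cp, then check its end.
    index = bisect_right(_STARTS, cp) - 1
    if index >= 0:
        _, end, name = BLOCKS[index]
        if cp <= end:
            return name
    # With this excerpt, "No_Block" also covers blocks missing from the table.
    return "No_Block"


for ch in "Aé中😀":
    print(f"U+{ord(ch):04X} -> {block_of(ch)}")
```

Binary search keeps each lookup logarithmic in the number of blocks, which matters little for a few hundred rows but keeps the approach usable for larger per-range tables.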
In text processing applications, Unicode blocks facilitate filtering and validation tasks, such as restricting input to specific scripts for localization. For instance, developers can filter text to include only characters from CJK Unified Ideographs blocks to support East Asian languages, ensuring compatibility in multilingual interfaces or data validation routines.[31] This approach is particularly useful in natural language processing pipelines, where block-based scoring helps filter corpora by excluding unwanted scripts or symbols.[32]
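Block-based filtering of this kind reduces to a range test per character. The following is a minimal sketch that keeps only characters from the CJK Unified Ideographs range quoted in this article; `CJK_RANGES`, `keep_only`, and `is_all_in` are illustrative names. Note that this single block omits kana, Hangul, and the CJK extension blocks, so production validation would need more ranges or a script-based test.

```python
# Inclusive ranges to accept; here only the main CJK Unified Ideographs block.
CJK_RANGES = [(0x4E00, 0x9FFF)]


def in_ranges(char: str, ranges: list[tuple[int, int]]) -> bool:
    """True when the character's code point falls inside any given range."""
    cp = ord(char)
    return any(start <= cp <= end for start, end in ranges)


def keep_only(text: str, ranges: list[tuple[int, int]]) -> str:
    """Drop every character that lies outside the given ranges."""
    return "".join(ch for ch in text if in_ranges(ch, ranges))


def is_all_in(text: str, ranges: list[tuple[int, int]]) -> bool:
    """Validate that every character of text lies inside the given ranges."""
    return all(in_ranges(ch, ranges) for ch in text)


print(keep_only("Unicode 标准 17.0", CJK_RANGES))
print(is_all_in("漢字", CJK_RANGES), is_all_in("kanji 漢字", CJK_RANGES))
```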
Unicode blocks contribute to encoding efficiency by enabling compact data representations in storage and transmission. In font subsetting, blocks allow tools to generate reduced-size font files containing only glyphs for relevant code point ranges, such as Basic Latin for English text, thereby optimizing load times and memory usage in web and application contexts.[33] For databases, block-aware indexing supports efficient querying and collation of internationalized text, reducing overhead in multilingual storage systems.[34]
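As a rough illustration of block-driven subsetting, the sketch below works out which blocks a sample text touches and formats them as a range list in the style of the CSS `unicode-range` descriptor. It is not tied to any particular subsetting tool; `BLOCKS`, `blocks_needed`, and `to_unicode_range` are hypothetical names, the table is a short excerpt, and characters outside the excerpt are silently ignored.

```python
# Illustrative excerpt of block ranges (start, end, name); not the full table.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
]


def blocks_needed(text: str) -> list[tuple[int, int, str]]:
    """Return the blocks from BLOCKS containing at least one character of text."""
    needed = set()
    for ch in text:
        cp = ord(ch)
        for block in BLOCKS:
            if block[0] <= cp <= block[1]:
                needed.add(block)
                break  # blocks never overlap, so one match is enough
    return sorted(needed)


def to_unicode_range(blocks: list[tuple[int, int, str]]) -> str:
    """Format blocks as a comma-separated list such as U+0000-007F."""
    return ",".join(f"U+{start:04X}-{end:04X}" for start, end, _ in blocks)


print(to_unicode_range(blocks_needed("Hello Привет")))
```

Subsetting by whole blocks is coarser than subsetting by the exact characters used, so it trades some file size for a font that still covers text added later within the same blocks.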
A key challenge in software handling of Unicode blocks is managing unassigned code points, which are reserved within blocks for future allocations to maintain forward compatibility. Implementations must assign default property values to these code points, such as the General Category Cn (Unassigned), so that algorithms like normalization or segmentation function correctly without errors, even as Unicode evolves.[35] The Blocks.txt file from the Unicode Character Database serves as the authoritative source for defining these ranges, guiding software updates to accommodate new assignments.[2]
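Reading block ranges from Blocks.txt-style data takes only a few lines. The sketch below parses an inline sample in the "start..end; Block Name" layout and adds a loose name comparison that ignores case, whitespace, hyphens, and underscores. `SAMPLE`, `parse_blocks`, `loose_key`, and `find_block` are illustrative names, and the sample is an excerpt, not the real file.

```python
import re

# Illustrative excerpt in Blocks.txt layout; '#' starts a comment.
SAMPLE = """\
# sample data, not the full file
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
1F600..1F64F; Emoticons
"""

_LINE = re.compile(r"^([0-9A-Fa-f]+)\.\.([0-9A-Fa-f]+);\s*(.+?)\s*$")


def parse_blocks(text: str) -> list[tuple[int, int, str]]:
    """Parse 'start..end; Name' lines into sorted (start, end, name) rows."""
    rows = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        match = _LINE.match(line)
        if match:
            start, end, name = match.groups()
            rows.append((int(start, 16), int(end, 16), name))
    return sorted(rows)


def loose_key(name: str) -> str:
    """Normalize a block name for case- and separator-insensitive matching."""
    return re.sub(r"[\s_\-]", "", name).lower()


def find_block(rows, name: str):
    """Return the row whose name loosely matches, or None."""
    key = loose_key(name)
    return next((row for row in rows if loose_key(row[2]) == key), None)


ROWS = parse_blocks(SAMPLE)
print(ROWS)
print(find_block(ROWS, "latin_1 supplement"))
```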
Font and rendering support
Font formats such as OpenType and TrueType rely on the 'cmap' table to map Unicode code points to glyph indices, with subtables organized by encoding schemes that often align with Unicode blocks for efficient glyph lookup.[36] Format 13 subtables in the 'cmap' table are designed specifically for scenarios where an entire Unicode block maps to a single glyph, commonly used in last-resort fonts to provide basic visual representation for unsupported ranges.[37]
Rendering pipelines in modern systems leverage Unicode properties, including the Script property, to inform script detection and text shaping. HarfBuzz, an open-source shaping engine, uses the normative Unicode Script property to assign scripts and apply OpenType features for complex layouts, such as handling the 'COMMON' script with heuristics for accurate glyph positioning and ligature formation; while blocks aid in grouping code points, script assignment is determined by Unicode data files. Similarly, Apple's Core Text framework employs Unicode properties, particularly the Script property, to perform script-specific shaping, integrating with font tables for bidirectional text and contextual forms.[26][38]
Font coverage for Unicode blocks is often partial, as comprehensive support across all 300+ blocks in Unicode 17.0 would exceed practical file size limits for most typefaces. The Noto font family, developed by Google, addresses this by providing glyphs for characters in over 150 blocks through its modular design, covering more than 77,000 characters while prioritizing commonly used scripts.[39] For rare or specialized blocks, such as historical scripts or symbols, rendering engines implement fallback strategies, automatically substituting glyphs from secondary fonts to avoid blank or tofu representations.[40]
Tools for debugging font and rendering support include Unicode block viewers integrated into development environments. In Visual Studio Code, extensions like Unicode Code Point display code point details, including block affiliations, in the status bar to help identify missing glyphs and test rendering across fonts.[41]
Current and historical blocks
List of blocks in Unicode 17.0
Unicode 17.0, released on September 9, 2025, defines 346 contiguous blocks within the Unicode code space of 1,114,112 possible code points, each dedicated to characters from specific scripts, symbol categories, or reserved areas. These blocks facilitate efficient implementation in software and fonts by grouping related characters, with sizes varying from 16 code points (e.g., for small supplements) to 65,536 (full planes for private use). The distribution spans 17 planes, predominantly Planes 0, 1, and 2, covering modern and historical scripts, emojis, and technical symbols.[22] This version adds 4,803 new characters for a total of 159,801 assigned characters.[22]
This version introduces eight new blocks, including four for newly encoded scripts: Sidetic (U+10940–U+1095F, 32 code points, Plane 1, Sidetic script), Tolong Siki (U+11DB0–U+11DEF, 64 code points, Plane 1, Tolong Siki script), Beria Erfe (U+16EA0–U+16EDF, 64 code points, Plane 1, Beria Erfe script), and Tai Yo (U+1E6C0–U+1E6FF, 64 code points, Plane 1, Tai Yo script). The other new blocks are Sharada Supplement (U+11B60–U+11B7F, 32 code points, Plane 1, Sharada), Tangut Components Supplement (U+18D80–U+18DFF, 128 code points, Plane 1, Tangut), Miscellaneous Symbols Supplement (U+1CEC0–U+1CEFF, 64 code points, Plane 1, symbols including asteroids), and CJK Unified Ideographs Extension J (U+323B0–U+3347F, 4,304 code points, Plane 3, CJK).[22] Additionally, existing blocks like CJK Unified Ideographs and various emoji ranges received additions, such as the new characters in Emoji 17.0.[42]
Key examples illustrate the diversity: the Basic Latin block (U+0000..U+007F, 128 code points, Plane 0) supports core ASCII characters for the Latin script; the CJK Unified Ideographs block (U+4E00..U+9FFF, 20,992 code points, Plane 0) encodes shared Han ideographs for Chinese, Japanese, and Korean; and emoji blocks like Miscellaneous Symbols and Pictographs (U+1F300..U+1F5FF, 768 code points, Plane 1) include hundreds of pictorial symbols.[2] Within blocks, gaps exist where code points remain unassigned or reserved, as denoted in official Unicode data files (e.g., noncharacter code points or future allocations).[3]
The table below lists blocks in ascending code point order. The "Primary Script/Association" column indicates the main category, such as a script name or symbol type; many blocks support multiple uses, but the primary one is noted for context. The table includes selected blocks for brevity; the full list of 346 blocks is available in the official Unicode Character Database.[2]

| Block Name | Hex Range | Size | Plane | Primary Script/Association |
|---|---|---|---|---|
| Basic Latin | U+0000..U+007F | 128 | 0 | Latin |
| Latin-1 Supplement | U+0080..U+00FF | 128 | 0 | Latin |
| Latin Extended-A | U+0100..U+017F | 128 | 0 | Latin |
| Latin Extended-B | U+0180..U+024F | 208 | 0 | Latin |
| IPA Extensions | U+0250..U+02AF | 96 | 0 | Phonetic (IPA) |
| Spacing Modifier Letters | U+02B0..U+02FF | 80 | 0 | Modifier letters |
| Combining Diacritical Marks | U+0300..U+036F | 112 | 0 | Diacritics |
| Greek and Coptic | U+0370..U+03FF | 144 | 0 | Greek |
| Cyrillic | U+0400..U+04FF | 256 | 0 | Cyrillic |
| Cyrillic Supplement | U+0500..U+052F | 48 | 0 | Cyrillic |
| Armenian | U+0530..U+058F | 96 | 0 | Armenian |
| Hebrew | U+0590..U+05FF | 112 | 0 | Hebrew |
| Arabic | U+0600..U+06FF | 256 | 0 | Arabic |
| Syriac | U+0700..U+074F | 80 | 0 | Syriac |
| Arabic Supplement | U+0750..U+077F | 48 | 0 | Arabic |
| Thaana | U+0780..U+07BF | 64 | 0 | Thaana |
| NKo | U+07C0..U+07FF | 64 | 0 | N'Ko |
| Samaritan | U+0800..U+083F | 64 | 0 | Samaritan |
| Mandaic | U+0840..U+085F | 32 | 0 | Mandaic |
| Syriac Supplement | U+0860..U+086F | 16 | 0 | Syriac |
| Arabic Extended-B | U+0870..U+089F | 48 | 0 | Arabic |
| Arabic Extended-A | U+08A0..U+08FF | 96 | 0 | Arabic |
| Devanagari | U+0900..U+097F | 128 | 0 | Devanagari |
| Bengali | U+0980..U+09FF | 128 | 0 | Bengali |
| Gurmukhi | U+0A00..U+0A7F | 128 | 0 | Gurmukhi |
| Gujarati | U+0A80..U+0AFF | 128 | 0 | Gujarati |
| Oriya | U+0B00..U+0B7F | 128 | 0 | Odia |
| Tamil | U+0B80..U+0BFF | 128 | 0 | Tamil |
| Telugu | U+0C00..U+0C7F | 128 | 0 | Telugu |
| Kannada | U+0C80..U+0CFF | 128 | 0 | Kannada |
| Malayalam | U+0D00..U+0D7F | 128 | 0 | Malayalam |
| Sinhala | U+0D80..U+0DFF | 128 | 0 | Sinhala |
| Thai | U+0E00..U+0E7F | 128 | 0 | Thai |
| Lao | U+0E80..U+0EFF | 128 | 0 | Lao |
| Tibetan | U+0F00..U+0FFF | 256 | 0 | Tibetan |
| Myanmar | U+1000..U+109F | 160 | 0 | Myanmar |
| Georgian | U+10A0..U+10FF | 96 | 0 | Georgian |
| Hangul Jamo | U+1100..U+11FF | 256 | 0 | Hangul |
| Ethiopic | U+1200..U+137F | 384 | 0 | Ethiopic |
| Ethiopic Supplement | U+1380..U+139F | 32 | 0 | Ethiopic |
| Cherokee | U+13A0..U+13FF | 96 | 0 | Cherokee |
| Unified Canadian Aboriginal Syllabics | U+1400..U+167F | 640 | 0 | Canadian Aboriginal |
| Ogham | U+1680..U+169F | 32 | 0 | Ogham |
| Runic | U+16A0..U+16FF | 96 | 0 | Runic |
| Tagalog | U+1700..U+171F | 32 | 0 | Tagalog |
| Hanunoo | U+1720..U+173F | 32 | 0 | Hanunoo |
| Buhid | U+1740..U+175F | 32 | 0 | Buhid |
| Tagbanwa | U+1760..U+177F | 32 | 0 | Tagbanwa |
| Khmer | U+1780..U+17FF | 128 | 0 | Khmer |
| Mongolian | U+1800..U+18AF | 176 | 0 | Mongolian |
| Unified Canadian Aboriginal Syllabics Extended | U+18B0..U+18FF | 80 | 0 | Canadian Aboriginal |
| Limbu | U+1900..U+194F | 80 | 0 | Limbu |
| Tai Le | U+1950..U+197F | 48 | 0 | Tai Le |
| New Tai Lue | U+1980..U+19DF | 96 | 0 | New Tai Lue |
| Khmer Symbols | U+19E0..U+19FF | 32 | 0 | Khmer |
| Buginese | U+1A00..U+1A1F | 32 | 0 | Buginese |
| Tai Tham | U+1A20..U+1AAF | 144 | 0 | Tai Tham |
| Combining Diacritical Marks Extended | U+1AB0..U+1AFF | 80 | 0 | Diacritics |
| Balinese | U+1B00..U+1B7F | 128 | 0 | Balinese |
| Sundanese | U+1B80..U+1BBF | 64 | 0 | Sundanese |
| Batak | U+1BC0..U+1BFF | 64 | 0 | Batak |
| Lepcha | U+1C00..U+1C4F | 80 | 0 | Lepcha |
| Ol Chiki | U+1C50..U+1C7F | 48 | 0 | Ol Chiki |
| Cyrillic Extended-C | U+1C80..U+1C8F | 16 | 0 | Cyrillic |
| Georgian Extended | U+1C90..U+1CBF | 48 | 0 | Georgian |
| Sundanese Supplement | U+1CC0..U+1CCF | 16 | 0 | Sundanese |
| Vedic Extensions | U+1CD0..U+1CFF | 48 | 0 | Devanagari (Vedic) |
| Phonetic Extensions | U+1D00..U+1D7F | 128 | 0 | Phonetic |
| Phonetic Extensions Supplement | U+1D80..U+1DBF | 64 | 0 | Phonetic |
| Combining Diacritical Marks Supplement | U+1DC0..U+1DFF | 64 | 0 | Diacritics |
| Latin Extended Additional | U+1E00..U+1EFF | 256 | 0 | Latin |
| Greek Extended | U+1F00..U+1FFF | 256 | 0 | Greek |
| General Punctuation | U+2000..U+206F | 112 | 0 | Punctuation |
| Superscripts and Subscripts | U+2070..U+209F | 48 | 0 | Symbols |
| Currency Symbols | U+20A0..U+20CF | 48 | 0 | Currency |
| Combining Diacritical Marks for Symbols | U+20D0..U+20FF | 48 | 0 | Diacritics |
| Letterlike Symbols | U+2100..U+214F | 80 | 0 | Symbols |
| Number Forms | U+2150..U+218F | 64 | 0 | Numbers |
| Arrows | U+2190..U+21FF | 112 | 0 | Arrows |
| Mathematical Operators | U+2200..U+22FF | 256 | 0 | Math |
| Miscellaneous Technical | U+2300..U+23FF | 256 | 0 | Technical |
| Control Pictures | U+2400..U+243F | 64 | 0 | Controls |
| Optical Character Recognition | U+2440..U+245F | 32 | 0 | OCR |
| Enclosed Alphanumerics | U+2460..U+24FF | 160 | 0 | Enclosed |
| Box Drawing | U+2500..U+257F | 128 | 0 | Box drawing |
| Block Elements | U+2580..U+259F | 32 | 0 | Blocks |
| Geometric Shapes | U+25A0..U+25FF | 96 | 0 | Geometric |
| Miscellaneous Symbols | U+2600..U+26FF | 256 | 0 | Symbols |
| Dingbats | U+2700..U+27BF | 192 | 0 | Dingbats |
| Miscellaneous Mathematical Symbols-A | U+27C0..U+27EF | 48 | 0 | Math |
| Supplemental Arrows-A | U+27F0..U+27FF | 16 | 0 | Arrows |
| Braille Patterns | U+2800..U+28FF | 256 | 0 | Braille |
| Supplemental Arrows-B | U+2900..U+297F | 128 | 0 | Arrows |
| Miscellaneous Mathematical Symbols-B | U+2980..U+29FF | 128 | 0 | Math |
| Supplemental Mathematical Operators | U+2A00..U+2AFF | 256 | 0 | Math |
| Miscellaneous Symbols and Arrows | U+2B00..U+2BFF | 256 | 0 | Symbols/Arrows |
| Glagolitic | U+2C00..U+2C5F | 96 | 0 | Glagolitic |
| Latin Extended-C | U+2C60..U+2C7F | 32 | 0 | Latin |
| Coptic | U+2C80..U+2CFF | 128 | 0 | Coptic |
| Georgian Supplement | U+2D00..U+2D2F | 48 | 0 | Georgian |
| Tifinagh | U+2D30..U+2D7F | 80 | 0 | Tifinagh |
| Ethiopic Extended | U+2D80..U+2DDF | 96 | 0 | Ethiopic |
| Cyrillic Extended-A | U+2DE0..U+2DFF | 32 | 0 | Cyrillic |
| Supplemental Punctuation | U+2E00..U+2E7F | 128 | 0 | Punctuation |
| CJK Radicals Supplement | U+2E80..U+2EFF | 128 | 0 | CJK |
| Kangxi Radicals | U+2F00..U+2FDF | 224 | 0 | CJK |
| Ideographic Description Characters | U+2FF0..U+2FFF | 16 | 0 | CJK |
| CJK Symbols and Punctuation | U+3000..U+303F | 64 | 0 | CJK |
| Hiragana | U+3040..U+309F | 96 | 0 | Japanese (Hiragana) |
| Katakana | U+30A0..U+30FF | 96 | 0 | Japanese (Katakana) |
| Bopomofo | U+3100..U+312F | 48 | 0 | Bopomofo |
| Hangul Compatibility Jamo | U+3130..U+318F | 96 | 0 | Hangul |
| Kanbun | U+3190..U+319F | 16 | 0 | Kanbun |
| Bopomofo Extended | U+31A0..U+31BF | 32 | 0 | Bopomofo |
| CJK Strokes | U+31C0..U+31EF | 48 | 0 | CJK |
| Katakana Phonetic Extensions | U+31F0..U+31FF | 16 | 0 | Katakana |
| Enclosed CJK Letters and Months | U+3200..U+32FF | 256 | 0 | Enclosed CJK |
| CJK Compatibility | U+3300..U+33FF | 256 | 0 | CJK compatibility |
| CJK Unified Ideographs Extension A | U+3400..U+4DBF | 6,592 | 0 | CJK |
| Yijing Hexagram Symbols | U+4DC0..U+4DFF | 64 | 0 | Yijing |
| CJK Unified Ideographs | U+4E00..U+9FFF | 20,992 | 0 | CJK |
| Yi Syllables | U+A000..U+A48F | 1,168 | 0 | Yi |
| Yi Radicals | U+A490..U+A4CF | 64 | 0 | Yi |
| Lisu | U+A4D0..U+A4FF | 48 | 0 | Lisu |
| Vai | U+A500..U+A63F | 320 | 0 | Vai |
| Cyrillic Extended-B | U+A640..U+A69F | 96 | 0 | Cyrillic |
| Bamum | U+A6A0..U+A6FF | 96 | 0 | Bamum |
| Modifier Tone Letters | U+A700..U+A71F | 32 | 0 | Phonetic |
| Latin Extended-D | U+A720..U+A7FF | 224 | 0 | Latin |
| Syloti Nagri | U+A800..U+A82F | 48 | 0 | Syloti Nagri |
| Common Indic Number Forms | U+A830..U+A83F | 16 | 0 | Indic numbers |
| Phags-pa | U+A840..U+A87F | 64 | 0 | Phags-pa |
| Saurashtra | U+A880..U+A8DF | 96 | 0 | Saurashtra |
| Devanagari Extended | U+A8E0..U+A8FF | 32 | 0 | Devanagari |
| Kayah Li | U+A900..U+A92F | 48 | 0 | Kayah Li |
| Rejang | U+A930..U+A95F | 48 | 0 | Rejang |
| Hangul Jamo Extended-A | U+A960..U+A97F | 32 | 0 | Hangul |
| Javanese | U+A980..U+A9DF | 96 | 0 | Javanese |
| Myanmar Extended-B | U+A9E0..U+A9FF | 32 | 0 | Myanmar |
| Cham | U+AA00..U+AA5F | 96 | 0 | Cham |
| Myanmar Extended-A | U+AA60..U+AA7F | 32 | 0 | Myanmar |
| Tai Viet | U+AA80..U+AADF | 96 | 0 | Tai Viet |
| Meetei Mayek Extensions | U+AAE0..U+AAFF | 32 | 0 | Meetei Mayek |
| Ethiopic Extended-A | U+AB00..U+AB2F | 48 | 0 | Ethiopic |
| Latin Extended-E | U+AB30..U+AB6F | 64 | 0 | Latin |
| Meetei Mayek | U+ABC0..U+ABFF | 64 | 0 | Meetei Mayek |
| Hangul Syllables | U+AC00..U+D7AF | 11,184 | 0 | Hangul |
| Hangul Jamo Extended-B | U+D7B0..U+D7FF | 80 | 0 | Hangul |
| High Surrogates | U+D800..U+DB7F | 896 | 0 | Surrogates |
| High Private Use Surrogates | U+DB80..U+DBFF | 128 | 0 | Surrogates |
| Low Surrogates | U+DC00..U+DFFF | 1,024 | 0 | Surrogates |
| Private Use Area | U+E000..U+F8FF | 6,400 | 0 | Private use |
| CJK Compatibility Ideographs | U+F900..U+FAFF | 512 | 0 | CJK compatibility |
| Alphabetic Presentation Forms | U+FB00..U+FB4F | 80 | 0 | Presentation forms |
| Arabic Presentation Forms-A | U+FB50..U+FDFF | 688 | 0 | Arabic |
| Variation Selectors | U+FE00..U+FE0F | 16 | 0 | Variation selectors |
| Vertical Forms | U+FE10..U+FE1F | 16 | 0 | Vertical |
| Combining Half Marks | U+FE20..U+FE2F | 16 | 0 | Half marks |
| CJK Compatibility Forms | U+FE30..U+FE4F | 32 | 0 | CJK compatibility |
| Small Form Variants | U+FE50..U+FE6F | 32 | 0 | Variants |
| Arabic Presentation Forms-B | U+FE70..U+FEFF | 144 | 0 | Arabic |
| Halfwidth and Fullwidth Forms | U+FF00..U+FFEF | 240 | 0 | Halfwidth/fullwidth |
| Specials | U+FFF0..U+FFFF | 16 | 0 | Special |
| Linear B Syllabary | U+10000..U+1007F | 128 | 1 | Linear B |
| Linear B Ideograms | U+10080..U+100FF | 128 | 1 | Linear B |
| Aegean Numbers | U+10100..U+1013F | 64 | 1 | Aegean |
| Ancient Greek Numbers | U+10140..U+1018F | 80 | 1 | Greek |
| Ancient Symbols | U+10190..U+101CF | 64 | 1 | Ancient |
| Phaistos Disc | U+101D0..U+101FF | 48 | 1 | Phaistos |
| Lycian | U+10280..U+1029F | 32 | 1 | Lycian |
| Carian | U+102A0..U+102DF | 64 | 1 | Carian |
| Coptic Epact Numbers | U+102E0..U+102FF | 32 | 1 | Coptic |
| Old Italic | U+10300..U+1032F | 48 | 1 | Old Italic |
| Gothic | U+10330..U+1034F | 32 | 1 | Gothic |
| Old Permic | U+10350..U+1037F | 48 | 1 | Old Permic |
| Ugaritic | U+10380..U+1039F | 32 | 1 | Ugaritic |
| Old Persian | U+103A0..U+103DF | 64 | 1 | Old Persian |
| Deseret | U+10400..U+1044F | 80 | 1 | Deseret |
| Shavian | U+10450..U+1047F | 48 | 1 | Shavian |
| Osmanya | U+10480..U+104AF | 48 | 1 | Osmanya |
| Osage | U+104B0..U+104FF | 80 | 1 | Osage |
| Elbasan | U+10500..U+1052F | 48 | 1 | Elbasan |
| Caucasian Albanian | U+10530..U+1056F | 64 | 1 | Caucasian Albanian |
| Vithkuqi | U+10570..U+105BF | 80 | 1 | Vithkuqi |
| Linear A | U+10600..U+1077F | 384 | 1 | Linear A |
| Cypriot Syllabary | U+10800..U+1083F | 64 | 1 | Cypriot |
| Imperial Aramaic | U+10840..U+1085F | 32 | 1 | Aramaic |
| Palmyrene | U+10860..U+1087F | 32 | 1 | Palmyrene |
| Nabataean | U+10880..U+108AF | 48 | 1 | Nabataean |
| Hatran | U+108E0..U+108FF | 32 | 1 | Hatran |
| Phoenician | U+10900..U+1091F | 32 | 1 | Phoenician |
| Lydian | U+10920..U+1093F | 32 | 1 | Lydian |
| Sidetic | U+10940..U+1095F | 32 | 1 | Sidetic |
| Meroitic Hieroglyphs | U+10980..U+1099F | 32 | 1 | Meroitic |
| Meroitic Cursive | U+109A0..U+109FF | 96 | 1 | Meroitic |
| Kharoshthi | U+10A00..U+10A5F | 96 | 1 | Kharoshthi |
| Old South Arabian | U+10A60..U+10A7F | 32 | 1 | Old South Arabian |
| Old North Arabian | U+10A80..U+10A9F | 32 | 1 | Old North Arabian |
| Manichaean | U+10AC0..U+10AFF | 64 | 1 | Manichaean |
| Avestan | U+10B00..U+10B3F | 64 | 1 | Avestan |
| Inscriptional Parthian | U+10B40..U+10B5F | 32 | 1 | Parthian |
| Inscriptional Pahlavi | U+10B60..U+10B7F | 32 | 1 | Pahlavi |
| Psalter Pahlavi | U+10B80..U+10BAF | 48 | 1 | Pahlavi |
| Old Turkic | U+10C00..U+10C4F | 80 | 1 | Old Turkic |
| Old Hungarian | U+10C80..U+10CFF | 128 | 1 | Old Hungarian |
| Hanifi Rohingya | U+10D00..U+10D3F | 64 | 1 | Hanifi Rohingya |
| Yezidi | U+10E80..U+10EBF | 64 | 1 | Yezidi |
| Cypro-Minoan | U+12F90..U+12FFF | 112 | 1 | Cypro-Minoan |
| Egyptian Hieroglyphs | U+13000..U+1342F | 1,072 | 1 | Egyptian Hieroglyphs |
| Anatolian Hieroglyphs | U+14400..U+1467F | 640 | 1 | Anatolian Hieroglyphs |
| Bamum Supplement | U+16800..U+16A3F | 576 | 1 | Bamum |
| Mro | U+16A40..U+16A6F | 48 | 1 | Mro |
| Tangsa | U+16A70..U+16ACF | 96 | 1 | Tangsa |
| Bassa Vah | U+16AD0..U+16AFF | 48 | 1 | Bassa Vah |
| Pahawh Hmong | U+16B00..U+16B8F | 144 | 1 | Pahawh Hmong |
| Medefaidrin | U+16E40..U+16E9F | 96 | 1 | Medefaidrin |
| Beria Erfe | U+16EA0..U+16EDF | 64 | 1 | Beria Erfe |
| Miao | U+16F00..U+16F9F | 160 | 1 | Miao |
| Ideographic Symbols and Punctuation | U+16FE0..U+16FFF | 32 | 1 | CJK symbols |
| Tangut | U+17000..U+187FF | 6,144 | 1 | Tangut |
| Tangut Components | U+18800..U+18AFF | 768 | 1 | Tangut |
| Khitan Small Script | U+18B00..U+18CFF | 512 | 1 | Khitan |
| Tangut Components Supplement | U+18D80..U+18DFF | 128 | 1 | Tangut |
| Kana Supplement | U+1B000..U+1B0FF | 256 | 1 | Kana |
| Kana Extended-A | U+1B100..U+1B12F | 48 | 1 | Kana |
| Nushu | U+1B170..U+1B2FF | 400 | 1 | Nushu |
| Duployan | U+1BC00..U+1BC9F | 160 | 1 | Duployan |
| Shorthand Format Controls | U+1BCA0..U+1BCAF | 16 | 1 | Shorthand |
| Znamenny Musical Notation | U+1CF00..U+1CFCF | 208 | 1 | Musical notation |
| Byzantine Musical Symbols | U+1D000..U+1D0FF | 256 | 1 | Musical symbols |
| Musical Symbols | U+1D100..U+1D1FF | 256 | 1 | Musical symbols |
| Ancient Greek Musical Notation | U+1D200..U+1D24F | 80 | 1 | Musical notation |
| Kaktovik Numerals | U+1D2C0..U+1D2DF | 32 | 1 | Numerals |
| Mayan Numerals | U+1D2E0..U+1D2FF | 32 | 1 | Mayan |
| Tai Xuan Jing Symbols | U+1D300..U+1D35F | 96 | 1 | Symbols |
| Counting Rod Numerals | U+1D360..U+1D37F | 32 | 1 | Numerals |
| Mathematical Alphanumeric Symbols | U+1D400..U+1D7FF | 1,024 | 1 | Math |
| Sutton SignWriting | U+1D800..U+1DAAF | 688 | 1 | SignWriting |
| Latin Extended-G | U+1DF00..U+1DFFF | 256 | 1 | Latin |
| Nandinagari | U+119A0..U+119FF | 96 | 1 | Nandinagari |
| Wancho | U+1E2C0..U+1E2FF | 64 | 1 | Wancho |
| Tai Yo | U+1E6C0..U+1E6FF | 64 | 1 | Tai Yo |
| Ethiopic Extended-B | U+1E7E0..U+1E7FF | 32 | 1 | Ethiopic |
| Mende Kikakui | U+1E800..U+1E8DF | 224 | 1 | Mende Kikakui |
| Adlam | U+1E900..U+1E95F | 96 | 1 | Adlam |
| Indic Siyaq Numbers | U+1EC70..U+1ECBF | 80 | 1 | Indic numbers |
| Ottoman Siyaq Numbers | U+1ED00..U+1ED4F | 80 | 1 | Ottoman numbers |
| Arabic Mathematical Alphabetic Symbols | U+1EE00..U+1EEFF | 256 | 1 | Arabic math |
| Mahjong Tiles | U+1F000..U+1F02F | 48 | 1 | Mahjong |
| Domino Tiles | U+1F030..U+1F09F | 112 | 1 | Dominoes |
| Playing Cards | U+1F0A0..U+1F0FF | 96 | 1 | Cards |
| Enclosed Alphanumeric Supplement | U+1F100..U+1F1FF | 256 | 1 | Enclosed |
| Enclosed Ideographic Supplement | U+1F200..U+1F2FF | 256 | 1 | Enclosed CJK |
| Miscellaneous Symbols and Pictographs | U+1F300..U+1F5FF | 768 | 1 | Emojis/Symbols |
| Emoticons | U+1F600..U+1F64F | 80 | 1 | Emojis |
| Ornamental Dingbats | U+1F650..U+1F67F | 48 | 1 | Dingbats |
| Transport and Map Symbols | U+1F680..U+1F6FF | 128 | 1 | Symbols |
| Alchemical Symbols | U+1F700..U+1F77F | 128 | 1 | Alchemical |
| Geometric Shapes Extended | U+1F780..U+1F7FF | 128 | 1 | Geometric |
| Supplemental Arrows-C | U+1F800..U+1F8FF | 256 | 1 | Arrows |
| Supplemental Symbols and Pictographs | U+1F900..U+1F9FF | 256 | 1 | Emojis/Symbols |
| Chess Symbols | U+1FA00..U+1FA6F | 112 | 1 | Chess |
| Symbols and Pictographs Extended-A | U+1FA70..U+1FAFF | 144 | 1 | Emojis/Symbols |
| Miscellaneous Symbols Supplement | U+1CEC0..U+1CEFF | 64 | 1 | Symbols |
| Sharada Supplement | U+11B60..U+11B7F | 32 | 1 | Sharada |
| CJK Unified Ideographs Extension B | U+20000..U+2A6DF | 42,720 | 2 | CJK |
| CJK Unified Ideographs Extension C | U+2A700..U+2B73F | 4,160 | 2 | CJK |
| CJK Unified Ideographs Extension D | U+2B740..U+2B81F | 224 | 2 | CJK |
| CJK Unified Ideographs Extension E | U+2B820..U+2CEAF | 5,776 | 2 | CJK |
| CJK Unified Ideographs Extension F | U+2CEB0..U+2EBEF | 7,488 | 2 | CJK |
| CJK Compatibility Ideographs Supplement | U+2F800..U+2FA1F | 544 | 2 | CJK compatibility |
| CJK Unified Ideographs Extension G | U+30000..U+3134F | 4,944 | 3 | CJK |
| CJK Unified Ideographs Extension J | U+323B0..U+3347F | 4,304 | 3 | CJK |
| Tags | U+E0000..U+E007F | 128 | 14 | Tags |
| Variation Selectors Supplement | U+E0100..U+E01EF | 240 | 14 | Variation selectors |
| Supplementary Private Use Area-A | U+F0000..U+FFFFF | 65,536 | 15 | Private use |
| Supplementary Private Use Area-B | U+100000..U+10FFFF | 65,536 | 16 | Private use |
| No Block | (code points outside every named block, such as U+2FE0..U+2FEF and all of Planes 4–13) | Varies | Varies | Unassigned |
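
The size and plane columns in the table follow directly from each range: the size is the number of code points from the first to the last value inclusive, and the plane is the code point divided by 65,536 (0x10000). The following Python sketch shows that arithmetic together with two sanity checks, 16-code-point alignment and overlap detection. The helper names (`parse_range`, `block_size`, `plane_of`, `is_aligned`, `overlaps`) are illustrative assumptions, not part of any Unicode library.

```python
import re

# Matches ranges written the way the table writes them, e.g. "U+0400..U+04FF".
RANGE_RE = re.compile(r"U\+([0-9A-F]{4,6})\.\.U\+([0-9A-F]{4,6})")


def parse_range(text):
    """Return (start, end) code points from a 'U+XXXX..U+YYYY' string."""
    match = RANGE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a block range: {text!r}")
    start, end = (int(group, 16) for group in match.groups())
    if start > end:
        raise ValueError(f"range start exceeds end: {text!r}")
    return start, end


def block_size(start, end):
    """Number of code points in the inclusive range."""
    return end - start + 1


def plane_of(code_point):
    """Plane number (0-16): each plane spans 0x10000 code points."""
    return code_point >> 16


def is_aligned(start, end):
    """True if the range starts at xxx0 and ends at xxxF (hexadecimal)."""
    return start % 16 == 0 and end % 16 == 15


def overlaps(a, b):
    """True if two inclusive (start, end) ranges share any code point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # Three rows copied from the table above.
    rows = {
        "Cyrillic": "U+0400..U+04FF",
        "Vai": "U+A500..U+A63F",
        "Tags": "U+E0000..U+E007F",
    }
    for name, text in rows.items():
        start, end = parse_range(text)
        print(name, block_size(start, end), plane_of(start), is_aligned(start, end))
    # prints:
    # Cyrillic 256 0 True
    # Vai 320 0 True
    # Tags 128 14 True
```

A check of this kind flags rows whose stated size does not match the range, ranges that do not begin and end on 16-code-point boundaries, and pairs of rows that claim the same code points.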