
Unicode block

A Unicode block is a contiguous range of code points in the Unicode codespace. Blocks are normatively defined to organize characters associated with a particular script, symbol set, or functional category. Examples include the Basic Latin block (U+0000–U+007F) for the ASCII characters and the CJK Unified Ideographs block (U+4E00–U+9FFF) for Han characters.^[1] Blocks are the primary structural units for grouping the 159,801 characters encoded in Unicode 17.0, which defines 346 blocks. They support text processing, rendering, and implementation in software and fonts.^[1]^[2] Blocks divide the 1,114,112-code-point codespace into non-overlapping ranges. Each block has a unique name, and names are matched loosely: comparisons ignore case, whitespace, hyphens, and underscores.^[1] Most blocks align with a single script or symbol collection, such as the Arabic block (U+0600–U+06FF) for right-to-left text or the Musical Symbols block (U+1D100–U+1D1FF) for notation. Some scripts, like Latin, span multiple blocks to accommodate extensions for diacritics, historical forms, and regional variants.^[1] Blocks may include unassigned or reserved code points for future expansion. Their boundaries always fall on multiples of 16: each block starts at a code point of the form U+xxx0 and ends at one of the form U+xxxF, matching the 16-code-point columns of the Unicode code charts.^[1] The exact ranges and names are specified in the Unicode Character Database file Blocks.txt, which implementations use as a shared reference across versions of the standard.^[2] In practice, Unicode blocks support key aspects of internationalization, including font design and input methods. In addition, default property values for unassigned code points are often specified by ranges that follow block boundaries, such as the right-to-left Bidi_Class defaults that the Unicode Bidirectional Algorithm applies to unassigned code points in the Hebrew and Arabic blocks.^[1] Developers can reference character groups programmatically through the normative Block property. This property alone does not determine semantic or behavioral traits, such as directionality or combining class; those require consulting the full Unicode Character Database.^[3] Code points that lie outside every defined block have the Block value "No_Block".^[1] The resulting organization covers 172 scripts and diverse symbol sets, from ancient systems like Anatolian Hieroglyphs (U+14400–U+1467F) to modern emoji in the Emoticons block (U+1F600–U+1F64F).^[1]
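The loose matching rule for block names can be illustrated with a short Python sketch. It normalizes a name by removing whitespace, hyphens, and underscores and then lower-casing the rest. It also drops an initial "is" prefix. That last step is our reading of rule UAX44-LM3 in UAX #44 and is an assumption to check against the current specification. The function names are hypothetical.

```python
import re

def loose_block_key(name: str) -> str:
    """Normalize a block name for loose matching (a sketch, not UCD code)."""
    # Remove whitespace, underscores, and hyphens, then ignore case.
    key = re.sub(r"[\s_-]+", "", name).lower()
    # Assumption: UAX44-LM3 also ignores an initial "is" prefix (as in "IsGreek").
    if key.startswith("is"):
        key = key[2:]
    return key

def same_block_name(a: str, b: str) -> bool:
    """True if two spellings refer to the same block under loose matching."""
    return loose_block_key(a) == loose_block_key(b)

print(same_block_name("Latin-1 Supplement", "latin_1_supplement"))  # True
print(same_block_name("Greek and Coptic", "IsGreekAndCoptic"))     # True
```

Normalizing both sides to a single key lets a program store one canonical key per block and look up any user-supplied spelling against it.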

Overview

Definition

A Unicode block is a named range of contiguous code points within the Unicode codespace. The Unicode Consortium defines blocks to organize characters for reference and documentation. Blocks never overlap, but they do not cover the whole codespace. Large stretches, such as Planes 4 through 13, belong to no block and have the Block value "No_Block". Even the surrogate range U+D800–U+DFFF, which is reserved for UTF-16 and never holds characters, is divided into blocks: High Surrogates, High Private Use Surrogates, and Low Surrogates.^[4] The size of a block is always a multiple of 16 code points. Sizes range from a minimum of 16 to a maximum of 65,536, a full plane, as with the two supplementary private use blocks. For instance, the Basic Latin block occupies U+0000–U+007F, encompassing 128 code points for fundamental Latin script characters.^[2] Each block receives a descriptive name reflecting its typical contents, such as "Latin Extended Additional" or "Enclosed Alphanumerics". The grouping is organizational rather than semantic and implies no specific properties or behaviors for the characters it contains. Unicode 17.0 defines 346 blocks within the 1,114,112 code points from U+0000 to U+10FFFF across 17 planes.^[4]^[2]
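Every block starts on a multiple of 16 and ends just before one, so its size follows directly from its range. The Python sketch below checks that alignment and computes the size. The helper name is hypothetical, and the sample ranges are taken from blocks mentioned in this article.

```python
def block_size(start: int, end: int) -> int:
    """Return the size of an inclusive block range after checking its alignment."""
    if end < start:
        raise ValueError("end must not precede start")
    # Blocks start at U+xxx0 and end at U+xxxF.
    if start % 16 != 0 or (end + 1) % 16 != 0:
        raise ValueError(f"U+{start:04X}..U+{end:04X} is not aligned to 16")
    size = end - start + 1
    if not 16 <= size <= 0x10000:
        raise ValueError("block sizes range from 16 to 65,536 code points")
    return size

print(block_size(0x0000, 0x007F))  # 128 (Basic Latin)
print(block_size(0x4E00, 0x9FFF))  # 20992 (CJK Unified Ideographs)
print(block_size(0x0860, 0x086F))  # 16 (Syriac Supplement)
```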

Purpose

Unicode blocks group related characters within the codespace by script, symbol type, or thematic category. For instance, the CJK Unified Ideographs block contains ideographs shared across Chinese, Japanese, and Korean writing, which makes the standard's large repertoire easier to reference. This coarse but practical framework helps organize the 159,801 characters encoded as of Unicode 17.0, both in documentation and in implementations.^[5]^[3] Blocks also serve practical purposes in software and user interfaces. In font design, they let developers prioritize glyph creation for specific groups of characters, so a font can support selected scripts and symbols without covering the entire repertoire. Similarly, character maps and input tools often present characters by block, letting users browse predefined groups rather than scan individual code points.^[6]^[3] The Block property, defined in the Unicode Character Database, also supports text processing. Databases and text processors can use it to filter or segment text by block, for example in internationalization work or data analysis. In regular expressions, block-based patterns offer a quick way to match sets of characters, but they are imprecise. A block can contain unassigned code points and characters of other scripts, while a single script's characters may be spread over several blocks. Blocks do not enforce semantic rules; they are organizational aids for features like character pickers. Blocks frequently correspond to scripts, but when script identity matters, the Script property gives the more precise grouping.^[3]^[7]
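The imprecision of block-based patterns is easy to demonstrate with Python's re module. Because re has no block syntax, the sketch below spells the Greek and Coptic block out as an explicit range. The pattern matches a Coptic letter and an unassigned code point inside the block, but it misses a Greek letter from the separate Greek Extended block. The pattern name and sample characters are illustrative choices.

```python
import re
import unicodedata

# Character class spanning exactly the Greek and Coptic block, U+0370..U+03FF.
GREEK_AND_COPTIC_BLOCK = re.compile(r"[\u0370-\u03FF]")

samples = [
    "\u03bb",  # GREEK SMALL LETTER LAMDA: Greek script, inside the block
    "\u03e2",  # COPTIC CAPITAL LETTER SHEI: Coptic script, inside the block
    "\u0378",  # unassigned code point inside the block
    "\u1f00",  # Greek letter in the separate Greek Extended block
]

for ch in samples:
    matched = GREEK_AND_COPTIC_BLOCK.fullmatch(ch) is not None
    name = unicodedata.name(ch, "<unassigned>")
    print(f"U+{ord(ch):04X} {name}: matched={matched}")
```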

History

Early development

The Unicode Standard version 1.0, released in October 1991, introduced character blocks as a way to organize code points within its 16-bit encoding space. That range, U+0000 to U+FFFF, later became the Basic Multilingual Plane (BMP).^[8] The structure was directly inspired by the design of plane 0 in the emerging ISO/IEC 10646 standard, aligning Unicode's initial repertoire with international efforts to create a universal character encoding.^[9] The 16-bit space was meant to hold the most commonly used characters, which made implementation in software and hardware efficient while leaving room for future growth.^[10] Unicode 1.0's initial blocks emphasized major world scripts to provide broad multilingual support from the outset. These included blocks for Latin (Basic Latin, Latin-1 Supplement, and extensions), Greek, Cyrillic, Armenian, Hebrew, Arabic, Thai, Georgian, and Hangul Jamo, along with provisions for CJK ideographs, which were detailed in a supplementary volume.^[8] This selection reflected a strategic focus on scripts with significant global usage, so that text from diverse languages could be encoded without immediate fragmentation.^[11] The block-based organization followed from the foundational objectives of the Unicode Consortium, founded in 1991: to achieve compatibility with prevalent legacy encodings while remaining extensible for unforeseen needs. 
For instance, the Basic Latin block (U+0000–U+007F) directly mirrored ASCII to ensure seamless integration with existing Western computing systems.^[12] At the same time, the modular block design grouped related characters logically, which simplified font development, searching, and collation. It also provided a framework for adding new scripts without disrupting established allocations.^[13] A pivotal event in this early period was the 1991 merger of the Unicode effort with the ISO/IEC JTC1/SC2 work on ISO/IEC 10646, after negotiations that resolved discrepancies in character repertoires and encoding forms. Meetings in October 1991 produced mutually acceptable adjustments. The result was unified block planning that synchronized Unicode's 16-bit space with plane 0 of ISO/IEC 10646 and prevented the two standards from diverging.^[9]^[14] This collaboration ensured that block definitions would evolve together, laying the groundwork for international adoption. Later versions, beginning with 2.0, expanded this structure significantly, adding supplementary planes and many more scripts.^[15]

Evolution through versions

The Unicode Standard version 2.0, released in July 1996, significantly expanded the encoding space. It formalized support for planes beyond the Basic Multilingual Plane (BMP) through the UTF-16 surrogate mechanism. It also added the Hangul Syllables block (U+AC00–U+D7AF), whose 11,172 precomposed Korean syllables occupy U+AC00–U+D7A3.^[16] This version established a foundational structure of approximately 68 blocks, broadening script coverage while maintaining compatibility with prior encodings.^[17] Subsequent versions built on this framework with targeted expansions. Unicode 3.0, released in September 1999, added 18 new blocks supporting 12 scripts and more than 10,000 new characters overall. The new scripts included historic ones such as Ogham (U+1680–U+169F) and Runic (U+16A0–U+16FF), alongside modern scripts like Cherokee (U+13A0–U+13FF) and Mongolian (U+1800–U+18AF).^[18] Unicode 5.0, released in July 2006, broadened symbol support with new symbol blocks and added five new scripts, including Phoenician (U+10900–U+1091F), across nine new blocks in total.^[19] A pivotal shift came in Unicode 6.0 (October 2010), which introduced the first blocks devoted largely to emoji: Emoticons (U+1F600–U+1F64F) and Miscellaneous Symbols and Pictographs (U+1F300–U+1F5FF). These blocks were part of more than 1,000 new symbols added in response to the needs of mobile and web communication.^[20] By Unicode 8.0 (June 2015), the Consortium had settled into an annual release cadence, which accelerated the inclusion of underrepresented scripts. Version 8.0, for example, added the South Asian scripts Ahom (U+11700–U+1173F) and Multani (U+11280–U+112AF).^[21] Unicode 17.0, released in September 2025, brings the total to 346 blocks and adds 4,803 new characters, including four new scripts such as Sidetic and Tolong Siki. These additions reflect ongoing efforts to document endangered writing systems.^[22] Overall growth has been substantial, from the BMP-only block set of version 1.0 to 346 blocks today. All but 14 of those blocks are in the BMP and the Supplementary Multilingual Plane (SMP), which leaves large areas free for future extensions.^[23] This progression shows a pattern of incremental, script-focused additions, with occasional adjustments to block ranges and names that preserve compatibility.^[3]

Design and properties

Code point allocation

Unicode blocks are fixed, non-overlapping ranges of code points within the Unicode codespace, which spans U+0000 to U+10FFFF. Each block begins at a code point that is a multiple of 16 (such as U+0000 or U+0100). It ends at a code point one less than a multiple of 16 (such as U+007F or U+017F), so that it fills whole 16-code-point columns in the code charts. For instance, the Basic Latin block occupies U+0000–U+007F, encompassing 128 code points for fundamental Latin characters. Blocks never overlap, but they leave gaps: code points that fall outside every block have the Block value "No_Block".^[4]^[2] The Unicode Consortium manages the allocation of code points into blocks through a formal proposal process. Proposals for new scripts or characters are submitted on a standardized form and reviewed first by technical officers. The Unicode Technical Committee (UTC) then considers them, assessing compatibility with existing encodings, script requirements, and stability policies. Approved allocations meet the needs of specific writing systems while leaving unassigned code points within blocks, so that later characters can be added without disturbing earlier assignments. New blocks are always allocated as contiguous ranges on 16-code-point boundaries.^[24]^[4] Block sizes vary widely with the size of the repertoire. The smallest blocks have 16 code points (for example, Syriac Supplement). Many script blocks, such as Devanagari or Latin Extended-A, have 128, and some, such as Cyrillic and Arabic, fill a 256-code-point chart page of 16×16. At the large end, the CJK Unified Ideographs block (U+4E00–U+9FFF) has 20,992 code points for East Asian ideographs, and each supplementary private use block has 65,536. 
The surrogate range U+D800–U+DFFF (2,048 code points) is reserved for UTF-16 encoding and is never assigned to characters, although Blocks.txt still divides it into three surrogate blocks. In addition, 66 noncharacter code points are permanently reserved for internal use and will never be assigned to characters. These are U+FDD0–U+FDEF plus the last two code points of every plane (for example, U+FFFE–U+FFFF).^[2]^[4]^[12] The codespace is divided into 17 planes of 65,536 code points each, and this division shapes block placement. Plane 0, the Basic Multilingual Plane (BMP, U+0000–U+FFFF), hosts blocks for widely used modern scripts like Latin, Greek, and Cyrillic, which also eases compatibility with legacy systems. Planes 1 through 16 are supplementary. Plane 1 (Supplementary Multilingual Plane, U+10000–U+1FFFF) holds historic scripts and symbols, Planes 2 and 3 hold additional CJK ideographs, Plane 14 holds special-purpose characters, and Planes 15–16 are reserved for private use. This plane-based organization guides block allocation, keeping the BMP for the most common characters and the higher planes for less common or newly encoded content.^[12]^[4] The Block property in the Unicode Character Database identifies the block containing any given code point, which aids implementation and analysis.^[4]
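The reserved categories described above can be recognized arithmetically. The Python sketch below classifies a code point as a surrogate, a noncharacter, a private use code point, or anything else. It relies on the fact that the last two code points of every plane end in FFFE or FFFF. The function name is hypothetical.

```python
def classify_code_point(cp: int) -> str:
    """Classify a code point by the reserved ranges of the codespace."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode codespace U+0000..U+10FFFF")
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"  # reserved for UTF-16, never a character
    # U+FDD0..U+FDEF plus U+xxFFFE and U+xxFFFF in each of the 17 planes.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    if (0xE000 <= cp <= 0xF8FF
            or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD):
        return "private use"
    return "other"  # an assigned character or reserved for future assignment

print(classify_code_point(0xD800))    # surrogate
print(classify_code_point(0x10FFFF))  # noncharacter
print(classify_code_point(0xFFFFD))   # private use
```

Testing for noncharacters before private use matters, because U+FFFFE and U+FFFFF sit at the end of Plane 15 next to the private use range.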

Block metadata

The Block property is a catalog property in the Unicode Character Database (UCD). It assigns a string value to each code point that names the block it belongs to, such as "Basic Latin" for code points U+0000 to U+007F.^[3] The property groups characters into contiguous ranges independently of other properties like the General Category.^[3] Block metadata is documented in the Blocks.txt file. Each non-comment line gives a block's hexadecimal range and official name in the form "start..end; Block Name".^[3] For instance, the entry for the Basic Latin block reads "0000..007F; Basic Latin".^[2] Code points not covered by any entry have the default value "No_Block".^[3] Unicode's stability policies do not freeze blocks the way they freeze character names. Some block names have been revised, with the earlier names kept as aliases; for example, the block originally named "Greek" is now "Greek and Coptic". Implementations should therefore take block ranges and names from the UCD version they target.^[3] Block names are descriptive, typically based on the primary script, symbol set, or functional category they represent, such as "Ethiopic" for the Ge'ez script or "Mathematical Operators" for mathematical symbols.^[3] Names are unique, mostly written in title case with spaces or hyphens, and restricted to ASCII. Short and long forms of each name are listed as aliases in the PropertyValueAliases.txt file.^[25] The UCD does not record a separate age for blocks. Instead, the Age property in DerivedAge.txt gives the Unicode version in which each code point was assigned, such as "7.0" for characters first encoded in Unicode 7.0. The version that introduced a block can usually be inferred from the earliest age among its code points.^[3] This information lets developers track how the encoding space has evolved without affecting existing assignments.^[3]
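Blocks.txt uses the simple "start..end; Block Name" format described above, so a block lookup takes only a few lines. The Python sketch below parses a short excerpt written in that format and finds a code point's block with a binary search; a real program would read the complete file from the UCD. Code points not covered by any parsed entry yield "No_Block". The class and method names are hypothetical. With only the excerpt loaded, code points in blocks missing from the excerpt also report "No_Block".

```python
import bisect

# A few entries in Blocks.txt format; lines may carry '#' comments.
BLOCKS_EXCERPT = """\
# Excerpt for illustration only
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0370..03FF; Greek and Coptic
4E00..9FFF; CJK Unified Ideographs
1F600..1F64F; Emoticons
"""

class BlockTable:
    """Map code points to block names using Blocks.txt-style data."""

    def __init__(self, text: str) -> None:
        entries = []
        for raw in text.splitlines():
            line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
            if not line:
                continue
            rng, name = line.split(";", 1)
            start_hex, end_hex = rng.strip().split("..")
            entries.append((int(start_hex, 16), int(end_hex, 16), name.strip()))
        entries.sort()
        self._entries = entries
        self._starts = [start for start, _, _ in entries]

    def block_of(self, cp: int) -> str:
        # Find the last block starting at or before cp, then check its end.
        i = bisect.bisect_right(self._starts, cp) - 1
        if i >= 0 and cp <= self._entries[i][1]:
            return self._entries[i][2]
        return "No_Block"

table = BlockTable(BLOCKS_EXCERPT)
print(table.block_of(ord("λ")))  # Greek and Coptic
print(table.block_of(0x1F600))   # Emoticons
print(table.block_of(0x50000))   # No_Block (Plane 5 has no blocks)
```

Because blocks never overlap, checking only the nearest preceding start is enough, and each lookup takes logarithmic time in the number of blocks.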

Classifications and relations

Connection to scripts and categories

Unicode blocks are organizational ranges of code points. They are distinct from character properties such as Script and General_Category, whose values are assigned character by character rather than to whole ranges.^[3] Because of this independence, blocks serve as a coarse grouping mechanism, while scripts and general categories support the finer-grained analysis that text processing needs. For instance, the Script property identifies the writing system associated with a character, such as "Latin" or "Greek," and is defined in the Scripts.txt data file.^[26] Blocks like Basic Latin (U+0000–U+007F), by contrast, are normative ranges specified in Blocks.txt, with no inherent ties to these properties.^[3] Scripts often span multiple blocks, which shows how independent the two groupings are. The Latin script, for example, extends across several blocks, including Basic Latin, Latin-1 Supplement (U+0080–U+00FF), and Latin Extended-A (U+0100–U+017F), covering characters used in many Latin-based languages.^[26] The Greek script likewise spans the Greek and Coptic block (U+0370–U+03FF) and the Greek Extended block (U+1F00–U+1FFF). The Greek and Coptic block, in turn, contains Coptic letters and some Common-script characters alongside Greek ones.^[3] Every character in the CJK Unified Ideographs block (U+4E00–U+9FFF) belongs to the Han script, yet the block holds only part of that script. Further Han ideographs are spread across its extension blocks and the CJK Compatibility Ideographs blocks.^[26] The General Category property is also assigned per character. It classifies individual code points with values such as "Lu" (uppercase letter), "Ll" (lowercase letter), "Sm" (math symbol), or "So" (other symbol), as recorded in UnicodeData.txt.^[3] Blocks do not impose uniform categories. The Miscellaneous Symbols block (U+2600–U+26FF), for instance, uses "So" for most of its pictographic symbols but also contains other categories, such as "Sm" for U+266F MUSIC SHARP SIGN.^[3] Such mixing rules out block-wide generalizations and calls for character-level evaluation. In practice, blocks support broad navigation and allocation, while character-level properties drive Unicode algorithms. Normalization, for example, depends on decomposition mappings and canonical combining classes, not on block structure. Similarly, the Default Unicode Collation Element Table (DUCET) orders letters in groups that largely follow scripts rather than block boundaries. These properties thus complement blocks by enabling semantic processing in applications like text rendering and internationalization.^[26]
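Python's standard unicodedata module reports the General_Category of each character, which makes this mixing easy to observe. The sketch below counts categories across the Miscellaneous Symbols block; the helper name is hypothetical. Exact counts depend on the Unicode version bundled with the interpreter, so no full output is shown. The final line reports "Sm" for U+266F MUSIC SHARP SIGN.

```python
import unicodedata
from collections import Counter

def category_counts(start: int, end: int) -> Counter:
    """Count General_Category values over an inclusive code point range."""
    return Counter(unicodedata.category(chr(cp)) for cp in range(start, end + 1))

# Miscellaneous Symbols block, U+2600..U+26FF.
counts = category_counts(0x2600, 0x26FF)
print(counts.most_common())
print(unicodedata.name("\u266f"), unicodedata.category("\u266f"))
```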

Organization across planes

The Unicode Standard organizes its code points into 17 planes, each comprising 65,536 consecutive positions, as a structured framework for character allocation.^[22] Plane 0, the Basic Multilingual Plane (BMP, U+0000 to U+FFFF), is the foundational layer. It hosts 163 blocks covering legacy encodings and the scripts most widely used in everyday text.^[2] These include core sets such as Basic Latin (U+0000–U+007F) for English and other Western European languages, CJK Unified Ideographs (U+4E00–U+9FFF) for Chinese hanzi, Japanese kanji, and Korean hanja, and Hangul Syllables (U+AC00–U+D7AF) for Korean. Together the BMP's approximately 63,951 assigned code points meet most common communication needs.^[27] Plane 1, the Supplementary Multilingual Plane (SMP, U+10000 to U+1FFFF), extends this structure with 169 blocks dedicated to historic, rare, and supplementary scripts, so that less frequently used writing systems can be represented.^[28] Notable examples include Egyptian Hieroglyphs (U+13000–U+1342F) for ancient Egyptian texts and Linear B Syllabary (U+10000–U+1007F) for Mycenaean Greek inscriptions. These blocks support scholarly and cultural preservation without crowding the BMP, which stays focused on modern usage. Blocks in this plane are often grouped thematically by script type, such as ancient scripts or minority languages, which simplifies implementation in software.^[2] Higher planes address specialized requirements beyond the primary scripts. Plane 2, the Supplementary Ideographic Plane (SIP, U+20000 to U+2FFFF), contains 7 blocks, mostly CJK ideograph extensions such as CJK Unified Ideographs Extension B (U+20000–U+2A6DF). These extensions greatly expand East Asian repertoires with rare and historical ideographs. 
Plane 3, the Tertiary Ideographic Plane (TIP, U+30000 to U+3FFFF), holds 3 blocks for further CJK extensions, mainly rare and historic ideographs. One of these is CJK Unified Ideographs Extension G (U+30000–U+3134F). Together, Planes 2 and 3 devote substantial code space to ideographs, and their large blocks reflect the scale of the Han script.^[2] Planes 4 through 13 are unallocated and held in reserve for future scripts or symbol sets, which keeps the encoding model scalable over the long term.^[22] Plane 14, the Supplementary Special-purpose Plane (U+E0000–U+EFFFF), has 2 blocks. Tags (U+E0000–U+E007F) was originally intended for language tagging, and Variation Selectors Supplement (U+E0100–U+E01EF) selects glyph variants. Planes 15 and 16 (U+F0000–U+10FFFF) are designated for private use, and each is filled by a single block. These are Supplementary Private Use Area-A (U+F0000–U+FFFFF) and Supplementary Private Use Area-B (U+100000–U+10FFFF). Their private use characters end at U+FFFFD and U+10FFFD, just before each plane's two noncharacters. Vendors and users can assign these characters to custom uses without conflicting with standardized assignments.^[2] Overall, 332 of Unicode 17.0's 346 blocks, about 96%, reside in Planes 0 and 1. This concentrates script diversity in those planes while leaving large gaps across the codespace for future growth.^[22] The distribution balances immediate compatibility with forward-looking extensibility, and the thematic groupings within planes help developers handle diverse linguistic data.^[3]
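Each plane spans 0x10000 code points, so the plane of a code point is simply its value shifted right by 16 bits. The Python sketch below maps several code points from this section to their planes. The helper names are hypothetical, and the plane names follow common Unicode usage.

```python
PLANE_NAMES = {
    0: "Basic Multilingual Plane",
    1: "Supplementary Multilingual Plane",
    2: "Supplementary Ideographic Plane",
    3: "Tertiary Ideographic Plane",
    14: "Supplementary Special-purpose Plane",
    15: "Supplementary Private Use Area-A",
    16: "Supplementary Private Use Area-B",
}

def plane_of(cp: int) -> int:
    """Return the plane number (0-16) of a code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode codespace")
    return cp >> 16  # each plane spans 0x10000 = 65,536 code points

def plane_name(cp: int) -> str:
    """Return a descriptive name for the plane containing cp."""
    return PLANE_NAMES.get(plane_of(cp), "unallocated plane")

for cp in (0x0041, 0x1F600, 0x20000, 0x30000, 0xE0100, 0x50000):
    print(f"U+{cp:04X}: plane {plane_of(cp)}, {plane_name(cp)}")
```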

Implementation

Software handling

Software systems query the Block property of a code point to determine its grouping for processing. In Java, the Character.UnicodeBlock class provides the static methods of(char c) and of(int codePoint), which return the block containing a given character or code point, and forName(String), which looks a block up by name. Its block constants follow the Unicode standard's Blocks.txt file.^[29] Python's standard unicodedata module, by contrast, exposes properties such as character names and general categories but not the Block property. Python programs therefore typically read Blocks.txt themselves or rely on a third-party library.^[30] In text processing applications, blocks support filtering and validation tasks, such as restricting input to specific scripts for localization. For instance, developers can filter text to keep only characters from the CJK Unified Ideographs blocks when supporting East Asian languages in multilingual interfaces or data validation routines.^[31] The same approach appears in natural language processing pipelines, where block-based scoring helps filter corpora by excluding unwanted scripts or symbols.^[32] Blocks can also help keep data compact. In font subsetting, tools generate reduced-size font files that contain only the glyphs for relevant code point ranges, such as Basic Latin for English text. This shortens load times and reduces memory use in web and application contexts.^[33] In databases, block-aware indexing can support efficient querying and collation of internationalized text, reducing overhead in multilingual storage systems.^[34] A key challenge in handling blocks is unassigned code points, which are reserved within blocks for future allocation to maintain forward compatibility. 
Implementations must give these code points default property values, such as the General_Category value Cn (unassigned). Some properties use range-specific defaults; for example, unassigned code points in the Hebrew block default to a right-to-left Bidi_Class. These defaults keep algorithms like normalization and segmentation working correctly as Unicode evolves.^[35] The Blocks.txt file from the Unicode Character Database is the authoritative source for block ranges, and software updates track it as new characters are assigned.^[2]
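A block filter reduces to a range check on each code point. The minimal Python sketch below keeps only characters from the CJK Unified Ideographs block; the names are hypothetical. As the second call shows, it drops ideographs from the extension blocks, such as U+3400 in Extension A. A filter meant to keep all Han characters would therefore need the extension ranges as well, or a Script-based test.

```python
# The CJK Unified Ideographs block, U+4E00..U+9FFF (inclusive).
CJK_UNIFIED_IDEOGRAPHS = range(0x4E00, 0x9FFF + 1)

def keep_block(text: str, block: range) -> str:
    """Keep only the characters whose code points fall inside a block range."""
    return "".join(ch for ch in text if ord(ch) in block)

print(keep_block("Unicode 统一码 テスト", CJK_UNIFIED_IDEOGRAPHS))  # 统一码
print(keep_block("\u3400\u4e00", CJK_UNIFIED_IDEOGRAPHS))           # 一
```

Using a range object keeps each membership test constant-time without materializing the 20,992 code points.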

Font and rendering support

Font formats such as OpenType and TrueType use the 'cmap' table to map Unicode code points to glyph indices, with subtables identified by platform and encoding.^[36] Format 13 subtables map whole ranges of code points to a single glyph. They are commonly used in last-resort fonts, which display one representative glyph for each block that is not otherwise supported.^[37] Rendering pipelines use Unicode properties, including the Script property, for script detection and text shaping. HarfBuzz, an open-source shaping engine, uses the Unicode Script property to segment text by script and apply OpenType features for complex layouts. It resolves characters of the Common script from their context with heuristics, which supports accurate glyph positioning and ligature formation. Blocks may help group code points, but script assignment comes from the Unicode data files. Similarly, Apple's Core Text framework uses Unicode properties, particularly Script, for script-specific shaping, working with font tables to handle bidirectional text and contextual forms.^[26]^[38] Font coverage of Unicode blocks is usually partial. A single OpenType font can contain at most 65,535 glyphs, far fewer than the 159,801 characters of Unicode 17.0, so no one font can cover all of its blocks. The Noto font family, developed by Google, addresses this with a modular design of many fonts. Together they provide glyphs for characters in over 150 blocks and more than 77,000 characters, with priority given to commonly used scripts.^[39] For rare or specialized blocks, such as historical scripts or symbols, rendering engines implement fallback strategies. They automatically substitute glyphs from secondary fonts to avoid blank boxes or "tofu".^[40] Tools for debugging font and rendering support include Unicode block viewers integrated into development environments. 
In Visual Studio Code, extensions like Unicode Code Point display code point details, including block affiliations, in the status bar to help identify missing glyphs and test rendering across fonts.^[41]
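Per-block coverage is a common way to summarize what a font supports. The sketch below assumes the font's mapped code points are already available as a set. In practice they would be read from the font's 'cmap' table, for instance with the fontTools library's getBestCmap(). The function name and the ASCII-only sample font are hypothetical.

```python
import unicodedata

def block_coverage(mapped: set, start: int, end: int) -> float:
    """Fraction of a block's assigned characters that a font maps to glyphs.

    `mapped` is assumed to be the set of code points in the font's cmap.
    Code points that Python's bundled UCD reports as unassigned ('Cn')
    are left out of the denominator.
    """
    assigned = [cp for cp in range(start, end + 1)
                if unicodedata.category(chr(cp)) != "Cn"]
    if not assigned:
        return 0.0
    return sum(cp in mapped for cp in assigned) / len(assigned)

# Hypothetical font that covers only printable ASCII (U+0020..U+007E).
ascii_font = set(range(0x20, 0x7F))
print(block_coverage(ascii_font, 0x0000, 0x007F))  # 0.7421875 (95 of 128)
```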

Current and historical blocks

List of blocks in Unicode 17.0

Unicode 17.0, released on September 9, 2025, defines 346 contiguous, non-overlapping blocks within the codespace of 1,114,112 possible code points. Each block is dedicated to characters from specific scripts or symbol categories, or to reserved areas such as the surrogate and private use ranges. Grouping related characters this way simplifies implementation in software and fonts. Block sizes range from 16 code points (for small supplements) to 65,536 (the two supplementary private use blocks, each of which fills a plane). Blocks occur only in Planes 0–3 and 14–16, and most lie in Planes 0 and 1, covering modern and historical scripts, emoji, and technical symbols.^[22] The release adds 4,803 new characters, for a total of 159,801 encoded characters.^[22] It also introduces eight new blocks, four of them for newly encoded scripts: Sidetic (U+10940–U+1095F, 32 code points, Plane 1, Sidetic script), Tolong Siki (U+11DB0–U+11DEF, 64 code points, Plane 1, Tolong Siki script), Beria Erfe (U+16EA0–U+16EDF, 64 code points, Plane 1, Beria Erfe script), and Tai Yo (U+1E6C0–U+1E6FF, 64 code points, Plane 1, Tai Yo script). 
The other new blocks are Sharada Supplement (U+11B60–U+11B7F, 32 code points, Plane 1, Sharada), Tangut Components Supplement (U+18D80–U+18DFF, 128 code points, Plane 1, Tangut), Miscellaneous Symbols Supplement (U+1CEC0–U+1CEFF, 64 code points, Plane 1, symbols including asteroids), and CJK Unified Ideographs Extension J (U+323B0–U+3347F, 4,304 code points, Plane 3, CJK).^[22] Characters were also added to existing blocks, including new emoji in Emoji 17.0.^[42] A few examples show the diversity of blocks. The Basic Latin block (U+0000..U+007F, 128 code points, Plane 0) holds the ASCII characters. The CJK Unified Ideographs block (U+4E00..U+9FFF, 20,992 code points, Plane 0) encodes Han ideographs shared by Chinese, Japanese, and Korean. Emoji-heavy blocks like Miscellaneous Symbols and Pictographs (U+1F300..U+1F5FF, 768 code points, Plane 1) hold hundreds of pictographic symbols.^[2] Within blocks, some code points remain unassigned and reserved for future allocation, and a few are permanently reserved as noncharacters, as recorded in the official Unicode data files.^[3] The table below lists a selection of blocks in ascending code point order, covering the BMP from U+0000 through U+29FF. The "Primary Script/Association" column gives each block's main script or symbol type, although many blocks serve multiple uses. The full list of 346 blocks is available in Blocks.txt in the Unicode Character Database.^[2]

Block Name	Hex Range	Size	Plane	Primary Script/Association
Basic Latin	U+0000..U+007F	128	0	Latin
Latin-1 Supplement	U+0080..U+00FF	128	0	Latin
Latin Extended-A	U+0100..U+017F	128	0	Latin
Latin Extended-B	U+0180..U+024F	208	0	Latin
IPA Extensions	U+0250..U+02AF	96	0	Phonetic (IPA)
Spacing Modifier Letters	U+02B0..U+02FF	80	0	Modifier letters
Combining Diacritical Marks	U+0300..U+036F	112	0	Diacritics
Greek and Coptic	U+0370..U+03FF	144	0	Greek
Cyrillic	U+0400..U+04FF	256	0	Cyrillic
Cyrillic Supplement	U+0500..U+052F	48	0	Cyrillic
Armenian	U+0530..U+058F	96	0	Armenian
Hebrew	U+0590..U+05FF	112	0	Hebrew
Arabic	U+0600..U+06FF	256	0	Arabic
Syriac	U+0700..U+074F	80	0	Syriac
Arabic Supplement	U+0750..U+077F	48	0	Arabic
Thaana	U+0780..U+07BF	64	0	Thaana
NKo	U+07C0..U+07FF	64	0	N'Ko
Samaritan	U+0800..U+083F	64	0	Samaritan
Mandaic	U+0840..U+085F	32	0	Mandaic
Syriac Supplement	U+0860..U+086F	16	0	Syriac
Arabic Extended-B	U+0870..U+089F	48	0	Arabic
Arabic Extended-A	U+08A0..U+08FF	96	0	Arabic
Devanagari	U+0900..U+097F	128	0	Devanagari
Bengali	U+0980..U+09FF	128	0	Bengali
Gurmukhi	U+0A00..U+0A7F	128	0	Gurmukhi
Gujarati	U+0A80..U+0AFF	128	0	Gujarati
Oriya	U+0B00..U+0B7F	128	0	Odia
Tamil	U+0B80..U+0BFF	128	0	Tamil
Telugu	U+0C00..U+0C7F	128	0	Telugu
Kannada	U+0C80..U+0CFF	128	0	Kannada
Malayalam	U+0D00..U+0D7F	128	0	Malayalam
Sinhala	U+0D80..U+0DFF	128	0	Sinhala
Thai	U+0E00..U+0E7F	128	0	Thai
Lao	U+0E80..U+0EFF	128	0	Lao
Tibetan	U+0F00..U+0FFF	256	0	Tibetan
Myanmar	U+1000..U+109F	160	0	Myanmar
Georgian	U+10A0..U+10FF	96	0	Georgian
Hangul Jamo	U+1100..U+11FF	256	0	Hangul
Ethiopic	U+1200..U+137F	384	0	Ethiopic
Ethiopic Supplement	U+1380..U+139F	32	0	Ethiopic
Cherokee	U+13A0..U+13FF	96	0	Cherokee
Unified Canadian Aboriginal Syllabics	U+1400..U+167F	640	0	Canadian Aboriginal
Ogham	U+1680..U+169F	32	0	Ogham
Runic	U+16A0..U+16FF	96	0	Runic
Tagalog	U+1700..U+171F	32	0	Tagalog
Hanunoo	U+1720..U+173F	32	0	Hanunoo
Buhid	U+1740..U+175F	32	0	Buhid
Tagbanwa	U+1760..U+177F	32	0	Tagbanwa
Khmer	U+1780..U+17FF	128	0	Khmer
Mongolian	U+1800..U+18AF	176	0	Mongolian
Unified Canadian Aboriginal Syllabics Extended	U+18B0..U+18FF	80	0	Canadian Aboriginal
Limbu	U+1900..U+194F	80	0	Limbu
Tai Le	U+1950..U+197F	48	0	Tai Le
New Tai Lue	U+1980..U+19DF	96	0	New Tai Lue
Khmer Symbols	U+19E0..U+19FF	32	0	Khmer
Buginese	U+1A00..U+1A1F	32	0	Buginese
Tai Tham	U+1A20..U+1AAF	144	0	Tai Tham
Combining Diacritical Marks Extended	U+1AB0..U+1AFF	80	0	Diacritics
Balinese	U+1B00..U+1B7F	128	0	Balinese
Sundanese	U+1B80..U+1BBF	64	0	Sundanese
Batak	U+1BC0..U+1BFF	64	0	Batak
Lepcha	U+1C00..U+1C4F	80	0	Lepcha
Ol Chiki	U+1C50..U+1C7F	48	0	Ol Chiki
Cyrillic Extended-C	U+1C80..U+1C8F	16	0	Cyrillic
Georgian Extended	U+1C90..U+1CBF	48	0	Georgian
Sundanese Supplement	U+1CC0..U+1CCF	16	0	Sundanese
Vedic Extensions	U+1CD0..U+1CFF	48	0	Devanagari (Vedic)
Phonetic Extensions	U+1D00..U+1D7F	128	0	Phonetic
Phonetic Extensions Supplement	U+1D80..U+1DBF	64	0	Phonetic
Combining Diacritical Marks Supplement	U+1DC0..U+1DFF	64	0	Diacritics
Latin Extended Additional	U+1E00..U+1EFF	256	0	Latin
Greek Extended	U+1F00..U+1FFF	256	0	Greek
General Punctuation	U+2000..U+206F	112	0	Punctuation
Superscripts and Subscripts	U+2070..U+209F	48	0	Symbols
Currency Symbols	U+20A0..U+20CF	48	0	Currency
Combining Diacritical Marks for Symbols	U+20D0..U+20FF	48	0	Diacritics
Letterlike Symbols	U+2100..U+214F	80	0	Symbols
Number Forms	U+2150..U+218F	64	0	Numbers
Arrows	U+2190..U+21FF	112	0	Arrows
Mathematical Operators	U+2200..U+22FF	256	0	Math
Miscellaneous Technical	U+2300..U+23FF	256	0	Technical
Control Pictures	U+2400..U+243F	64	0	Controls
Optical Character Recognition	U+2440..U+245F	32	0	OCR
Enclosed Alphanumerics	U+2460..U+24FF	160	0	Enclosed
Box Drawing	U+2500..U+257F	128	0	Box drawing
Block Elements	U+2580..U+259F	32	0	Blocks
Geometric Shapes	U+25A0..U+25FF	96	0	Geometric
Miscellaneous Symbols	U+2600..U+26FF	256	0	Symbols
Dingbats	U+2700..U+27BF	192	0	Dingbats
Miscellaneous Mathematical Symbols-A	U+27C0..U+27EF	48	0	Math
Supplemental Arrows-A	U+27F0..U+27FF	16	0	Arrows
Braille Patterns	U+2800..U+28FF	256	0	Braille
Supplemental Arrows-B	U+2900..U+297F	128	0	Arrows
Miscellaneous Mathematical Symbols-B	U+2980..U+29FF	128	0	Math
Supplemental Mathematical Operators	U+2A00..U+2AFF	256	0	Math
Miscellaneous Symbols and Arrows	U+2B00..U+2BFF	256	0	Symbols/Arrows
Glagolitic	U+2C00..U+2C5F	96	0	Glagolitic
Latin Extended-C	U+2C60..U+2C7F	32	0	Latin
Coptic	U+2C80..U+2CFF	128	0	Coptic
Georgian Supplement	U+2D00..U+2D2F	48	0	Georgian
Tifinagh	U+2D30..U+2D7F	80	0	Tifinagh
Ethiopic Extended	U+2D80..U+2DDF	96	0	Ethiopic
Cyrillic Extended-A	U+2DE0..U+2DFF	32	0	Cyrillic
Supplemental Punctuation	U+2E00..U+2E7F	128	0	Punctuation
CJK Radicals Supplement	U+2E80..U+2EFF	128	0	CJK
Kangxi Radicals	U+2F00..U+2FDF	224	0	CJK
Ideographic Description Characters	U+2FF0..U+2FFF	16	0	CJK
CJK Symbols and Punctuation	U+3000..U+303F	64	0	CJK
Hiragana	U+3040..U+309F	96	0	Japanese (Hiragana)
Katakana	U+30A0..U+30FF	96	0	Japanese (Katakana)
Bopomofo	U+3100..U+312F	48	0	Bopomofo
Hangul Compatibility Jamo	U+3130..U+318F	96	0	Hangul
Kanbun	U+3190..U+319F	16	0	Kanbun
Bopomofo Extended	U+31A0..U+31BF	32	0	Bopomofo
CJK Strokes	U+31C0..U+31EF	48	0	CJK
Katakana Phonetic Extensions	U+31F0..U+31FF	16	0	Katakana
Enclosed CJK Letters and Months	U+3200..U+32FF	256	0	Enclosed CJK
CJK Compatibility	U+3300..U+33FF	256	0	CJK compatibility
CJK Unified Ideographs Extension A	U+3400..U+4DBF	6,592	0	CJK
Yijing Hexagram Symbols	U+4DC0..U+4DFF	64	0	Yijing
CJK Unified Ideographs	U+4E00..U+9FFF	20,992	0	CJK
Yi Syllables	U+A000..U+A48F	1,168	0	Yi
Yi Radicals	U+A490..U+A4CF	64	0	Yi
Lisu	U+A4D0..U+A4FF	48	0	Lisu
Vai	U+A500..U+A63F	320	0	Vai
Cyrillic Extended-B	U+A640..U+A69F	96	0	Cyrillic
Bamum	U+A6A0..U+A6FF	96	0	Bamum
Modifier Tone Letters	U+A700..U+A71F	32	0	Phonetic
Latin Extended-D	U+A720..U+A7FF	224	0	Latin
Syloti Nagri	U+A800..U+A82F	48	0	Syloti Nagri
Common Indic Number Forms	U+A830..U+A83F	16	0	Indic numbers
Phags-pa	U+A840..U+A87F	64	0	Phags-pa
Saurashtra	U+A880..U+A8DF	96	0	Saurashtra
Devanagari Extended	U+A8E0..U+A8FF	32	0	Devanagari
Kayah Li	U+A900..U+A92F	48	0	Kayah Li
Rejang	U+A930..U+A95F	48	0	Rejang
Hangul Jamo Extended-A	U+A960..U+A97F	32	0	Hangul
Javanese	U+A980..U+A9DF	96	0	Javanese
Myanmar Extended-B	U+A9E0..U+A9FF	32	0	Myanmar
Cham	U+AA00..U+AA5F	96	0	Cham
Myanmar Extended-A	U+AA60..U+AA7F	32	0	Myanmar
Tai Viet	U+AA80..U+AADF	96	0	Tai Viet
Meetei Mayek Extensions	U+AAE0..U+AAFF	32	0	Meetei Mayek
Ethiopic Extended-A	U+AB00..U+AB2F	48	0	Ethiopic
Latin Extended-E	U+AB30..U+AB6F	64	0	Latin
Meetei Mayek	U+ABC0..U+ABFF	64	0	Meetei Mayek
Hangul Syllables	U+AC00..U+D7AF	11,184	0	Hangul
Hangul Jamo Extended-B	U+D7B0..U+D7FF	80	0	Hangul
High Surrogates	U+D800..U+DB7F	896	0	Surrogates
High Private Use Surrogates	U+DB80..U+DBFF	128	0	Surrogates
Low Surrogates	U+DC00..U+DFFF	1,024	0	Surrogates
Private Use Area	U+E000..U+F8FF	6,400	0	Private use
CJK Compatibility Ideographs	U+F900..U+FAFF	512	0	CJK compatibility
Alphabetic Presentation Forms	U+FB00..U+FB4F	80	0	Presentation forms
Arabic Presentation Forms-A	U+FB50..U+FDFF	688	0	Arabic
Variation Selectors	U+FE00..U+FE0F	16	0	Variation selectors
Vertical Forms	U+FE10..U+FE1F	16	0	Vertical
Combining Half Marks	U+FE20..U+FE2F	16	0	Half marks
CJK Compatibility Forms	U+FE30..U+FE4F	32	0	CJK compatibility
Small Form Variants	U+FE50..U+FE6F	32	0	Variants
Arabic Presentation Forms-B	U+FE70..U+FEFF	144	0	Arabic
Halfwidth and Fullwidth Forms	U+FF00..U+FFEF	240	0	Halfwidth/fullwidth
Specials	U+FFF0..U+FFFF	16	0	Special
Linear B Syllabary	U+10000..U+1007F	128	1	Linear B
Linear B Ideograms	U+10080..U+100FF	128	1	Linear B
Aegean Numbers	U+10100..U+1013F	64	1	Aegean
Ancient Greek Numbers	U+10140..U+1018F	80	1	Greek
Ancient Symbols	U+10190..U+101CF	64	1	Ancient
Phaistos Disc	U+101D0..U+101FF	48	1	Phaistos
Lycian	U+10280..U+1029F	32	1	Lycian
Carian	U+102A0..U+102DF	64	1	Carian
Coptic Epact Numbers	U+102E0..U+102FF	32	1	Coptic
Old Italic	U+10300..U+1032F	48	1	Italic
Gothic	U+10330..U+1034F	32	1	Gothic
Old Permic	U+10350..U+1037F	48	1	Permic
Ugaritic	U+10380..U+1039F	32	1	Ugaritic
Old Persian	U+103A0..U+103DF	64	1	Old Persian
Deseret	U+10400..U+1044F	80	1	Deseret
Shavian	U+10450..U+1047F	48	1	Shavian
Osmanya	U+10480..U+104AF	48	1	Osmanya
Osage	U+104B0..U+104FF	80	1	Osage
Elbasan	U+10500..U+1052F	48	1	Elbasan
Caucasian Albanian	U+10530..U+1056F	64	1	Caucasian Albanian
Vithkuqi	U+10570..U+105BF	80	1	Vithkuqi
Linear A	U+10600..U+1077F	384	1	Linear A
Cypriot Syllabary	U+10800..U+1083F	64	1	Cypriot
Imperial Aramaic	U+10840..U+1085F	32	1	Aramaic
Palmyrene	U+10860..U+1087F	32	1	Palmyrene
Nabataean	U+10880..U+108AF	48	1	Nabataean
Hatran	U+108E0..U+108FF	32	1	Hatran
Phoenician	U+10900..U+1091F	32	1	Phoenician
Lydian	U+10920..U+1093F	32	1	Lydian
Sidetic	U+10940..U+1095F	32	1	Sidetic
Meroitic Hieroglyphs	U+10980..U+1099F	32	1	Meroitic
Meroitic Cursive	U+109A0..U+109FF	96	1	Meroitic
Kharoshthi	U+10A00..U+10A5F	96	1	Kharoshthi
Old South Arabian	U+10A60..U+10A7F	32	1	South Arabian
Old North Arabian	U+10A80..U+10A9F	32	1	North Arabian
Manichaean	U+10AC0..U+10AFF	64	1	Manichaean
Avestan	U+10B00..U+10B3F	64	1	Avestan
Inscriptional Parthian	U+10B40..U+10B5F	32	1	Parthian
Inscriptional Pahlavi	U+10B60..U+10B7F	32	1	Pahlavi
Psalter Pahlavi	U+10B80..U+10BAF	48	1	Pahlavi
Old Turkic	U+10C00..U+10C4F	80	1	Old Turkic
Old Hungarian	U+10C80..U+10CFF	128	1	Hungarian
Hanifi Rohingya	U+10D00..U+10D3F	64	1	Hanifi Rohingya
Yezidi	U+10E80..U+10EBF	64	1	Yezidi
Nandinagari	U+119A0..U+119FF	96	1	Nandinagari
Sharada Supplement	U+11B60..U+11B7F	32	1	Sharada
Cypro-Minoan	U+12F90..U+12FFF	112	1	Cypro-Minoan
Egyptian Hieroglyphs	U+13000..U+1342F	1,072	1	Egyptian Hieroglyphs
Anatolian Hieroglyphs	U+14400..U+1467F	640	1	Anatolian Hieroglyphs
Bamum Supplement	U+16800..U+16A3F	576	1	Bamum
Mro	U+16A40..U+16A6F	48	1	Mro
Tangsa	U+16A70..U+16ACF	96	1	Tangsa
Bassa Vah	U+16AD0..U+16AFF	48	1	Bassa Vah
Pahawh Hmong	U+16B00..U+16B8F	144	1	Pahawh Hmong
Medefaidrin	U+16E40..U+16E7F	64	1	Medefaidrin
Beria Erfe	U+16EA0..U+16EDF	64	1	Beria Erfe
Miao	U+16F00..U+16F9F	160	1	Miao
Ideographic Symbols and Punctuation	U+16FE0..U+16FFF	32	1	CJK symbols
Tangut	U+17000..U+187FF	6,144	1	Tangut
Tangut Components	U+18800..U+18AFF	768	1	Tangut
Khitan Small Script	U+18B00..U+18CFF	512	1	Khitan
Tangut Components Supplement	U+18D80..U+18DFF	128	1	Tangut
Kana Supplement	U+1B000..U+1B0FF	256	1	Kana
Kana Extended-A	U+1B100..U+1B12F	48	1	Kana
Nushu	U+1B170..U+1B2FF	400	1	Nushu
Duployan	U+1BC00..U+1BC9F	160	1	Duployan
Shorthand Format Controls	U+1BCA0..U+1BCAF	16	1	Shorthand
Miscellaneous Symbols Supplement	U+1CEC0..U+1CEFF	64	1	Symbols
Znamenny Musical Notation	U+1CF00..U+1CFCF	208	1	Musical notation
Byzantine Musical Symbols	U+1D000..U+1D0FF	256	1	Musical symbols
Musical Symbols	U+1D100..U+1D1FF	256	1	Musical symbols
Ancient Greek Musical Notation	U+1D200..U+1D24F	80	1	Musical notation
Kaktovik Numerals	U+1D2C0..U+1D2DF	32	1	Numerals
Mayan Numerals	U+1D2E0..U+1D2FF	32	1	Mayan
Tai Xuan Jing Symbols	U+1D300..U+1D35F	96	1	Symbols
Counting Rod Numerals	U+1D360..U+1D37F	32	1	Numerals
Mathematical Alphanumeric Symbols	U+1D400..U+1D7FF	1,024	1	Math
Sutton SignWriting	U+1D800..U+1DAAF	688	1	SignWriting
Latin Extended-G	U+1DF00..U+1DFFF	256	1	Latin
Wancho	U+1E2C0..U+1E2FF	64	1	Wancho
Tai Yo	U+1E6C0..U+1E6FF	64	1	Tai Yo
Ethiopic Extended-B	U+1E7E0..U+1E7FF	32	1	Ethiopic
Mende Kikakui	U+1E800..U+1E8DF	224	1	Mende Kikakui
Adlam	U+1E900..U+1E95F	96	1	Adlam
Indic Siyaq Numbers	U+1EC70..U+1ECBF	80	1	Indic numbers
Ottoman Siyaq Numbers	U+1ED00..U+1ED4F	80	1	Ottoman numbers
Arabic Mathematical Alphabetic Symbols	U+1EE00..U+1EEFF	256	1	Arabic math
Mahjong Tiles	U+1F000..U+1F02F	48	1	Mahjong
Domino Tiles	U+1F030..U+1F09F	112	1	Dominoes
Playing Cards	U+1F0A0..U+1F0FF	96	1	Cards
Enclosed Alphanumeric Supplement	U+1F100..U+1F1FF	256	1	Enclosed
Enclosed Ideographic Supplement	U+1F200..U+1F2FF	256	1	Enclosed CJK
Miscellaneous Symbols and Pictographs	U+1F300..U+1F5FF	768	1	Emojis/Symbols
Emoticons	U+1F600..U+1F64F	80	1	Emojis
Ornamental Dingbats	U+1F650..U+1F67F	48	1	Dingbats
Transport and Map Symbols	U+1F680..U+1F6FF	128	1	Symbols
Alchemical Symbols	U+1F700..U+1F77F	128	1	Alchemical
Geometric Shapes Extended	U+1F780..U+1F7FF	128	1	Geometric
Supplemental Arrows-C	U+1F800..U+1F8FF	256	1	Arrows
Supplemental Symbols and Pictographs	U+1F900..U+1F9FF	256	1	Emojis/Symbols
Chess Symbols	U+1FA00..U+1FA6F	112	1	Chess
Symbols and Pictographs Extended-A	U+1FA70..U+1FAFF	144	1	Emojis/Symbols
CJK Unified Ideographs Extension B	U+20000..U+2A6DF	42,720	2	CJK
CJK Unified Ideographs Extension C	U+2A700..U+2B73F	4,160	2	CJK
CJK Unified Ideographs Extension D	U+2B740..U+2B81F	224	2	CJK
CJK Unified Ideographs Extension E	U+2B820..U+2CEAF	5,776	2	CJK
CJK Unified Ideographs Extension F	U+2CEB0..U+2EBEF	7,488	2	CJK
CJK Compatibility Ideographs Supplement	U+2F800..U+2FA1F	544	2	CJK compatibility
CJK Unified Ideographs Extension G	U+30000..U+3134F	4,944	3	CJK
CJK Unified Ideographs Extension J	U+323B0..U+3347F	4,304	3	CJK
Tags	U+E0000..U+E007F	128	14	Tags
Variation Selectors Supplement	U+E0100..U+E01EF	240	14	Variation selectors
Supplementary Private Use Area-A	U+F0000..U+FFFFF	65,536	15	Private use
Supplementary Private Use Area-B	U+100000..U+10FFFF	65,536	16	Private use
No Block	(unallocated ranges throughout the codespace, including all of Planes 4–13)	Varies	Varies	Unassigned
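Because every block is a contiguous range and the ranges never overlap, finding the block of a code point is a simple interval search. The sketch below is a minimal illustration rather than a real library API: `BLOCKS`, `block_of`, `find_block`, and `block_size` are hypothetical names, and the data is a small hand-copied subset of the table (a real implementation would parse the full Blocks.txt). It uses a binary search over sorted start points, returns the UCD default value `No_Block` for unallocated code points, compares names loosely as the Blocks.txt note describes (ignoring case, whitespace, hyphens, and underscores), and computes a block's size as last minus first plus one, the same way as the code point column above.

```python
from bisect import bisect_right
from typing import Optional, Tuple

# Hypothetical subset of rows from the table: (first, last, name).
# The list must stay sorted by first code point for bisect to work.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0F00, 0x0FFF, "Tibetan"),
    (0x2F00, 0x2FDF, "Kangxi Radicals"),
    (0x2FF0, 0x2FFF, "Ideographic Description Characters"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
    (0xD800, 0xDB7F, "High Surrogates"),
    (0x1F600, 0x1F64F, "Emoticons"),
]
_STARTS = [first for first, _, _ in BLOCKS]


def block_of(cp: int) -> str:
    """Return the name of the block containing cp, or "No_Block"."""
    # Index of the last block whose first code point is <= cp.
    i = bisect_right(_STARTS, cp) - 1
    if i >= 0:
        first, last, name = BLOCKS[i]
        if cp <= last:
            return name
    return "No_Block"


def loose_key(name: str) -> str:
    """Drop whitespace, hyphens and underscores, then ignore case."""
    return "".join(
        ch for ch in name if not ch.isspace() and ch not in "-_"
    ).casefold()


def find_block(name: str) -> Optional[Tuple[int, int]]:
    """Return (first, last) for a loosely matched block name, or None."""
    key = loose_key(name)
    for first, last, block in BLOCKS:
        if loose_key(block) == key:
            return first, last
    return None


def block_size(name: str) -> Optional[int]:
    """Number of code points in the block, assigned or not."""
    span = find_block(name)
    return None if span is None else span[1] - span[0] + 1


print(block_of(0x0F40))                # Tibetan
print(block_of(ord("한")))             # Hangul Syllables
print(block_of(0x2FE0))                # No_Block
print(block_size("hangul_syllables"))  # 11184
```

U+2FE0 falls in the gap between Kangxi Radicals (ending at U+2FDF) and Ideographic Description Characters (starting at U+2FF0), so it has no block. Keeping the ranges sorted makes each lookup logarithmic in the number of blocks, which matters little for a few hundred entries but keeps the logic identical to a full Blocks.txt table.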

Moved and reallocated blocks

In the early development of the Unicode Standard, before version 2.0, several character blocks were relocated to better accommodate encoding models and compatibility requirements. One notable example is the Tibetan script, which Unicode 1.0 allocated to the range U+1000–U+105F using a virama-based model that proved inadequate for distinguishing certain linguistic features.^[42] This block was removed in Unicode 1.1 and reintroduced in Unicode 2.0 at U+0F00–U+0FFF with a revised encoding that supports stacked consonants and better represents Tibetan orthography.^[43]

Similarly, the Hangul syllables, which Unicode 1.1 encoded at U+3400–U+4DFF (including the supplementary Hangul areas), were relocated in Unicode 2.0 to the Hangul Syllables block at U+AC00–U+D7AF. The new arrangement supports algorithmic generation of all 11,172 precomposed syllables and improves compatibility with Korean standards such as KS X 1001.^[44] The Private Use Areas were also adjusted between Unicode 1.0 and 2.0, settling the BMP range at U+E000–U+F8FF while leaving supplementary planes for later expansion, without disrupting existing private assignments.^[8] Compatibility concerns likewise shaped early CJK allocations: the CJK Compatibility Ideographs block (U+F900–U+FAFF) was introduced in Unicode 1.0.1 with 302 characters to provide round-trip mappings to duplicate or variant characters in legacy East Asian encodings such as the Korean standard KS C 5601 (now KS X 1001), preventing data loss during conversion.^[45]

To maintain long-term stability, the Unicode Consortium adopted an encoding stability policy, applicable from version 2.0 onward, under which an encoded character is never moved or removed.^[46] Code point assignments are therefore fixed once made, which supports reliable software development and data preservation across versions.^[46] The pre-2.0 relocations still call for compatibility mappings in implementations that handle legacy data; for example, software processing pre-2.0 Hangul text can convert the old code points to their modern equivalents using the tables in Unicode Technical Note #60, which gives historical mappings for the relocated syllables and helps avoid rendering errors in mixed-version environments.^[47] Such measures preserve the integrity of archived texts while respecting the post-2.0 stability framework.
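The algorithmic layout of the relocated Hangul Syllables block can be made concrete. The following sketch implements the Hangul syllable arithmetic described in the Unicode Standard's section on conjoining jamo behavior (Section 3.12); the constants are the standard's, but the function names `compose`, `decompose`, and `to_jamo` are illustrative only. Nineteen leading consonants, 21 vowels, and 28 trailing options (27 consonants plus "none") give 19 × 21 × 28 = 11,172 syllables, ending at U+D7A3, so the last 12 code points of the U+AC00–U+D7AF block remain unassigned.

```python
# Constants from the Hangul syllable algorithm (Unicode Standard, Section 3.12).
S_BASE = 0xAC00                                   # first precomposed syllable
L_BASE, V_BASE, T_BASE = 0x1100, 0x1161, 0x11A7   # conjoining jamo bases
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11,172 syllables in total


def compose(l_index: int, v_index: int, t_index: int = 0) -> int:
    """Return the code point of the syllable built from jamo indices."""
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT
            and 0 <= t_index < T_COUNT):
        raise ValueError("jamo index out of range")
    return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index


def decompose(cp: int) -> tuple:
    """Return (l_index, v_index, t_index) for a precomposed syllable."""
    s_index = cp - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    return (s_index // N_COUNT,
            (s_index % N_COUNT) // T_COUNT,
            s_index % T_COUNT)


def to_jamo(cp: int) -> str:
    """Spell a syllable as conjoining jamo; t_index 0 means no final jamo."""
    l_index, v_index, t_index = decompose(cp)
    jamo = chr(L_BASE + l_index) + chr(V_BASE + v_index)
    if t_index:
        jamo += chr(T_BASE + t_index)
    return jamo


print(hex(compose(0, 0)))         # 0xac00
print(hex(compose(18, 20, 27)))   # 0xd7a3, the last syllable
print(decompose(ord("한")))       # (18, 0, 4)
```

Because canonical decomposition of Hangul syllables is defined by the same arithmetic, the output of `to_jamo` should agree with NFD normalization, for example `unicodedata.normalize("NFD", "한")` in Python's standard library.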

References

[1]
The Unicode Standard, Version 17.0, Section 3.4, Characters and Encoding
Normative definition of Unicode blocks (D10b); Unicode 17.0 defines 336 blocks.
[2]
Blocks.txt - Unicode
Note: When comparing block names, casing, whitespace, hyphens, and underbars are ignored. For example, "Latin Extended-A" and "latin extended a" are ...
[3]
UAX #44: Unicode Character Database
Aug 27, 2025 · This annex provides the core documentation for the Unicode Character Database (UCD). It describes the layout and organization of the Unicode Character Database.
[4]
https://www.unicode.org/reports/tr44/#Blocks
[5]
About the Code Charts - Unicode
Characters are organized into related groups called blocks (see D10b in Section 3.4, Characters and Encoding). Many scripts are fully contained within a single ...
[6]
Unicode® Code Charts Help and Links
The Unicode character code charts are divided into character blocks, such that many scripts require the use of several blocks. In such cases, all blocks related ...
[7]
https://www.unicode.org/reports/tr18/
[8]
Unicode 1.0
Jul 15, 2015 · 1.0 Chapters. 1, Introduction. 2, General Principles. 3, Character Blocks. 3.1, General Scripts. 3.2, Symbols. 3.3, CJK Phonetics and Symbols.
[9]
[PDF] 1.0 Unification of the Unicode Standard - and ISO 10646
Meetings in October of 1991 finally resulted in mutually acceptable changes to both Unicode and the ISO Draft International Standard (DIS) 10646 which merged ...
[10]
[PDF] To the BMP and beyond! - Unicode
The ISO 10646 standard defines the UCS-2 encoding form. For all practical purposes, it is UTF-16 restricted to the BMP, i.e. to those scalar values in the ...
[11]
[PDF] Unicode Standard Version 1.0
A complete Coptic set would be obtained by rendering the whole Greek alphabet in that same style. Encoding Structure. The Unicode block for the Greek script is ...
[12]
Chapter 2 – Unicode 16.0.0
For more about deprecated characters, see D13 in Section 3.4, Characters and Encoding. Unicode character names are also never changed, so that they can be used ...
[13]
https://www.unicode.org/versions/Unicode1.0.0/Preface.pdf
[14]
UTC Document Register 1991 - Unicode
UTC/1991-003R1, Unicode: A Technical Introduction, 28 January 1991, Bill ... My (Updated) Comments on Unicode 1.0 draft (12/90) and Conformance (1/91) ...
[15]
https://www.unicode.org/versions/Unicode1.1.0/
[16]
https://www.unicode.org/versions/components-2.0.0.html
[17]
Blocks-1.txt - Unicode
... Number Forms 2190; 21FF; Arrows 2200; 22FF; Mathematical Operators 2300 ... Block Elements 25A0; 25FF; Geometric Shapes 2600; 26FF; Miscellaneous ...
[18]
Unicode 3.0.0
Summary of Key Changes in Unicode 3.0
[19]
Unicode 5.0.0
Summary of Key Changes in Unicode 5.0
[20]
Unicode 6.0.0
Summary of Key Changes in Unicode 6.0
[21]
Unicode 8.0.0
Summary of New Scripts and Blocks in Unicode 8.0 (Underrepresented Scripts)
[22]
Unicode 17.0.0
Sep 9, 2025 · This page summarizes the important changes for the Unicode Standard, Version 17.0.0. This version supersedes all previous versions of the Unicode Standard.
[23]
Summary of Blocks.txt (Unicode 17.0.0)
[24]
Submitting Character Proposals - Unicode
Those considering submitting a proposal should first determine whether or not a particular script or character has already been proposed. Please see the ...
[25]
PropertyValueAliases.txt - Unicode
... property values used in the UCD. # These names can be used for XML formats of UCD data, for regular-expression # property tests, and other programmatic ...
[26]
UAX #24: Unicode Script Property
Summary of Unicode Script Property and Relation to Unicode Blocks
[27]
Plane “Basic Multilingual Plane” - Compart
The Basic Multilingual Plane (BMP) ranges from U+0000 to U+FFFF, has 163 blocks, 63,951 assigned characters, and 55,503 defined characters.
[28]
Unicode Supplementary Multilingual Plane (SMP) Language blocks
Supplementary Multilingual Plane (SMP) of Unicode includes 134 Blocks covering code points from U+10000 to U+1FFFF. This table lists 111 blocks that relate ...
[29]
Character.UnicodeBlock (Java Platform SE 8 ) - Oracle Help Center
Character blocks generally define characters used for a specific script or purpose. A character is contained by at most one Unicode block. Since: 1.2. Field ...
[30]
unicodedata — Unicode Database — Python 3.14.0 documentation
This module provides access to the Unicode Character Database (UCD) which defines character properties for all Unicode characters.
[31]
The Unicode standard - Globalization | Microsoft Learn
Feb 2, 2024 · The Unicode Standard contains strict rules for determining the equivalence of composed characters to combining character sequences.
[32]
uniblock: Scoring and Filtering Corpus with Unicode Block Information
In this paper, we introduce a simple statistical method, uniblock, to overcome this problem. For each sentence, uniblock generates a fixed-size feature vector ...
[33]
Optimize font files with font subsetting - Unity - Manual
It allows you to create a smaller font file by including only the specific characters or glyphs you need, which helps reduce file size and improve performance.
[34]
UTR#17: Unicode Character Encoding Model
Sep 20, 2022 · Summary. This document clarifies a number of the terms used to describe character encodings, and where the different forms of Unicode fit in ...
[35]
Unassigned Characters - Unicode
To work properly in implementations, unassigned code points must be given default properties as if they were characters, since various algorithms require ...
[36]
cmap - Character To Glyph Index Mapping Table (OpenType 1.9.1)
May 29, 2024 · This table defines the mapping of character codes to a default glyph index. Different subtables may be defined that each contain mappings for different ...
[37]
Character to Glyph Mapping Table - TrueType Reference Manual
The 'cmap' table maps character codes to glyph indices. The choice of encoding for a particular font is dependent upon the conventions used by the intended ...
[38]
Noto Home - Google Fonts
Noto is a collection of high-quality fonts in more than 1000 languages and over 150 writing systems.
[39]
Font fallback deep dive | Raph Levien's blog
Apr 4, 2019 · One of the main functions of skribo is “font fallback,” or choosing fonts to render an arbitrary string of text. This post is a deep dive into the topic.
[40]
Unicode Code Point - Visual Studio Marketplace
This extension shows character code point in status bar. Features Shows character code point in hexadecimal or decimal notation.
[41]
Unicode 17.0 Character Code Charts
Unicode 17.0 includes European scripts, symbols & punctuation, and a name index. Scripts include Armenian, Cyrillic, Greek, Latin, and more. Symbols include ...
[42]
[PDF] Unicode Version 1.0 Appendix E: Table of Contents to Volume 1
Tibetan U+1000→ U+105F. Hebrew U+0590→ U+05FF. Arabic U+0600→ U+06FF ... Pictures for Control Codes U+2400→ U+243F. Optical Character Recognition U+ ...
[43]
http://www.unicode.org/versions/Unicode1.0.0/V2appE.pdf
[44]
Unicode Mail List Archive: Re-assignment of Hangul characters
Sep 27, 1995 · Re-assignment of Hangul characters · 1. Can anyone confirm that the allocation of U+AC00 onwards for Unicode 2.0 is set in concrete (or, how firm ...
[45]
CJK Compatibility Ideographs - Unicode
This block, despite its name, contains a number of unified CJK ideographs. Each is also individually identified by an annotation.
[46]
PD UTS: Unicode Set Notation
The encoding stability policy, applicable to Unicode 2.0+, states that. Once a character is encoded, it will not be moved or removed. This policy implies ...
[47]
UTN #60: Legacy Hangul Syllables Mappings - Unicode
Sep 5, 2024 · Summary. This Unicode Technical Note provides historical mappings between the Unicode code points for the 11,172 characters in the Hangul ...