
Numeric character reference

A numeric character reference (NCR) is a markup construct used in SGML and SGML-derived languages such as HTML and XML to represent a specific Unicode character by referencing its code point with a numeric value, allowing the inclusion of characters that may be reserved, difficult to input directly, or unavailable in certain character sets.^[1]^[2] In HTML, as defined in the WHATWG HTML Living Standard, an NCR begins with the sequence &# followed by a decimal integer, or by x and a hexadecimal integer, and ends with a semicolon (;), such as &#65; for the Latin capital letter A (U+0041) or &#x41; for the same character in hexadecimal form.^[1] The parser converts the numeric value to the corresponding Unicode code point and emits it as a character token; invalid references, such as those exceeding U+10FFFF or naming surrogate code points, are replaced with the replacement character U+FFFD.^[1] The semicolon is required for conformance; omitting it is a parse error, although tolerant parsers still process the reference.^[1] Similarly, in XML 1.0 as specified by the W3C, NCRs follow the production CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';', covering the decimal and hexadecimal forms respectively, and may only represent legal XML characters (U+0009, U+000A, U+000D, and U+0020 to U+10FFFF, excluding surrogates and the noncharacters U+FFFE and U+FFFF).^[2] These references must denote valid characters per the XML Char production and are expanded immediately by processors into the referenced character data, facilitating interoperability across diverse encoding environments.^[2] Both standards emphasize NCRs as a fundamental mechanism for escaping special characters like < (&#60; or &#x3C;) and & (&#38; or &#x26;) in markup, distinct from named character references that use predefined entity names.^[1]^[2]

Syntax and Formats

Decimal Form

The decimal form of a numeric character reference begins with an ampersand (&) immediately followed by a number sign (#), then one or more decimal digits (0 through 9) that represent the Unicode code point value in base-10, and ends with a semicolon (;).^[2]^[3] This form is used to reference Unicode code points in the range from U+0000 to U+10FFFF, subject to the validity rules defined by the markup language specification.^[4] Leading zeros are permitted in the decimal sequence and do not change the interpreted value, allowing flexibility in formatting while maintaining equivalence (e.g., &#065; equals &#65;).^[2]^[5] For example, &#931; denotes the Greek capital letter sigma (Σ) at code point U+03A3.^[3] The decimal form serves as the base-10 alternative to the hexadecimal notation for referencing the same Unicode code points.^[2]^[3]

Hexadecimal Form

The hexadecimal form of a numeric character reference provides an alternative to the decimal variant by expressing the Unicode code point in base-16 notation.^[6]^[7] It begins with the sequence &#x (HTML also accepts an uppercase &#X; XML does not), followed by one or more hexadecimal digits representing the code point value, and terminates with a semicolon (;).^[6]^[7] This format mirrors the U+ notation used in Unicode code charts, which makes it convenient for characters beyond the basic ASCII range.^[6] Hexadecimal digits in this reference are case-insensitive, accepting both uppercase (A-F) and lowercase (a-f) letters alongside digits 0-9; for instance, both &#x3A3; and &#x3a3; resolve to the Greek capital letter sigma (Σ, U+03A3).^[6]^[7] The valid numeric range mirrors that of the decimal form, from U+0000 to U+10FFFF, subject to language-specific restrictions on certain code points such as controls and noncharacters.^[6]^[8] Leading zeros are permitted but not required, as in &#x000A; for the line feed character (U+000A), and they do not alter the interpreted value.^[6]^[7] A practical example is &#x2665;, which renders as ♥ (black heart suit, U+2665). Here the hexadecimal digits (2665) can be read directly off the code point, whereas the decimal counterpart (9829) has the same length but must be converted first.^[6]^[7]
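The two notations can be generated mechanically from a character's code point. The following is a minimal sketch in Python; the helper names decimal_ncr and hex_ncr are illustrative assumptions, not part of any standard library, and html.unescape is used only to confirm that the variants resolve to the same character.

```python
import html


def decimal_ncr(ch: str, width: int = 0) -> str:
    """Return the decimal NCR for one character, zero-padded to `width` digits."""
    return f"&#{str(ord(ch)).zfill(width)};"


def hex_ncr(ch: str, width: int = 0, upper: bool = True) -> str:
    """Return the hexadecimal NCR for one character (hypothetical helper)."""
    digits = format(ord(ch), "X").zfill(width)
    if not upper:
        digits = digits.lower()
    return f"&#x{digits};"


if __name__ == "__main__":
    print(decimal_ncr("Σ"))            # &#931;
    print(decimal_ncr("A", width=3))   # &#065;
    print(hex_ncr("Σ"))                # &#x3A3;
    print(hex_ncr("Σ", upper=False))   # &#x3a3;
    print(hex_ncr("\n", width=4))      # &#x000A;
    # Leading zeros and digit case do not change the resolved character.
    assert html.unescape("&#065;") == html.unescape("&#65;") == "A"
    assert html.unescape("&#x3A3;") == html.unescape("&#x3a3;") == "Σ"
```

Both helpers rely on ord(), so they expect exactly one character and raise TypeError otherwise.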

Applications in Markup Languages

In HTML

Numeric character references (NCRs) in HTML serve to represent characters that cannot be directly entered from a keyboard or that might conflict with markup syntax, such as the less-than sign (<) or ampersand (&). For instance, the reserved ampersand can be encoded as &#38; to prevent it from being interpreted as the start of another reference.^[9] In HTML5, both decimal and hexadecimal NCR forms are supported, allowing references to Unicode code points up to U+10FFFF, subject to the restrictions on specific code points. The decimal form uses &# followed by decimal digits, while the hexadecimal form uses &#x or &#X followed by hexadecimal digits; both must end with a semicolon (;) to conform. When the semicolon is missing, the parser reports a parse error but still resolves the digits it has consumed. Because any digits that immediately follow are consumed as part of the number, an unterminated reference followed by digits can resolve to an unintended character.^[9]^[10] NCRs are resolved to their corresponding Unicode characters during parsing, independent of the document's character encoding. A reference with no digits, such as &#;, is treated as literal text, while references to U+0000, to surrogates, or to values beyond U+10FFFF are replaced with the Unicode replacement character U+FFFD.^[3]^[11] NCRs in the range 0x80 to 0x9F, which nominally name C1 control characters (U+0080 to U+009F), are flagged as parse errors, and most are remapped during tokenization to the characters that occupy those positions in Windows-1252 (for example, &#128; becomes U+20AC, the euro sign).^[12]^[13] A semicolon itself can be written as &#59;, which makes a missing terminator easy to overlook; authors should always include the terminating semicolon rather than rely on error recovery. In contrast to XML's stricter requirements, HTML's more lenient approach ensures broader compatibility with legacy content.^[14]
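The recovery behaviour described above can be observed with Python's html.unescape, which is documented as following the HTML5 rules for valid and invalid character references. This is only a sketch using one implementation; the sample inputs are chosen for illustration, and other edge cases (such as noncharacters) may be handled differently by other parsers.

```python
import html

# Each input is paired with the character(s) html.unescape produces for it.
SAMPLES = {
    "&#38;": "&",              # ordinary decimal reference
    "&#65": "A",               # missing semicolon: still resolved
    "&#128;": "\u20ac",        # 0x80 remapped to the euro sign
    "&#x110000;": "\ufffd",    # beyond U+10FFFF: replacement character
    "&#xD800;": "\ufffd",      # surrogate: replacement character
    "&#0;": "\ufffd",          # null: replacement character
    "&#;": "&#;",              # no digits: left as literal text
}


def check_samples() -> bool:
    """Return True if every sample resolves as listed in SAMPLES."""
    return all(html.unescape(src) == expected for src, expected in SAMPLES.items())


if __name__ == "__main__":
    for src in SAMPLES:
        print(f"{src!r:16} -> {html.unescape(src)!r}")
    print(check_samples())
```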

In XML and SGML

Numeric character references were introduced in the Standard Generalized Markup Language (SGML), defined by ISO 8879:1986, to reference characters by their numeric position within the document character set, a coded character set specified in the SGML declaration that determines the repertoire of allowable characters.^[15] This mechanism allowed authors to include characters not directly available in a limited encoding by substituting them with references like &#N;, where N denotes the character's code position.^[15] In SGML, these references are resolved relative to the declared document character set, providing flexibility for various international character sets, though SGML itself does not mandate Unicode.^[16] XML, as a subset of SGML, adapted numeric character references to align with Unicode (ISO/IEC 10646), requiring them to denote valid Unicode code points while mandating a terminating semicolon in all cases for unambiguous parsing.^[17] Both decimal (&#d;) and hexadecimal (&#xh;) forms have been supported since XML 1.0's initial 1998 recommendation, enabling references to characters across the Unicode range.^[18] In XML 1.0, the Char production excludes most control characters (#x1 through #x1F are forbidden both directly and via reference, except for #x9, #xA, and #xD), surrogates (#xD800–#xDFFF), and the noncharacters #xFFFE and #xFFFF, limiting the legal characters to U+0009, U+000A, U+000D, U+0020–U+D7FF, U+E000–U+FFFD, and U+10000–U+10FFFF.^[18] XML 1.1, published in 2004, kept the same upper limit of U+10FFFF but additionally permits numeric references to the control characters #x1–#x1F, and requires the controls #x7F–#x84 and #x86–#x9F to appear only as references. It continues to forbid #x0 and surrogates, even though surrogate code units occur in UTF-16 encoded streams.^[19] XML parsers enforce numeric character references through strict well-formedness validation, expanding valid ones immediately into character data and treating invalid references, such as those to disallowed code points, as fatal errors that halt processing.^[17] This rigorous enforcement contrasts with HTML's more permissive approach, which tolerates certain malformed references for practical web authoring.^[17] In both XML 1.0 and 1.1, processors must reject documents containing invalid numeric references to ensure conformance to Unicode semantics and document integrity.^[19]
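The strictness of XML processing can be illustrated with Python's xml.etree.ElementTree, which parses XML 1.0 through the Expat library. The sketch below wraps a fragment in a hypothetical root element named r and reports whether the parser accepts it; the helper name xml_accepts is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET


def xml_accepts(fragment: str) -> bool:
    """Return True if `fragment` is accepted as element content by the XML 1.0 parser."""
    try:
        ET.fromstring(f"<r>{fragment}</r>")
        return True
    except ET.ParseError:
        # Any well-formedness violation is fatal; no partial result is returned.
        return False


if __name__ == "__main__":
    for fragment in ["&#65;", "&#x41;", "&#x1F600;", "&#9;",
                     "&#65", "&#X41;", "&#12;", "&#0;", "&#xD800;"]:
        print(f"{fragment!r:14} accepted: {xml_accepts(fragment)}")
```

The first four fragments are well-formed references to legal characters, while the remaining ones omit the semicolon, use an uppercase X, or name code points outside the XML 1.0 Char production.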

Illustrative Examples

Common ASCII and Latin Characters

Numeric character references (NCRs) provide a standardized way to represent common characters from the ASCII range (Unicode U+0020 to U+007E) and the Latin-1 Supplement (U+0080 to U+00FF), particularly those that are reserved in markup languages or difficult to input on standard keyboards.^[20]^[2] These references are especially useful for symbols like the ampersand (&) and less-than sign (<), which must be escaped in HTML and XML to avoid interpretation as markup delimiters.^[20] In practice, NCRs are frequently employed for the characters & (ampersand), < (less-than), > (greater-than), " (quotation mark), and ' (apostrophe) to prevent entity conflicts and ensure document validity.^[2] For the basic ASCII printable characters, NCRs allow direct reference to their Unicode code points. For instance, the ampersand (&, U+0026) is represented as &#38; in decimal form, while the less-than sign (<, U+003C) uses &#60;.^[20] Similarly, the greater-than sign (>, U+003E) is &#62;, the quotation mark (", U+0022) is &#34;, and the apostrophe (', U+0027) is &#39;.^[2] Letters like uppercase A (U+0041) can be denoted as &#65;, though such usage is rare for alphanumeric characters that are easily typed.^[20] Extending to the Latin-1 Supplement, NCRs facilitate inclusion of accented and symbolic characters common in Western European languages. The copyright symbol (©, U+00A9) is encoded as &#169;, and the cent sign (¢, U+00A2) as &#162;.^[20] Some characters familiar from legacy Western encodings lie outside Latin-1: the small oe ligature (œ) is U+0153 in the Latin Extended-A block, although Windows-1252 stores it as byte 0x9C. Its correct NCR is &#339; in decimal, though older content sometimes uses the encoding-specific value &#156; for compatibility.^[2] These examples highlight how NCRs bridge basic input limitations while adhering to Unicode standards.^[20]

Character	Description	Unicode	Decimal NCR	Hexadecimal NCR
&	Ampersand	U+0026	&#38;	&#x26;
<	Less-than	U+003C	&#60;	&#x3C;
>	Greater-than	U+003E	&#62;	&#x3E;
"	Quotation mark	U+0022	&#34;	&#x22;
'	Apostrophe	U+0027	&#39;	&#x27;
©	Copyright	U+00A9	&#169;	&#xA9;
¢	Cent sign	U+00A2	&#162;	&#xA2;
œ	Small oe ligature	U+0153	&#339;	&#x153;
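The escaping described in this section can be sketched in a few lines of Python. The mapping and the function name escape_with_ncrs are illustrative assumptions; the second half uses the standard xmlcharrefreplace error handler, which Python provides for replacing unencodable characters with decimal NCRs.

```python
import html

# Markup-significant ASCII characters and their decimal NCRs.
RESERVED = {"&": "&#38;", "<": "&#60;", ">": "&#62;", '"': "&#34;", "'": "&#39;"}


def escape_with_ncrs(text: str) -> str:
    """Replace the five reserved characters with decimal NCRs (hypothetical helper)."""
    return "".join(RESERVED.get(ch, ch) for ch in text)


def ascii_with_ncrs(text: str) -> str:
    """Encode to ASCII, turning every non-ASCII character into a decimal NCR."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    escaped = escape_with_ncrs('a < b & "c"')
    print(escaped)                      # a &#60; b &#38; &#34;c&#34;
    print(html.unescape(escaped))       # a < b & "c"
    print(ascii_with_ncrs("©¢œ"))       # &#169;&#162;&#339;
```

Escaping character by character, rather than with chained replacements, avoids re-escaping the ampersands that the NCRs themselves introduce.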

International and Symbolic Characters

Numeric character references (NCRs) facilitate the inclusion of international scripts and symbolic elements in digital documents by allowing direct reference to Unicode code points, enabling support for languages and notations outside the basic Latin alphabet. This capability is essential for creating content that spans diverse linguistic traditions and expressive symbols, such as those used in global communication and visual representation.^[20] Examples from non-Latin scripts illustrate this utility. The Greek capital letter Alpha (Α), the first letter of the Greek alphabet from the Greek and Coptic block (U+0370–U+03FF), is represented by the decimal NCR &#913; corresponding to its code point U+0391.^[21] Likewise, the Hebrew letter Aleph (א), a foundational consonant in the Hebrew script from the Hebrew block (U+0590–U+05FF), uses the decimal NCR &#1488; for U+05D0.^[22] For East Asian languages, the CJK Unified Ideograph for "one" (一), the initial character in the CJK Unified Ideographs block (U+4E00–U+9FFF) shared across Chinese, Japanese, and Korean, is denoted by the hexadecimal NCR &#x4E00; at U+4E00, highlighting NCRs' role in unifying ideographic systems.^[23] Symbolic and emotive characters further demonstrate NCRs' versatility. The black heart suit symbol (♥) from the Miscellaneous Symbols block (U+2600–U+26FF), often evoking affection, is referenced decimally as &#9829; for U+2665.^[24] Emojis, residing in higher planes like the Emoticons block (U+1F600–U+1F64F), exemplify extended symbolic use; the grinning face (😀), indicating joy, employs the hexadecimal NCR &#x1F600; for U+1F600.^[25] For such code points beyond the Basic Multilingual Plane, hexadecimal notation is typically favored over the longer decimal equivalents. An NCR always specifies the full Unicode scalar value directly; it is never written as the surrogate pair that UTF-16 uses for the same character, nor as the multi-byte sequence that UTF-8 uses.^[20]
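The distinction between the scalar value written in an NCR and the code units used by UTF-16 and UTF-8 can be worked through for U+1F600. The following Python sketch computes the surrogate pair arithmetically and compares it with the built-in encoders; the function name utf16_surrogates is an illustrative assumption.

```python
def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Return the (high, low) UTF-16 surrogates for a supplementary-plane code point."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use surrogate pairs")
    offset = cp - 0x10000                      # 20-bit value
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


if __name__ == "__main__":
    cp = 0x1F600                               # grinning face
    print(f"&#x{cp:X};  or  &#{cp};")          # &#x1F600;  or  &#128512;
    high, low = utf16_surrogates(cp)
    print(f"UTF-16: {high:04X} {low:04X}")     # UTF-16: D83D DE00
    print("UTF-8: ", chr(cp).encode("utf-8").hex(" ").upper())  # UTF-8:  F0 9F 98 80
```

The NCR carries the single value 0x1F600, while the pair D83D DE00 exists only at the encoding level; writing the two surrogates as separate NCRs would not produce the character.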

Historical Development

Origins in SGML

Numeric character references were first defined in the Standard Generalized Markup Language (SGML), formalized as the international standard ISO 8879:1986, where they function as a type of character reference consisting of a delimited character number to specify positions in the reference concrete syntax character set.^[26] This reference concrete syntax establishes a standardized mapping for characters in SGML documents, allowing parsers to interpret the numeric values relative to the declared document character set.^[26] The core purpose of numeric character references in SGML was to enable the portable representation and interchange of characters across diverse encoding environments, particularly for those not directly supported in the input stream or basic character set, well before the adoption of Unicode as a universal standard.^[20] By using numeric codes tied to the document's declared character set, SGML documents could maintain consistency and readability regardless of the underlying hardware or software variations in character handling.^[27] In early SGML implementations, the syntax for numeric character references employed decimal notation, formatted as &# followed by a sequence of decimal digits and terminated by a semicolon, with these values interpreted relative to the specific document character set rather than a fixed universal coding scheme.^[28] Hexadecimal notation, using &#x followed by hexadecimal digits, was not part of the original reference concrete syntax; support for it arrived with the later Web SGML Adaptations and depends on the SGML declaration in use.^[28] A pivotal development occurred with the WebSGML adaptations, incorporated via the second technical corrigendum to ISO 8879 in 1999, which aligned SGML's reference concrete syntax with the Unicode standard by mapping the first 65,536 character numbers to Unicode 2.0 code points, thereby transitioning numeric references toward universal code point addressing for enhanced web compatibility.^[29] This adjustment laid the groundwork for later adaptations in markup languages like HTML and XML.^[18]

Evolution in Web Standards

Numeric character references (NCRs), originally derived from SGML as a mechanism for encoding characters by their numeric positions, underwent significant adaptations as web markup languages evolved in the 1990s. In HTML, the adoption of NCRs began with the HTML 2.0 specification, published as RFC 1866 in 1995, which introduced basic decimal-form references to represent characters by their code positions in the document character set, which at that time was ISO 8859-1 (Latin-1). These decimal NCRs, formatted as &# followed by digits and a semicolon, enabled the inclusion of special characters like < (&#60;) within markup, but hexadecimal notation was not yet supported. By HTML 4.01 in 1999, the World Wide Web Consortium (W3C) had expanded this to include hexadecimal NCRs (&#x followed by hex digits) and had aligned references with Unicode code points to facilitate international character support and portability across encodings.^[30] Parallel developments occurred in XML, which inherited NCRs from SGML but formalized them in the XML 1.0 Recommendation of February 1998. This initial version supported both decimal and hexadecimal forms from the outset, allowing references to any valid Unicode character within the specified repertoire, though limited to exclude certain control and surrogate code points.^[31] The second edition of XML 1.0, released in October 2000, refined these rules without altering the core syntax, ensuring consistency with updates to ISO/IEC 10646 and enhancing robustness for broader Unicode adoption.^[32] A pivotal milestone in the late 1990s was the transition from interpretations that depended on the document's character set to a model based on Unicode code points, addressing portability challenges in diverse international contexts.^[20] This shift, evident in both HTML 4.01 and XML 1.0, laid the groundwork for modern web standards.
In the HTML5 era, initiated by the WHATWG in the mid-2000s and formalized as a living standard in 2011, with collaboration between W3C and WHATWG established in 2019 to maintain a unified specification, NCR resolution is standardized to support the full range of Unicode code points (up to U+10FFFF), typically within UTF-8 contexts, ensuring seamless rendering of global scripts and symbols as Unicode evolves (e.g., supporting Unicode 15.0 from 2022 and later versions).^[33]

Restrictions and Limitations

Valid Code Points

Numeric character references in markup languages such as HTML and XML are designed to reference Unicode code points up to U+10FFFF, though XML excludes U+0000 and other invalid points. This range includes the Basic Multilingual Plane (BMP, U+0000 to U+FFFF), which accommodates the majority of commonly used scripts and symbols, as well as the supplementary planes (U+10000 to U+10FFFF) for less common or historical scripts. Hexadecimal notation is typically preferred for code points in the supplementary planes, because it matches the U+ notation of the Unicode code charts, whereas the decimal equivalents run to as many as seven digits (U+1F600, for example, is 128512 in decimal).^[34]^[35]^[36] While the syntax permits referencing any code point in this range, best practices emphasize using only assigned code points, those explicitly defined with properties and glyph representations in the Unicode Standard, to ensure consistent rendering and behavior across systems. Unassigned code points, which are reserved for potential future assignment, may be technically valid in numeric references but can lead to unpredictable display or interpretation, as their semantics could change in subsequent Unicode versions. As of Unicode 17.0 (released September 9, 2025), 159,801 code points are assigned, covering a vast array of languages, symbols, and technical characters.^[37]^[36] In XML 1.1, the specification explicitly defines valid characters for numeric references as those in the ranges #x1–#xD7FF, #xE000–#xFFFD, and #x10000–#x10FFFF, thereby excluding surrogate code points (U+D800–U+DFFF) and the noncharacters U+FFFE and U+FFFF. Although surrogates fall within the overall Unicode range, they are not treated as standalone characters and cannot be encoded in well-formed UTF-8. Certain subsets, such as control characters, may face additional restrictions in specific document contexts.^[35]
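The character ranges quoted above translate directly into range checks. The following Python sketch encodes the Char productions of XML 1.0 and XML 1.1 as predicates; the function names are illustrative assumptions, and the XML 1.1 predicate deliberately ignores the separate rule that some controls may appear only as references.

```python
def is_xml10_char(cp: int) -> bool:
    """XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    """XML 1.1 Char: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


if __name__ == "__main__":
    for cp in (0x0, 0xC, 0x41, 0xD800, 0xFFFE, 0x1F600, 0x110000):
        print(f"U+{cp:04X}  XML 1.0: {is_xml10_char(cp)!s:5}  XML 1.1: {is_xml11_char(cp)}")
```

The form feed U+000C is the kind of code point where the two versions differ: it is outside Char in XML 1.0 but inside it in XML 1.1.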

Prohibited and Restricted Characters

Numeric character references (NCRs) in markup languages like XML and HTML are subject to prohibitions and restrictions on certain Unicode code points to ensure well-formedness, security, and interoperability. These limitations stem from the underlying Unicode standard and the specific grammars of XML and HTML, preventing references to code points that could lead to invalid documents or unpredictable behavior in parsers.^[8]^[9]^[38] Control characters, particularly those in the C0 set (U+0000–U+001F) and C1 set (U+0080–U+009F), are often prohibited or restricted in NCRs. In XML 1.0, the valid character production excludes most C0 controls except U+0009 (tab), U+000A (line feed), and U+000D (carriage return), rendering NCRs like &#12; (form feed, U+000C) invalid as they fall outside the allowed range.^[8] XML 1.1 relaxes this by permitting NCRs for controls U+0001–U+001F and for the C1 controls, though direct inclusion remains forbidden except for whitespace.^[35] In HTML, conforming documents may not use NCRs for U+0000 (null) or for controls other than ASCII whitespace (U+0009, U+000A, U+000C, U+000D, U+0020). Parsers report such references as parse errors, replace a null reference with U+FFFD, and otherwise still emit a character, so the visible result can vary between implementations.^[9]^[39] Surrogate code points (U+D800–U+DFFF) are universally prohibited in NCRs across XML and HTML, and they cannot be represented in well-formed UTF-8 or as isolated code units in well-formed UTF-16.
These ranges are reserved exclusively for surrogate pairs in UTF-16 to encode supplementary characters, and standalone references to them would create malformed sequences incompatible with text interchange.^[38]^[40] Both XML 1.0/1.1 and HTML explicitly exclude surrogates from valid NCR targets to maintain encoding integrity.^[8]^[9] References to code points beyond the Unicode limit (U+10FFFF) are invalid in all contexts, as they exceed the defined codespace.^[41] Noncharacters, such as U+FFFE, U+FFFF, the range U+FDD0–U+FDEF, and the last two positions in each plane (e.g., U+1FFFE–U+1FFFF), are restricted or prohibited. The XML 1.0 character production excludes U+FFFE and U+FFFF outright and merely discourages the remaining noncharacters, while HTML parsers flag references to noncharacters as parse errors but still resolve them; because noncharacters are permanently undefined, their use is best avoided to prevent interoperability failures.^[8]^[42]^[38] The private use area (U+E000–U+F8FF) is allowed in NCRs but strongly restricted for practical reasons. Unicode reserves these code points for private agreements without standard semantics, and their use in NCRs can lead to non-interoperable documents, as parsers and applications may interpret them differently or fail to render them consistently; HTML5 explicitly warns against relying on them for public content.^[38]^[9]
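The categories discussed in this section can be told apart with simple arithmetic on the code point. The following Python sketch is a classification helper written for illustration only; the function name, the labels, and the decision to treat U+007F (DEL) as a control are assumptions, and the private use check covers only the BMP range named above.

```python
def classify(cp: int) -> str:
    """Label a code point with the restriction category it falls into."""
    if cp < 0 or cp > 0x10FFFF:
        return "out of range"
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    # Noncharacters: U+FDD0..U+FDEF plus the last two code points of every plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    # C0 controls, DEL, and C1 controls.
    if cp <= 0x1F or 0x7F <= cp <= 0x9F:
        return "control"
    if 0xE000 <= cp <= 0xF8FF:
        return "private use"
    return "other"


if __name__ == "__main__":
    for cp in (0x41, 0xC, 0x85, 0xD800, 0xFDD0, 0x1FFFF, 0xE000, 0x110000):
        print(f"U+{cp:04X}: {classify(cp)}")
```

The bit mask works because the last two positions of every plane end in FFFE or FFFF, so clearing the lowest bit always leaves a value ending in FFFE.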

Compatibility Considerations

Encoding Discrepancies

In pre-Unicode HTML documents, numeric character references (NCRs) were in practice often interpreted relative to the document's character encoding, leading to significant discrepancies in character rendering across different charsets. For instance, the NCR &#128; resolves to the euro symbol (€) when treated as Windows-1252 (CP-1252), where byte 0x80 maps to U+20AC, but it corresponds to the Unicode control character U+0080 (padding character) under strict ISO-8859-1, where bytes 0x80–0x9F are C1 controls.^[43]^[44] This mismatch arises because Windows-1252 extended ISO-8859-1 by assigning printable characters to the 0x80–0x9F range for compatibility with common usage, while ISO-8859-1 reserved that range for C1 controls. To reliably represent the euro symbol in Unicode-aware contexts, the correct NCR is &#8364; (decimal for U+20AC), which maps consistently regardless of encoding. By contrast, a legacy NCR such as &#128; depends on parser-specific remapping, and software that does not apply it yields the control character U+0080 or a replacement character (U+FFFD).^[43]^[44] Further variations occur with encodings like ISO-8859-15 (Latin-9), which adjusted the ISO-8859-1 layout to include the euro at byte 0xA4, replacing less-used symbols to support the eurozone currencies, in contrast to Windows-1252's placement at 0x80. Software that resolved &#164; against the document encoding would therefore yield a currency sign (¤) for Windows-1252 or ISO-8859-1 but the euro (€) for ISO-8859-15, whereas under the Unicode-based rules &#164; always denotes U+00A4 (¤). This highlights how charset-specific mappings can cause unexpected outputs in cross-encoding scenarios.^[45]^[44] Modern web standards resolve these issues by mandating UTF-8 as the default and recommended encoding for HTML documents, ensuring NCRs are interpreted directly as Unicode code points (U+0000 to U+10FFFF) independently of the underlying byte sequence.
This approach eliminates legacy dependencies, with parsers applying Unicode mappings universally, including the HTML specification's compatibility remapping of certain control NCRs (e.g., &#128; to €) for backward compatibility even in UTF-8 contexts.^[46]^[43]
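The difference between decoding a byte and resolving an NCR can be checked with Python's standard codecs. In this sketch the helper decode_byte is a hypothetical convenience function; the codec names cp1252, latin-1, and iso8859-15 are Python's own aliases, and html.unescape stands in for an HTML5-style resolver.

```python
import html


def decode_byte(value: int, encoding: str) -> str:
    """Decode a single byte under the given legacy encoding (illustrative helper)."""
    return bytes([value]).decode(encoding)


if __name__ == "__main__":
    # The same byte means different characters in different encodings...
    print(repr(decode_byte(0x80, "cp1252")))      # '€'
    print(repr(decode_byte(0x80, "latin-1")))     # '\x80'
    print(repr(decode_byte(0xA4, "iso8859-15")))  # '€'
    print(repr(decode_byte(0xA4, "latin-1")))     # '¤'
    # ...whereas an NCR names a code point, apart from the 0x80-0x9F remapping.
    print(repr(html.unescape("&#8364;")))         # '€'
    print(repr(html.unescape("&#128;")))          # '€'
    print(repr(html.unescape("&#164;")))          # '¤'
```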

Parser and Implementation Variations

Modern web browsers, including Chrome and Firefox as of 2025, resolve numeric character references (NCRs) to their corresponding Unicode code points during HTML5 parsing, following the tokenization rules in the WHATWG HTML living standard. For invalid NCRs, such as those exceeding the Unicode range (e.g., &#x110000;, which references a code point beyond U+10FFFF), browsers emit the replacement character U+FFFD instead of the literal text, ensuring graceful degradation without halting parsing.^[43] XML parsers, such as libxml2, enforce strict adherence to the XML 1.0 specification, requiring a terminating semicolon for all NCRs and raising a parsing error if absent (e.g., &#x3C treated as invalid without the ;). In contrast, HTML parsers like html5lib adopt a more tolerant approach aligned with HTML5 error recovery, inferring the end of an NCR by consuming available digits even without a semicolon and emitting the resolved character if valid, while flagging it as a parse error.^[2]^[10] Certain edge cases highlight implementation differences: some older tools, particularly those emulating early HTML processing in strict modes, disallow hexadecimal NCRs altogether, treating them as unrecognized syntax rather than resolving to Unicode equivalents. Additionally, NCRs for characters in the astral planes (e.g., U+1F000 for mahjong tiles) render inconsistently or fail entirely on legacy systems lacking font support for supplementary planes, often displaying as blank boxes or fallback glyphs instead of the intended symbols.^[2]^[47] The WHATWG HTML living standard, maintained as a continuously updated document through 2025, specifies that an NCR for a code point from U+10000 to U+10FFFF resolves to that single code point, so implementations that store text as UTF-16 represent it as a well-formed surrogate pair rather than as a lone surrogate.^[10]

References

[1]
https://html.spec.whatwg.org/multipage/parsing.html#character-references
[2]
Extensible Markup Language (XML) 1.0 (Fifth Edition)
[3]
HTML Standard
[4]
HTML Standard
[5]
https://html.spec.whatwg.org/multipage/parsing.html#decimal-character-reference-state
[6]
https://html.spec.whatwg.org/multipage/syntax.html#hexadecimal-character-references
[7]
https://www.w3.org/TR/xml/#sec-character-references
[8]
https://www.w3.org/TR/xml/#NT-Char
[9]
https://html.spec.whatwg.org/multipage/syntax.html#character-references
[10]
https://html.spec.whatwg.org/multipage/parsing.html#numeric-character-reference-state
[11]
https://html.spec.whatwg.org/multipage/syntax.html#syntax-character-references-0
[12]
https://html.spec.whatwg.org/multipage/parsing.html#parse-error-control-character-reference
[13]
https://html.spec.whatwg.org/multipage/parsing.html#c1-controls
[14]
https://html.spec.whatwg.org/multipage/parsing.html#ambiguous-ampersand
[15]
https://www.w3.org/MarkUp/html-spec/charset-harmful
[16]
"Character Set" Considered Harmful - W3C
Every SGML document has exactly one document character set, which is a coded character set; Numeric character references give code positions in the document ...
[17]
Comparison of SGML and XML - W3C
Dec 15, 1997 · Named character references are not allowed; Numeric character references to non-SGML characters are not allowed. Entity declarations. A ...
[18]
Extensible Markup Language (XML) 1.0 (Fifth Edition) - W3C
Nov 26, 2008 · Numeric character references may also be used; they are expanded ... then the XML processor will recognize the character references ...
[19]
Extensible Markup Language (XML) 1.0 - W3C
Feb 10, 1998 · Markup takes the form of start-tags, end-tags, empty-element tags, entity references, character references, comments, CDATA section delimiters, ...
[20]
Extensible Markup Language (XML) 1.1 (Second Edition) - W3C
Aug 16, 2006 · Therefore, XML 1.1 allows the use of character references to the control characters #x1 through #x1F, most of which are forbidden in XML 1.0.
[21]
5 HTML Document Representation
Numeric character references specify the code position of a character in the document character set. Numeric character references may take two forms: The syntax ...
[22]
Greek Character Alpha (U+0391) and Its Usage in Unicode
[23]
[PDF] U0590.pdf - Unicode
The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. See https://www.
[24]
CJK Unified Ideograph U+4E00 (一)
[25]
U+2665 (Black Heart Suit), Unicode Chart (U2600.pdf)
[26]
Emoji Grinning Face U+1F600
[27]
ISO 8879:1986(en), Information processing — Text and office systems
numeric character reference. A character reference consisting of a delimited character number. 4.218. object capacity. The capacity limit for a particular kind ...
[28]
Standard Generalized Markup Language (SGML). ISO 8879:1986
The first statement is typically a reference to a "base character set" using a recognized "public identifier." Additional characters can be defined using their ...
[29]
Syntax Reference
Character entity references are strings of the form &#NNNNNN where NNNNNN is a decimal number, or of the form &#xMMMMMM where MMMMMM is a hexadecimal number.
[30]
SGML declaration - OpenSP - OpenJade
Web SGML Adaptations. OpenSP supports most of the Web SGML Adaptations as specified in Annex K of ISO 8879:1996 (added by the second technical corrigendum, 1998) ...
[31]
https://www.w3.org/TR/1998/REC-xml-19980210#sec-references
[32]
https://www.w3.org/TR/2000/REC-xml-20001006
[33]
Extensible Markup Language (XML) 1.0 (Second Edition)
[34]
HTML Standard
[35]
Extensible Markup Language (XML) 1.1 (Second Edition)
[36]
Unicode 17.0.0
This page summarizes the important changes for the Unicode Standard, Version 17.0.0. This version supersedes all previous versions of the Unicode Standard.
[37]
What makes a Unicode code point safe? @ Things Of Interest - Qntm
Nov 8, 2017 · Unassigned code points have unpredictable, potentially undesirable characteristics (see below) if they ever do get assigned. As of Unicode 10.0, ...
[38]
Special Areas and Format Characters - Unicode
Surrogate code points are restricted use. The numeric values for surrogates are used in pairs in UTF-16 to access 1,048,576 supplementary code points in the ...
[39]
https://infra.spec.whatwg.org/#control
[40]
https://infra.spec.whatwg.org/#surrogate
[41]
Chapter 2 – Unicode 16.0.0
Code points that are not assigned to abstract characters are subject to restrictions in interchange ... The Surrogates Area contains only surrogate code points ...
[42]
https://infra.spec.whatwg.org/#noncharacter
[43]
https://html.spec.whatwg.org/multipage/parsing.html#numeric-character-reference-end-state
[44]
Representing Characters in HTML- NCR numeric ... - I18n Guy
You can also represent a character using a Numeric Character Reference, of the form &#dddd;, where dddd is the decimal value representing the character's ...
[45]
Comparing ISO-8859-1 and ISO-8859-15 - I18nQA
So ISO created ISO-8859-15, which is identical to ISO-8859-1 except for 8 characters. ISO-8859-15 is often used on Unix systems in Europe, especially in France.
[46]
Choosing & applying a character encoding - W3C
Mar 31, 2014 · Choose UTF-8 for all content. Declare the encoding in the document, and save the text in that encoding. UTF-8 is recommended for web content.
[47]
Emoji unicode does not render into emoji - Stack Overflow
Mar 31, 2018 · I have the following context where I want to turn a pattern into an emoji, however what I see as a result is &#1F600; . The code &#1F600; does not render into ...