Implicit directional marks

Implicit directional marks are invisible, zero-width characters designed to influence the rendering direction of text in bidirectional contexts. There are three: the Left-to-Right Mark (LRM, U+200E), the Right-to-Left Mark (RLM, U+200F), and the Arabic Letter Mark (ALM, U+061C). These marks behave as strong directional characters without visible display or impact on text semantics, such as word breaking or string comparison. In the Unicode Bidirectional Algorithm (UBA), they resolve the directionality of neutral characters (such as spaces, punctuation, or symbols) that lack an inherent left-to-right (LTR) or right-to-left (RTL) direction, ensuring proper visual ordering in mixed-script text.

Unlike explicit embedding or override characters (e.g., LRE or RLE), which create nested directional levels and are limited by a maximum embedding depth of 125, LRM, RLM, and ALM provide lightweight, local control limited to the current paragraph and terminated by paragraph separators. They carry the Bidi_Control property and are treated like strong L, R, or AL characters during the algorithm's resolution phases, affecting implicit processing without altering the explicit embedding hierarchy.

These marks matter in software that handles languages like Arabic or Hebrew alongside LTR scripts such as English, where they prevent issues like reversed or misaligned numbers in user interfaces and documents. For instance, using the common convention in which uppercase stands for RTL letters, an RTL phrase stored in logical order as "I NEED WATER!" inside LTR text can be followed by an RLM; the exclamation mark then stays in the RTL run and displays as "!RETAW DEEN I" rather than flipping to the LTR side. LRM and RLM were adopted in Unicode 1.1, with ALM added and support enhancements (including directional isolates) in Unicode 6.3, simplifying editing and display across platforms and rich text formats.

Overview

Definition and Purpose

Implicit directional marks are invisible Unicode characters designed to influence the directionality of text without any visual or semantic impact on the content. These marks function as strong directional characters, either left-to-right or right-to-left, while having zero width, meaning they do not occupy space or alter the apparent length of the displayed text. Placed strategically, they guide the bidirectional rendering process so that neutral or weak characters, such as punctuation, align correctly in mixed-script environments.

The primary purpose of implicit directional marks is to resolve ambiguities in the rendering of bidirectional text, particularly where left-to-right (LTR) and right-to-left (RTL) scripts intermingle, as with English and Arabic. In such contexts, neutral elements like punctuation or parentheses can be misplaced by the bidirectional algorithm, leading to incorrect visual ordering; for instance, an exclamation mark might appear on the wrong side of a phrase in an RTL-dominant paragraph without proper guidance. These marks provide a lightweight mechanism to enforce the intended direction locally, preserving the logical order of the text while achieving the desired visual presentation, thus facilitating accurate interchange and display across diverse writing systems.

Unlike explicit directional formatting codes, such as the Left-to-Right Override (LRO) or Right-to-Left Override (RLO), which apply to entire blocks of text and can override the natural directionality in a more global manner, implicit directional marks operate on a finer, more localized scope within a single paragraph. This distinction makes them preferable for subtle adjustments, as they avoid the broader embedding effects that could disrupt surrounding text or complicate parsing and comparison operations. The marks in this category are the Left-to-Right Mark (LRM), Right-to-Left Mark (RLM), and Arabic Letter Mark (ALM), each tailored to specific directional needs without introducing visible artifacts.

Historical Development

Implicit directional marks originated as part of Unicode's early efforts to handle bidirectional text in computing environments. The Left-to-Right Mark (LRM, U+200E) and Right-to-Left Mark (RLM, U+200F) were introduced in Unicode 1.1.0, released in June 1993, to provide invisible formatting controls for resolving ambiguities in mixed left-to-right and right-to-left scripts. These marks were essential for the initial bidirectional support outlined in the Unicode Standard, enabling proper rendering of text combining European languages with scripts like Hebrew.

The development of these marks was driven by the need to support digital typesetting for bidirectional languages, particularly Hebrew and Arabic, which were increasingly digitized in the late 1980s and early 1990s. As this digitization expanded globally, challenges in displaying mixed-script documents, such as numbers embedded in right-to-left text, necessitated standardized directional controls beyond simple script detection. This was influenced by requirements from fields such as software localization, where inconsistent rendering could distort meaning in legal or financial texts.

To address specific limitations in Arabic handling, the Arabic Letter Mark (ALM, U+061C) was later added in Unicode 6.3.0, released in September 2013. ALM serves as a right-to-left zero-width character tailored for Arabic contexts, improving the bidirectional algorithm's treatment of characters adjacent to Arabic letters without affecting non-Arabic scripts. Its inclusion responded to proposals highlighting gaps in the existing marks for Arabic text, particularly the sequencing of numeric data.

The evolution of implicit directional marks has been closely tied to refinements in Unicode Standard Annex #9 (UAX #9), the Unicode Bidirectional Algorithm, first formalized in Unicode 2.0 and iteratively updated through subsequent versions. Early revisions focused on core embedding and override mechanisms, with significant enhancements in Unicode 3.0 for better neutral character resolution. After Unicode 6.3, no major structural changes occurred to the marks themselves, though UAX #9 (now at revision 51) incorporated interactions with the new directional isolates (added in Unicode 6.3) and minor clarifications for edge cases in mixed-script rendering. The framework has remained stable through Unicode 17.0, released in 2025, reflecting its maturity in supporting global text processing needs.

The Characters

Left-to-Right Mark (LRM)

The Left-to-Right Mark (LRM) is a zero-width, non-printing formatting character that enforces left-to-right (LTR) directionality in bidirectional text by acting as a strong LTR directional cue for adjacent neutral or weak elements. It is particularly useful in mixed-script environments where implicit directional marks help maintain legible ordering without visible artifacts.

A primary use case for the LRM involves inserting it after LTR text that is followed by right-to-left (RTL) text, so that punctuation or other neutral characters at the boundary are not pulled into the RTL flow. For example, in the sequence "Hello!عربي" inside an RTL paragraph, the exclamation mark sits between an LTR word and RTL text and therefore follows the paragraph direction. Placing an LRM immediately after the exclamation mark (Hello!, then LRM, then عربي) encloses it between two strong LTR characters, so the punctuation stays to the right of the English word. This application is common in plain text scenarios, such as filenames, database entries, or user-generated content, where it resolves ambiguities to preserve the intended visual structure.

In terms of behavior, the LRM influences only the display ordering of text by providing a localized LTR context, and it remains entirely invisible during rendering. It has no effect on text shaping processes, such as glyph joining or cursive connections, nor does it alter the semantic or parsing interpretation of the content. The LRM's scope is confined to the current paragraph, where it applies directionality without nesting deeper embeddings or overriding broader directional controls.
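The boundary case described above can be approximated in a few lines of Python. This is only a sketch of the neutral-resolution idea, under stated assumptions: the helper names `strong_direction` and `neutral_direction` are hypothetical, the text is assumed to contain no digits, embeddings, or isolates, and the paragraph direction is passed in by the caller rather than detected.

```python
import unicodedata

LRM = "\u200e"

def strong_direction(ch):
    """Return 'L' or 'R' for strong characters, or None for anything else."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "L"
    if bidi_class in ("R", "AL"):
        return "R"
    return None

def neutral_direction(text, index, paragraph_dir):
    """Simplified neutral resolution for the character at text[index].

    The neutral takes the direction of the nearest strong characters on
    both sides when they agree, and the paragraph direction otherwise.
    Missing neighbours fall back to the paragraph direction.
    """
    before = next(
        (d for d in map(strong_direction, reversed(text[:index])) if d),
        paragraph_dir,
    )
    after = next(
        (d for d in map(strong_direction, text[index + 1:]) if d),
        paragraph_dir,
    )
    return before if before == after else paragraph_dir

arabic = "\u0639\u0631\u0628\u064a"
plain = "Hello!" + arabic
marked = "Hello!" + LRM + arabic

# Index 5 is the exclamation mark in both strings.
print(neutral_direction(plain, 5, "R"))   # R: follows the RTL paragraph
print(neutral_direction(marked, 5, "R"))  # L: enclosed by "Hello" and LRM
```

A conforming implementation does considerably more (weak types, bracket pairs, embedding levels), so this is useful mainly for reasoning about where a mark needs to go.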

Right-to-Left Mark (RLM)

The Right-to-Left Mark (RLM) is a zero-width, non-printing formatting character that functions as an implicit strong right-to-left directional indicator, primarily for non-Arabic scripts such as Hebrew, where it enforces right-to-left ordering on adjacent neutral or weak bidirectional characters without affecting text shaping or semantics.

In primary use cases involving Hebrew or mixed-script environments, the RLM ensures proper positioning of punctuation or separators following left-to-right segments within an overall right-to-left flow. For instance, after an LTR number or segment in RTL text, inserting the RLM before the following Hebrew text maintains separation and correct alignment. This application is particularly valuable in lists or numbered items where neutral elements like commas or dashes might otherwise disrupt the expected RTL progression in Hebrew text.

Behaviorally, the RLM remains invisible during rendering and display. It separates directional runs and blocks unwanted left-to-right influence from bleeding into right-to-left sections, maintaining clean visual separation without introducing artifacts or altering the logical order of characters. Its effect is confined to the enclosing paragraph and ends at paragraph separators.

A key limitation of the RLM in Arabic contexts is that it does not carry the Arabic letter (AL) bidirectional class. Only a preceding AL character causes following European numbers to be treated as Arabic numbers during resolution, so the Arabic Letter Mark (ALM) is preferred where numbers need an Arabic context.

Arabic Letter Mark (ALM)

The Arabic Letter Mark (ALM) is a zero-width, non-printing formatting character designed to mimic the bidirectional behavior of an Arabic letter, thereby enforcing right-to-left directionality specifically within Arabic text contexts. It serves as an invisible directional control that influences the resolution of subsequent characters without altering the visual appearance or semantic content of the text.

In primary use cases, ALM ensures correct ordering and alignment in mixed Arabic and left-to-right text, particularly for weak or neutral characters like numbers or punctuation that follow Arabic letters. For instance, placing ALM after Arabic text and before a number keeps the number from disrupting the right-to-left flow, allowing it to integrate into the Arabic run and potentially adopt Arabic-Indic digit forms in appropriate locales. This is especially useful in scenarios such as numbered lists, dates, or mathematical expressions embedded in Arabic documents, where maintaining contextual directionality is crucial for readability.

Behaviorally, ALM remains transparent during rendering, contributing no width or glyph to the display while acting as a strong right-to-left influence in bidirectional processing. It preserves the Arabic context for adjacent elements without directly modifying glyph shapes, though it indirectly supports shaping engine decisions by sustaining the script's directional properties.

Limitations of ALM include its specificity to Arabic script environments, making it unsuitable for other right-to-left languages like Hebrew, where it may not yield expected results. Unlike the more general Right-to-Left Mark, ALM gives following numbers an Arabic context, but it requires careful placement to avoid unintended interactions with systems that apply broader RTL rules.
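The difference between ALM and RLM shows up in how European digits are classified after each mark. The following Python sketch imitates only two rules of the Unicode Bidirectional Algorithm (W2 and W7) and skips the rest; the function name `resolve_number_types` and its `sos` parameter are hypothetical names chosen for illustration, and the input is assumed to have no embeddings or isolates.

```python
import unicodedata

ALM, RLM, LRM = "\u061c", "\u200f", "\u200e"

def resolve_number_types(text, sos="L"):
    """Return bidi classes with a simplified W2/W7 applied to digits.

    sos stands in for the direction assumed before the first character.
    """
    last_strong = sos
    resolved = []
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("L", "R", "AL"):
            last_strong = bidi_class
        elif bidi_class == "EN":
            if last_strong == "AL":
                bidi_class = "AN"  # W2: European number after an Arabic letter
            elif last_strong == "L":
                bidi_class = "L"   # W7: European number after a strong L
        resolved.append(bidi_class)
    return resolved

print(resolve_number_types(ALM + "12"))  # ['AL', 'AN', 'AN']
print(resolve_number_types(RLM + "12"))  # ['R', 'EN', 'EN']
print(resolve_number_types(LRM + "12"))  # ['L', 'L', 'L']
```

Only the ALM case turns the digits into Arabic numbers, which is the behaviour the RLM cannot provide.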

Bidirectional Properties

Unicode Code Points and Classes

Implicit directional marks are defined in the Unicode Standard with specific code points and bidirectional classes that determine their role in text directionality. The Left-to-Right Mark (LRM) is assigned the code point U+200E and belongs to the bidirectional class L, which indicates strong left-to-right directionality. The Right-to-Left Mark (RLM) uses U+200F and has the bidirectional class R, signifying strong right-to-left directionality. The Arabic Letter Mark (ALM) is encoded at U+061C with the bidirectional class AL, and is treated as a right-to-left Arabic letter for directional purposes.

These marks share several key properties that ensure they function without visual impact. All three have the Bidi_Control property set to Yes, identifying them as format characters with a specific role in the bidirectional algorithm. Their General_Category is Other_Format (Cf), classifying them as formatting characters that do not occupy space or display glyphs; they are zero-width and invisible in rendered text. Their bidirectional classes give the marks directional strength, overriding weaker directional cues in surrounding text without altering the semantic content.

In terms of encoding, LRM and RLM reside in the General Punctuation block (U+2000–U+206F), while ALM is part of the Arabic block (U+0600–U+06FF). These assignments have remained stable since each mark was introduced, reflecting their foundational role in bidirectional support.
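
| Mark | Code Point | Bidirectional Class | Block |
|------|------------|---------------------|-------|
| LRM | U+200E | L (Left-to-Right) | General Punctuation |
| RLM | U+200F | R (Right-to-Left) | General Punctuation |
| ALM | U+061C | AL (Arabic Letter) | Arabic |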
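
These properties can be checked against the Unicode Character Database bundled with a language runtime. As a small sketch, Python's standard `unicodedata` module exposes the name, bidirectional class, and general category; the `describe` helper is a hypothetical name, and the results depend on the Unicode version shipped with the interpreter (ALM needs data from Unicode 6.3 or later).

```python
import unicodedata

MARKS = {"LRM": "\u200e", "RLM": "\u200f", "ALM": "\u061c"}

def describe(ch):
    """Return (code point, name, bidi class, general category) for ch."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.bidirectional(ch),
        unicodedata.category(ch),
    )

for abbreviation, ch in MARKS.items():
    print(abbreviation, *describe(ch))
```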

Interaction with the Bidirectional Algorithm

Implicit directional marks, including the Left-to-Right Mark (LRM, U+200E), Right-to-Left Mark (RLM, U+200F), and Arabic Letter Mark (ALM, U+061C), are integrated into Unicode's bidirectional algorithm as defined in Unicode Standard Annex #9 (UAX #9). These marks are classified as strong directional characters: LRM as type L (left-to-right), RLM as type R (right-to-left), and ALM as type AL (Arabic letter, which behaves directionally like R). They participate in the algorithm's resolution phases to influence text reordering without producing any visible output.

During explicit embedding and override processing (rules X1 through X9 in UAX #9), these marks have no special role, as they do not initiate or terminate embedding levels, isolates, or overrides. Their effects appear in the other stages. When the paragraph direction is not supplied by a higher-level protocol, rules P2 and P3 derive the paragraph embedding level from the first strong character, so a mark at the start of a paragraph can set its base direction. In weak type resolution (rules W1 through W7), a mark acts as the preceding strong character for the numbers that follow it: after an ALM, a European number (EN) becomes an Arabic number (AN) under rule W2, and after an LRM it becomes L under rule W7. An RLM triggers neither change. In neutral resolution (rules N1 and N2), neutrals enclosed between strong characters of the same direction take that direction, which is how a mark pulls adjacent spaces or punctuation into a run; other neutrals take the embedding direction. The implicit level rules (I1 and I2) then assign embedding levels from the resolved types.

However, implicit directional marks do not influence text outside their isolating run sequence (rule X10). Within an isolate, their strong type affects local ordering, while the isolate as a whole is treated as neutral by the surrounding text.
Unlike explicit directional formatting characters such as the Left-to-Right Override (LRO, U+202D), implicit marks operate only at a local scope, affecting adjacent characters without creating block-level embeddings or requiring termination via a Pop Directional Formatting (PDF, U+202C). This lightweight design eliminates the need for pairing or stack management, making them suitable for fine-grained directional corrections in mixed-script text, though their influence does not nest across multiple levels.

Edge cases in their processing include termination of influence at paragraph separators (type B, rule P1), which confine the marks' effects strictly within their originating paragraph to prevent cross-paragraph reordering. Additionally, when directional isolates are present (e.g., RLI or LRI), the marks do not propagate across isolate boundaries, maintaining isolation while still applying their strong directionality locally within the run.
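
One place where a mark has a paragraph-wide effect is base direction detection, where UAX #9 rules P2 and P3 look for the first strong character. The sketch below is a simplified reading of those two rules in Python; `paragraph_level` is a hypothetical helper name, and it assumes no higher-level protocol has already fixed the direction.

```python
import unicodedata

def paragraph_level(text):
    """Return 0 (LTR) or 1 (RTL) from the first strong character.

    Characters inside an isolate (between LRI/RLI/FSI and the matching
    PDI) are skipped, and scanning stops at a paragraph separator.
    """
    isolate_depth = 0
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "B":
            break
        if bidi_class in ("LRI", "RLI", "FSI"):
            isolate_depth += 1
        elif bidi_class == "PDI":
            if isolate_depth > 0:
                isolate_depth -= 1
        elif isolate_depth == 0 and bidi_class in ("L", "R", "AL"):
            return 0 if bidi_class == "L" else 1
    return 0  # no strong character found: default to LTR

print(paragraph_level("123 abc"))         # 0
print(paragraph_level("\u200f123 abc"))   # 1, the leading RLM decides
print(paragraph_level("\u2067\u05d0\u2069abc"))  # 0, isolate content is skipped
```

A leading RLM or ALM is therefore enough to make an otherwise LTR-looking paragraph resolve as RTL when the direction is auto-detected.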

Practical Usage

In Web Technologies (HTML and CSS)

In web technologies, implicit directional marks such as the Left-to-Right Mark (LRM, U+200E), Right-to-Left Mark (RLM, U+200F), and Arabic Letter Mark (ALM, U+061C) are inserted into HTML documents using character references to influence the rendering of bidirectional text without altering the visible content. The LRM can be represented as &lrm; or &#x200E;, the RLM as &rlm; or &#x200F;, and the ALM as &#x61C; or &#1564;. These marks work within the Unicode Bidirectional Algorithm for inline content by influencing directional runs, ensuring proper ordering in mixed-language text, such as when neutral characters like punctuation or numbers adjoin opposite-direction scripts.

In HTML, these marks are particularly useful for fine-tuning the directionality of characters like punctuation or numbers adjacent to scripts of differing directions, especially in RTL-dominant contexts. For instance, in an RTL paragraph that ends with an embedded LTR phrase followed by an exclamation mark, an LRM after the exclamation mark attaches the punctuation to the LTR run: <p dir="rtl">عربي internationalize the web!&lrm;</p>, which renders with the "!" correctly positioned after the English phrase. Similarly, in an LTR paragraph with Arabic text followed by a number, <p dir="ltr">عربي &lrm;123</p> ensures the "123" follows the Arabic in left-to-right order without inheriting an Arabic context from the preceding letters. The ALM functions analogously to the RLM but specifically supplies an Arabic letter context, as in <p>English &#x61C;عربي</p>.

CSS enhances the effects of these marks through properties like unicode-bidi and direction, which can isolate or embed directional contexts around marked text. The unicode-bidi: embed value, combined with direction: ltr or rtl, applies an embedding level to inline elements containing the marks, aligning their behavior with the Unicode algorithm.
For stronger isolation, unicode-bidi: isolate computes the directionality of the element's content independently, preventing interference from surrounding text; for example, <span style="unicode-bidi: isolate; direction: rtl;">عربي&lrm;</span> keeps the LRM's influence inside the RTL span without affecting adjacent LTR content. The isolate value is available in modern browsers supporting CSS Writing Modes Level 3 and corresponds to wrapping the content in explicit isolate characters such as LRI (U+2066) and PDI (U+2069).

Common rendering issues in browsers arise from inconsistencies in handling neutral characters or legacy support for the Bidirectional Algorithm, such as punctuation "sticking" to the wrong directional run or numbers inheriting unexpected directions. In older versions of WebKit-based browsers, for example, RTL text followed by an LTR number might reverse the numeral's order unless terminated by an LRM: <p><span dir="rtl">عربي</span> 123&lrm;</p>. Other engines such as Blink generally handle marks more robustly but may require explicit dir="auto" on parent elements to detect base direction accurately. Testing across the major browser engines is recommended to verify consistent output, as variations in the treatment of br elements or inline blocks can propagate bidi errors.

Best practices for using implicit directional marks in HTML emphasize targeted application for accessibility and simplicity. They align with WCAG 2.1 Success Criterion 1.3.2 (Meaningful Sequence) via Technique H34, ensuring screen readers and assistive technologies preserve logical reading order in mixed bidi content when combined with techniques like G57. Avoid overuse, as excessive marks complicate source code maintenance and make unintended reordering harder to diagnose; instead, prefer markup like <bdi> for dynamic content or CSS isolation for broader control.
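
Because the marks are invisible in HTML source, it can help maintenance to emit them as the character references listed in this section rather than as raw characters. A minimal sketch in Python, assuming the text is destined for element content (not attributes); `to_html_source` and `MARK_ENTITIES` are hypothetical names.

```python
import html

# Map each mark's code point to a visible character reference.
MARK_ENTITIES = {
    0x200E: "&lrm;",
    0x200F: "&rlm;",
    0x061C: "&#x61C;",
}

def to_html_source(text):
    """Escape text for HTML, then make directional marks visible.

    Escaping runs first so the ampersands in the references added
    afterwards are not escaped a second time.
    """
    return html.escape(text, quote=False).translate(MARK_ENTITIES)

print(to_html_source("Q&A: web!\u200e"))  # Q&amp;A: web!&lrm;
```

The rendered result is identical either way; only the readability of the source changes.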

In Plain Text and Other Applications

Implicit directional marks, such as the Left-to-Right Mark (LRM, U+200E), Right-to-Left Mark (RLM, U+200F), and Arabic Letter Mark (ALM, U+061C), can be directly inserted into plain text files using their code points in editors that support Unicode rendering, including applications like Notepad++ and Vim with appropriate configurations. These marks are particularly vital in environments without markup support, such as plain-text email or chats in terminal-based tools, where they ensure correct ordering of mixed left-to-right (LTR) and right-to-left (RTL) scripts and prevent visual reordering issues. For instance, inserting an LRM after an Arabic numeral in an LTR paragraph maintains its expected leftward alignment without altering the surrounding flow.

In document formats like PDF, Microsoft Word, and LibreOffice Writer, these marks provide fine-grained bidirectional control without relying on explicit paragraph-level formatting, embedding directly into the text stream to influence rendering. PDFs generated from Unicode-compliant sources preserve these marks during layout, allowing tools like Adobe Acrobat to apply the Unicode Bidirectional Algorithm for accurate display of mixed-script content. In Microsoft Word, users can insert LRM or RLM via the Insert Symbol dialog or keyboard shortcuts to resolve issues like inverted parentheses in RTL contexts embedded within LTR documents. Similarly, LibreOffice supports insertion through the Insert > Special Character menu, enabling handling of bidirectional text in ODF files. These applications often pair their bidirectional processing with shaping libraries like HarfBuzz, which shape each directional run (including cursive joining in scripts such as Arabic) once the run's direction has been determined.

In programming contexts, implicit directional marks are handled through Unicode string operations in languages like Java and Python to support bidirectional text.
In Java, strings can include LRM or RLM characters directly, with the java.text.Bidi class analyzing their directional runs according to the bidirectional algorithm. Python's built-in Unicode support allows appending marks via escape sequences (e.g., '\u200e' for LRM), and the standard unicodedata module reports each character's bidirectional class. The International Components for Unicode (ICU) library provides functions such as ubidi_setPara() in its Bidi API, which performs paragraph-level analysis; ICU can also insert marks into its output when the UBIDI_OPTION_INSERT_MARKS reordering option is enabled. These functions are essential for cross-platform text rendering in software built with C++, Java, or other bindings, such as web servers or desktop apps handling global content.

Despite their utility, compatibility challenges arise in legacy systems lacking full Unicode support, where non-bidirectional-aware applications may render marks as question marks or ignore them entirely, leading to disordered text. Testing across platforms is crucial, as older Windows environments or non-ICU-based tools might fail to process marks correctly, necessitating fallback strategies like explicit direction overrides or normalization to ensure consistent rendering in mixed-script scenarios.
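
When marks are embedded in strings this way, they remain ordinary code points as far as the program is concerned, so a plain equality check or length count still sees them. A small Python sketch of two helpers that are often useful in that situation; `strip_marks` and `reveal_marks` are hypothetical names, not part of any library.

```python
MARK_NAMES = {"\u200e": "LRM", "\u200f": "RLM", "\u061c": "ALM"}

# Translation table that deletes the three marks.
_STRIP_TABLE = str.maketrans("", "", "".join(MARK_NAMES))

def strip_marks(text):
    """Remove implicit directional marks, e.g. before comparing keys."""
    return text.translate(_STRIP_TABLE)

def reveal_marks(text):
    """Replace each mark with a visible <NAME> tag for logs and debugging."""
    return "".join(
        f"<{MARK_NAMES[ch]}>" if ch in MARK_NAMES else ch for ch in text
    )

filename = "report\u200e.txt"
print(filename == "report.txt")               # False
print(strip_marks(filename) == "report.txt")  # True
print(reveal_marks(filename))                 # report<LRM>.txt
```

Whether stripping is appropriate depends on the application, since removing a mark can change how the stored text displays.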
