Complex text layout

Complex text layout (CTL), also referred to as complex script rendering, is the specialized process of typesetting and rendering text in writing systems where the visual form, positioning, or sequence of characters (graphemes) varies with the neighboring characters, rather than following a simple left-to-right linear progression. This includes handling text direction, glyph shaping, ligature formation, and diacritic placement to ensure accurate and aesthetically appropriate display. CTL is essential for a wide array of scripts, including right-to-left scripts such as Arabic and Hebrew, which mix with left-to-right elements such as numbers, and Southeast Asian scripts such as Thai, which form character clusters with implicit vowels and tone marks. Indic scripts, such as Devanagari and Bengali, require complex reordering and matra (vowel sign) positioning around base consonants to form syllables. Unlike simple scripts (e.g., Latin or Cyrillic), which map characters directly to glyphs in storage order, CTL scripts store text in logical order but require transformation for visual presentation, involving steps such as script analysis, character reordering, and font-specific glyph substitution.
In computing, CTL is implemented through technologies such as Microsoft's Uniscribe API, which performs script-specific processing, including bidirectional resolution via the Unicode Bidirectional Algorithm, and applies OpenType font features for shaping. Open-source libraries such as HarfBuzz provide similar capabilities, and web standards such as CSS build on them so that browsers and applications can display diverse languages. Early efforts, such as The Open Group's CTL project, standardized the integration of these features into desktop environments for languages such as Thai. The complexity arises from rules for justification, line breaking, and font fallback, which prevent invalid combinations and maintain readability across mixed-script documents.

Introduction

Definition and Scope

Complex text layout (CTL) refers to the typesetting and rendering of writing systems in which the shape, position, or order of a grapheme depends on its context, such as adjacent characters or the surrounding text direction. This process involves transformations between the logical storage of text in memory and its visual display, distinguishing it from simple linear rendering where characters are presented without modification. The scope of CTL includes bidirectional (BiDi) text that mixes right-to-left and left-to-right directions, cursive joining behaviors, ligature formation for combined glyphs, and vertical or multidirectional layouts, but generally excludes straightforward left-to-right scripts like basic Latin unless they require contextual features such as combining marks. These elements ensure that text is legible and culturally appropriate across diverse scripts, with BiDi reordering keeping mixed-language documents in a coherent reading order.
For example, the simple Latin string "abc" displays as isolated characters in fixed positions, while the Arabic word "العربية" demands contextual shaping: letters connect cursively and change form (initial, medial, final, or isolated) based on their neighbors, resulting in a fluid, joined appearance. CTL matters for software internationalization (i18n) because it allows applications to support global languages accurately and reduces localization costs for vendors entering international markets.

Historical Development

In the 1980s and early 1990s, digital typesetting technologies such as Adobe's PostScript, introduced in 1984, were optimized for Latin-based scripts, creating substantial hurdles for non-Latin writing systems that demanded bidirectional rendering, variable glyph widths, or contextual shaping. These systems often relied on fixed-width encodings or ad hoc extensions, complicating the handling of scripts such as Arabic, Hebrew, or CJK ideographs, where mixed byte lengths in standards like Shift-JIS further exacerbated access and unification issues. Early Unicode releases, including version 1.0 in 1991 and 1.1 in 1993, provided a universal encoding foundation but omitted full bidirectional support, restricting effective digital representation of right-to-left and mixed-direction texts.
Key advancements in the mid-1990s addressed these deficiencies through standardized algorithms and font formats. Unicode 2.0, released in 1996, incorporated the Bidirectional Algorithm, enabling logical-to-visual text reordering for scripts with opposing directionalities. Complementing this, OpenType 1.0, jointly developed by Microsoft and Adobe and published in April 1997, introduced glyph substitution and positioning tables via GSUB and GPOS, facilitating complex shaping for cursive and conjunct-dependent scripts. As proprietary solutions proved insufficient for diverse linguistic needs, open-source initiatives gained traction: Graphite launched in 2004 as a programmable system for fonts targeting lesser-known languages, while HarfBuzz emerged in 2006 from collaboration among open-source developers to provide a robust, unified shaping engine.
The post-2000 era marked a transition to open, web-centric standards, driven by the internet's expansion into non-Western markets. This culminated in specifications like the CSS Writing Modes Module Level 3, issued as a W3C Working Draft in February 2011, which defined properties for horizontal, vertical, and bidirectional layouts to support international scripts in browsers.
Despite these strides, pre-2020 implementations revealed persistent gaps in minority script support, where many endangered or low-resource writing systems lacked encoding, shaping rules, or font resources for complex layouts. Unicode expansions, including version 3.0 in 1999 and subsequent releases up to 13.0 in 2020, systematically incorporated new characters, bidirectional properties, and script-specific behaviors to narrow these gaps and preserve linguistic diversity, a process that continued in later versions such as 16.0 in September 2024.

Writing Systems Requiring CTL

Bidirectional Scripts

Bidirectional scripts are writing systems in which text flows primarily from right to left (RTL) but is often intermixed with left-to-right (LTR) elements such as numbers or embedded phrases in other languages, necessitating algorithmic reordering to achieve the correct visual order. The base direction of such text is RTL, but neutral and weak directional characters must be resolved from the surrounding context to prevent visual distortion. Primary examples include Arabic, Hebrew, and Syriac, abjads used mainly for Semitic languages; in Arabic and Syriac the letters also connect and change form contextually, and all three demand bidirectional handling for coherent display. Numbers are weak types, typically classified as European numbers (EN) or Arabic numbers (AN), while punctuation marks like parentheses or quotes are treated as neutral (ON); both adopt a direction from adjacent strong directional text or from the paragraph's embedding level. For instance, in an Arabic sentence containing a European numeral, the number flows LTR within the RTL context, ensuring readability without manual adjustment.
The Unicode Bidirectional Algorithm, specified in Unicode Standard Annex #9 (UAX #9), governs this reordering through a multi-pass process that assigns an embedding level to each character. Explicit embeddings allow nesting of opposite-direction text using control characters like left-to-right embedding (LRE, U+202A) or right-to-left embedding (RLE, U+202B); even levels are LTR and odd levels are RTL, with a maximum depth of 125 to avoid overflow. Overrides, via left-to-right override (LRO, U+202D) or right-to-left override (RLO, U+202E), force a uniform direction but are discouraged, in part because of security concerns such as visual spoofing.
Resolution occurs in phases: first, splitting the text into paragraphs (P1), determining each paragraph's base level (P2–P3), and applying explicit embeddings, overrides, and isolates (X1–X10); then resolving weak types like numbers (W1–W7); followed by neutral resolution (N0–N2), where paired brackets and other neutrals take their direction from their neighbors; and finally assigning implicit levels (I1–I2), culminating in visual reordering by level (L1–L4).
These scripts affect hundreds of millions of users worldwide, with Arabic alone spoken by over 450 million people across 25 countries, underscoring the global scale of bidirectional layout needs. Historical precedents trace to ancient systems like the Phoenician script, an early right-to-left abjad that influenced modern writing directions.
Challenges emerge prominently in mixed-content scenarios, such as documents embedding LTR quotes, URLs, or code snippets, where unhandled neutrals can lead to reversed or mirrored appearances; for example, a URL in RTL text might display with its slashes and dots in inverted order, confusing readers. Modern solutions recommend directional isolates (LRI, RLI, FSI, PDI; U+2066–U+2069) to encapsulate segments without affecting their surroundings, mitigating these issues in digital interfaces.
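The final reordering step can be illustrated in isolation. The sketch below assumes the embedding levels have already been resolved by the earlier phases and applies only rule L2 of UAX #9 (reverse every run at or above each level, from the highest level down to the lowest odd level). The function name `reorder_visual` is made up for this example, and rules L1, L3, and L4 (whitespace reset, combining marks, mirroring) are deliberately left out.

```python
def reorder_visual(chars, levels):
    """Reorder one line from logical to visual order (UAX #9 rule L2 only).

    chars  -- sequence of characters in logical order
    levels -- resolved embedding level for each character (same length)
    """
    order = list(range(len(chars)))
    odd_levels = [lv for lv in levels if lv % 2 == 1]
    if not odd_levels:
        return list(chars)  # pure LTR line: nothing to reverse

    highest = max(levels)
    lowest_odd = min(odd_levels)
    # From the highest level down to the lowest odd level, reverse every
    # contiguous run of characters whose level is at least that level.
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] >= level:
                j = i
                while j < len(order) and levels[order[j]] >= level:
                    j += 1
                order[i:j] = reversed(order[i:j])
                i = j
            else:
                i += 1
    return [chars[k] for k in order]
```

Following the convention used in UAX #9's own examples, uppercase letters can stand in for RTL characters: `reorder_visual("car means CAR.", [0] * 10 + [1] * 3 + [0])` reverses only the uppercase run, while an RTL paragraph with an embedded LTR word (levels 2 inside levels 1) is reversed twice at the inner run and once overall.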

Complex Shaping Scripts

Complex shaping scripts involve writing systems where individual characters or glyphs change form, combine into ligatures, or reposition relative to one another within a word or syllable to achieve proper rendering. These scripts require sophisticated layout engines to handle intra-word transformations, such as vowel signs attaching to consonants or letters adopting contextual shapes based on their position. Unlike in simple scripts, shaping here ensures legibility and aesthetic harmony by applying rules for clustering and substitution.
The Indic or Brahmic family of scripts, including Devanagari, Bengali, and Tamil, exemplifies complex shaping through abugida structures where consonants carry an inherent vowel that can be modified or suppressed. In Devanagari, dependent vowel signs known as matras attach above, below, to the left, or to the right of a base consonant; for instance, the matra U+093F ◌ि is displayed to the left of the consonant क (U+0915) to form the syllable कि (ki). Bengali follows similar rules, with several vowel signs placed to the left of the consonant, while Tamil uses the puḷḷi (U+0BCD) to suppress inherent vowels and positions vowel signs accordingly. These scripts rely on glyph substitution (GSUB) and positioning (GPOS) tables in fonts to handle reordering and attachment of matras and conjuncts.
Southeast Asian scripts like Thai, Khmer, and Lao also demand intricate shaping due to their stacked diacritics and lack of inter-word spacing. In Thai, tone marks (e.g., U+0E48 ◌่ mai ek) and vowel signs (e.g., U+0E39 ◌ู sara uu) appear above or below the base consonant, while left-side vowels are stored in visual order, before the consonant they follow in pronunciation. Khmer employs a coeng (U+17D2 ◌្) for subjoined consonants and vowel signs that wrap around the base, such as composites like U+17B6 U+17C6 for certain vowels, while avoiding spaces between words. Lao mirrors Thai in tone mark and vowel placement, using diacritics that stack outward from the consonant. These features necessitate precise vertical positioning to prevent overlaps in rendering.
Cursive scripts such as Arabic and Mongolian further complicate shaping by requiring glyphs to adopt position-dependent forms for fluid connection. Arabic letters have up to four contextual forms: isolated (standalone), initial (word-start), medial (mid-word, joining on both sides), and final (word-end). All four apply to dual-joining characters like م (U+0645), whereas right-joining letters like ر (U+0631) use only the isolated and final forms. This cursive joining is managed through OpenType features like init, medi, and fina. Mongolian, written vertically, exhibits similar cursive behavior where letters join on both sides within words, with context-sensitive forms ensuring continuous flow from top to bottom.
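As a rough illustration of how the four Arabic forms follow from joining behavior, the sketch below picks a form for each letter from a small hand-written table of joining types. The table covers only the letters of one example word and is an assumption of this example; a real shaper reads the Joining_Type property from the Unicode Character Database and also handles transparent marks and joiner controls, which are ignored here.

```python
# Joining types for a handful of Arabic letters (illustrative subset only):
# "D" = dual-joining, "R" = right-joining (joins only to the preceding letter).
JOINING_TYPE = {
    "\u0627": "R",  # alef
    "\u0631": "R",  # reh
    "\u0629": "R",  # teh marbuta
    "\u0628": "D",  # beh
    "\u0639": "D",  # ain
    "\u0644": "D",  # lam
    "\u0645": "D",  # meem
    "\u064a": "D",  # yeh
}


def contextual_forms(word):
    """Return one of 'isol', 'init', 'medi', 'fina' for each letter in word."""
    types = [JOINING_TYPE.get(ch, "U") for ch in word]  # "U" = non-joining
    forms = []
    for i, jt in enumerate(types):
        # A letter connects to the previous one only if that letter can
        # join forward (dual-joining) and this letter can join backward.
        joins_prev = i > 0 and jt in ("D", "R") and types[i - 1] == "D"
        # It connects to the next one only if it is dual-joining itself
        # and the next letter accepts a connection from the previous side.
        joins_next = (
            i + 1 < len(types) and jt == "D" and types[i + 1] in ("D", "R")
        )
        if joins_prev and joins_next:
            forms.append("medi")
        elif joins_prev:
            forms.append("fina")
        elif joins_next:
            forms.append("init")
        else:
            forms.append("isol")
    return forms
```

Applied to the word العربية (alef, lam, ain, reh, beh, yeh, teh marbuta), the right-joining alef and reh each interrupt the chain, so the word splits into three connected groups. The returned names match the OpenType feature tags that would select each form.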

Vertical and Multidirectional Layouts

Vertical text layout arranges characters in lines that flow from top to bottom, often with columns progressing from right to left, a convention prevalent in certain writing systems to accommodate their visual and cultural traditions. This approach contrasts with the horizontal flow of most scripts and requires specific handling of character orientation, such as keeping ideographs upright while rotating Latin letters. In East Asia, vertical presentation has historical roots in scroll-based writing and remains well suited to dense ideographic content.
East Asian scripts exemplify vertical layout through their handling of Hanzi (Chinese characters), Kanji (Japanese characters borrowed from Chinese), Hiragana and Katakana (Japanese syllabaries), and Hangul (Korean syllables). Hanzi and Kanji remain upright in vertical text, with lines flowing top to bottom and succeeding columns from right to left, preserving the square aspect of each glyph for optimal legibility. Hiragana and Katakana characters also stay upright, integrating seamlessly with ideographs in mixed-script documents common in Japanese publications. For Korean, Hangul syllables are composed of stacked jamo (consonants and vowels) and appear upright in vertical flow; the syllable block as a whole does not rotate, which allows natural progression down the line without disrupting phonetic clustering.
The Mongolian script represents a distinct vertical system in which text is written in columns from top to bottom, with columns advancing from left to right across the page. Relative to its horizontally written ancestors, the script is rotated 90 degrees counterclockwise, and its letters connect fluidly within each column, forming a cursive chain that reflects the script's traditional calligraphic style. This connection ensures that vowels and consonants interlock properly, maintaining the script's aesthetic continuity in vertical presentation.
Multidirectional layouts extend vertical flow by incorporating less common progressions, as seen in Tibetan, which primarily runs horizontally left to right but can adopt vertical arrangements running top to bottom, with successive columns progressing from left to right, in certain manuscript traditions. This column advance, combined with the script's inherent stacking of subjoined consonants below main glyphs, creates a dynamic flow suited to religious texts or artistic layouts. Ancient scripts like Linear B, used for Mycenaean Greek around 1450–1200 BCE, occasionally employed boustrophedon writing, alternating direction per line (left to right, then right to left), on clay tablets.
Unicode Standard Annex #50 (UAX #50) addresses these needs by defining the Vertical_Orientation property, which assigns each character a default behavior in vertical text, such as upright positioning (U) or 90-degree rotation (R), enabling consistent rendering in vertical contexts without relying solely on font-specific adjustments. The property complements the directionality handling used for bidirectional text, so that mixed vertical and horizontal flows remain coherent.

Key Characteristics

Text Directionality

Text directionality in complex text layout (CTL) refers to the foundational rules governing how text flows, either from left to right (LTR) or right to left (RTL), particularly in mixed-direction content. For languages like English, the default base direction is LTR, while scripts such as Hebrew and Arabic use RTL as the base direction. The base direction of a paragraph is typically determined by the first strong directional character encountered, which could be L (left-to-right, e.g., Latin letters), R (right-to-left, e.g., Hebrew letters), or AL (Arabic letters, also right-to-left). If no strong character is present, the paragraph defaults to LTR unless a higher-level protocol sets the direction explicitly.
The Unicode Bidirectional Algorithm (UBA), defined in Unicode Standard Annex #9 (UAX #9), provides a standardized method to resolve directionality through a sequence of rules divided into phases: separating paragraphs, resolving embedding levels, handling weak and neutral characters, and final reordering. For instance, Rule P1 splits the text at paragraph separators, and Rules P2 and P3 set the paragraph level to 0 (LTR) or 1 (RTL) based on the first strong character. Explicit directional formatting characters are handled by the X rules, such as Rule X2 for RLE (Right-to-Left Embedding), which raises the embedding level to the next odd number to set an RTL direction within a segment, later terminated by PDF (Pop Directional Format). Rule L1 then resets segment and paragraph separators, along with any whitespace or isolate formatting characters that precede them or end a line, to the paragraph's base level.
Directionality operates at both paragraph and inline levels within CTL. Paragraphs are processed independently, split by B-type (paragraph separator) characters, with each establishing its own base direction before line-by-line reordering. Inline elements, such as embedded text or objects, inherit or adapt to the surrounding context, with inline objects treated as the neutral U+FFFC character for direction resolution.
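The first-strong-character rule can be tried out with Python's standard `unicodedata` module, whose `bidirectional()` function returns the Bidi_Class of a character. The sketch below is a simplified reading of rules P2 and P3: the function name `paragraph_level` is an assumption of this example, and the refinement in P2 that skips characters inside directional isolates is not implemented.

```python
import unicodedata


def paragraph_level(text):
    """Return 0 (LTR) or 1 (RTL) from the first strong character.

    Simplified from UAX #9 rules P2-P3: characters inside directional
    isolates are not skipped here, which the full rule requires.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return 0
        if bidi_class in ("R", "AL"):
            return 1
    return 0  # no strong character: default to LTR


def bidi_classes(text):
    """List the Bidi_Class of each character, e.g. 'L', 'R', 'AL', 'EN', 'WS'."""
    return [unicodedata.bidirectional(ch) for ch in text]
```

Digits and spaces are weak or neutral, so a paragraph such as "123 " followed by an Arabic letter still resolves to level 1: the scan passes over the EN and WS characters until it reaches the first AL character.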
Weak directional characters, including numbers, are resolved in the algorithm's third phase using Rules W1 through W7; for example, European numbers (EN) adapt by changing to Arabic numbers (AN) if preceded by right-to-left characters like AL (Rule W2), or to left-to-right if preceded by L (Rule W7), ensuring numbers align appropriately in RTL contexts without disrupting the overall flow. In web and document technologies, directionality can be overridden using standards like CSS. The CSS direction property specifies the base inline direction as ltr or rtl for an element, influencing the UBA's paragraph level and affecting text ordering, table layouts, and overflow behavior. Complementing this, the unicode-bidi property controls bidirectional embedding and isolation, with values like embed (inserting LRE or RLE codes), isolate (using directional isolates for scoped direction), or bidi-override (forcing direction regardless of character types), allowing precise control over mixed-direction rendering while integrating with the UBA. These properties enable authors to handle CTL in bidirectional scripts, such as embedding LTR quotes in RTL text.
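Outside CSS, in plain text that is assembled programmatically, a similar scoping effect can be approximated by wrapping an inserted value in the isolate control characters. The helper below is a minimal sketch: the function name `isolate` and its `direction` parameter are assumptions of this example, not part of any standard API, and the result only behaves as intended in renderers that implement the isolate characters.

```python
# Directional isolate controls (U+2066..U+2069).
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(text, direction=None):
    """Wrap text so its direction cannot affect the surrounding text.

    direction -- "ltr", "rtl", or None to let the renderer pick the
                 direction from the first strong character (FSI).
    """
    opener = {"ltr": LRI, "rtl": RLI, None: FSI}[direction]
    return f"{opener}{text}{PDI}"


def insert_value(prefix, value, suffix, direction=None):
    """Build a message around an isolated, possibly opposite-direction value."""
    return f"{prefix}{isolate(value, direction)}{suffix}"
```

A typical use is inserting a user name or URL of unknown direction into a fixed message, for example `insert_value("Signed in as ", name, ".")`, where `name` is whatever string the application supplies.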

Glyph Shaping and Ligatures

Glyph shaping transforms sequences of Unicode code points into positioned glyphs for accurate rendering in complex scripts, primarily through substitutions defined in the OpenType GSUB (Glyph Substitution) table. The process begins with mapping Unicode characters to initial glyph indices via the font's cmap (character-to-glyph mapping) table, followed by application of script- and language-specific OpenType features by a shaping engine, such as HarfBuzz or Microsoft's Uniscribe. These features apply contextual substitutions, producing an output glyph string that accounts for script requirements like cursive joining or syllabic clustering. For instance, the 'rlig' feature enforces required ligatures, while others handle positional variants.
Ligatures are a key substitution mechanism, replacing multiple input glyphs with a single composite glyph to enhance legibility or aesthetics. In Latin fonts, standard ligatures ('liga') such as "fi" avoid the collision between the hook of the 'f' and the dot of the 'i', while discretionary ligatures, activated via the 'dlig' feature, are optional stylistic choices. In contrast, required ligatures are mandatory in scripts like Arabic, where the 'rlig' feature substitutes specific sequences; a prominent example is the Lam-Alef ligature (لام + الف → لا), which joins lam and alef into a unified form that is essential for orthographic correctness and occurs in isolated and final variants. These substitutions ensure fluid connections without gaps or overlaps.
In abugida scripts like those of Indic languages, position-specific forms further refine glyph substitution to reflect syllabic structure. The 'rphf' feature substitutes a special reph form for the 'ra' consonant (र) when it is followed by a virama (halant) at the start of a conjunct; the shaping engine then places the reph visually after the subsequent base consonant, often in an above-base position.
Similarly, the 'vatu' feature substitutes vattu variants, in which a below-base form of a conjoining consonant such as ra combines with the base consonant into a single conjunct glyph. Khmer script employs analogous mechanisms, where features such as 'pres' (pre-base substitutions) and 'abvs' (above-base substitutions) operate on vowel signs that have been split into parts; for example, the OE vowel (អើ) decomposes into a pre-base part and an above-base component, ensuring proper attachment around the consonant without overlap.
The GSUB table organizes these substitutions into lookups, which can number in the thousands for complex scripts due to the combinatorial possibilities of contextual rules. In Arabic fonts, such as those supporting Naskh styles, GSUB lookups handle joining behaviors and ligatures across hundreds of variants, demonstrating the table's capacity for intricate rule sets. This framework, integral to OpenType font technologies, enables consistent rendering across diverse writing systems.
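The effect of a ligature substitution can be sketched without parsing a real font. The example below is a toy model, not the GSUB binary format: the table is a plain dictionary from glyph-name sequences to a replacement glyph name, the glyph names are invented for illustration, and longer sequences are tried first, which mirrors the usual convention for ordering ligatures in a font.

```python
# Toy ligature table: glyph-name sequence -> single ligature glyph.
# All glyph names here are hypothetical examples.
LIGATURES = {
    ("f", "i"): "f_i",
    ("f", "f", "i"): "f_f_i",
    ("lam.init", "alef.fina"): "lam_alef.isol",
}


def apply_ligatures(glyphs, ligatures=LIGATURES):
    """Replace glyph sequences with ligature glyphs, longest match first."""
    max_len = max((len(seq) for seq in ligatures), default=0)
    out = []
    i = 0
    while i < len(glyphs):
        # Try the longest candidate sequence at this position first.
        for n in range(min(max_len, len(glyphs) - i), 1, -1):
            replacement = ligatures.get(tuple(glyphs[i:i + n]))
            if replacement is not None:
                out.append(replacement)
                i += n
                break
        else:
            # No ligature starts here: copy the glyph through unchanged.
            out.append(glyphs[i])
            i += 1
    return out
```

Running `apply_ligatures(list("office"))` consumes the three glyphs f, f, i in one step rather than stopping at the shorter "fi" match, which is why the order in which candidates are tried matters.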

Reordering and Positioning

Reordering in complex text layout transforms the logical sequence of characters, as entered or stored, into a visual order suitable for display, particularly in bidirectional and complex scripts. In bidirectional scripts like Arabic and Hebrew, the Unicode Bidirectional Algorithm (UBA) performs this logical-to-visual reordering by assigning embedding levels to characters based on their directional properties. For example, Hebrew text is stored in logical order, the order in which it is typed and read, and the UBA reverses it for right-to-left visual presentation; thus, the logical sequence "AB" (where A and B are Hebrew characters) appears as "BA" when the display is scanned from left to right. This process resolves mixed directional runs, ensuring that left-to-right (LTR) segments, such as embedded numbers or Latin text, are correctly nested within right-to-left (RTL) contexts. The UBA supports explicit embedding levels up to a maximum depth of 125 (61 in versions of the algorithm before Unicode 6.3), with even levels indicating LTR direction and odd levels RTL, allowing for deeply nested bidirectional structures.
In Indic scripts, reordering also repositions dependent vowels known as matras relative to their base consonants to achieve proper syllabic structure. Pre-base matras, which follow the base consonant in logical order, are repositioned to precede the consonant during rendering. For instance, in Devanagari, the short 'i' matra (ि) follows the base 'ka' (क) logically in कि, but is rendered with the matra before the 'ka'. In conjunct clusters, such as 'ka' + virama + 'ta' + 'i' matra for "क्ति", the matra is reordered before the 'ka' once the conjunct has formed. This reordering occurs after initial decomposition and before applying features like half-forms, relying on script-specific rules to maintain phonetic and aesthetic integrity.
Positioning adjustments fine-tune glyph metrics post-reordering to handle spacing and attachments.
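The pre-base matra case can be mimicked at the character level with a regular expression. This is only a sketch under strong simplifying assumptions: it handles a single matra (U+093F), treats any chain of consonant + virama + consonant as one conjunct, ignores nukta and other marks, and operates on code points, whereas a real shaping engine reorders glyphs and lets the font decide which conjuncts actually form.

```python
import re

CONSONANT = "[\u0915-\u0939]"  # Devanagari letters ka..ha
VIRAMA = "\u094d"              # halant, suppresses the inherent vowel
MATRA_I = "\u093f"             # short 'i' vowel sign, displayed pre-base

# A cluster is zero or more (consonant + virama) pairs, a final consonant,
# and then the pre-base matra that logically follows the whole cluster.
_PREBASE_CLUSTER = re.compile(
    f"((?:{CONSONANT}{VIRAMA})*{CONSONANT}){MATRA_I}"
)


def visual_order(text):
    """Move each short-i matra in front of the consonant cluster it follows."""
    return _PREBASE_CLUSTER.sub(lambda m: MATRA_I + m.group(1), text)
```

For the logical sequence ka + matra the result is matra + ka, and for ka + virama + ta + matra the matra moves in front of the entire cluster. Text without the matra is returned unchanged.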
The GPOS (Glyph Positioning) table enables precise control, including kerning to adjust inter-glyph spacing (for example, reducing the advance between a lowercase "f" and "i" by a specified value) and mark attachment for anchoring diacritics to base glyphs. In mark-to-base positioning, a diacritic like a kasra (placed below a base letter in Arabic) is aligned using anchor points: the mark is offset in x and y so that its own anchor coincides with the base glyph's attachment point. Mark-to-mark attachments further position stacked diacritics, such as a tone mark above a vowel sign, so that the layers do not collide.
Line breaking in complex scripts requires tailored rules to identify permissible breaks, often beyond simple spaces. Unicode Standard Annex #14 (UAX #14) defines these via character classes and rules, but for scripts like Thai, which lack spaces between words, breaks are restricted to syllable or word boundaries determined by dictionary-based analysis. Thai characters fall into the SA (South East Asian) class, and a morphological analysis reclassifies runs (e.g., assigning BB to word beginnings) so that breaks occur only at valid points and never inside a cluster of consonant, vowel, and tone mark.
For vertical layouts common in East Asian writing systems, metrics ensure proper ideograph positioning and line progression. Under UAX #50, Han ideographs and similar characters remain upright (Vertical_Orientation value "U") in vertical text, centered within the em-box for consistent column flow. Vertical metrics, such as those in OpenType's vhea, vmtx, and VORG tables, define advance heights and vertical origins tailored to ideographs, accommodating mixed orientations where Latin insertions rotate while ideographs stay upright to preserve readability in traditional formats.
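The anchor arithmetic behind mark attachment is small enough to show directly. In the sketch below, each anchor is given in the coordinate space of its own glyph, and the mark's origin is chosen so that the two anchors land on the same point. The `Point` type, the function name `attach_mark`, and all coordinate values are assumptions of this example; real GPOS lookups also involve mark classes and per-class anchors, which are omitted.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: int
    y: int


def attach_mark(base_origin, base_anchor, mark_anchor):
    """Return the origin at which to draw a mark glyph.

    base_origin -- where the base glyph is drawn (font units)
    base_anchor -- attachment point, relative to the base glyph's origin
    mark_anchor -- attachment point, relative to the mark glyph's origin
    """
    # Place the mark so that origin + mark_anchor == base_origin + base_anchor.
    return Point(
        base_origin.x + base_anchor.x - mark_anchor.x,
        base_origin.y + base_anchor.y - mark_anchor.y,
    )
```

Mark-to-mark stacking reuses the same calculation with the first mark acting as the base: `first = attach_mark(base_origin, base_top_anchor, first_mark_anchor)` followed by `second = attach_mark(first, first_mark_top_anchor, second_mark_anchor)`, where every anchor name is a hypothetical value read from the font.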

Standards and Specifications

The Unicode Standard, maintained by the Unicode Consortium, serves as the primary framework for complex text layout (CTL) by assigning unique code points to characters from diverse writing systems and defining properties that enable algorithms for directionality, shaping, and positioning. First released as version 1.0 in 1991, the standard has evolved through regular updates, reaching version 17.0 in September 2025, with each iteration expanding support for CTL through refined character properties such as Bidi_Class (which categorizes characters for bidirectional resolution) and Script (which identifies the writing system for appropriate rendering rules). These properties are recorded in the Unicode Character Database (UCD), which is documented in Unicode Standard Annex #44, and form the basis for CTL processing in software implementations.
Key supporting specifications include Unicode Standard Annex #9, which outlines the Bidirectional Algorithm for handling mixed directional text in scripts like Arabic and Hebrew. Annex #14 details the Line Breaking Algorithm, specifying rules for identifying break opportunities in complex scripts to prevent improper word or syllable division. For shaping in scripts requiring glyph reordering and contextual forms, such as Indic and Southeast Asian scripts, the standard relies on properties like Indic_Syllabic_Category and Joining_Type defined in the UCD. Annex #50 addresses vertical text layout, providing orientation properties for scripts like Mongolian and East Asian scripts that flow top-to-bottom. Additionally, Unicode Technical Report #17 describes the character encoding model that accommodates complex representations, such as composite sequences for scripts with inherent variability.
The Unicode Consortium maintains synchronization with ISO/IEC 10646, the international standard for the Universal Coded Character Set (UCS), ensuring identical character repertoires and encoding forms like UTF-8, UTF-16, and UTF-32 for global interoperability.
Recent versions have incorporated new complex scripts to preserve endangered writing systems; for instance, Unicode 10.0 (2017) added the Masaram Gondi block (U+11D00–U+11D5F), a Brahmi-derived script for the Gondi language requiring vowel signs and reph positioning. Unicode 12.0 (2019) introduced the Nandinagari block (U+119A0–U+119FF), a historical script used for Sanskrit with matra attachments and conjunct forms. Unicode 17.0, released on September 9, 2025, adds 4,803 characters to reach a total of 159,801, including the new Tolong Siki block (U+11DB0–U+11DBF) for the Kurukh language, a script requiring vowel sign positioning around consonants to form syllables.
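The three encoding forms differ in how many code units a given character needs, which matters for any layout code that indexes into stored text. The short sketch below counts code units using Python's built-in codecs; the helper name `code_units` is an assumption of this example, and the little-endian codec names are used only to avoid counting a byte order mark.

```python
# Bytes per code unit for each encoding form (little-endian variants are
# chosen so that no byte order mark is prepended to the encoded output).
UNIT_BYTES = {"utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}


def code_units(text, encoding):
    """Count the code units needed to store text in the given encoding form."""
    return len(text.encode(encoding)) // UNIT_BYTES[encoding]


# Devanagari syllable "ki": U+0915 followed by U+093F, both in the BMP.
KI = "\u0915\u093f"

# U+11D00, the first code point of the Masaram Gondi block, lies outside
# the BMP and therefore needs a surrogate pair in UTF-16.
GONDI = "\U00011d00"
```

For the two-character syllable, UTF-8 needs three code units per character while UTF-16 and UTF-32 need one each; for the supplementary-plane character, UTF-8 needs four code units, UTF-16 two, and UTF-32 one.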

OpenType and Font Technologies

OpenType, developed jointly by Microsoft and Adobe, serves as the predominant font format for enabling complex text layout through its layout tables, which allow for script-specific substitutions, positioning, and glyph classifications. The core of OpenType's CTL capabilities lies in three key tables: the Glyph Substitution table (GSUB), which handles replacements such as ligatures and contextual forms; the Glyph Positioning table (GPOS), which manages precise adjustments for kerning, mark placement, and cursive connections; and the Glyph Definition table (GDEF), which defines glyph classes like base glyphs, ligatures, and marks to facilitate efficient processing by GSUB and GPOS. These tables, introduced in OpenType 1.0 and refined in subsequent versions, enable fonts to implement Unicode-based script requirements without altering the underlying text encoding.
Prior to widespread OpenType adoption, alternative technologies existed for CTL. Apple Advanced Typography (AAT), originally exposed through the Apple Type Services framework, provided similar functionality through tables like 'mort' and 'morx' for glyph substitution and 'kerx' or 'trak' for kerning and tracking, but it was largely proprietary and tied to Apple's ecosystem. Apple's Core Text framework, introduced with Mac OS X 10.5 Leopard in 2007, supports OpenType layout alongside AAT for cross-platform compatibility and broader script support. Another alternative is Graphite, developed by SIL International as an open-source, rule-based system embedded in TrueType or OpenType-compatible fonts using custom tables like 'Silf' for layout rules and 'Sill' for language-specific feature settings. Graphite excels in flexibility for scripts not fully covered by OpenType's shaping models, allowing font developers to define complex behaviors directly in the font without relying on script-specific logic in the shaping engine.
OpenType version 1.8, released in 2016, introduced variable fonts, which package multiple stylistic variations, such as weight, width, or optical size, into a single font file using axes defined in the 'fvar' table and interpolated via 'gvar' for glyphs.
This reduces file sizes and loading times for CTL scenarios involving diverse typographic needs across scripts, as a single variable font can adapt to localization or emphasis requirements without multiple static files. OpenType's feature system further supports over 30 scripts, including Arabic, Devanagari, and Thai, through tags like 'locl' for localized glyph forms and 'mark' for attaching diacritics and combining marks to base characters, ensuring proper rendering in bidirectional or shaping contexts as per Unicode properties.
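Whether a font carries the layout tables discussed here can be checked by reading the table directory at the start of the file. The sketch below relies on the documented sfnt layout (a 12-byte header followed by 16-byte table records, all big-endian) and uses only the standard `struct` module. The function names are assumptions of this example, and font collections (files beginning with the tag 'ttcf') are not handled.

```python
import struct

LAYOUT_TAGS = ("GDEF", "GSUB", "GPOS")


def list_tables(font_bytes):
    """Parse the sfnt table directory and return {tag: (offset, length)}."""
    # Header: sfntVersion (4 bytes), numTables (uint16), then three uint16
    # binary-search helper fields that are not needed here.
    _version, num_tables = struct.unpack_from(">4sH", font_bytes, 0)
    tables = {}
    for i in range(num_tables):
        # Each record: tag (4 bytes), checksum, offset, length (uint32 each).
        tag, _checksum, offset, length = struct.unpack_from(
            ">4sIII", font_bytes, 12 + 16 * i
        )
        tables[tag.decode("ascii")] = (offset, length)
    return tables


def layout_tables(font_bytes):
    """Return which of GDEF, GSUB, and GPOS are present in the font."""
    tables = list_tables(font_bytes)
    return [tag for tag in LAYOUT_TAGS if tag in tables]
```

With a real font, the bytes would come from something like `open(path, "rb").read()`, where `path` is whatever font file is at hand. A font that reports none of the three tables can still map characters to glyphs but offers no OpenType shaping rules.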

Implementations

Software Libraries

HarfBuzz is an open-source text shaping library initiated in 2006 by Behdad Esfahbod. It provides comprehensive support for OpenType features, Apple Advanced Typography (AAT), and Graphite shaping models, enabling accurate glyph selection, positioning, and ligature formation for complex scripts across various writing systems. Widely adopted in web browsers such as Google Chrome and Mozilla Firefox, HarfBuzz ensures consistent rendering of bidirectional and cursive text in these environments. As of November 2025, version 12.2.0 includes optimizations for font subsetting and integration with modern graphics APIs, while adding full support for the Unicode 17.0 characters released in 2025. HarfBuzz achieves high throughput on modern hardware, with recent releases like 11.3 delivering up to 45% faster advance calculations.
The International Components for Unicode (ICU) library, developed by IBM and now maintained under the Unicode Consortium, incorporated a LayoutEngine module for handling complex text layout in cross-platform applications. This engine combined bidirectional algorithm processing with glyph shaping, supporting OpenType features for scripts like Arabic, Hebrew, and Indic scripts through its C, C++, and Java APIs. Designed for embedding in software such as web engines and document processors, the LayoutEngine processed runs of text in a single font and direction, performing reordering and positioning without relying on platform-specific rendering; it was later deprecated in favor of HarfBuzz.
Other notable libraries include Microsoft's Uniscribe, a legacy API introduced in the early 2000s for Unicode text rendering and complex script support, which handles paragraph-level layout using OpenType tables but is increasingly supplemented by the newer DirectWrite APIs. Apple's Core Text framework provides low-level text shaping and layout capabilities optimized for macOS and iOS, using AAT and OpenType for high-performance glyph positioning in applications.
For Rust ecosystems, wrappers such as harfbuzz-rs offer safe bindings to , enabling text shaping in systems programming without direct C interop, while rustybuzz provides a pure-Rust implementation of the core shaping algorithm for memory-safe environments.

Operating System and Application Support

On Microsoft Windows, DirectWrite, introduced in 2009 with Windows 7, serves as the primary API for high-quality text rendering, incorporating full support for complex scripts alongside the older Uniscribe engine. Uniscribe, a longstanding component of the Windows text processing stack, handles glyph shaping and reordering for a wide array of scripts, enabling applications to support numerous languages including Arabic, Hebrew, Indic, and Southeast Asian writing systems. The DWriteCore library, part of the Windows App SDK, makes this functionality available independently of the Windows version while maintaining compatibility with Windows' native complex text layout capabilities.
Apple's macOS and iOS platforms rely on Core Text as the core framework for text layout and rendering, providing robust support for complex scripts through features like glyph positioning, bidirectional algorithms, and font fallback mechanisms. Core Text processes text streams to generate positioned glyph runs, accommodating right-to-left and vertical writing modes essential for languages such as Arabic, Hebrew, and East Asian scripts. Since 2022, integration with open-source libraries like HarfBuzz has allowed developers to use advanced shaping for more precise control over complex text rendering in custom applications.
Linux operating systems typically employ HarfBuzz for text shaping in conjunction with FreeType for glyph rasterization, forming a lightweight yet powerful stack for complex text layout in desktop environments like GNOME and KDE. This combination ensures accurate handling of script-specific features, such as ligature formation in Arabic or reordering in Indic scripts, across graphical toolkits and applications. On Android, HarfBuzz similarly powers the system's text engine, integrated into the Android framework to deliver consistent complex script support for diverse languages in user interfaces and apps.
Web browsers achieve complex text layout through adherence to the CSS Writing Modes Level 3 specification, which defines properties for controlling text direction, inline progression, and glyph orientation to support bidirectional and vertical flows. Modern engines in Chrome and Firefox utilize underlying shapers like HarfBuzz, while Safari relies on Core Text, enabling web content to render scripts such as Mongolian vertical text or Arabic cursive joining without platform-specific dependencies. Major applications have incorporated dedicated complex text layout engines to meet professional needs. Adobe has provided comprehensive CTL support since the CS3 release in 2007, with the World-Ready paragraph composer enabling advanced features like contextual glyph substitution and bidirectional paragraph composition for scripts including , Hebrew, and Indic languages. Microsoft applications, particularly from versions post-2010, feature enhanced rendering for and Hebrew through improved Uniscribe integration, offering better visual , ligature application, and right-to-left text alignment in tools like Word and PowerPoint. Recent mobile OS updates have further refined CTL for specific scripts.
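
Before any of these platform stacks can shape text, they divide it into runs that share a direction and script. The following Python sketch illustrates that itemization step in a deliberately simplified form, using only the bidirectional classes exposed by the standard unicodedata module. The function name directional_runs, and its rule that neutral and weak characters inherit the direction of the preceding strong character, are assumptions made for illustration; production engines apply the full Unicode Bidirectional Algorithm (UAX #9) and also split runs on script and font boundaries.

```python
import unicodedata

# Strong bidirectional classes as named in UAX #9.
RTL_CLASSES = {"R", "AL"}  # Hebrew-style and Arabic-style letters
LTR_CLASSES = {"L"}


def directional_runs(text):
    """Split text into (direction, substring) runs using strong bidi classes.

    Simplification (not UAX #9): neutral and weak characters such as spaces,
    digits, and punctuation stay in the run of the preceding strong character.
    Text with no strong characters is reported as a single "ltr" run.
    """
    runs = []
    current_dir = None
    start = 0
    for i, ch in enumerate(text):
        bidi = unicodedata.bidirectional(ch)
        if bidi in RTL_CLASSES:
            ch_dir = "rtl"
        elif bidi in LTR_CLASSES:
            ch_dir = "ltr"
        else:
            ch_dir = current_dir  # neutral or weak: inherit
        if current_dir is None:
            current_dir = ch_dir
        elif ch_dir != current_dir:
            runs.append((current_dir, text[start:i]))
            start = i
            current_dir = ch_dir
    if text:
        runs.append((current_dir or "ltr", text[start:]))
    return runs


if __name__ == "__main__":
    # Latin text followed by a Hebrew word and digits.
    sample = "abc \u05e9\u05dc\u05d5\u05dd 123"
    for direction, run in directional_runs(sample):
        print(direction, [f"U+{ord(c):04X}" for c in run])
```

Each resulting run would then be handed to a shaper such as HarfBuzz or Core Text with its direction set accordingly. Note that the sketch places the trailing digits in the right-to-left run without resolving their own left-to-right ordering, one of the cases the full algorithm exists to handle.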

Challenges and Advances

Persistent Issues

One persistent challenge in complex text layout (CTL) is incomplete font coverage for minority and low-resource scripts. Although Unicode encodes over 150 scripts, many minority scripts lack fonts with the comprehensive OpenType features necessary for proper glyph shaping, ligature formation, and positioning. For instance, an analysis of Unicode versions 6.0 to 9.0 (2010–2016) found that over 40% of newly added scripts had no available fonts supporting their layout requirements at the time of encoding. Projects like Google's Noto font family have addressed some gaps by providing open-licensed coverage for most Unicode scripts, yet full OpenType support remains absent for numerous endangered and minority writing systems, limiting accurate digital representation.

Performance bottlenecks continue to affect CTL, particularly in handling scripts with high glyph counts or intricate shaping rules. Rendering complex pages, such as PDFs with 1,000 or more glyphs, demands significant CPU resources because bidirectional analysis, contextual substitution, and positioning are computationally intensive. Text shaping engines like HarfBuzz, while optimized, incur overhead from frequent glyph lookups and feature applications, leading to delays in resource-constrained environments like mobile devices or legacy systems.

Interoperability variations between shaping libraries pose another ongoing issue, resulting in inconsistent text rendering across applications and platforms. For example, HarfBuzz and ICU (International Components for Unicode) differ in their handling of Thai text stacking, where the positioning of diacritics and vowel marks may vary due to distinct implementations of OpenType tables and script-specific rules. These discrepancies can lead to visual artifacts, such as misaligned clusters or incorrect ligatures, complicating cross-platform development and document exchange.

Accessibility challenges are particularly acute for users relying on screen readers with reordered or bidirectional text. Screen readers often struggle to convey logical reading order in complex scripts like Arabic or Hebrew, presenting content in visual rather than semantic order, which confuses navigation and comprehension. Browser implementations that predate CSS Writing Modes Level 3 exacerbated web rendering inconsistencies, varying in their support for inline progression and baseline alignment in mixed-script layouts. A 2023 survey highlighted these issues, reporting a 15% error rate in accurate text rendering for low-resource languages like Shan across common assistive technologies.
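
The font coverage gap described at the start of this section can be illustrated with a minimal font-fallback sketch in Python. It assigns each character to the first font in a priority list whose coverage set contains its code point and reports what no font covers. The function name assign_fallback_fonts and the fonts "LatinFont" and "ThaiFont" with their coverage ranges are hypothetical placeholders; a real implementation would read coverage from each font's cmap table and perform fallback per cluster or run rather than per character.

```python
def assign_fallback_fonts(text, font_coverage):
    """Map each character to the first font whose coverage includes it.

    font_coverage: dict of font name -> set of covered code points, in
    priority order (dicts preserve insertion order).
    Returns (assignments, missing), where assignments is a list of
    (character, font name or None) pairs and missing is the sorted list of
    code points that no font covers.
    """
    assignments = []
    missing = set()
    for ch in text:
        cp = ord(ch)
        chosen = next(
            (name for name, covered in font_coverage.items() if cp in covered),
            None,
        )
        if chosen is None:
            missing.add(cp)
        assignments.append((ch, chosen))
    return assignments, sorted(missing)


if __name__ == "__main__":
    # Hypothetical fonts and coverage ranges, for illustration only.
    fonts = {
        "LatinFont": set(range(0x0020, 0x007F)),
        "ThaiFont": set(range(0x0E00, 0x0E80)),
    }
    # Latin "a", Thai KO KAI, and ADLAM CAPITAL LETTER ALIF (U+1E900).
    assignments, missing = assign_fallback_fonts("a\u0e01\U0001e900", fonts)
    print(assignments)
    print([f"U+{cp:04X}" for cp in missing])
```

Code point coverage is only a necessary condition. A font can map every character of a script and still lack the substitution and positioning rules needed to shape it correctly, which is the deeper gap for many minority scripts.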

Recent Developments

In recent years, the Unicode Standard has continued to expand support for complex text layout through major version releases. Unicode 16.0, released on September 10, 2024, introduced seven new scripts, including Tulu-Tigalari, which requires complex glyph shaping and positioning for proper rendering. Unicode 17.0, released on September 9, 2025, added four additional scripts, with Tai Yo featuring intricate layout requirements involving reordering and ligature formation.

The open-source shaping library HarfBuzz has seen significant enhancements for complex text processing. Version 10.3.0, released on February 11, 2025, delivered substantial performance improvements to Apple Advanced Typography (AAT) shaping, benefiting scripts with complex contextual rules. More recently, version 12.0.0, released on September 27, 2025, enabled support for the VARC (Variable Composites) table by default, optimizing handling in complex layouts by allowing dynamic glyph composition. Version 12.2.0, released on November 5, 2025, aligned HarfBuzz's syllable-based ChainContext rules with Windows implementations, enhancing consistency for Indic and other complex scripts.

Web standards have advanced to better accommodate bidirectional text and ruby annotations in complex text. The CSS Text Module Level 4 was published as a Working Draft on May 29, 2024, introducing refined controls for text wrapping, justification, and white space processing that interact with bidirectional algorithms. It builds on the unicode-bidi property to provide finer control for mixed-directionality text, reducing embedding errors in layouts with right-to-left and left-to-right scripts. For ruby text, often used in East Asian complex layouts, CSS Ruby Module Level 1 integrations with Text Level 4 enable advanced positioning without disrupting alignment.

Microsoft's Universal Shaping Engine (USE) has been updated to support emerging Unicode scripts. As of 2024, it accommodates complex scripts from Unicode 16.0, including those requiring multi-stage glyph reordering, extending prior coverage of scripts from earlier Unicode versions such as ADLaM.

Open-source efforts for underrepresented scripts have progressed notably. For instance, full shaping support for the ADLaM script, used for Fulani languages in West Africa, was integrated into HarfBuzz and related font tools in 2019, with W3C layout requirements documented in May 2024 to guide browser and e-book implementations.

Browser vendors have implemented these advancements, leading to more efficient complex text rendering. In 2025, Chromium-based browsers, including Microsoft Edge, rolled out enhanced text rendering on Windows, improving subpixel rendering and text contrast, which has reduced visual artifacts in various contexts.
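
Because newly encoded scripts only render correctly once every layer of the stack knows about them, a common first diagnostic is to check which Unicode version a runtime's character database implements. The Python sketch below uses the standard unicodedata module for this; the helper name unassigned_code_points is hypothetical. It flags characters whose general category is Cn (unassigned) in the running interpreter's database, which is how characters from a newer Unicode version appear to an older runtime.

```python
import unicodedata


def unassigned_code_points(text):
    """Return code points that this runtime's Unicode database reports as
    unassigned (general category 'Cn').

    Characters added in a Unicode version newer than the runtime's database
    fall into this category, as do permanently reserved noncharacters.
    """
    return [ord(ch) for ch in text if unicodedata.category(ch) == "Cn"]


if __name__ == "__main__":
    # Version of the Unicode Character Database bundled with this interpreter.
    print("UCD version:", unicodedata.unidata_version)
    # U+0378 is an unassigned code point in the Greek and Coptic block.
    print([f"U+{cp:04X}" for cp in unassigned_code_points("A\u0378")])
```

Passing this check says nothing about shaping: a runtime may know a character's properties while the installed fonts and shaping engine still lack support for its script.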

    Jan 30, 2025 · We're happy to announce that our enhanced text rendering is now available for users across all Chromium-based browsers on Windows.Missing: complex Indic latency