Line break
A line break is a control character or sequence of control characters in character encoding standards such as ASCII and Unicode that signifies the termination of one line of text and the initiation of the next, facilitating structured text display and processing in both digital and print media.[1][2] In computing, line breaks trace their origins to mechanical typewriters and early printers, where carriage return (CR, U+000D) moved the print head to the beginning of the line and line feed (LF, U+000A) advanced the paper to the next line.[3] These functions evolved into digital conventions, with platforms adopting different representations: Unix-like systems use LF alone, Windows employs the CR LF sequence (U+000D followed by U+000A), and older Macintosh systems used CR.[1] Unicode standardizes handling through the newline function (NLF), which treats CR, LF, CR LF, and Next Line (NEL, U+0085) as equivalent line terminators, while providing dedicated characters such as Line Separator (LS, U+2028) and Paragraph Separator (PS, U+2029) for unambiguous semantic breaks.[1] This variation has led to compatibility challenges in file exchange and software development, prompting recommendations to map NLF to platform-specific formats on output while treating them uniformly on input.[1]

In typography and layout, line breaking refers to the algorithmic process of wrapping text within a defined width, balancing readability, aesthetics, and linguistic rules across scripts.[4] For space-separated languages such as English or Arabic, breaks typically occur at word boundaries to avoid awkward hyphenation, whereas scripts without spaces—such as Thai, Japanese, or Chinese—rely on syllable, character, or dictionary-based rules to insert breaks without disrupting meaning.[4] Advanced systems consider justification, punctuation placement, and cultural conventions, such as prohibiting breaks before certain particles in Korean Hangul or after tsek marks in Tibetan.[4] In digital contexts such as web development, the HTML <br> element explicitly inserts a line break, distinct from automatic wrapping or paragraph breaks.[5]
Line breaks play a critical role in text editors, programming languages, and document formats, where inconsistent handling can cause rendering issues; standardized guidelines from bodies such as the Unicode Consortium help ensure interoperability across global applications.[1]
History
Origins in early text transmission
The concept of the line break emerged from mechanical printing devices in the late 19th century, particularly typewriters, where the carriage return mechanism allowed operators to reset the typing position to the start of a new line while advancing the paper. The first commercially successful typewriter, the Remington Model 1 introduced in 1873, incorporated a manual carriage return lever that moved the carriage assembly to the left margin and rotated the platen to feed paper by one line, enabling structured text formatting on continuous sheets.[6] This mechanical innovation directly influenced early automated text transmission systems, as it provided a reliable way to delineate lines in printed output without manual intervention for each character.

In telegraphy, line breaks were formalized through character encoding for electrical transmission, building on typewriter principles to control printing mechanisms remotely. Émile Baudot's 5-bit code, patented in 1874, revolutionized multiplex telegraphy by encoding text for simultaneous transmission over wires but initially lacked dedicated control characters for line endings, relying instead on operator interpretation at receiving stations.[7] By the early 1900s, as printing telegraphs and teleprinters evolved to automate message reception on paper, revisions to the Baudot system introduced explicit carriage return (CR) codes; notably, Donald Murray's 1901 patent for an improved synchronous multiplex system assigned the binary sequence 01000 to CR, which commanded the print head to return to the line's beginning, and 00010 to line feed (LF) for paper advancement.[8] These control characters, known as "format effectors," were essential for producing readable, multi-line messages in real-time teletype operations.[8]

Punched tape systems, widely adopted in telegraphy from the 1880s onward, integrated these CR codes to mark line ends during message preparation and playback, using 5-level paper strips where specific hole patterns represented formatting instructions for teleprinters. Operators punched CR sequences to segment text into lines on the tape, which could then be fed into transmitters for error-free, formatted delivery over long distances, bridging manual encoding with automated processing.[9] This approach minimized transmission errors in batch-like message handling and set precedents for data structuring in mechanical systems.

The transition to electronic computing in the 1940s carried forward these telegraphy conventions, as early machines adapted punched media and typewriter interfaces for input and output.

Standardization in computing eras
The standardization of line breaks in computing began with the American Standard Code for Information Interchange (ASCII), published as ASA X3.4-1963, which defined key control characters including carriage return (CR, code 13 or 0x0D) to return the print head to the start of the line and line feed (LF, code 10 or 0x0A) to advance to the next line.[10] These characters provided a foundational mechanism for text formatting in early digital systems, influencing subsequent operating systems and protocols.[10]

In the 1970s, Unix adopted LF alone as the newline sequence, following the precedent set by its predecessor Multics, which used LF to represent a line break in internal file storage and relied on device drivers to translate it to CR+LF for output on teletypes.[11] This choice aligned with emerging efficiency goals in time-sharing systems and with the ISO 646 standard (1973 edition), an international adaptation of ASCII that prescribed LF for line advancement. Conversely, MS-DOS (1981) and later Windows systems inherited the CR+LF sequence from CP/M, which emulated DEC operating systems requiring both characters to fully position the cursor for printing compatibility.[12]

Early internet protocols further codified these conventions. The Telnet protocol, specified in RFC 854 (1983), defined the Network Virtual Terminal to interpret CR followed by LF (or NUL) as a newline, ensuring consistent rendering across heterogeneous systems using 7-bit US-ASCII codes.[13] This CR+LF pairing became a de facto standard for network text transmission. Later, RFC 4180 (2005) explicitly mandated CR+LF as the line terminator for comma-separated values (CSV) files in the text/csv MIME type, while allowing the final record to omit the trailing terminator, in line with MIME conventions from RFC 2046.[14] These developments marked a shift toward formalized interoperability, with ISO standards such as ISO 646 extending ASCII's control characters globally by the 1970s, though platform-specific variations persisted until broader adoption of Unicode in the 1990s.

Technical Representation
Character encodings and codes
In the American Standard Code for Information Interchange (ASCII), line breaks are primarily represented using two control characters: carriage return (CR), encoded as decimal 13 (hexadecimal 0x0D, often denoted in caret notation as ^M), which moves the cursor to the beginning of the current line, and line feed (LF), encoded as decimal 10 (hexadecimal 0x0A, denoted ^J), which advances the cursor to the next line.[15][16] These characters originate from teletypewriter mechanics and form the basis for newline handling in many text encodings.[15]

In contrast, the Extended Binary Coded Decimal Interchange Code (EBCDIC), developed by IBM for mainframe systems, assigns CR the same hexadecimal value of 0x0D but uses 0x25 for LF and introduces a distinct new line (NL) control at 0x15, which combines the carriage return and line feed functions in a single byte.[17][18] This mapping reflects EBCDIC's origins in punched card technology, where control codes were optimized for different hardware behaviors compared to ASCII.[17]

Line breaks are often implemented as sequences of these characters rather than single codes; the most widespread is CR followed by LF (hexadecimal 0x0D 0x0A), which ensures both horizontal and vertical movement for compatibility across devices.[19] Less common variants include LF followed by CR (0x0A 0x0D), used in systems such as Acorn RISC OS, and the single-character Next Line (NEL), defined in Unicode as U+0085 (0x85 in Latin-1 encoding), which performs a combined CR and LF operation and serves as the Unicode equivalent of EBCDIC's NL.[19][1][3] These sequences and codes are treated as newline functions (NLF) in Unicode processing, with specific rules for their interpretation during line breaking.[19]

Historically, the International Organization for Standardization (ISO) adopted these concepts in ISO 646 (first published in 1972 and revised in 1973), which defined a 7-bit international reference version (IRV) incorporating ASCII-compatible control characters like CR and LF for global text interchange, ensuring portability in early computing networks.[20] The 7-bit limitation of ISO 646 restricted it to 128 code points, prioritizing control functions in the C0 set (codes 0–31) while leaving the eighth bit unset for parity or extension; this contrasted with emerging 8-bit systems of the late 1970s and 1980s, such as ISO 8859, which expanded to 256 characters but preserved 7-bit compatibility for core controls like line breaks to avoid interoperability issues in mixed environments.[20][21]

| Encoding | CR | LF | NL/NEL | Common Sequence |
|---|---|---|---|---|
| ASCII | 0x0D (^M) | 0x0A (^J) | N/A | 0x0D0A (CR+LF) |
| EBCDIC | 0x0D | 0x25 | 0x15 | Varies (often NL alone) |
| Unicode | U+000D | U+000A | U+0085 | 0x0D0A or U+0085 |
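The byte values summarized in the table can be inspected directly from a high-level language. The following Python sketch is illustrative only; the sample strings and variable names are assumptions, not drawn from the cited standards. It encodes the common sequences and shows that a Unicode-aware line splitter treats them equivalently.

```python
# Illustrative sketch: sample strings and names are assumptions, not from the
# cited standards.
text_unix = "first line\nsecond line"        # LF, 0x0A
text_windows = "first line\r\nsecond line"   # CR+LF, 0x0D 0x0A
text_nel = "first line\u0085second line"     # NEL, U+0085 (0xC2 0x85 in UTF-8)

for label, s in [("LF", text_unix), ("CRLF", text_windows), ("NEL", text_nel)]:
    # Encoding to bytes makes the control characters visible as hex values.
    print(label, s.encode("utf-8").hex(" "))

# str.splitlines() recognizes LF, CR, CR+LF, NEL, LS and PS (among others),
# so all three samples split into the same two lines.
assert all(s.splitlines() == ["first line", "second line"]
           for s in (text_unix, text_windows, text_nel))
```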
Platform-specific conventions
Different operating systems and environments have adopted distinct conventions for representing line breaks in text files, primarily rooted in historical hardware and software design choices from the 1970s onward. These conventions typically involve control characters from the ASCII standard, such as line feed (LF, 0x0A) or carriage return (CR, 0x0D), either singly or in combination.[22]

In Unix, Linux, and modern macOS systems, the standard line break is LF alone, a convention established during the early 1970s development of Unix to promote efficiency and simplicity in text processing.[12] This aligns with the POSIX standard, which defines a line as a sequence of zero or more non-newline characters terminated by a newline character. Windows and MS-DOS, by contrast, retain the two-character CR+LF sequence inherited from CP/M, while the classic Mac OS used a lone CR until macOS adopted the Unix convention.

Unicode and extended standards
In Unicode, line breaks are represented by specific control characters that facilitate text formatting across diverse writing systems. The primary legacy control codes are U+000A LINE FEED (LF), which advances the cursor to the next line, and U+000D CARRIAGE RETURN (CR), which moves the cursor to the beginning of the current line; these are often combined as CR LF for explicit line termination.[27] Additionally, U+0085 NEXT LINE (NEL) combines the effects of CR and LF to advance to the next line, serving as a line break in certain legacy encodings.[27] Unicode extends this with semantic separators: U+2028 LINE SEPARATOR (LS), which explicitly divides lines within a paragraph without implying a new paragraph, and U+2029 PARAGRAPH SEPARATOR (PS), which marks the end of a paragraph with an associated line break.[28][27]

The Unicode Line Breaking Algorithm, detailed in Unicode Standard Annex #14 (UAX #14), defines a standardized method for identifying permissible line break opportunities in text.[28] It assigns each Unicode character to one of several dozen line break classes (e.g., LF for line feed, CR for carriage return, BK for mandatory break, NL for next line) and applies a sequence of rules using the symbols × (prohibited break), ÷ (allowed break), and ! (mandatory break).[28] Explicit breaks, triggered by characters such as LF, CR, NEL, LS, and PS, are enforced as mandatory regardless of context, while automatic wrapping relies on contextual rules to insert breaks at opportunities, such as after spaces or between words, without altering the text.[28] These rules prioritize mandatory breaks (e.g., LB4–LB6) before optional ones, ensuring consistent rendering.[28]

The Unicode encodings UTF-8 and UTF-16 fully preserve line break sequences, as all relevant characters reside in the Basic Multilingual Plane and are encoded directly without surrogates or special transformations.[28] In UTF-8, the legacy controls occupy one byte each (two bytes for NEL) and LS and PS occupy three bytes, while in UTF-16 each is a single 16-bit code unit, maintaining their semantic integrity during storage and transmission.[28]

For East Asian languages using CJK (Chinese, Japanese, Korean) scripts, which often lack spaces between characters, UAX #14 provides tailored rules to handle line breaking flexibly.[28] Ideographic characters (class ID) permit breaks almost anywhere except before non-starters (class NS) or within prohibited sequences, with rules such as LB30 allowing breaks around East Asian punctuation.[28] The 2023 update in Unicode 15.1 refined these rules for better support of orthographic syllables and regional variations, such as prohibiting breaks within Korean jamo sequences while enabling them between hanja or kana. Further refinements occurred in Unicode 16.0 (2024), enhancing support for numeric expressions, and in Unicode 17.0 (2025), introducing the Unambiguous_Hyphen class and rule updates.[29][28][30][31]
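Python's standard library does not implement UAX #14, but its text facilities can illustrate the distinction the annex draws between explicit (mandatory) breaks and automatic wrapping. The sketch below is illustrative only: str.splitlines() recognizes the explicit break characters listed above, while textwrap stands in for a far simpler, space-based wrapping strategy than the class-driven algorithm.

```python
import textwrap

# Explicit breaks: splitlines() honours LF, CR, CR LF, NEL, LS and PS, so the
# separators below all terminate a line unconditionally.
sample = "First paragraph.\u2029Second paragraph.\u2028Same paragraph, new line."
print(sample.splitlines())
# ['First paragraph.', 'Second paragraph.', 'Same paragraph, new line.']

# Automatic wrapping: textwrap breaks only at spaces within a target width,
# a simplified stand-in for the break opportunities a UAX #14 implementation
# would derive from per-character line break classes.
long_text = "Line breaking balances readability, aesthetics, and linguistic rules."
print(textwrap.wrap(long_text, width=30))
```

Applications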
In communication protocols
In communication protocols, line breaks serve as delimiters that structure data transmission, ensuring reliable parsing of messages across networks. These conventions often specify carriage return (CR, ASCII 13) followed by line feed (LF, ASCII 10), known as CRLF, to mark the end of lines or headers, reflecting the historical origins of teletypewriters while adapting to binary-safe transmission. Protocols define these rules explicitly to prevent ambiguity in heterogeneous environments, where varying end-of-line (EOL) representations could disrupt interoperability.

The Hypertext Transfer Protocol version 1.1 (HTTP/1.1), as defined in RFC 7230, mandates the use of CRLF to terminate header fields and to separate the header section from the message body. This ensures that servers and clients consistently interpret the boundaries of request and response messages, with each header field ending in CRLF and the entire header block concluded by an empty line (CRLF followed by CRLF). For example, in a typical HTTP response, a header such as "Content-Type: text/html" is followed by CRLF to delineate structure.

In the Simple Mail Transfer Protocol (SMTP), outlined in RFC 5321, CRLF delineates the boundaries between commands, responses, and the content of email messages. SMTP requires that all lines in the mail data, including the message body and headers, end with CRLF, with the end of the data marked by a single period (.) on a line by itself, also terminated by CRLF. This convention supports the transmission of multiline text while avoiding conflicts with binary attachments through quoted-printable or base64 encoding when needed.

The Telnet protocol, specified in RFC 854, employs the CR LF sequence in its Network Virtual Terminal (NVT) mode to represent line breaks.[13] To transmit a carriage return without advancing to the next line, CR is followed by a null character (NUL, ASCII 0), ensuring proper rendering on remote terminals and maintaining compatibility with diverse terminal emulations during interactive sessions.[13]

The File Transfer Protocol (FTP), per RFC 959, handles line breaks through its ASCII (text) transfer mode: the sending side converts its native line endings to the NVT convention of CRLF for transmission, and the receiving side translates them back to its local convention. This abstraction lets hosts exchange text files written with native EOL sequences (such as CRLF on Windows or LF on Unix) while preserving portability, in contrast to binary (image) mode, which transfers bytes exactly as stored.

More recent protocols such as WebSockets, detailed in RFC 6455, use CRLF to separate the opening handshake's HTTP-upgrade headers, consistent with HTTP/1.1 conventions. In the WebSocket framing that follows the handshake, data is sent in frames without line-based delimiters, enabling full-duplex communication over TCP.
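A minimal sketch, assuming a reachable host, makes the CRLF framing of an HTTP/1.1 exchange concrete; the host name and headers below are placeholders rather than values taken from the cited RFCs.

```python
import socket

# Minimal sketch (host and header values are illustrative): every header line
# ends with \r\n and an empty \r\n line marks the end of the header section,
# as required for HTTP/1.1 messages.
request = (
    "GET / HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Connection: close\r\n"
    "\r\n"                    # blank line: header section ends here
).encode("ascii")

with socket.create_connection(("example.com", 80)) as conn:
    conn.sendall(request)
    reply = conn.makefile("rb")
    print(reply.readline())   # status line, e.g. b'HTTP/1.1 200 OK\r\n'
```

In markup and web technologies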
In HTML, the <br> element provides a mechanism for inserting explicit line breaks within text content, forcing subsequent text to render on a new line without creating a new paragraph.[32] This tag, introduced in early HTML specifications, is particularly useful for poetic or formatted text where automatic wrapping is insufficient. For preserving natural line breaks from source text, such as in preformatted content, the CSS white-space property set to pre maintains whitespace and newlines exactly as authored, preventing the browser's default collapsing.[33] The white-space property dates back to CSS Level 1 (1996) and was carried forward in the CSS Level 2 specification, published as a W3C Recommendation in 1998.
In XML, line breaks can be written as numeric character references such as &#xA; (decimal &#10;) for line feed (LF) or &#xD; (decimal &#13;) for carriage return (CR), allowing explicit insertion within markup without disrupting parsing.[34] XML parsers normalize all literal line endings in parsed entities to a single LF character (#xA) during processing, regardless of the original platform-specific sequences such as CR LF or a lone CR, to ensure consistent handling across systems.[34] This normalization applies to external parsed entities, including the document entity, simplifying downstream applications but requiring authors to use character references when a carriage return must be preserved in content.[34]
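The normalization rule can be observed with any conforming parser. The following Python sketch uses the standard library's ElementTree (the element name and content are illustrative) to show a literal CRLF collapsing to LF while a character reference survives.

```python
import xml.etree.ElementTree as ET

# Illustrative sketch: the element name and content are made up. The underlying
# expat parser applies the end-of-line normalization described above, so the
# literal CRLF in the source arrives as a single LF in the parsed text.
source = "<note>first line\r\nsecond line</note>"
print(repr(ET.fromstring(source).text))       # 'first line\nsecond line'

# A carriage return written as a character reference is not normalized away.
source_ref = "<note>first line&#xD;\nsecond line</note>"
print(repr(ET.fromstring(source_ref).text))   # 'first line\r\nsecond line'
```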
CSS extends line break control through properties like line-break, which adjusts the strictness of automatic line breaking, particularly for CJK (Chinese, Japanese, Korean) text where character-based wrapping differs from alphabetic languages.[35] Values such as auto permit standard browser heuristics, while strict enforces tighter rules to avoid awkward breaks, as specified in the CSS Text Module Level 3.[35] This module, advanced to Candidate Recommendation status in 2020, builds on earlier CSS levels to support international typography needs.
When HTML is used in email via MIME, line breaks must conform to CRLF sequences as the canonical representation for text subtypes, per RFC 2046, to avoid rendering issues in mail clients.[36] RFC 2045 outlines the broader MIME framework, but inconsistencies in handling non-CRLF breaks can lead to broken layouts in HTML emails if not normalized during encoding.[37]
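As a hedged illustration of the CRLF canonical form, the sketch below uses Python's email package with its SMTP policy (the addresses and text are placeholders) to serialize a message whose body was authored with bare LFs.

```python
from email.message import EmailMessage
from email import policy

# Illustrative sketch: addresses and body are placeholders. The SMTP policy
# serializes the message with CRLF line endings, the canonical form expected
# on the wire, even though the body was written with bare LFs.
msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg["Subject"] = "Line break demo"
msg.set_content("First line\nSecond line\n")

wire_form = msg.as_bytes(policy=policy.SMTP)
print(b"\r\n" in wire_form)   # True: lines are CRLF-terminated on the wire
```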
HTML5, together with the CSS rendering rules it relies on, specifies that user agents collapse runs of whitespace characters, including line breaks, into a single space in normal flow, unless the whitespace is preserved via CSS or elements such as <pre>. This behavior, consistent with the 2014 W3C Recommendation, ensures uniform rendering across browsers but requires explicit markup for intentional breaks in dynamic content.
In programming languages
In programming languages, line breaks are primarily represented and manipulated through escape sequences embedded in string literals, allowing developers to insert control characters like newline without typing them directly. These sequences originated in early C implementations, where the backslash (\) serves as an escape character followed by a specifier for the desired control code. In the original K&R C from 1978, \n denotes the line feed (LF, ASCII 10) character, which advances the cursor to the next line, while \r represents the carriage return (CR, ASCII 13) character, which returns the cursor to the beginning of the current line.[38] Many languages adopted this notation, with \r\n conventionally used in Windows environments to combine both effects for full line termination, ensuring compatibility with DOS-derived systems.[39]

To handle platform differences portably, modern languages provide utilities that abstract line break representations. In Python, the os.linesep attribute returns the native line terminator—such as '\n' on Unix-like systems, '\r\n' on Windows, or '\r' on older Mac systems—enabling cross-platform string construction and file operations without hardcoding sequences.[40] Similarly, Java's System.lineSeparator() method, introduced with Java 7 in 2011, yields a string equivalent to the system's line separator, promoting consistent behavior in output streams and text processing across operating systems.

String literals in various languages treat line breaks differently to balance readability and functionality. In JavaScript, the ECMAScript 2015 (ES6) specification introduced template literals delimited by backticks (`), which preserve embedded newlines and other whitespace exactly as written, facilitating multi-line string definitions without explicit escapes.[41] This contrasts with traditional string literals in languages like C or Python, where newlines must typically be escaped as \n to avoid syntax errors, though triple-quoted strings in Python can include literal newlines for a similar effect.

During source code compilation or interpretation, parsers tokenize input files while generally treating line breaks and other whitespace as mere separators between tokens, except within string literals, where they form part of the literal's value. In C++, for instance, the translation phases decompose source code into tokens, discarding whitespace sequences (including newlines) that do not contribute to token boundaries (outside of preprocessor directives, where line ends are significant), while whitespace appearing inside string or character literals is preserved as part of their value.

More recent languages emphasize cross-platform handling at runtime. Rust, since its 1.0 stable release in 2015, offers the lines method on the std::io::BufRead trait, whose Lines iterator yields one line at a time by splitting on \n and stripping a trailing \r if present, so both Unix (\n) and Windows (\r\n) endings are handled transparently and the returned strings carry no terminator. This approach simplifies I/O in heterogeneous environments without requiring manual escape sequence management.
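A brief Python sketch (the file name is a placeholder) shows the portability utilities mentioned above working together: os.linesep reports the native terminator, str.splitlines() accepts any common convention, and text-mode file I/O translates between '\n' in the program and the platform convention on disk.

```python
import os

# Illustrative sketch; "notes.txt" is a placeholder file name.
print(repr(os.linesep))        # '\n' on Unix-like systems, '\r\n' on Windows

# str.splitlines() accepts any of the common conventions without configuration.
mixed = "unix\nwindows\r\nold mac\rlast"
print(mixed.splitlines())      # ['unix', 'windows', 'old mac', 'last']

# Text-mode I/O translates at the boundary: with the default newline=None,
# '\n' written by the program becomes os.linesep on disk, and any convention
# read back is converted to '\n'.
with open("notes.txt", "w") as f:
    f.write("first line\nsecond line\n")
with open("notes.txt") as f:
    print(repr(f.read()))      # 'first line\nsecond line\n' on every platform
```

Challenges and Solutions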
Compatibility issues across systems
One significant compatibility issue arises in version control systems such as Git (released in 2005), which conventionally stores text files in the repository with LF line endings when end-of-line normalization is enabled. This normalization ensures a consistent internal representation, but during checkout, Git's configuration options—such as core.autocrlf or core.eol—can convert files to CRLF on Windows systems while leaving them as LF on Unix-like systems, resulting in divergent file contents across developer machines and potential merge conflicts if configurations vary.[42]
In email systems, the Internet Message Format standard requires CRLF for line breaks in both headers and body content to ensure proper parsing. However, transmissions from Unix-based systems often used LF alone, or mixed conventions occurred due to incomplete conversions, leading to garbled text displays such as overlaid lines or unrendered breaks, a frequent problem in cross-operating system communications during the 1990s when email protocols were maturing.[43]
Database storage in systems like MySQL exacerbates issues because TEXT fields store raw bytes without automatic normalization, preserving whatever line ending sequence was inserted—such as CRLF from Windows applications. When this data is retrieved and displayed on a Unix-based client or web application, the extra CR may appear as visible artifacts (e.g., ^M characters) or cause unintended line wrapping, disrupting readability in cross-platform environments.[44]
Cloud file syncing services, including Dropbox and iCloud, sync files byte-for-byte without altering line endings, allowing platform-specific conventions (e.g., LF on macOS versus CRLF on Windows) to propagate inconsistencies in shared text documents. This resulted in frequent editing and display anomalies for collaborative users until the 2010s, when widespread adoption of normalization tools like Git's .gitattributes files began enforcing consistent LF usage across synced repositories.[45]
In the 2020s, containerization platforms such as Docker have introduced stricter consistency by expecting Unix-style LF endings in all files, including Dockerfiles, shell scripts, and configuration files processed within containers. Since Docker runs on Linux kernels, CRLF sequences from Windows-originated files can cause runtime failures—such as misinterpreted commands due to trailing carriage returns—prompting development teams to standardize on LF to avoid deployment inconsistencies across hybrid environments.[46]
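In practice, teams often catch these mismatches with a small pre-deployment check rather than relying on editor settings alone. The following Python sketch (the file-selection rules are assumptions, not a standard tool) flags files containing CRLF sequences of the kind that break shell scripts inside Linux containers, and can optionally rewrite them to LF, complementing Git-side normalization via .gitattributes.

```python
from pathlib import Path

# Illustrative sketch: which suffixes and names to check is an assumption.
def find_crlf_files(root: str, suffixes=(".sh", ".conf", "Dockerfile")):
    offenders = []
    for path in Path(root).rglob("*"):
        if path.is_file() and (path.suffix in suffixes or path.name in suffixes):
            if b"\r\n" in path.read_bytes():
                offenders.append(path)
    return offenders

def normalize_to_lf(path: Path) -> None:
    # Rewrite the file in place with Unix-style LF endings.
    data = path.read_bytes().replace(b"\r\n", b"\n")
    path.write_bytes(data)

if __name__ == "__main__":
    for p in find_crlf_files("."):
        print(f"CRLF line endings found in {p}")
        # normalize_to_lf(p)   # uncomment to rewrite in place
```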