
Newline

A newline, also known as a line ending or end-of-line (EOL) marker, is a control character or sequence of control characters in character encoding standards such as ASCII and Unicode that marks the end of one line of text and the start of the next.^[1] In the ASCII standard (ANSI X3.4-1968), the primary newline character is the line feed (LF), assigned code 10 (0x0A), which functions as a format effector to advance the printing or display position to the next line.^[1] The carriage return (CR), code 13 (0x0D), separately moves the position to the beginning of the current line, and the combination of CR followed by LF (CRLF) emerged as a convention for complete line termination.^[1] Unicode incorporates these via the Newline Function (NLF), which includes LF (U+000A), CR (U+000D), the sequence CRLF, and next line (NEL, U+0085); Unicode also defines line separator (LS, U+2028) and paragraph separator (PS, U+2029) as explicit break characters, with guidelines recommending consistent handling across platforms to avoid interoperability issues.^[2]

Newline conventions vary by operating system and historical context: Unix-like systems (e.g., Linux, macOS) standardize on LF alone, Windows employs CRLF for compatibility with its DOS heritage, and classic Mac OS used CR exclusively until Mac OS X adopted LF. These differences complicate text processing, file transfers, and software development, which is why specifications such as RFC 5198 define a common network interchange format: UTF-8 text with CRLF line endings.^[3] In programming languages, newlines are often represented by escape sequences such as \n for LF, facilitating portable text manipulation.

History

Origins in Typewriters and Teleprinters

The typewriter, a pivotal invention in mechanical writing devices, was patented on June 23, 1868, by Christopher Latham Sholes, along with Carlos Glidden and Samuel W. Soule, marking the first practical model known as the "Type-Writer."^[4] This device featured a carriage, a movable frame holding the paper, that advanced incrementally as keys were struck, thanks to an escapement mechanism ensuring precise letter spacing. At the end of each line, the typist manually operated a carriage return lever, which returned the carriage so that typing resumed at the left margin, while a separate line feed lever or platen knob advanced the paper upward by one line to prepare for the next row of text.^[5] These physical operations, driven by springs and gears, kept successive lines from overlapping and maintained readability.

The introduction of electric typewriters in the 1930s further refined these mechanisms, automating actions for greater efficiency. IBM's Electromatic model, released in 1935 after IBM acquired the Northeast Electric Company, incorporated an electric motor to power the carriage return and line feed, reducing manual effort and enabling faster operation compared to purely mechanical predecessors.^[6] Earlier attempts at electrification dated back to Thomas Edison's 1872 printing wheel design, but practical office models emerged only in the 1930s; Royal followed with its first electric typewriter in 1950.^[6]^[7] These innovations preserved the core principles of carriage return (resetting the print position horizontally) and line feed (vertical paper advancement) while enhancing reliability for professional use.

Teleprinters, or teletypewriters, emerged in the early 1900s as electromechanical devices for transmitting typed messages over telegraph lines, building directly on typewriter mechanics for remote printing. Émile Baudot's five-bit telegraph code, patented in 1874, enabled efficient character transmission but initially lacked dedicated line control signals.^[8] This changed with Donald Murray's 1901 adaptation of the Baudot code for English-language use, which introduced specific control characters for carriage return (CR) and line feed (LF); these simulated the typewriter's physical actions by signaling the receiving device's carriage to shift left and the platen to advance the paper, respectively.^[9] Teletype machines, first commercialized by the Morkrum Company from 1906 onward and later by the Teletype Corporation, standardized the CR+LF sequence to ensure complete line transitions over asynchronous telegraph connections, allowing synchronized printing at both ends.^[10]^[11]

A key feature of teleprinters was the ability to perform overstriking (reprinting on the same line for emphasis or correction) by issuing a CR without a subsequent LF, which returned the print head to the line's start without advancing the paper, so that text on a single line could be reworked before feeding to the next.^[12] This capability, rooted in the separate mechanical controls of typewriters, foreshadowed flexible line handling in later communication systems and highlighted the practical need for distinct CR and LF operations in noisy telegraph environments.

Evolution in Early Computing

As early computers emerged in the 1940s and 1950s, the newline concept transitioned from mechanical teleprinters to digital terminals, where repurposed teleprinters served as input/output devices for timesharing systems, allowing multiple users to interact with a single machine over phone lines.^[13] Devices like the IBM 026 printing card punch, introduced in 1949, adapted typewriter mechanisms for data entry, incorporating carriage return and line feed operations to print punched cards while advancing the paper feed.^[14] Early line printers, such as the IBM 1403 from the 1950s, extended this by using carriage control characters in the first column of each line to manage paper advancement, spacing, and form feeds, ensuring efficient output formatting without dedicated newline sequences.^[15]

The standardization of newline in computing advanced significantly with the development of the American Standard Code for Information Interchange (ASCII) in 1963 by the American Standards Association (ASA) X3 committee.^[16] ASCII defined line feed (LF, hexadecimal 0x0A, decimal 10) as a control character to advance the paper or cursor to the next line, and carriage return (CR, 0x0D, decimal 13) to move the cursor to the line's starting position, drawing from teleprinter conventions to support data transmission and display.^[16] These definitions, published as ASA X3.4-1963, provided a common framework for text handling across systems, influencing subsequent protocols and software.^[17]

In the 1960s and 1970s, operating systems diverged in newline adoption: the TECO text editor, developed in 1962 for Digital Equipment Corporation systems, treated line breaks as single LF characters internally, automatically appending LF to input carriage returns for buffer storage.^[18] Multics, an influential timesharing system from the late 1960s, stored text with LF alone but inserted CR before LF during output to terminals or printers for compatibility with teleprinters.^[19] In contrast, UNIX, developed in the early 1970s at Bell Labs, standardized on LF-only for line endings to enhance storage efficiency by avoiding redundant CR characters, particularly beneficial on limited media like tapes and disks.^[12]

The ARPANET, launched in 1969, further emphasized consistent newline handling through its reliance on ASCII control characters in early protocols like the 1822 interface message processor protocol, ensuring reliable text transmission across heterogeneous hosts by standardizing LF and CR for line demarcation in network messages.^[20] This approach influenced subsequent internet standards, promoting interoperability in data exchange.

Technical Representation

ASCII Control Characters

The American Standard Code for Information Interchange (ASCII), formalized in 1963 as ASA X3.4-1963, established a 7-bit character encoding scheme that reserved code positions 0 to 31 (and 127) for control characters, which lack visual glyphs and serve to control text layout, transmission, and peripheral devices rather than represent printable symbols.^[21] These controls were influenced by earlier teleprinter codes, such as the International Telegraph Alphabet No. 2 (ITA2) from the 1930s, a Baudot-derived code that carried non-printing signals for formatting on mechanical systems.^[22]

Central to newline operations are the Line Feed (LF) and Carriage Return (CR) control characters. LF, assigned code 10 (hexadecimal 0A, octal 012, binary 0001010), instructs devices to advance the active position to the next line, performing a vertical movement without horizontal reset, as originally defined for teleprinter paper feed mechanisms.^[21] CR, with code 13 (hex 0D, octal 015, binary 0001101), returns the active position to the start of the current line, resetting the horizontal position to the left margin while leaving the vertical position unchanged, emulating the mechanical action of a typewriter carriage.^[21] In 7-bit ASCII streams, these appear as non-printable bytes; for example, a line ending with LF contains the byte 0x0A, which is not displayed as a glyph but is interpreted by parsers and output devices as a line break.^[23]

Related control characters include Vertical Tabulation (VT, code 11, hex 0B, octal 013, binary 0001011), which advances the position to the next vertical tab stop, and Form Feed (FF, code 12, hex 0C, octal 014, binary 0001100), which ejects the current page and advances to the start of a new one; both supported structured text progression in early printing and display systems.^[21] Subsequent extensions to ASCII, such as the ISO 8859 family of standards (e.g., ISO 8859-1 from 1987), preserved these control characters unchanged in the 0-127 range, ensuring compatibility while adding printable symbols for regional variants in positions 160-255.^[21]
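The code values listed above can be checked mechanically. The following is a minimal Python sketch; the `CONTROLS` table and the `describe` helper are illustrative names, not part of any standard.

```python
# Format effectors discussed above, written with Python's escape sequences.
CONTROLS = {"LF": "\n", "VT": "\v", "FF": "\f", "CR": "\r"}


def describe(name: str) -> str:
    """Return the decimal, hex, octal, and 7-bit binary forms of a control."""
    code = ord(CONTROLS[name])
    return f"{name}: dec {code}, hex {code:02X}, oct {code:03o}, bin {code:07b}"


for name in CONTROLS:
    print(describe(name))
```

Running it prints one line per control character, starting with `LF: dec 10, hex 0A, oct 012, bin 0001010`.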

End-of-Line Sequences Across Systems

In computing environments, end-of-line sequences represent the transition to a new line in text data, with variations arising from historical and technical considerations across systems. The most prevalent are the line feed (LF, ASCII 0x0A) used alone in Unix, Linux, and macOS (Mac OS X and later); the carriage return followed by line feed (CR+LF, ASCII 0x0D followed by 0x0A) in Windows and DOS-derived systems; and carriage return (CR, ASCII 0x0D) alone in classic Mac OS (before Mac OS X). These sequences build on ASCII control characters for carriage positioning and paper advancement.

The LF-only approach emerged in early Unix implementations for storage efficiency and standardization, since a single stored character suffices to mark a line boundary and the terminal driver can supply the carriage return on output.^[12] In contrast, CR+LF originated in CP/M and was adopted by MS-DOS to ensure compatibility with mechanical teletypes and printers, where CR reset the print head to the line start and LF advanced the paper.^[24] Classic Mac OS employed CR-only for its straightforward text rendering model, simplifying file processing on resource-constrained hardware.^[25] Less common variants include the Next Line (NEL) control, encoded as 0x85 in the C1 control range used alongside ISO-8859-1 and corresponding to EBCDIC's NL (0x15), used primarily in IBM mainframe environments as a single character combining carriage return and line feed.^[26]

Sequence	Systems	Rationale
LF (0x0A)	Unix/Linux/macOS (Mac OS X and later)	Efficiency in file size and terminal handling
CR+LF (0x0D 0x0A)	Windows/DOS/CP/M	Compatibility with teletype mechanics
CR (0x0D)	Classic Mac OS (before Mac OS X)	Simplicity in text display
NEL (0x85; NL 0x15 in EBCDIC)	IBM mainframes	Single-character line break in legacy encodings

Internet protocols standardize these for interoperability; for instance, RFC 4180 (2005) specifies CR+LF as the delimiter for records in comma-separated values (CSV) files, allowing the final record to optionally omit a trailing break.^[27] The JSON interchange format (ECMA-404, 2013) flexibly recognizes LF, CR, or CR+LF within whitespace to separate tokens, accommodating diverse input sources.^[28] In XML documents, the 1.0 specification mandates processor normalization of any CR, LF, or CR+LF to a single LF (#xA) during parsing for consistent entity handling.^[29] Tools like Microsoft Excel, when processing CSV files, generally expect CR+LF for row boundaries to align with Windows conventions, though they may tolerate variations in quoted fields containing embedded breaks.^[30]
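Because the conventions above differ only in which bytes terminate a line, a file's convention can be estimated by counting those byte sequences. The sketch below is one possible approach in Python; `count_line_endings` and `dominant_line_ending` are hypothetical helper names, and real tools may apply more elaborate heuristics.

```python
from collections import Counter
from typing import Optional


def count_line_endings(data: bytes) -> Counter:
    """Count CRLF pairs, lone LF bytes, and lone CR bytes in raw data."""
    crlf = data.count(b"\r\n")
    return Counter({
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,   # LF bytes not preceded by CR
        "CR": data.count(b"\r") - crlf,   # CR bytes not followed by LF
    })


def dominant_line_ending(data: bytes) -> Optional[str]:
    """Return the most frequent convention, or None if no line break occurs."""
    name, count = count_line_endings(data).most_common(1)[0]
    return name if count > 0 else None


sample = b"a\r\nb\nc\rd\r\n"
print(count_line_endings(sample))
print(dominant_line_ending(sample))
```

Counting CRLF first and subtracting it from the raw LF and CR totals avoids classifying each half of a CRLF pair as a separate terminator.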

Encoding in Unicode

In Unicode, newline functionality is represented through several dedicated control characters and separators, each serving specific roles in text formatting and line progression. The primary characters include Line Feed (LF, U+000A), which advances the cursor to the next line while maintaining the horizontal position; Carriage Return (CR, U+000D), which returns the cursor to the line start; and Next Line (NEL, U+0085), a control from ISO 6429 that combines both effects in some legacy systems. Additionally, Unicode defines Line Separator (LS, U+2028) for breaking lines within paragraphs without implying a new paragraph, and Paragraph Separator (PS, U+2029) for separating entire paragraphs, both aiding in structured text processing.

LF and CR were included as part of the basic C0 control set in Unicode 1.0, released in 1991, inheriting from ASCII and ISO standards to ensure compatibility with existing text processing. LS and PS also date from the standard's early versions (both are present in Unicode 1.1) and were introduced as unambiguous line and paragraph boundaries that do not depend on any platform's newline convention; PS additionally acts as a paragraph boundary in the Unicode Bidirectional Algorithm used for scripts like Arabic and Hebrew.

Unicode normalization forms, such as Normalization Form C (NFC) and Normalization Form D (NFD), preserve these line break characters without alteration, as they neither decompose nor compose with other characters; LF, CR, LS, and PS therefore remain unchanged during canonical decomposition or composition. This stability matters for applications involving text transformation, where unintended splitting or merging of lines could disrupt formatting.

In multi-byte encodings like UTF-8 and UTF-16, a decoder must not split an encoded character across reads. In UTF-8, LF encodes as the single byte 0x0A, whereas LS requires the three-byte sequence 0xE2 0x80 0xA8, so a stream processor that searches for line breaks must recognize the complete sequence. In UTF-16, LS (U+2028) is encoded as a single 16-bit code unit (bytes 0x20 0x28 in big-endian order). All of the line-break characters above lie in the Basic Multilingual Plane, so none requires a surrogate pair, although parsers must still treat surrogate pairs for characters at U+10000 and above as atomic units when scanning for line boundaries.^[31]
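The encodings and the normalization behavior described above can be inspected directly. The following Python sketch uses only the standard library; `BREAKS`, `encoded_forms`, and `survives_normalization` are illustrative names chosen for this example.

```python
import unicodedata

# Line-break characters named in the text, keyed by abbreviation.
BREAKS = {
    "LF": "\u000A",
    "CR": "\u000D",
    "NEL": "\u0085",
    "LS": "\u2028",
    "PS": "\u2029",
}


def encoded_forms(ch: str) -> tuple[str, str]:
    """Return the UTF-8 and big-endian UTF-16 bytes of ch as hex strings."""
    return ch.encode("utf-8").hex(" "), ch.encode("utf-16-be").hex(" ")


def survives_normalization(ch: str) -> bool:
    """True if ch is unchanged by both NFC and NFD."""
    return all(unicodedata.normalize(form, ch) == ch for form in ("NFC", "NFD"))


for name, ch in BREAKS.items():
    utf8, utf16 = encoded_forms(ch)
    print(f"{name}: UTF-8 {utf8} | UTF-16BE {utf16} | "
          f"stable under NFC/NFD: {survives_normalization(ch)}")
```

As a related observation, Python's `str.splitlines()` treats all five of these characters as line boundaries, so `"a\u2028b".splitlines()` yields two items.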

Usage Contexts

Operating Systems and Text Files

In Unix-like operating systems, including Linux, the native end-of-line sequence for text files is the line feed (LF) character, as standardized by POSIX for portability across systems. Editors such as vi and Emacs detect a file's line ending format upon opening and can convert it to LF; for instance, Vim uses the :set fileformat=unix command to write the file with LF endings, while Emacs offers set-buffer-file-coding-system with the "unix" argument to change the endings without altering the character encoding.

Windows traditionally employs the carriage return plus line feed (CR+LF) sequence in text files, which is the default for applications like Notepad when creating or saving plain text files.^[32] PowerShell, which became cross-platform with PowerShell Core (version 6, open-sourced in 2016, and continued as PowerShell 7+), generally follows the newline convention of the platform it runs on, so scripts and output moved between Windows and Unix-like systems can still trip up tools that expect a particular ending, as discussed in ongoing compatibility reports.^[33]

macOS underwent a significant shift in newline handling with the transition to OS X in 2001, moving from the classic Mac OS's single carriage return (CR) to LF for compliance with POSIX standards and Unix heritage, ensuring seamless integration with Unix-based tools and file systems.^[34]

Plain text files with a .txt extension exhibit newline variations depending on the originating system or application, leading to potential interoperability challenges. Structured formats like JSON and XML need line endings handled consistently: the XML specification requires processors to normalize all line breaks to LF during input parsing, while JSON accepts LF and CR as whitespace between tokens but forbids unescaped line breaks inside string values, which must be written with escapes such as \n.

Version control systems like Git, first released in 2005, address these discrepancies through line-ending normalization: with the core.autocrlf configuration setting set to true, Git stores text files with LF endings in the repository and converts them to CR+LF on checkout, while the value input converts to LF on commit without converting on checkout.^[35] A practical example arises with CSV files generated by Microsoft Excel, which writes CR+LF as the row delimiter to align with Windows conventions; when these files are processed with Unix tools like csvkit or awk without prior conversion, the extra CR may be treated as part of the last field rather than as part of the line terminator.^[30]
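The stray-CR problem with CR+LF-delimited CSV data can be reproduced in a few lines. The sketch below contrasts a naive split on LF with Python's csv module, whose documentation recommends opening input with newline="" so that the reader sees the original terminators; `RAW`, `parse_rows`, and `naive_rows` are hypothetical names for this illustration.

```python
import csv
import io

# A small CSV payload with CR+LF row delimiters and a quoted multi-line field.
RAW = 'name,note\r\nada,"line one\r\nline two"\r\n'


def parse_rows(text: str) -> list[list[str]]:
    """Parse with the csv module; newline='' disables newline translation."""
    return list(csv.reader(io.StringIO(text, newline="")))


def naive_rows(text: str) -> list[list[str]]:
    """Split on LF only, as a line-oriented Unix tool might."""
    return [line.split(",") for line in text.split("\n") if line]


print(parse_rows(RAW))
print(naive_rows(RAW))
```

The csv reader returns two rows and keeps the embedded break inside the quoted field, while the naive split leaves a trailing `\r` on the last field of each line and breaks the quoted field apart.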

Programming Languages

In programming languages, newlines are commonly represented using escape sequences within string literals to insert line feed (LF) or carriage return (CR) characters. For instance, the sequence \n denotes LF in languages such as C, Java, and Python, while \r represents CR, and \r\n can be specified explicitly for the combined sequence used on Windows systems.^[36] Python 3 implements universal newlines: in text mode, file input translates LF, CR, and CRLF sequences to \n regardless of the platform's native convention, a behavior introduced by PEP 278 (2002).^[37] In contrast, Java provides System.lineSeparator(), a method that returns the platform-specific newline string (\n on Unix-like systems, \r\n on Windows) to ensure compatibility with operating system text file conventions in input and output operations.^[38]

Modern languages address variability in line endings through flexible APIs; for example, Rust's std::io::BufRead trait, via its lines() method, recognizes both LF and CRLF as line terminators and strips the delimiter (including the optional CR) from the resulting string, while other delimiters can be handled with related methods such as split and read_until.^[39]^[40] Similarly, Go's bufio.Scanner with the default ScanLines function splits on an optional CR followed by a mandatory LF (matching the regex \r?\n), and developers can supply custom split functions for other endings.^[41] In JSON strings, the escape sequence \n is interpreted as a literal LF character (Unicode U+000A), preserving the newline in serialized data across language implementations.

Newline handling in SQL varies by database system; for instance, PostgreSQL preserves newlines in string literals and text fields when inserted using escape sequences like E'\n', and functions such as trim() remove only spaces by default, so newlines must be listed explicitly to be stripped.^[42] In regular expressions, languages like Perl treat \n as matching only the LF character by default, requiring modifiers such as /s (dotall) to make . match newlines or explicit patterns like \r?\n for broader line-ending support.^[43] A practical example in C++ is the std::getline function from <string>, which reads input until it encounters the delimiter (\n by default), consumes the delimiter to advance the stream, but excludes it from the output string, helping prevent residual characters in subsequent reads.
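Python's universal-newlines behavior can be observed without touching the file system by wrapping a byte buffer the way open() wraps a file. This is a minimal sketch; `decode_text` is a hypothetical helper, and the `newline` argument has the same meaning as the `newline` parameter of open().

```python
import io


def decode_text(raw: bytes, newline):
    """Decode raw bytes as text mode would, using the given newline= setting."""
    wrapper = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=newline)
    return wrapper.read()


RAW = b"unix\nwindows\r\nclassic mac\r"

# newline=None (the default): LF, CRLF, and CR are all translated to "\n".
translated = decode_text(RAW, None)

# newline="": line endings are recognized but passed through untranslated.
untranslated = decode_text(RAW, "")

print(repr(translated))
print(repr(untranslated))
```

Passing newline="" is the usual choice when a program needs to preserve or inspect the original terminators, for example before rewriting a file in place.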

Web Technologies and Markup

In HTML, whitespace characters, including newlines, are collapsed into a single space during rendering in normal text flow, so that multiple spaces or line breaks in the source do not affect layout unless explicitly preserved.^[44] The <br> element provides a mechanism for inserting a single line break, equivalent to a newline in visual rendering, and is commonly used to simulate the effect of newlines in non-preformatted content. Within the <pre> element, however, whitespace (including newlines) is preserved as authored, and the text is typically rendered in a fixed-width font with its line breaks intact.^[45] Numeric character references such as &#10; (representing LF, U+000A) allow authors to embed line feeds directly in markup where needed.

CSS extends control over newline handling through the white-space property, where the pre-line value collapses consecutive whitespace sequences but preserves newlines as line breaks, allowing text to wrap while respecting authored line separations.^[46] This behavior applies to standard LF characters, enabling dynamic formatting in web layouts. Gaps exist in handling Unicode-specific separators like U+2028 (line separator, LS) and U+2029 (paragraph separator, PS), which are treated as non-collapsible segment breaks in pre-line mode but may not always render consistently across browsers in internationalized content.^[47]

In XML-based formats like SVG, parsers normalize all line endings (CR, LF, or CR+LF) to a single LF (U+000A) before processing, ensuring consistent internal representation regardless of the source file's platform.^[29] Similarly, JSON used in web APIs escapes newlines within strings as \n (denoting LF), adhering to the format's strict syntax for control characters to maintain parsability across systems.^[48] HTTP protocol specifications mandate CR+LF (CRLF) as the line terminator for header fields, separating name-value pairs in requests and responses.^[49]

Markdown, as defined in the CommonMark specification (version 0.31.2, released January 2024), treats single newlines in paragraphs as soft breaks that are ignored for rendering; a hard line break requires two trailing spaces before the newline, and paragraph separation requires a blank line.^[50] In code blocks, however, raw newlines are preserved literally as LF, maintaining the original formatting for embedded code snippets.
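The HTTP and JSON conventions above can be illustrated together. The following Python sketch builds a minimal request whose header lines end in CRLF and serializes a multi-line string as JSON; `build_request` is a hypothetical helper, example.com is a placeholder host, and the request is only constructed, never sent.

```python
import json


def build_request(host: str, path: str = "/") -> bytes:
    """Assemble a minimal HTTP/1.1 request; every header line ends in CRLF."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close"]
    # A blank line (a second CRLF) marks the end of the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# json.dumps writes the LF inside the string as the two characters "\" and "n".
payload = json.dumps({"note": "first\nsecond"})

print(build_request("example.com"))
print(payload)
```

The serialized JSON text contains no raw line break, which is what allows it to travel safely through line-oriented transports and be restored to a real LF by `json.loads`.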

Interpretation and Processing

Software Parsing Behaviors

Software applications and systems interpret newline sequences differently based on their design, platform conventions, and standards, which influences how text is parsed, processed, and rendered. In web browsers, HTML rendering collapses sequences of whitespace characters, including newlines (LF or CR+LF), into a single space, except within elements like <pre> or when the CSS white-space property is set to a preserving value such as pre or pre-wrap. This behavior ensures consistent layout across documents but can obscure original formatting unless it is preserved explicitly.

Terminal emulators, such as xterm, map the LF control character to advancing the cursor to the next line while maintaining the horizontal position, and the CR character to moving the cursor to the start of the current line without vertical movement. These mappings align with legacy teletype behaviors and enable precise cursor control in command-line interfaces.

Language runtimes often implement flexible parsing to handle cross-platform compatibility. In Java, the BufferedReader.readLine() method recognizes any of \r (CR), \n (LF), or \r\n (CR+LF) as a line terminator and returns the line without the terminator.^[51] Similarly, in .NET Framework and .NET Core, the StreamReader.ReadLine() method detects and consumes \r\n, \n, or \r as line endings, normalizing them during text stream processing.^[52]

Modern development tools address parsing inconsistencies by detecting and managing newline variants. Visual Studio Code, released in 2015, detects whether a file uses LF or CRLF upon opening and displays the current format in the status bar, where users can switch the file to the other convention. Version control systems like Git handle mixed newline sequences through settings such as core.autocrlf, which normalizes endings during checkout and commit so that diffs compare consistently across environments.

In POSIX-compliant environments, as defined by IEEE Std 1003.1, a text line consists of zero or more non-newline characters terminated by a <newline> character (LF), so a CR+LF sequence is parsed as line content ending with a CR character followed by the newline delimiter, potentially causing visible artifacts such as a trailing ^M in displays unless normalized.^[53] Text editors like GNU Emacs detect the end-of-line convention when visiting a file, convert it internally to LF for editing, and record it in the buffer's coding system (through the -unix, -dos, and -mac variants), so that the original convention is restored on save.

Email clients encounter parsing variations due to protocol requirements. The MIME standard (RFC 2045) mandates CRLF as the canonical line break for message headers and overall structure, but encapsulated body text may contain platform-specific newlines, resulting in display quirks such as extra blank lines or misaligned content if the client does not normalize during rendering.
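The difference between POSIX-style and universal line splitting can be demonstrated on raw bytes. In this Python sketch, `posix_lines` treats only LF as a terminator, as a POSIX text tool would, while `universal_lines` relies on `bytes.splitlines()`, which accepts LF, CR, and CRLF; both function names are illustrative.

```python
def posix_lines(data: bytes) -> list[bytes]:
    """Split on LF only; any CR before the LF stays in the line content."""
    parts = data.split(b"\n")
    if parts and parts[-1] == b"":
        parts.pop()  # drop the empty piece after the final terminator
    return parts


def universal_lines(data: bytes) -> list[bytes]:
    """Split on LF, CR, or CRLF and drop the terminators."""
    return data.splitlines()


windows_text = b"one\r\ntwo\r\n"
print(posix_lines(windows_text))
print(universal_lines(windows_text))
```

The first result keeps a trailing `\r` on every line, which is the byte that terminals and pagers often display as `^M`.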

Format Conversion Methods

Command-line tools provide straightforward methods for converting newline formats, particularly between Unix-style LF and Windows-style CRLF sequences. The dos2unix and unix2dos utilities, originating in the early 1990s, convert files by removing or adding carriage returns as needed; for instance, dos2unix strips the \r from lines ending in \r\n, while unix2dos inserts \r before existing \n terminators.^[54] These tools are available in most Unix-like systems and process files in batch mode, preserving content while normalizing line endings.

Stream editors like sed and awk offer scriptable alternatives for targeted conversions without dedicated binaries. A common sed command to remove carriage returns from DOS-formatted files is sed 's/\r$//' (the \r escape is understood by GNU sed), which deletes a \r at the end of each line, effectively converting CRLF to LF. Similarly, awk can process and rewrite lines, such as awk '{sub(/\r$/,""); print}' to strip a trailing \r before output.^[55] The tr utility simplifies deletion of carriage returns across an entire file using tr -d '\r' < input > output , which removes every instance of the \r character (ASCII 13), including any that are not part of a line ending.^[56]

In programming environments, APIs facilitate programmatic newline handling for cross-platform compatibility. Python's os module provides os.linesep , a string representing the native line separator (\r\n on Windows, \n on Unix), which can be used with str.replace() to normalize text; for example, text.replace('\n', os.linesep) converts Unix newlines to the local format before the data is written in binary mode. Files opened in text mode already translate \n on output, so applying the replacement there would double the carriage returns on Windows.^[57] Node.js's fs module, when reading files with 'utf8' encoding, preserves the original line endings including mixed newlines, allowing conversion via string methods like text.replace(/\r\n/g, '\n') to unify to LF for processing. Perl supports in-place editing through the $^I variable, set to an extension for backups (e.g., $^I = ".bak"), enabling one-liners like perl -i.bak -pe 's/\r\n?/\n/g' to rewrite CRLF or lone CR endings as LF directly in the file.^[58]

Integrated development environments (IDEs) and cloud services address conversion gaps through automation. IntelliJ IDEA allows configuration of line separators per file or globally via Editor > Code Style settings, with options to change existing files' endings (e.g., from CRLF to LF) and apply normalization during saves if tied to code style schemes.^[59] In cloud storage like AWS S3, objects are stored as immutable bytes, preserving native newline formats without alteration, but transformations can be applied via AWS Lambda functions or S3 Select queries for on-demand conversion during retrieval or processing.

Version control systems like Git incorporate line ending handling to manage conversions in cross-platform repositories. In a .gitattributes file, the text attribute marks files for normalization to LF on commit and the eol attribute selects the working-tree ending, as in *.txt text eol=crlf. Git also supports custom clean and smudge filters, which run on commit (clean) and checkout (smudge) and can perform arbitrary transformations; an entry such as *.txt filter=crlf invokes user-defined commands configured under the filter name crlf.^[35] This approach mitigates compatibility issues by automating transformations at the repository level.^[60]
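For comparison with the command-line tools above, the same conversions can be sketched in Python on raw bytes. The functions `to_lf` and `to_crlf` are hypothetical names; they loosely mirror what dos2unix and unix2dos do but also accept lone CR endings, which is an assumption of this sketch rather than a description of those tools.

```python
import re

# Match CRLF first so that a pair is treated as one terminator, then lone CR or LF.
_ANY_EOL = re.compile(rb"\r\n|\r|\n")


def to_lf(data: bytes) -> bytes:
    """Rewrite every line ending (CRLF, CR, or LF) as a single LF."""
    return _ANY_EOL.sub(b"\n", data)


def to_crlf(data: bytes) -> bytes:
    """Rewrite every line ending (CRLF, CR, or LF) as CRLF."""
    return _ANY_EOL.sub(b"\r\n", data)


mixed = b"a\r\nb\rc\n"
print(to_lf(mixed))
print(to_crlf(mixed))
```

Working on bytes rather than decoded text sidesteps Python's own newline translation, and both functions are idempotent, so running them on an already converted file leaves it unchanged.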

Common Compatibility Issues

One prevalent compatibility issue arises in version control systems like Git, where files with mixed or platform-specific newline sequences (CRLF on Windows versus LF on Unix-like systems) can produce misleading commit diffs that appear to change every line of a file.^[35] This typically happens when contributors' editors or core.autocrlf settings differ, so that a file's line endings are rewritten wholesale even though its content is unchanged.^[35]

In email systems, mixed newline sequences can disrupt automatic line wrapping, causing text to render incorrectly in clients that expect uniform CRLF delimiters as per MIME standards, where any occurrence of CRLF must represent a line break and isolated CR or LF usage is prohibited.^[61] For instance, a message composed with LF-only lines on a Unix system may result in broken formatting or unintended reflow when viewed on Windows-based email software.^[61]

Cross-platform deployment exacerbates these problems; for example, shell scripts authored on Windows with CRLF endings often fail on Linux servers because the shebang line (e.g., #!/bin/bash) becomes #!/bin/bash\r, rendering the interpreter path invalid and preventing execution.^[62] JSON documents with CR-only line endings can also cause trouble: RFC 8259 lists CR among the permitted whitespace characters between tokens, but line-oriented tooling built around JSON often assumes LF or CRLF, and an unescaped CR inside a string value is an invalid control character.^[63] In containerized workflows, particularly with Docker, teams commonly enforce LF endings in Linux-based images, because CRLF files mounted from Windows hosts can cause runtime errors in scripts or configurations within the container environment. This standardization helps avoid inconsistencies but highlights ongoing challenges in hybrid development workflows.

Security risks also stem from unnormalized newline inputs; in web forms, failure to sanitize user-supplied data containing CRLF sequences can enable injection attacks, allowing attackers to append arbitrary HTTP headers and facilitate response splitting or cache poisoning.^[64] RFC 2046 for MIME recommends CRLF as the standard line break in text parts while acknowledging tolerance for legacy systems using other conventions, yet deviations persist and cause interoperability failures.^[61] A notable example is Microsoft Excel's handling of CSV files, where LF-only endings from Unix sources are often mangled during import, resulting in data appearing in a single row or column misalignment due to improper line detection.^[65]

Tools like Vim address detection challenges via options such as ++ff=dos (as in :e ++ff=dos), which forces the file to be read as DOS (CRLF) format to prevent display artifacts from mismatched endings.^[66] Additionally, regular expressions in programming languages may fail if the escape sequence \n (matching LF) is used on CRLF files without accounting for the preceding CR, leading to incomplete pattern matches or parsing errors across platforms. These issues can typically be resolved through format conversion methods that normalize endings to a consistent standard.
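Two of the failure modes above, CRLF injection into headers and a carriage return hidden in a shebang line, lend themselves to simple checks. The Python sketch below shows one possible form; `safe_header_value` and `shebang_has_cr` are hypothetical helpers, and production frameworks usually perform such validation themselves.

```python
def safe_header_value(value: str) -> str:
    """Reject header values containing CR or LF, the basis of header injection."""
    if "\r" in value or "\n" in value:
        raise ValueError("header value must not contain CR or LF")
    return value


def shebang_has_cr(script: bytes) -> bool:
    """Report whether a script's shebang line ends in a stray carriage return."""
    first_line = script.split(b"\n", 1)[0]
    return first_line.startswith(b"#!") and first_line.endswith(b"\r")


print(shebang_has_cr(b"#!/bin/bash\r\necho hi\r\n"))
print(shebang_has_cr(b"#!/bin/bash\necho hi\n"))

try:
    safe_header_value("en\r\nSet-Cookie: session=attacker")
except ValueError as exc:
    print("rejected:", exc)
```

Rejecting the value outright, rather than silently stripping the CR and LF, makes an attempted injection visible to the caller.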

Specialized Variants

Reverse Line Feeds

Reverse line feeds, defined as the Reverse Line Feed (RI) control function in ECMA-48 and called Reverse Index in DEC terminal documentation, move the printing or cursor position up by one line, the opposite of a standard line feed. This capability facilitates overstriking, in which subsequent characters are printed over earlier ones to produce effects such as bold (by reprinting the same text) or underlining (by printing underscore characters over the original line) on hardware without dedicated formatting features. A typical sequence issues a carriage return (CR, ASCII 13) to return to the start of the line, then the RI control (C1 code 0x8D, decimal 141, U+008D in Unicode, or ESC M in 7-bit form) to move up, and then the overstrike characters; within a single line, backspace (BS, ASCII 8) supplies the leftward motion needed to overstrike individual characters.^[67]

Historically, reverse line feeds were integral to dot-matrix printers prevalent from the 1970s through the 1990s, where escape sequences such as Epson's ESC j n performed a partial reverse feed (in increments of n/216 inch) to align the paper for precise overprinting and emphasis without graphics modes.^[68] Early terminals also used them, as escape sequences or direct control characters, for text formatting in line-oriented interfaces, supporting applications such as document preparation in which visual emphasis was achieved through mechanical repetition rather than fonts. The Teletype Model 37 teleprinter, introduced in the late 1960s, supported reverse line feed alongside half-forward and half-reverse feeds, enabling output such as charts and emphasized text on friction-fed or sprocket-fed paper at 100 words per minute.^[69] Modern terminal emulators, including xterm, process these operations through ECMA-48-compliant controls, preserving compatibility with legacy software that relies on RI for formatting.
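The overstrike sequences described above can be written out as raw byte strings. The Python sketch below is illustrative only: the helper names are hypothetical, and it assumes a hardcopy device on which overprinted characters accumulate (on most video terminals the second pass simply replaces the first).

```python
CR = b"\r"          # carriage return: back to column 0 on the same line
LF = b"\n"          # line feed: down one line
RI_7BIT = b"\x1bM"  # ECMA-48 RI expressed as the 7-bit sequence ESC M


def bold_line(text: bytes, passes: int = 2) -> bytes:
    """Print the same text several times on one line, then end the line."""
    return CR.join([text] * passes) + CR + LF


def underline_line(text: bytes) -> bytes:
    """Print text, return the carriage, and overstrike it with underscores."""
    return text + CR + b"_" * len(text) + CR + LF


def overprint_previous_line(marks: bytes) -> bytes:
    """After a line has already ended, move up one line with RI and overstrike."""
    return RI_7BIT + CR + marks + CR + LF
```

For instance, underline_line(b"Hi") produces b"Hi\r__\r\n": the text, a carriage return, two underscores over the same columns, and a normal CRLF line ending.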
Today, reverse line feeds see limited use, primarily in retrocomputing recreations of vintage systems and in emulations of period printers, where they reproduce authentic overstrike behavior. In digital text, similar effects are usually emulated with Unicode combining characters, such as U+0332 COMBINING LOW LINE, which underlines the preceding character without any positional reversal. As a historical illustration, in early Microsoft BASIC environments a carriage return could be issued with PRINT CHR$(13); to return to the start of the line without advancing, and backspaces (CHR$(8)) allowed overprinting on the current line; a true reverse line feed required sending the RI code, CHR$(141), to an 8-bit device that interpreted it as an upward movement.^[70]
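The combining-character emulation mentioned above takes only a few lines. This Python sketch uses a hypothetical helper name and deliberately ignores text that already contains combining marks, which a fuller implementation would need to group with their base characters.

```python
COMBINING_LOW_LINE = "\u0332"


def underline(text: str) -> str:
    """Append U+0332 after every character so that each glyph is underlined."""
    return "".join(ch + COMBINING_LOW_LINE for ch in text)
```

Calling underline("ab") returns a four-code-point string: "a", U+0332, "b", U+0332. How it is displayed depends on the font and renderer.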

Partial Line Feeds

Partial line feeds advance the paper or print position by a fraction of the standard line height, typically half a line (1/12 inch at 6 lines per inch), to enable precise vertical positioning in printing and display systems. Implemented through custom control codes or mechanical adjustments, they differ from full line feeds in allowing incremental movement without completing a newline operation. In early printing contexts, partial feeds were provided by dedicated commands such as "Half Line Feed Forward" and "Half Line Feed Reverse," vertical motions that permitted the placement of superscripts and subscripts.^[71]

A primary application appears in typewriters and impact printers, where the platen advances half a line to position subscript or superscript characters relative to the baseline. For instance, IBM Wheelwriter typewriters such as the Model 1000 use a key combination (Code + H) to move the paper one-half line below the typing line for a subscript, followed by typing and an automatic return to the baseline. Similarly, on dot-matrix printers that support Epson ESC/P commands, ESC j n performs a reverse feed of n/216 inch, allowing fine adjustments for superscript and subscript rendering. These techniques were essential in pre-digital document production for approximating mathematical or scientific notation without dedicated fonts.^[72]^[73]

Partial line feeds were also used in early terminal software. Multics terminal processing, for example, emitted vertical half-line feed characters to position output, issuing exactly as many feeds as were needed to move between the vertical positions of successive characters, as for superscripts and subscripts.^[74] Although Unicode's variation selectors provide glyph alternatives, they do not control positioning, leaving partial feeds as a legacy solution for fine vertical adjustments.
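The arithmetic behind a half-line feed is simple: at 6 lines per inch a full line is 36/216 inch, so half a line is n = 18 units. The Python sketch below works this through and builds the corresponding byte sequences. The helper names are hypothetical, and the forward command ESC J n is an assumption drawn from the 9-pin ESC/P command set rather than from the text above, which mentions only the reverse form ESC j n.

```python
ESC = b"\x1b"
UNITS_PER_INCH = 216  # feed resolution quoted above for ESC j n


def feed_units(lines_per_inch: int, fraction_of_line: float) -> int:
    """Convert a fraction of a line into n/216-inch feed units."""
    n = round(UNITS_PER_INCH / lines_per_inch * fraction_of_line)
    if not 0 <= n <= 255:
        raise ValueError("n must fit in a single byte")
    return n


def forward_feed(n: int) -> bytes:
    """ESC J n: advance the paper n/216 inch (assumed 9-pin ESC/P command)."""
    return ESC + b"J" + bytes([n])


def reverse_feed(n: int) -> bytes:
    """ESC j n: reverse-feed the paper n/216 inch."""
    return ESC + b"j" + bytes([n])


def subscript(text: bytes, lines_per_inch: int = 6) -> bytes:
    """Drop half a line, print the subscript, then return to the baseline."""
    n = feed_units(lines_per_inch, 0.5)
    return forward_feed(n) + text + reverse_feed(n)
```

With the default spacing, feed_units(6, 0.5) returns 18, and subscript(b"2") yields ESC J 18, the character "2", and ESC j 18. Swapping the two feed calls would produce a superscript instead.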
Despite their utility, partial line feeds lack a widespread digital standard today, remaining largely confined to legacy hardware and printer emulations. In modern web technologies, fractional effects are simulated via CSS properties like line-height set to values such as 0.5em, which adjusts the height of line boxes without altering newline semantics, though this does not replicate true partial advances. Printer languages like HP PCL 5 include explicit half line-feed controls (e.g., moving the cursor one-half line upward or downward), often via escape sequences for compatibility with older workflows. PostScript, while not relying on ESC codes, approximates partial line feeds through relative y-offset commands like rmoveto for fine-grained positioning in document composition. Related techniques, such as reverse line feeds, complement partial advances by enabling upward movements for overwriting or alignment corrections.

References

[1]
[PDF] USAS X3.4-1968
LF (Line Feed): A format effector which controls the movement of the printing position to the next printing line. (Applicable also to display devices.) Where ...
[2]
RFC 5198 - Unicode Format for Network Interchange
1. Characters MUST be encoded in UTF-8 as defined in [RFC3629]. · 2. If the protocol has the concept of "lines", line-endings MUST be indicated by the sequence ...
[3]
Christopher Latham Sholes | QWERTY keyboard, typewriter, printer
Oct 15, 2025 · With Glidden and Soulé, Sholes was granted a patent for a typewriter on June 23, 1868; later improvements brought him two more patents, but he ...
[4]
The History of the Typewriter - Science | HowStuffWorks
Invented in the early 19th century, typewriters offered speed, efficiency and legibility, paving the way for subsequent technological advancements.
[5]
Electric typewriter | writing technology - Britannica
Oct 15, 2025 · The first electrically operated typewriter, consisting of a printing wheel, was invented by Thomas A. Edison in 1872 and later developed into ...
[6]
Émile Baudot Invents the Baudot Code, the First Means of Digital ...
It was a 5-bit code, with equal on and off intervals, which allowed telegraph transmission of the Roman alphabet and punctuation and control signals. It was ...
[7]
[PDF] The Evolution of Character Codes, 1874-1968
Émile Baudot's printing telegraph was the first widely adopted device to encode letters, numbers, and symbols as uniform-length binary sequences. Donald Murray ...
[8]
[PDF] Some Notes on Teletype Corporation - DeRamp
Mar 1993. The original TWX goes back to about 1930, used 3-row machines, and manual switchboards. In fact the introduction of TWX was what caused AT&T to ...
[9]
Why does Linux use LF as the newline character?
Dec 19, 2017 · LF was preferred to CR because the latter still had a specific usage. By repositioning the printed character to the beginning of the same line, ...
[10]
Networking & The Web | Timeline of Computer History
By the early 1960s many people can share a single computer, using terminals (often repurposed teleprinters) to log in over phone lines. These timesharing ...
[11]
Doug Jones's punched card codes - University of Iowa
The IBM 024 keypunch was identical to the 026, except that it did not include a printing mechanism. Formally, the 026 was called a printing keypunch because of ...
[12]
Special characters used with line data - IBM
There are two types of special characters that can be used with line data; carriage control characters (CC) and table reference characters (TRC).
[13]
[PDF] The Evolution of Character Codes, 1874-1968 - FalseDoor.com
American Standard Code for Information Interchange, ASA. X3.4-1963, American Standards Assocation, June 17, 1963,. United States National Bureau of Standards ...
[14]
What Is ASCII (American Standard Code for Information ...
Jan 6, 2025 · ASCII was developed and published in 1963 by the X3 committee, a part of the ASA (American Standards Association). The ASCII standard was ...
[15]
[PDF] TECO Reference Manual digital equipment corporation
Line feed and form feed characters are inserted automatically by TECO. A line feed is automatically appended to every carriage return entered into the ...
[16]
ms dos - Why is Windows using CR+LF and Unix just LF when Unix ...
May 3, 2018 · Windows and MS-DOS use the control characters CR+LF (carriage return ASCII 13 followed by line feed ASCII 10) for new lines, while Unix uses just LF.
[17]
Today's Internet Still Relies on an ARPANET-Era Protocol
Jul 29, 2020 · ASCII included 32 device and transmission control characters—such as ACK (acknowledge), EOT (end of transmission), and SYN (synchronous idle)— ...
[18]
Control characters in ASCII and Unicode - Aivosto
As to some specific control characters, the current detailed definitions of SI, SO and ESC can be found in ANSI X3.41, ISO 2022 and ECMA-35. The current details ...
[19]
History of Ctrl-S and Ctrl-Q for flow control
Aug 13, 2018 · The Beginning. This brings us to the introduction of the very first control characters with ITA2 of 1924 as: ...
[20]
ASCII character set - Lammert Bies
In the original 1963 definition of the ASCII standard the name start of message was used, which has been renamed to start of heading in the final release.
[21]
Why is the line terminator CR+LF? - The Old New Thing
Mar 18, 2004 · CR stands for “carriage return” – the CR control character returned the print head (“carriage”) to column 0 without advancing the paper. LF ...
[22]
How prevalent is the CR (classic MacOS) line ending today? [closed]
Sep 30, 2021 · The CR LF (Carriage Return, Line Feed) is the (standard!) ASCII way to tell a printer to start at the beginning of a new line. Unix is a rascal, ...
[23]
How did EBCDIC files on IBM mainframes identify newlines?
Apr 16, 2025 · Line separator for EBCDIC systems is NL or x'15' in EBCDIC. Usually translated as NEL, x'85' in ASCII/8859-1/Unicode.
[24]
RFC 4180 - Common Format and MIME Type for Comma-Separated ...
This RFC documents the format of comma separated values (CSV) files and formally registers the "text/csv" MIME type for CSV in accordance with RFC 2048.
[25]
[PDF] The JSON Data Interchange Syntax - Ecma International
\n represents the line feed character (U+000A). \r represents the carriage return character (U+000D). \t represents the character tabulation character (U+0009).
[26]
https://retrocomputing.stackexchange.com/questions/31580/how-did-ebcdic-files-on-ibm-mainframes-identify-newlines
[27]
CSV Line Endings - Microsoft Q&A
Nov 30, 2016 · See RFC 4180. Is there anyway to export using the correct line ending: /n ? A workaround at this stage is to use the Windows Comma Separated .
[28]
Introducing extended line endings support in Notepad
May 8, 2018 · New files created within Notepad will use Windows line ending (CRLF) by default, but it will now be possible to view, edit, and print existing ...
[29]
LF instead of CRLF on Windows devices can break e.g. signatures
Mar 27, 2025 · On Windows and in PowerShell, only “CR/LF” should be used—unless the user explicitly creates something different (e.g., with \n) ...
[30]
A Line Break Is a Line Break - Mac OS X Hacks - O'Reilly
The Mac, by default, uses a single carriage return ( <CR> ), represented as \r . Unix, on the other hand, uses a single linefeed ( <LF> ), \n . Windows goes ...
[31]
Configuring Git to handle line endings - GitHub Docs
You can configure Git to handle line endings automatically so you can collaborate effectively with people who use different operating systems.
[32]
Escape Sequences | Microsoft Learn
Dec 12, 2023 · Character combinations consisting of a backslash (\) followed by a letter or by a combination of digits are called escape sequences.
[33]
PEP 278 – Universal Newline Support | peps.python.org
Jan 14, 2002 · This PEP discusses a way in which Python can support I/O on files which have a newline format that is not the native format on the platform.
[34]
https://www.oreilly.com/library/view/mac-os-x/0596004605/ch01s06.html
[35]
https://docs.github.com/en/get-started/git-basics/configuring-git-to-handle-line-endings
[36]
1212-line-endings - The Rust RFC Book
The following functions will have to be changed: BufRead::lines and str::lines . They both should treat '\r\n' as marking the end of a line. This can be ...
[37]
https://peps.python.org/pep-0278/
[38]
Documentation: 18: 4.1. Lexical Structure - PostgreSQL
SQL input consists of a sequence of commands. A command is composed of a sequence of tokens, terminated by a semicolon (“;”). The end of the input stream ...
[39]
perlre - Perl regular expressions - Perldoc Browser
Perl regular expressions are strings with a specific syntax used to match patterns against target strings, often using delimiters like forward slashes.
[40]
https://rust-lang.github.io/rfcs/1212-line-endings.html
[41]
https://pkg.go.dev/bufio#ScanLines
[42]
Text
Definition of white-space: pre-line and newline handling.
[43]
https://perldoc.perl.org/perlre
[44]
https://html.spec.whatwg.org/multipage/semantics.html#whitespace
[45]
https://html.spec.whatwg.org/multipage/grouping-content.html#the-pre-element
[46]
https://www.w3.org/TR/CSS2/text.html#white-space-prop
[47]
https://www.w3.org/TR/css-text-3/#white-space-rules
[48]
How to remove carriage returns from text files on Linux
The dos2unix command is probably the easiest to remember and most reliable way to remove carriage returns from text files.
[50]
tr
Usage of the tr command for deleting \r (carriage return) characters.
[51]
https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/io/BufferedReader.html#readLine()
[52]
Modifying a File in Place with -i Switch - Perl Cookbook [Book]
The -i command-line switch modifies each file in place. It creates a temporary file as in the previous recipe, but Perl takes care of the tedious file ...
[53]
Line separators | IntelliJ IDEA Documentation - JetBrains
Jul 23, 2025 · With IntelliJ IDEA, you can set up line separators (line endings) for newly created files and change the line separator style for existing files.
[54]
gitattributes(5) - Linux manual page - man7.org
By default these commands process only a single blob and terminate. If a long running process filter is used in place of clean and/or smudge filters, then Git ...
[55]
https://www.gnu.org/software/gawk/manual/gawk.html
[56]
Bash - seamlessly run scripts with CRLF line endings - Stack Overflow
Apr 13, 2015
[57]
RFC 8259 - The JavaScript Object Notation (JSON) Data ...
JavaScript Object Notation (JSON) is a lightweight, text-based, language-independent data interchange format. It was derived from the ECMAScript Programming ...
[58]
CRLF Injection - OWASP Foundation
A CRLF Injection attack occurs when a user manages to submit a CRLF into an application. This is most commonly done by modifying an HTTP parameter or URL.
[59]
Excel and line endings - Nice R Code
Apr 30, 2013 · Basically, saving a file as comma separated values (csv) uses a carriage return \r rather than a line feed \n as a newline. Way back before OS X ...
[60]
usr_23 - Vim documentation
The automatic detection avoids this. Suppose you do want to edit the file that way? Then you need to overrule the format: :edit ++ff=unix file.txt The "++ ...
[61]
colcrt(1) - Linux manual page - man7.org
colcrt provides virtual half-line and reverse line feed sequences for terminals without such capability, and on which overstriking is destructive.
[62]
[PDF] User Manual - FX-80/100 Vol. 1 - Epson
In the old days, dot-matrix printers could not underline words. Even in the ... The FX-80 can use reverse feed to return to the original print line, ...
[63]
[PDF] Teletype Model 37 AUTOMATIC SEND-RECEIVE (ASR ...
Half-forward, half-reverse, and reverse line feed. • Carriage return on receipt of VT or FF characters. • Operating speed of 100 wpm (10 characters a second) ...
[64]
Chr function (Visual Basic for Applications) | Microsoft Learn
Sep 13, 2021 · For example, Chr(10) returns a linefeed character. The normal range for charcode is 0–255. However, on DBCS systems, the actual range for ...
[65]
[PDF] Complete clear text representation of scientific documents in ...
are "Half Line Feed Forward" and "Half Line Feed Reverse". These are vertical motions on a page. They permit placement of superscripts and subscripts. As ...
[66]
[PDF] IBM Wheelwriter 1000 User's Guide
Typing Subscripts. Hold down Code while you press H (½ ↓). The paper moves one-half line below the typing line. Type the subscript character and the paper moves ...
[67]
[PDF] Software Developer's Manual - ESC/P Command ... - Brother USA
ESC/P is one type of control codes used for printers. With the codes introduced in this document, various labels can be created and printed.
[68]
Remote terminal character stream processing in multics
The number of vertical half line feed characters is exactly the number needed to move the carriage from the lowest character of the preceding print position ...