Caret notation

Caret notation is a convention in computing for representing the non-printable control characters (ASCII codes 0–31 and 127) of the ASCII character set by prefixing a caret symbol (^) to a corresponding printable character, mimicking the effect of pressing the Control key with that character. For instance, ^A denotes the Start of Heading character (ASCII 1), ^G the Bell character (ASCII 7), ^M the Carriage Return (ASCII 13), and ^Z the Substitute character (ASCII 26), while special mappings include ^@ for Null (0), ^[ for Escape (27), and ^? for Delete (127). This two-character sequence provides a compact, human-readable way to visualize and input these otherwise invisible characters in text-based interfaces. Widely adopted since the development of early text editors and terminal systems, caret notation originated as a practical shorthand tied to keyboard control mechanisms and has become standard in Unix-like operating systems, programming documentation, and tools for file inspection. In applications like the Emacs editor, it supports efficient command invocation, such as ^X ^F to open a file, while in GNU Screen, the terminal multiplexer, it documents key bindings like ^A d to detach a session or ^A c to create a new window. It also aids in debugging and data analysis by displaying control sequences in hex editors or logs without executing their effects, though alternatives like hexadecimal (\x01) or octal (\001) escapes are used in source code for precision. The notation's persistence stems from its alignment with hardware-level input (Control key combinations) and its role in maintaining compatibility across diverse computing environments.

Fundamentals

Definition and Purpose

Caret notation is a convention for representing non-printable ASCII control characters using the caret symbol (^) followed by an uppercase letter or specific symbol, corresponding to the 33 control codes in the ASCII standard: values 0 through 31 and 127. This notation provides a compact, mnemonic way to denote these characters in textual contexts: the character following the caret is the printable character whose code is 64 greater than the control code (e.g., ^A for code 1, since A is code 65), with ^? for code 127 as the one exception.

ASCII control characters are a subset of the character set defined in the American Standard Code for Information Interchange (ASCII), consisting of non-printable codes intended for device control, text formatting, or data transmission rather than visual display. Examples include the line feed (code 10), which advances the cursor to the next line, and the horizontal tab (code 9), which moves the cursor to the next tab stop. These characters are "invisible" in output, as they do not produce visible glyphs but instead trigger specific hardware or software behaviors, such as carriage return (code 13) for returning the cursor to the line start.

The primary purpose of caret notation is to facilitate the human-readable depiction of these characters in environments where direct rendering is impossible or impractical, such as files, command-line interfaces, or programming documentation. By converting control codes into printable strings like ^G for the bell (code 7), it bridges the divide between low-level binary signals and accessible textual descriptions, aiding in debugging and communication. This approach is particularly valuable in software libraries, such as those implementing the unctrl() function on Unix-like systems, which automatically generate such representations for display purposes.

Among its benefits, caret notation improves clarity and usability in technical contexts by avoiding more cumbersome alternatives like hexadecimal or octal values, allowing developers and users to quickly recognize and reference control sequences in logs, messages, and manuals without specialized tools. For instance, in terminal output, ^D (code 4) can succinctly indicate an end-of-transmission signal, enhancing readability over raw byte values.
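
As a small illustration of library-generated caret strings, the sketch below uses the unctrl helper from Python's standard curses.ascii module. Choosing Python and this module is an assumption made for the example (the text above only mentions an unctrl() function in general terms), and the module may be unavailable on platforms that lack curses support.

```python
from curses.ascii import unctrl

# A few control codes: NUL, EOT, BEL, CR, ESC, DEL.
samples = [0, 4, 7, 13, 27, 127]

# unctrl() returns a printable string for each code;
# control characters come back in caret form.
labels = [unctrl(code) for code in samples]
print(labels)
```

Printable characters pass through unchanged, so the same call can be applied to every character in a string without checking its class first.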

Syntax and Mapping

Caret notation represents non-printable ASCII control characters (codes 0–31 and 127) using a caret symbol (^) followed immediately by an uppercase letter from A to Z or a specific symbol, providing a textual way to denote these otherwise invisible characters. For the standard alphabetic mappings, a caret followed by a letter denotes the control code equal to that letter's position in the alphabet, where A is position 1, B is 2, and so on up to Z as 26; thus, ^A corresponds to code 1 (Start of Heading, SOH), ^B to code 2 (Start of Text, STX), and ^Z to code 26 (Substitute, SUB). Equivalently, the control code is the letter's ASCII code minus 64.

Certain control codes beyond the A–Z range use special symbols after the caret: ^@ for code 0 (Null, NUL), ^[ for code 27 (Escape, ESC), ^\ for code 28 (File Separator, FS), ^] for code 29 (Group Separator, GS), ^^ for code 30 (Record Separator, RS), ^_ for code 31 (Unit Separator, US), and ^? for code 127 (Delete, DEL). The first six follow the same minus-64 rule (for example, [ is code 91, and 91 minus 64 is 27); ^? is the exception, since ? is code 63 and DEL is 63 plus 64. These mappings cover all 33 ASCII control characters, with no notation defined for printable characters in the range 32–126, as they are represented directly. Lowercase forms such as ^a are generally read as equivalent to ^A, though uppercase letters are conventionally used for consistency in documentation and displays.

The following table lists all caret notations with their corresponding ASCII decimal values and standard names:
Caret  Decimal  Name
^@     0        Null (NUL)
^A     1        Start of Heading (SOH)
^B     2        Start of Text (STX)
^C     3        End of Text (ETX)
^D     4        End of Transmission (EOT)
^E     5        Enquiry (ENQ)
^F     6        Acknowledgment (ACK)
^G     7        Bell (BEL)
^H     8        Backspace (BS)
^I     9        Horizontal Tab (HT)
^J     10       Line Feed (LF)
^K     11       Vertical Tab (VT)
^L     12       Form Feed (FF)
^M     13       Carriage Return (CR)
^N     14       Shift Out (SO)
^O     15       Shift In (SI)
^P     16       Data Link Escape (DLE)
^Q     17       Device Control 1 (DC1)
^R     18       Device Control 2 (DC2)
^S     19       Device Control 3 (DC3)
^T     20       Device Control 4 (DC4)
^U     21       Negative Acknowledgment (NAK)
^V     22       Synchronous Idle (SYN)
^W     23       End of Transmission Block (ETB)
^X     24       Cancel (CAN)
^Y     25       End of Medium (EM)
^Z     26       Substitute (SUB)
^[     27       Escape (ESC)
^\     28       File Separator (FS)
^]     29       Group Separator (GS)
^^     30       Record Separator (RS)
^_     31       Unit Separator (US)
^?     127      Delete (DEL)
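
The whole mapping can be produced by toggling a single bit, because each control code and its caret character differ only in the bit with value 64. The following is a minimal sketch of that idea, not a reference implementation; the function names to_caret and from_caret are hypothetical and chosen only for this example.

```python
def is_control(code: int) -> bool:
    """True for the 33 ASCII control codes: 0-31 and 127."""
    return 0 <= code <= 31 or code == 127


def to_caret(code: int) -> str:
    """Return the caret notation for an ASCII control code."""
    if not is_control(code):
        raise ValueError(f"{code} is not an ASCII control code")
    # XOR with 0x40 flips the bit worth 64: 1 -> 'A', 27 -> '[', 127 -> '?'.
    return "^" + chr(code ^ 0x40)


def from_caret(text: str) -> int:
    """Return the control code named by a two-character caret string."""
    if len(text) != 2 or text[0] != "^":
        raise ValueError(f"{text!r} is not caret notation")
    # Accept lowercase letters by folding them to uppercase first.
    code = ord(text[1].upper()) ^ 0x40
    if not is_control(code):
        raise ValueError(f"{text!r} does not name a control code")
    return code


for code in (0, 1, 13, 27, 127):
    print(code, to_caret(code))
```

Using XOR rather than subtraction lets one expression cover ^? as well, since 127 XOR 64 is 63, the code for the question mark.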

Historical Development

Origins in Early Computing

Caret notation originated as a method to visually represent non-printing control characters in early computing systems, initially employing an up arrow symbol (↑) rather than the modern caret (^). Developed by Digital Equipment Corporation (DEC) for the PDP-6 computer, this convention first appeared in 1964 alongside the system's introduction. The PDP-6, DEC's initial foray into larger-scale computing, began development in spring 1963 and saw its first shipments in summer 1964, with the notation documented in the PDP-6 Monitor brochure from 1965, where Control-C is depicted as ↑C on page 4.

This notation emerged within the ecosystem of teletype and terminal systems prevalent in mid-1960s computing, where devices like ASR teletypes served as primary interfaces for the PDP-6. Control codes, such as those for carriage return, line feed, and interrupt signals, were essential for managing text-based interactions but could not be directly printed, necessitating symbolic representations for documentation and operator training. The up arrow-letter pairing provided a compact, readable way to indicate these invisible signals on paper output or in manuals, addressing the limitations of mechanical teletypes that punched or printed without distinguishing controls visually.

The convention is attributed to the engineering team behind the PDP-6, including designers of the monitor and associated software, with early adoption in the MACRO-6 assembler and system documentation released in February 1965. Initially confined to the 32 basic control characters defined in the 1963 ASCII standard (decimal codes 0–31, octal 000–037), the notation leveraged the up arrow character (octal code 136, decimal 94, in the 1963 standard) for its symbolic resemblance to an elevated or "control" modifier over the accompanying letter. This choice facilitated clear communication in resource-constrained environments, though it predated broader standardization efforts.

Evolution and Standardization

The 1967 revision of the American Standard Code for Information Interchange (ASCII), formally published in 1968 as ANSI X3.4-1968, marked a pivotal transition in the representation of non-printable control characters by replacing the up-arrow symbol (↑) at code point 94 (0x5E) with the caret (^). This change was necessitated by the addition of lowercase letters to the character set, which required reorganizing positions to accommodate new graphic symbols above them, and by alignment with emerging international standards such as ECMA-6 (1965) and ISO/IEC 646, which designated code point 0x5E for the circumflex diacritic rather than an up-arrow. The up-arrow, featured in the initial 1963 ASCII, had been short-lived and became obsolete, as it was not widely available on keyboards and posed printing challenges on standard teletypes and terminals; the caret's adoption facilitated more reliable textual depiction of control sequences in documentation and output.

By the early 1970s, caret notation had been integrated into Unix system documentation and tools, reflecting its growing acceptance as a convention for denoting control characters in operating system interfaces and user manuals. Developed amid the evolution of Unix at Bell Labs from 1969 onward, this notation appeared in early implementations for echoing and interpreting non-printable inputs, such as in teletype emulations and command-line tools, providing a compact, printable alternative to octal or hexadecimal representations. Its persistence was bolstered by ASCII's foundational role in shaping subsequent standards, including ISO 646 (1973), which retained the caret at 0x5E, and early Unicode development, where ASCII controls were directly incorporated without altering the notation tradition.

A key milestone in formalization came through its inclusion in Request for Comments (RFC) documents for network protocols during the 1970s and beyond, such as RFC 854 (1983) on the Telnet protocol, which described control signal transmissions like the Interrupt Process over remote terminals. This adoption extended to POSIX standards, where it became the prescribed method for representing controls in specifications like IEEE Std 1003.1 (1988 onward), particularly in terminal handling utilities such as stty, ensuring portability across systems. Overall, these developments solidified caret notation as the enduring standard, addressing legacy printing limitations.

Applications in Computing

Keyboard Input Mechanisms

The primary mechanism for inputting control characters on standard keyboards involves holding down the Control (Ctrl) key while pressing a corresponding letter or symbol key, which generates the associated ASCII control code by clearing the sixth and seventh bits (values 32 and 64 in decimal) of the character's binary representation. For instance, pressing Ctrl+A produces ASCII code 1 (Start of Heading, SOH), as the uppercase 'A' (ASCII 65, binary 1000001) has those bits cleared to yield 0000001 in binary. This method aligns with caret notation mappings, such as ^A representing the input from Ctrl+A.

This input approach originated from teletypewriter hardware, where the Ctrl key modified electrical signals to transmit non-printing control codes over telegraph lines, evolving from earlier mechanisms that used shift-like modifiers for upper-case and special functions. By the 1960s, with the adoption of the ASCII standard (ANSI X3.4-1968), this Ctrl-based input became standardized on keyboard layouts for computer terminals, enabling reliable generation of the 32 C0 control characters (ASCII 0–31) and DEL (ASCII 127).

Special cases include Ctrl+@, which inputs the Null (NUL, ASCII 0) character, as '@' (ASCII 64) clears to all zeros when modified by Ctrl. Ctrl+? generates the Delete (DEL, ASCII 127) character on many systems as a conventional mapping; this case does not follow the bit-clearing rule ('?' is ASCII 63, and clearing its upper bits would give 31) but reflects the caret-notation pairing of ? with DEL, whose codes differ only in the bit with value 64. C1 control codes (128–159, per ISO 6429/ECMA-48) are typically input using escape sequences (e.g., ESC followed by a second character) rather than direct key combinations, though some terminals may support extended mappings that vary by terminal.

In modern virtual keyboards and emulators, these Ctrl mappings remain consistent with historical ASCII conventions, preserving compatibility across software environments. However, layouts such as AZERTY place letters in different physical positions than QWERTY (e.g., 'A' occupies the position of 'Q'), requiring users to press Ctrl with the layout-specific key that produces the intended character code, which can lead to inconsistencies in cross-layout sessions.
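
The bit-clearing rule described in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name ctrl is hypothetical, and the special case for '?' encodes the conventional Ctrl+? mapping to DEL rather than anything a keyboard driver is guaranteed to do.

```python
def ctrl(key: str) -> int:
    """Code a terminal conventionally sends for Ctrl+key (illustrative)."""
    if key == "?":
        # Conventional special case: Ctrl+? is treated as DEL (127).
        return 0x7F
    # Clearing the bits worth 32 and 64 leaves only the low five bits.
    return ord(key) & 0x1F


for key in ("A", "a", "@", "[", "M", "?"):
    print(f"Ctrl+{key} -> {ctrl(key)}")
```

Because bit 32 is cleared as well, uppercase and lowercase letters give the same result, which is why Ctrl+a and Ctrl+A are not distinguished in this scheme.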

Display and Interpretation in Software

In Unix-like operating systems, software tools commonly employ caret notation to render non-printable control characters for enhanced readability in text streams and displays. This substitution occurs when programs detect ASCII control codes (values 0–31 and 127) in input data, replacing them with a caret symbol (^) followed by the corresponding uppercase letter, such as ^J for line feed (ASCII 10) or ^M for carriage return (ASCII 13). The notation makes invisible characters explicit without altering the underlying data, aiding users in identifying and troubleshooting issues in raw text or binary files.

Several standard Unix utilities integrate this display mechanism natively. The cat command's -A (--show-all) option is equivalent to -vET: -v uses caret notation for non-printing characters except line feed and tab, displaying them as ^X; -T marks tabs as ^I; and -E appends $ to line ends. Similarly, the less pager shows control characters in caret notation by default, such as ^M for carriage return, unless overridden by options like -r for raw display, which can lead to display disruptions. The more pager follows historical practice by rendering controls below ASCII 127 as ^ followed by the character offset from '@' by the control code's value (e.g., ^G for bell, ASCII 7) and ^? for delete (ASCII 127). In the vi editor (and its modern variant Vim), control characters appear as ^X during editing, preserving the original byte while allowing visual inspection.

Software interpretation of caret notation involves scanning input streams to identify control codes and applying the substitution reversibly in interactive environments. Detection typically relies on ASCII value checks, with the caret representation serving as a mnemonic overlay that can be toggled or undone; for instance, in Vim, users can insert or replace the actual control character (e.g., via Ctrl-V followed by the key) while the display shows ^X, ensuring the file stores the binary value unchanged. This bidirectional handling keeps the stored data intact and supports editing tasks where precise control-character placement is needed.

In practical applications, caret notation facilitates debugging by visualizing hidden controls in file contents and logs. For example, when examining files or text with embedded controls, tools like cat -A reveal sequences that might otherwise cause formatting errors, such as stray ^Z (SUB) characters in scripts. Hex editors, such as bvi, often pair hexadecimal views with an ASCII pane that flags controls via caret or similar indicators to highlight non-printable bytes during low-level inspection. Network analyzers and debugging utilities benefit similarly; captured packet payloads piped through cat -A or less expose control characters like ^D (end-of-transmission) in raw data streams, helping diagnose protocol anomalies without resorting to full hex dumps.

Modern software extends this functionality with configurable options, though implementations vary. Integrated development environments (IDEs) like Visual Studio Code offer a "Render Control Characters" toggle under View > Appearance, which visually represents non-printable controls as shaded blocks (░) rather than traditional caret notation, improving visibility in code but requiring extensions for exact ^X rendering if needed. Terminal emulators such as xterm display caret notation faithfully when applications like less or cat output it, with customization via escape sequences or resource files to adjust how controls are interpreted and shown. However, many graphical user interface (GUI) applications, including some text viewers and word processors, suppress control characters entirely or replace them with generic placeholders, potentially hiding issues unless verbose modes are enabled, which underscores ongoing gaps in universal adoption.
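
The substitution these tools perform can be approximated in a few lines. The sketch below is loosely modelled on the behaviour described for cat -v (line feed and tab left alone, other controls shown in caret form); the function name show_controls is hypothetical, and rendering bytes above 127 as hexadecimal escapes is a simplification chosen for this example, not what cat itself does.

```python
def show_controls(data: bytes) -> str:
    """Render bytes as text with control characters in caret notation."""
    out = []
    for b in data:
        if b in (0x09, 0x0A):
            # Leave tab and line feed untouched.
            out.append(chr(b))
        elif b < 0x20 or b == 0x7F:
            # Control character: caret plus the code with bit 64 toggled.
            out.append("^" + chr(b ^ 0x40))
        elif b < 0x80:
            # Ordinary printable ASCII.
            out.append(chr(b))
        else:
            # Simplification for this sketch: hex escape for non-ASCII bytes.
            out.append(f"\\x{b:02x}")
    return "".join(out)


sample = b"line one\r\nbell\x07 colour\x1b[0m\n"
print(show_controls(sample))
```

The input bytes are never modified; only the returned string differs, which matches the point above that the notation is a display overlay rather than a change to the data.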

Alternatives and Extensions

Alternative Notations

In certain computing environments, particularly those developed by Acorn Computers in the United Kingdom during the 1980s, a vertical bar notation served as an alternative to caret notation for representing control characters. This system, used in the BBC Microcomputer and later machines, prefixed a vertical bar (|) followed by a letter or symbol to denote the corresponding ASCII control code, mirroring the Ctrl-key combination. For instance, |M represented carriage return (ASCII 13), while |@ denoted the null character (NUL, ASCII 0).

Hexadecimal, octal, and decimal representations provided another common alternative, especially in programming languages where symbolic notation was impractical. These methods directly specified the ASCII code, such as 0x01 for start of heading (SOH) or \001 in octal escape sequences, without relying on alphabetic mappings like those in caret notation.

Historically, prior to the widespread adoption of caret notation in the late 1960s, an up-arrow notation was used in early ASCII-based systems to represent control characters. In the 1963 ASCII standard (X3.4-1963), the up-arrow symbol (↑) occupied the position later assigned to the caret (^) in 1967, and it was used in the same way the caret is used today. The shift to caret occurred due to international standardization needs for diacritical marks.

Mnemonic abbreviations offered a further variant in technical documentation and standards, using short names derived from the character's function rather than symbols. Examples include BS for backspace (ASCII 8) and HT for horizontal tab (ASCII 9), as standardized in documents like ISO 6429 and RFC 1345.

Compared to the universal caret notation, which aligns directly with keyboard Ctrl combinations across most systems, these alternatives were more context-specific: the vertical bar system was largely confined to Acorn's 6502- and ARM-based microcomputers, such as the BBC Micro and its ARM-based successors, while up-arrow and mnemonic forms appeared primarily in pre-1968 historical texts or specialized references without broad software support.
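
To compare these notations side by side, the following sketch prints several spellings of the same control code. The function name notations and the dictionary keys are hypothetical; the vertical-bar form simply swaps the prefix character, which is an assumption based on the |M and |@ examples above rather than a full account of the Acorn syntax.

```python
# Standard abbreviations for codes 0-31, in order.
MNEMONICS = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
]


def notations(code: int) -> dict:
    """Return several textual spellings of one ASCII control code."""
    if not (0 <= code <= 31 or code == 127):
        raise ValueError(f"{code} is not an ASCII control code")
    symbol = chr(code ^ 0x40)  # printable partner character
    return {
        "caret": "^" + symbol,
        "vertical_bar": "|" + symbol,  # assumed to mirror the caret form
        "hex": f"0x{code:02X}",
        "octal": f"\\{code:03o}",
        "mnemonic": "DEL" if code == 127 else MNEMONICS[code],
    }


for code in (0, 1, 13, 127):
    print(code, notations(code))
```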

Handling Extended Control Codes

Caret notation is inherently limited to the C0 control codes (decimal 0–31) and DEL (127) within the 7-bit ASCII framework, providing no standardized symbols for the C1 control codes (decimal 128–159) defined in international standards such as ISO 6429/ECMA-48. For instance, the Single Shift Two (SS2) function, which temporarily invokes the G2 character set and is encoded as 0x8E, lacks a conventional caret equivalent, as the notation was developed for basic ASCII environments without 8-bit extensions. This confinement arises from the historical focus on 7-bit systems, where C1 codes were either unavailable or handled separately through escape sequences.

Common workarounds for denoting C1 codes in documentation and code involve hexadecimal representations, such as 0x8E or \x8E in string literals across languages like C and Python. In Unicode contexts, C1 codes are precisely identified by their code points, for example U+0085 for Next Line (NEL), enabling unambiguous reference in multilingual text processing without relying on terminal-specific mnemonics. These methods ensure compatibility in environments where direct byte insertion might lead to encoding errors.

In contemporary UTF-8 terminals, C1 codes are rarely employed as single bytes, because bytes in the range 0x80–0x9F also serve as continuation bytes in multi-byte UTF-8 sequences; instead, equivalent behaviors are implemented via ANSI escape sequences, which leverage caret notation only for the leading Escape character (^[, or 0x1B) followed by the rest of the sequence, as in ^[[ for Control Sequence Introducer (CSI). This approach, rooted in VT-series terminal standards, prioritizes sequence-based controls over isolated C1 bytes for formatting and cursor operations.

Web protocols and data interchange formats further diminish the need for caret-like notations by providing Unicode escapes for control characters. For example, JSON requires the characters U+0000 through U+001F to be escaped inside strings and allows C1 characters to be written the same way, so NEL can appear as \u0085, promoting safe interchange across diverse systems without interpretation ambiguities.
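
The relationship between an 8-bit C1 code and its 7-bit escape-sequence form is simple arithmetic: ESC followed by the byte 64 lower. The sketch below shows that conversion together with a JSON escape for NEL; the function name c1_to_seven_bit is hypothetical, and the JSON line relies on the default ASCII-only output of Python's json.dumps.

```python
import json

ESC = 0x1B  # the Escape character, ^[ in caret notation


def c1_to_seven_bit(code: int) -> bytes:
    """Return the two-byte ESC form equivalent to a C1 control code."""
    if not 0x80 <= code <= 0x9F:
        raise ValueError(f"0x{code:02X} is not a C1 control code")
    # The 7-bit form is ESC followed by a byte in the range 0x40-0x5F.
    return bytes([ESC, code - 0x40])


ss2 = c1_to_seven_bit(0x8E)  # Single Shift Two
csi = c1_to_seven_bit(0x9B)  # Control Sequence Introducer
print(ss2, csi)

# Caret spelling of the CSI escape form: ^[ followed by '['.
print("^" + chr(csi[0] ^ 0x40) + chr(csi[1]))

# NEL (U+0085) written as a JSON string with ASCII-only output.
nel_json = json.dumps("\u0085")
print(nel_json)
```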
