Caret notation
Caret notation is a convention in computing for representing the non-printable control characters (ASCII codes 0–31 and 127) of the ASCII character set by prefixing a caret symbol (^) to a corresponding printable character, mimicking the effect of pressing the Control key with that character. For instance, ^A denotes the Start of Heading character (ASCII 1), ^G the Bell character (ASCII 7), ^M the Carriage Return (ASCII 13), and ^Z the Substitute character (ASCII 26), while special mappings include ^@ for Null (0), ^[ for Escape (27), and ^? for Delete (127).[1] This two-character sequence provides a compact, human-readable way to visualize and input these otherwise invisible characters in text-based interfaces.[1]

Widely adopted since the development of early text editors and terminal systems, caret notation originated as a practical shorthand tied to keyboard control mechanisms and has become standard in Unix-like operating systems, programming documentation, and tools for file inspection.[1] In applications like the Emacs editor, it supports efficient command invocation, such as ^X ^F to open a file, while in GNU Screen, the terminal multiplexer, it documents key bindings like ^A d to detach a session or ^A c to create a new window.[1][2]

It also aids in debugging and data analysis by displaying control sequences in hex editors or logs without executing their effects, though alternatives like hexadecimal (\x01) or octal (\001) escapes are used in source code for precision.[1] The notation's persistence stems from its alignment with hardware-level input (Control key combinations) and its role in maintaining compatibility across diverse computing environments.[2]

Fundamentals
Definition and Purpose
Caret notation is a convention for representing non-printable ASCII control characters using the caret symbol (^) followed by an uppercase letter or specific symbol, corresponding to the 33 control codes in the ASCII standard: values 0 through 31 and 127.[3][4] This notation provides a compact, mnemonic way to denote these characters in textual contexts, where the character following the caret is typically the one whose ASCII code is 64 greater than the control code (e.g., ^A for code 1, since A is code 65).[3]

ASCII control characters are a subset of the character set defined in the American Standard Code for Information Interchange (ASCII), consisting of non-printable codes intended for device control, text formatting, or data transmission rather than visual display.[4] Examples include the line feed (code 10), which advances the cursor to the next line, and the horizontal tab (code 9), which moves the cursor to the next tab stop.[4] These characters are "invisible" in output: they produce no visible glyph but instead trigger specific hardware or software behaviors, such as carriage return (code 13) returning the cursor to the start of the line.[4]

The primary purpose of caret notation is to depict these control characters readably in environments where direct rendering is impossible or impractical, such as plain text files, command-line interfaces, or programming documentation.[3] By converting control codes into printable strings like ^G for the bell character (code 7), it bridges the gap between low-level binary signals and accessible textual descriptions, aiding troubleshooting and communication.[3] This approach is particularly valuable in software libraries, such as those implementing the curses unctrl() function, which generates such representations for display purposes.[3]

Caret notation also improves clarity in technical contexts by avoiding more cumbersome alternatives like decimal or hexadecimal values, allowing developers and users to quickly recognize and reference control sequences in logs, error messages, and manuals without specialized tools.[5] For instance, in debugging terminal output, ^D (code 4) can succinctly indicate an end-of-file signal, which is easier to read than a raw byte value.[3]

Syntax and Mapping
Caret notation represents non-printable ASCII control characters (codes 0–31 and 127) using a caret symbol (^) followed immediately by an uppercase letter from A to Z or a specific symbol, providing a textual way to denote these otherwise invisible characters.[6] For the standard alphabetic mappings, the control code equals the position of the letter in the alphabet, where A is position 1, B is 2, and so on up to Z as 26; thus, ^A corresponds to code 1 (Start of Heading, SOH), ^B to code 2 (Start of Text, STX), and ^Z to code 26 (Substitute, SUB).[7]

Certain control codes beyond the A–Z range use special symbols after the caret: ^@ for code 0 (Null, NUL), ^[ for code 27 (Escape, ESC), ^\ for code 28 (File Separator, FS), ^] for code 29 (Group Separator, GS), ^^ for code 30 (Record Separator, RS), ^_ for code 31 (Unit Separator, US), and ^? for code 127 (Delete, DEL).[6] These mappings cover all 33 ASCII control characters, with no notation defined for printable characters in the range 32–126, as they are represented directly.[7] The notation is case-insensitive, meaning ^a is equivalent to ^A, though uppercase letters are conventionally used for consistency in documentation and displays.[6]

The following table lists all caret notations with their corresponding ASCII decimal values and standard names:

| Caret | Decimal | Name |
|---|---|---|
| ^@ | 0 | Null (NUL) |
| ^A | 1 | Start of Heading (SOH) |
| ^B | 2 | Start of Text (STX) |
| ^C | 3 | End of Text (ETX) |
| ^D | 4 | End of Transmission (EOT) |
| ^E | 5 | Enquiry (ENQ) |
| ^F | 6 | Acknowledgment (ACK) |
| ^G | 7 | Bell (BEL) |
| ^H | 8 | Backspace (BS) |
| ^I | 9 | Horizontal Tab (HT) |
| ^J | 10 | Line Feed (LF) |
| ^K | 11 | Vertical Tab (VT) |
| ^L | 12 | Form Feed (FF) |
| ^M | 13 | Carriage Return (CR) |
| ^N | 14 | Shift Out (SO) |
| ^O | 15 | Shift In (SI) |
| ^P | 16 | Data Link Escape (DLE) |
| ^Q | 17 | Device Control 1 (DC1) |
| ^R | 18 | Device Control 2 (DC2) |
| ^S | 19 | Device Control 3 (DC3) |
| ^T | 20 | Device Control 4 (DC4) |
| ^U | 21 | Negative Acknowledgment (NAK) |
| ^V | 22 | Synchronous Idle (SYN) |
| ^W | 23 | End of Transmission Block (ETB) |
| ^X | 24 | Cancel (CAN) |
| ^Y | 25 | End of Medium (EM) |
| ^Z | 26 | Substitute (SUB) |
| ^[ | 27 | Escape (ESC) |
| ^\ | 28 | File Separator (FS) |
| ^] | 29 | Group Separator (GS) |
| ^^ | 30 | Record Separator (RS) |
| ^_ | 31 | Unit Separator (US) |
| ^? | 127 | Delete (DEL) |
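
The regularity of the table can be expressed arithmetically: in every row, the character after the caret has the ASCII code of the control character with the bit of value 64 toggled (0 becomes 64, "@"; 27 becomes 91, "["; 127 becomes 63, "?"). The following Python sketch illustrates this relationship under that assumption. The function names `to_caret`, `from_caret`, and `visualize` are hypothetical, chosen for this example, and are not taken from any library.

```python
def is_control(code: int) -> bool:
    """True for the 33 ASCII control codes: 0-31 and 127."""
    return 0 <= code <= 31 or code == 127


def to_caret(code: int) -> str:
    """Return the caret notation for an ASCII control code."""
    if not is_control(code):
        raise ValueError(f"not an ASCII control code: {code}")
    # Toggling the bit of value 64 maps 0-31 onto '@'..'_' and 127 onto '?'.
    return "^" + chr(code ^ 0x40)


def from_caret(notation: str) -> int:
    """Return the control code named by a two-character caret sequence."""
    if len(notation) != 2 or notation[0] != "^":
        raise ValueError(f"not caret notation: {notation!r}")
    ch = notation[1]
    # Treat ^a as equivalent to ^A, as described in the text above.
    if "a" <= ch <= "z":
        ch = ch.upper()
    code = ord(ch) ^ 0x40
    if not is_control(code):
        raise ValueError(f"no control character for {notation!r}")
    return code


def visualize(text: str) -> str:
    """Replace each ASCII control character in text with its caret form."""
    return "".join(to_caret(ord(ch)) if is_control(ord(ch)) else ch for ch in text)


if __name__ == "__main__":
    print(to_caret(7))                  # caret form of the bell character
    print(from_caret("^["))             # code of the escape character
    print(visualize("a\tb\x07\n"))      # tab, bell, and line feed made visible
```

A rendering produced this way is intended for display rather than round-tripping: once control characters are replaced, a literal caret followed by a letter in the original text can no longer be told apart from a converted control character.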