ASCII
ASCII (American Standard Code for Information Interchange) is a 7-bit character encoding standard that represents 128 characters, including uppercase and lowercase letters, digits, punctuation marks, and control codes, using numeric values from 0 to 127 to facilitate the interchange of text-based information among computers and communication systems.[1] Developed in the early 1960s to address the incompatibility of various proprietary codes used in early computing and telegraphy, ASCII was first published as ASA X3.4-1963 by the American Standards Association (later ANSI) following efforts initiated by its X3.2 subcommittee in 1960.[2] The standard evolved from telegraphic codes, particularly a seven-bit teleprinter code promoted by Bell data services, and was designed to support alphabetization, device compatibility, and efficient data transmission across diverse equipment.[2]

The original ASCII specification includes 94 printable graphic characters—such as the English alphabet (A–Z, a–z), numerals (0–9), and common symbols—and 33 control characters for functions like transmission start/end (e.g., SOH, ETX), formatting (e.g., LF for line feed, CR for carriage return), and device control, with the space character treated as an additional graphic.[1] Major revisions occurred in 1967 to refine character assignments and in 1968 (ANSI X3.4-1968) to align with international standards like ISO 646, followed by updates in 1977 and 1986 that clarified definitions, eliminated ambiguities, and incorporated optional features like the "New Line" function combining LF and CR.[1][2]

Adopted widely in the 1970s for personal computers, programming languages, and network protocols—such as being formalized in IETF RFC 20 in 1969—ASCII became the de facto encoding for English text on the early internet and remains foundational despite its limitation to basic Latin characters.[3] Although extended 8-bit versions (often called extended ASCII) emerged in the 1980s to add 128 more characters for symbols and non-English languages, these were not standardized and varied by system, leading to the rise of Unicode in the 1990s as a superset that maintains full backward compatibility with ASCII while supporting global scripts.[3] Today, ASCII underpins much of digital communication, file formats, and protocols like FTP's ASCII mode, though UTF-8 has largely supplanted it for web content since overtaking it in usage around 2008.[2][3]

History and Development
Origins in Telegraphy
The origins of ASCII trace back to 19th-century advancements in telegraphy, where the need for efficient, automated transmission of text over long distances drove the development of standardized character encodings. Samuel Morse's 1830s code, relying on variable-length sequences of dots and dashes, was effective for manual operation but posed challenges for mechanical automation due to its irregular timing and difficulty in synchronizing multiple signals. This limitation hindered multiplexing, the simultaneous transmission of several messages over a single wire, and spurred innovations in fixed-width coding to enable mechanical switching and error detection.[4]

A pivotal breakthrough came in 1874 when French engineer Émile Baudot patented a printing telegraph system that encoded characters using uniform five-unit binary sequences of on-off electrical impulses, each of equal duration. This 5-bit Baudot code represented 32 distinct symbols, including letters, numbers, punctuation, and basic controls, marking the first widely adopted fixed-width binary character set for telegraphy. Baudot's design facilitated mechanical distributors with concentric rings and brushes, allowing up to six operators to share one circuit through time-division multiplexing, dramatically improving efficiency over Morse systems. By 1892, over 100 such units were in operation in France, laying the groundwork for automated data transmission.[5][6][7]

Baudot's code evolved through international standardization efforts by the International Telecommunication Union (ITU) and its predecessor, the International Telegraph Union. In 1901, a refined version was adopted as International Telegraph Alphabet No. 1 (ITA1), incorporating shift mechanisms for letters and figures while reserving positions for national variations; this 5-bit encoding standardized global telegraphic communication and emphasized compatibility with mechanical printers.
Further advancements led to ITA2 in 1929, ratified by the International Consultative Committee for Telegraph and Telephone (CCITT), which optimized the code for efficiency by reassigning symbols based on frequency of use while retaining the letters/figures shift mechanism that let the 32 five-bit combinations cover both the alphabetic and the numeric/punctuation sets. ITA2's structure, with its fixed 5-bit format for 32 characters plus controls, became the dominant teleprinter code worldwide before the mid-20th century.[5]

Significant refinements to Baudot's system were made by New Zealand-born inventor Donald Murray, who in 1901 introduced a typewriter-like keyboard that punched five-bit codes onto paper tape for asynchronous transmission, reducing mechanical wear by assigning frequent letters to codes with fewer holes. Murray's variant, known as the Murray code, enhanced code efficiency through frequency-based optimization and automated features like carriage returns, influencing printing telegraph designs. By 1912, after selling patents to Western Union, Murray's innovations powered multiplex systems capable of handling multiple streams, further advancing telegraphy toward computational applications.[7][5]

The Murray code, as a precursor to ITA2, profoundly impacted early computing through its adoption in teletypewriters, such as the Teletype Model 15 introduced in the 1930s, which used 5-bit encodings for input and output in electromechanical systems. These devices enabled punched-tape storage and retrieval of coded messages, bridging telegraphy and data processing by providing reliable mechanical interfaces for emerging electronic computers in the 1940s and 1950s. This transition from variable Morse signals to fixed 5-bit codes not only streamlined error detection via parity-like checks but also established principles of binary encoding that informed later standards, including those in the 1960s.[8][5]

Standardization Efforts
In the early 1960s, the American Standards Association (ASA), predecessor to the American National Standards Institute (ANSI), formed the X3 committee—now known as INCITS—to develop a unified standard for information interchange amid growing incompatibility between proprietary character codes used by early computers. The X3.2 subcommittee, tasked specifically with character sets, held its first meeting on October 6, 1960, marking the formal start of efforts to create a common encoding scheme suitable for data processing and telecommunications. This initiative was driven by the need to replace fragmented systems, with key contributions from industry leaders and government entities seeking interoperability across diverse hardware.[2][9]

The culmination of these efforts was the release of ASA X3.4-1963 on June 17, 1963, which defined the initial American Standard Code for Information Interchange (ASCII) as a 7-bit code with 128 code positions tailored primarily for US English, including uppercase letters, digits, and basic punctuation. This standard emerged from collaborative input by the US Department of Defense (DoD), which advocated for a code compatible with its FIELDATA system to facilitate military data exchange, and major manufacturers such as IBM and Univac, who pushed to supplant proprietary formats like IBM's Binary Coded Decimal (BCD) codes for broader industry adoption. The DoD's emphasis on a minimal 42-character subset for essential operations, combined with manufacturers' proposals for Hollerith punched-card compatibility, ensured the standard prioritized practical interchange over specialized features.[10][9]

During the standardization process, significant debates arose over code allocation, particularly the inclusion of lowercase letters, which were omitted in early proposals to conserve positions for controls and symbols in a 6-bit precursor scheme influenced by telegraphy codes.
Proponents, including IBM engineers, argued for their addition to support text processing needs like distinguishing "CO" from "co"; the 1963 standard left columns 6 and 7 unassigned, and lowercase letters were eventually incorporated there in the 1967 revision, balancing duocase requirements with the 94 printable graphics. This resolution reflected compromises among stakeholders to accommodate both monocase applications and emerging demands for fuller alphabetic representation.[9]

ASCII's adoption extended internationally shortly after, with the European Computer Manufacturers Association (ECMA) ratifying ECMA-6 in 1965 as a near-identical 7-bit standard focused on the basic Latin alphabet and numerals to promote cross-border compatibility. In 1967, the International Organization for Standardization (ISO) formalized this through ISO/R 646, accepting ASCII with minor modifications for global information processing interchange while retaining the core structure for uppercase letters, digits, and essential symbols. These efforts established ASCII as a foundational international benchmark, emphasizing universality in early digital communications.[11][12]

Key Revisions and Updates
Following its initial standardization in 1963, the ASCII code underwent a significant revision in 1967 with the publication of USAS X3.4-1967, which introduced minor adjustments to control characters for improved compatibility across systems, including cleaned-up message format controls and relocated positions for ACK (Acknowledge) and ESC (Escape) to align with emerging international needs.[13] This revision also permitted optional national variants, such as stylizing the exclamation mark (!) as a logical OR symbol (|) or replacing the number sign (#) with the British pound sign (£), to accommodate regional differences while maintaining core compatibility.[14]

The ECMA-6 standard's second edition in 1967 further propelled international adoption by specifying a 7-bit coded character set closely aligned with the revised USAS ASCII, serving as a foundational reference for global data interchange and allowing options for national or application-specific adaptations without altering the fundamental structure.[15] This effort culminated in the ISO 646:1983 edition, which specified the International Reference Version (IRV) under ISO/IEC 646, replacing the dollar sign ($) with the universal currency symbol (¤) at code point 0x24 and permitting variant substitutions for characters like the tilde (~) at 0x7E to support non-English languages, while preserving the 7-bit framework for interoperability.
The 1991 edition updated the IRV to match US-ASCII, including the dollar sign ($).[16][17][14] Subsequent updates, including the 1977 and 1986 revisions, clarified and refined the definitions and recommended uses of control characters, such as deprecating certain legacy functions (e.g., LF alone for newline in favor of CR LF) and specifying roles for pairs like Enquiry (ENQ) and Acknowledge (ACK) as standard inquiry/response mechanisms to facilitate reliable device communication, in order to eliminate redundancies and focus on modern transmission needs.[18] The 1986 ANSI X3.4-1986 revision marked the final major U.S. update, reaffirming the 7-bit structure with 128 code points (33 controls and 95 graphics, including space) and aligning terminology with ISO 646:1983 for global consistency, adding conformance guidelines without introducing structural alterations.[19]

These revisions had lasting impacts on legacy systems, particularly in resolving ambiguities like the handling of Delete (DEL, 0x7F) versus Backspace (BS, 0x08): early implementations often conflated the keys, with DEL intended for obliterating errors on perforated media and BS for non-destructive cursor movement, but later clarifications in ANSI X3.4-1986 specified DEL's role in media-fill erasure and BS as a leftward shift, reducing interoperability issues in teletype and early computer environments.[18][19]

Design Principles
Bit Width and Encoding Scheme
The American Standard Code for Information Interchange (ASCII) utilizes a 7-bit encoding scheme to represent 128 distinct characters, providing an optimal balance between the needs of information processing systems and efficient data transmission.[20] This choice of 7 bits yields 2^7 = 128 possible combinations, sufficient to accommodate 95 printable characters—such as uppercase and lowercase English letters, digits, and common punctuation—along with 33 control characters for managing device operations and formatting.[20] Each character is mapped to a unique 7-bit binary value, ranging from 000 0000 (null, NUL) to 111 1111 (delete, DEL), where the bits are typically numbered from b6 (most significant) to b0 (least significant) in 7-bit contexts, with an optional b7 parity bit in 8-bit transmissions.[20]
In transmission over 8-bit channels, ASCII's 7-bit codes are commonly padded with an eighth parity bit to enable basic error detection, using schemes like even parity (ensuring an even number of 1s across the byte) or odd parity (ensuring an odd number).[21] This parity bit, while facilitating reliable communication in noisy environments such as early teleprinter networks, is not defined within the core ASCII specification and remains optional.[22]
The 7-bit structure marked a significant improvement over prior 6-bit codes, such as BCDIC (Binary Coded Decimal Interchange Code), which supported only 64 characters, too few for the full English alphabet including lowercase letters unless shift or escape workarounds were used, complicating interoperability in computing and communications.[23] By contrast, ASCII's expanded capacity streamlined representation without such workarounds, promoting standardization across diverse systems.[14]
Despite these benefits, ASCII's restriction to 128 characters, focused primarily on Latin-script English, inherently limits support for non-Latin scripts, diacritics, and international symbols, prompting the development of extensions like ISO/IEC 8859 and later Unicode for broader multilingual compatibility.
Internal Organization of Codes
The ASCII code is structured as a 7-bit encoding, where the bits are numbered from b6 (most significant) to b0 (least significant), though in 8-bit implementations b7 is often the parity bit.[19] Within this 7-bit frame, the high-order three bits (b6, b5, b4) serve as "zone" bits, providing categorical grouping for character classes, while the low-order four bits (b3, b2, b1, b0) function as "digit" bits, specifying individual symbols within those groups.[19] This division facilitates efficient processing in hardware, such as serial transmission or tabular storage, by separating structural and symbolic elements.

Control characters occupy the lowest range, from binary 000 0000 to 001 1111 (decimal 0 to 31), where the zone bits are 000 or 001, leaving the digit bits to vary across all combinations for formatting and device control functions.[19] Digits 0 through 9 are assigned zone bits 011 (binary 011 xxxx), positioning them in code positions 48 to 57 for numerical consistency in computations.[19] Uppercase letters A through Z span codes 65 to 90 (zone bits 100 and 101), while lowercase letters a through z span codes 97 to 122 (zone bits 110 and 111), so that corresponding upper- and lowercase letters differ only in bit b5 (value 0x20).[19]

This organization draws significant influence from Hollerith encoding used in IBM tabulating machines, where zone punches (rows 12, 11, 0) and digit punches (rows 1–9) mirrored the bit groupings to ensure backward compatibility with existing punched card systems.[24] For instance, uppercase letters map to zone punch 12 with digit punches 1–9 (A–I), zone punch 11 with digits 1–9 (J–R), and zone punch 0 with digits 2–9 (S–Z), preserving data interchange with legacy equipment.[24] The design incorporates considerations for punched card and tape media, thereby enhancing reliability in mechanical reading.[19] The delete character (binary 111 1111, code 127) was specifically included to obliterate errors on punched tape by filling all positions.[19]

Character Ordering and Collation
The ASCII character set is organized sequentially to facilitate efficient processing and collation, with control characters assigned to codes 0 through 31 and 127, followed by printable characters beginning with the space character at code 32, digits from 48 to 57, uppercase letters from 65 to 90, and lowercase letters from 97 to 122.[9] This structure ensures a logical progression that aligns with common data processing needs, placing non-printable controls at the lowest values to separate them distinctly from visible symbols.[9]

The collation order in ASCII was designed to mimic the sequence of characters on typewriter keyboards and to follow an alphabetical progression, enabling straightforward sorting of text without requiring complex transformations.[9] Uppercase and lowercase letters occupy contiguous blocks of 26 codes each, promoting collatability where the bit patterns directly correspond to the desired sequence for alphabetic lists.[9] Digits form a compact group immediately following punctuation, reflecting their frequent use in mixed alphanumeric data for efficient numerical collation.[9]

Gaps in the assignment, such as the range from 33 to 47 dedicated to punctuation and symbols, were intentionally included to accommodate potential future insertions of additional characters without necessitating a complete renumbering of the set.[9] Initially, entire columns (such as 6 and 7 in the 7-bit matrix) were left undefined and were later allocated to lowercase letters in the 1967 revision, demonstrating forward-looking flexibility in the standard's design.[9] Control characters were placed at low code values primarily to enable simple bitwise masking in software implementations, allowing them to be detected or filtered with a single test of the high-order (zone) bits.[9] This positioning in the initial columns of the code matrix (0 and 1) also aids hardware separation from graphic characters during transmission and storage.[9] The bit organization supports this order by embedding binary-coded decimal patterns for digits and contiguous zones for letters, optimizing conversion between related codes.[9]

In contrast to EBCDIC, which features interleaved zones and non-contiguous alphabetic blocks (for example, gaps fall between I and J and between R and S), ASCII employs tightly grouped, sequential assignments for alphabetic characters to simplify collation and reduce transformation complexity during data interchange.[9] EBCDIC's structure, evolved from punched-card legacies, prioritizes backward compatibility over linear ordering, resulting in higher overhead for sorting compared to ASCII's streamlined approach.[9]

Core Character Set
Control Characters
The ASCII standard defines 33 control characters, which are non-printable codes primarily used to manage data transmission, text formatting, and device operations rather than representing visible symbols. These occupy code points 0 through 31 and 127 in the 7-bit encoding scheme, with the remaining codes 32 through 126 reserved for printable characters.[25] The control characters are categorized by function, as outlined in early standards for data processing and interchange. Transmission control characters, such as SOH (Start of Heading, code 1), STX (Start of Text, 2), ETX (End of Text, 3), and EOT (End of Transmission, 4), facilitate structured message handling in communication protocols by marking headers, text blocks, and endings.[26] Formatting effectors include BS (Backspace, 8), HT (Horizontal Tabulation, 9), LF (Line Feed, 10), VT (Vertical Tabulation, 11), FF (Form Feed, 12), and CR (Carriage Return, 13), which control cursor movement and page layout on output devices like printers and terminals.[26] Device control characters, exemplified by BEL (Bell, 7) for audible alerts and DC1–DC4 (Device Controls 1–4, 17–20) for managing peripherals like modems, enable hardware-specific commands.[26] Additional separators like FS (File Separator, 28), GS (Group Separator, 29), RS (Record Separator, 30), and US (Unit Separator, 31) support hierarchical data organization, while characters such as ENQ (Enquiry, 5), ACK (Acknowledge, 6), NAK (Negative Acknowledge, 21), SYN (Synchronous Idle, 22), ETB (End of Transmission Block, 23), CAN (Cancel, 24), EM (End of Medium, 25), and SUB (Substitute, 26) handle synchronization, error recovery, and medium transitions. SO (Shift Out, 14) and SI (Shift In, 15) allow temporary shifts to alternative character sets, and DLE (Data Link Escape, 16) prefixes qualified data. NUL (Null, 0) serves as a no-operation filler, and DEL (Delete, 127) originally acted as a tape-erasing marker. 
ESC (Escape, 27) initiates sequences for extended controls.[25][26]

| Code (Decimal) | Mnemonic | Primary Function |
|---|---|---|
| 0 | NUL | Null (no operation or filler) |
| 1 | SOH | Start of Heading |
| 2 | STX | Start of Text |
| 3 | ETX | End of Text |
| 4 | EOT | End of Transmission |
| 5 | ENQ | Enquiry |
| 6 | ACK | Acknowledge |
| 7 | BEL | Bell (audible signal) |
| 8 | BS | Backspace |
| 9 | HT | Horizontal Tabulation |
| 10 | LF | Line Feed |
| 11 | VT | Vertical Tabulation |
| 12 | FF | Form Feed |
| 13 | CR | Carriage Return |
| 14 | SO | Shift Out |
| 15 | SI | Shift In |
| 16 | DLE | Data Link Escape |
| 17 | DC1 | Device Control 1 |
| 18 | DC2 | Device Control 2 |
| 19 | DC3 | Device Control 3 |
| 20 | DC4 | Device Control 4 |
| 21 | NAK | Negative Acknowledge |
| 22 | SYN | Synchronous Idle |
| 23 | ETB | End of Transmission Block |
| 24 | CAN | Cancel |
| 25 | EM | End of Medium |
| 26 | SUB | Substitute |
| 27 | ESC | Escape |
| 28 | FS | File Separator |
| 29 | GS | Group Separator |
| 30 | RS | Record Separator |
| 31 | US | Unit Separator |
| 127 | DEL | Delete |
Printable Characters
The printable characters in ASCII consist of 95 glyphs that produce visible output, occupying code points from 32 to 126 in decimal, designed to support human-readable text representation in early computing and data transmission systems.[5] These characters follow the control characters in the code order and form the core visible repertoire for English-language text processing.[5]

The printable set is organized into distinct categories for clarity and utility. The space character (code 32) serves as a fundamental separator in text layout. Punctuation marks (codes 33–47 and 58–63), such as exclamation point (!), quotation marks ("), and period (.), provide structural elements for sentences and expressions. Digits (codes 48–57) represent the numerals 0 through 9, essential for numerical data. Uppercase letters (codes 65–90) cover A through Z, while lowercase letters (codes 97–122) include a through z, enabling case-sensitive distinctions. Additional symbols (codes 64, 91–96, and 123–126), including the at sign (@), brackets ([ ]), backslash (\), caret (^), underscore (_), and tilde (~), support mathematical, programmatic, and formatting needs.[5][28]

ASCII's printable characters were intentionally designed for compatibility with existing typewriter and teletypewriter keyboards, particularly the QWERTY layout prevalent in the United States, ensuring seamless integration with mechanical printing devices used in telegraphy and early computing.[5] This compatibility influenced the inclusion of specific symbols like the at sign (@, code 64) for addressing in communications and the grave accent (`, code 96) for potential accentuation or quotation purposes, reflecting typewriter key pairings and operational efficiencies.[28] The 7-bit encoding scheme of ASCII inherently limits the character set to 128 total codes, excluding diacritics and accented letters to prioritize basic Latin alphabet support for American English and compatibility across international telegraph standards, with any accent needs addressed via composite sequences like backspace combinations rather than dedicated codes.[5][28] Although positioned at code 127, the delete (DEL) character is classified as non-printable, functioning instead as a control for padding data streams or erasing errors on perforated tape by overwriting with all bits set to 1, thereby invalidating prior characters without producing visible output.[5][18]

The evolution of the printable set began with early proposals in the 1960s that omitted lowercase letters, relying on shift mechanisms from telegraph codes like Baudot and Murray for case variation; the 1963 standard (X3.4-1963) left the corresponding columns unassigned, and lowercase a–z was incorporated in the 1967 revision to provide full alphabetic support, a decision driven in part by requirements from the International Telegraph and Telephone Consultative Committee (CCITT) for comprehensive text handling.[5][28] This addition expanded the printable repertoire to its standard 95 characters while maintaining backward compatibility with uppercase-only systems.[5]

Code Representations
Control Code Table
The 33 control codes in the 7-bit ASCII standard consist of the C0 set (codes 0–31) and the delete character (code 127), designed primarily for transmission, formatting, and device management without producing visible output. These codes are grouped by functional category as outlined in the original ANSI X3.4-1968 specification, with mnemonics drawn from the associated ANSI X3.32 graphic representation standard. The table below provides decimal, hexadecimal, and binary values alongside each mnemonic and a brief functional summary.[18]

| Category | Decimal | Hex | Binary | Mnemonic | Function Summary |
|---|---|---|---|---|---|
| Transmission controls (0–6) | 0 | 00 | 000 0000 | NUL | Filler character with no information content, often used as string terminator. |
| | 1 | 01 | 000 0001 | SOH | Start of heading in a transmission block. |
| | 2 | 02 | 000 0010 | STX | Start of text following a heading. |
| | 3 | 03 | 000 0011 | ETX | End of text in a transmission block. |
| | 4 | 04 | 000 0100 | EOT | End of transmission, signaling completion. |
| | 5 | 05 | 000 0101 | ENQ | Enquiry to request a response from a remote device. |
| | 6 | 06 | 000 0110 | ACK | Positive acknowledgment to confirm receipt. |
| Media controls (7–13) | 7 | 07 | 000 0111 | BEL | Audible or visual alert to attract attention. |
| | 8 | 08 | 000 1000 | BS | Backspace to move cursor one position left. |
| | 9 | 09 | 000 1001 | HT | Horizontal tabulation to next stop position. |
| | 10 | 0A | 000 1010 | LF | Line feed to advance to the next line. |
| | 11 | 0B | 000 1011 | VT | Vertical tabulation to next stop position. |
| | 12 | 0C | 000 1100 | FF | Form feed to advance to next page or form. |
| | 13 | 0D | 000 1101 | CR | Carriage return to start of current line. |
| Shift controls (14–15) | 14 | 0E | 000 1110 | SO | Shift out to invoke an alternate character set. |
| | 15 | 0F | 000 1111 | SI | Shift in to return to the standard character set. |
| Device controls (16–27) | 16 | 10 | 001 0000 | DLE | Data link escape for supplementary controls. |
| | 17 | 11 | 001 0001 | DC1 | Device control 1 (e.g., resume transmission). |
| | 18 | 12 | 001 0010 | DC2 | Device control 2 for special functions. |
| | 19 | 13 | 001 0011 | DC3 | Device control 3 (e.g., pause transmission). |
| | 20 | 14 | 001 0100 | DC4 | Device control 4 for reverse effects. |
| | 21 | 15 | 001 0101 | NAK | Negative acknowledgment to indicate error. |
| | 22 | 16 | 001 0110 | SYN | Synchronous idle for timing in transmission. |
| | 23 | 17 | 001 0111 | ETB | End of transmission block before checksum. |
| | 24 | 18 | 001 1000 | CAN | Cancel previous characters due to error. |
| | 25 | 19 | 001 1001 | EM | End of medium signaling tape end. |
| | 26 | 1A | 001 1010 | SUB | Substitute for garbled or erroneous data. |
| | 27 | 1B | 001 1011 | ESC | Escape to initiate a control sequence. |
| Information separators (28–31) | 28 | 1C | 001 1100 | FS | File separator for hierarchical data division. |
| | 29 | 1D | 001 1101 | GS | Group separator within files. |
| | 30 | 1E | 001 1110 | RS | Record separator within groups. |
| | 31 | 1F | 001 1111 | US | Unit separator within records. |
| Delete | 127 | 7F | 111 1111 | DEL | Delete or ignore previous character. |
Printable Character Table
The 95 printable (graphic) characters in the ASCII 7-bit coded character set occupy codes 32 through 126, consisting of the space, letters, digits, and various punctuation and symbols that form visible representations on output devices. These characters exclude the control codes (0–31 and 127) and are defined with specific glyphs and names in the international standard. The table below presents them in decimal order, including hexadecimal equivalents (prefixed with 0x), 7-bit binary representations (MSB to LSB), representative glyphs, and categories for organizational purposes: whitespace (for spacing), punctuation (for sentence structure and delimiting), digits (numeric), uppercase letters, lowercase letters, and symbols (for special notations). Note that DEL (127) is a control character and thus excluded.

| Decimal | Hex | Binary | Glyph | Category |
|---|---|---|---|---|
| 32 | 0x20 | 0100000 | (space) | Whitespace |
| 33 | 0x21 | 0100001 | ! | Punctuation |
| 34 | 0x22 | 0100010 | " | Punctuation |
| 35 | 0x23 | 0100011 | # | Punctuation |
| 36 | 0x24 | 0100100 | $ | Punctuation |
| 37 | 0x25 | 0100101 | % | Punctuation |
| 38 | 0x26 | 0100110 | & | Punctuation |
| 39 | 0x27 | 0100111 | ' | Punctuation |
| 40 | 0x28 | 0101000 | ( | Punctuation |
| 41 | 0x29 | 0101001 | ) | Punctuation |
| 42 | 0x2A | 0101010 | * | Punctuation |
| 43 | 0x2B | 0101011 | + | Punctuation |
| 44 | 0x2C | 0101100 | , | Punctuation |
| 45 | 0x2D | 0101101 | - | Punctuation |
| 46 | 0x2E | 0101110 | . | Punctuation |
| 47 | 0x2F | 0101111 | / | Punctuation |
| 48 | 0x30 | 0110000 | 0 | Digit |
| 49 | 0x31 | 0110001 | 1 | Digit |
| 50 | 0x32 | 0110010 | 2 | Digit |
| 51 | 0x33 | 0110011 | 3 | Digit |
| 52 | 0x34 | 0110100 | 4 | Digit |
| 53 | 0x35 | 0110101 | 5 | Digit |
| 54 | 0x36 | 0110110 | 6 | Digit |
| 55 | 0x37 | 0110111 | 7 | Digit |
| 56 | 0x38 | 0111000 | 8 | Digit |
| 57 | 0x39 | 0111001 | 9 | Digit |
| 58 | 0x3A | 0111010 | : | Punctuation |
| 59 | 0x3B | 0111011 | ; | Punctuation |
| 60 | 0x3C | 0111100 | < | Punctuation |
| 61 | 0x3D | 0111101 | = | Punctuation |
| 62 | 0x3E | 0111110 | > | Punctuation |
| 63 | 0x3F | 0111111 | ? | Punctuation |
| 64 | 0x40 | 1000000 | @ | Symbol |
| 65 | 0x41 | 1000001 | A | Uppercase letter |
| 66 | 0x42 | 1000010 | B | Uppercase letter |
| 67 | 0x43 | 1000011 | C | Uppercase letter |
| 68 | 0x44 | 1000100 | D | Uppercase letter |
| 69 | 0x45 | 1000101 | E | Uppercase letter |
| 70 | 0x46 | 1000110 | F | Uppercase letter |
| 71 | 0x47 | 1000111 | G | Uppercase letter |
| 72 | 0x48 | 1001000 | H | Uppercase letter |
| 73 | 0x49 | 1001001 | I | Uppercase letter |
| 74 | 0x4A | 1001010 | J | Uppercase letter |
| 75 | 0x4B | 1001011 | K | Uppercase letter |
| 76 | 0x4C | 1001100 | L | Uppercase letter |
| 77 | 0x4D | 1001101 | M | Uppercase letter |
| 78 | 0x4E | 1001110 | N | Uppercase letter |
| 79 | 0x4F | 1001111 | O | Uppercase letter |
| 80 | 0x50 | 1010000 | P | Uppercase letter |
| 81 | 0x51 | 1010001 | Q | Uppercase letter |
| 82 | 0x52 | 1010010 | R | Uppercase letter |
| 83 | 0x53 | 1010011 | S | Uppercase letter |
| 84 | 0x54 | 1010100 | T | Uppercase letter |
| 85 | 0x55 | 1010101 | U | Uppercase letter |
| 86 | 0x56 | 1010110 | V | Uppercase letter |
| 87 | 0x57 | 1010111 | W | Uppercase letter |
| 88 | 0x58 | 1011000 | X | Uppercase letter |
| 89 | 0x59 | 1011001 | Y | Uppercase letter |
| 90 | 0x5A | 1011010 | Z | Uppercase letter |
| 91 | 0x5B | 1011011 | [ | Symbol |
| 92 | 0x5C | 1011100 | \ | Symbol |
| 93 | 0x5D | 1011101 | ] | Symbol |
| 94 | 0x5E | 1011110 | ^ | Symbol |
| 95 | 0x5F | 1011111 | _ | Symbol |
| 96 | 0x60 | 1100000 | ` | Symbol |
| 97 | 0x61 | 1100001 | a | Lowercase letter |
| 98 | 0x62 | 1100010 | b | Lowercase letter |
| 99 | 0x63 | 1100011 | c | Lowercase letter |
| 100 | 0x64 | 1100100 | d | Lowercase letter |
| 101 | 0x65 | 1100101 | e | Lowercase letter |
| 102 | 0x66 | 1100110 | f | Lowercase letter |
| 103 | 0x67 | 1100111 | g | Lowercase letter |
| 104 | 0x68 | 1101000 | h | Lowercase letter |
| 105 | 0x69 | 1101001 | i | Lowercase letter |
| 106 | 0x6A | 1101010 | j | Lowercase letter |
| 107 | 0x6B | 1101011 | k | Lowercase letter |
| 108 | 0x6C | 1101100 | l | Lowercase letter |
| 109 | 0x6D | 1101101 | m | Lowercase letter |
| 110 | 0x6E | 1101110 | n | Lowercase letter |
| 111 | 0x6F | 1101111 | o | Lowercase letter |
| 112 | 0x70 | 1110000 | p | Lowercase letter |
| 113 | 0x71 | 1110001 | q | Lowercase letter |
| 114 | 0x72 | 1110010 | r | Lowercase letter |
| 115 | 0x73 | 1110011 | s | Lowercase letter |
| 116 | 0x74 | 1110100 | t | Lowercase letter |
| 117 | 0x75 | 1110101 | u | Lowercase letter |
| 118 | 0x76 | 1110110 | v | Lowercase letter |
| 119 | 0x77 | 1110111 | w | Lowercase letter |
| 120 | 0x78 | 1111000 | x | Lowercase letter |
| 121 | 0x79 | 1111001 | y | Lowercase letter |
| 122 | 0x7A | 1111010 | z | Lowercase letter |
| 123 | 0x7B | 1111011 | { | Symbol |
| 124 | 0x7C | 1111100 | | | Symbol |
| 125 | 0x7D | 1111101 | } | Symbol |
| 126 | 0x7E | 1111110 | ~ | Symbol |
Usage and Applications
In Computing Systems
In computing systems, ASCII serves as a foundational encoding for text representation in programming languages, operating systems, and file storage, enabling efficient handling of basic characters and control sequences. One of its core implementations occurs in the C programming language, where strings are stored as contiguous arrays of bytes terminated by the NUL character (ASCII code 0x00); because NUL marks the end of the string, a null byte cannot appear within the string data itself. This null-terminated convention, defined in the ISO C standard, treats strings as sequences of characters in the execution character set, which on most platforms historically aligned with ASCII for portability.

Legacy support for ASCII persists in various operating systems and file systems to ensure backward compatibility with older software and data. In Microsoft Windows, code page 437 functions as the default OEM code page for English-language installations, preserving the 7-bit ASCII range (codes 0x00–0x7F) while assigning the remaining 128 code points (0x80–0xFF) to graphics and symbols for console applications.[31] Similarly, Unix-like systems use the POSIX "C" locale, whose character set is essentially US-ASCII, as the baseline: text-processing utilities and shell commands interpret input as 7-bit ASCII unless a different locale is specified.[32] The FAT file system, foundational to MS-DOS and early Windows, enforced an 8.3 filename convention limited to uppercase ASCII letters, digits, and select symbols, avoiding encoding ambiguities.[33]

ASCII's uniform character representation has also enabled creative applications such as ASCII art, which depends on fixed-width (monospace) fonts to align printable characters into visual forms, a technique prevalent in early text-based interfaces and terminals, where proportional fonts would distort layouts.
For instance, characters such as /, \, |, and - form shapes only when each occupies identical horizontal space, as ensured by ASCII's design for teletype and line printer output.[34]
End-of-file (EOF) handling in ASCII-based systems varies by context: for interactive text input on Unix terminals, the EOT character (ASCII code 0x04, produced by Ctrl+D) signals EOF by prompting the terminal driver to flush its buffer and return a zero-length read, while binary files rely on the operating system's record of file length or explicit byte counts, avoiding the risk of corrupting data with embedded markers.[35]
In contemporary computing, ASCII has largely been subsumed by UTF-8, the dominant Unicode encoding, which preserves exact byte-for-byte compatibility with the ASCII subset and thus allows legacy ASCII data to be read unchanged. This transition is evident in formats such as JSON, whose specification mandates UTF-8 encoding for interchange while guaranteeing that ASCII-only payloads remain byte-identical to their ASCII representation and interoperable with older ASCII-only parsers.[36]