Definition and Characteristics
Core Definition
Plain text refers to computer-encoded text that consists solely of a sequence of code points from a given character encoding standard, without any additional formatting, markup, styling, or embedded binary elements.[5] It represents human-readable data composed exclusively of printable characters and essential control codes, such as those for line breaks or tabs, ensuring a pure, unadorned sequence suitable for basic text processing.[6] Key characteristics of plain text include its reliance on character encoding systems for representation, which define how sequences of bytes map to visible glyphs, and the absence of any font, color, or layout specifications.[6] This results in a format that is inherently portable across different computing environments, as it does not depend on proprietary software or hardware-specific rendering instructions.[6] When displayed, plain text is typically rendered in a monospace font to maintain consistent spacing and alignment, though the exact appearance may vary based on the viewing application. A representative example is the simple string "Hello, world!" encoded using the ASCII character set, which demonstrates plain text's straightforward structure and cross-system compatibility without requiring specialized interpreters. In computing, plain text functions as the foundational medium for text storage and transmission, enabling efficient parsing, searching, and display operations that underpin more complex data handling.[6]
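As a minimal illustration (not drawn from the cited sources), the following Python snippet shows how such a string reduces to a bare sequence of ASCII code points and decodes back to identical text on any system:

```python
# A plain-text string is just a sequence of code points, nothing more.
text = "Hello, world!"

# Encode to ASCII; each character maps to one 7-bit code point.
ascii_bytes = text.encode("ascii")
print(list(ascii_bytes))             # [72, 101, 108, 108, 111, 44, 32, 119, ...]

# Decoding the same bytes recovers the identical text on any system.
print(ascii_bytes.decode("ascii"))   # Hello, world!
```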
Distinction from Formatted Text
Plain text is distinguished from formatted text, such as rich text, primarily by its complete absence of embedded metadata or control codes for styling, layout, or visual attributes like bold, italics, colors, fonts, or positioning. In plain text, the content consists solely of character sequences interpretable as literal text, without any proprietary or interpretive instructions that alter appearance beyond basic rendering. By contrast, rich text formats integrate such directives directly into the file; for instance, the Rich Text Format (RTF), a proprietary standard developed by Microsoft, employs control words like \b to toggle bold formatting and \i for italics, allowing documents to retain stylistic information across compatible applications.[7]
This fundamental simplicity confers several advantages to plain text, including universal readability on virtually any device or software without requiring specialized parsers or rendering engines, which ensures broad interoperability. Plain text files are also notably compact, often significantly smaller than their formatted counterparts due to the lack of overhead from formatting tags or binary data, making them ideal for efficient storage, quick transmission, and version control in collaborative environments. Furthermore, editing plain text demands no proprietary tools—basic text editors suffice—lowering barriers to access and reducing dependency on vendor-specific software.[8][9]
Despite these benefits, plain text's limitations become evident in scenarios requiring expressive or structured communication, as it inherently cannot encode emphasis, hierarchy, or multimedia elements without external aids.[8][6] To simulate formatting, users often resort to ad hoc conventions, such as all caps for emphasis (mimicking shouting) or manual spacing for indentation, but these are ambiguous, non-standard, and prone to misinterpretation across audiences or tools.
To overcome these constraints for complex documents, formatted alternatives have emerged as evolutions of plain text principles. Markup languages like HTML introduce lightweight, tag-based syntax—such as <b> for bold or <i> for italics—to separate content from presentation while enabling web-compatible features like hyperlinks and layouts. Word processor formats, exemplified by Microsoft's DOCX, extend this further through XML-structured packages that encapsulate rich text, images, tables, and styles in a standardized, compressible format suitable for professional publishing.[10]
Historical Development
Origins in Early Computing
The concepts underlying plain text originated in 19th-century telegraphy and typewriter systems, where the need for reliable, unadorned character transmission and representation drove the development of fixed-width encodings. Émile Baudot's 1874 printing telegraph was pivotal, introducing the first widely adopted uniform-length binary code—a five-unit system—for encoding letters, numbers, symbols, and control signals, which enabled mechanical synchronization and reduced transmission errors compared to variable-length codes like Morse.[11] This fixed-length approach laid foundational principles for plain text as a sequence of discrete, equally timed characters without formatting overlays.[11] Typewriters further advanced these ideas in the late 19th century, enforcing fixed-width character representations to ensure mechanical alignment and consistent spacing. Devices such as the 1873 Remington typewriter and Donald Murray's 1901 typewriter-based telegraph refinements used consistent character widths and shift mechanisms within five-unit codes, bridging manual typing with automated signaling and influencing the design of early data processing inputs.[11] These pre-digital innovations emphasized plain, unformatted sequences of characters, prioritizing interoperability and simplicity over visual embellishment.

In the 1940s and 1950s, early digital computers built on these foundations by adopting punched cards and tape for alphanumeric data handling with 6-bit or 7-bit codes. The ENIAC, completed in 1945, relied on punched cards for input and output, encoding numerical and basic character data in binary formats that echoed telegraphy-era fixed representations.[12] Similarly, the UNIVAC I (1951) used magnetic tape and punched cards with a proprietary 6-bit BCD-based encoding, allowing efficient storage and transmission of alphanumeric information across systems.[11] These methods treated text as raw sequences of bits, devoid of proprietary formatting, to facilitate data interchange in computing environments.[11]

A pivotal milestone arrived in 1963 when the American Standards Association (ASA, later reorganized as ANSI) published the American Standard Code for Information Interchange (ASCII, X3.4-1963), the first widespread plain text encoding standard for English, defining 128 code positions for letters, digits, punctuation, and control characters via a 7-bit binary scheme. This standard unified disparate codes from prior decades, promoting interoperability between teletypewriters and computers.[11] Yet early implementations grappled with inherent limitations, as the 128-character set confined representations to the basic Latin alphabet, excluding accented letters (e.g., é, ñ) and non-Latin scripts, thereby restricting global applicability.
Evolution of Standards
Following the establishment of ASCII in 1963, efforts to extend plain text standards for broader international use emerged in the late 1960s. The International Telegraph and Telephone Consultative Committee (CCITT, predecessor to ITU-T) adopted International Alphabet No. 5 (IA5) in 1968, which ISO later codified as ISO/IEC 646, first published in 1973; this 7-bit framework remained compatible with ASCII while allowing national variants for basic Latin alphabet extensions, such as accented characters in European languages, to facilitate global data interchange.[13] In parallel, IBM maintained EBCDIC, an 8-bit encoding scheme originating in the mid-1960s with the System/360 mainframes, which persisted as a proprietary standard in IBM's ecosystem through the 1980s, supporting punched card legacies and business applications despite lacking ASCII compatibility.

The push for a universal character set intensified in the early 1990s amid globalization and the rise of the internet. Unicode 1.0, released in October 1991 by the Unicode Consortium, marked a pivotal shift by defining a 16-bit encoding for over 7,000 characters across multiple scripts, including Latin, Cyrillic, Greek, and Asian languages, aiming to replace fragmented national codes with a single, extensible system.[14][15] It was closely aligned with ISO/IEC 10646, first published in 1993 as the Universal Coded Character Set (UCS), through harmonization efforts that synchronized their code assignments to ensure interoperability; subsequent joint updates have maintained this alignment, with Unicode now serving as the practical implementation of UCS.

Key advancements in encoding followed to address practical deployment. UTF-8, proposed in 1992 and formalized in 1993, became the dominant transformation format for Unicode by offering variable-length encoding that preserved full backward compatibility with ASCII's 7-bit subset, enabling seamless adoption in existing systems without data migration issues.[16] Later Unicode revisions enhanced support for complex text rendering, such as refinements to the bidirectional algorithm for right-to-left scripts like Arabic and Hebrew in version 2.0 (1996), and added emoji characters starting in version 6.0 (2010), which expanded plain text to include symbolic and pictorial elements while maintaining extensibility.[17]

Standardization bodies have played crucial roles in evolving these standards for interoperability. The American National Standards Institute (ANSI) initially led ASCII's development and influenced early extensions, while ISO has driven international harmonization through standards like ISO/IEC 646 and 10646. The Internet Engineering Task Force (IETF) has ensured practical internet compatibility, notably by specifying UTF-8 in RFC 2044 (1996) for MIME and email protocols, promoting its widespread adoption in web and network applications.
Encoding and Representation
Character Encoding Systems
Character encoding systems provide the mechanism for representing plain text characters as sequences of bytes in digital storage and transmission, enabling computers to process and interchange textual data across diverse systems. This process involves mapping each character—an abstract unit of text—to a unique numeric code point, which is then transformed into binary bytes according to a specific encoding scheme. For instance, the uppercase letter 'A' is assigned the decimal value 65 (or hexadecimal 0x41) in many foundational systems, allowing consistent interpretation when decoded back to text.[18]
The foundational 7-bit ASCII (American Standard Code for Information Interchange) encoding, standardized as ANSI X3.4-1986, supports 128 characters (code points 0–127), covering basic English letters, digits, punctuation, and control codes, each representable within seven bits of a single byte to conserve storage in early computing environments. The scheme uses the full 7-bit range to encode printable characters such as 'A' at decimal 65, leaving the eighth bit of a byte available for parity checks in binary data streams. ASCII's limited scope, however, proved insufficient for non-English languages, prompting the development of 8-bit extensions.[19][18]
The ISO/IEC 8859 series extends ASCII to 8 bits, providing 256 possible code points (0–255) while preserving the first 128 as ASCII for backward compatibility; for example, ISO/IEC 8859-1 (Latin-1) adds characters for Western European languages, such as accented letters like 'é' at decimal 233 (0xE9). Each part of the series targets specific linguistic regions—ISO/IEC 8859-2 for Central and Eastern Europe, ISO/IEC 8859-5 for Cyrillic, and others—allowing single-byte representation for up to 191 printable characters per encoding. These fixed-width 8-bit schemes dominated text handling in the 1980s and 1990s for regional applications but fragmented global interoperability due to their language-specific designs.[18]
To address the limitations of regional encodings, UTF-8 (Unicode Transformation Format-8) emerged as a variable-length encoding for the full Unicode character set, which encompasses 159,801 characters (as of Unicode 17.0) from all writing systems;[20] it uses 1 to 4 bytes per character, with ASCII-compatible characters (U+0000 to U+007F) encoded in a single byte identical to ASCII, such as 'A' at 0x41. For non-ASCII characters, multi-byte sequences follow a structured format: the leading byte indicates sequence length using high bits set to 1, followed by continuation bytes with high bits set to 10 in binary (e.g., the euro symbol € at U+20AC is encoded as 0xE2 0x82 0xAC). Defined in RFC 3629, UTF-8's design ensures self-synchronization—decoders can identify byte boundaries without prior knowledge—and has become the dominant encoding for web and software interchange due to its efficiency with Latin scripts and scalability for global text.[21][18]
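The variable-length scheme can be observed directly; the following minimal Python sketch (an illustration, not part of the standard) prints the UTF-8 byte sequences for the characters discussed above:

```python
# Minimal sketch of UTF-8's variable-length encoding.
for ch in ("A", "é", "€"):
    utf8 = ch.encode("utf-8")
    print(ch, f"U+{ord(ch):04X}", utf8.hex(" "))
# A U+0041 41          -> one byte, identical to ASCII
# é U+00E9 c3 a9       -> two-byte sequence
# € U+20AC e2 82 ac    -> three-byte sequence
```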
Compatibility issues arise when text encoded in one scheme is decoded using another, often resulting in mojibake—garbled or nonsensical output where bytes are misinterpreted as different characters, such as UTF-8 multi-byte sequences appearing as accented Latin-1 glyphs. For example, the UTF-8 encoding of '€' (E2 82 AC) decoded as Windows-1252 (a common superset of ISO-8859-1) renders as 'â‚¬', making the text unusable. Such mismatches frequently occur in legacy systems or during data migration, underscoring the need for explicit encoding declaration.[22]
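A minimal sketch of this failure mode: decoding the same bytes with two different codecs in Python reproduces the garbling described above.

```python
# Reproducing mojibake: the same bytes read under two different encodings.
euro_utf8 = "€".encode("utf-8")       # b'\xe2\x82\xac'
print(euro_utf8.decode("cp1252"))     # â‚¬  (bytes misread as Windows-1252)
print(euro_utf8.decode("utf-8"))      # €   (correct decoding)
```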
To mitigate detection challenges, encoding systems incorporate identifiers like the Byte Order Mark (BOM), a Unicode character (U+FEFF) prefixed at the file's start to signal the encoding and byte order; for UTF-16 (2-byte fixed-width), it appears as FE FF (big-endian) or FF FE (little-endian), while for UTF-32 (4-byte fixed-width), it is 00 00 FE FF or FF FE 00 00. UTF-8 optionally uses EF BB BF as a BOM, though it is discouraged in protocol contexts to avoid confusion with content. For UTF-8 without a BOM, software employs heuristics such as scanning for valid byte patterns (e.g., no isolated continuation bytes) or statistical analysis of byte frequencies to infer the encoding, as outlined in web standards. These methods enable reliable autodetection but are not foolproof, particularly for short or ambiguous texts.[23]
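As an illustrative sketch rather than a normative detection algorithm, the helper below (sniff_encoding is a hypothetical name) checks for the BOMs listed above and falls back to UTF-8 when none is present:

```python
import codecs

def sniff_encoding(data: bytes) -> str:
    """Guess an encoding from a leading byte order mark, if any."""
    # UTF-32 BOMs must be tested first: FF FE 00 00 begins with the UTF-16 LE BOM.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"              # FF FE 00 00 or 00 00 FE FF
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"              # FF FE or FE FF
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"           # EF BB BF
    return "utf-8"                   # heuristic default for BOM-less text

print(sniff_encoding(codecs.BOM_UTF8 + "café".encode("utf-8")))   # utf-8-sig
print(sniff_encoding("café".encode("utf-8")))                     # utf-8
```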
| Encoding | Byte Width | Code Points Covered | Example: 'A' (U+0041) | Example: 'é' (U+00E9) |
|---|---|---|---|---|
| ASCII | 7-bit | 128 (basic Latin) | 0x41 | N/A (not supported) |
| ISO/IEC 8859-1 | 8-bit | 256 (Western Europe) | 0x41 | 0xE9 |
| UTF-8 | 1–4 bytes | 1,114,112 (Unicode full) | 0x41 | 0xC3 0xA9 |
Control Characters and Sequences
Control characters in plain text are non-printable codes reserved for managing device operations, text formatting, and data transmission without producing visible output. They are primarily divided into two categories: the C0 set, encompassing code points 0–31 (hexadecimal 00–1F) plus DEL at 127 (7F), and the C1 set, spanning 128–159 (80–9F). The C0 set originates from the ASCII standard (ISO/IEC 646 or ECMA-6), providing foundational controls such as NUL (0x00) for padding or termination, LF (0x0A) for advancing to the next line, and ESC (0x1B) for initiating multi-byte sequences that modify text processing behavior.[24] The C1 set, introduced as an extension in standards like ISO/IEC 6429 and ECMA-48, adds higher-level functions such as CSI (0x9B), the Control Sequence Introducer, which in 7-bit environments is represented by ESC followed by '['.[25]

These characters serve essential roles in structuring plain text files and streams. For line endings, the carriage return (CR, 0x0D) from the C0 set returns the cursor to the line start, while LF advances it downward; Windows systems conventionally combine them as CRLF (0x0D 0x0A) to denote line breaks in text files.[26] Unix-like operating systems, adhering to POSIX specifications, employ only LF as the newline terminator, defining a line as a sequence of non-newline characters followed by this single control.[27] Additionally, the horizontal tab (HT, 0x09) from C0 facilitates spacing and alignment, advancing the cursor to the next tab stop, typically every eight characters.[24]

Escape sequences build on the ESC control (and its C1 equivalent CSI) to enable dynamic terminal or device management within plain text environments. The ESC character prefixes parameterizable codes, as in the ANSI escape sequences defined by ECMA-48, which control attributes like color and cursor position; for example, ESC followed by [31m (commonly notated as \033[31m, using the octal escape for ESC) sets the foreground text to red in compatible terminals, though such usage borders on semi-formatted output by altering the display without changing the text content itself.[25]
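To make these codes concrete, here is a minimal Python sketch (assuming a terminal that honors ECMA-48 escape sequences) showing the C0 bytes behind line endings and tabs, and the red-text sequence mentioned above:

```python
# C0 control characters behind common line endings and tabs.
print([hex(b) for b in "\r\n\t".encode("ascii")])   # ['0xd', '0xa', '0x9']

windows_text = "first line\r\nsecond line"   # CRLF convention (0x0D 0x0A)
unix_text = "first line\nsecond line"        # POSIX LF-only convention (0x0A)

# ESC [ 31m selects a red foreground; ESC [ 0m resets attributes (ECMA-48).
ESC = "\x1b"
print(f"{ESC}[31mwarning: red text{ESC}[0m")
```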
Contemporary plain text handling shows deprecation trends for many control characters, with reduced reliance stemming from graphical user interfaces (GUIs) that favor structured markup (e.g., XML or HTML) and system APIs over inline codes for layout and control.[28] Legacy functions like certain transmission controls in C0 or positioning aids in C1 are often ignored or obsolete in GUI-dominated workflows, prioritizing compatibility with visual rendering engines.[28]
Applications and Usage
In Software and Data Processing
In software development, plain text serves as the foundational format for source code files, which are human-readable sequences of characters processed by compilers and interpreters. For instance, Python source files with the .py extension and C source files with the .c extension are stored and edited as plain text, allowing developers to write, version control, and compile code without proprietary formatting. Compilers like GCC translate these plain text inputs into machine code through lexical analysis and parsing stages, while interpreters such as the Python interpreter execute the text directly line by line.

Configuration files in software applications are also predominantly plain text to facilitate easy editing and portability across environments. Formats like INI files, used in Windows applications and Python's configparser module, consist of key-value pairs organized into sections within plain text structures.[29] Similarly, JSON configuration files, such as those for Node.js applications or web APIs, are lightweight plain text documents that encode data hierarchically using brackets and commas, enabling simple parsing by standard libraries.

In data processing workflows, plain text plays a key role in extraction, transformation, and loading (ETL) pipelines, where it undergoes parsing to extract structured information from unstructured or semi-structured sources. Tools in ETL systems, such as Apache NiFi or Talend, often ingest plain text logs or documents for tokenization and normalization before loading into databases.[30] For example, Apache HTTP Server access logs are recorded in plain text using formats like the Common Log Format, capturing details such as IP addresses, timestamps, and request methods in delimited lines for subsequent analysis in pipelines, as in the parsing sketch below.[31][32]

Software tools for creating and manipulating plain text emphasize simplicity and efficiency, particularly in Unix-like environments. Text editors like vi, a modal editor available on most Unix systems, and Notepad, the default Windows editor, are designed specifically for authoring and modifying plain text files without introducing binary elements or formatting overhead.[33][34] For processing, command-line utilities such as grep and sed leverage regular expressions to search, filter, and transform plain text streams; grep identifies lines matching patterns in files or input pipes, while sed performs in-place substitutions or edits on text flows.[35][36]

The use of plain text in these contexts offers performance advantages, including fast input/output operations and low memory overhead, especially in resource-limited settings. Plain text files support sequential reading and writing with minimal parsing complexity, enabling rapid I/O in scenarios like log streaming or script execution on embedded systems.[37] Additionally, their compact structure—lacking metadata or binary encodings—reduces storage and processing demands compared to formatted alternatives, making them ideal for high-volume data handling in memory-constrained environments.[38] Plain text files typically adhere to character encodings like UTF-8 for consistent representation across systems.
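As one illustrative sketch of such plain-text parsing (the log line below is a made-up sample and the field names are chosen for the example), a Common Log Format entry can be pulled apart with a regular expression:

```python
# Illustrative sketch: parsing one Apache Common Log Format line with a regex.
import re

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)

line = '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
match = LOG_PATTERN.match(line)
if match:
    print(match.groupdict())
# {'ip': '127.0.0.1', 'timestamp': '10/Oct/2023:13:55:36 +0000',
#  'method': 'GET', 'path': '/index.html', 'status': '200', 'size': '2326'}
```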
In File Formats and Interchange
Plain text serves as the foundational format for numerous file types used in persistent storage, with common extensions including .txt for general unformatted text documents, .log for log files generated by applications, and .csv for comma-separated values files that store tabular data in a delimited plain text structure.[39][40][41] These extensions facilitate interoperability across operating systems and applications by indicating that the content is human-readable text without embedded formatting or binary elements.
In data interchange protocols, plain text is identified by the MIME type text/plain, which is the default for textual content in email and web transmissions, ensuring compatibility with legacy systems that expect ASCII or basic character sets.[42] For email via SMTP, message bodies are transmitted as plain text by default, with the protocol handling the header and body sections in a structured yet unformatted manner to support reliable delivery across networks.[43] Similarly, HTTP responses often employ text/plain for server-generated outputs like error messages or simple data streams, promoting straightforward parsing without the need for specialized decoders.[44]
Electronic Data Interchange (EDI) relies on subsets of plain text formats, such as ANSI X12 and EDIFACT, where business documents like invoices and purchase orders are encoded using delimiters and fixed-length fields within ASCII-compatible text files to enable automated, standardized exchanges between trading partners.[45] These formats prioritize machine readability while maintaining human inspectability, using control characters like carriage return and line feed (CRLF) for record separation.[46]
Best practices for plain text interchange emphasize explicit encoding declarations to prevent misinterpretation; for instance, HTTP headers should include Content-Type: text/plain; charset=UTF-8 to specify Unicode compatibility, reducing risks of character garbling in global communications.[47] In CSV files, special characters such as commas within fields are escaped by enclosing the field in double quotes, ensuring structural integrity during parsing across diverse systems.[46]
A key example is the CSV format, formalized in RFC 4180, which defines a plain text structure for delimited records using commas as separators and CRLF for line endings, while registering the MIME type text/csv to distinguish it from generic plain text in interchange scenarios.[46] This standard ensures broad compatibility for data export and import, as seen in applications from spreadsheets to database dumps.[41]
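A minimal sketch of these conventions using Python's standard csv module, which applies RFC 4180-style quoting and can emit CRLF line endings (the sample fields are invented):

```python
import csv, io

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\r\n")   # CRLF per RFC 4180
writer.writerow(["id", "comment"])
writer.writerow([1, "contains, a comma"])            # comma forces double-quoting
print(repr(buffer.getvalue()))
# 'id,comment\r\n1,"contains, a comma"\r\n'
```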
Limitations and Modern Alternatives
Common Limitations
Plain text lacks native mechanisms for incorporating multimedia elements or advanced structural features, restricting its utility in contexts requiring visual or interactive content. For instance, it provides no inherent support for embedding images, rendering tables, or creating clickable hyperlinks, as these require additional formatting layers or binary components absent in pure plain text representations.[48] To simulate such features, developers often rely on lightweight conventions like Markdown, which uses simple symbols (e.g., asterisks for emphasis or brackets for links) to denote pseudo-formatting that can be processed by external tools into richer outputs.[49] This approach maintains backward compatibility with plain text editors but introduces dependencies on specific parsers, limiting universal expressiveness without added infrastructure.

Internationalization poses significant challenges for plain text due to the historical dominance of ASCII, a 7-bit encoding limited to 128 characters primarily designed for English-language computing needs. This English bias excludes characters from most non-Latin scripts, such as those in Arabic, Hebrew, or Asian languages, rendering plain text inadequate for global content without extensions like the ISO-8859 variants, which still fail to address bidirectional (right-to-left) text flow or complex character composition. Early encodings like ASCII offer no facilities for right-to-left rendering, where text direction must be inferred or manually adjusted by applications, often leading to misaligned or unreadable output in mixed-language documents.[50] Similarly, support for combining characters—diacritics or marks that alter base glyphs—is absent, complicating representation of accented letters or tonal languages common outside English.[51]

The unstructured nature of plain text heightens security vulnerabilities, particularly in applications that process user-supplied input without validation. Without built-in escaping or schema enforcement, plain text data can facilitate injection attacks, such as SQL injection, where malicious strings (e.g., appended SQL fragments like ' OR '1'='1) are directly concatenated into queries, potentially exposing or altering databases.[52] This risk arises because plain text treats all input as literal sequences, lacking the delimiters or tags found in structured formats to isolate code from content, making it prone to exploits in web forms, APIs, or configuration files.
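A hedged sketch of the injection risk described above, using an in-memory SQLite database with invented table and data; the parameterized form keeps the plain-text input as data rather than query structure:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cr3t')")

user_input = "x' OR '1'='1"

# Unsafe: the input is spliced into the SQL as literal plain text.
unsafe = f"SELECT secret FROM users WHERE name = '{user_input}'"
print(conn.execute(unsafe).fetchall())               # leaks every row: [('s3cr3t',)]

# Safe: a placeholder keeps the input as data, not query structure.
safe = "SELECT secret FROM users WHERE name = ?"
print(conn.execute(safe, (user_input,)).fetchall())  # []
```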
For handling large-scale data, plain text formats exhibit inefficiencies compared to binary alternatives, as their human-readable structure demands more storage and processing overhead without native compression. Files like CSV, while versatile for tabular data interchange, parse slowly on massive datasets due to sequential line-by-line reading and lack of indexing, often requiring external compression tools like gzip to mitigate size bloat from redundant whitespace or delimiters.[53] In contrast, binary formats such as Parquet enable columnar storage and built-in compression, achieving up to 10x reductions in size and query times for terabyte-scale analytics, underscoring plain text's scalability constraints in data-intensive environments.[54]
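As a rough, illustrative measurement (the synthetic CSV content is invented and the ratio will vary with real data), compressing repetitive plain text with gzip from the standard library shows the kind of size reduction discussed:

```python
import gzip

# Build a synthetic, repetitive CSV-like payload.
rows = "\n".join(f"{i},sensor_a,{i * 0.5:.2f}" for i in range(10_000))
raw = rows.encode("utf-8")
compressed = gzip.compress(raw)
print(len(raw), len(compressed))   # compressed size is a small fraction of the raw size
```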