
Binary-to-text encoding

Binary-to-text encodings are schemes that represent binary data as text using a limited set of printable ASCII characters, typically by rewriting the data in a radix (base) higher than 10, to enable safe storage and transmission in environments restricted to US-ASCII, such as legacy text-based protocols. These encodings translate sequences of 6, 5, or 4 bits of binary data into individual characters from a defined alphabet, with the choice of radix balancing efficiency against readability and compatibility.

The primary purpose of binary-to-text encoding is to ensure the integrity of binary data across gateways and transports that cannot handle arbitrary 8-bit or binary octets, such as early SMTP implementations limited to 7-bit channels. For instance, in the Multipurpose Internet Mail Extensions (MIME) standard, encodings like Base64 and Quoted-Printable are used to transform binary attachments or non-ASCII content into a 7-bit safe format, preserving the original data for decoding at the recipient end.

Base64, a radix-64 encoding defined in RFC 4648, groups every three bytes (24 bits) into four characters from an alphabet of A-Z, a-z, 0-9, +, and /, with = used as padding for incomplete groups. This results in approximately 33% overhead, but better efficiency than Quoted-Printable for pure binary data. In contrast, Quoted-Printable leaves most printable ASCII text as-is and represents other octets as = followed by two hexadecimal digits, making it suitable for human-readable content with occasional binary elements and minimizing size increase for text-dominant files. Other standardized schemes include Base32, which uses a 32-character case-insensitive alphabet (A-Z and 2-7) to encode 5-bit groups with 60% overhead and is used in applications like one-time passwords, and Base16 (hexadecimal), which maps 4-bit groups to 0-9 and A-F for simple, unambiguous representation at 100% overhead.

In MIME, the chosen method is declared in the Content-Transfer-Encoding header field (e.g., "base64" or "quoted-printable") to guide proper decoding, ensuring interoperability in email, Usenet newsgroups, and other protocols. Beyond email, Base64 appears in data URIs, URL-safe variants (Base64url, which replaces + and / with - and _) appear in URLs and web APIs, and these encodings are used in cryptographic contexts for embedding keys or certificates in text formats.
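As a rough illustration, Python's standard `base64` and `quopri` modules can apply four of the schemes above to the same input. The sample string is an arbitrary assumption: mostly text, with one non-ASCII character whose UTF-8 form is two bytes.

```python
import base64
import quopri

# Arbitrary sample: mostly ASCII text plus "é", which is two bytes in UTF-8
sample = "café menu: 2 items".encode("utf-8")

encodings = {
    "Base64": base64.b64encode(sample),
    "Base32": base64.b32encode(sample),
    "Base16": base64.b16encode(sample),
    "Quoted-Printable": quopri.encodestring(sample),
}

for name, encoded in encodings.items():
    # Every result is pure ASCII, so decoding it as ASCII cannot fail
    print(f"{name:17}{len(encoded):3d} chars  {encoded.decode('ascii')}")
```

For this 19-byte, text-heavy sample, Quoted-Printable grows the least because only the two UTF-8 bytes of "é" are escaped; with arbitrary binary input the ranking shifts in Base64's favor.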

Fundamentals

Definition and Purpose

Binary-to-text encoding is the process of transforming arbitrary binary data, consisting of sequences of 8-bit bytes, into a representation using a limited subset of printable characters, usually drawn from ASCII. This ensures that the data remains intact and interpretable when transmitted or stored in environments designed exclusively for textual content.

The primary purpose of binary-to-text encoding is to enable reliable transmission and storage of binary data across systems and protocols that impose restrictions on non-textual content, such as those limited to 7-bit clean ASCII text. For instance, protocols like SMTP for email could corrupt or reject binary payloads due to issues like automatic line wrapping, character set mismatches, or the presence of non-printable control characters. By converting binary data into an ASCII-compatible text format, these encodings prevent such alterations and maintain data integrity without requiring modifications to the underlying transport mechanisms. Common examples of binary data requiring such encoding include images, executable files, and compressed archives, which contain arbitrary byte sequences that fall outside the printable ASCII range.

This approach emerged as a response to the text-only assumptions in early protocols, such as SMTP as defined in RFC 821, which mandated 7-bit US-ASCII for message bodies to ensure interoperability across heterogeneous networks. Methods like Base64 provide a standardized way to achieve this, mapping binary input to a 64-character alphabet for efficient yet safe textual representation.
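To make the 7-bit problem concrete, the sketch below models a hypothetical transport (`seven_bit_channel` is an illustrative helper, not a real API) that clears the high-order bit of every octet, roughly what a relay that is not 8-bit clean could do. Raw binary is damaged, while its Base64 form, which uses only 7-bit printable characters, passes through unchanged.

```python
import base64

def seven_bit_channel(data: bytes) -> bytes:
    """Hypothetical transport that clears the high-order bit of every octet,
    modeling a relay that only carries 7-bit US-ASCII."""
    return bytes(b & 0x7F for b in data)

payload = bytes([0x89, 0x50, 0x4E, 0x47, 0xFF, 0x00, 0x80])  # arbitrary binary sample

# Sent raw, every octet >= 0x80 is silently altered
raw_received = seven_bit_channel(payload)

# Base64 text contains only 7-bit printable characters, so it survives unchanged
encoded = base64.b64encode(payload)
decoded = base64.b64decode(seven_bit_channel(encoded))

print(raw_received == payload)  # False
print(decoded == payload)       # True
```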

Basic Principles

Binary-to-text encoding transforms arbitrary binary data into a sequence of printable text characters, primarily to facilitate transmission or storage in systems that handle only text. There are two main approaches: radix-based encodings, which divide the input into fixed-size chunks of bits matching the radix of the character alphabet (typically a power of 2 for efficient mapping), and escape-based encodings, which represent non-printable or unsafe characters using escape sequences while passing printable text unchanged.

In radix-based methods, each chunk is interpreted as an integer and mapped to a corresponding symbol from a predefined set of safe, printable characters, such as alphanumeric letters and symbols. This grouping ensures lossless reversibility, as the original bits can be reconstructed by reversing the mapping and concatenating the chunks. If the input length is not a multiple of the chunk size, padding or an explicit length lets the decoder recover the exact data size. Mathematically, this process equates to a radix conversion from base 2 (binary) to a higher base equal to the size of the alphabet, allowing compact representation within text constraints.

The encoding efficiency, or information density, is $\frac{\log_2 N}{8} \times 100\%$, where $N$ is the alphabet size; this measures the percentage of each 8-bit output character's capacity that carries data. For an alphabet of size 64, the efficiency is $\frac{6}{8} \times 100\% = 75\%$, meaning the encoded output is approximately 33% larger than the input because each 8-bit character carries only 6 bits of information. Edge cases in input length are addressed through padding with reserved symbols that do not alter the data but signal the end of valid content. For instance, in Base64's 6-bit scheme, a final group with one leftover byte (8 bits, a 2-bit remainder after 6-bit chunking) is followed by two padding symbols, and a final group with two leftover bytes (16 bits, a 4-bit remainder) is followed by one.
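The chunking described above can be sketched in a few lines of Python. `chunk_encode` and `efficiency` are hypothetical helper names; the sketch handles only power-of-two alphabets and omits padding, so with the standard Base64 alphabet it reproduces unpadded Base64.

```python
import math

BASE64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def chunk_encode(data: bytes, alphabet: str) -> str:
    """Radix-based encoding for a power-of-two alphabet: treat the input as one
    bit string, cut it into log2(N)-bit chunks, and map each chunk to a symbol.
    Padding symbols are omitted; a short final chunk is filled with zero bits."""
    bits_per_symbol = int(math.log2(len(alphabet)))
    if 1 << bits_per_symbol != len(alphabet):
        raise ValueError("alphabet size must be a power of 2")
    mask = len(alphabet) - 1
    buffer, nbits, out = 0, 0, []
    for byte in data:
        buffer = (buffer << 8) | byte
        nbits += 8
        while nbits >= bits_per_symbol:
            nbits -= bits_per_symbol
            out.append(alphabet[(buffer >> nbits) & mask])
        buffer &= (1 << nbits) - 1  # keep only the bits not yet emitted
    if nbits:
        out.append(alphabet[(buffer << (bits_per_symbol - nbits)) & mask])
    return "".join(out)

def efficiency(alphabet_size: int) -> float:
    """Share of each 8-bit output character that carries data: log2(N) / 8, in percent."""
    return math.log2(alphabet_size) / 8 * 100

print(chunk_encode(b"Man", BASE64_ALPHABET))           # TWFu
print(efficiency(64), efficiency(32), efficiency(16))  # 75.0 62.5 50.0
```

Passing a 16- or 32-symbol alphabet instead gives 4- or 5-bit chunks, which is the same structure Base16 and Base32 use.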
Line breaking is another common practice: delimiters such as newlines are inserted after a fixed number of characters to satisfy line-length limits in protocols or to improve readability in displays, without affecting the encoded data. Some encodings further enhance robustness by integrating error detection, appending checksums or lightweight error-correcting codes. Bech32, for example, uses a BCH-code checksum that is guaranteed to detect any error affecting up to four characters and fails to detect other random errors with a probability below 1 in $10^9$.
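Line breaking can be seen with Python's `base64.encodebytes()`, which follows the MIME convention of inserting a newline after every 76 output characters. The 512-byte input is an arbitrary assumption; it encodes to 684 characters, which split into exactly nine full lines.

```python
import base64

data = bytes(range(256)) * 2  # 512 arbitrary bytes

# encodebytes() inserts b"\n" after every 76 output characters (and at the end)
wrapped = base64.encodebytes(data)
lines = wrapped.splitlines()
print(len(lines), max(len(line) for line in lines))  # 9 76

# With the default validate=False, b64decode() discards the newlines
print(base64.b64decode(wrapped) == data)  # True
```

The breaks are presentation only: a decoder that skips them recovers the same bytes.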

History

Early Developments

The need for binary-to-text encoding arose in the late 1970s and 1980s due to the constraints of early computer networks and Bulletin Board Systems (BBS), which transmitted data over 7-bit ASCII channels that could not reliably carry 8-bit binary files without corruption or stripping of non-text characters. These systems, including early email protocols and UUCP for Unix file transfers, prioritized printable text, prompting informal innovations to embed binary data safely within text streams. Without formal standards at the time, developers created ad-hoc methods tailored to specific platforms and transmission media.

Uuencoding, one of the earliest such methods, was developed around 1980 by Mary Ann Horton at the University of California, Berkeley, specifically for the UUCP (Unix-to-Unix Copy) protocol to enable transfers between Unix systems. It processes binary input by taking groups of three bytes (24 bits) and dividing them into four 6-bit segments, each of which is mapped to a printable ASCII character by adding 32 (the ASCII code for space) to the segment value, yielding 64 possible characters ranging from space (32) to underscore (95). Uuencoded output includes a header line specifying the file's permission mode and name in the format "begin <mode> <filename>", followed by data lines prefixed with a single character indicating the number of encoded bytes (up to 45 per line, which encode to 60 characters), and concludes with an "end" line. The utility first appeared in 4.0BSD (Berkeley Software Distribution) in 1980, quickly becoming a standard for Unix environments.

Building on uuencode's approach, XXencoding emerged as a variant optimized for reliability across gateways that could misinterpret or remove certain characters. It employs the same 6-bit grouping of three input bytes into four output characters but restricts the alphabet to 64 unambiguous symbols: the 26 uppercase letters (A-Z), 26 lowercase letters (a-z), 10 digits (0-9), and the plus and minus signs (+ and -), avoiding problematic characters such as equals (=) or slash (/) that appear in uuencoding. This design reduced transmission errors in text-only channels, though it kept uuencode's overhead and line-based structure.

For Apple Macintosh users, BinHex addressed similar challenges by combining encoding with preservation of Macintosh file metadata, making it well suited to transferring Mac files with resource forks over text-only channels. Originating as a 1984 Mac port of an earlier TRS-80 program by Tim Mann, it was refined by contributors including William Davis, evolving into the widely adopted BinHex 4.0 format. BinHex 4.0 first compresses the file data with a simple run-length encoding, then encodes the result in a 6-bit scheme similar in spirit to uuencode (the name reflects the hexadecimal encoding of earlier versions), wrapped in a structured header that includes file metadata like type, creator, and fork lengths, ensuring preservation of Mac-specific attributes during 7-bit transmission. This combination of compression and encoding reduced bandwidth use and errors in pre-internet file transfers.
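The uuencode line layout described in this section (a length character followed by 6-bit values offset by 32) can be sketched as follows. `uuencode_line` is a hypothetical helper; Python's `binascii.b2a_uu()` produces the same single-line output and is used here only as a cross-check. The `begin`/`end` wrapper lines are omitted.

```python
import binascii

def uuencode_line(chunk: bytes) -> bytes:
    """Encode up to 45 bytes as one uuencoded data line (without begin/end lines)."""
    if len(chunk) > 45:
        raise ValueError("a uuencoded line holds at most 45 bytes")
    out = [32 + len(chunk)]                          # length character: chr(32 + count)
    padded = chunk + b"\x00" * (-len(chunk) % 3)     # zero-fill the final 3-byte group
    for i in range(0, len(padded), 3):
        group = int.from_bytes(padded[i:i + 3], "big")
        for shift in (18, 12, 6, 0):
            out.append(32 + ((group >> shift) & 63))  # 6-bit value offset into 32..95
    return bytes(out) + b"\n"

line = uuencode_line(b"Cat")
print(line)                             # b'#0V%T\n'
print(line == binascii.b2a_uu(b"Cat"))  # True
```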

Standardization and Modern Evolution

The formal standardization of binary-to-text encodings gained momentum in the late 1980s and early 1990s, culminating in the Multipurpose Internet Mail Extensions (MIME), which integrated Base64 into email protocols to handle binary data in text-based transmissions. Base64 was initially specified in RFC 989 (1987) as part of the Privacy-Enhanced Mail (PEM) framework, designed to encode binary content into a 7-bit safe format suitable for SMTP transport by representing groups of 6 bits with a 64-character alphabet. This approach was later refined and formalized in RFC 2045 (1996), where Base64 became the standard for embedding binary attachments in MIME messages, enabling reliable transfer of non-text files like images and executables over legacy email systems without corruption.

Building on these foundations and in contrast to early ad-hoc methods like uuencode, Usenet saw the emergence of yEnc as a specialized encoding for binary postings. Developed by Jürgen Helbing and released into the public domain in 2001, yEnc uses an 8-bit encoding scheme with dedicated header lines (=ybegin, =yend) for metadata, achieving an efficiency of approximately 98-99% versus Base64's 75% (33% overhead). Despite its advantages in speed and compactness for large files, yEnc's adoption remained largely confined to Usenet binary newsgroups.

Recent advancements have extended these encodings into niche applications requiring compactness, error resilience, and integration with emerging technologies. Base45, defined in RFC 9285 (2022), introduces a 45-character alphabet optimized for dense representation in QR codes, encoding two bytes into three characters for roughly 22% better density than Base64 in that context. In Bitcoin, Bech32, specified in Bitcoin Improvement Proposal 173 (2017), employs a Base32 variant with a BCH checksum for human-readable addresses, enabling error detection that guards against transcription mistakes in transactions.

Similarly, RFC 9580 (2024), the current OpenPGP standard, refines ASCII armor, a Base64-derived format for wrapping cryptographic data, by standardizing headers, line lengths, and the optional CRC-24 checksum to improve interoperability in secure email and file signing.

From 2023 to 2025, binary-to-text encodings have seen expanded roles in AI and decentralized-storage ecosystems. In prompt engineering, Base64 is widely used to embed binary inputs, such as images or files, in text-based requests to multimodal large language models, converting raw bytes into strings that can be placed in API payloads without parsing issues. In decentralized storage, multibase encodings underpin content addressing in systems like IPFS, where Content Identifiers (CIDs) represent verifiable hashes as text (Base32 by default for CIDv1, with options such as Base58btc and Base64url) for storage and retrieval of files across distributed networks.

Encoding Methods

Base64 and Its Variants

Base64 is the most widely used binary-to-text encoding scheme, providing a way to represent arbitrary binary data in an ASCII-compatible format suitable for transmission over text-based protocols. Standardized in RFC 4648, it balances efficiency and simplicity, making it a common choice for embedding binary data in email, XML, and web applications.

The encoding algorithm divides the input into groups of three bytes, equivalent to 24 bits. Each 24-bit group is subdivided into four 6-bit values, and each 6-bit value (ranging from 0 to 63) is mapped to one of 64 characters in a fixed alphabet: uppercase A-Z for 0-25, lowercase a-z for 26-51, digits 0-9 for 52-61, '+' for 62, and '/' for 63. If the input length is not a multiple of three bytes, the final group is padded: a one-byte remainder yields two output characters followed by two '=' padding symbols, while a two-byte remainder yields three characters followed by one '='. The padded output length is therefore $4\lceil n/3 \rceil$ characters, where $n$ is the number of input bytes ($\lceil 4n/3 \rceil$ if padding is omitted), giving an efficiency of approximately 75% since three input bytes expand to four output characters (24 bits to 32 bits).

Several variants of Base64 address specific use cases while retaining the core principle of chunking binary data into fixed-bit groups and mapping them to a printable alphabet. The URL-safe Base64 variant, also defined in RFC 4648, replaces '+' with '-' and '/' with '_' to avoid requiring percent-encoding in URLs and filenames; '=' padding may be omitted when the length of the data is known implicitly. Base32, another RFC 4648 encoding, uses 5-bit groups for compatibility with case-insensitive environments: input is divided into 40-bit (five-byte) groups, each split into eight 5-bit values mapped to a 32-character alphabet (A-Z for 0-25, 2-7 for 26-31), with '=' padding as needed, giving a padded output length of $8\lceil n/5 \rceil$ and 62.5% efficiency (five bytes to eight characters).
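The variants above can be exercised with Python's `base64` module. The three input bytes are an arbitrary choice that makes standard Base64 emit '+' and '/'; `b64url_nopad` and `b64url_nopad_decode` are hypothetical helpers showing one common way to drop and restore the optional padding, not functions provided by the module.

```python
import base64

data = bytes([0xFB, 0xFF, 0xBF])  # chosen so that every 6-bit value is 62 or 63

standard = base64.b64encode(data)          # b'+/+/'
url_safe = base64.urlsafe_b64encode(data)  # b'-_-_'

def b64url_nopad(data: bytes) -> str:
    """Hypothetical helper: URL-safe Base64 with the optional '=' padding stripped."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_nopad_decode(text: str) -> bytes:
    """Hypothetical helper: restore padding (length must be a multiple of 4), then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

print(b64url_nopad(b"\xfb"))  # -w

# Base32: five input bytes become eight characters from A-Z and 2-7
b32 = base64.b32encode(b"hello")
print(b32)                                          # b'NBSWY3DP'
print(base64.b32decode("nbswy3dp", casefold=True))  # b'hello'
```

Restoring padding from the length works because padded Base64 is always a multiple of four characters; the `casefold=True` flag reflects Base32's case-insensitive design.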
Base85, in the Ascii85 form used by Adobe in PostScript and PDF, improves efficiency by employing an 85-character alphabet of printable ASCII symbols (from '!' at code 33 to 'u' at code 117). It divides input into 32-bit (four-byte) groups, interprets each as a base-85 number, and outputs five characters, achieving approximately 80% efficiency (four bytes to five characters), with a special shorthand 'z' representing four zero bytes; Adobe implementations often frame the output with <~ and ~>.

Implementations of Base64 and its variants are ubiquitous across programming languages and libraries. In Python, the standard base64 module supports standard Base64, URL-safe Base64, Base32, Base16, and Base85 (including the Ascii85 variant) via functions like b64encode(), b32encode(), and a85encode(); for example, base64.b64encode(b'Man') yields b'TWFu'. Similar support exists in Java's java.util.Base64 class (RFC 4648 compliant) and JavaScript's btoa()/atob() for basic Base64 on binary strings. For clarity, here is a straightforward implementation of standard Base64 encoding, following the RFC 4648 algorithm:
def base64_encode(data: bytes) -> str:
    """Standard RFC 4648 Base64 encoding with '=' padding."""
    alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    output = []
    n = len(data)
    i = 0
    while i < n:
        if i + 2 < n:
            # Full 24-bit group -> four characters
            triplet = (data[i] << 16) | (data[i + 1] << 8) | data[i + 2]
            output.append(alphabet[(triplet >> 18) & 63])
            output.append(alphabet[(triplet >> 12) & 63])
            output.append(alphabet[(triplet >> 6) & 63])
            output.append(alphabet[triplet & 63])
        elif i + 1 < n:
            # Two bytes left: 16 bits -> three characters plus one '='
            value = (data[i] << 16) | (data[i + 1] << 8)
            output.append(alphabet[(value >> 18) & 63])
            output.append(alphabet[(value >> 12) & 63])
            output.append(alphabet[(value >> 6) & 63])
            output.append("=")
        else:
            # One byte left: 8 bits -> two characters plus "=="
            value = data[i] << 16
            output.append(alphabet[(value >> 18) & 63])
            output.append(alphabet[(value >> 12) & 63])
            output.append("==")
        i += 3
    return "".join(output)
Decoding reverses this process: map characters back to 6-bit values, recombine them into 24-bit groups, discard the bits contributed by padding, and validate the input length and characters. Variants like Base32 follow analogous logic with adjusted bit shifts (5 bits) and alphabets, while Base85 requires base-85 arithmetic to convert each 32-bit group into five characters.
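A matching decoder for the encoder above might look like the following. It is a deliberately strict sketch (the name `base64_decode` is hypothetical): it requires padded input without whitespace and rejects characters outside the standard alphabet, whereas library decoders often offer more lenient modes. It does not check that the unused bits before the padding are zero, which RFC 4648 allows decoders to enforce.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
DECODE_TABLE = {ch: value for value, ch in enumerate(ALPHABET)}

def base64_decode(text: str) -> bytes:
    """Strict sketch: padded input only, no whitespace, standard alphabet only."""
    if len(text) % 4:
        raise ValueError("encoded length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(text), 4):
        quad = text[i:i + 4]
        pad = len(quad) - len(quad.rstrip("="))
        if pad > 2 or (pad and i + 4 != len(text)):
            raise ValueError("at most two '=' and only in the final group")
        group = 0
        for ch in quad[:4 - pad]:
            if ch not in DECODE_TABLE:
                raise ValueError(f"invalid character {ch!r}")
            group = (group << 6) | DECODE_TABLE[ch]   # recombine 6-bit values
        group <<= 6 * pad                              # padding positions count as zero bits
        out += group.to_bytes(3, "big")[:3 - pad]      # drop bytes produced only by padding
    return bytes(out)

print(base64_decode("TWFu"), base64_decode("TWE="))  # b'Man' b'Ma'
```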

Other Notable Formats

Uuencoding, an early binary-to-text encoding method originating on Unix systems, structures encoded files with a header line consisting of "begin", a space, the file's permission mode in octal (for example, 644), another space, and the file name. The body consists of lines that each start with a character indicating the number of input bytes encoded on that line (0 to 45), computed as chr(32 + num_bytes), so a space for 0 up to 'M' for 45. This is followed by up to 60 characters that encode groups of three input bytes as four 6-bit values offset into the ASCII range 32 (space) to 95 ('_'), so full lines are 61 characters long excluding the newline. The format achieves roughly 73% efficiency: the 3-to-4 mapping alone gives 75%, and the length character and newline on each line (45 input bytes become 62 output characters including the newline) lower it slightly. There is no built-in checksum; the trailer line "end" only marks the end of the data, so corruption is noticed only if decoding fails outright.

Quoted-Printable encoding, defined in RFC 2045 for MIME, targets text data with occasional binary octets. It allows most printable US-ASCII characters (codes 33-60 and 62-126) to pass unencoded while representing other octets, including '=' itself, as "=XX", where XX is the two-digit hexadecimal value of the octet (e.g., "=0A" for a line feed byte). Soft line breaks use a trailing "=" to indicate that the line continues without adding data, and encoded lines are limited to 76 characters; spaces and tabs at line ends must be encoded so that transports cannot strip them. For predominantly ASCII text, it maintains 95-100% efficiency by minimizing expansion and preserving readability, whereas binary-heavy data incurs much higher overhead from frequent hex escapes (up to three characters per octet).
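Python's `quopri` module implements Quoted-Printable and can illustrate the rules above. The three samples are arbitrary assumptions, covering non-ASCII octets with a literal '=', whitespace just before a line break, and a line longer than 76 characters (which receives soft line breaks).

```python
import quopri

samples = [
    "café = ok".encode("utf-8"),    # non-ASCII octets and a literal '='
    b"trailing space \nnext line",  # whitespace right before a line break
    b"x" * 100,                     # longer than the 76-character line limit
]

for raw in samples:
    encoded = quopri.encodestring(raw)
    # Decoding must restore the original octets exactly
    assert quopri.decodestring(encoded) == raw
    print(encoded)
```

The first sample encodes to b'caf=C3=A9 =3D ok': the two UTF-8 bytes and the '=' are escaped, while letters and the inner spaces pass through unchanged.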
yEnc, proposed in 2001 as an efficient alternative for binary transfer, encodes each byte by adding 42 modulo 256 and uses "=" as an escape prefix for critical characters (NUL, LF, CR, and "=" itself), adding a further 64 (modulo 256) to their values for safe transmission; a CRC-32 checksum in the trailer line allows integrity verification. This approach yields only 1-2% overhead, far lower than traditional 6-bit encodings, by relying on 8-bit clean channels while escaping only the necessary characters. Designed specifically to optimize Usenet binary postings, yEnc uses =ybegin header lines carrying the file size and name, plus =ypart lines for multi-part postings.

Among specialized formats, Base16 (hexadecimal) simply maps each 4-bit nibble to one of 16 hexadecimal digits (0-9, A-F), doubling the input size for 50% efficiency, which prioritizes simplicity and human readability over compactness. Base58, employed in Bitcoin for addresses, uses a 58-character alphabet that excludes visually ambiguous symbols (0, O, I, l) to reduce manual transcription errors, achieving about 73% efficiency, slightly less than Base64, while producing strings without punctuation that are safe in URLs. Base45, standardized in RFC 9285, converts two 8-bit bytes into three symbols from a 45-character set compatible with QR code alphanumeric mode (0-9, A-Z, space, and $%*+-./:). As plain text that is 2 bytes per 3 characters (about 67%), but QR alphanumeric mode stores three such characters in about 16.5 bits, so 16 data bits occupy about 16.5 bits of symbol capacity (roughly 97% efficiency), which makes Base45 well suited to compact QR representations of binary payloads like certificates.
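The core yEnc byte transform described at the start of this passage can be sketched as below. This is a simplified sketch: `yenc_encode` and `yenc_decode` are hypothetical names, and the =ybegin/=yend lines, line wrapping, and the specification's additional escaping rules (such as for certain characters at line boundaries) are left out. It uses `zlib.crc32` for the kind of checksum yEnc records.

```python
import zlib

CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}  # NUL, LF, CR, and '=' (the escape character)

def yenc_encode(data: bytes) -> bytes:
    """Add 42 modulo 256; escape critical results as '=' + (value + 64) % 256."""
    out = bytearray()
    for byte in data:
        value = (byte + 42) % 256
        if value in CRITICAL:
            out += bytes([0x3D, (value + 64) % 256])
        else:
            out.append(value)
    return bytes(out)

def yenc_decode(encoded: bytes) -> bytes:
    """Undo the escape (subtract 64) where present, then subtract 42 modulo 256."""
    out = bytearray()
    escaped = False
    for value in encoded:
        if escaped:
            value = (value - 64) % 256
            escaped = False
        elif value == 0x3D:
            escaped = True
            continue
        out.append((value - 42) % 256)
    return bytes(out)

payload = bytes(range(256))
encoded = yenc_encode(payload)
crc = zlib.crc32(payload)  # yEnc carries a CRC-32 of the data in its trailer line
print(len(encoded) - len(payload))  # 4: only four of the 256 byte values need escaping
```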

Applications

Communication Protocols

Binary-to-text encodings play a vital role in communication protocols by allowing binary data to be safely transmitted over text-oriented channels, preventing corruption in systems originally designed for 7-bit ASCII text. In email systems, the Multipurpose Internet Mail Extensions (MIME) standard, defined in RFC 2045, enables the inclusion of binary attachments through encodings like Base64 or Quoted-Printable, which convert non-text data into a text-safe format. These encodings are applied within a multipart/mixed structure, where the message body is divided into parts, for example one for textual content and another for the encoded binary attachment, ensuring compatibility with Simple Mail Transfer Protocol (SMTP) servers that enforce 7-bit clean channels. Base64 is preferred for arbitrary binary data because of its efficiency in representing 8-bit bytes, while Quoted-Printable is used for mostly printable content to minimize expansion.

In web protocols, Base64 is used within data URIs, specified in RFC 2397, to embed small binary resources directly in HTML or CSS and avoid additional HTTP requests; for example, an image can be referenced as src="data:image/png;base64,...". This scheme handles binary payloads inline, making it suitable for static content like icons or thumbnails in web pages. Separately, the application/x-www-form-urlencoded format for form submissions employs percent-encoding, a related binary-to-text mechanism, to escape binary or special characters, turning each such byte into '%' followed by two hexadecimal digits, following the percent-encoding defined in the URI generic syntax of RFC 3986. This keeps form data intact during transmission over HTTP.

For Usenet, the Network News Transfer Protocol (NNTP) traditionally relied on 6-bit encodings such as uuencode to post binaries to text-based newsgroups, but yEnc emerged as a more efficient alternative designed specifically for this purpose. yEnc encodes binary files with minimal overhead by applying a simple byte offset and escaping only problematic characters like NUL, CR, and LF, allowing large files to be posted across multiple articles while reducing bandwidth usage compared to earlier methods.

In modern protocols, binary-to-text encodings like Base64 facilitate integration of binary data into structured text payloads, such as JSON messages over WebSockets for real-time applications. WebSockets, per RFC 6455, support binary frames directly, but applications often embed binary data as Base64 strings within JSON for easier parsing in APIs. The same pattern appears in WebRTC signaling, where session descriptions and ICE candidates are exchanged as JSON messages over WebSockets to establish audio and video connections, and any binary fields in those messages must be text-encoded, typically with Base64.
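Embedding binary data in a JSON message, as described above, usually means storing a Base64 string in a field. The message shape (`type`, `data`) is a hypothetical example, and the eight bytes are the PNG file signature, used only as sample binary data.

```python
import base64
import json

icon = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])  # PNG signature bytes

# JSON has no byte type, so the binary field travels as a Base64 string
message = json.dumps({"type": "thumbnail", "data": base64.b64encode(icon).decode("ascii")})
print(message)  # {"type": "thumbnail", "data": "iVBORw0KGgo="}

# The receiver parses the JSON first, then decodes the Base64 field
received = json.loads(message)
restored = base64.b64decode(received["data"])
```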

Data Storage and Representation

Binary-to-text encodings play a crucial role in file formats where binary data must be stored in text-based systems, ensuring portability and data integrity across platforms. One historical example is BinHex, developed for Macintosh files, which encodes files including resource forks into a 7-bit safe ASCII format suitable for transmission and storage on non-Mac systems. This format wraps the encoded data with headers containing metadata like file creator and type, allowing tools such as StuffIt Expander to reconstruct the original Macintosh file structure.

In modern cryptographic applications, the Privacy-Enhanced Mail (PEM) format employs Base64 encoding to represent binary certificates and keys in a text-only structure. PEM files begin with headers like "-----BEGIN CERTIFICATE-----" followed by Base64-encoded DER (Distinguished Encoding Rules) data, and end with corresponding footers, facilitating storage and exchange of material used in protocols such as TLS. This approach ensures that binary cryptographic objects remain intact within ASCII-compatible environments like configuration files or email.

In cryptocurrency and decentralized storage systems, encodings prioritize human readability, error resistance, and compactness for addresses and identifiers. Base58, a variant excluding ambiguous characters like '0', 'O', 'I', and 'l', is used in legacy Bitcoin addresses via Base58Check, which appends a 4-byte checksum to detect transcription errors; the resulting 25-byte payload (version byte, 20-byte public key hash, and checksum) becomes an alphanumeric string of 26-35 characters. Similarly, Bech32 and its variant Bech32m provide improved error detection for SegWit addresses, using a 32-character lowercase alphabet to encode witness program data and reducing the risk of invalid transactions due to copying mistakes. For content addressing in the InterPlanetary File System (IPFS), version 1 Content Identifiers (CIDs) default to Base32 encoding of multihashes, giving compact, case-insensitive representations that support integrity verification across distributed networks.
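Base58 and Base58Check can be sketched with Python integers and `hashlib`. The function names are hypothetical; the sketch follows the widely used Bitcoin alphabet and checksum rule (first four bytes of a double SHA-256), but it omits decoding and validation and is not a substitute for a vetted library.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58_encode(data: bytes) -> str:
    """Big-integer radix-58 conversion; each leading zero byte becomes a leading '1'."""
    n = int.from_bytes(data, "big")
    chars = []
    while n:
        n, rem = divmod(n, 58)
        chars.append(BASE58_ALPHABET[rem])
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(chars))

def base58check_encode(payload: bytes) -> str:
    """Append the first 4 bytes of SHA-256(SHA-256(payload)), then Base58-encode."""
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    return base58_encode(payload + checksum)

# Placeholder payload: version byte 0x00 followed by an all-zero 20-byte hash
address = base58check_encode(bytes(21))
print(address, len(address))
```

Unlike the bit-chunking encoders, Base58 treats the whole input as one large number, because 58 is not a power of two; this is also why leading zero bytes need explicit handling.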
Visual and compact media formats use specialized encodings to maximize data density within constrained spaces like barcodes. Base45, designed for QR code alphanumeric mode, encodes binary payloads more compactly than Base64 in QR codes by converting each pair of bytes (a 16-bit value) into three base-45 digits, giving 2 bytes per 3 characters and supporting up to about 1,613 bytes in a version 40 QR code at error correction level Q, for applications like certificates.

Emerging applications in 2024-2025 highlight binary-to-text encodings for secure data handling in AI and configuration contexts. In large language model (LLM) prompts, Base64 encoding of untrusted inputs has been proposed as a defense against indirect prompt injection attacks: it obfuscates embedded instructions, making the model less likely to treat them as commands while the content remains recoverable by decoding, and evaluations have reported substantially lower attack success rates. Additionally, YAML and JSON configuration files commonly embed binary artifacts like images or keys as Base64 strings to maintain text-only compatibility; YAML's !!binary tag uses Base64 for opaque data, and Kubernetes Secrets store their data values Base64-encoded.
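Following the byte-pair conversion in RFC 9285, a Base45 encoder fits in a few lines. `base45_encode` is a hypothetical name; each 16-bit pair value n is written as three digits c, d, e with n = c + 45·d + 45²·e, least significant digit first, and a trailing odd byte becomes two digits.

```python
BASE45_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

def base45_encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data) - 1, 2):
        n = data[i] * 256 + data[i + 1]  # 16-bit value, 0..65535
        c, n = n % 45, n // 45
        d, e = n % 45, n // 45           # e is at most 32, so three digits always suffice
        out += [BASE45_ALPHABET[c], BASE45_ALPHABET[d], BASE45_ALPHABET[e]]
    if len(data) % 2:                    # a trailing single byte becomes two digits
        n = data[-1]
        out += [BASE45_ALPHABET[n % 45], BASE45_ALPHABET[n // 45]]
    return "".join(out)

print(base45_encode(b"AB"))       # BB8
print(base45_encode(b"Hello!!"))  # %69 VD92EX0
```

Every output character is in the QR alphanumeric set, which is what lets a QR encoder pack the result at about 5.5 bits per character.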

Comparison and Analysis

Efficiency and Performance

Binary-to-text encodings vary in efficiency, measured as the ratio of input size to output size (equivalently, the data bits carried per 8-bit output character), including overhead from padding or formatting. For Base64, three input bytes (24 bits) are mapped to four 6-bit characters, yielding an efficiency of 6/8 or 75%, with padding ('=') filling out the final group when the input length is not a multiple of three. Hexadecimal (Base16) encoding doubles the size, achieving 50% by mapping each 4-bit nibble to one character, with no padding required. yEnc achieves a near-optimal efficiency of approximately 98-99%, with only 1-2% overhead, because its byte-wise transformation requires escaping only a few byte values. The following table compares efficiencies for select encodings, showing approximate output sizes for a 1 KB (1024-byte) binary input, excluding line breaks (which add roughly another 1.3-2.6% for Base64 with the 76-character lines used in MIME, depending on whether lines end in LF or CRLF):
Encoding              Efficiency (%)   Overhead (%)   Output size for 1 KB (1024-byte) input
Hexadecimal (Base16)  50               100            2048 bytes
Base64                75               33             1368 bytes (1366 without padding)
yEnc                  ~98-99           1-2            ~1030-1045 bytes
Performance in encoding and decoding depends on algorithmic complexity and implementation. Hexadecimal is generally simpler and faster than Base64, needing only 4-bit shifts and a 16-entry lookup, and well-optimized implementations can reach several gigabytes per second on modern hardware. Base64 is faster than variants like Base85, whose base-85 arithmetic requires division rather than bit shifts, and optimized libraries (e.g., using AVX2 instructions) can push Base64 decoding throughput above 10 GB/s. yEnc prioritizes speed for large files in Usenet contexts, with SIMD-accelerated implementations reaching 5 GB/s or more. Reliability concerns include error propagation during transmission: a single substituted character in Base64 carries 6 bits that can straddle a byte boundary, so it corrupts up to two decoded bytes, while an inserted or dropped character misaligns every following group. yEnc mitigates this with a CRC-32 checksum in its trailer lines, which detects corruption of the data with high probability (random errors go undetected with a probability of about 1 in 4 billion, or $2^{-32}$).
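The error-propagation behavior can be checked directly: the sketch below substitutes one Base64 character with another valid one, counts how many decoded bytes change, and then compares CRC-32 values computed with `zlib.crc32` (the checksum family yEnc uses). The input and the corrupted position are arbitrary assumptions.

```python
import base64
import zlib

original = bytes(range(60))
encoded = bytearray(base64.b64encode(original))

# Replace one character with a different, still-valid alphabet character
position = 10
encoded[position] = ord("A") if encoded[position] != ord("A") else ord("B")
corrupted = base64.b64decode(bytes(encoded))

changed = sum(a != b for a, b in zip(original, corrupted))
print(changed)                                        # 1 here; one substitution never changes more than 2
print(zlib.crc32(original) == zlib.crc32(corrupted))  # False: the checksum exposes the damage
```

Decoding succeeds silently, which is why an independent checksum is needed to notice this kind of damage.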

Advantages, Limitations, and Alternatives

Binary-to-text encodings offer significant advantages in environments constrained to text-based transmission and processing. They enable the safe transport of arbitrary binary data over channels that only support printable ASCII characters, such as email protocols or legacy text systems, by mapping binary octets to a restricted set of characters without introducing non-printable or control characters. This universality ensures compatibility across diverse systems, preventing the corruption that would occur if raw binary were sent through text-only media. Additionally, the resulting text is easier to inspect, copy, and debug than raw binary dumps; for instance, Base64-encoded data in logs can be viewed and searched in text editors without specialized tools, and standard text processing utilities can handle encoded segments.

Despite these benefits, binary-to-text encodings have notable limitations. The primary drawback is size inflation: Base64 output is approximately 33% larger than its input (four output characters for every three input bytes) because of the reduced alphabet size and padding. This overhead makes such encodings inefficient for large-scale or high-volume data transfer, such as streaming video, where the increased bandwidth and storage demands can degrade performance. Without additional error-correcting codes (ECC), these encodings are also vulnerable to transmission errors: a flipped bit can turn a character into one outside the alphabet, causing strict decoders to reject the input, and a dropped or inserted character misaligns all following groups, corrupting the rest of the output rather than a single byte. In modern contexts, particularly with large language models (LLMs), Base64 decoding introduces security risks, as attackers can hide malicious prompts in encoded payloads to bypass filters in prompt injection attacks, evading detection much as multilingual or otherwise obfuscated inputs can.
Recent research suggests that while Base64 can serve as a defensive encoding to separate user inputs, it often degrades model performance on tasks like reasoning and incurs computational overhead, underscoring its limitations in secure AI applications.

Alternatives to binary-to-text encodings prioritize efficiency by avoiding text conversion altogether or combining it with preprocessing. Native binary protocols transmit data directly without encoding overhead, offering higher throughput for machine-to-machine communication; for example, FTP's binary (image) mode transfers files byte for byte, preserving integrity for non-text assets like images, while HTTP/2's binary framing layer enables multiplexing and reduces latency through framed streams over a single connection. Similarly, gRPC uses Protocol Buffers for compact binary serialization on top of HTTP/2, providing faster serialization, smaller payloads, and strong typing compared to text-based APIs, which makes it well suited to service-to-service and real-time data exchange. For scenarios requiring text compatibility, hybrid approaches first compress binary data with a general-purpose compression algorithm and then apply the encoding, mitigating size inflation; for compressible data this yields smaller output than encoding alone, because compression removes redundancy before the 33% expansion.
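The compress-then-encode approach can be sketched with Python's `zlib` and `base64` modules. The repetitive sample data is an assumption chosen to be highly compressible; for already-compressed or random data, compression gains little and the output can even grow slightly.

```python
import base64
import zlib

# Highly redundant sample: 200 copies of a 25-byte line (5000 bytes in total)
data = b"timestamp=0000 status=OK\n" * 200

plain = base64.b64encode(data)                                # encode only
compressed_then_encoded = base64.b64encode(zlib.compress(data, 9))

# Input size, Base64-only size (6668), compress-then-Base64 size
print(len(data), len(plain), len(compressed_then_encoded))

# Decoding reverses the steps in the opposite order
restored = zlib.decompress(base64.b64decode(compressed_then_encoded))
```

The order matters: encoded text is less compressible than the raw bytes it came from, so compression has to happen before encoding.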

References

  1. [1]
    RFC 4648 - The Base16, Base32, and Base64 Data Encodings
    This document describes the commonly used base 64, base 32, and base 16 encoding schemes. It also discusses the use of line-feeds in encoded data.
  2. [2]
    RFC 2045: Multipurpose Internet Mail Extensions (MIME) Part One
    This set of documents, collectively called the Multipurpose Internet Mail Extensions, or MIME, redefines the format of messages to allow for (1) textual ...
  3. [3]
    RFC 2045: Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies
  5. [5]
    [PDF] A Proposal of Substitute for Base85/64 – Base91
    The coding efficiency of Base64 coding is 6/8 = 75%. The data expansion rate is 8/6 = 4/3 = 133.333%. Another unambiguous and unaffected transmission by net ...
  6. [6]
    BIP 173: Base32 address format for native v0-16 witness outputs
    Mar 20, 2017 · BIP 173 proposes a checksummed base32 format called 'Bech32' for native witness outputs, replacing base58 due to its limitations.
  7. [7]
    Why is base64 needed (aka why can't I just email a binary file)?
    Mar 18, 2012 · The SMTP data is 7-bit ASCII characters. Each character is transmitted as an 8-bit byte with the high-order bit cleared to zero. This persisted ...
  8. [8]
    Email Attachments - Mary Ann Horton
    Little did I know I had just invented binary email attachments. Mary Ann with her uuencode invention. It turns out that uuencode became the standard way people ...
  9. [9]
    uuencode - The Open Group Publications Catalog
    The historical uuencode algorithm does not share this property, which is the reason that a second algorithm was added to the ISO POSIX-2 standard. The string " ...Missing: uuencoding | Show results with:uuencoding<|control11|><|separator|>
  10. [10]
    uuencode(1) — Arch manual pages
    HISTORY. The uuencode command first appeared in BSD 4.0. AUTHORS. Free Software Foundation, Inc. COPYRIGHT.
  11. [11]
    XXENCODE 1 "29 August 1990" - Math
    Aug 29, 1990 · With the -u option, xxencode uses uuencoding (and uudecode can decode the output); with the -x option, or no options at all, it uses xxencoding.Missing: history | Show results with:history
  12. [12]
  13. [13]
    Prehistory of BinHex - Tim Mann's Home Page
    The BinHex format often used for transferring Macintosh files is a distant descendant of a Basic program I wrote for the TRS-80 in 1981!Missing: history | Show results with:history
  14. [14]
    BinHex 4.0 Definition - Stairways Files
    ... BinHex 4.0 has been the standard for ASCII encoding of Macintosh files. To my knowledge, there has never been a full definition of this format. Info-Mac had ...Missing: history | Show results with:history
  15. [15]
    yEnc.org
    ### Summary of yEnc Details
  16. [16]
    RFC 9285: The Base45 Data Encoding
    This document describes the Base45 encoding scheme, which is built upon the Base64, Base32, and Base16 encoding schemes.Table of Contents · Introduction · The Base45 Encoding
  17. [17]
    RFC 9580: OpenPGP
    6.2. Forming ASCII Armor. When OpenPGP encodes data into ASCII Armor, it puts specific headers around the base64-encoded data, so OpenPGP can reconstruct the ...
  18. [18]
    PDF Reference, version 1.6 - Adobe Open Source
    1985–2004 Adobe® Systems Incorporated. All rights reserved. PDF Reference, fifth edition: Adobe Portable Document Format version 1.6.
  19. [19]
    base64 — Base16, Base32, Base64, Base85 Data Encodings ...
    The RFC 4648 encodings are suitable for encoding binary data so that it can be safely sent by email, used as parts of URLs, or included as part of an HTTP POST ...
  20. [20]
    Base64 (Java Platform SE 8 ) - Oracle Help Center
    This class implements an encoder for encoding byte data using the Base64 encoding scheme as specified in RFC 4648 and RFC 2045. Method Summary. All Methods ...Missing: algorithm | Show results with:algorithm<|control11|><|separator|>
  21. [21]
    uuencode(5) - NetBSD Manual Pages
    While uuencode(1) will not encode more than 45 bytes per line, uudecode(1) will tolerate the maximum line size. The remaining characters in the line represent ...Missing: specification | Show results with:specification
  22. [22]
    Why did base64 win against uuencode? - Hacker News
    Nov 21, 2023 · One reason that uuencode lost out to Base64 was that uuencode used spaces in its encoding. It was fairly common for Internet protocols in ...<|control11|><|separator|>
  23. [23]
  24. [24]
    yEnc - Efficient encoding for Usenet and eMail
    standard encoding for large binaries on Usenet. It also used in many binary picture groups. Meanwhile all major newsreaders have been extended to yEnc support.
  25. [25]
    eshaz/simple-yenc: Minimalist JavaScript binary string ... - GitHub
    yEnc, or yEncode, is a very space efficient method of binary encoding. yEnc's overhead is around 1–2%, compared to 33%–40% overhead for 6-bit encoding methods ...Simple-Yenc · What Is Dynencode? · Dynencode FormatMissing: specification | Show results with:specification
  26. [26]
    draft-msporny-base58-01 - IETF Datatracker
    Nov 27, 2019 · when encoding large amounts of data, it is 2% less efficient than base64. Developers might avoid Base58 if a 2% increase in efficiency over ...
  27. [27]
    RFC 9285 - The Base45 Data Encoding - IETF Datatracker
    QR codes have a limited ability to store binary data. · A 45-character subset of US-ASCII is used; the 45 characters usable in a QR code in Alphanumeric mode ( ...Table of Contents · Introduction · The Base45 Encoding
  28. [28]
    RFC 2397 - The "data" URL scheme - IETF Datatracker
    A new URL scheme, "data", is defined. It allows inclusion of small data items as "immediate" data, as if it had been included externally.
  29. [29]
    RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax
    This specification defines the generic URI syntax and a process for resolving URI references that might be in relative form, along with guidelines and security ...
  30. [30]
    RFC 1741: MIME Content Type for BinHex Encoded Files
    3. BinHex BinHex 4.0 is a propular means of encoding Macintosh files for archiving on non-Macintosh file systems and for transmission via Internet mail. ( See ...
  31. [31]
    RFC 7468 - Textual Encodings of PKIX, PKCS, and CMS Structures
    This document describes and discusses the textual encodings of the Public-Key Infrastructure X.509 (PKIX), Public-Key Cryptography Standards (PKCS), and ...
  32. [32]
    Base58Check encoding - Bitcoin Wiki
    Oct 28, 2021 · Base58Check is a modified Base 58 encoding used for Bitcoin addresses, encoding byte arrays into human-readable strings, with version, error ...
  33. [33]
    Content Identifiers (CIDs) - IPFS Docs
    Oct 30, 2025 · A content identifier, or CID, is a label used to point to material in IPFS. It doesn't indicate where the content is stored, but it forms a kind of address ...
  34. [34]
    Binary Data Language-Independent Type for YAML™ Version 1.1
    Jan 18, 2005 · Base64 is the recommended way to store opaque binary data in YAML files. Note that many forms of binary data have internal structure that may ...
  35. [35]
    What is yEnc? - Giganews
    Dec 2, 2021 · yEnc is far more efficient, with an overhead as little as 1-2%. Most news clients natively decode yEnc encoded Usenet posts.
  36. [36]
    Which is faster, hex encoding or base64 encoding? - Stack Overflow
    Nov 15, 2014 · Which would be the faster encoding method between hex encoding and base64 encoding? The data is around 40 MB or more, which is why the performance matters.Base64 vs HEX for sending binary content over the internet in XML ...Binary Data in JSON String. Something better than Base64 [closed]More results from stackoverflow.comMissing: yEnc | Show results with:yEnc
  37. [37]
    [PDF] Faster Base64 Encoding and Decoding using AVX2 Instructions
    We estimate that billions of base64 messages are decoded every day. We are motivated to improve the efficiency of base64 encoding and decoding. Compared to ...
  38. [38]
    How do I calculate Base64 conversion rate?
    Sep 14, 2020 · base64 is defined on rfc4648. As poncho mentions, it has an overhead of 33%, and thus 96 / 4 × 3 = 72 bytes needed. Share. Share a link to this ...
  39. [39]
    None
    ### Advantages of Textual Encodings Over Binary for Security Structures
  40. [40]
    [PDF] arXiv:2504.07467v1 [cs.CL] 10 Apr 2025
    Apr 10, 2025 · Recently, the Base64 defense has been recognized as one of the most effective methods for reduc- ing success rate of prompt injection attacks.
  41. [41]
    LLM01:2025 Prompt Injection - OWASP Gen AI Security Project
    An attacker uses multiple languages or encodes malicious instructions (e.g., using Base64 or emojis) to evade filters and manipulate the LLM's behavior.
  42. [42]
    FTP Binary And ASCII Transfer Types And The Case Of Corrupt Files
    Binary mode suits non-text files (e.g., images), while ASCII mode suits text files. Set default modes for efficiency and discover why some text files, like ...
  43. [43]
    HTTP/2 - High Performance Browser Networking (O'Reilly)
    The new binary framing layer in HTTP/2 resolves the head-of-line blocking problem found in HTTP/1. x and eliminates the need for multiple connections to enable ...Brief History of SPDY and... · Binary Framing Layer · One Connection Per Origin
  44. [44]
    Compare gRPC services with HTTP APIs - Microsoft Learn
    Jul 31, 2024 · gRPC messages are serialized using Protobuf, an efficient binary message format. Protobuf serializes very quickly on the server and client.
  45. [45]
    Base64 Encoding & Performance, Part 1: What's Up with Base64?
    Feb 12, 2017 · Base64 encoding restricts our ability to cache assets individually; our images and fonts are now bound to the same caching rules as our styles, ...