
Checksum

A checksum is a value computed on a block of data for the purpose of detecting errors that may have been introduced during its storage or transmission. It is typically generated by an algorithm that produces a fixed-size datum from a variable-size input, often a 16-, 32-, or 64-bit value, which is appended to or stored alongside the original data. Upon receipt or retrieval, the receiving system recomputes the checksum from the data and compares it to the provided value; a match indicates the data is likely intact, while a mismatch signals potential corruption or manipulation.

Checksums serve as a fundamental error-detection mechanism in computing, widely employed in data transmission protocols, file integrity verification, and storage systems to ensure reliability without the overhead of more robust cryptographic methods. In networking, they are integral to protocols such as the Internet Protocol (IP), the User Datagram Protocol (UDP), and the Transmission Control Protocol (TCP), where the Internet checksum—a 16-bit one's complement sum—helps identify transmission errors in packet headers and payloads. Beyond networking, checksums support digital preservation by acting as digital fingerprints to monitor file fixity, detecting even minor alterations that could compromise integrity. Their efficiency makes them suitable for real-time applications, though they offer no protection against intentional tampering by adversaries who could recompute matching values.

Various checksum algorithms exist, ranging from simple summations to more sophisticated techniques, each balancing computational cost and error-detection strength. Basic types include the longitudinal redundancy check (LRC), which XORs all bytes of the data to produce a parity-like value, and the Internet checksum, which folds 16-bit words using one's complement arithmetic to yield a 16-bit result. Advanced variants, such as the Fletcher checksum, use two running totals modulo 255 to improve burst-error detection over simple sums, while Adler-32 employs a similar modular approach and is used in libraries like zlib. Cyclic redundancy checks (CRCs), often classified alongside checksums, apply polynomial division for higher reliability in detecting multi-bit errors, and are commonly used in storage devices and Ethernet frames. Selection of a checksum type depends on the error patterns expected, with simpler methods sufficing for random single-bit errors and stronger ones needed for burst errors in noisy channels.

Fundamentals

Definition

A checksum is a fixed-size datum computed from a larger block of data, typically to detect errors that may occur during transmission or storage. This value acts as a compact representation of the original data, enabling quick verification of integrity without retransmitting or re-storing the entire block. The general computation process involves applying a checksum function to the data block, where the function processes the contents—often by aggregating or transforming the elements according to a predefined rule—to generate the fixed-size value. The resulting checksum is appended to or associated with the data, and upon receipt or retrieval, the same function is reapplied to check for discrepancies indicative of alterations. In contrast to cryptographic hash functions, which emphasize security properties like resistance to deliberately constructed collisions, checksums prioritize computational efficiency and simplicity to facilitate rapid error detection in non-adversarial contexts. They are not designed to withstand deliberate manipulation, as their algorithms are often linear or easily invertible, making them unsuitable for authentication or for protecting data against intentional attacks. A basic illustrative example of checksum computation is to sum the integer values of all bytes in the data block and then take the total modulo a fixed value, such as 256, yielding a single-byte result that summarizes the data. This approach highlights the foundational concept without requiring complex operations, though practical implementations vary in sophistication.
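A minimal Python sketch of this byte-sum idea (illustrative only, not a standardized algorithm) computes a one-byte checksum and verifies a received block:

```python
def byte_sum_checksum(data: bytes) -> int:
    """Sum all byte values and reduce modulo 256, yielding a single-byte checksum."""
    return sum(data) % 256

def verify(data: bytes, checksum: int) -> bool:
    """Recompute the checksum over the received data and compare it to the stored value."""
    return byte_sum_checksum(data) == checksum

message = b"checksum example"
c = byte_sum_checksum(message)          # appended to or stored alongside the data
print(verify(message, c))               # True: data likely intact
print(verify(b"checksun example", c))   # False: a single corrupted character is detected
```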

Purpose

Checksums serve primarily as a mechanism for detecting accidental errors in data, such as bit flips caused by electrical noise, hardware malfunctions, or media degradation during transmission and storage. These errors are typically random and unintentional, distinguishing checksums from cryptographic methods designed to counter deliberate tampering. By appending or embedding a computed value derived from the original data, checksums enable the receiver to verify integrity without assuming malicious intent. A key advantage of checksums is their low computational overhead, which allows for efficient error checking even in resource-constrained environments like embedded systems or high-speed networks. This efficiency supports quick verification processes, where only the checksum needs to be recalculated and compared, avoiding the need for full data retransmission and thereby enhancing overall system reliability. In protocols such as IP, UDP, and TCP, this enables robust data handling with minimal impact on performance. Unlike error-correcting codes, checksums are intended solely for detection, flagging discrepancies to prompt actions like retransmission but not locating or repairing the errors themselves. This contrasts with certain parity-based schemes, which can support single-error correction when combined with additional redundancy, though basic parity is limited to detection, similar to checksums. The effectiveness of a checksum against random errors is statistical, with the probability of an undetected error approximately 1 in 2^k for a k-bit checksum, assuming errors are independently distributed. This provides a high likelihood of catching typical transmission faults, making checksums a practical choice for maintaining data reliability in diverse computing applications.

History

Early Developments

The origins of checksum concepts trace back to the challenges of reliable data transmission in 19th-century telegraphy, where ensuring message accuracy over long distances was critical. The transition to digital computing in the mid-20th century introduced automated checksum techniques, particularly for magnetic storage media. In 1951, the UNIVAC I system's Uniservo tape drive marked the first commercial use of magnetic tape for data storage, incorporating eight tracks consisting of six data tracks, one parity track, and one timing track to detect single-bit errors by ensuring an even or odd number of 1s across each character. Similarly, IBM's 726 magnetic tape unit, announced in 1952 for the IBM 701 computer, employed parity checks on its seven-track format (six data tracks plus one parity track) to validate data read from or written to tape during early data-processing tasks. Punched card and paper tape systems, prevalent since the 1890s but increasingly tied to electronic computing in the 1950s, also adopted parity bits; for instance, paper tape readers in early installations added an even-parity bit in an extra track to confirm character integrity before processing.

Key innovations during this period included contributions from IBM engineer Hans Peter Luhn, whose 1953 internal memorandum outlined hashing as a method to distribute data into "buckets" for faster retrieval, anticipating checksum-like digests for detecting alterations in records stored on punched cards and tapes. This work culminated in the Luhn algorithm, for which a patent was filed in 1954, which used a modulus-10 calculation to generate check digits for validating numerical identifiers, widely applied in early data processing for error-prone input like account numbers. Although Claude Shannon's 1948 formulation of information theory provided a theoretical basis for quantifying noise and error rates in communication channels, influencing subsequent error-correcting codes, practical checks like parity had already emerged as simple, hardware-efficient solutions largely independent of this formal framework.

By the 1960s, checksums became standardized in IBM's computing ecosystems for ensuring data integrity. The System/360, launched in 1964 as the first family of compatible computers, integrated parity and summation-based checks in its I/O channels and storage devices to verify data blocks during sequential batch processing, reducing errors in high-volume applications like payroll and inventory management. These early implementations prioritized detection over correction, setting the stage for more sophisticated algorithms in networking and storage.

Evolution in Computing

In the 1970s, checksums were integrated into ARPANET protocols to detect errors in packet transmission, with a checksum appended to every packet of information for verification upon receipt. This approach evolved through early ARPA network revisions and laid the groundwork for broader network standardization, culminating in the definition of the 16-bit IPv4 header checksum in RFC 791 in 1981, which ensured header integrity during routing and fragmentation across interconnected systems including the ARPANET.

During the 1980s and 1990s, the proliferation of personal computing drove increased adoption of checksums in file systems and storage devices. For instance, the FAT file system, originating in the early 1980s for MS-DOS, incorporated checksums in its long filename extensions by the mid-1990s to verify the association between long and short file name entries, preventing mismatches when directory entries are renamed or reorganized by software unaware of long filenames. Concurrently, cyclic redundancy checks (CRCs) became standard in storage devices like hard disk drives, enabling robust error detection as capacities grew and personal computers became ubiquitous, with implementations in interfaces such as ATA and SCSI supporting reliable data retrieval.

From the 2000s onward, the demand for high-speed networks and embedded systems prompted a shift toward lightweight checksum algorithms to minimize computational overhead while maintaining detection effectiveness. Algorithms like one's complement sums and Fletcher checksums were favored for their efficiency in embedded and high-throughput environments, as demonstrated in evaluations showing their balance of speed and undetected-error rates below 10^{-5} for typical control network payloads. In cloud storage, checksums evolved to support scalable integrity verification, with providers integrating methods like CRC32C for end-to-end validation during data transfers and replication, addressing the challenges of distributed architectures that emerged with services like Amazon S3 in the mid-2000s. The specification of IPv6 in RFC 2460 (1998) marked a notable departure by omitting a header checksum—unlike IPv4—to reduce processing latency in high-speed environments, relying instead on transport-layer checksums for error detection. Subsequent updates, such as RFC 6935 (2013), refined checksum handling for tunneled packets over IPv6, allowing zero-checksum modes on low-error links to further optimize performance without compromising reliability. By the 2020s, non-cryptographic checksums continued to play a role in emerging domains, including cryptocurrency address handling, where appended checksums detect typographical errors, as in Bitcoin's Base58Check encoding, and in machine learning data pipelines to monitor dataset integrity during model training and inference, ensuring uncorrupted inputs via periodic hashing.

Algorithms

Parity Methods

Parity methods represent one of the simplest forms of checksums used for error detection, relying on the addition of a single parity bit to a unit of data to ensure a consistent count of 1s across the transmitted or stored unit. In even parity, the parity bit is chosen such that the total number of 1s in the data plus the parity bit is even; conversely, in odd parity, the total is made odd. This approach allows the receiver to verify integrity by recounting the 1s and checking against the expected parity, flagging any discrepancy as a potential error.

These methods extend beyond single bits to larger units like bytes or words by applying parity across multiple bit positions, forming structures such as the longitudinal redundancy check (LRC). In LRC, parity is computed column-wise across a block of bytes, where each parity bit represents the even or odd count for that bit position across all bytes. Similarly, in storage systems like RAID levels 3, 4, and 5, parity is applied longitudinally across disk blocks using bitwise XOR operations to enable reconstruction of lost data after a single drive failure. This extension improves detection of burst errors within a block while maintaining computational simplicity.

The parity bit is mathematically derived as the exclusive OR (XOR, denoted \oplus) of all data bits for even parity, ensuring the overall sum modulo 2 equals zero: p = \bigoplus_{i=1}^{n} d_i, where d_i are the data bits and p is the parity bit; for odd parity, the bit is inverted. This XOR operation effectively counts the number of 1s modulo 2, making it efficient to implement in hardware.

Consider an 8-bit word 10110001, which has four 1s (an even count). For even parity, the parity bit is p = 0 (the XOR of the bits yields 0), resulting in the transmitted unit 101100010. If a single bit flips during transmission to 101100000 (now three 1s plus p = 0, an odd total), the receiver's recount detects the mismatch. This example illustrates single-error detection: the method catches only an odd number of bit flips, since an even number leaves the parity unchanged.

The primary strengths of parity methods lie in their extreme simplicity and low overhead, requiring just one additional bit per unit and minimal computation via XOR, which suits resource-constrained environments. They reliably detect all single-bit errors and any odd number of bit errors, providing a foundational layer of error checking without the complexity of advanced algorithms.
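A brief Python sketch of these ideas (an illustrative implementation, not tied to any particular standard) computes a single even-parity bit and a byte-wise LRC:

```python
def even_parity_bit(word: int, width: int = 8) -> int:
    """Even-parity bit: XOR of all bits, so data plus parity has an even number of 1s."""
    parity = 0
    for i in range(width):
        parity ^= (word >> i) & 1
    return parity

def lrc(block: bytes) -> int:
    """Longitudinal redundancy check: column-wise parity, i.e. the XOR of all bytes."""
    result = 0
    for byte in block:
        result ^= byte
    return result

word = 0b10110001                   # four 1s, so the even-parity bit is 0
print(even_parity_bit(word))        # 0
corrupted = word ^ 0b00000001       # flip the lowest bit in transit
print(even_parity_bit(corrupted))   # 1 -> parity mismatch reveals the single-bit error
print(hex(lrc(b"\x12\x34\x56")))    # XOR checksum across a block of bytes
```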

Summation-Based Methods

Summation-based checksums compute an error-detecting value by arithmetically summing the data in fixed-size words, typically 16 bits, and applying operations like carry folding and one's complement to produce the final checksum. This approach weights every word equally, relying on arithmetic addition to detect multi-bit errors more effectively than simple parity bits, which only catch odd-numbered bit flips. A 16-bit sum-of-words checksum detects all single-bit errors and all error bursts up to 16 bits long, along with approximately 99.998% of longer bursts.

The Internet checksum, standardized in RFC 1071, exemplifies this method and is widely used in the IP, UDP, and TCP protocols. It processes the data as a sequence of 16-bit words, summing them using one's complement arithmetic: any carry out of the most significant bit is folded back by adding it at the least significant bit position. The checksum is then the one's complement of this sum, so that summing the entire message (including the checksum field) at the receiver yields 0xFFFF when no error is present. Formally, for data words w_1, w_2, \dots, w_n, the checksum C is given by C = \overline{\,w_1 +' w_2 +' \cdots +' w_n\,}, where +' denotes one's complement addition (addition modulo 2^{16} with any carry folded back into the least significant bit) and \overline{x} denotes the one's complement (bitwise NOT) of x.

For example, consider two 16-bit words 0x1234 and 0x5678. Their sum is 0x68AC (no carry to fold). The one's complement is 0x9753, which serves as the checksum. At verification, summing the original words with 0x9753 yields 0xFFFF, confirming integrity.

A notable variant is the Fletcher checksum, introduced to improve error-detection behavior over simple sums by using two running accumulators. Proposed by J. G. Fletcher in 1982, it processes data bytes sequentially: initialize two 8-bit accumulators A and B to zero; for each byte, update A as A = (A + byte) mod 255, then B = (B + A) mod 255. The 16-bit checksum concatenates the final A and B values (check bytes derived from A and B can also be appended so that the receiver's recomputed sums come out to zero). This dual-accumulator design makes the result sensitive to byte order as well as byte values, reducing the likelihood of undetected errors compared to single-sum methods, while remaining computationally efficient for serial transmissions.
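A minimal Python sketch of both computations (assuming big-endian 16-bit words and zero-padding for odd-length input, as in common RFC 1071 implementations) reproduces the worked example:

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071-style 16-bit one's complement checksum over big-endian 16-bit words."""
    if len(data) % 2:                                   # pad an odd-length message
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)        # end-around carry (fold carries back)
    return ~total & 0xFFFF                              # one's complement of the sum

def fletcher16(data: bytes) -> int:
    """Fletcher-16: two running sums modulo 255, returned as (B << 8) | A."""
    a = b = 0
    for byte in data:
        a = (a + byte) % 255
        b = (b + a) % 255
    return (b << 8) | a

words = bytes.fromhex("12345678")
c = internet_checksum(words)
print(hex(c))                                           # 0x9753, matching the worked example
print(hex(internet_checksum(words + c.to_bytes(2, "big"))))  # 0x0: sum incl. checksum is 0xFFFF
print(hex(fletcher16(b"abcde")))                        # dual-accumulator variant
```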

Position-Dependent Methods

Position-dependent checksum methods, also referred to as weighted checksums, enhance error detection by assigning distinct weights to data elements based on their positions before performing the summation. This weighting scheme ensures that errors involving positional changes, such as swaps or shifts, produce a distinct alteration in the checksum value compared to uniform approaches. By incorporating factors like powers of a base number, consecutive integers, or prime moduli, these methods achieve higher sensitivity to certain error patterns without significantly increasing computational cost.

The core computation follows the formula \text{Checksum} = \left( \sum_{i=1}^{n} d_i \cdot w_i \right) \mod m, where d_i represents the data element at position i, w_i is the position-dependent weight (for example, w_i = 10^{i-1} or decreasing integers like 10 to 1), and m is the modulus, typically 10 or 11. This structure allows the checksum to validate data integrity by verifying whether the computed value matches an appended check digit. Weights are chosen to maximize the discrepancy produced by common errors, such as adjacent transpositions, where swapping elements d_i and d_{i+1} results in a non-zero difference unless w_i = w_{i+1}.

A widely adopted example is the Luhn algorithm, employed for credit card and identification number validation. Developed by IBM researcher Hans Peter Luhn and patented in 1960, it processes digits from right to left, doubling every second digit (effectively weighting alternate positions by 2 while others remain 1), summing the results (splitting doubled values exceeding 9 into their constituent digits), and checking whether the total modulo 10 equals zero. This method reliably detects all single-digit errors and approximately 98% of adjacent digit transpositions, making it suitable for manual entry scenarios prone to such mistakes.

In the ISBN-10 standard, position-dependent weighting is used to compute the final check digit for book identification numbers. The first nine digits are multiplied by weights decreasing from 10 to 2, the products are summed, and the check digit is the value that makes the total divisible by 11 (using 'X' for 10). Established under ISO 2108, this approach leverages consecutive integer weights and a prime modulus to detect nearly all single errors, transpositions, and even some multiple errors, outperforming unweighted sums in bibliographic data handling.

The primary advantages of position-dependent methods lie in their improved resilience to systematic errors like transpositions or shifts, as the varying weights amplify positional discrepancies in the sum. For instance, in transposition cases, the error term (d_i - d_{i+1})(w_i - w_{i+1}) is unlikely to be zero modulo m when the weights differ, enabling detection rates far superior to non-weighted alternatives for human-entered data. These techniques balance simplicity with effectiveness, finding application in scenarios where error patterns are predictable but computation must remain lightweight.
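A compact Python sketch of both checks follows (the ISBN example uses the commonly cited number 0-306-40615-2; the test values are illustrative, not drawn from this article):

```python
def luhn_valid(number: str) -> bool:
    """Luhn (mod 10) check: double every second digit from the right, reduce double-digit
    results by 9 (equivalent to summing their digits), and require total % 10 == 0."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:              # every second digit, counting the check digit as position 0
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def isbn10_check_digit(first_nine: str) -> str:
    """ISBN-10: weight the first nine digits 10..2 and pick the digit that makes the full
    weighted sum divisible by 11 ('X' stands for 10)."""
    total = sum(int(d) * w for d, w in zip(first_nine, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)

print(luhn_valid("79927398713"))        # True: a commonly cited Luhn test number
print(isbn10_check_digit("030640615"))  # '2', completing the ISBN 0-306-40615-2
```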

Fuzzy Methods

Fuzzy methods in checksums enable the generation of similar values for inputs that are nearly identical, facilitating approximate comparisons tolerant of minor alterations such as small edits or localized corruption. Unlike exact checksums, these approaches leverage probabilistic hashing principles to produce outputs that remain comparable when inputs differ slightly, making them suitable for tasks like near-duplicate detection and content deduplication without requiring cryptographic strength. This non-cryptographic focus prioritizes efficiency and probabilistic similarity estimation over collision resistance.

A foundational technique underlying fuzzy checksums is Rabin fingerprinting, introduced by Michael O. Rabin in 1981, which models data as a polynomial over the Galois field GF(2) and computes a compact representation modulo a randomly selected irreducible polynomial. The data string M = m_0 m_1 \dots m_{n-1}, where each m_i \in \{0,1\}, is interpreted as the polynomial M(x) = m_0 + m_1 x + m_2 x^2 + \dots + m_{n-1} x^{n-1}. An irreducible polynomial P(x) of degree k is chosen at random over GF(2), and the fingerprint is the remainder of the division: f(M) = M(x) \mod P(x). This yields a polynomial of degree less than k, typically represented as a k-bit integer. The random choice of P(x) ensures a low collision probability for distinct data, and when combined with shingling it supports similarity estimation, since nearly identical inputs share most of their window fingerprints.

In practice, Rabin fingerprinting supports fuzzy matching by applying rolling hashes over overlapping windows (shingles) of the data, generating sets of fingerprints that can be compared for similarity. For instance, in duplicate file detection, systems tolerate small byte-level differences by chunking files based on rolling Rabin hashes and assessing similarity through measures such as Hamming distances between compact sketches or Jaccard indices on fingerprint sets, avoiding full recomputation for near-matches. This approach efficiently identifies modified versions of content without requiring exact equality.

Applications extend to search engines, where fingerprint-based techniques detect near-duplicate web pages by extracting fingerprints from documents and clustering those whose sketches lie within a small Hamming distance, reducing indexing overhead while preserving diverse content. For example, Google's crawling systems have used such fingerprinting variants to filter out boilerplate or slightly altered pages, improving crawl efficiency and result quality.
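The shingling-and-comparison idea can be sketched in Python with a simplified rolling hash that substitutes integer arithmetic modulo a large prime for Rabin's GF(2) polynomial arithmetic (a Rabin-Karp-style stand-in); the window size and modulus are arbitrary illustrative choices, and similarity is estimated with a Jaccard index over fingerprint sets:

```python
BASE = 256
MOD = (1 << 61) - 1                  # large prime modulus (illustrative choice)

def window_fingerprints(data: bytes, window: int = 8) -> set[int]:
    """Rolling fingerprints of every length-`window` shingle of the input."""
    if len(data) < window:
        return {int.from_bytes(data, "big") % MOD}
    fps = set()
    h = 0
    high = pow(BASE, window - 1, MOD)            # weight of the byte leaving the window
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * high) % MOD   # drop the byte that slid out
        h = (h * BASE + byte) % MOD                   # bring in the new byte
        if i >= window - 1:
            fps.add(h)
    return fps

def jaccard(a: set[int], b: set[int]) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0

doc1 = b"checksums detect accidental errors in transmitted data"
doc2 = b"checksums detect accidental errors in transmited data"   # small edit
# High for near-identical inputs, near 0 for unrelated ones:
print(round(jaccard(window_fingerprints(doc1), window_fingerprints(doc2)), 2))
```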

Polynomial-Based Methods

Polynomial-based methods employ mathematical operations over polynomials in the Galois field GF(2) to compute checksums, with the cyclic redundancy check (CRC) serving as the most prominent example owing to its robustness in detecting burst errors in data transmission and storage. In the CRC algorithm, the sender represents the data message as a polynomial M(x) of degree n-1, where n is the message length in bits. To generate the checksum, the message is augmented by appending k zero bits, equivalent to computing M(x) \cdot x^k, with k being the degree of the chosen generator polynomial G(x). This augmented polynomial is divided by G(x) using polynomial long division in GF(2), where addition and subtraction are performed via XOR operations. The remainder R(x), a polynomial of degree less than k, becomes the CRC checksum. The transmitted codeword is then M(x) \cdot x^k + R(x), which is divisible by G(x) (yielding a zero remainder upon division). At the receiver, the received codeword is divided by G(x); a non-zero remainder indicates an error. Widely adopted generator polynomials include the CRC-16 variant with hexadecimal value 0x8005 (representing x^{16} + x^{15} + x^2 + 1), commonly used in protocols such as Modbus, and CRC-32 with 0x04C11DB7 (representing x^{32} + x^{26} + x^{23} + \dots + 1), standard in Ethernet frames per IEEE 802.3 and in ZIP archives for file integrity verification. To illustrate the computation, consider an 8-bit data message 11010000 (polynomial x^7 + x^6 + x^4) and generator G(x) = x^3 + x + 1 (binary 1011). Augment the data by appending three zeros: 11010000000. Perform modulo-2 division step by step:
  • Align the divisor 1011 under the leading 1 and XOR the first four bits: 1101 \oplus 1011 = 0110, leaving 0110 0000 000.
  • The next leading 1 begins at bit position 2; XOR: 1100 \oplus 1011 = 0111, leaving 0011 1000 000.
  • Next leading 1 at bit position 3; XOR: 1110 \oplus 1011 = 0101, leaving 0001 0100 000.
  • Next leading 1 at bit position 4; XOR: 1010 \oplus 1011 = 0001, leaving 0000 0010 000.
  • Next leading 1 at bit position 7; XOR: 1000 \oplus 1011 = 0011, leaving 0000 0000 110.
  • Only the final three bits remain, fewer than the divisor's span, so the remainder after all 11 bits are processed is 110.
The checksum is therefore 110, and the transmitted codeword is 11010000110 (the original data followed by the CRC). Dividing this codeword by 1011 yields a zero remainder, confirming integrity. A key strength of CRCs is their guaranteed detection of all burst errors—runs of consecutive corrupted bits—of length up to the degree k of the generator polynomial, making them superior for channels prone to such errors compared to simpler linear methods.
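The same modulo-2 long division can be sketched in Python for arbitrary bit strings (a straightforward bit-by-bit illustration, not the table-driven form used in practice); it reproduces the worked example above:

```python
def crc_remainder(data_bits: str, generator_bits: str) -> str:
    """GF(2) long division: append deg(G) zero bits, XOR the generator under each
    leading 1, and return the final remainder as the CRC."""
    k = len(generator_bits) - 1
    bits = [int(b) for b in data_bits] + [0] * k        # M(x) * x^k
    gen = [int(b) for b in generator_bits]
    for i in range(len(data_bits)):
        if bits[i]:
            for j, g in enumerate(gen):
                bits[i + j] ^= g
    return "".join(map(str, bits[-k:]))

def crc_verify(codeword_bits: str, generator_bits: str) -> bool:
    """A received codeword passes the check iff G(x) divides it (zero remainder)."""
    k = len(generator_bits) - 1
    bits = [int(b) for b in codeword_bits]
    gen = [int(b) for b in generator_bits]
    for i in range(len(bits) - k):
        if bits[i]:
            for j, g in enumerate(gen):
                bits[i + j] ^= g
    return not any(bits)

print(crc_remainder("11010000", "1011"))   # '110', matching the worked example
print(crc_verify("11010000110", "1011"))   # True: zero remainder
print(crc_verify("11010100110", "1011"))   # False: a corrupted bit leaves a nonzero remainder
```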

Applications

Data Transmission

In data transmission, checksums play a critical role in communication protocols to detect errors introduced during packet transit, ensuring reliable delivery over potentially noisy channels. In the TCP/IP suite, the IPv4 header includes a mandatory 16-bit checksum computed as the one's complement of the one's complement sum of all 16-bit words in the header, which verifies the integrity of the header fields against transmission errors. For the User Datagram Protocol (UDP), the 16-bit checksum is optional in IPv4 but recommended to protect the header and data payload, covering a pseudo-header derived from the IP header along with the UDP header and data. At the data link layer, Ethernet frames employ a 32-bit cyclic redundancy check (CRC-32) as the frame check sequence (FCS), detecting bit errors in the frame headers and payload as defined in the IEEE 802.3 standard. In wireless networking, the IEEE 802.11 (Wi-Fi) standard similarly uses a 32-bit CRC for its frame check sequence, enabling detection of errors caused by interference, fading, or collisions in the radio environment. Upon receipt, the receiver recomputes the checksum over the relevant packet or frame fields; a mismatch indicates corruption, prompting the protocol to trigger retransmission through mechanisms like negative acknowledgments (NAK) in automatic repeat request (ARQ) schemes or the absence of positive acknowledgments (ACKs) in protocols such as TCP. This process ensures erroneous packets are discarded and resent, maintaining end-to-end reliability without assuming error-free channels. The computational and bandwidth overhead of checksums remains minimal, with the CRC-32 adding only 4 bytes to Ethernet and Wi-Fi frames, making them essential for high-latency or bandwidth-constrained networks where undetected errors could propagate significantly.
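The frame-level check can be illustrated with Python's zlib.crc32, which implements the same CRC-32 polynomial (0x04C11DB7, in reflected form) used for the Ethernet FCS; real hardware framing and bit ordering are omitted, and the frame layout here is purely illustrative:

```python
import zlib

payload = b"example frame payload"
fcs = zlib.crc32(payload)                       # 32-bit CRC appended as the frame check sequence
frame = payload + fcs.to_bytes(4, "big")

# Receiver side: recompute the CRC over the payload and compare with the trailing FCS.
rx_payload, rx_fcs = frame[:-4], int.from_bytes(frame[-4:], "big")
print(zlib.crc32(rx_payload) == rx_fcs)         # True: frame accepted

corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]      # one bit flipped in transit
rx_payload, rx_fcs = corrupted[:-4], int.from_bytes(corrupted[-4:], "big")
print(zlib.crc32(rx_payload) == rx_fcs)         # False: discard and await retransmission
```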

Data Storage and Integrity

In data storage systems, checksums play a crucial role in maintaining integrity by verifying that stored data remains uncorrupted over time. File systems such as ZFS employ checksums at the block level to detect and mitigate errors. Specifically, ZFS uses the fletcher4 algorithm by default to compute 256-bit checksums for all data and metadata blocks, storing them alongside the data in the storage pool. This enables self-healing in redundant configurations like mirrors or RAID-Z, where if a read operation detects a checksum mismatch indicating corruption, ZFS automatically retrieves and repairs the affected block from a healthy copy.

Backup tools leverage checksums to ensure accurate incremental transfers and verify data fidelity during synchronization. For instance, rsync utilizes an Adler-32-style rolling checksum as a weak check in its delta-transfer algorithm, dividing files into blocks and computing checksums to identify unchanged portions, thereby minimizing data movement while confirming integrity before applying updates. This approach allows rsync to efficiently handle large backups by transferring only differing blocks, with a subsequent strong checksum (such as MD4 or MD5) validating the reconstructed file.

In array-based storage like RAID 5, parity provides protection against disk failures. RAID 5 distributes parity information across all drives in a striped array, using exclusive-OR (XOR) operations to compute parity blocks that enable reconstruction of data from a single failed drive. When reading data, the system can verify parity to detect inconsistencies, such as bit errors, and correct them by recalculating from the remaining drives and parity. This method enhances reliability in multi-disk setups without dedicating an entire drive to parity.

Modern cloud storage services integrate checksums for upload validation to prevent corruption during ingestion. Amazon S3, for example, supports multiple checksum algorithms (e.g., CRC32, SHA-256) that users can specify when uploading objects; S3 computes and stores the provided checksum in object metadata, then verifies it upon receipt to ensure the data matches exactly. This process catches transmission or upload errors immediately, with S3 rejecting invalid uploads and allowing retries.

To proactively detect silent data corruption—errors that go unnoticed until data access—storage systems perform periodic scrubbing. During idle periods, these systems systematically read all blocks, recompute checksums, and compare them against stored values; mismatches trigger repairs using redundancy if available. In ZFS, the zpool scrub command automates this traversal, scanning terabytes of data to identify and heal bit rot or media faults without user intervention. Similar mechanisms in other systems ensure long-term data durability by addressing degradation that might otherwise propagate undetected.
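A simplified fixity-checking sketch in Python conveys the scrubbing idea (illustrative only: real systems such as ZFS verify checksums per block inside the file system, not per file, and the manifest path and function names here are arbitrary):

```python
import json
import zlib
from pathlib import Path

def file_checksum(path: Path) -> int:
    """Stream the file through CRC-32 so large files need not fit in memory."""
    crc = 0
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            crc = zlib.crc32(chunk, crc)
    return crc

def build_manifest(root: Path, manifest: Path) -> None:
    """Record a checksum for every file under root (the 'stored values' to scrub against)."""
    entries = {str(p): file_checksum(p) for p in root.rglob("*") if p.is_file()}
    manifest.write_text(json.dumps(entries, indent=2))

def scrub(manifest: Path) -> list[str]:
    """Recompute each checksum and return the paths whose contents no longer match."""
    entries = json.loads(manifest.read_text())
    return [p for p, stored in entries.items()
            if not Path(p).is_file() or file_checksum(Path(p)) != stored]

# Usage: build_manifest(Path("archive"), Path("manifest.json")), then periodically run
# scrub(Path("manifest.json")) and repair any flagged files from a redundant copy.
```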

Limitations

Detection Capabilities

Checksums are designed to detect certain types of errors based on their mathematical structure, though they cannot guarantee detection of all possible error patterns. All non-trivial checksums reliably detect single-bit errors, as a single bit flip alters the computed value, resulting in a checksum mismatch. For burst errors, detection capabilities vary by method. Simple parity checks detect only bursts that flip an odd number of bits, since an even number of bit flips preserves parity and goes undetected. Polynomial-based checksums like CRCs detect all burst errors up to a length equal to the degree of the generator polynomial; for example, a CRC with a degree-16 generator detects all bursts of 16 bits or fewer.

The probability of an undetected error depends on the error model and checksum strength. For random bit errors over uniformly distributed data, the undetected-error probability is approximately 1/2^k for a k-bit checksum, meaning a 16-bit checksum fails to detect roughly 1 in 65,536 random errors. The Internet checksum, using one's complement summation, exhibits weaknesses against specific patterns, such as combinations of errors whose net effect on the one's complement sum is zero, increasing the likelihood of undetected errors beyond the random case for structured or adversarial inputs.

Certain error patterns evade detection in specific checksum types. For instance, in summation-based checksums, two bit errors that flip the same bit position in different words—one from 0 to 1 and one from 1 to 0—cancel each other's effect on the sum and go undetected, though such patterns are comparatively rare under random error models. From a coding-theory perspective, checksums typically achieve a minimum Hamming distance of 2, enabling reliable detection of all single-bit errors but offering no correction capability. Achieving single-error correction requires a minimum distance of at least 3, which exceeds the design goal of standard checksums, focused solely on detection.
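The cancellation case can be demonstrated directly in Python: in the sketch below (reusing an RFC 1071-style one's complement sum over a hypothetical list of words), flipping the same bit position from 0 to 1 in one word and from 1 to 0 in another leaves the checksum unchanged, so the two-bit error goes undetected, whereas any single flip is caught.

```python
def ones_complement_checksum(words: list[int]) -> int:
    """16-bit one's complement sum with end-around carry, then bitwise complement."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

original = [0x1234, 0x5678, 0x9ABC]
checksum = ones_complement_checksum(original)

corrupted = [0x1234 ^ 0x0004,    # bit 2 flips 1 -> 0 here...
             0x5678 ^ 0x0004,    # ...and 0 -> 1 here, so the arithmetic sum is unchanged
             0x9ABC]
print(ones_complement_checksum(corrupted) == checksum)   # True: two-bit error is missed

# A single-bit flip always changes the sum and is detected:
print(ones_complement_checksum([0x1234 ^ 0x0004, 0x5678, 0x9ABC]) == checksum)  # False
```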

Comparisons to Other Techniques

Checksums differ fundamentally from error-correcting codes (ECC) in their capabilities and design. While checksums, such as the Internet checksum or CRCs, are primarily used for error detection by identifying inconsistencies in data without providing mechanisms for repair, ECC methods like Reed-Solomon codes incorporate redundancy to both detect and correct errors automatically. For instance, Reed-Solomon codes add parity symbols that enable the recovery of up to t erroneous symbols in a block of n symbols, making them suitable for applications like data storage in RAID-like systems where correction is essential without retransmission. In contrast, checksums require additional protocols, such as automatic repeat request (ARQ), to handle detected errors through feedback and retransmission, as they lack the inherent correction capability of ECC.

Compared to cryptographic hash functions like SHA-256, checksums prioritize computational efficiency over collision resistance and security against intentional tampering. Checksums are lightweight algorithms designed to catch accidental errors, such as those from transmission noise, but they are vulnerable to deliberate modifications because matching values can be constructed with little effort. Cryptographic hashes, however, produce fixed-size digests that are computationally infeasible to reverse or collide under adversarial conditions, making them ideal for verifying data authenticity and integrity in untrusted scenarios, though at the cost of higher processing overhead.

Checksums also contrast with forward error correction (FEC) techniques, which embed corrective information directly into the transmitted data to enable error repair at the receiver without feedback or retransmission. In FEC, codes like Hamming or BCH add systematic redundancy upfront, allowing correction of errors in real time, which is advantageous for latency-sensitive or high-error channels. Checksums, by relying on detection followed by ARQ, introduce latency due to the feedback loop but consume less bandwidth per transmission, since no extra redundancy is added for correction.

Checksums are preferred in trusted environments where efficiency is key and threats are limited to random errors, such as internal network transmissions or storage verification without adversarial risks. In contrast, cryptographic hashes should be used for security-critical integrity checks, like software distribution or digital signatures, to guard against malicious alterations. Hybrid approaches often combine checksums and cryptographic hashes to balance performance and security; for example, in file signing workflows such as OpenPGP, a fast checksum may perform initial validation, while a secure hash of the file is signed with a private key to ensure authenticity and integrity. This layering allows quick error detection in routine operations while providing robust protection against tampering in distributed systems.
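The practical trade-off can be seen side by side in Python's standard library (a simple contrast of output size and intent, not a benchmark): zlib.crc32 yields a 32-bit value suited to catching accidental corruption, while hashlib.sha256 yields a 256-bit digest suited to tamper-evident verification.

```python
import hashlib
import zlib

data = b"release-artifact-contents"          # hypothetical file contents

crc = zlib.crc32(data)                       # 32-bit non-cryptographic checksum
digest = hashlib.sha256(data).hexdigest()    # 256-bit cryptographic digest

print(f"CRC-32:  {crc:#010x}")               # quick check against accidental errors
print(f"SHA-256: {digest}")                  # suitable for signing / authenticity checks

# An accidental single-byte corruption changes both values...
corrupted = b"release-artifact-contentz"
print(zlib.crc32(corrupted) != crc)                      # True
print(hashlib.sha256(corrupted).hexdigest() != digest)   # True
# ...but only the cryptographic digest makes it computationally infeasible for an
# adversary to craft a different input that still matches the stored value.
```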
