Hamming distance
The Hamming distance between two strings of equal length is defined as the number of positions at which the corresponding symbols differ.[1] This metric, originally developed for binary sequences but generalizable to any finite alphabet, quantifies the minimum number of substitutions required to transform one string into another.[2]
Introduced by Richard W. Hamming in his seminal 1950 paper on error-detecting and error-correcting codes, the concept arose from the need to reliably transmit data over noisy channels in early computing systems at Bell Laboratories.[3] Hamming's work laid the foundation for modern coding theory, where the Hamming distance serves as a key parameter in assessing the error-correcting capability of a code: a code's minimum distance determines how many errors it can correct or detect. For instance, a code with minimum Hamming distance d can correct up to \lfloor (d-1)/2 \rfloor errors and detect up to d-1 errors.[4]
Beyond coding theory, the Hamming distance finds broad applications in computer science and related fields. In machine learning and information retrieval, it measures similarity between binary feature vectors or hash codes, enabling efficient nearest-neighbor searches in large-scale datasets.[5] For example, locality-sensitive hashing techniques use Hamming distance to approximate distances in high-dimensional spaces for tasks like image retrieval and recommendation systems.[6] In bioinformatics, it evaluates differences between DNA sequences, aiding in sequence alignment and mutation analysis by counting mismatched nucleotides.[7] Additionally, it appears in cryptography for secure multi-party computation protocols that preserve privacy while computing distances,[8] and in network security for aggregating alerts based on similarity thresholds.[9] These uses highlight its versatility as a simple yet powerful tool for comparing discrete structures across disciplines.
Core Concepts
Definition
The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols differ.[10]
Formally, for two strings x = x_1 x_2 \dots x_n and y = y_1 y_2 \dots y_n over a finite alphabet, the Hamming distance is given by
d_H(x, y) = \sum_{i=1}^n \delta(x_i, y_i),
where \delta(a, b) = 1 if a \neq b and \delta(a, b) = 0 otherwise. This measure quantifies the minimum number of symbol substitutions required to transform one string into the other.[4]
The definition applies strictly to strings of equal length n; comparing strings of unequal length requires extensions of the basic measure.[11]
The Hamming distance represents a special case of the Levenshtein distance, in which only substitutions are permitted as edit operations, excluding insertions and deletions.[11]
This distance metric is foundational in the design of error-correcting codes, where it determines the code's ability to detect and correct transmission errors.[12]
Generalizations
The weighted Hamming distance extends the standard Hamming distance by incorporating position-specific weights to emphasize differences in certain coordinates. It is defined as d_H(x, y) = \sum_{i=1}^n w_i \delta(x_i, y_i), where w_i > 0 are weights assigned to each position i, and \delta(x_i, y_i) = 1 if x_i \neq y_i and 0 otherwise. This generalization is particularly useful in applications where some features or symbols carry more importance than others, such as in pattern recognition or asymmetric error models.
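For illustration, a minimal Python sketch of the weighted variant follows; the function name, example strings, and weights are illustrative choices rather than a standard implementation.

```python
def weighted_hamming(x, y, weights):
    """Weighted Hamming distance: sum of the weights at mismatching positions."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y, and weights must have equal length")
    return sum(w for a, b, w in zip(x, y, weights) if a != b)

# A mismatch in position 1 (weight 2.0) counts twice as much as one in position 4 (weight 1.0).
print(weighted_hamming("abcd", "xbcy", [2.0, 1.0, 1.0, 1.0]))  # 3.0
```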
The q-ary Hamming distance generalizes the metric to alphabets of size q greater than 2, measuring the number of positions at which two strings over a finite alphabet differ. In coding theory, this distance is defined for vectors in \mathbb{F}_q^n, where \mathbb{F}_q is the finite field with q elements, and the distance between codewords is the count of differing coordinates. q-ary codes, such as q-ary Hamming codes denoted Ham(r,q), leverage this distance to achieve error correction over non-binary channels, with the minimum distance determining the code's error-detecting and -correcting capabilities.[13]
In the context of vector spaces, the Hamming distance applies naturally to binary vectors in the space \mathbb{F}_2^n, where it equals the weight of the difference vector x + y (since addition is modulo 2). This structure forms the Hamming space, a metric space on (\mathbb{F}_2)^n where the distance induces a topology useful for linear algebra over finite fields.[14] The minimum distance in linear codes over \mathbb{F}_2^n corresponds to the minimum weight of nonzero codewords, facilitating the design of error-correcting codes.[12]
The Hamming distance relates to the Jaccard distance when binary strings are interpreted as set representations. For binary vectors x and y of length n indicating sets A and B, the (unnormalized) Hamming distance is d_H(x, y) = |A \Delta B|, the size of the symmetric difference. The normalized Hamming distance is then d_H(x, y)/n, while the Jaccard distance is d_J(x, y) = \frac{|A \Delta B|}{|A \cup B|}. These metrics coincide when |A \cup B| = n but differ otherwise. This connection arises in set similarity tasks, where the Hamming distance provides a metric assuming fixed-length representations and the Jaccard distance accommodates varying support sizes.[15]
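The relationship can be illustrated with a short Python sketch on binary indicator vectors; the helper names and example data are illustrative.

```python
def hamming(x, y):
    """Number of positions where the indicator vectors differ, i.e. |A symmetric-difference B|."""
    return sum(a != b for a, b in zip(x, y))

def jaccard(x, y):
    """Jaccard distance for binary indicator vectors: |A symmetric-difference B| / |A union B|."""
    sym_diff = sum(a != b for a, b in zip(x, y))
    union = sum(a or b for a, b in zip(x, y))
    return sym_diff / union if union else 0.0

x = [1, 1, 0, 0, 1]   # A = {0, 1, 4}
y = [1, 0, 1, 0, 1]   # B = {0, 2, 4}
print(hamming(x, y) / len(x))  # normalized Hamming: 2/5 = 0.4
print(jaccard(x, y))           # Jaccard distance:   2/4 = 0.5
```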
Examples and Illustrations
Binary Examples
Consider two 7-bit binary strings: 1011101 and 1001001. The Hamming distance between them is 2, as they differ in the third and fifth positions.[16]
To visualize the differences, align the strings by position:
Position: 1 2 3 4 5 6 7
String A: 1 0 1 1 1 0 1
String B: 1 0 0 1 0 0 1
The bits differ only at positions 3 (1 vs. 0) and 5 (1 vs. 0), while all other positions match.[16]
A step-by-step calculation proceeds as follows for these 7-bit strings, which match the length used in the classic (7,4) Hamming code:
- Position 1: 1 == 1 (match, count remains 0)
- Position 2: 0 == 0 (match, count remains 0)
- Position 3: 1 ≠ 0 (mismatch, count = 1)
- Position 4: 1 == 1 (match, count remains 1)
- Position 5: 1 ≠ 0 (mismatch, count = 2)
- Position 6: 0 == 0 (match, count remains 2)
- Position 7: 1 == 1 (match, count remains 2)
The final count of 2 is the Hamming distance.[2][17]
This distance aligns with the formal definition of Hamming distance as the number of positions where the strings differ, equivalent to the summation over all positions of an indicator for inequality between bits.[3]
In data transmission scenarios, each differing bit represents a potential single-bit error introduced during channel noise.[3]
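The worked example above can be checked directly in Python, either by pairwise comparison or by the equivalent view of XORing the strings as integers and counting the set bits; this is a small verification sketch, not a library routine.

```python
a, b = "1011101", "1001001"

# Position-by-position count, as in the walkthrough above.
assert sum(x != y for x, y in zip(a, b)) == 2

# Equivalent bit-level view: XOR flags the differing bits, then count the 1s.
print(bin(int(a, 2) ^ int(b, 2)).count("1"))  # 2
```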
Multidimensional Examples
The Hamming distance extends naturally to multidimensional contexts beyond binary strings, such as sequences over finite alphabets of size q > 2 or vectors in higher-dimensional spaces, where it counts the number of positions at which corresponding components differ. This generalization, known as the q-ary Hamming distance, applies to symbols from an alphabet \Sigma with |\Sigma| = q, measuring dissimilarity in fields like bioinformatics and signal processing.[18]
In bioinformatics, the Hamming distance quantifies mutations or substitutions between DNA sequences, treating nucleotides (A, C, G, T) as symbols from a q-ary alphabet with q=4. For example, the sequences AGCT and AACT differ only at the second position (G versus A), yielding a Hamming distance of 1. This metric helps assess genetic similarity without considering insertions or deletions, focusing on point-wise mismatches.[19]
A similar application appears in q-ary coding theory, where vectors over alphabets larger than binary are compared. Consider the ternary vectors (1, 0, 1) and (1, 1, 0) over \{0,1,2\} (q=3); they differ in the second and third components, resulting in a Hamming distance of 2. Such examples illustrate error detection in non-binary communication channels.[18]
For binary images, the Hamming distance can measure pixel-wise differences by treating the image as a flattened vector of 0s and 1s, where each entry represents a pixel's intensity (black or white). For instance, two 2x2 binary images—one with pixels [[0,1],[0,0]] and the other [[0,1],[1,0]]—differ in only the bottom-left pixel when vectorized as (0,1,0,0) and (0,1,1,0), giving a distance of 1. This approach is useful for simple image comparison tasks, such as detecting changes between binarized (black-and-white) images.[20]
In color image analysis, Hamming distance can compare discretized RGB values by viewing each pixel as a 3-tuple over a large alphabet (e.g., q=256 for 8-bit channels), counting differing components across red, green, and blue. For example, the colors (255,0,0) (pure red) and (255,128,0) (orange) differ in the green channel, yielding a distance of 1, which highlights channel-specific variations in quantized color spaces. This method, though less common than Euclidean metrics, aids in palette matching or error analysis for digital colors.[18][21]
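A single generic routine reproduces the examples from this subsection; the following Python sketch (function name illustrative) works for any equal-length sequences over a finite alphabet.

```python
def hamming(seq_a, seq_b):
    """Count the positions at which two equal-length sequences differ (any alphabet)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))

print(hamming("AGCT", "AACT"))              # DNA (q = 4): 1
print(hamming((1, 0, 1), (1, 1, 0)))        # ternary vectors (q = 3): 2
print(hamming((0, 1, 0, 0), (0, 1, 1, 0)))  # flattened 2x2 binary images: 1
print(hamming((255, 0, 0), (255, 128, 0)))  # quantized RGB triples: 1
```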
Mathematical Properties
Metric Space Properties
The Hamming distance d_H on the space \mathcal{A}^n, where \mathcal{A} is a finite alphabet and n is the length of the sequences, satisfies the non-negativity axiom: d_H(x, y) \geq 0 for all x, y \in \mathcal{A}^n, with equality if and only if x = y.[2] This follows directly from the definition, as the distance counts the number of differing positions, which is zero precisely when the sequences are identical and positive otherwise.[22]
Symmetry holds as well: d_H(x, y) = d_H(y, x) for all x, y \in \mathcal{A}^n.[2] The equality arises because the positions where x and y differ are the same as those where y and x differ, making the count invariant under reversal.[23]
The triangle inequality is also satisfied: d_H(x, z) \leq d_H(x, y) + d_H(y, z) for all x, y, z \in \mathcal{A}^n.[2] To see this, consider the set of positions A(x, z) = \{i : x_i \neq z_i\}. For each such position i, either x_i \neq y_i or y_i \neq z_i (or both), so A(x, z) \subseteq A(x, y) \cup A(y, z). Thus, |A(x, z)| \leq |A(x, y)| + |A(y, z)|, which is the desired subadditivity.[22] This property holds per coordinate and extends to the total count.[24]
These axioms establish (\mathcal{A}^n, d_H) as a metric space, providing a geometric structure for analyzing sequences via distances that respect the standard metric requirements.[25] In this space, sequences are points, and the Hamming distance measures their separation based on symbol mismatches.[2]
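As a sanity check, the three axioms can be verified exhaustively on a small Hamming space such as \{0,1\}^3; the following Python sketch (helper names are illustrative) does so by brute force.

```python
from itertools import product

def d_h(x, y):
    return sum(a != b for a, b in zip(x, y))

space = list(product([0, 1], repeat=3))  # all 8 points of {0,1}^3

for x, y in product(space, repeat=2):
    assert d_h(x, y) >= 0 and (d_h(x, y) == 0) == (x == y)  # identity of indiscernibles
    assert d_h(x, y) == d_h(y, x)                           # symmetry
for x, y, z in product(space, repeat=3):
    assert d_h(x, z) <= d_h(x, y) + d_h(y, z)               # triangle inequality
print("all metric axioms hold on {0,1}^3")
```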
Algebraic Properties
The Hamming distance in the vector space \mathbb{F}_q^n over a finite field \mathbb{F}_q satisfies the translation-invariance property d_H(x + z, y + z) = d_H(x, y) for all x, y, z \in \mathbb{F}_q^n, since the distance equals the Hamming weight of the difference x - y, which remains unchanged under translation by z.[26] This invariance under addition makes the Hamming distance particularly suitable for analyzing linear codes, where codewords form a subspace closed under addition.[26]
The Hamming weight of a vector x \in \mathbb{F}_q^n, denoted wt_H(x), is defined as d_H(x, 0), the number of nonzero coordinates in x.[26] In a linear code C \subseteq \mathbb{F}_q^n, the minimum Hamming distance d equals the minimum weight of any nonzero codeword, providing an algebraic measure of the code's error-correcting capability.[26]
In binary codes over \mathbb{F}_2, even-weight codes form a subspace where all codewords have even Hamming weight, often constructed via parity-check matrices that enforce the sum of coordinates to be zero modulo 2.[27] Such parity checks detect single errors by ensuring minimum distance at least 2, as any odd-weight error alters the parity.[27]
The Singleton bound relates the minimum Hamming distance d to code parameters: for a q-ary linear code of length n and dimension k, d \leq n - k + 1, or equivalently, k \leq n - d + 1. This bound, achieved by maximum distance separable codes like Reed-Solomon codes, limits the trade-off between rate and distance in algebraic code constructions.
Combinatorially, the Hamming bound (or sphere-packing bound) limits code size via disjoint spheres of radius t = \lfloor (d-1)/2 \rfloor around codewords: |C| \leq q^n / \sum_{i=0}^t \binom{n}{i} (q-1)^i, ensuring no overlap in the space \mathbb{F}_q^n.[28] Codes attaining this bound, such as Hamming codes, are perfect and saturate the packing.[28]
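For illustration, the sphere-packing bound can be evaluated numerically; the following Python sketch (function name illustrative) shows that the binary (7,4) Hamming code, with d = 3 and 2^4 = 16 codewords, meets the bound with equality.

```python
from math import comb

def hamming_bound(n, d, q=2):
    """Sphere-packing bound: at most q^n / sum_{i<=t} C(n,i)(q-1)^i codewords, t = (d-1)//2."""
    t = (d - 1) // 2
    volume = sum(comb(n, i) * (q - 1) ** i for i in range(t + 1))
    return q ** n // volume

print(hamming_bound(7, 3))  # 16 -> attained by the (7,4) Hamming code, so the code is perfect
```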
Applications in Coding Theory
Error Detection
In coding theory, the Hamming distance is central to error detection, as the minimum distance d of a code determines its ability to identify corrupted codewords. Specifically, a code with minimum Hamming distance d can detect up to d-1 errors, since any alteration of fewer than d bits in a valid codeword cannot produce another valid codeword, ensuring the received word falls outside the code.[4][29] In linear codes, this minimum distance corresponds to the minimum Hamming weight of any nonzero codeword.[30]
A classic illustration is the even parity code, which appends a single parity bit to data bits to ensure an even number of 1s overall, yielding a minimum distance of 2. A single bit error flips the parity to odd, producing an invalid codeword at Hamming distance 1 from the original, which is detectable upon parity check; however, two errors may preserve even parity, evading detection.[31]
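A short Python sketch of this parity scheme (helper names are illustrative) shows a single flip being caught and a double flip slipping through.

```python
def add_even_parity(bits):
    """Append one bit so the total number of 1s is even (minimum distance 2)."""
    return bits + [sum(bits) % 2]

def parity_ok(codeword):
    return sum(codeword) % 2 == 0

word = add_even_parity([1, 0, 1, 1])                           # -> [1, 0, 1, 1, 1]
one_error = word[:]; one_error[2] ^= 1                         # single flip: parity becomes odd
two_errors = word[:]; two_errors[0] ^= 1; two_errors[2] ^= 1   # two flips: parity stays even

print(parity_ok(word))        # True  (valid codeword)
print(parity_ok(one_error))   # False (single error detected)
print(parity_ok(two_errors))  # True  (double error goes undetected)
```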
The Hamming bound, or sphere-packing bound, establishes a fundamental limit on the size of codes with specified minimum distance, thereby constraining achievable error detection capabilities for given redundancy. For a binary code of length n and minimum distance d, letting t = \lfloor (d-1)/2 \rfloor, the maximum number of codewords M satisfies
M \sum_{k=0}^{t} \binom{n}{k} \leq 2^n.
This inequality underscores the trade-off between code rate and detection strength, as larger d (enhancing detection of up to d-1 errors) reduces M.[32]
Cyclic redundancy check (CRC) codes apply Hamming distance principles in real-world systems for efficient error detection, especially against burst errors common in communication channels. A CRC with generator polynomial of degree r guarantees detection of all burst errors up to length r, while its minimum Hamming distance d ensures detection of up to d-1 random bit errors; for example, CRC-32 achieves d \geq 4 for short messages, detecting at least 3 random errors reliably.[33]
A key limitation of Hamming distance-based detection is its inability to locate or repair errors, distinguishing it from correction schemes; moreover, if d or more errors occur, the received word may coincide with a valid codeword, resulting in undetected transmission failures.[29]
Error Correction
In coding theory, the Hamming distance plays a central role in error correction by determining the error-correcting capability of a code. For a code with minimum Hamming distance d, it is possible to correct up to t = \lfloor (d-1)/2 \rfloor errors in a received word, as spheres of radius t around codewords are disjoint, ensuring unique decoding to the nearest codeword.[34] This bound arises from the geometry of the Hamming space, where the separation imposed by d \geq 2t + 1 prevents overlap in error patterns of weight at most t.
A canonical example is the binary Hamming code of length 7 and dimension 4, denoted as the (7,4) code, which has minimum distance d=3 and thus corrects single errors (t=1).[35] In this linear code, the 4 information bits are encoded with 3 parity bits to form 7-bit codewords, ensuring that any single-bit flip can be identified and reversed without ambiguity.[17]
Syndrome decoding exploits the parity-check matrix H of the code to locate and correct errors efficiently. For a received vector r = c + e, where c is the transmitted codeword and e is the error vector, the syndrome s = H r = H e (modulo 2) is computed; it depends only on the error pattern because H c = 0 for every codeword c.[36] In the Hamming code, the columns of H are all distinct nonzero binary vectors of length 3, so s matches the column corresponding to the error position, allowing direct correction by flipping that bit.[37]
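The following Python sketch illustrates syndrome decoding for the (7,4) code using one common choice of parity-check matrix, in which column i is the binary expansion of i; the matrix, bit ordering, and example codeword are illustrative assumptions rather than a canonical implementation.

```python
# Parity-check matrix H: column i (1-indexed) is the binary expansion of i,
# so the syndrome, read as a binary number, names the position of a single-bit error.
H = [[0, 0, 0, 1, 1, 1, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [1, 0, 1, 0, 1, 0, 1]]

def correct_single_error(received):
    r = list(received)
    syndrome = [sum(h * v for h, v in zip(row, r)) % 2 for row in H]  # s = H r (mod 2)
    pos = int("".join(map(str, syndrome)), 2)                         # 0 means "no error detected"
    if pos:
        r[pos - 1] ^= 1                                               # flip the bit the syndrome points to
    return r

codeword = [1, 0, 1, 1, 0, 1, 0]     # satisfies H c = 0 (mod 2)
received = list(codeword)
received[4] ^= 1                     # corrupt bit 5 during "transmission"
print(correct_single_error(received) == codeword)  # True
```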
Hamming codes are perfect codes, meaning they achieve equality in the sphere-packing bound (Hamming bound), which states that the number of codewords M satisfies M \cdot \sum_{i=0}^{t} \binom{n}{i} \leq 2^n for binary codes of length n.[35] For the (7,4) code, with n=7, t=1, the spheres of radius 1 around the 16 codewords exactly cover the 128 possible vectors (16 \cdot (1 + 7) = 128), saturating the bound and leaving no gaps in the space.[28]
Extensions of Hamming codes to multiple-error correction are provided by Bose-Chaudhuri-Hocquenghem (BCH) codes, which generalize the construction using roots of unity in finite fields to achieve designed distances d = 2t + 1 for correcting t > 1 errors.[38] Binary primitive BCH codes of length 2^m - 1 can correct up to t errors with parity-check matrices extended from the Hamming case, enabling applications in systems requiring higher reliability, such as satellite communications.
Broader Applications and History
Applications in Computer Science and Beyond
In machine learning, the Hamming distance serves as a measure of feature similarity for binary or categorical data, particularly in collaborative filtering systems where user preferences or item attributes are represented as binary vectors. For instance, hashing-based recommender systems learn compact binary codes for users and items, enabling efficient similarity computation via Hamming distance to predict preferences and handle large-scale recommendation tasks. This approach is advantageous for its computational efficiency, as bitwise operations allow rapid nearest-neighbor searches in high-dimensional spaces.[39][40]
In phylogenetics, the Hamming distance is employed to compare genetic sequences, facilitating the construction of evolutionary trees by quantifying nucleotide differences between aligned DNA or protein sequences. Distance-based methods use these pairwise Hamming distances to infer phylogenetic relationships, such as in reconstructing trees from subclonal populations in cancer genomics or analyzing microbial evolution. This metric is particularly useful for binary-like representations of sequence variations, providing a foundation for clustering taxa and estimating divergence times without assuming complex substitution models.[41][42]
In image processing, Hamming distance enables pixel-wise comparison for similarity metrics, often through perceptual hashing techniques that convert images into binary fingerprints for tasks like duplicate detection or content retrieval. For example, local binary patterns or hash codes derived from image features allow quick assessment of structural similarity by counting differing bits, which is robust to minor distortions like rotation or compression. This application is common in large-scale image databases, where it supports efficient indexing and matching.[43][44]
In cryptography, Hamming distance analyzes diffusion properties in stream ciphers, measuring how input differences propagate through the keystream to ensure avalanche effects against differential attacks. Designers evaluate the minimum Hamming weight of output differences to verify that small input changes result in substantial keystream alterations, enhancing security in lightweight ciphers based on cellular automata or linear feedback shift registers. This metric helps quantify the cipher's resistance to cryptanalysis by assessing the balance between confusion and diffusion layers.[45][46]
Recent applications in the 2020s extend Hamming distance to emerging storage and computing paradigms, such as DNA storage error models where it models substitution errors during synthesis and sequencing to design correcting codes that maintain minimum distances for reliable data retrieval. In DNA systems, codes are constructed to tolerate Hamming distances corresponding to error rates in oligonucleotide pools, enabling high-density archival storage with end-to-end correction. Similarly, in quantum error correction, analogs of the Hamming bound guide code design by bounding the error-correcting capability based on the minimum distance in stabilizer codes, with random constructions approaching these limits for fault-tolerant quantum computing.[47][48][49]
Historical Development
The concept of Hamming distance originated in 1950 when Richard W. Hamming, a mathematician at Bell Laboratories, developed it to address frequent errors in early computing systems caused by unreliable punched card readers and vacuum tube failures.[50] Frustrated by machines that halted on errors without correction, Hamming sought systematic ways to detect and fix them automatically during weekends when colleagues were unavailable.[35]
Hamming formalized the distance in his landmark paper "Error Detecting and Error Correcting Codes," published in the Bell System Technical Journal, defining it as the number of positions at which two equal-length binary strings differ.[35] This measure underpinned the Hamming codes, the first practical single-error-correcting codes, which built on Claude E. Shannon's 1948 foundational work in "A Mathematical Theory of Communication," establishing information theory's limits for reliable transmission over noisy channels.[51] Shannon's theoretical proofs of error-correcting code existence inspired Hamming's concrete implementations, marking a shift from abstract theory to applied engineering.[52]
Preceding Hamming's publication, Marcel J. E. Golay introduced the binary and ternary Golay codes in 1949 through a concise note on digital coding, achieving perfect error correction with minimum distances of 7 and 5, respectively; these linear codes were later recognized as extensions of Hamming's linear framework. In 1959, Alexis Hocquenghem devised a class of cyclic codes capable of multiple-error correction, independently rediscovered by Raj Bose and Dinesh K. Ray-Chaudhuri in 1960 and named BCH codes, generalizing Hamming codes to higher dimensions and error capacities using finite field algebra.[53]
Hamming's innovations earned him the inaugural IEEE Richard W. Hamming Medal and the 1968 ACM Turing Award, the latter specifically honoring his error-correcting code contributions that revolutionized data integrity in computing.[52] Following a period of refinement in coding theory during the mid-20th century, the Hamming distance experienced renewed prominence post-2000 in genomics, where it facilitates clustering of single nucleotide polymorphisms (SNPs) and sequence alignment to uncover population structures and genetic variations.[54]
Computation
Basic Algorithms
The basic algorithm for computing the Hamming distance between two strings or vectors of equal length n is a straightforward iterative comparison that counts the number of positions at which the corresponding symbols differ. This naive approach directly implements the definition by examining each element pairwise, incrementing a counter for each mismatch.
Pseudocode for this algorithm is as follows:
```
function hamming_distance(x, y):
    if length(x) != length(y):
        reject the inputs, or pad the shorter string with a predefined symbol (e.g., 0 for binary)
    count = 0
    for i = 1 to length(x):
        if x[i] != y[i]:
            count = count + 1
    return count
```
The algorithm requires O(n) time due to the single pass through the length of the inputs and uses O(1) additional space beyond storing the input strings. For unequal lengths, the standard definition assumes equal length, so implementations typically reject the inputs or pad the shorter one with a neutral symbol to align them, though padding alters the distance and is context-specific (e.g., leading zeros for binary representations).
A simple Python implementation for binary strings, which follows this iterative counting logic, is:
```python
def hamming_distance_binary(s1, s2):
    if len(s1) != len(s2):
        raise ValueError("Inputs must be of equal length")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))
```
This code uses Python's zip to pair characters and counts mismatches, mirroring the core of optimized libraries like SciPy's hamming function, which computes the proportion via array comparisons but can be adapted for the absolute count.[55]
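Since the SciPy routine returns a proportion rather than a count, the absolute distance can be recovered by rescaling; a brief usage note, assuming SciPy is installed:

```python
from scipy.spatial.distance import hamming

u = [1, 0, 1, 1, 1, 0, 1]
v = [1, 0, 0, 1, 0, 0, 1]
proportion = hamming(u, v)           # 2/7, the fraction of differing positions
count = round(proportion * len(u))   # 2, the absolute Hamming distance
print(proportion, count)
```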
Optimized Implementations
For binary strings, the Hamming distance can be computed efficiently using bitwise XOR followed by a population count (popcount) operation, which counts the number of set bits in the result. This approach leverages the fact that XOR identifies differing bits, and modern compilers provide intrinsics like __builtin_popcountll in GCC for 64-bit integers, enabling single-instruction computation on supported hardware. Such methods achieve constant-time performance per word, scaling linearly with the number of words in longer strings, and are foundational in applications requiring rapid similarity checks.[56]
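The same idea can be expressed in Python on arbitrary-precision integers, using int.bit_count() (available from Python 3.10; bin(x).count("1") otherwise) in place of a compiler intrinsic; this is an illustrative sketch rather than an optimized implementation.

```python
def hamming_distance_int(a: int, b: int) -> int:
    """XOR marks the differing bit positions; counting the set bits gives the distance."""
    return (a ^ b).bit_count()  # Python >= 3.10; use bin(a ^ b).count("1") on older versions

print(hamming_distance_int(0b1011101, 0b1001001))  # 2
```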
To further accelerate computation over multiple pairs or high-dimensional vectors, single instruction, multiple data (SIMD) instructions like those in the AVX2 extension process 256-bit registers, allowing vectorized XOR and popcount across 4-8 32/64-bit words simultaneously.[57] Algorithms exploiting AVX2 can double the throughput of scalar popcount on Intel processors by parallelizing bit operations within registers, reducing cycles per distance calculation for datasets with thousands of dimensions.[57] This vectorization is particularly effective for batch processing, as demonstrated in string matching tasks where mismatch counts (equivalent to Hamming distances) are computed in parallel.[58]
For non-binary alphabets, such as q-ary symbols in coding theory, efficient computation often employs lookup tables to precompute distances between symbol pairs, avoiding repeated comparisons in inner loops. These tables map each pair to the indicator value (1 if differing, 0 otherwise), enabling O(n) summation over a sequence of length n with minimal overhead; for larger q, modular arithmetic can optimize symbol comparisons if the alphabet is represented modulo q. This technique trades modest memory for speed in embedded or high-throughput scenarios, such as non-binary error-correcting codes.
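A minimal sketch of the table-driven approach for a small q-ary alphabet; encoding the symbols as integers 0..q-1 is an assumption made for illustration.

```python
Q = 4  # e.g. DNA symbols encoded as integers 0..3

# Precomputed pairwise indicator table: TABLE[a][b] == 1 iff the symbols differ.
TABLE = [[int(a != b) for b in range(Q)] for a in range(Q)]

def qary_hamming(x, y):
    """Sum table lookups over the sequences; no per-symbol comparison logic in the loop body."""
    return sum(TABLE[a][b] for a, b in zip(x, y))

print(qary_hamming([0, 2, 1, 3], [0, 0, 1, 3]))  # 1 (only the second position differs)
```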
In large-scale nearest neighbor searches, exact computation becomes prohibitive, prompting approximate methods like locality-sensitive hashing (LSH) tailored for Hamming space, where hash functions preserve proximity with high probability for close points. The seminal LSH framework projects binary vectors onto random hyperplanes, yielding bit hashes whose Hamming distances approximate the original, enabling sublinear query times by bucketing similar items. This addresses scalability gaps in high-dimensional data, with theoretical guarantees on collision probabilities for c-approximate nearest neighbors.
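As a toy illustration of LSH in Hamming space, the classic bit-sampling variant hashes a vector by reading a few randomly chosen coordinates, so vectors at small Hamming distance collide in most tables while distant vectors rarely do; the parameters and helper names below are illustrative.

```python
import random

def make_hash(n_bits, k, seed):
    """One bit-sampling LSH function: concatenate k randomly chosen coordinates."""
    rng = random.Random(seed)
    coords = [rng.randrange(n_bits) for _ in range(k)]
    return lambda v: tuple(v[i] for i in coords)

n, k, num_tables = 16, 4, 8
hashes = [make_hash(n, k, seed) for seed in range(num_tables)]

a = [1] * 16
b = a[:]; b[3] ^= 1           # Hamming distance 1 from a
c = [v ^ 1 for v in a]        # Hamming distance 16 from a

collisions = lambda x, y: sum(h(x) == h(y) for h in hashes)
print(collisions(a, b), collisions(a, c))  # the close pair collides in most tables; the far pair in none
```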
Hardware accelerations on field-programmable gate arrays (FPGAs) and graphics processing units (GPUs) target big data applications like genomics, where Hamming distances filter sequence alignments rapidly. FPGA designs implement parallel XOR-popcount pipelines for short-read mapping, achieving 10-100x speedups over CPUs by customizing logic for fixed-length k-mers. In genomics pipelines, GPU variants exploit massive parallelism for pairwise distances across billions of bases, reducing alignment times from hours to minutes while handling variable-length sequences.[59]