
Binary number

A binary number is a number expressed in the base-2 or binary numeral system, a method for representing numeric values using only two symbols: typically the digits 0 and 1. It forms the basis for all modern digital electronics and computing, where data is encoded in binary form.

History

Ancient origins

In ancient Egypt, mathematical practices documented in the Rhind Mathematical Papyrus, dating to approximately 1650 BCE, incorporated binary-like methods for multiplication and division through repeated doubling and halving. This approach represented numbers in a form akin to binary by expressing the multiplier as a sum of powers of two, then summing corresponding doubled values of the multiplicand; for instance, multiplying 70 by 13 involved doubling 70 to generate 1×70, 2×70, 4×70, and 8×70, then adding the terms for 13's binary decomposition (1101₂ = 8 + 4 + 1) to yield 910. Such techniques facilitated efficient computation without a positional numeral system, relying instead on additive combinations of doubled units. The I Ching, an ancient Chinese divination text compiled around the 9th century BCE during the Zhou dynasty, utilized hexagrams formed by six lines, each either solid (yang, representing 1) or broken (yin, representing 0), creating 64 distinct patterns that encode binary sequences (2^6 combinations). These hexagrams, built from eight trigrams of three lines each, reflected a cosmological binary duality central to early Chinese philosophy and decision-making. In the 11th century CE, Neo-Confucian scholar Shao Yong arranged the 64 hexagrams in a deductive order, progressing from all-yin to all-yang, which systematically enumerated them as 6-bit binary numbers from 000000 to 111111. In ancient India, the prosodist Pingala, in his Chandahshastra treatise on meter around 200 BCE, described binary patterns to enumerate poetic meters using short (laghu, akin to 0) and long (guru, akin to 1) syllables. This generated sequences of meter variations, such as 1 for one matra, 2 for two, 3 for three, 5 for four, and so on, following the recurrence where the number of patterns for n matras equals the sum for n-1 and n-2. Later interpretations, including by medieval scholars, recognized this as the Fibonacci sequence, with Pingala's combinatorial rules prefiguring the series' properties in counting binary-like syllable arrangements.
Among the Yoruba of West Africa, the Ifá divination system, with origins tracing back over 2,500 years to pre-10th century traditions, employed marks generated through palm nuts or a divination chain to produce 256 odu (sacred signs), equivalent to 8-bit combinations (2^8). Diviners marked single (I, light/expansion) or double (II, darkness/contraction) lines in two columns of four, forming octograms that encoded polarities for interpreting life events and cosmic balance. This structure underpinned an extensive corpus of poetic verses and mathematical formulas preserved orally. Classical Greek and Roman cultures developed binary-like encoding tools, such as the Polybius square devised by the Greek historian Polybius around 150 BCE, a 5×5 grid assigning letters coordinates from 1 to 5 for signaling with torches. This positional system transmitted messages via two numerical signals per letter (e.g., row-column pairs), analogous to binary coordination though operating in base 5.

European developments

In the 13th century, Catalan philosopher and theologian Ramon Llull developed a pioneering combinatorial system in his Ars Magna (1308), which employed letter combinations to systematically analyze philosophical and theological concepts. Llull assigned letters (B through K) to nine fundamental divine dignities, such as goodness (B) and greatness (C), and generated all possible pairings—yielding 36 unique combinations (treating order as irrelevant)—to explore relational principles like concordance and opposition. This method allowed for the mechanical production of logical statements, such as "Goodness differs from magnitude," forming the basis of an early form of symbolic logic aimed at universal demonstration. Llull's framework drew on biblical interpretations of creation and divine essence, viewing the binary pairings as a reflection of God's ordered attributes to rationally affirm Christian truths against non-believers. By combining dignities with questions (e.g., "whether") and subjects (e.g., God, man), the system produced arguments supporting doctrines like the Trinity, positioning logic as a tool for evangelization and interfaith debate. His approach emphasized exhaustive enumeration over intuition, influencing later European thinkers in logic and combinatorics. By the early 17th century, English mathematician Thomas Harriot advanced binary concepts through practical arithmetic in unpublished manuscripts circa 1610. Harriot represented integers in base 2 using dots and circles (1 and 0), performing operations like addition (e.g., 101 + 111 = 1100), subtraction, and multiplication (e.g., 1101101 × 1101101 = 10111001101001), while converting between binary and decimal for efficiency in calculations. This work demonstrated binary's utility for decomposing numbers into powers of 2, predating similar explorations and highlighting its potential in scientific computation.
In the mid-17th century, Spanish bishop and polymath Juan Caramuel y Lobkowitz provided the first published systematic treatment of binary arithmetic in Mathesis Biceps (1670), dedicating a chapter to base-2 notation as a radical simplification of counting. Caramuel tabulated binary equivalents for numbers 0 to 1023, illustrated binary addition (e.g., 101 + 10 = 111), and argued for its elegance in reducing arithmetic to doublings and halvings, extending discussions to other bases while praising binary's theological symbolism of unity and duality. His treatise marked a key step toward formalizing binary as a viable computational tool in Europe.

Modern formalization

The modern formalization of binary numbers as a rigorous numeral system began in the early 18th century with the work of Gottfried Wilhelm Leibniz. In his 1703 essay "Explication de l'Arithmétique Binaire," Leibniz presented binary arithmetic as a base-2 system using only the digits 0 and 1, emphasizing its simplicity and potential for mechanical calculation compared to the decimal system. He illustrated binary operations through examples, such as addition and multiplication, and highlighted its philosophical significance as a representation of creation from nothingness (0) and unity (1). Additionally, Leibniz designed a tactile clock intended for the blind, featuring raised dots to indicate binary digits on a surface divided into powers of 2, allowing time to be read through touch. Leibniz's interest in binary was further deepened by his correspondence with Jesuit missionaries in China, particularly Joachim Bouvet, who in 1701 described the hexagrams of the ancient I Ching as a binary-like system of yin (0) and yang (1) lines forming 64 combinations. This exchange led Leibniz to draw parallels between binary arithmetic and Chinese philosophy, viewing the I Ching as an early precursor to his formalization and reinforcing binary's universal applicability. In the 19th century, George Boole advanced the algebraic foundations of binary through his 1847 work The Mathematical Analysis of Logic, where he developed a calculus treating logical propositions as binary variables (true/false, or 1/0) and operations like conjunction and disjunction as algebraic functions. Boole's system, later known as Boolean algebra, provided a mathematical framework for deductive reasoning, enabling binary to be formalized not just as a counting method but as a tool for symbolic manipulation. The 20th century saw binary's integration into electrical engineering and computing, pioneered by Claude Shannon in his 1937 master's thesis "A Symbolic Analysis of Relay and Switching Circuits." Shannon demonstrated that Boolean algebra could model the on/off states of electrical switches using binary logic, laying the groundwork for digital circuit design.
This application propelled binary's adoption in early computers; although the ENIAC (completed in 1945) used decimal representation internally, the project's influence spurred the shift to binary in subsequent designs. John von Neumann's 1945 "First Draft of a Report on the EDVAC" explicitly advocated binary encoding for data and instructions, arguing it simplified multiplication, division, and circuit implementation, thus establishing the von Neumann architecture as the standard for binary-based stored-program computers.

Representation

Basic structure

A binary number is a numeral expressed in the base-2 positional system, which uses only two digits: 0 and 1, referred to as bits. This system assigns place values to each bit based on powers of 2, with the rightmost bit representing the 2^0 position (equal to 1) and each subsequent bit to the left representing the next higher power of 2. The overall value of the binary number is the sum of the products of each bit and its corresponding place value. For instance, consider the binary number 1011. Its value is computed as 1 × 2^3 + 0 × 2^2 + 1 × 2^1 + 1 × 2^0 = 8 + 0 + 2 + 1 = 11 in decimal (base-10) notation. Adding leading zeros to a binary number does not alter its numerical value, as these zeros contribute nothing to the sum (since 0 multiplied by any power of 2 is 0). However, the minimal representation of a binary number omits all leading zeros to provide the shortest string of bits that uniquely identifies the value, with the exception of the number zero, which is represented simply as "0". Binary numerals are often distinguished from general binary strings—sequences of bits that may or may not represent numbers—by appending a subscript 2, as in 1011₂, to explicitly indicate base 2. This notation helps avoid ambiguity when binary sequences appear in contexts like data encoding.
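The place-value evaluation described above can be sketched in a few lines of Python; the function name is illustrative, and the built-in `int(..., 2)` is shown only as a cross-check.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum each bit times its power-of-2 place value, scanning left to right."""
    value = 0
    for bit in bits:
        value = value * 2 + int(bit)  # shift the accumulated value and add the new bit
    return value

# 1011 in binary: 1*8 + 0*4 + 1*2 + 1*1 = 11
assert binary_to_decimal("1011") == 11
# Leading zeros do not change the value.
assert binary_to_decimal("001011") == 11
# Python's built-in base-2 parser agrees.
assert binary_to_decimal("1011") == int("1011", 2)
```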

Signed representations

Signed representations in binary extend the unsigned system to include negative values, allowing computers to handle a range of integers that encompasses both positive and negative numbers. Three primary methods exist: sign-magnitude, one's complement, and two's complement, with the latter being the most widely adopted in modern computing due to its efficiency in arithmetic operations. In sign-magnitude representation, the most significant bit (MSB) serves as the sign bit—0 for positive and 1 for negative—while the remaining bits encode the absolute value (magnitude) of the number in standard binary. For example, in an 8-bit format, the positive number 5 is represented as 00000101, and -5 as 10000101. This method is intuitive as it directly mirrors decimal sign conventions but requires separate logic for addition and subtraction of magnitudes. One's complement representation inverts all bits of the positive equivalent to obtain the negative value, with the MSB indicating the sign (0 for positive, 1 for negative). For instance, in 4 bits, +3 is 0011, and -3 is its bitwise complement, 1100. This approach simplifies negation to a simple inversion but introduces a dual zero representation: +0 as 0000 and -0 as 1111, which can complicate comparisons and arithmetic. Two's complement, the predominant standard in digital systems, negates a number by inverting its bits and adding 1 to the result, enabling seamless subtraction via addition without separate sign handling. For a 4-bit example, +3 is 0011; inverting gives 1100, and adding 1 yields 1101 for -3. This method eliminates the dual zero issue, as zero is represented uniquely as 0000, and it unifies addition and subtraction into the same binary addition operation, ignoring overflow for most cases. For an n-bit binary number, the unsigned range spans from 0 to 2^(n) - 1, accommodating only non-negative values.
In signed representations like two's complement, the range shifts to -2^(n-1) to 2^(n-1) - 1, using half the values for negatives (MSB=1) and half for non-negatives (MSB=0), without the redundant -0 of one's complement. Sign-magnitude and one's complement also follow this n-bit signed range but with inefficiencies in zero handling and arithmetic.
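The two's-complement encoding and decoding rules above can be sketched as follows; the helper names are illustrative, and the bitmask trick `value & (2^n - 1)` relies on Python's arbitrary-precision integers.

```python
def to_twos_complement(value: int, bits: int) -> str:
    """Encode a signed integer as an n-bit two's-complement bit string."""
    assert -(1 << (bits - 1)) <= value < (1 << (bits - 1)), "value out of range"
    # Masking with 2^bits - 1 wraps negatives into their two's-complement pattern.
    return format(value & ((1 << bits) - 1), f"0{bits}b")

def from_twos_complement(s: str) -> int:
    """Decode an n-bit two's-complement bit string back to a signed integer."""
    raw = int(s, 2)
    # A leading 1 marks a negative number: subtract 2^n to recover its value.
    return raw - (1 << len(s)) if s[0] == "1" else raw

assert to_twos_complement(3, 4) == "0011"
assert to_twos_complement(-3, 4) == "1101"   # invert 0011 -> 1100, add 1 -> 1101
assert from_twos_complement("1101") == -3
assert to_twos_complement(-8, 4) == "1000"   # the range minimum, -2^(n-1)
```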

Counting in binary

Binary sequence

The binary counting sequence represents the natural numbers starting from zero, where each successive number is obtained by adding 1 in base 2. The sequence begins as 0, 1, 10, 11, 100, 101, 110, 111, 1000, and continues indefinitely, with the rightmost bit (least significant bit) toggling between 0 and 1 on every increment, while higher bits change less frequently. This process resembles an odometer, where incrementing flips the current bit from 0 to 1 or propagates a carry to the next bit if it is already 1. A key property of the binary sequence is that every non-negative integer has a unique representation without leading zeros, ensuring no ambiguity in encoding values. Incrementing involves no borrowing, as it only requires flipping a sequence of trailing 1s to 0s and changing the next 0 to 1, simplifying the operation compared to higher bases. However, when all bits are 1 (e.g., 111 in three bits, representing 7), adding 1 causes carry propagation through every bit, resulting in 1000 (8 in decimal) and resetting lower bits to 0. The following table illustrates the binary sequence for decimal values 0 through 15, using four bits for clarity:
Decimal | Binary
0 | 0000
1 | 0001
2 | 0010
3 | 0011
4 | 0100
5 | 0101
6 | 0110
7 | 0111
8 | 1000
9 | 1001
10 | 1010
11 | 1011
12 | 1100
13 | 1101
14 | 1110
15 | 1111
This pattern highlights the doubling of representable values with each additional bit. A variant of the binary sequence is the Gray code, which orders numbers such that adjacent values differ by exactly one bit, minimizing transitions in applications like error detection or mechanical encoding. For example, the three-bit Gray code sequence is 000, 001, 011, 010, 110, 111, 101, 100. This property contrasts with the standard sequence, where multiple bits may change simultaneously, such as from 011 to 100.
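The Gray code sequence can be generated with the standard formula g = i XOR (i >> 1); this is a minimal sketch, with an illustrative function name, that also checks the one-bit-difference property.

```python
def gray_code(n: int) -> list[str]:
    """n-bit Gray code: g = i XOR (i >> 1) for i = 0 .. 2^n - 1."""
    return [format(i ^ (i >> 1), f"0{n}b") for i in range(1 << n)]

seq = gray_code(3)
assert seq == ["000", "001", "011", "010", "110", "111", "101", "100"]

# Adjacent codes differ in exactly one bit position.
for a, b in zip(seq, seq[1:]):
    assert sum(x != y for x, y in zip(a, b)) == 1
```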

Comparison to decimal

The binary number system, or base-2, employs only two digits—0 and 1—to represent values, in contrast to the decimal system, or base-10, which uses ten digits from 0 to 9. This fundamental difference arises from the positional notation in each system, where the place values are powers of the base: powers of 2 in binary (1, 2, 4, 8, etc.) and powers of 10 in decimal (1, 10, 100, etc.). As a result, binary representations are compact for numbers that are powers of 2—for instance, 2^10 (1,024 in decimal) requires just 11 bits (1 followed by 10 zeros)—but generally require longer strings of digits to express large values compared to decimal. From a readability perspective, binary poses challenges for interpreting large numbers due to its repetitive sequences of 0s and 1s, making it less intuitive for mental arithmetic or quick estimation than the more familiar decimal groupings. For example, the number 255 expands to the string 11111111, an eight-bit sequence that lacks the structural cues of decimal's varied digits. This verbosity can complicate direct comparisons or visualizations without conversion tools. In digital systems, binary's alignment with two-state hardware—such as on/off switches in transistors or voltage levels (high/low)—provides significant efficiency advantages over decimal, which would require more complex circuitry to distinguish ten distinct states reliably. Historically, early calculating tools like the abacus operated on decimal principles, using beads or positions to track base-10 values for manual calculations. However, the shift to binary in modern computing, accelerated by the 1945 EDVAC report emphasizing its hardware simplicity, enabled scalable architectures that avoided the mechanical intricacies of the decimal relays or tubes seen in machines like ENIAC. To illustrate the progression in counting, the sequence from 1 to 10 in decimal reads as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, while in binary it is 1, 10, 11, 100, 101, 110, 111, 1000, 1001, 1010—highlighting binary's rapid increase in length even for small counts.

Binary fractions

Fractional notation

In binary notation, numbers less than 1 are represented using a binary point, analogous to the decimal point in base-10, which separates the integer part (to the left, typically 0 for pure fractions) from the fractional part (to the right). This fixed-point representation allows for the encoding of fractional values by assigning bits after the binary point to specific positional weights. The place values for the fractional part are determined by negative powers of 2, starting immediately after the binary point. The first position to the right of the point represents 2^-1 = 0.5, the second 2^-2 = 0.25, the third 2^-3 = 0.125, and so on, decreasing by half for each subsequent bit. Each bit in these positions is either 0 or 1, contributing its full place value if 1 or nothing if 0, similar to how integer places use non-negative powers of 2 but scaled to fractions. For example, the binary fraction 0.11 consists of a 1 in the 2^-1 place and a 1 in the 2^-2 place, yielding a decimal value of (1 × 0.5) + (1 × 0.25) = 0.75. Similarly, 0.1 binary equals exactly 0.5 decimal, as it uses only the first fractional place. Binary fractions have finite representations when the decimal equivalent's denominator in lowest terms is a power of 2, allowing termination within a fixed number of bits; otherwise, they require infinite or repeating bits, akin to non-terminating decimals. For instance, 0.5 decimal is 0.1 binary (finite), while numbers like 0.1 decimal need an infinite repeating series in binary. In practical fixed-point systems, precision is limited by the number of bits allocated to the fractional part, such as 8 or 16 bits, which can only exactly represent fractions expressible within that precision and may introduce rounding errors for others. This bit limitation ensures compact storage but constrains the range of precisely representable values.
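Evaluating a binary fraction by its negative-power-of-2 place values can be sketched as below; the function name is illustrative, and the examples use only dyadic fractions so the floating-point results are exact.

```python
def binary_fraction_to_decimal(bits: str) -> float:
    """Evaluate the digits after the binary point by their place values."""
    value, weight = 0.0, 0.5   # the first place after the point weighs 2^-1
    for bit in bits:
        value += int(bit) * weight
        weight /= 2            # each subsequent place weighs half as much
    return value

assert binary_fraction_to_decimal("11") == 0.75    # 0.11 = 0.5 + 0.25
assert binary_fraction_to_decimal("1") == 0.5      # 0.1 = 0.5 exactly
assert binary_fraction_to_decimal("101") == 0.625  # 0.101 = 0.5 + 0.125
```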

Binary decimals

Decimal fractions, which are numbers between 0 and 1 in base-10, are converted to binary by repeatedly multiplying the fractional part by 2 and recording the integer part (0 or 1) as the next binary digit after the binary point, continuing until the fraction becomes zero or a repeating pattern emerges. This process mirrors the division algorithm used for integer conversion but works in reverse for the fractional component. If the decimal fraction is a sum of distinct negative powers of 2 (a dyadic rational), the binary representation terminates exactly; otherwise, it repeats indefinitely, similar to how some fractions like 1/3 repeat in decimal (0.333...). For instance, the decimal fraction 0.625, which equals 5/8 or 5/2^3, converts exactly to 0.101 in binary: multiplying 0.625 by 2 yields 1.25 (record 1, keep 0.25); 0.25 by 2 yields 0.5 (record 0, keep 0.5); 0.5 by 2 yields 1.0 (record 1, done). In contrast, 0.1 in decimal has a non-terminating binary representation of 0.000110011001100...₂, where the block "0011" repeats indefinitely because 0.1 cannot be expressed as a finite sum of negative powers of 2. Similarly, 1/3 in decimal becomes 0.010101...₂ in binary, with the pattern "01" repeating, as the multiplication steps cycle without terminating. When binary representations are truncated to a finite number of bits, as in computer floating-point systems following IEEE 754, rounding errors occur because non-terminating fractions are approximated. A classic example is that 0.1 + 0.2 in double-precision floating point does not exactly equal 0.3; instead, the sum evaluates to approximately 0.30000000000000004, because the stored representations of 0.1 (≈0.000110011001100110011...₂) and 0.2 (≈0.0011001100110011001101...₂) are themselves rounded, and their addition requires further rounding. These errors accumulate in repeated calculations, potentially leading to significant discrepancies in applications like finance or scientific simulations where exact precision is required.
To mitigate this, programmers often use decimal or arbitrary-precision arithmetic libraries to maintain accuracy.
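The rounding behavior and two standard mitigations can be demonstrated directly with Python's standard library (`fractions` for exact rationals, `decimal` for base-10 arithmetic); this is a minimal illustration, not a recommendation of any particular library.

```python
from fractions import Fraction
from decimal import Decimal

# In IEEE 754 double precision, 0.1 and 0.2 are rounded approximations,
# so their sum misses 0.3 by one unit in the last place.
assert 0.1 + 0.2 != 0.3
assert repr(0.1 + 0.2) == "0.30000000000000004"

# Exact rational arithmetic avoids binary rounding entirely.
assert Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10)

# Base-10 decimal arithmetic keeps these fractions exact as well.
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
```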

Binary arithmetic

Addition and subtraction

Binary addition follows a column-by-column process from right to left, similar to decimal addition but simplified due to only two possible digits (0 and 1). Each column sums the bits from the two numbers plus any carry from the previous column, producing a sum bit (0 or 1) and a possible carry (0 or 1) to the next column. The basic rules for adding two bits plus a carry are as follows:
Inputs (A + B + Carry) | Sum Bit | Carry Out
0 + 0 + 0 | 0 | 0
0 + 0 + 1 | 1 | 0
0 + 1 + 0 | 1 | 0
0 + 1 + 1 | 0 | 1
1 + 0 + 0 | 1 | 0
1 + 0 + 1 | 0 | 1
1 + 1 + 0 | 0 | 1
1 + 1 + 1 | 1 | 1
This method, often called ripple-carry addition or long carry propagation, builds the result by propagating carries sequentially through each bit position. For example, adding 1101₂ (13₁₀) and 101₂ (5₁₀): starting from the right, 1+1=0 carry 1; 0+0+1=1 carry 0; 1+1=0 carry 1; 1+0+1=0 carry 1; resulting in 10010₂ (18₁₀) with the final carry forming an extra bit. Binary subtraction of unsigned integers can be performed directly using borrow propagation, but for efficiency in hardware, subtraction is typically implemented using addition with two's complement representation for signed numbers. To subtract B from A (A - B), compute the two's complement of B (which represents -B) and add it to A; any carry out from the most significant bit is discarded in fixed-width arithmetic. The two's complement of a binary number is obtained by inverting all bits (0 to 1, 1 to 0) and adding 1 to the result. For instance, subtracting 011₂ (3₁₀) from 101₂ (5₁₀) in 3 bits: the two's complement of 011₂ is 100₂ (invert) + 1 = 101₂ (-3₁₀); then 101₂ + 101₂ = 1010₂, and discarding the carry bit yields 010₂ (2₁₀). In signed 4-bit two's complement, adding 0111₂ (7₁₀) and 1011₂ (-5₁₀, the two's complement of 0101₂): 0111₂ + 1011₂ = 10010₂; discarding the carry out gives 0010₂ (2₁₀). In fixed-bit representations, overflow occurs during addition if the result cannot be accurately represented within the bit width, particularly for signed numbers. Detection is straightforward: overflow happens if adding two positive numbers yields a negative result or two negative numbers yields a positive result, which corresponds to the carry into the sign bit differing from the carry out of the sign bit.
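The ripple-carry procedure and two's-complement subtraction described above can be sketched on bit strings as follows; the function names are illustrative, and the subtraction assumes the result fits in the given width.

```python
def add_bits(a: str, b: str) -> str:
    """Ripple-carry addition of two binary strings, right to left."""
    a, b = a.zfill(len(b)), b.zfill(len(a))
    carry, out = 0, []
    for x, y in zip(reversed(a), reversed(b)):
        total = int(x) + int(y) + carry
        out.append(str(total % 2))   # sum bit for this column
        carry = total // 2           # carry out to the next column
    if carry:
        out.append("1")              # final carry forms an extra bit
    return "".join(reversed(out))

assert add_bits("1101", "101") == "10010"   # 13 + 5 = 18

def subtract_bits(a: str, b: str, width: int) -> str:
    """A - B via two's complement: invert B, add 1, add to A, drop the carry out."""
    inverted = "".join("1" if c == "0" else "0" for c in b.zfill(width))
    neg_b = add_bits(inverted, "1")[-width:]   # two's complement of B
    return add_bits(a.zfill(width), neg_b)[-width:]

assert subtract_bits("101", "011", 3) == "010"   # 5 - 3 = 2
```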

Multiplication and division

Binary multiplication operates on the principle that multiplying by a single binary digit is straightforward: the product of 0 and any number is 0, while the product of 1 and a number is the number itself. This leads to the shift-and-add algorithm, where the multiplicand is shifted left (equivalent to multiplying by powers of 2) for each 1-bit in the multiplier and added to a running partial product. For two n-bit numbers, the process iterates n times, producing a product of up to 2n bits. Consider the example of multiplying 110₂ (6₁₀) by 101₂ (5₁₀). Start with the multiplicand 110₂ and multiplier 101₂. For the least significant bit of the multiplier (1), add 110₂ shifted by 0 positions: 110₂. For the next bit (0), add nothing. For the most significant bit (1), add 110₂ shifted left by 2 positions: 11000₂. The partial products sum to 110₂ + 11000₂ = 11110₂ (30₁₀). In hardware, this leverages shift registers and adders, often implemented sequentially with one iteration per cycle using an arithmetic-logic unit (ALU), which supports efficient multiplication in processors by reusing circuitry. For signed numbers in two's complement representation, both operands are sign-extended to the full product width (typically 2n bits), and the unsigned shift-and-add method is applied directly; the least significant bits of the result yield the correct signed product. For instance, multiplying -7 (1001₂ in 4-bit two's complement) by -6 (1010₂) involves sign-extending to 8 bits (11111001₂ and 11111010₂), performing the multiplication modulo 2^8 to get 00101010₂ (42₁₀), which is correct for the positive product of two negatives. Binary division follows a process analogous to long division, where the divisor is compared to portions of the dividend to determine quotient bits (0 or 1), with subtractions yielding remainders that are combined with subsequent bits brought down. The algorithm proceeds bit by bit: if the current portion is at least the divisor, subtract it and set the quotient bit to 1; otherwise, set it to 0 and bring down the next bit. The final remainder is less than the divisor. An example is 1011₂ (11₁₀) divided by 10₂ (2₁₀).
Compare the first two bits 10₂ with the divisor 10₂: subtract to get 0, quotient bit 1. Bring down the next bit to form 01₂: smaller than 10₂, quotient bit 0. Bring down the final bit to form 011₂ (i.e., 11₂): subtract 10₂ to get 1, quotient bit 1. The quotient is 101₂ (5₁₀) with remainder 1₂. For signed division in two's complement, the process treats the numbers as unsigned after sign adjustment, but the quotient and remainder must account for the operand signs (e.g., a negative dividend with a positive divisor yields a negative quotient). Hardware implementations use subtractors and comparators in a loop, similar to multiplication's iterative nature.
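The shift-and-add multiplication and bit-by-bit long division above can be sketched on unsigned integers as follows; the function names are illustrative.

```python
def multiply(a: int, b: int) -> int:
    """Shift-and-add: add the shifted multiplicand for each 1-bit of the multiplier."""
    product, shift = 0, 0
    while b:
        if b & 1:
            product += a << shift   # multiplicand shifted to this bit position
        b >>= 1
        shift += 1
    return product

assert multiply(0b110, 0b101) == 0b11110   # 6 * 5 = 30

def divide(dividend: int, divisor: int) -> tuple[int, int]:
    """Long division, most significant bit first, returning (quotient, remainder)."""
    quotient, remainder = 0, 0
    for i in reversed(range(dividend.bit_length())):
        remainder = (remainder << 1) | ((dividend >> i) & 1)  # bring down next bit
        quotient <<= 1
        if remainder >= divisor:                              # divisor fits: bit is 1
            remainder -= divisor
            quotient |= 1
    return quotient, remainder

assert divide(0b1011, 0b10) == (0b101, 0b1)   # 11 / 2 = 5 remainder 1
```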

Advanced operations

Bitwise operations

Bitwise operations are fundamental manipulations performed directly on the binary representations of numbers, treating them as sequences of bits rather than numerical values. These operations include logical gates applied bit by bit and bit shifts, which are essential in low-level programming, hardware design, and efficient data processing. Unlike arithmetic operations, bitwise operations do not involve carry propagation or borrowing between bits; each bit is processed independently. The bitwise AND (&) returns 1 in a bit position only if both corresponding bits from the operands are 1; otherwise, it returns 0. This is useful for masking, where specific bits are isolated by ANDing with a pattern that has 1s only in the desired positions. For example, performing AND on 1010 (decimal 10) and 1100 (decimal 12) yields 1000 (decimal 8), as the result takes 1 only where both inputs have 1. The truth table for AND is:
Input A | Input B | A AND B
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1
The bitwise OR (|) returns 1 in a bit position if at least one of the corresponding bits is 1; it returns 0 only if both are 0. This sets bits in the result where either operand has a 1, commonly used to combine bit fields or enable flags. The truth table for OR is:
Input A | Input B | A OR B
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 1
The bitwise XOR operation (^) returns 1 if the corresponding bits differ (one is 0 and the other is 1) and 0 if they are the same. XOR is particularly useful for toggling bits, as applying it with a mask flips the targeted bits without affecting others; for instance, XOR with 0001 toggles the least significant bit. The truth table for XOR is:
Input A | Input B | A XOR B
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
The bitwise NOT operation (~) inverts all bits in the operand, changing 0 to 1 and 1 to 0; in practice, it operates within a fixed bit width (e.g., 32 bits), leading to two's complement effects for signed numbers. The truth table for NOT is:
Input A | NOT A
0 | 1
1 | 0
For example, NOT on 1010 (assuming 4 bits) yields 0101. Bit shift operations move the bits of a binary number left or right by a specified number of positions. A logical left shift (<<) shifts bits toward the most significant bit (left), filling the least significant bits with 0s and effectively multiplying the value by 2 for each position shifted; for instance, 0001 << 2 equals 0100 (multiplying 1 by 4). A logical right shift (>>) shifts bits toward the least significant bit (right), filling the most significant bits with 0s and dividing by 2 per position, discarding shifted-out bits; for signed integers, however, an arithmetic right shift preserves the sign by filling with copies of the original most significant bit (1 for negative numbers), maintaining the sign during division-like operations. Applications of bitwise operations include bit masking for extracting or clearing specific bits, such as using AND with 0x0F to isolate the lowest 4 bits of a number, which is common in encoding schemes like SIB bytes in x86 instructions. Setting bits employs OR to merge values without altering existing ones, while XOR enables efficient toggling, as seen in parity checks or in-place swap tricks. These operations optimize performance in areas like graphics clipping and SIMD processing by avoiding branches.
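The operators above can be demonstrated compactly in Python; note that Python integers are unbounded, so NOT within a fixed 4-bit width is emulated here by masking with 0b1111.

```python
a, b = 0b1010, 0b1100          # decimal 10 and 12

assert a & b == 0b1000         # AND: 1 only where both operands have 1 (decimal 8)
assert a | b == 0b1110         # OR: 1 where either operand has 1 (decimal 14)
assert a ^ b == 0b0110         # XOR: 1 where the bits differ (decimal 6)
assert ~a & 0b1111 == 0b0101   # NOT confined to a 4-bit width via masking

assert 0b0001 << 2 == 0b0100   # left shift by 2 multiplies by 4
assert 0b1000 >> 3 == 0b0001   # right shift by 3 divides by 8
assert 0xB6 & 0x0F == 0x06     # AND with 0x0F isolates the lowest 4 bits
assert a ^ 0b0001 == 0b1011    # XOR with a mask toggles the least significant bit
```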

Square roots

Computing the integer square root of a binary number involves finding the largest r such that r² ≤ n, where n is the given binary integer, often using algorithms adapted to binary representation for efficiency in digital systems. Two common methods are binary search and digit-by-digit calculation, both leveraging the binary structure to avoid base conversions. The binary search method initializes a search range from 0 to n (or a tighter upper bound such as 2^⌈(log₂ n)/2⌉) and iteratively narrows it by testing the midpoint m. If m² ≤ n, the search continues in the upper half; otherwise, in the lower half. This converges in O(log n) steps, suitable for hardware or software implementations. For example, to compute √11001₂ (25 in decimal), start with low = 0 and high = 101₂ (5, since 2³ = 8 > √25 = 5); testing midpoint 10₂ (2) gives 2² = 4 < 25, so the lower bound rises to 11₂ (3); testing 100₂ (4) gives 16 < 25, raising it to 101₂ (5); testing 101₂ (5) gives 25 = 25, so the result is 101₂ (5). The digit-by-digit calculation mimics long division but for square roots, grouping the binary digits into pairs from the least significant bit (padding with a leading zero if the length is odd) and building the root bit by bit from the most significant pair. For each pair, double the current root to form a trial divisor, then test appending a 1 (checking whether the trial value is at most the current remainder with the pair brought down); if so, subtract it and set the root bit to 1, else set it to 0. This method consumes two input bits per output bit, making it efficient for fixed-width binary numbers. Consider the example √01111001₂ (121 in decimal, padded to 8 bits for even pairs: 01 | 11 | 10 | 01). Start with the first pair 01 (1 in decimal); the largest square ≤ 1 is 1 (1₂), subtract to leave 0, root = 1₂. Bring down 11 (remainder 0 × 4 + 3 = 3); doubling the root gives 10₂ (2), and appending 1 forms 101₂ (5) > 3, so the bit is 0, root = 10₂, remainder 3. Bring down 10 (remainder 3 × 4 + 2 = 14); doubling the root gives 100₂ (4), and appending 1 forms 1001₂ (9) ≤ 14, so subtract 9 to leave 5, root = 101₂.
Bring down 01 (remainder 5 × 4 + 1 = 21); doubling the root gives 1010₂ (10), and appending 1 forms 10101₂ (21) ≤ 21, so subtract 21 to leave 0, root = 1011₂ (11 in decimal). For non-perfect squares, these methods yield the floor result, the greatest integer less than or equal to the true square root, with the remainder indicating the fractional part (e.g., for n = 1010₂ (10 in decimal), digit-by-digit gives root 11₂ (3), since 3² = 9 < 10 < 16 = 4², with remainder 1). In fixed-precision binary systems, such as 32-bit integers, the algorithms are limited to the bit width; overflow in squaring or shifting can occur if not handled, and results truncate to the available bits, potentially losing accuracy for large n near the precision limit (e.g., √(2³² − 1) ≈ 2¹⁶, but exact computation requires careful bit management).
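The binary search method above can be sketched as follows; Python's unbounded integers sidestep the overflow concern noted for fixed-width systems, and `math.isqrt` is used only as a cross-check.

```python
import math

def isqrt_binary_search(n: int) -> int:
    """Largest r with r*r <= n, found by binary search over candidate roots."""
    low, high = 0, n
    while low < high:
        mid = (low + high + 1) // 2   # bias upward so the loop always terminates
        if mid * mid <= n:
            low = mid                 # mid is feasible; keep it as the new floor
        else:
            high = mid - 1            # mid overshoots; discard it
    return low

assert isqrt_binary_search(0b11001) == 0b101     # sqrt(25) = 5
assert isqrt_binary_search(0b1111001) == 0b1011  # sqrt(121) = 11
assert isqrt_binary_search(0b1010) == 0b11       # floor(sqrt(10)) = 3
assert isqrt_binary_search(10**6) == math.isqrt(10**6)
```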

Conversions

To and from decimal

To convert a binary integer to its decimal equivalent, each bit is multiplied by the corresponding power of 2 based on its position, starting from the rightmost bit as 2^0, and the results are summed. For example, the binary number 1101 represents 1 × 2^3 + 1 × 2^2 + 0 × 2^1 + 1 × 2^0 = 8 + 4 + 0 + 1 = 13 in decimal. This positional method leverages the binary place-value system, where each position to the left doubles the previous value. The reverse process, converting a decimal integer to binary, uses repeated division by 2, recording the remainders from least to most significant bit. Starting with 13, divide by 2 to get quotient 6 and remainder 1; then 6 by 2 yields 3 remainder 0; 3 by 2 yields 1 remainder 1; and 1 by 2 yields 0 remainder 1. Reading the remainders bottom-up gives 1101. This algorithm produces the binary representation directly and terminates when the quotient reaches zero. For binary fractions, the conversion to decimal extends the integer method by using negative powers of 2 to the right of the binary point. Consider 0.101 in binary: 1 × 2^-1 + 0 × 2^-2 + 1 × 2^-3 = 0.5 + 0 + 0.125 = 0.625 in decimal. To convert a decimal fraction to binary, repeatedly multiply the fractional part by 2, recording the integer part (0 or 1) as the next bit, until the fraction becomes zero or a desired precision is reached. For 0.625, multiply by 2 to get 1.25 (integer 1, fraction 0.25); 0.25 by 2 gives 0.5 (integer 0, fraction 0.5); 0.5 by 2 gives 1.0 (integer 1, fraction 0), yielding 0.101. Signed binary numbers in two's complement representation allow negative values, where the most significant bit (MSB) indicates the sign (0 for positive, 1 for negative). For a 4-bit two's complement number like 1101, the MSB is 1, so interpret it as negative: invert the bits to 0010, add 1 to get 0011 (which is 3 in decimal), and negate to -3.
Positive numbers convert directly using the standard method; for example, 0011 is 1 × 2^1 + 1 × 2^0 = 3. Two's complement fractions follow similar sign extension rules but are typically handled in fixed- or floating-point formats. Both the binary-to-decimal summation and the decimal-to-binary division algorithm have a time complexity of O(log n), where n is the magnitude of the number, as the number of operations scales with the bit length log₂ n. This efficiency makes them suitable for computational implementations.
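The repeated-division and repeated-multiplication procedures can be sketched as below; the function names and the `max_bits` precision cutoff are illustrative, and the fraction example uses a dyadic value so the float arithmetic stays exact.

```python
def decimal_to_binary(n: int) -> str:
    """Repeated division by 2; the remainders, read bottom-up, give the bits."""
    if n == 0:
        return "0"
    bits = []
    while n:
        bits.append(str(n % 2))   # remainder becomes the next least significant bit
        n //= 2
    return "".join(reversed(bits))

def fraction_to_binary(frac: float, max_bits: int = 16) -> str:
    """Repeated multiplication by 2; the integer parts become the fraction bits."""
    bits = []
    while frac and len(bits) < max_bits:
        frac *= 2
        bits.append(str(int(frac)))  # record the integer part, 0 or 1
        frac -= int(frac)            # keep only the fractional part
    return "".join(bits)

assert decimal_to_binary(13) == "1101"
assert int("1101", 2) == 13                 # the reverse, positional summation
assert fraction_to_binary(0.625) == "101"   # 0.625 decimal -> 0.101 binary
```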

To and from hexadecimal and octal

Hexadecimal, or base-16, is a positional numeral system that employs 16 distinct symbols: the digits 0 through 9 for values 0 to 9, and the letters A through F for values 10 to 15, respectively. This system facilitates compact representation of binary data in computing, as each hexadecimal digit directly corresponds to a unique 4-bit binary sequence, since $2^4 = 16$. To convert a binary number to hexadecimal, divide the binary digits into groups of four, starting from the least significant bit (rightmost) and moving left, adding leading zeros if necessary to complete the leftmost group. Each 4-bit group is then replaced by its equivalent hexadecimal digit, using the standard mapping: 0000=0, 0001=1, ..., 1001=9, 1010=A, 1011=B, 1100=C, 1101=D, 1110=E, 1111=F. For instance, the binary value 11111010 groups as 1111 and 1010, which convert to F (15 in decimal) and A (10 in decimal), yielding FA in hexadecimal. The reverse conversion—from hexadecimal to binary—involves expanding each hexadecimal digit back to its 4-bit binary equivalent, concatenating the results from left to right.

Octal, or base-8, uses eight symbols: the digits 0 through 7, each representing values from 0 to 7 in decimal. Like hexadecimal, it aligns efficiently with binary, as each octal digit encodes exactly three bits, given that $2^3 = 8$. Conversion from binary to octal proceeds by grouping the binary digits into sets of three from the right (least significant bit), padding with leading zeros if the leftmost group is incomplete, and substituting each group with its octal digit: 000=0, 001=1, 010=2, 011=3, 100=4, 101=5, 110=6, 111=7. To convert octal back to binary, each octal digit is replaced by its 3-bit binary representation and the bits are concatenated.

A practical example illustrates both conversions: the 8-bit binary number 10110110 (182 in decimal) groups into four bits for hexadecimal as 1011 0110, equating to B (11) and 6, or B6 in hexadecimal. For octal, grouping into threes from the right requires padding one zero on the left to form 010 110 110, which converts to 2, 6, and 6, yielding 266 in octal. These conversions are direct and efficient due to the power-of-two alignment with binary (4 bits for hex, 3 for octal), resulting in shorter notations than binary strings while being simpler for programmers and systems to manipulate than decimal equivalents in computer applications like memory addressing and data encoding.
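Because both conversions are pure bit-grouping, a single routine parameterized by group size handles hexadecimal (groups of 4) and octal (groups of 3). This is a sketch; the function names `bin_to_base` and `base_to_bin` are illustrative.

```python
def bin_to_base(bits, group):
    """Group binary digits from the right (group=4 for hex, group=3 for octal)
    and map each group to one digit of the target base."""
    pad = (-len(bits)) % group              # leading zeros to complete the leftmost group
    bits = "0" * pad + bits
    digits = "0123456789ABCDEF"
    return "".join(digits[int(bits[i:i + group], 2)]
                   for i in range(0, len(bits), group))

def base_to_bin(s, group):
    """Expand each hex/octal digit to its fixed-width binary equivalent
    and concatenate left to right (leading zeros are retained)."""
    return "".join(format(int(d, 16), f"0{group}b") for d in s)
```

Here `bin_to_base("10110110", 4)` gives `"B6"` and `bin_to_base("10110110", 3)` gives `"266"`, reproducing the worked example; `base_to_bin("266", 3)` returns the padded form `"010110110"`.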

Real number representation

Fixed-point formats

Fixed-point formats represent real numbers in binary by allocating a predetermined number of bits to the integer and fractional parts, with the binary point fixed at a specific position within the bit string. This approach allows for the encoding of fractional values without variable scaling, making it suitable for hardware implementations where simplicity and predictability are prioritized. The total word length is divided into integer bits (to the left of the binary point) and fractional bits (to the right), often denoted in formats like Qm.n, where m is the number of integer bits (excluding the sign bit for signed representations) and n is the number of fractional bits. For example, a Q8.8 format uses 8 bits for the integer part and 8 bits for the fractional part in a 16-bit word, typically employing two's complement for signed values.

The actual numerical value is obtained by interpreting the entire bit string as an integer and applying a scaling factor of $2^{-n}$, where n is the number of fractional bits; thus, the real value equals the integer interpretation divided by $2^n$. This scaling ensures that each fractional bit position corresponds to a power of 1/2, similar to binary fractions but with a rigidly positioned point. For instance, the binary string 10101100 in a 4.4 fixed-point format (4 integer bits, 4 fractional bits) is first read as the integer 172 in decimal (binary 10101100 = 128 + 32 + 8 + 4 = 172), then scaled by dividing by $2^4 = 16$, yielding 10.75; this breaks down as the integer part 1010₂ = 10 and the fractional part 1100₂ = 0.5 + 0.25 = 0.75. Signed representations, such as two's complement, extend this by using the most significant bit as the sign, allowing negative values while maintaining the fixed scaling.

Arithmetic operations in fixed-point formats treat the numbers as scaled integers, requiring alignment of the binary points before performing standard integer addition, subtraction, multiplication, or division. For addition or subtraction, operands are sign-extended if necessary to match word lengths, and the results are computed as integers, with the binary point implicitly aligned; overflow can occur if the result exceeds the representable range, often mitigated by using guard bits or wider intermediate accumulators. Multiplication typically yields a result with doubled fractional bits (e.g., n fractional bits in each operand produce 2n in the product), necessitating right-shifting by n bits to restore the original scaling, while division involves similar adjustments but risks precision loss. These operations are efficient on hardware lacking dedicated floating-point units, as they leverage integer arithmetic pipelines.

Fixed-point formats are widely used in resource-constrained environments such as embedded systems and digital signal processing (DSP), where they provide deterministic performance and lower power consumption compared to floating-point alternatives. In embedded microcontrollers, they enable cost-effective handling of sensor data or control algorithms without floating-point hardware, as seen in fixed-point implementations for real-time signal filtering on 16-bit processors. Similarly, in simple DSP applications like audio processing or motor control, fixed-point arithmetic supports multiply-accumulate operations essential for filters, offering predictable timing critical for embedded devices.
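The scaled-integer view of fixed-point arithmetic can be sketched in a few lines, here for the unsigned 4.4 format used in the example above (the helper names are illustrative, and overflow handling is omitted for brevity):

```python
FRAC_BITS = 4  # 4.4 format: 4 integer bits, 4 fractional bits (unsigned sketch)

def to_fixed(x):
    """Encode a real number as a scaled integer: multiply by 2**n and round."""
    return round(x * (1 << FRAC_BITS))

def from_fixed(i):
    """Decode by applying the 2**-n scaling factor (divide by 2**n)."""
    return i / (1 << FRAC_BITS)

def fixed_mul(a, b):
    """The raw product of two 4.4 values carries 8 fractional bits,
    so shift right by 4 to restore the 4.4 scaling."""
    return (a * b) >> FRAC_BITS
```

Decoding the bit pattern 10101100 reproduces the worked example: `from_fixed(0b10101100)` is 172 / 16 = 10.75. A product such as 2.5 × 3.0 becomes (40 × 48) >> 4 = 120, which decodes to 7.5.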

Floating-point systems

Floating-point systems in binary representation enable the encoding of a wide range of real numbers using a fixed number of bits, allowing for both very large magnitudes and high precision through an exponent that scales the significand. The predominant standard for this is IEEE 754, first established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE) to promote portability and consistency in floating-point computations across computing systems. This standard was revised in 2008 to incorporate decimal formats and additional features, and further updated in 2019 to address evolving needs in scientific computing, including better support for subnormal numbers and fused operations.

The IEEE 754 binary formats include single-precision (32 bits) and double-precision (64 bits), which balance range and accuracy for most applications. In single-precision, the format allocates 1 bit for the sign, 8 bits for the biased exponent, and 23 bits for the mantissa (significand). The double-precision format uses 1 sign bit, 11 exponent bits, and 52 mantissa bits. The value is interpreted as $(-1)^s \times (1 + f) \times 2^{e-127}$ for single-precision normalized numbers, where s is the sign bit, f is the fractional part of the mantissa (with an implicit leading 1 for normalization), and e is the biased exponent (ranging from 1 to 254, biased by +127). For double-precision, the bias is 1023, and the exponent range is 1 to 2046. This structure ensures normalized representation where the mantissa begins with 1.xxxx... in binary, maximizing precision within the bit constraints.

A representative example is the decimal number 3.5, which in binary is $11.1_2$ or $1.11_2 \times 2^1$. For IEEE 754 single-precision encoding, the sign bit is 0 (positive), the unbiased exponent 1 is biased to 128 ($10000000_2$), and the mantissa is $11000000000000000000000_2$ (23 bits, with the implicit 1 omitted). The full 32-bit representation is $01000000011000000000000000000000_2$. This encoding allows 3.5 to be exactly representable, as its binary form fits within the mantissa without rounding.

IEEE 754 defines special values to handle exceptional conditions: infinity (exponent all 1s, mantissa 0), signed zeros (exponent and mantissa 0), Not a Number (NaN; exponent all 1s, non-zero mantissa), and denormalized (subnormal) numbers (exponent 0, non-zero mantissa, no implicit leading 1). Infinities represent overflow results, such as division by zero, while NaNs indicate invalid operations like square root of a negative number; quiet NaNs propagate silently, and signaling NaNs trigger exceptions. Denormals extend the representable range toward zero, providing gradual underflow and avoiding abrupt precision loss.

Precision in floating-point systems is inherently limited, leading to issues like rounding errors during arithmetic operations. The standard specifies four rounding modes for binary floating-point: round to nearest (ties to even, the default), round toward positive infinity, round toward negative infinity, and round toward zero. Machine epsilon, the smallest positive floating-point number $\epsilon$ such that $1 + \epsilon > 1$, quantifies relative precision and is approximately $2^{-23} \approx 1.192 \times 10^{-7}$ for single-precision and $2^{-52} \approx 2.220 \times 10^{-16}$ for double-precision. These modes and epsilon ensure predictable behavior, though they can introduce accumulation of errors in iterative computations.
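The single-precision encoding of 3.5 can be inspected directly by packing a float with Python's standard `struct` module and reinterpreting the bytes as an unsigned integer; this is a sketch, with `float_to_bits` and `single_round` as illustrative helper names:

```python
import struct

def float_to_bits(x):
    """Encode x as IEEE 754 single precision (big-endian) and return
    the 32-bit pattern as a string of 0s and 1s."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))  # reinterpret 4 bytes as uint32
    return f"{raw:032b}"

def single_round(x):
    """Round a Python float (double precision) through single precision and back,
    exposing single-precision rounding behavior."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

bits = float_to_bits(3.5)
sign, exponent, mantissa = bits[0], bits[1:9], bits[9:]
# sign = "0", exponent = "10000000" (128 = 1 + bias 127),
# mantissa = "110" followed by twenty 0s (implicit leading 1 omitted)
```

The round-trip helper also illustrates machine epsilon: adding $2^{-23}$ to 1 survives single-precision rounding, whereas a much smaller increment such as $2^{-25}$ is rounded away, leaving exactly 1.0.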