
Arithmetic logic unit

The Arithmetic Logic Unit (ALU) is a core digital circuit within a computer's central processing unit (CPU) responsible for executing arithmetic operations, such as addition, subtraction, multiplication, and division, as well as logical operations, including AND, OR, NOT, and comparisons, on binary data represented as electrical signals in 0s and 1s. As a key element of the von Neumann architecture, the ALU processes inputs from CPU registers, performs the specified computations under control signals from the CPU's control unit, and outputs results back to registers or memory, enabling the fundamental data manipulation required for program execution. It represents the modern evolution of the central arithmetic component envisioned in early stored-program designs, primarily handling integer arithmetic and logical operations to support complex calculations in everything from simple embedded systems to high-performance processors, with floating-point operations typically managed by a separate floating-point unit (FPU). The ALU's design typically incorporates combinational logic gates and multiplexers to select between arithmetic and logical functions, with flags for status conditions like zero, carry, overflow, and sign to inform subsequent instructions. In multi-bit configurations, such as 32-bit or 64-bit ALUs common in contemporary CPUs, individual bit-slice units are interconnected via carry chains to enable efficient parallel processing of operands. This structure ensures high-speed operation, often pipelined in modern architectures to overlap instruction execution and boost overall system throughput.

Overview

Definition and Purpose

The arithmetic logic unit (ALU) is a combinational digital circuit designed to perform a variety of arithmetic and logical operations on binary inputs. It processes pairs of operands to execute functions such as addition, subtraction, bitwise AND, OR, and XOR, producing corresponding outputs without relying on sequential storage elements. This design ensures that the ALU responds instantaneously to input changes, making it a fundamental building block for data manipulation in digital systems. In central processing units (CPUs), the ALU serves as the primary execution core for arithmetic and logical instructions, handling the computational tasks essential to program execution. It enables the processor to perform basic data operations required by software, such as calculating sums or comparing values, thereby supporting the overall functionality of the computer system. As a critical component of the CPU, the ALU integrates with registers to fetch operands and store results, facilitating efficient instruction processing. The ALU occupies a central role in the von Neumann architecture, where the CPU is divided into distinct units: the ALU for computation, the control unit for instruction decoding and sequencing, and memory for storing both programs and data. This separation allows the ALU to focus solely on data processing, receiving inputs from registers or memory via the control unit's orchestration, while outputs are routed back for further use or storage. Unlike the control unit, which manages instruction flow without direct computation, or memory, which provides passive storage, the ALU actively transforms data to enable algorithmic execution. A textual representation of a basic ALU block diagram illustrates two operand inputs (A and B, typically n-bit wide), a function select input (a multi-bit control signal to choose the operation), and primary outputs consisting of the result (Y, n-bit) plus auxiliary status signals like carry-out or zero flag. This configuration positions the ALU as an interface between data sources in the CPU, ensuring operations align with instruction requirements while generating flags for conditional control.
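To make the block diagram concrete, the following minimal C sketch models an 8-bit ALU with operands A and B, a function-select input, and a result accompanied by carry-out and zero status outputs; the operation encoding and function names are illustrative assumptions rather than any particular processor's interface.

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative 8-bit ALU block: two operands, a function select,
       an n-bit result Y, and auxiliary status outputs (carry-out, zero). */
    typedef enum { ALU_ADD, ALU_SUB, ALU_AND, ALU_OR } AluFunc; /* hypothetical encoding */

    uint8_t alu(uint8_t a, uint8_t b, AluFunc f, int *carry_out, int *zero) {
        uint16_t wide = 0;                     /* one extra bit to capture carry */
        switch (f) {
            case ALU_ADD: wide = (uint16_t)a + b;               break;
            case ALU_SUB: wide = (uint16_t)a + (uint8_t)~b + 1; break; /* two's complement */
            case ALU_AND: wide = a & b;                         break;
            case ALU_OR:  wide = a | b;                         break;
        }
        uint8_t y = (uint8_t)wide;
        *carry_out = (wide >> 8) & 1;          /* carry-out from bit 8 */
        *zero = (y == 0);                      /* zero flag: all result bits clear */
        return y;
    }

    int main(void) {
        int c, z;
        uint8_t y = alu(200, 100, ALU_ADD, &c, &z);
        printf("Y=%u carry=%d zero=%d\n", y, c, z); /* 200+100=300 -> Y=44, carry=1 */
        return 0;
    }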

Role in Computer Architecture

The arithmetic logic unit (ALU) is a core component of the central processing unit (CPU) datapath, where it receives input operands from the data read ports of the register file and delivers computation results back to the register file for storage. This integration enables efficient data flow within the processor, allowing operands to be fetched from general-purpose registers, processed by the ALU, and written back in a single cycle for basic operations. The ALU interacts closely with the control unit, which decodes fetched instructions and generates control signals to route the appropriate operands to the ALU, thereby selecting the specific operation to perform. These signals direct the ALU's function selection mechanism, ensuring that the unit executes only the arithmetic or logical task mandated by the current instruction without unnecessary overhead. In the instruction cycle, the ALU plays a pivotal role during the execution stage, where it computes results for arithmetic and logical instructions using the decoded operands from prior stages. This stage involves the ALU applying operations such as addition or bitwise AND directly on register-sourced data, contributing to the overall throughput of instruction processing in the CPU. The scope of the ALU differs between RISC and CISC architectures; in RISC designs, it focuses on straightforward register-to-register operations to simplify decoding and enable pipelining, while in CISC, it accommodates more intricate instructions that can reference memory operands alongside registers. This distinction influences efficiency, with RISC emphasizing ALU simplicity for faster execution cycles.

Signals and Interfaces

Data Inputs and Outputs

The arithmetic logic unit (ALU) receives two primary data inputs known as operands, typically denoted as A and B, each consisting of n bits representing binary integers. These operands are loaded from registers or memory into the ALU's input ports, often via multiplexers that select the appropriate paths based on the instruction being executed. The ALU processes these inputs to produce an output result, which is generally an n-bit value matching the operand width, though certain operations may extend it to n+1 bits to accommodate carry or overflow bits. Data bus widths in ALUs vary by processor architecture, commonly implemented as 8-bit, 16-bit, 32-bit, or 64-bit to align with the system's word size. Wider bus widths enable handling of larger numerical ranges and greater precision in computations, thereby increasing the ALU's processing capacity for complex applications, but they also demand more hardware resources and can introduce propagation delays in carry chains without optimized designs like carry-lookahead adders. For instance, a 64-bit ALU supports operands up to approximately 1.8 × 10^19 in unsigned magnitude, significantly expanding the scope of addressable memory and data manipulation compared to an 8-bit variant limited to 255. ALUs handle both signed and unsigned data representations, with the same hardware circuitry often supporting both through interpretive conventions rather than distinct paths. Unsigned operands treat all bits as magnitude, while signed ones use two's complement encoding, where the most significant bit indicates sign (0 for positive/zero, 1 for negative), allowing signed arithmetic to reuse the core logic gates unchanged. In the data flow, for example, multiplexers route values (e.g., from a general-purpose register) to the A and B inputs, ensuring operands are properly aligned and zero- or sign-extended if necessary before ALU processing. This setup permits opcode-driven selection of operations on the incoming data, integrating with broader datapath control.
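The interpretive convention for signed versus unsigned operands can be illustrated with a short C example: the same 8-bit pattern reads as 240 unsigned or -16 in two's complement, and widening it requires zero extension or sign extension respectively.

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint8_t bits = 0xF0;                     /* one 8-bit pattern */
        printf("unsigned: %u\n", bits);          /* 240: all bits are magnitude */
        printf("signed:   %d\n", (int8_t)bits);  /* -16: MSB=1 means negative   */

        /* Widening before ALU processing: sign extension replicates the MSB,
           zero extension fills the upper bits with zeros. */
        int16_t  sext = (int8_t)bits;            /* 0xFFF0 = -16 */
        uint16_t zext = bits;                    /* 0x00F0 = 240 */
        printf("sign-extended: 0x%04X  zero-extended: 0x%04X\n",
               (uint16_t)sext, zext);
        return 0;
    }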

Control Signals

The opcode functions as a multi-bit control input to the arithmetic logic unit (ALU), typically 3 to 4 bits wide, enabling the selection of specific operations from a set of 8 to 16 functions, such as addition (ADD), logical AND (AND), and shift right (SHR). This opcode is fed into the ALU's function selection mechanism, where a decoder interprets it to route the appropriate arithmetic or logical circuitry for execution on the input operands. Enable signals complement the opcode by gating the ALU's activity, activating processing only when asserted to prevent unnecessary computations and ensure proper timing in synchronous designs. In clocked architectures, these signals synchronize ALU operations with the system clock, latching inputs and outputs at rising or falling edges to maintain data integrity across stages. Without an enable signal, the ALU may default to a pass-through or hold state, conserving power in idle cycles. A representative example of opcode decoding appears in single-cycle designs, where the control signals derive from the instruction's primary opcode field and, for register-type operations, the function code subfield. The following truth table illustrates a simplified 3-bit mapping for common ALU functions in a MIPS-like design:
Opcode   Operation
000      ADD
001      SUBTRACT
010      AND
011      OR
100      SLT (set on less than)
101      NOR
110      SHIFT LEFT
111      SHIFT RIGHT
This mapping ensures efficient selection without overlap, with the ALU control refining the signal based on the full instruction decode. The CPU's control unit generates these opcode and enable signals during the fetch and decode phases, extracting the opcode from the fetched machine instruction and mapping it to ALU-specific controls via a decoder or lookup table. This process integrates with the broader control path, asserting enables only for instructions requiring ALU involvement, such as arithmetic or branch comparisons.
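As a hedged sketch of how such a table might be realized in software terms, the following C function maps the 3-bit opcode above onto a 32-bit datapath; the operand width and shift-amount masking are assumptions for illustration.

    #include <stdint.h>

    /* Sketch of the 3-bit opcode table for an assumed 32-bit datapath. */
    uint32_t alu_exec(uint8_t opcode, uint32_t a, uint32_t b) {
        switch (opcode & 0x7) {                    /* 3-bit function select */
            case 0:  return a + b;                     /* 000 ADD         */
            case 1:  return a - b;                     /* 001 SUBTRACT    */
            case 2:  return a & b;                     /* 010 AND         */
            case 3:  return a | b;                     /* 011 OR          */
            case 4:  return (int32_t)a < (int32_t)b;   /* 100 SLT         */
            case 5:  return ~(a | b);                  /* 101 NOR         */
            case 6:  return a << (b & 31);             /* 110 SHIFT LEFT  */
            default: return a >> (b & 31);             /* 111 SHIFT RIGHT */
        }
    }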

Status Flags

Status flags are specialized output signals produced by the arithmetic logic unit (ALU) to reflect specific conditions arising from the execution of arithmetic or logical operations. These flags provide essential information about the result, such as whether it represents zero, a negative value, or an invalid signed magnitude due to overflow, allowing the processor to make informed decisions for conditional execution. Typically implemented as single-bit indicators, status flags are captured and stored in a dedicated status register (also called the flags register or processor status word) immediately following an ALU operation. This register serves as a repository for conditional information that instructions can query to implement branching, looping, and error handling in software. The most common status flags in ALU designs include the zero flag (Z), carry flag (C), sign flag (S or N), overflow flag (V or O), and parity flag (P).

The zero flag is asserted (set to 1) when the ALU result is exactly zero, meaning every bit in the output is 0; otherwise, it is cleared to 0. This flag is generated by performing a logical NOR across all bits of the result or equivalently by detecting if the result equals zero through comparison circuitry. The carry flag is set to 1 if there is a carry-out from the most significant bit (MSB) position during addition or a borrow at the MSB during subtraction, facilitating extended-precision arithmetic for unsigned numbers and multi-limb calculations. The sign flag reflects the sign of the ALU result in two's complement arithmetic and is simply set to the value of the MSB of the output: 1 for negative (MSB=1) and 0 for non-negative (MSB=0).

The overflow flag indicates an arithmetic anomaly in signed operations, specifically when the result exceeds the representable range in two's complement notation, leading to an incorrect sign interpretation. It is generated by checking if the operands share the same sign but produce a result with the opposite sign; for addition of operands A and B yielding result Z, this is detected when both A and B are positive but Z is negative, or both are negative but Z is positive. In hardware, this is commonly implemented with the equation: \text{overflow} = \overline{(A_{\text{sign}} \oplus B_{\text{sign}})} \land (A_{\text{sign}} \oplus Z_{\text{sign}}) where \oplus denotes XOR, \land denotes AND, and the overbar denotes NOT, with the sign bits extracted from the MSB of each value; this circuit ensures overflow is flagged only when the operand signs match but differ from the output sign. The parity flag is set to 1 if the number of 1s in the least significant byte (or word) of the result is even, computed via a parity-checking tree that folds the bits through XOR to yield the overall parity.

These flags are latched into the status register upon completion of an ALU operation, overwriting previous values unless preserved by specific instructions, and are subsequently read by conditional instructions (e.g., branch if zero) to direct program execution based on the computational outcome. This mechanism integrates seamlessly with the processor's control flow for efficient conditional processing without additional data transfers.
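A compact C model of these definitions, assuming an 8-bit addition and illustrative flag names, shows how each flag derives from the result bits:

    #include <stdint.h>
    #include <stdio.h>

    /* Status flags for an 8-bit addition, mirroring the definitions above.
       The struct layout and names are illustrative assumptions. */
    typedef struct { int z, c, n, v, p; } Flags;

    uint8_t add_with_flags(uint8_t a, uint8_t b, Flags *f) {
        uint16_t wide = (uint16_t)a + b;
        uint8_t  z    = (uint8_t)wide;
        f->z = (z == 0);                    /* zero: all result bits clear */
        f->c = (wide >> 8) & 1;             /* carry out of the MSB        */
        f->n = (z >> 7) & 1;                /* sign: copy of result MSB    */
        /* overflow: operand signs match but differ from the result sign  */
        f->v = (~(a ^ b) & (a ^ z) & 0x80) != 0;
        /* parity: XOR-fold the byte; P=1 when the count of 1s is even    */
        uint8_t t = z; t ^= t >> 4; t ^= t >> 2; t ^= t >> 1;
        f->p = !(t & 1);
        return z;
    }

    int main(void) {
        Flags f;
        uint8_t r = add_with_flags(0x7F, 0x01, &f); /* 127 + 1 overflows signed */
        printf("r=0x%02X Z=%d C=%d N=%d V=%d P=%d\n", r, f.z, f.c, f.n, f.v, f.p);
        return 0;
    }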

Operation

Circuit Functionality

The arithmetic logic unit (ALU) functions as a combinational digital circuit, primarily constructed from basic logic gates such as AND, OR, and XOR, along with multiplexers to route and select outputs from various sub-units. These components enable the ALU to process data without relying on memory elements for computation, ensuring that the output depends solely on the current inputs. In this design, the circuit evaluates operations instantaneously upon receiving stable inputs, making it ideal for high-speed processing within a processor's datapath. Data flow through the ALU begins with two operands, typically n-bit binary values from processor registers, entering the circuit alongside a control opcode that dictates the operation. The opcode drives a selection mechanism, such as a multiplexer array, which routes the operands to the appropriate sub-circuit—for instance, an adder for arithmetic tasks or a bitwise logic unit for operations like AND or XOR—before combining the results into a single output. The final result, along with generated status flags (e.g., zero, carry, or overflow indicators), then exits the ALU for storage in a destination register or further processing. This streamlined path supports parallel evaluation of potential operations while ensuring only the selected one propagates. In synchronous ALU implementations, common in pipelined processors, clocked latches capture and stabilize input operands at the rising edge of the clock signal, preventing glitches or timing violations during computation. This allows the combinational core to operate reliably within a fixed clock period, where the latch holds values steady until the next clock edge, aligning ALU results with the broader system's timing requirements.
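This parallel-evaluate-then-select structure can be sketched in C as follows, computing every candidate result (as the hardware sub-units do concurrently) and letting a 4:1 multiplexer index propagate exactly one; the 2-bit encoding is an assumption:

    #include <stdint.h>

    /* Combinational-style sketch: all candidate results are computed,
       and a multiplexer indexed by the 2-bit select propagates one. */
    uint8_t alu_mux(uint8_t a, uint8_t b, unsigned sel) {
        uint8_t candidates[4] = {
            (uint8_t)(a + b),   /* adder output      */
            (uint8_t)(a - b),   /* subtractor output */
            (uint8_t)(a & b),   /* AND array output  */
            (uint8_t)(a | b),   /* OR array output   */
        };
        return candidates[sel & 3];  /* 4:1 output multiplexer */
    }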

Function Selection Mechanism

The function selection mechanism in an arithmetic logic unit (ALU) enables the dynamic choice of operations by routing inputs to the appropriate functional blocks—such as adders, logical units, or shifters—and selecting their outputs based on signals derived from the processor's control unit. This is commonly achieved through multiplexers (MUX) that integrate the outputs from multiple function units into a single result line, with selection lines controlled by the decoded opcode to determine which unit's output is propagated. For instance, opcode-derived signals direct the MUX to choose between the adder output for arithmetic operations or the logic unit output for bitwise functions. Decoder circuits provide an alternative or complementary approach by converting the binary opcode into one-hot enable signals that activate specific blocks while deactivating others, ensuring only the desired operation executes without interference. The decoder takes the opcode bits as input and generates distinct enable lines, each tied to a function block; for example, in a design with a 2-bit opcode, a 2-to-4 decoder produces four enable outputs (e.g., E0 for AND, E1 for OR, E2 for ADD, E3 for SUB), which are fed to the select inputs of a 4:1 MUX or directly gate the outputs of the respective blocks before output selection. This setup minimizes power consumption and propagation delays by isolating inactive units. A practical example involves a 4-bit opcode that, through partial decoding or a 4-to-16 decoder, enables one of up to 16 function paths, though often simplified to eight primary functions in basic ALUs; textually, the design features the opcode fed into a decoder generating enable signals (e.g., EN_ADD, EN_AND), which connect to tri-state buffers or MUX select lines, with all function block outputs converging on a final output MUX—such that for an opcode of 0010 (encoding ADD in this example), EN_ADD asserts high, routing the adder's sum to the ALU result while the others remain low. This decoder-MUX combination allows efficient scaling for wider opcodes in modern designs. For compound functions like addition with carry-in (e.g., add-with-carry operations), the selection mechanism incorporates additional control bits to configure the arithmetic block via a sub-MUX that selects between a zero carry (for standard ADD) or an external carry-in signal, ensuring the function unit adapts without requiring separate hardware paths. This is typically handled by extending the opcode-derived controls to toggle the carry input MUX within the arithmetic unit, maintaining compatibility with broader circuit flows where status flags influence subsequent selections.
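A minimal C sketch of the decoder approach, assuming the 2-bit opcode and E0-E3 enable assignment from the example above, converts the opcode to a one-hot enable word that gates exactly one function block:

    #include <stdint.h>

    /* One-hot decoding sketch: a 2-to-4 decoder asserts exactly one
       enable line, gating a single function block whose output then
       converges on the result line. */
    uint8_t alu_decoded(uint8_t a, uint8_t b, unsigned opcode) {
        unsigned enable = 1u << (opcode & 3); /* one-hot: 0001,0010,0100,1000 */
        uint8_t y = 0;
        if (enable & 1) y = a & b;            /* E0: AND block */
        if (enable & 2) y = a | b;            /* E1: OR block  */
        if (enable & 4) y = (uint8_t)(a + b); /* E2: ADD block */
        if (enable & 8) y = (uint8_t)(a - b); /* E3: SUB block */
        return y;                             /* converges on output MUX */
    }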

Core Functions

Arithmetic Operations

The arithmetic operations in an arithmetic logic unit (ALU) primarily revolve around addition and subtraction, which form the foundation for more complex computations in digital systems. Addition is typically implemented using a chain of full adders, where each full adder processes one bit position along with a carry-in from the previous stage. The sum bit S_i for the i-th position is computed as S_i = A_i \oplus B_i \oplus C_i, where A_i and B_i are the input bits and C_i is the carry-in, while the carry-out C_{i+1} is generated as the majority function C_{i+1} = A_i B_i + A_i C_i + B_i C_i. This structure ensures correct binary addition with carry propagation across multiple bits. Two common implementations for multi-bit addition in ALUs are the ripple-carry adder and the carry-lookahead adder. In a ripple-carry adder, the carry-out from each full adder ripples sequentially to the next, resulting in a delay proportional to the number of bits, which can be a bottleneck for wide operands. To mitigate this, carry-lookahead adders generate carry signals in parallel using propagate (P_i = A_i \oplus B_i) and generate (G_i = A_i B_i) terms, allowing faster computation for widths beyond 16 bits by reducing the critical path delay. These designs are integral to ALU performance in processors, balancing speed and hardware complexity. Subtraction in an ALU is commonly achieved by leveraging the adder circuit through two's complement representation, where the operation A - B is performed as A + (\neg B + 1), with \neg B denoting the bitwise complement of B. This method reuses the existing adder hardware, often incorporating a subtract control signal to invert the B input and set the initial carry-in to 1, enabling efficient subtraction without dedicated subtractor circuits. A dedicated subtractor may be used in some designs, but the two's complement approach predominates due to its simplicity and integration with addition. Increment and decrement operations are special cases of addition and subtraction, respectively, implemented by adding or subtracting one to/from the operand using the ALU's adder. For increment, one input is set to all zeros except the least significant bit (which is set to 1), while decrement adds the two's complement representation of negative one (all ones) to the operand. These operations are essential for address arithmetic and counters in processors, often optimized with minimal additional logic beyond the core adder. In basic ALUs, multiplication and division are not typically performed in a single cycle but can be supported through iterative use of the adder for repeated addition or subtraction. Multiplication, for instance, accumulates partial products by adding the multiplicand multiple times based on the multiplier bits, while division employs repeated subtraction to determine the quotient and remainder. These methods highlight the ALU's role as a foundational building block, though dedicated multipliers and dividers are used in advanced designs for efficiency.
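The full-adder equations above translate directly into a bit-serial C model; the sketch below assumes an 8-bit width and shows how asserting a subtract control inverts B and injects a carry-in of 1 to compute A - B = A + \neg B + 1:

    #include <stdint.h>
    #include <stdio.h>

    /* Gate-level ripple-carry sketch: each loop iteration is one full
       adder, computing S_i = A_i ^ B_i ^ C_i and the majority carry. */
    uint8_t ripple_add_sub(uint8_t a, uint8_t b, int sub, int *carry_out) {
        uint8_t result = 0;
        unsigned carry = sub ? 1 : 0;          /* carry-in = 1 for subtract */
        if (sub) b = (uint8_t)~b;              /* invert B for subtraction  */
        for (int i = 0; i < 8; i++) {
            unsigned ai = (a >> i) & 1, bi = (b >> i) & 1;
            unsigned sum = ai ^ bi ^ carry;                  /* sum bit   */
            carry = (ai & bi) | (ai & carry) | (bi & carry); /* carry-out */
            result |= (uint8_t)(sum << i);
        }
        *carry_out = (int)carry;
        return result;
    }

    int main(void) {
        int c;
        printf("13 - 5 = %u\n", ripple_add_sub(13, 5, 1, &c)); /* prints 8 */
        return 0;
    }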

Logical Operations

The logical operations performed by an arithmetic logic unit (ALU) encompass bitwise functions applied independently to each bit of the input operands, enabling efficient manipulation of bit patterns without numerical interpretation or carry propagation. These operations, including AND, OR, exclusive-OR (XOR), and NOT, are realized through parallel arrays of basic logic gates, forming combinational circuits that process all bits simultaneously for high-speed execution. Unlike arithmetic functions, logical operations rely solely on gate-level logic, avoiding sequential dependencies, and support applications such as masking and bit testing. The bitwise AND operation generates an output bit that is true (1) only if both corresponding input bits are true, defined as \text{Output}_i = A_i \land B_i for each bit position i. This is implemented using a matrix of two-input AND gates, one per bit, where the inputs from operands A and B are directly fed into the gates without interconnection between bit positions. AND is widely used for masking in data manipulation, where ANDing with a mask value preserves selected bits while clearing others to zero, facilitating operations like extracting fields from registers. Similarly, the bitwise OR operation produces an output bit that is true if at least one of the corresponding input bits is true, expressed as \text{Output}_i = A_i \lor B_i, and is constructed via an array of OR gates applied in parallel across all bits. The XOR operation, yielding true when input bits differ (\text{Output}_i = A_i \oplus B_i), employs XOR gates and serves purposes such as detecting bit differences or computing parity in error-checking scenarios. The unary NOT operation inverts each bit of a single operand (\text{Output}_i = \neg A_i), implemented with NOT gates (inverters) for each position, often combined with other operations for complementation. These gate matrices ensure minimal propagation delay, as each output depends only on its local inputs. In practice, logical operations support bit toggling by XORing with a mask of ones in targeted positions, flipping those bits while leaving others unchanged, which is essential for flag manipulation and state updates in processor control. The selection of these functions occurs through control signals that multiplex the gate outputs, allowing the ALU to switch between logical modes as directed by instruction opcodes.
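These masking and toggling idioms are easy to demonstrate in C; the mask values below are arbitrary examples:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint8_t reg = 0xB5;                  /* 1011 0101 */

        uint8_t low_nibble = reg & 0x0F;     /* AND mask: extract bits 3..0 -> 0x05 */
        uint8_t with_bit6  = reg | 0x40;     /* OR mask: force bit 6 high   -> 0xF5 */
        uint8_t toggled    = reg ^ 0x03;     /* XOR mask: flip bits 1..0    -> 0xB6 */
        uint8_t inverted   = (uint8_t)~reg;  /* NOT: complement every bit   -> 0x4A */

        printf("%02X %02X %02X %02X\n", low_nibble, with_bit6, toggled, inverted);
        return 0;
    }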

Shift and Rotate Operations

Shift operations in an ALU manipulate the positions of bits within a data word, enabling efficient multiplication or division by powers of 2 for unsigned integers, as well as alignment and extraction tasks. A logical shift left moves all bits to the left by a specified number of positions, filling the vacated least significant bits (LSBs) with zeros; this operation effectively multiplies the value by 2^n, where n is the shift amount. For example, shifting the 8-bit value 00000001 (1 in decimal) left by 3 positions yields 00001000 (8 in decimal). Conversely, a logical shift right moves bits to the right, filling the vacated most significant bits (MSBs) with zeros, which divides an unsigned value by 2^n. Arithmetic shifts preserve the sign of signed integers, differing from logical shifts primarily in the right-shift variant. An arithmetic shift right moves bits right by n positions but fills the vacated MSBs with copies of the original sign bit (0 for positive, 1 for negative), maintaining the number's sign and enabling signed division by 2^n. For instance, the 8-bit value 11111000 (-8 in decimal) arithmetically shifted right by 2 becomes 11111110 (-2 in decimal). Logical shifts, by contrast, do not preserve sign and are unsuitable for signed arithmetic. Left shifts are typically logical for both signed and unsigned representations, as sign replication is unnecessary. Rotate operations differ from shifts by wrapping bits around the ends of the word, preserving all bit values without loss or introduction of zeros. A rotate left shifts bits left, with the overflow bits from the MSB moving to the LSB positions; similarly, a rotate right shifts bits right, placing underflow bits from the LSB into the MSB. This is useful for circular data manipulation, such as in cryptographic algorithms or bit-field rotations. For example, rotating 00001101 (13 in decimal) left by 3 positions produces 01101000 (104 in decimal), where the three MSBs wrap to the LSBs. Some ALUs support rotate through carry, incorporating the carry flag in the wrap-around. To handle variable shift amounts efficiently in a single clock cycle, ALUs often incorporate a barrel shifter, a combinational circuit built from cascaded 2:1 multiplexers arranged in logarithmic stages. Each stage shifts by a power-of-2 amount (e.g., 1, 2, 4 bits for a 32-bit word), selected by bits of the shift amount input, allowing any shift from 0 to word size-1 with O(\log n) delay using O(n \log n) multiplexers. This design supports logical shifts, arithmetic shifts (via sign extension logic), and rotates by routing wrap-around paths through additional multiplexers. Barrel shifters are integral to modern processor ALUs, such as those in ARM architectures, for high-performance bit manipulation.
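The shift and rotate examples above can be reproduced with a short C program; note that C leaves right-shifting a negative signed value implementation-defined, though mainstream compilers perform the arithmetic shift shown:

    #include <stdint.h>
    #include <stdio.h>

    /* 8-bit rotate left, matching the worked example above. */
    uint8_t rotl8(uint8_t x, unsigned n) {
        n &= 7;                                  /* keep shift in 0..7 */
        if (n == 0) return x;
        return (uint8_t)((x << n) | (x >> (8 - n)));
    }

    int main(void) {
        uint8_t v = 0x01;
        printf("%u\n", (unsigned)(uint8_t)(v << 3)); /* logical left: 1 -> 8 */

        int8_t s = -8;                               /* 1111 1000 */
        printf("%d\n", s >> 2);                      /* arithmetic right: -8 -> -2
                                                        (sign bit replicated)  */
        printf("%u\n", (unsigned)(0xF8 >> 2));       /* logical right: 248 -> 62 */

        printf("%u\n", (unsigned)rotl8(0x0D, 3));    /* rotate left: 13 -> 104 */
        return 0;
    }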

Design and Implementation

Hardware Components

The arithmetic logic unit (ALU) is constructed from fundamental digital components that enable both arithmetic and logical processing. Core elements include basic logic gates such as AND, OR, and NOT, which form the basis for bitwise operations. For arithmetic functions, half adders and full adders are essential; a half adder processes two input bits to produce a sum and carry output using an XOR gate for the sum and an AND gate for the carry, while a full adder extends this to three inputs (two bits plus a carry-in) by incorporating two half adders and an additional OR gate to combine carries. These adders are typically chained using parallel prefix schemes, such as carry-lookahead or Kogge-Stone adders, to handle multi-bit operations efficiently with logarithmic delay; simpler ripple-carry configurations are used in basic or low-power designs but limit performance due to linear carry propagation. Multiplexers play a critical role in function selection, allowing the ALU to route inputs from different sub-units (such as the adder or logic paths) to a common output based on control signals. Typically, a 4-to-1 multiplexer selects among addition/subtraction, logical AND/OR/XOR, or other operations for each bit position. This selection mechanism ensures efficient sharing of hardware resources across functions. The ALU is organized into key sub-units: the arithmetic unit, which relies on an adder circuit (with one operand XORed against a subtract signal to enable two's complement subtraction before feeding into the adder); the logic unit, comprising an array of parallel logic gates for bitwise AND, OR, XOR, and similar operations; and the shifter unit, which performs bit shifts and rotations using either a serial shifter (for simple, low-cost designs that shift one bit per clock cycle) or a barrel shifter (a multi-stage multiplexer network enabling arbitrary shifts in a single cycle). The barrel shifter, in particular, uses logarithmic stages of 2:1 multiplexers to achieve O(log n) delay for n-bit shifts. In VLSI design, these components influence power consumption and chip area significantly, as each gate and multiplexer translates to multiple transistors. For instance, a basic 4-bit ALU implementation may require around 100-200 gates, leading to hundreds of transistors depending on the process node, with power consumption growing with transistor count and switching activity. Bit-slice design is commonly employed for scalability, where a single 1-bit ALU slice—containing a full adder, logic gates, and a multiplexer—is replicated n times for an n-bit ALU, with carry chains linking slices. This modular approach minimizes design complexity while optimizing for parallelism.
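The bit-slice organization can be sketched in C as a 1-bit slice (full adder, logic gates, and a 4:1 multiplexer) replicated along a carry chain; the select encoding is an illustrative assumption:

    #include <stdint.h>

    /* One 1-bit ALU slice: full adder + logic gates + 4:1 MUX. */
    typedef struct { unsigned result, carry_out; } Slice;

    Slice alu_slice(unsigned a, unsigned b, unsigned carry_in, unsigned sel) {
        unsigned sum  = a ^ b ^ carry_in;                        /* full adder   */
        unsigned cout = (a & b) | (a & carry_in) | (b & carry_in);
        unsigned outs[4] = { a & b, a | b, a ^ b, sum };         /* gate outputs */
        Slice s = { outs[sel & 3], cout };                       /* 4:1 MUX      */
        return s;
    }

    /* Replicating the slice eight times with a carry chain builds an
       8-bit ALU, as the bit-slice approach describes. */
    uint8_t alu8(uint8_t a, uint8_t b, unsigned sel, unsigned carry_in) {
        uint8_t y = 0;
        for (int i = 0; i < 8; i++) {                            /* carry chain  */
            Slice s = alu_slice((a >> i) & 1, (b >> i) & 1, carry_in, sel);
            y |= (uint8_t)(s.result << i);
            carry_in = s.carry_out;
        }
        return y;
    }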

Combinational and Sequential Designs

Arithmetic logic units (ALUs) can be implemented using purely combinational logic, where the output depends solely on the current inputs without any memory elements. In such designs, the ALU consists of interconnected logic gates that directly compute arithmetic and logical operations, such as addition via a basic ripple-carry adder chain, though advanced designs employ carry-lookahead adders to reduce delays. The performance of combinational ALUs is limited by propagation delays through the circuit, particularly in the critical path of the carry chain during addition, where each bit's carry signal must propagate sequentially, resulting in delays proportional to the bit width (e.g., up to 33 gate delays for a 16-bit ripple-carry adder). This delay arises because the carry-out from one full adder serves as the carry-in for the next, creating a ripple effect that slows overall operation as operand size increases. To mitigate these timing constraints in high-performance systems, sequential ALU designs incorporate storage elements like flip-flops to register inputs and outputs, enabling synchronous, clocked operation. These flip-flops capture values at the clock edge, allowing the combinational core to process data in a controlled manner and facilitating pipelining, where multiple instructions overlap in execution stages. In pipelined ALUs, registers break the datapath into stages (e.g., operand fetch, execution, result write-back), reducing the clock period to the longest stage delay rather than the full operation time, which has been a key enabler for clock frequencies exceeding 1 GHz in commercial microprocessors since the 1990s. The choice between combinational and sequential designs involves trade-offs in latency, area, and throughput. Combinational ALUs offer lower latency for single operations due to the absence of clock overhead and simpler wiring, making them suitable for low-power or embedded applications where speed is not paramount. However, sequential designs achieve higher overall clock rates and better scalability in multi-stage pipelines, despite added area for flip-flops and potential throughput penalties from pipeline hazards, making them the preferred choice in modern general-purpose CPUs. Most practical ALUs adopt a hybrid approach, with a combinational core for computation surrounded by sequential elements such as input registers and output latches to synchronize with the processor's clock. This wrapper enables precise timing and status latching in sequential contexts, balancing the strengths of both paradigms.
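A minimal C sketch of this hybrid arrangement, with assumed register names, models input and output registers around a combinational core, all updating on a simulated clock edge:

    #include <stdint.h>

    /* Sequential wrapper sketch: input registers feed a combinational
       core, and an output register latches the result at each clock
       edge. Names are illustrative assumptions. */
    typedef struct {
        uint8_t reg_a, reg_b;   /* input registers  */
        uint8_t reg_y;          /* output register  */
    } AluPipe;

    static uint8_t combinational_core(uint8_t a, uint8_t b) {
        return (uint8_t)(a + b);   /* stands in for the full ALU logic */
    }

    /* One rising clock edge: the output register captures the core's
       result computed from the previously latched operands, while the
       input registers simultaneously capture the next operands. */
    void clock_edge(AluPipe *p, uint8_t next_a, uint8_t next_b) {
        p->reg_y = combinational_core(p->reg_a, p->reg_b);
        p->reg_a = next_a;
        p->reg_b = next_b;
    }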

Integration in Processors

In basic central processing units (CPUs), the arithmetic logic unit (ALU) serves as the core computational component within the processor's datapath, positioned between the register file and the data memory unit to facilitate efficient fetching, processing, and result writeback. Operands are typically loaded from the register file into the ALU for arithmetic or logical operations, with results either written back to the register file or forwarded to memory access stages. This integration ensures that the ALU handles computations directly as part of the execution stage, minimizing data movement overhead in simple, single-issue designs. To mitigate data hazards and reduce latency in pipelined processors, bypassing (also known as forwarding) paths are incorporated around the ALU, allowing results from prior instructions to be routed directly to subsequent ALU inputs without waiting for register file writes. These paths connect the outputs of the execute stage—where the ALU resides—to the inputs of later stages or earlier instructions in flight, enabling operand availability as soon as computation completes and preventing stalls that could otherwise degrade performance by several cycles per hazard. Such mechanisms are essential in maintaining high instruction throughput, particularly when dependent operations chain together in the pipeline. In more advanced superscalar processors, which issue multiple instructions per cycle to exploit instruction-level parallelism, ALUs are replicated and integrated as distinct execution units to handle concurrent operations, contrasting with the single ALU in basic CPUs. For instance, designs often include separate ALUs for general integer arithmetic and specialized units for shifts or multiplications, alongside floating-point units, allowing parallel execution of non-dependent instructions while the scheduler dispatches them based on resource availability. This multi-ALU approach significantly boosts throughput, as seen in early superscalar implementations like the MIPS R10000, which features two integer ALUs to support up to four instructions issued per cycle. Modern processors, such as those implementing the x86-64 architecture, incorporate 64-bit ALUs as integral parts of their integer execution pipelines, enabling wide data operations on 64-bit operands for enhanced performance in general computing tasks. Similarly, ARM-based cores in the Cortex-A series, like the Cortex-A78, employ multiple 64-bit ALUs within superscalar engines to process instructions efficiently, often issuing up to four operations per cycle across dedicated units. These integrations reflect the evolution toward wider, parallel datapaths that balance latency reduction via forwarding with scalability for high-performance workloads.
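The bypass selection can be sketched as a priority multiplexer in C; the stage-latch fields model a classic five-stage pipeline and are assumptions for illustration:

    #include <stdint.h>

    /* Forwarding (bypass) sketch: an ALU source operand is taken from
       the most recent in-flight producer instead of the register file. */
    typedef struct { int dest_reg; uint32_t value; int writes_reg; } StageLatch;

    uint32_t select_operand(int src_reg, uint32_t regfile_value,
                            const StageLatch *ex_mem, const StageLatch *mem_wb) {
        if (ex_mem->writes_reg && ex_mem->dest_reg == src_reg)
            return ex_mem->value;        /* forward from the execute stage */
        if (mem_wb->writes_reg && mem_wb->dest_reg == src_reg)
            return mem_wb->value;        /* forward from the memory stage  */
        return regfile_value;            /* no hazard: use register file   */
    }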

Applications

General-Purpose Computing

In general-purpose computing, the arithmetic logic unit (ALU) serves as the core component for executing fundamental arithmetic and logical instructions within a central processing unit (CPU). For instance, the ADD instruction fetches two operands from registers, routes them to the ALU for summation, and stores the result back in a destination register, enabling basic computational tasks in programs. Similarly, the CMP instruction directs the ALU to perform a subtraction between operands without storing the result, instead updating status flags to indicate relational outcomes such as equality or greater-than, which are essential for decision-making in software. These operations form the backbone of execution pipelines in processors like the x86 and ARM architectures. The ALU plays a pivotal role in implementing control structures such as loops and conditionals by generating status flags that influence branching decisions. In a loop, an ALU operation like subtraction or comparison sets the zero flag if the result is zero, allowing a branch-on-zero instruction to terminate the loop when a counter reaches the end condition. For conditionals, flags such as negative or carry enable selective execution paths, where the CPU evaluates ALU outputs to redirect program flow, optimizing sequential processing in general-purpose tasks. This integration with status flags, as detailed in processor documentation, ensures efficient handling of if-else constructs without excessive overhead. To maintain high execution efficiency, modern CPUs employ forwarding mechanisms that bypass the register file, directly supplying ALU results from prior instructions as inputs to subsequent ALU operations. This technique resolves data hazards in pipelined designs by routing intermediate values—such as an addition result—straight to the next stage's ALU inputs, preventing pipeline stalls and preserving throughput. In pipelined processors, for example, ALU-to-ALU forwarding supports back-to-back execution of dependent instructions, reducing latency in general-purpose workloads. Performance in general-purpose computing is often quantified by throughput metrics like instructions per cycle (IPC), which measures the average number of instructions completed per clock cycle, reflecting ALU utilization and pipeline efficiency. In integer-dominated benchmarks, superscalar CPUs achieve IPC values ranging from 1 to 4, constrained by ALU bandwidth and dependency chains, with forwarding enhancements boosting effective throughput by 3-7% in out-of-order execution scenarios. These metrics underscore the ALU's impact on overall system responsiveness for everyday computational tasks.
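A toy C model of flag-driven loop termination, with an explicit zero-flag variable standing in for the status register, illustrates the mechanism:

    #include <stdint.h>
    #include <stdio.h>

    /* Sketch of flag-driven loop control: the ALU's decrement sets the
       zero flag, and a branch-on-zero decision consumes it. */
    int main(void) {
        uint8_t counter = 3;
        int zero_flag;
        do {
            counter = (uint8_t)(counter - 1);  /* ALU decrement      */
            zero_flag = (counter == 0);        /* Z flag from result */
            printf("counter=%u Z=%d\n", counter, zero_flag);
        } while (!zero_flag);                  /* branch if Z clear  */
        return 0;
    }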

Specialized Arithmetic Tasks

In multi-precision arithmetic, ALUs are often cascaded to support operations on big integers larger than the native word size, with carry signals chained between units to propagate results across multiple stages. For instance, a 128-bit addition can be achieved by linking four 32-bit ALUs, where the carry-out from one ALU serves as the carry-in to the next, enabling seamless handling of wide operands while minimizing hardware redundancy. This approach is essential in cryptographic applications and scientific computing, where operands exceed standard 64-bit limits, and carry chaining ensures correct propagation without intermediate storage overhead. Fixed-point arithmetic adapts the ALU for representing fractional values with a fixed binary point position, typically aligning operands by shifting to match the point before addition or subtraction. In fixed-point addition, the operation proceeds as integer addition after alignment, with the result retaining the predefined point; scaling factors may be applied to prevent overflow by adjusting the point position in software or hardware. Subtraction follows similarly, using two's complement. Overflow detection is critical and occurs if a carry propagates across the point into the integer portion, potentially invalidating the fractional result. A basic fixed-point addition is expressed as: \text{result} = A + B where A and B are aligned fixed-point numbers, and overflow is flagged if the carry extends beyond the integer bits. Beyond basic operations, ALUs facilitate complex tasks like multiplication and division through algorithmic implementations relying on shifts and adds. Booth's multiplication algorithm, introduced in 1951, efficiently handles signed binary numbers by examining multiplier bits in pairs or triplets, replacing strings of ones with subtract-and-shift sequences to reduce add operations. This method uses the ALU's shifter and adder iteratively, achieving up to 50% fewer additions compared to standard shift-and-add for certain bit patterns. For division, non-restoring algorithms employ successive shifts and conditional adds/subtracts to compute quotients without restoration steps, iterating through dividend bits while using the ALU for remainder updates. These techniques extend ALU utility for high-precision tasks in embedded systems and digital signal processing, where full hardware dividers are cost-prohibitive.
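The carry-chaining scheme maps directly onto a C sketch that builds a 128-bit addition from four 32-bit limb additions, least significant limb first:

    #include <stdint.h>
    #include <stdio.h>

    /* Multi-precision sketch: a 128-bit add built from four 32-bit adds,
       chaining each carry-out into the next stage's carry-in, exactly
       as cascaded ALUs do. */
    void add128(const uint32_t a[4], const uint32_t b[4], uint32_t sum[4]) {
        uint32_t carry = 0;
        for (int i = 0; i < 4; i++) {
            uint64_t wide = (uint64_t)a[i] + b[i] + carry; /* 32-bit ALU stage */
            sum[i] = (uint32_t)wide;
            carry  = (uint32_t)(wide >> 32);               /* carry to next    */
        }
    }

    int main(void) {
        uint32_t a[4] = { 0xFFFFFFFF, 0xFFFFFFFF, 0, 0 };  /* 2^64 - 1 */
        uint32_t b[4] = { 1, 0, 0, 0 };
        uint32_t s[4];
        add128(a, b, s);                                   /* expect 2^64 */
        printf("%08X %08X %08X %08X\n",
               (unsigned)s[3], (unsigned)s[2], (unsigned)s[1], (unsigned)s[0]);
        return 0;
    }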

Use in Graphics and Vector Processing

In vector processing, arithmetic logic units (ALUs) are extended into multiple lanes to handle packed data formats, enabling instructions to perform operations across several elements simultaneously. For instance, Intel's Streaming SIMD Extensions (SSE) and Advanced Vector Extensions (AVX) in x86 processors incorporate vector ALUs that process 128-bit or 256-bit registers, allowing operations like four single-precision floating-point additions in a single instruction. These SIMD units replicate scalar ALU functionality across lanes, such as executing packed multiplies or adds on arrays of values, which accelerates data-parallel tasks in scientific computing and multimedia processing. AVX-512 further expands this to 512-bit vectors with up to 16 lanes for 32-bit floats, enhancing throughput for vectorized workloads. In graphics processing units (GPUs), ALUs are deployed in vast arrays within shader cores to support massively parallel computations for rendering pipelines. NVIDIA's GTX 280, for example, features 30 streaming multiprocessor cores with 8 ALUs each, totaling 240 ALUs operating at 1.3 GHz to deliver 933 GFLOPS for fragment processing. Similarly, AMD's HD 4870 includes 10 SIMD cores with 80 ALUs per core, emphasizing SIMD widths of 64 threads to handle vertex and pixel operations concurrently. These ALUs execute instructions in SIMT (single instruction, multiple thread) fashion, where warps of 32 threads share an instruction stream, allowing thousands of ALUs across the GPU to process independent fragments or vertices in parallel for high-resolution graphics. Graphics tasks heavily rely on these ALUs for core rendering operations, such as texture mapping and lighting calculations. In texture mapping, ALUs perform bit shifts for address calculations and multiplies for interpolation of texel values, blending samples from texture memory to apply surface details without fixed-function dominance. Lighting computations, often in fragment shaders, use ALU multiplies for dot products in models like Phong shading and adds for accumulating contributions from multiple light sources, enabling realistic illumination effects across millions of pixels. The evolution of ALUs in GPUs has shifted from scalar designs to highly parallel architectures, driven by unified shader models in NVIDIA and AMD hardware. Early NVIDIA GPUs like the G80 (2006) introduced scalar ALUs in place of vector units, with 128 CUDA cores per chip enabling flexible execution of vertex and pixel shaders. By the Fermi architecture (2010), this expanded to 512 CUDA cores with dedicated integer and floating-point ALUs per core, supporting IEEE-compliant operations for broader parallelism. AMD followed a similar path, transitioning from the early unified shader ALUs of the 2005 Xenos GPU (96 shader calculations per cycle) to VLIW-based unified shaders in the 2007 R600 series, where clusters of five stream processors executed SIMD instructions for enhanced graphics throughput. Modern iterations, such as NVIDIA's Ampere with 108 streaming multiprocessors containing 64 FP32 cores each, scale to thousands of ALUs for vectorized graphics and compute, achieving teraflop-scale performance.
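A small C example using SSE intrinsics (compiled on x86 with SSE enabled, e.g., gcc -msse) demonstrates one packed instruction performing four single-precision additions:

    #include <immintrin.h>
    #include <stdio.h>

    /* SIMD lane sketch: _mm_add_ps adds four single-precision floats
       at once, the packed operation described above. */
    int main(void) {
        __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);      /* lanes 3..0 */
        __m128 b = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f);
        __m128 sum = _mm_add_ps(a, b);       /* four additions in one op */

        float out[4];
        _mm_storeu_ps(out, sum);
        printf("%.1f %.1f %.1f %.1f\n", out[0], out[1], out[2], out[3]);
        return 0;                            /* prints 11.0 22.0 33.0 44.0 */
    }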

Historical Development

Early Concepts and Inventions

The origins of the arithmetic logic unit (ALU) trace back to mechanical precursors designed for automated arithmetic computations. In the 1820s, English mathematician Charles Babbage conceived the Difference Engine, a mechanical device intended to calculate mathematical tables through the method of finite differences, relying exclusively on repeated addition to avoid the complexities of multiplication and division in mechanical form. This machine represented an early effort to mechanize arithmetic, using gears and levers to perform sequential additions that could generate polynomial values up to the seventh degree with 31-digit precision, though it was never fully constructed during Babbage's lifetime. Babbage's work laid conceptual groundwork for separating computational mechanisms from human calculation, influencing later designs by emphasizing dedicated arithmetic hardware. The transition to electronic arithmetic began in the 1940s with the development of vacuum-tube-based computers, marking the first practical implementations of electronic adders and logic units. The ENIAC (Electronic Numerical Integrator and Computer), completed in 1945 and designed by J. Presper Eckert and John Mauchly at the University of Pennsylvania, was the first general-purpose electronic digital computer and featured 20 accumulators that performed addition and subtraction within a machine containing approximately 18,000 vacuum tubes for high-speed arithmetic operations. These accumulators functioned as early arithmetic units, capable of adding 10-digit numbers in a few thousandths of a second, though reconfiguration for different operations required manual rewiring, highlighting the limitations of pre-programmable designs. Eckert's engineering contributions were pivotal in scaling vacuum-tube technology for reliable arithmetic, enabling ENIAC to compute artillery firing tables for the U.S. Army. A seminal advancement in ALU conceptualization came from John von Neumann's 1945 proposal for the EDVAC (Electronic Discrete Variable Automatic Computer), which formalized the idea of a dedicated central arithmetic unit within a stored-program architecture. In his "First Draft of a Report on the EDVAC," von Neumann outlined a single arithmetic organ to handle all basic operations—addition, subtraction, multiplication, and division—using binary representation and vacuum tubes, integrated with a control unit for sequencing. This design shifted arithmetic hardware from ad-hoc accumulators to a unified, programmable module, emphasizing efficiency through serial processing and a 32-bit word length. Von Neumann's framework, developed during meetings at the Moore School of Electrical Engineering, influenced subsequent machines by prioritizing a centralized ALU for both arithmetic and logical tasks. By the early 1950s, these ideas materialized in commercial systems like the UNIVAC I, delivered in 1951 by Eckert and Mauchly's company to the U.S. Census Bureau. The UNIVAC I incorporated an arithmetic unit using binary-coded decimal representation and vacuum tubes for operations on 72-bit words (twelve 6-bit characters), supporting addition, subtraction, multiplication, and division at speeds on the order of hundreds of microseconds per operation. This design drew on EDVAC principles while adapting to decimal formats for business applications, featuring magnetic tape input with error checking, and represented the first commercially available electronic computer with an integrated ALU-like component for general tasks. Eckert's role extended from ENIAC to UNIVAC, where he refined vacuum-tube circuitry for more reliable and versatile processing.

Evolution in Modern Processors

The evolution of the arithmetic logic unit (ALU) in modern processors began in the 1970s with the advent of microprocessors, marking a shift from discrete components to on-chip integration. The Intel 4004, released in 1971, featured the first commercially available 4-bit ALU on a single chip, capable of performing basic arithmetic operations such as addition and subtraction, as well as logical operations including AND, OR, and NOT. This ALU processed 4-bit data words and was integral to the 4004's role as a complete central processing unit (CPU), enabling programmable computations for applications like calculators. Its design laid the foundation for scaling ALU complexity within microprocessors, transitioning from specialized devices to general-purpose computing elements. By the 1980s and 1990s, ALU designs advanced to support wider data paths and faster arithmetic, driven by the demand for personal computing. The Intel 8086, introduced in 1978, incorporated a 16-bit ALU that handled integer arithmetic and logic operations using a Manchester carry chain for efficient carry propagation, reducing delays compared to simple ripple-carry adders. This architecture supported a 16-bit data bus and was pivotal in the x86 family, enabling broader memory addressing through a 20-bit address bus. In the mid-1990s, the Intel Pentium processor extended this to a 32-bit ALU, employing a parallel prefix adder—a variant of carry-lookahead logic—to generate carries more rapidly across bits, significantly improving arithmetic throughput for multimedia and scientific workloads. These enhancements allowed ALUs to operate at clock speeds exceeding 100 MHz, facilitating the rise of 32-bit operating systems and applications. Entering the 2000s, ALU evolution incorporated parallelism through single instruction, multiple data (SIMD) extensions, augmenting scalar ALUs with vector processing capabilities to handle multimedia and data-intensive tasks efficiently. Intel's MMX instructions, introduced in 1996 with the Pentium processor line, repurposed floating-point registers for 64-bit SIMD operations, effectively extending the ALU for parallel integer arithmetic on packed data. Subsequent extensions like SSE (1999) and AVX (2011) further widened vector widths to 128 bits and 256 bits, respectively, integrating dedicated vector ALUs that performed simultaneous operations on multiple data elements, such as fused multiply-add for graphics and scientific computing. Concurrently, out-of-order execution in processors like the Intel Core series, starting from 2006, utilized multiple ALUs per core—typically 3-4 execution units—to dispatch and complete independent arithmetic instructions dynamically, maximizing throughput by reordering operations as dependencies resolved. This combination enabled modern CPUs to sustain several ALU operations per cycle, with metrics showing up to four additions dispatched across execution ports in high-performance cores. In recent years up to 2025, ALU designs in AI accelerators have shifted toward custom operations optimized for machine learning, diverging from general-purpose ALUs. Google's Tensor Processing Units (TPUs), evolving through versions like TPU v5e and v5p in 2023, incorporate specialized multiply-accumulate units in systolic arrays for high-throughput matrix multiplications and activations, supporting bfloat16 and int8 formats with custom fused operations that provide up to 2.5x higher throughput per dollar compared to TPU v4. In 2025, Google introduced the Ironwood TPU, its seventh-generation model, offering more than 4x performance improvement over predecessors for AI inference tasks. These units handle tensor operations natively, bypassing traditional ALU limitations for scalar processing, and integrate with multi-core architectures for scalable AI training.
Similarly, neural processing units (NPUs) in processors like Intel's Lunar Lake (2024) feature dedicated ALUs for low-precision operations, enabling on-device AI acceleration while maintaining compatibility with x86 SIMD extensions.

References

  1. [1]
    What is an arithmetic logic unit (ALU) and how does it work?
    May 16, 2025 · An arithmetic logic unit (ALU) is a part of a central processing unit (CPU) that carries out arithmetic and logic operations.
  2. [2]
    What Is an Arithmetic Logic Unit (ALU)? 7 Key Components
    Apr 24, 2023 · ALU is a circuit in the CPU which performs mathematical and logical operations using electrical signals in 0s and 1s.
  3. [3]
    ALU Functions and Bus Organization - GeeksforGeeks
    Oct 13, 2025 · Arithmetic Logic Unit (ALU)​​ The ALU is a digital circuit within the CPU that performs all arithmetic and logical operations. It takes input ...
  4. [4]
    5.2. The von Neumann Architecture - Dive Into Systems
    The processing unit of the von Neumann machine consists of two parts. The first is the arithmetic/logic unit (ALU), which performs mathematical operations such ...
  5. [5]
    [PDF] Von Neumann Computers 1 Introduction - Purdue Engineering
    Jan 30, 1998 · The ALU combines and transforms data using arithmetic operations, such as addition, subtraction, multiplication, and division, and logical.
  6. [6]
    [PDF] ARCHITECTURE BASICS - Milwaukee School of Engineering
    The arithmetic and logic unit is the modern version of von Neumann's central arithmetic part. The ALU completes arithmetic and bitwise-logic operations on two n ...
  7. [7]
    Arithmetic Logic Unit: Functions & Operations - StudySmarter
    Aug 7, 2023 · The Arithmetic Logic Unit (ALU) is a crucial component of a computer's central processing unit (CPU), responsible for performing mathematical ...Arithmetic Logic Unit Explained · Arithmetic Logic Unit Functions<|control11|><|separator|>
  8. [8]
    Arithmetic Logic Unit in Digital Electronics - Tutorials Point
    ALU is basically a combination logic circuit that can perform arithmetic and logical operation on digital data (data in binary format).
  9. [9]
    Arithmetic Logic Unit | ALU Definition, Function & Operation - Lesson
    The function of an ALU is to take binary inputs, execute the operation, and create, store, and distribute binary output. ALUs make arithmetic and logical ...What is ALU? · What does the ALU do? · Function of ALU · Applications of ALU
  10. [10]
  11. [11]
    CS 240 Lab 4
    An Arithmetic Logic Unit (ALU) is a combinational circuit used to perform all the arithmetic and logical operations for a computer processor, and can be built ...
  12. [12]
    How The Computer Works: The CPU and Memory
    The arithmetic/logic unit (ALU) contains the electronic circuitry that executes all arithmetic and logical operations. The arithmetic/logic unit can perform ...
  13. [13]
    Design and Simulation of Arithmetic Logic Unit (Theory)
    Arithmetic Logic Unit (ALU) is a critical component of a microprocessor and is the core component of central processing unit. Fig.1 Central Processing Unit (CPU).Missing: definition | Show results with:definition
  14. [14]
    Components of the CPU - Dr. Mike Murphy
    Mar 29, 2022 · The Arithmetic Logic Unit (ALU) is responsible for performing basic calculations, implementing the CA part of the von Neumann Architecture.
  15. [15]
    Von Neumann Architecture - Kalamazoo College
    Arithmetic Logic Unit (ALU); Set of registers; Control unit that reads in the next instruction, parses it, and controls what happens in the ALU and registers.
  16. [16]
    Organization of Computer Systems: Processor & Datapath - UF CISE
    Result from ALU is applied as an address to the data memory. Data retrieved from the memory unit is written into the register file, where the register index is ...<|control11|><|separator|>
  17. [17]
    Datapath – Clayton Cafiero - University of Vermont
    Oct 15, 2025 · The arithmetic logic unit (ALU) performs: ... The ALU receives its operands from the register file, performs the computation, and returns the ...
  18. [18]
    [PDF] 16.1 / micro-operations 577
    A control signal from the control unit tem- porarily opens the gate to let data pass. • ALU: The control unit controls the operation of the ALU by a set of ...
  19. [19]
    [PDF] Datapath and Control (Chapter 4) - Auburn University
    Control: Datapath for each step is set up by control signals that set up dataflow directions on communication buses and select ALU and memory functions. ...Missing: interaction | Show results with:interaction
  20. [20]
    5.6. The Processor's Execution of Program Instructions
    After the Decode stage determines the operation to perform and the operand sources, the ALU performs the operation in the next stage, the Execution stage. The ...
  21. [21]
    CS3130: Processors
    Fetch, reads an instruction from memory; Decode, retrieves values from registers; Execute, uses the ALU; Memory, reads from or writes to one address of memory ...
  22. [22]
    [PDF] INSTRUCTION SETS - Milwaukee School of Engineering
    CISC machines have these general characteristics: • The most important identifying characteristic is arithmetic instructions that can access both ALU registers ...Missing: versus | Show results with:versus
  23. [23]
    [PDF] Performs arithmetic and logic operations on
    ALU Inputs and Outputs. Integer Representation. • We have the smallest possible alphabet: the symbols 0 & 1 represent everything. • No minus sign. • No period.
  24. [24]
    [PDF] The Arithmetic/Logic Unit - UCSB ECE
    Computer Architecture, The Arithmetic/Logic Unit. Slide 47. An ALU for. MiniMIPS. Figure 10.19 A multifunction ALU with 8 control signals (2 for function class ...
  25. [25]
    Chapter 5: The Processor: Datapath and Control
    No readable text found in the HTML.<|separator|>
  26. [26]
    [PDF] Control Logic for the Single-Cycle CPU
    ALU control input lw. 00 load word xxxxxx add. 010 sw. 00 store word xxxxxx add. 010 beq. 01 branch eq xxxxxx subtract 110. R-type. 10 add. 100000.
  27. [27]
    [PDF] Control Overview - cs.wisc.edu
    ALU-ctrl = f(opcode, function). Page 3. CS/ECE 552 Lecture Notes: Chapter 5. 5 ... Control Signals Needed (Fig. 5.19). PC. Instruction memory. Read address.
  28. [28]
    [PDF] CS/ECE 250: Computer Architecture Basics of Logic Design: ALU ...
    Sep 10, 2020 · CS/ECE 250: Computer Architecture. Basics of Logic Design: ALU ... − If no enable signal shown, implies always enabled. • Get output ...
  29. [29]
    Control Signal - CS2100 - NUS Computing
    The control signals are generated based on the instruction to be executed. This can be found by looking at the opcode field.
  30. [30]
    13.1 Annotated Slides | Computation Structures
    The ALUFN control signals tell the ALU what operation to perform. These control signals are determined by control logic from the 6-bit opcode field. For ...<|separator|>
  31. [31]
    Status Register - an overview | ScienceDirect Topics
    Overflow Flag (OF) Set if the result of the instruction overflowed. Parity Flag (PF) Set if the result has an even number of bits set. Carry Flag (CF) Used for ...
  32. [32]
    [PDF] CHAPTER SIXTEEN - Intel 80x86 Base Architecture
    The flags of the 80x86 processor are divided into two categories: control flags and status flags. The control flags are modified by the software to change ...
  33. [33]
    Condition Codes 1: Condition Flags and Codes - Arm Developer
    Sep 11, 2013 · C : Carry (or Unsigned Overflow)​​ The C flag is set if the result of an unsigned operation overflows the 32-bit result register. This bit can be ...Missing: ALU | Show results with:ALU
  34. [34]
    Two's Complement Overflow Rules
    The rules for detecting overflow in a two's complement sum are simple: If the sum of two positive numbers yields a negative result, the sum has overflowed.
  35. [35]
    Calculating the Overflow Flag in an ALU
    Dec 30, 2015 · Overflow occurs if the addition of two positive numbers gives a negative number and if the addition of two negative numbers gives a positive.
  36. [36]
    Implementation of an ALU using modified carry select adder for low ...
    In digital computer, an Arithmetic logic unit (ALU) is a powerful combinational circuit that executes arithmetic and logical functions.
  37. [37]
    An Optimization Design Approach for Arithmetic Logic Unit
    An arithmetic logic unit (ALU) is a combination of various digital circuits merged together to execute data processing instruction (i.e. arithmetic ...
  38. [38]
    [PDF] 6.2: Sequential Circuits - cs.Princeton
    Arithmetic Logic Unit. TOY ALU. □. Big combinational circuit. □. 16-bit bus. □. Add, subtract, and, xor, shift left, shift right, copy input 2. ALU select. 16.<|control11|><|separator|>
  39. [39]
    [PDF] Building an ALU (Part 1): - Publish
    These generally control how data flows and what operations are performed. ▫ E.g., the SUB signal. ... Selecting the desired logical operation. ▫We need a ...
  40. [40]
  41. [41]
    [PDF] 8-Bit Arithmetic Logic Unit (ALU) PURPOSE - University of Florida
    Oct 8, 2004 · The register should be synchronously loadable from its 8-bit input d, on the rising edge of the clock. Simulate the register design using a ...
  42. [42]
    [PDF] CS240 Laboratory 4 ALU and Sequential/Memory Circuits Arithmetic ...
    To synchronize when the latch changes state, add a clock input: In a clocked latch, whenever the clock is high, the outputs/state of the latch can change.
  43. [43]
    [PDF] 410 Lab Assignment
    In this project, the register file and ALU/shifter will be synchronized by two non- overlapping clock phases that determine when data is latched into the ...Missing: synchronous clocked stabilization
  44. [44]
    [PDF] Exceptions and Interrupts
    Check for invalid opcode/function field values. ○. ALU modified to detect overflow. ○. Exception handling address input to PC multiplexer. ○. Control signals ...
  45. [45]
    [PDF] ECE 250 / CS 250 Introduction to Computer Architecture Exceptions ...
    ▫ Detect unknown opcode. • Arithmetic overflow. ▫ Add logic in the ALU to detect overflow. ▫ Provide overflow signal as ALU output. • Unaligned access. ▫ Add ...
  46. [46]
    [PDF] Arithmetic And Logic Unit Alu
    Based on control signals from the control unit, multiplexers choose between different inputs and direct the output accordingly. This selection mechanism allows ...
  47. [47]
    [PDF] design and implementation of an alu using a decoder for operation ...
    Abstract. This paper presents the design of an Arithmetic Logic Unit (ALU) using a decoder to select the operation based on the control signals.
  48. [48]
    [PDF] Lecture 3 Control Unit, ALU, and Memory
    Three lines are used to select the ALU's function via a decoder (which is not bit sliced of course). Notice that all the logic is combinatorial. The speed of ...
  49. [49]
    [PDF] Design of 4-bit Arithmetic and Logic Unit - GW Engineering
    have extra control circuits or components to control the data flow so that the correct operation of the circuit is always guaranteed. The core of the module ...
  50. [50]
    [PDF] CS240 Laboratory 4 ALU and Sequential/Memory Circuits Arithmetic ...
    Invert A is used to complement the input A. Negate B/Carry in is used to complement input B for logical operations, and as a carry-in when addition is performed.
  51. [51]
    [PDF] The Multiplexor Different Implementations Building a 32 bit ALU
    • A Ripple carry ALU. • Two bits decide operation. – Add/Sub. – AND. – OR. – LESS. • 1 bit decide add/sub operation. • A carry in bit. • Bit 31 generates ...
  52. [52]
    [PDF] ARITHMETIC COMBINATIONAL MODULES AND NETWORKS
    SPECIFICATION OF ADDER MODULES FOR POSITIVE INTEGERS. • HALF-ADDER AND FULL-ADDER MODULES. • CARRY-RIPPLE AND CARRY-LOOKAHEAD ADDER MODULES.
  53. [53]
    [PDF] Basic Arithmetic and the ALU - cs.wisc.edu
    • Add/Sub ALU. • full adder, ripple carry, subtraction, together. • Carry-Lookahead addition, etc. • Logical operations. • and, or, xor, nor, shifts - barrel ...
  54. [54]
    [PDF] CS 140 Lecture 6
    The carry-lookahead adder has 4-bit blocks. Assume that each two-input gate delay is 100 ps and the full adder delay is 300 ps. Adder Delay ...
  55. [55]
    [PDF] Lecture 8:
    Digital Design and Computer Architecture: ARM® Edition © 2015. • Carry in ... ALU with Status Flags: Negative. N = 1 if: Result is negative. So, N is ...
  56. [56]
    [PDF] Chapter 5: Adders
    Sep 11, 2024 · An N-bit carry-lookahead adder is generally much faster than a ripple-carry adder for N > 16 ... ALU with Status Flags: Carry. C = 1 if: Cout ...
  57. [57]
    [PDF] Subtraction: Addition's Tricky Pal A 16-bit ALU This Unit
    How to subtract using an adder? • sub A, B = add A, -B. • Negate B before adding (fast negation trick: –B = ...
  58. [58]
    [PDF] Lecture 4 Arithmetic-Logic Unit - Semantic Scholar
    Addition and Subtraction. ❑ Normal binary addition circuitry. ❑ Take two's complement of subtrahend and add to minuend. i.e. a - b = a + (-b). ❑ Need only ...
  59. [59]
    [PDF] Constructing a Basic Arithmetic Logic Unit
    The arithmetic logic unit (ALU) is the brawn of the computer, the device that performs the arithmetic operations like addition and subtraction or logical ...
  60. [60]
    [PDF] Arithmetic logic UNIT (ALU) design using reconfigurable CMOS logic
    The ALU can perform four arithmetic and four logical operations. Multi-input floating gate (MIFG) transistors have been promising in realizing increased ...
  61. [61]
    [PDF] Arithmetic-Logic Unit (ALU) - Computation Structures Group - MIT
    Feb 10, 2012 · 6.S078 - Computer Architecture: A Constructive Approach ... function Bit#(width) alu(Bit#(width) a, Bit#(width) b, AluFunc op); Bit ...
  62. [62]
    Binary Arithmetic
    Jan 31, 2016 · multiplication is repeated addition; division is repeated subtraction. Binary addition. To add in binary, we just remember that we only have 0 ...
  63. [63]
    [PDF] Arithmetic And Logic In Computer Systems Mi Lu
    Bitwise operations manipulate individual bits within binary representations, allowing efficient data processing and manipulation, such as masking, toggling ...
  64. [64]
    Logic Gates - Building an ALU
    1 Introduction. The goal of this tutorial is to understand the basics of building complex circuits from simple AND, OR, NOT and XOR logical gates.
  65. [65]
    CS3410 Spring 2012 Lab 1 - Cornell: Computer Science
    Feb 6, 2023 · The logical and (&), or (|), xor (^), nor, and complement (~) operators are all bit-wise. Don't duplicate components. Your ALU should use your ...
  66. [66]
    Shift Operation - an overview | ScienceDirect Topics
    We refer to both shift and rotate generically as shift operations. ARM shift operations are LSL (logical shift left), LSR (logical shift right), ASR (arithmetic ...
  67. [67]
    2.3. Arithmetic Logic Unit - Intel
    The ALU supports shift and rotate operations ... The ALU supports arithmetic shift right and logical shift right/left. The ALU supports rotate left/right.
  68. [68]
    Organization of Computer Systems: § 3: Computer Arithmetic
    3.2. In order to implement a shifter external to the ALU, we consider the design of a barrel shifter, shown schematically in Figure 3.13. Here, the closed ...
  69. [69]
    [PDF] Design alternatives for barrel shifters - Princeton University
    The logical right shifter can be extended to also perform shift right arithmetic and rotate right operations by adding additional multiplexors. This approach ...
  70. [70]
    22C:60 Notes, Chapter 8 - University of Iowa
    We now have the parts we need to build a simple arithmetic logic unit that combines an adder-subtractor with three simple logic gates using a multiplexer.
  71. [71]
    [PDF] Combinational Circuits | CS 261 Fall 2017 - Computer Science - JMU
    ○ Combine adders and multiplexors to make arithmetic/logic units. ○ Combine flip-flops to make register files and memory. Basic Arithmetic Logic Unit (ALU).
  72. [72]
    Arithmetic Logic Unit for the BOMB!
    The arithmetic logic unit performs six functions - add, subtract, logical AND, logical OR, shift left and shift right on 4-bit two's complement binary numbers.
  73. [73]
    [PDF] Arithmetic and ALU Design Shifts Rotations Barrel Shifter
    Control algorithm: repeat 8 times (not 16!) • Based on 3b groups, add/subtract shifted/unshifted multiplicand. • Shift product/multiplier right by 2.
  74. [74]
    [PDF] Design of a 4 bit Arithmetic and Logical unit with Low Power and ...
    Mar 30, 2021 · Abstract: In this presented work we designed the 4- bit. Arithmetic & Logical Unit (ALU) by using the different modules.
  75. [75]
    [PDF] Lecture 7: Arithmetic Logic Unit - UMBC
    1-bit ALUs, using Carry In and Carry Out lines. • Chain Carry Out from one adder to Carry In of next adder (a ripple carry adder). • Slow due to gate ...
  76. [76]
    [PDF] cG 2018 Bhargava Reddy Gopireddy - The i-acoma group at UIUC
    Despite the slowdown caused, pipelined ALU designs have been employed in commercial microprocessors since Alpha 21064 to reach high frequencies. Therefore ...
  77. [77]
    [PDF] The Microarchitecture of Superscalar Processors - cs.wisc.edu
    Aug 20, 1995 · Superscalar processors execute multiple instructions per clock cycle by exploiting instruction-level parallelism, fetching and decoding ...
  78. [78]
    [PDF] x86 Instruction Set
    Assume 0x0201 is machine code for an ADD instruction of R2 = R0 + R1. • Control Logic will… – select the registers (R0 and R1). – tell the ALU to add. – select ...
  79. [79]
    Decision-Making in Assembly Language - UMBC
    Comparing numbers. The cmp instruction has two operands: cmp reg/mem, reg/mem/constant. The computer will perform a subtraction of operand2 from operand1 ...
  80. [80]
    [PDF] Lecture 17: - ARM Assembly Language
    • Hardware to decode and execute instructions kept simple, small, and fast ... • Method 1: Compare instruction: CMP. Example: CMP R5, R6. ▫ Performs: R5 ...
  81. [81]
    15. Inside a Modern CPU - University of Iowa
    The ALU/MA stage, as digital logic. Both a and c depend on whether the instruction is a memory reference instruction or an arithmetic instruction. Control input ...
  82. [82]
    Conditional Instructions – ECE353
    Conditional instructions allow us to implement high level language constructs like if/else and for loops.
  83. [83]
    Ramping Up Open-Source RISC-V Cores: Assessing the Energy ...
    Jul 4, 2025 · ALU-to-ALU Operand Forwarding: Lightweight operand forwarding was implemented for scenarios where two ALU instructions are issued in the ...
  84. [84]
    A Metric-Guided Method for Discovering Impactful Features and ...
    IPC —Instructions Per Cycle. Useful performance metric for comparing two microarchitectures at iso-frequency. FLOPC —Arithmetic FLoating-point OPerations ...
  85. [85]
    Arithmetic, computer - ACM Digital Library
    Fixed-point arithmetic is done essentially like ordinary binary arithmetic, except for the restriction that negative numbers are generally stored in some ...
  86. [86]
    SIGNED BINARY MULTIPLICATION TECHNIQUE - Oxford Academic
    A technique is described whereby binary numbers of either sign may be multiplied together by a uniform process which is independent of any foreknowledge of the ...
  87. [87]
    (PDF) An Algorithm for Non-Restoring Division - ResearchGate
    An Algorithm for Non-Restoring Division. May 1977. Authors: Sugata Sanyal at University of Louisiana at Lafayette.
  88. [88]
  89. [89]
    [PDF] Introduction to Intel® Advanced Vector Extensions - | HPC @ LLNL
    May 23, 2011 · At the lowest programming level, most common x86 assemblers now support Intel® AVX, FMA, AES, and the VPCLMULQDQ instructions, including ...
  90. [90]
    [PDF] A closer look at GPUs - Stanford Computer Graphics Laboratory
    Processing Resources. A large fraction of a GPU's resources exist within programmable processing cores responsible for executing shader functions. While ...
  91. [91]
    [PDF] From Shader Code to a Teraflop: How Shader Cores Work
    Part 1: throughput processing. • Three key concepts behind how modern GPU processing cores run code. • Knowing these concepts will help you ...
  92. [92]
    [PDF] FermiTM - NVIDIA
    Oct 4, 2009 · The Fermi architecture is the most significant leap forward in GPU architecture since the original G80. G80 was our initial vision of what a ...
  93. [93]
    [PDF] NVIDIA A100 Tensor Core GPU Architecture
    The diversity of compute-intensive applications running in modern cloud data centers has driven the explosion of NVIDIA GPU-accelerated cloud computing.
  94. [94]
    AMD's Unified Shader GPU History | IEEE Computer Society
    May 3, 2023 · The GPU's ALUs were 32-bit IEEE 754 floating-point compliant (with typical graphics simplifications of rounding modes), denormalized numbers ( ...
  95. [95]
    Difference Engines | National Museum of American History
    In the early 1800s, the English mathematician Charles Babbage proposed a machine called a difference engine that would compute and print automatically a large ...
  96. [96]
    [PDF] Difference Engine - cs.Princeton
    Jan 21, 2007 · Babbage's design could evaluate 7th order polynomials to 31 digits of accuracy. I set out to build a working Difference Engine using standard ...
  97. [97]
    [PDF] Charles Babbage and his Inventions
    For each cycle of the machine, it computes f(x+1) given f(x) as follows: Recall: Δf(x) = f(x+1) – f(x) or: f(x+1) = f(x) + Δf(x). So the machine adds the Δf(x) ...
  98. [98]
    ENIAC - Penn Engineering
    Mauchly was very much the visionary of the ENIAC's use of mechanical and vacuum tube technologies. Eckert was the engineer of the project who solved its ...
  99. [99]
    J. Presper Eckert - eniac - National Inventors Hall of Fame®
    Oct 31, 2025 · J. Presper Eckert was co-inventor of ENIAC, introduced to the public at the University of Pennsylvania in 1946.
  100. [100]
    [PDF] First draft report on the EDVAC by John von Neumann - MIT
    Turing, "Proposals for Development in the Mathematics. Division of an Automatic Computing Engine (ACE)," presented to the National Physical Laboratory, 1945.
  101. [101]
  102. [102]
    [PDF] History of Electronic Computers
    ▷ arithmetic unit: binary, floating point, word length 22-bits (sign, 7-bit ... UNIVAC (1951). (taken over by Sperry Rand Co; for several years was ...
  103. [103]
    History Of Computers 1937-2011
    3 UNIVAC 1. The UNIVAC 1 (Universal Automatic Computer 1) was the first commercial computer produced in the United States. It was designed by J. Presper ...
  104. [104]
    The Surprising Story of the First Microprocessors - IEEE Spectrum
    Aug 30, 2016 · The 4-bit 4004 (meaning that it manipulated data words that were only 4 bits wide) is often considered the first microprocessor.
  105. [105]
    Announcing a New Era of Integrated Electronics - Intel
    Intel's 4004 microprocessor began as a contract project for Japanese calculator company Busicom. Intel repurchased the rights to the 4004 from Busicom.
  106. [106]
    Reverse-engineering the 8086's Arithmetic/Logic Unit from die photos
    Aug 22, 2020 · In this blog post, I reverse-engineer the 8086's ALU and explain how it works. It's more complex than other vintage ALUs that I've studied.
  107. [107]
    Reverse-engineering a carry-lookahead adder in the Pentium
    Jan 18, 2025 · The Pentium's adder implements the carry lookahead in a different way, called the "parallel prefix adder." The idea is to produce the propagate ...
  108. [108]
    Intel® Instruction Set Extensions Technology
    Explains Instruction Set Extensions, including the SSE (Streaming SIMD Extensions) technologies SSE2, SSE3, and SSE4, as well as AVX (Advanced Vector Extensions).
  109. [109]
    Manuals for Intel® 64 and IA-32 Architectures
    Summary of out-of-order execution and multiple execution units in Intel Core processors.
  110. [110]
    CPU Metrics Reference - Intel
    This metric represents Core cycles fraction CPU dispatched uops on execution port 1 (ALU). Port 2. Metric Description. This metric represents Core cycles ...
  111. [111]
    Tensor Processing Units (TPUs) - Google Cloud
    Google Cloud TPUs are custom-designed AI accelerators, which are optimized for training and inference of AI models.
  112. [112]