
Load–store architecture

A load–store architecture is an instruction set architecture in which arithmetic and logical operations are performed exclusively on data stored in registers, with memory access limited to dedicated load instructions that transfer data from memory to registers and store instructions that transfer data from registers to memory. This separation ensures that computational instructions do not directly reference memory addresses, distinguishing it from register-memory or memory-memory architectures where operations can involve memory operands directly. Load–store architectures form the foundation of Reduced Instruction Set Computing (RISC) designs, which prioritize simplicity and efficiency in instruction execution. Originating in the late 1970s and early 1980s through projects like the IBM 801, Berkeley RISC, and Stanford MIPS, these architectures aimed to optimize pipelined processors by minimizing memory access complexity and enabling better compiler optimizations.

Key characteristics include a large number of general-purpose registers (often 16 to 32) to hold operands and results, fixed-length instructions for uniform decoding, and the ability to schedule load/store operations in parallel with arithmetic instructions, which reduces overall execution latency. Prominent examples of load–store architectures include the ARM family, widely used in mobile and embedded systems; RISC-V, an open-source instruction set gaining traction in academia and industry; MIPS, influential in early RISC implementations; SPARC, developed by Sun Microsystems for servers; and PowerPC, employed in Apple's Power Macintosh computers. In typical programs, loads and stores account for roughly 15–35% of executed instructions, highlighting these architectures' role in balancing register usage with memory interactions.

The advantages of load–store designs lie in their support for high-performance pipelining and superscalar execution, as the restricted memory access simplifies hardware design, shortens clock cycles, and lowers memory traffic by encouraging register reuse. This approach has made them dominant in modern processors, particularly in energy-efficient and scalable systems, though they require sophisticated compilers to manage registers effectively.

Core Concepts

Definition and Principles

A load–store architecture, also known as a register–register architecture, is an instruction set architecture in which memory access is strictly separated from computational operations. In this model, data must first be loaded from memory into registers using dedicated load instructions before any arithmetic or logical processing can occur, and results are subsequently stored back to memory via store instructions. All arithmetic logic unit (ALU) operations are performed exclusively between registers, prohibiting direct memory-to-memory or memory-operand computations. The core principles of load–store architectures emphasize this rigid separation to simplify instruction execution and enhance pipelining efficiency. Memory operations are confined to load and store instructions, ensuring that ALU operations remain register-bound and thus predictable in terms of timing and resource usage. This design mandates explicit data movement for all computations, fostering a uniform instruction format and reducing the complexity of decoding and execution stages. Unlike register-memory architectures, which allow ALU instructions to directly reference memory operands, load–store systems enforce a clear delineation that minimizes variable-latency memory accesses during computation. Central to this architecture is the register file, which serves as the primary hub for data manipulation and temporary storage during execution. The register file consists of a fixed set of general-purpose registers that hold operands and results for ALU instructions. Instruction formats in load–store architectures are typically fixed-length and include fields for the opcode, source and destination register specifiers, and immediate values or offsets for addressing in load and store operations. This structure supports efficient encoding and decoding, with load/store instructions often using base-plus-displacement addressing to compute memory locations relative to a register value.
To illustrate, consider a simple addition operation followed by storage: an ADD instruction might compute the sum of two registers and place it in a third, as in ADD R1, R2, R3 (where R1 = R2 + R3), and a subsequent STORE instruction would write the result to memory, such as STORE R1, [address]. Direct operations like adding two memory locations without intermediate registers are not permitted, requiring explicit loads beforehand. This pseudocode exemplifies the separation:
LOAD R2, [mem_addr1]   // Load first value into R2
LOAD R3, [mem_addr2]   // Load second value into R3
ADD R1, R2, R3         // Compute sum in R1 (register-register)
STORE R1, [result_addr] // Store result to memory
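The same separation can be demonstrated with a toy interpreter in Python; the opcode names, register-file size, and addresses below are illustrative, not drawn from any real ISA. Note that the ALU operation (ADD) has no path to memory at all — any attempt to add memory operands directly would simply have no matching opcode:

```python
def run(program, memory):
    """Execute a toy load-store program: ALU ops touch only registers;
    LOAD and STORE are the sole paths to memory."""
    regs = {f"R{i}": 0 for i in range(8)}  # small illustrative register file
    for op, *args in program:
        if op == "LOAD":                   # memory -> register
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "STORE":                # register -> memory
            rs, addr = args
            memory[addr] = regs[rs]
        elif op == "ADD":                  # register-register only
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs, memory

mem = {0x10: 5, 0x14: 7, 0x18: 0}
prog = [
    ("LOAD", "R2", 0x10),       # LOAD R2, [mem_addr1]
    ("LOAD", "R3", 0x14),       # LOAD R3, [mem_addr2]
    ("ADD", "R1", "R2", "R3"),  # ADD R1, R2, R3
    ("STORE", "R1", 0x18),      # STORE R1, [result_addr]
]
regs, mem = run(prog, mem)      # mem[0x18] now holds 12
```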

Register File and Operations

In load-store architectures, the register file serves as a small, high-speed array of general-purpose registers (GPRs) designed to hold operands and results for computations, minimizing memory access latency. Typically comprising 16 to 32 GPRs, each of fixed width such as 32 or 64 bits to match the processor's word size, the register file emphasizes rapid read and write operations. For instance, the DLX architecture features 32 32-bit GPRs, with R0 hardwired to zero and R31 often used for return addresses. In addition to GPRs, the processor includes specialized registers like the program counter (PC) for instruction addressing and status registers for flags such as overflow or zero conditions, though GPRs form the core for data manipulation. Operations on the register file are restricted to register-to-register computations, ensuring that arithmetic, logical, and shift instructions operate exclusively on GPR operands without memory involvement. Arithmetic instructions include addition (e.g., ADD Rd, Rs1, Rs2), subtraction (SUB), and variants with immediate values (ADDI), all producing results in a destination register. Logical operations encompass bitwise AND, OR, and XOR (e.g., AND Rd, Rs1, Rs2), and their immediate forms (ANDI), enabling efficient bit-level manipulation. Shift instructions, such as shift left logical (SLL) or shift right arithmetic (SRA), support both register and immediate shift amounts, facilitating data alignment and multiplication by powers of two. These operations leverage the arithmetic logic unit (ALU) and are optimized for single-cycle execution in pipelined designs. Addressing modes in load-store architectures are intentionally simple to simplify hardware and enhance pipelining, limited primarily to register-indirect forms for memory accesses. Load instructions (e.g., LOAD Rd, [Rs + offset]) compute the effective address as the contents of a base register plus a sign-extended immediate offset, transferring data from memory to a destination register.
Store instructions (e.g., STORE Rs, [Rd + offset]) similarly use the base-plus-offset mode to write register data to memory, without allowing direct memory operands in computational instructions. In ARM implementations, this extends to pre-indexed and post-indexed variants, where the base register is updated either before or after the memory access, but complex modes like indirect through memory are avoided. This restriction contrasts with more elaborate modes in non-RISC designs, prioritizing predictable timing over flexibility. The data flow model in load-store architectures centers on the register file as an intermediary, enabling computations to bypass memory for reduced latency in pipelined execution. In a typical five-stage pipeline (instruction fetch, decode/register read, execute, memory access, write-back), operands are read from the register file during the decode stage, processed in the execute stage via the ALU, and written back to the register file in the write-back stage, with load and store instructions alone accessing memory in the dedicated memory stage. This separation allows multiple register reads (often two or three ports) and writes (one or two ports) per cycle, with forwarding paths from the execute or memory stages to resolve data hazards without stalling. By confining arithmetic and logical operations to registers, the model exploits temporal locality, keeping active data on-chip and streamlining instruction dispatch.
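The base-plus-displacement calculation described above can be sketched in a few lines of Python; the 16-bit offset width and 32-bit address wrap-around are assumptions for illustration rather than properties of any particular ISA:

```python
def sign_extend(imm, bits=16):
    """Interpret the low `bits` bits of imm as a two's-complement value."""
    imm &= (1 << bits) - 1
    if imm & (1 << (bits - 1)):   # sign bit set: value is negative
        imm -= 1 << bits
    return imm

def effective_address(base, offset, bits=16):
    """Base-plus-displacement addressing: EA = base + sign_extend(offset),
    wrapped to an assumed 32-bit address space."""
    return (base + sign_extend(offset, bits)) & 0xFFFFFFFF

effective_address(0x1000, 0x0008)  # positive displacement: 0x1008
effective_address(0x1000, 0xFFFC)  # 0xFFFC encodes -4, giving 0x0FFC
```

Sign extension is what lets a single 16-bit immediate field encode displacements both above and below the base register's value.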

Comparisons with Other Architectures

Versus Register-Memory Architectures

Register-memory architectures, such as the Intel x86, permit arithmetic and logical operations to access one memory operand directly, allowing instructions such as ADD R1, [memory_address], where one operand resides in a register and the other is sourced from memory without being explicitly loaded into a register first. This contrasts with load-store architectures, which strictly separate memory access from computation by requiring data to be loaded into registers before any operations and stored back afterward. A primary difference lies in the number of instructions required for memory-involved operations: load-store architectures typically demand three instructions (two loads and a compute) for tasks like adding two memory values into a register, whereas register-memory designs can accomplish the same in two (a load and a compute with a memory operand). This results in higher instruction counts and potentially lower code density in load-store systems, though it enhances execution efficiency by minimizing memory traffic during computation. In terms of instruction complexity, load-store architectures favor fixed-length, simple encodings with limited addressing modes, simplifying decoding and enabling straightforward pipelining, while register-memory architectures often employ variable-length encodings with richer addressing modes to support memory operands, increasing hardware complexity for instruction fetch and decode. For illustration, consider computing the sum of two memory locations into a register:

Load-Store Pseudocode:
LOAD R1, mem1
LOAD R2, mem2
ADD R3, R1, R2
Register-Memory Pseudocode:
LOAD R3, mem1
ADD R3, [mem2]
This example highlights how load-store requires explicit data movement for both operands, leading to more instructions but allowing reuse of register values without repeated memory accesses. Hardware trade-offs further distinguish the approaches: load-store designs simplify the arithmetic logic unit (ALU) and pipeline stages by confining operations to registers, reducing control logic complexity and enabling higher clock speeds, but they necessitate a larger register file to accommodate temporary values and mitigate the increased instruction count. Conversely, register-memory architectures demand more sophisticated hardware to handle memory operands inline, including additional addressing hardware and potential stalls from memory dependencies, though they can reduce overall register pressure by allowing direct memory use. These choices reflect a balance between code compactness and execution predictability, with load-store prioritizing the latter for modern pipelined processors.
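The register-reuse point can be made concrete with a toy accounting model; the cost assumptions here (each load or memory operand counts as one memory read) follow the pseudocode above and are illustrative only:

```python
def load_store_reads(n_adds):
    """Memory reads when two values are loaded once, then combined
    in n_adds register-register additions: two up-front loads only."""
    return 2

def register_memory_reads(n_adds):
    """Memory reads when one value is loaded and each addition uses a
    memory operand (ADD R3, [mem2]): the operand is re-read every time."""
    return 1 + n_adds
```

For a single addition both styles perform two memory reads (and register-memory uses one fewer instruction), but once a value is reused — four additions give five reads versus two — the balance shifts toward the load-store form.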

Versus Stack-Based Architectures

Stack-based architectures utilize an operand stack as the primary mechanism for handling data, where arithmetic and logical operations implicitly pop the required operands from the top of the stack (TOS), perform the operation, and push the result back onto the stack. This design contrasts with load–store architectures, which rely on a fixed set of explicitly named registers for operands, separating memory access into dedicated load and store instructions. Representative examples include the Java Virtual Machine (JVM) bytecode, which employs an operand stack for operand management, and Hewlett-Packard calculators using Reverse Polish Notation (RPN), where user-entered values are pushed onto a stack for postfix evaluation. A fundamental difference lies in operand access and addressing: load–store architectures use named registers specified explicitly in instructions, enabling direct multi-operand operations without implicit positioning, whereas stack-based designs depend on the stack's depth for operand location, with the TOS serving as the implicit source and destination. In stack machines, there are no explicit operand fields in arithmetic instructions; programs instead build the stack state through loads or pushes followed by operations that manipulate the TOS. This leads to instruction semantics based on postfix notation in stack architectures, such as a sequence PUSH A; PUSH B; ADD that pops two values and pushes their sum, in contrast to the explicit three-operand form in load–store architectures, like ADD R1, R2, R3, where registers R2 and R3 are added and the result stored in R1. Consider the example of accumulating a sum in a loop, such as the total of array elements. In a load–store architecture, this can leverage register indexing: initialize an accumulator register to zero, load each element into a temporary register using an address register, add it to the accumulator, and increment the index register.
In a stack-based architecture, the process requires careful stack management, such as duplicating the running total (via a DUP instruction) before loading the next element, adding, and then handling storage, which can lead to deeper stack usage and more complex sequencing to avoid overwriting intermediate results. These differences have notable implications for compilation. Load–store architectures facilitate straightforward register allocation during compilation, as compilers can assign variables to specific registers based on lifetime analysis, simplifying optimization for loops and data reuse. Stack-based architectures, however, are particularly amenable to compiling expression trees via postfix traversal, enabling simple, uniform code generation for nested operations, but they complicate loop handling due to the need to track and manage stack depth to prevent spills or overflows.
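The implicit pop-operate-push semantics described above can be sketched as a small stack-machine interpreter in Python; the token format and operator set are illustrative:

```python
def eval_postfix(tokens):
    """Evaluate a postfix (RPN) token sequence the way a stack machine
    would: numbers are pushed; operators pop two operands and push the
    result. No operand fields appear in the operator 'instructions'."""
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b}
    stack = []
    for tok in tokens:
        if tok in ops:
            b = stack.pop()          # TOS is the second operand
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(int(tok))   # PUSH immediate
    return stack.pop()

# PUSH 2; PUSH 3; ADD — the stack positions, not named registers,
# determine which values are combined.
eval_postfix("2 3 +".split())        # 5
eval_postfix("4 2 - 3 *".split())    # (4 - 2) * 3 = 6
```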

Historical Development

Origins in Early RISC Projects

The Reduced Instruction Set Computer (RISC) philosophy emerged in the late 1970s and early 1980s as a deliberate reaction to the growing complexity of Complex Instruction Set Computer (CISC) architectures, which featured intricate instructions, multiple addressing modes, and microcode implementations that complicated hardware design and limited performance scaling. Proponents argued that simplifying the instruction set would reduce decoding overhead, enable deeper pipelining, and allow higher clock speeds by dedicating more transistors to execution rather than control logic. This shift emphasized a load-store architecture, where arithmetic and logical operations occur exclusively between registers, with memory access restricted to dedicated load and store instructions, thereby separating computation from data movement to streamline hardware implementation. Pioneering academic projects at the University of California, Berkeley, and Stanford University formalized these ideas into the first load-store designs. At Berkeley, David Patterson initiated the RISC-I project in 1980, aiming to create a VLSI-compatible processor with a minimal instruction set focused on register-to-register operations. The design incorporated 31 general-purpose registers (plus a register hardwired to zero) and restricted memory interactions to load and store instructions, reflecting empirical studies of program behavior that revealed approximately 80% of executed instructions involved register-to-register operations without memory access. Concurrently, at Stanford, John Hennessy launched the MIPS project in 1981, developing a load-store architecture with 32 registers and an emphasis on pipelined execution to achieve single-cycle throughput for most instructions. These efforts were motivated by the need to minimize hardware complexity—such as eliminating variable-length instructions and complex addressing—to facilitate faster clock rates and more efficient VLSI fabrication. Early prototypes demonstrated the viability of these principles.
The Berkeley RISC-I chip, fabricated in 1982 using a 5-micrometer NMOS process, featured just 31 instructions, all adhering to a load-store separation that ensured arithmetic operations remained register-bound while loads and stores handled two-cycle memory transfers. This design achieved a clock speed of around 1 MHz and validated the RISC approach through benchmarks showing performance comparable to contemporary CISC machines but with simpler circuitry. Although influenced by earlier experimental work, such as IBM's 801 project in the 1970s—an experimental minicomputer that pioneered load-store separation and register-focused computation—the Berkeley and Stanford efforts distinctly formalized these concepts within the RISC framework, prioritizing academic rigor and VLSI integration over proprietary constraints.

Adoption in Commercial Processors

The commercialization of load-store architecture began in the mid-1980s with the founding of MIPS Computer Systems in 1984, culminating in the release of the R2000 processor in 1986. This 32-bit implementation featured a pure load-store design, separating memory access from computation to enhance pipelining efficiency, and was later used in workstations such as the DECstation series starting in 1989. Following this milestone, IBM introduced the RS/6000 line in February 1990, incorporating load-store elements into its POWER architecture for workstations and servers. The POWER ISA's load-store model supported precise interrupts and efficient data handling, enabling over 800,000 RS/6000 systems to ship by 1999. In parallel, load-store principles expanded into embedded systems through ARM's development at Acorn Computers, where the ARM1 processor debuted in 1985 as a low-power RISC core for personal computing tasks like graphics and word processing. Evolving to the ARMv2 architecture in 1987 with the ARM2 chip, this load-store design optimized register-based operations for battery-constrained devices, powering Acorn's Archimedes computers and laying the groundwork for widespread mobile adoption, including the Nokia 6110 in the 1990s. The architecture's influence extended to open standards with Sun Microsystems' SPARC in 1987, an open RISC specification featuring a load-store model with register windows to minimize memory traffic. Standardized as IEEE 1754-1994, SPARC drove Unix-based workstations and servers, achieving over 450 record benchmarks and enabling scalable deployments in enterprise and network infrastructure. This openness facilitated adoption in supercomputing, where systems like Cray's T3D (1993) and T3E (1995) integrated RISC load-store processors such as the DEC Alpha for massively parallel processing in scientific simulations. The 1990s saw accelerated growth, highlighted by the 1991 AIM alliance forming PowerPC as a load-store RISC derivative of POWER for broader markets.
Its integration into Apple's Power Macintosh series from 1994 onward restored performance competitiveness, with millions sold annually in the mid-1990s, including around 4.5 million total Macintosh units in 1995. Meanwhile, Intel's Itanium, announced in 1999 and released in 2001, adopted an explicitly parallel load-store model under the IA-64 umbrella, targeting enterprise servers despite later market challenges. By 2000, load-store RISC architectures dominated commercial designs in embedded and high-performance segments.

Notable Implementations

MIPS and RISC-I

The RISC-I, developed at the University of California, Berkeley, in 1982 as part of the initial RISC project, employed a load-store architecture with 32 general-purpose registers (GPRs) numbered r0 through r31, where r0 was hardwired to zero and writes to it were ignored. The design utilized a uniform 32-bit fixed-length instruction format for all 31 instructions, simplifying decoding and enabling efficient pipelining. Memory operations were restricted to dedicated load and store instructions, including byte (LB/SB), halfword (LH/SH), and word (LW/SW) variants that supported sign extension or zero filling for sub-word loads. The MIPS project originated in 1981 at Stanford University under John Hennessy, initially featuring an instruction set with 16 GPRs before evolving to 32 GPRs in commercial implementations like the R3000, released in 1988. The architecture maintained a load-store design, with load and store instructions using a 16-bit signed offset from a base register for addressing, limiting immediate displacements but promoting register-based computations. It incorporated delayed branches, where the instruction immediately following a branch was always executed to fill pipeline bubbles, and provided interfaces for up to four coprocessors to handle tasks like floating-point arithmetic without interrupting the main pipeline. Key innovations in these designs included a consistent three-operand format for all arithmetic-logic unit (ALU) operations in both RISC-I and MIPS, allowing operations like ADD Rd, Rs, Rt to specify distinct destination and source registers independently. Exception handling relied on dedicated registers rather than complex traps; for instance, MIPS used the Exception Program Counter (EPC) register to store the address of the interrupted instruction. Early performance evaluations demonstrated the efficacy of these features, with the R3000 achieving up to 20 MIPS at 25 MHz in benchmark tests. The legacy of MIPS extends to its widespread adoption in embedded systems, powering networking equipment from vendors like Cisco and gaming consoles such as the original Sony PlayStation, which utilized a customized R3000A core at 33.8 MHz.
Wave Computing open-sourced the architecture in 2018 under the MIPS Open initiative, but following the company's acquisition of MIPS and subsequent bankruptcy, active development of the MIPS ISA was discontinued in 2021, with the company transitioning to RISC-V.

ARM and RISC-V

The ARM architecture originated in 1985 as part of Acorn Computers' effort to develop a reduced instruction set processor for personal computers, establishing a load-store paradigm that separates memory access from computation. It utilizes 16 general-purpose registers, R0 through R15, with R15 functioning as the program counter to hold the address of the next instruction. Core memory operations rely on dedicated load (LDR) and store (STR) instructions, which support multiple addressing modes such as pre-indexed—where the offset is applied before memory access and the base register is updated—and post-indexed, where the update occurs after access, facilitating efficient data handling in resource-constrained environments. To address code density in embedded applications, ARM introduced Thumb mode in 1994, compressing a subset of instructions to 16 bits while maintaining compatibility with the 32-bit ARM instruction set, thereby reducing program size by up to 35% without sacrificing performance in narrow memory systems. Distinctive features include conditional execution, allowing most instructions to be predicated on processor flags to minimize branch overhead, and the Jazelle extension, which enables hardware-accelerated execution of Java bytecode as a third mode alongside ARM and Thumb. RISC-V emerged as an open-standard ISA in 2010, developed by a team at the University of California, Berkeley, to provide a modular, extensible foundation for diverse computing needs, building on principles from earlier RISC designs like Berkeley RISC and MIPS. Its base integer ISA (RV32I or RV64I) includes 32 general-purpose registers denoted x0 through x31, with x0 hardwired to zero to serve as a constant source and simplify certain operations. The architecture enforces a strict load-store model, where memory instructions like LB (load byte) and SB (store byte) handle all data movement, while arithmetic and logical operations act solely on registers.
Modularity is central to RISC-V, with standard extensions such as M for integer multiplication and division—adding instructions like MUL and DIV—and A for atomic memory operations, including load-reserved (LR) and store-conditional (SC) for thread-safe synchronization in multiprocessor systems. The V extension, frozen in version 1.0 and ratified in 2021, introduces scalable vector registers and instructions for data-parallel workloads, complementing the base ISA's uncompressed 32-bit fixed-length format that avoids the complexity of variable-length decoding. ARM's design emphasizes power efficiency, making it ideal for battery-constrained embedded devices through techniques like low-power states and optimized pipelines, while RISC-V prioritizes customizability via its royalty-free, open-source model, allowing implementers to add or modify extensions without licensing fees to suit specific workloads or specialized hardware needs. By 2020, ARM-based processors dominated the smartphone market with approximately 95% share, powering billions of mobile devices annually; as of 2025, this share has grown to over 99%. In contrast, RISC-V has seen rapid adoption in embedded and IoT ecosystems, where its flexibility supports low-cost, tailored microcontrollers for sensors and connected devices; as of 2025, it is increasingly used in data centers and AI accelerators by vendors like SiFive and Alibaba.
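The pre-indexed and post-indexed addressing modes described earlier differ only in when the base register picks up the updated address relative to the access. A sketch of the assumed semantics (addresses and memory contents are illustrative):

```python
def ldr(memory, base, offset, mode):
    """Toy model of an ARM-style load with base-register writeback.
    pre-indexed  (LDR Rd, [Rn, #off]!): access base+off, base becomes base+off
    post-indexed (LDR Rd, [Rn], #off):  access base,     base becomes base+off
    Returns (loaded value, new base-register value)."""
    if mode == "pre":
        addr = base + offset
        new_base = addr              # writeback of the computed address
    elif mode == "post":
        addr = base
        new_base = base + offset     # update happens after the access
    else:                            # plain offset mode: no writeback
        addr = base + offset
        new_base = base
    return memory[addr], new_base

mem = {100: 11, 104: 22}
ldr(mem, 100, 4, "pre")   # reads mem[104]; base register becomes 104
ldr(mem, 100, 4, "post")  # reads mem[100]; base register becomes 104
```

Both variants leave the base register at 104, which is what makes them useful for stepping through arrays without separate increment instructions.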

Advantages and Limitations

Performance and Efficiency Gains

Load–store architectures facilitate higher clock speeds by simplifying instruction decoding and execution pipelines, as computational operations are isolated from memory addressing, reducing hardware complexity and enabling deeper pipelining without excessive branch hazards or variable-length instructions. For instance, early RISC implementations like the MIPS M/2000 achieved a 40 ns cycle time, comparable to the VAX 8700's 45 ns but with superior effective throughput due to streamlined pipelines. In terms of code execution efficiency, load–store designs with large register files minimize memory accesses by keeping operands in registers, leading to fewer cache misses and higher throughput. Compiler studies show that register promotion in such architectures reduces loads by 30%–60% and stores by 50%, cutting memory traffic and improving performance by 5%–20% through optimizations like register allocation. Empirical evidence from 1980s benchmarks demonstrates 2–3× speedups for RISC load–store processors over CISC register-memory designs, attributed to fewer loads and stores per program. Power efficiency benefits arise from the reduced control logic around the arithmetic logic units, as memory addressing is confined to dedicated load/store instructions, avoiding complex operand decoding in compute paths. ARM processors, exemplifying load–store RISC, exhibit superior energy efficiency in server workloads like SQL databases and static HTTP serving compared to x86 equivalents due to these design traits. Quantitative metrics further highlight the gains: load–store architectures achieve instructions per cycle (IPC) of 1–2 in pipelined implementations, versus 0.5 or less in complex ISAs, while predictable memory operations enhance cache hit rates.

Design Trade-offs and Challenges

One significant trade-off in load-store architectures is reduced code density compared to register-memory or complex instruction set computing (CISC) designs. These architectures require separate load and store instructions for all memory accesses, leading to more instructions per program and thus larger binaries; on average, RISC load-store code is about 25% larger than equivalent CISC code, with some benchmarks showing up to threefold increases due to fixed-length 32-bit instructions versus variable-length ones in CISC. This increased size can pressure instruction caches and memory bandwidth, potentially offsetting some performance gains from simpler decoding. To mitigate this, techniques like instruction compression have been employed; for instance, ARM's Thumb mode uses 16-bit instructions for common operations, achieving approximately 30% better code density than standard 32-bit ARM instructions. Another challenge arises from register pressure, as load-store architectures mandate that all arithmetic and logical operations occur exclusively between registers, amplifying the demand for on-chip register resources. With typically 16 to 32 general-purpose registers, compilers often face scenarios where live variables exceed the available registers, necessitating spilling to memory via additional load and store instructions, which introduces latency and complexity in code generation. Register allocation in these systems relies on sophisticated algorithms like graph coloring, where variables are nodes in an interference graph and are colored to assign registers without conflicts, but high pressure can lead to suboptimal spilling decisions that degrade performance. This issue is particularly pronounced in RISC designs with specialized registers (e.g., a register hardwired to zero or reserved for return addresses), further constraining the effective register pool and requiring integrated instruction scheduling to minimize spills. Hardware implementation costs also pose trade-offs, primarily from the need for a large, multi-ported register file to support parallel register accesses in pipelined execution.
In early RISC processors like RISC II, the register file consumed 27.5% of the chip area, highlighting how scaling to 32 or more registers significantly increases die size, power consumption, and access latency, since register-file area grows roughly quadratically with the number of read/write ports. Additionally, features like branch delay slots—common in load-store RISC designs to hide pipeline hazards—complicate modern branch prediction, as the slot instruction executes unconditionally regardless of the branch outcome, making it harder to speculate correctly in superscalar designs and often requiring no-operation (NOP) fillers that waste cycles. Compatibility with legacy software presents further hurdles, especially when emulating CISC binaries on load-store hardware. The Intel Itanium, an Explicitly Parallel Instruction Computing (EPIC) load-store processor produced from 2001 to 2021, exemplified these issues by relying on compilers to expose instruction-level parallelism, which proved challenging for optimizing existing x86 CISC code and led to inefficient emulation modes for complex instructions. Hardware accelerators for x86 compatibility existed but incurred high overhead due to the architectural mismatch, contributing to Itanium's market struggles as software ecosystems favored adaptable extensions over full redesigns.
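The graph-coloring approach to register allocation mentioned above can be sketched with a greedy heuristic; production allocators (e.g., Chaitin-style) are far more elaborate, and the interference graph, variable names, and register count here are hypothetical:

```python
def color_registers(interference, k):
    """Assign each variable one of k registers so that no two variables
    live at the same time (an edge in `interference`) share a register.
    A variable that cannot be colored is marked for spilling to memory."""
    assignment = {}
    # simple heuristic: color the most-constrained variables first
    for var in sorted(interference, key=lambda v: -len(interference[v])):
        taken = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in range(k) if r not in taken]
        assignment[var] = free[0] if free else "spill"
    return assignment

# Hypothetical interference graph: a, b, c are simultaneously live;
# d overlaps only with c. Edges are symmetric.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
alloc = color_registers(graph, 2)  # with only 2 registers, one variable spills
```

With two registers the three mutually interfering variables cannot all be colored, so one is spilled — the spill then costs extra load/store instructions, which is exactly the register-pressure penalty described above; with three registers the same graph colors cleanly.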

References

  1. [1]
    Load-Store Architecture - an overview | ScienceDirect Topics
    Load-store architecture is a type of computer architecture where arithmetic operations use operands from and produce results in addressable registers.
  2. [2]
    4.4: Load and Store Architecture - Engineering LibreTexts
    Jun 29, 2023 · A load and store architecture uses special instructions to load data from memory into registers and store data from registers back to memory. ...
  3. [3]
    [PDF] REDUCED INSTRUCTION SET COMPUTERS
    Since it takes one cycle to calculate the address and one cycle to access memory, the straightforward way to implement LOADS and STORES is to take two cycles.
  4. [4]
    Lecture 2: Instruction Set Architectures and Compilers
    So-called Reduced Instruction Set Computing (RISC) architectures are load/store architectures. Examples: SPARC, MIPS, Alpha AXP, PowerPC. Complex Instruction ...
  5. [5]
    [PDF] INSTRUCTION SETS - Milwaukee School of Engineering
    Arithmetic instructions never directly put values into memory. This is called the load-store principle or load-store memory access. Load-store memory access is ...
  6. [6]
    [PDF] Instruction Set Principles
    – Load/store architecture with up to one memory access/instruction. – Few addressing modes, synthesize others with code sequence. – Register-register ALU ...
  7. [7]
    [PDF] REDUCED INSTRUCTION SET COMPUTERS
    Often, RISC is referred to as Load/Store architecture. Alternatively the operations in its instruction set are defined as Register-to-Register operations. The ...
  8. [8]
    [PDF] Chapter 13 Reduced Instruction Set Computers (RISC) Computer ...
    Many studies fail to distinguish the effect of a large register file from the effect of RISC instruction set. • Many designs ...
  9. [9]
    [PDF] The DLX Instruction Set Architecture - UT Computer Science
    There are 3 miscellaneous registers. →PC, Program Counter, contains the address of the instruction currently being retrieved from.
  10. [10]
  11. [11]
    None
    ### Summary of MIPS Register Operations, Logical, and Load/Store Instructions
  12. [12]
    2. Instruction Set Architecture - UMD Computer Science
    – Register – register, where registers are used for storing operands. Such architectures are in fact also called load – store architectures, as only load and ...
  13. [13]
    Addressing modes - Arm Developer
    Addressing modes use a base register and offset. The offset can be immediate, register, or scaled. Modes include offset, pre-indexed, and post-indexed.
  14. [14]
    [PDF] EECS 361 Computer Architecture Lecture 3 – Instruction Set ...
    Register-to-Register: Load-Store Architectures. 3-17. EECS 361. Page 18. Register-to-Memory Architectures. 3-18. EECS 361. Page 19. Memory-to-Memory ...Missing: differences | Show results with:differences
  15. [15]
    Lecture notes - Chapter 8 - MAL and Registers - cs.wisc.edu
    Arithmetic/logical instructions use register values as operands. A set up ... For a load/store instruction, we need a register specification also ...
  16. [16]
    Stack Computers: 6.2 ARCHITECTURAL DIFFERENCES FROM ...
    The obvious difference between stack machines and conventional machines is the use of 0-operand stack addressing instead of register or memory based addressing ...Missing: architecture<|control11|><|separator|>
  17. [17]
    [PDF] Virtual Machine Showdown: Stack Versus Registers - USENIX
    Jun 12, 2005 · The most popular VMs, such as the Java VM, use a virtual stack architecture, rather than the register architecture that dominates in real ...
  18. [18]
    [PDF] HP 35s scientific calculator
    automatic storage is the automatic, RPN memory stack. HP's operating logic is based on an unambiguous, parentheses–free mathematical logic known as "Polish ...
  19. [19]
    [PDF] The Case for the Reduced Instruction Set Computer - People @EECS
    Investigation of a RISC architecture has gone on for several months now under the supervision of D.A. Patterson and C.H. Séquin. By a judicious choice of the ...
  20. [20]
    [PDF] Design and implementation of RISC I - UC Berkeley EECS
    Load and store instructions move data between registers and memory. These instructions use two CPU cycles. We decided to make an exception to our ...
  21. [21]
    [PDF] John Hennessy, Norman Jouppi, - Steven Przybylski, Christopher ...
    We also chose to design a load-store architecture, i.e., a machine in which only the load and store operations access memory and all ALU instructions are ...
  22. [22]
    [PDF] A Perspective on the 801/Reduced Instruction Set Computer
    On virtually every System/370 implementation the Load-Store sequence is considerably faster than MVC for short align moves. The trace tapes also provide ...
  23. [23]
    Introduction to the MIPS ISA
    To introduce MIPS R-Type, immediate, and load-store instructions Materials 1. MIPS ISA Handout (will have been distributed before class) 2. Connection to MIPS ...
  24. [24]
    [PDF] A MIPS R2000 IMPLEMENTATION - IIS Windows Server
    Corporation in 1985, releasing the MIPS R2000 running at 8 MHz on 2.0 micron process. In 1988 the R3000 was released, improving performance to eventually 40 ...
  25. [25]
    [PDF] The POWER4 Processor Introduction and Tuning Guide
    The first RS/6000 products were announced by IBM in February of 1990, and were based on a multiple chip implementation of the POWER architecture, described ...
  26. [26]
    [PDF] RS/6000 Systems Handbook - Kev009
    ... RS/6000 History. The first RS/6000 was announced February 1990 and shipped June 1990. Since then, over 800,000 systems have shipped to over 125,000 customers ...
  27. [27]
    [PDF] IBM Power Architecture - IBM Research Report
    Apr 15, 2011 · Power ISA specifies a load-store architecture consisting of two distinct types of instructions: (1) memory access instructions, and (2) compute ...
  28. [28]
    The Official History of Arm
    Aug 16, 2023 · Arm was officially founded as a company in November 1990 as Advanced RISC Machines Ltd, which was a joint venture between Acorn Computers, Apple Computer.
  29. [29]
    Milestones:SPARC RISC Architecture, 1987
    Mar 18, 2024 · Sun Microsystems first introduced SPARC (Scalable Processor Architecture) RISC (Reduced Instruction-Set Computing) in 1987. Over the course of ...
  30. [30]
    [PDF] RISC Microprocessors and Scientific Computing
    Mar 26, 1993 · Cray systems can fetch two operands and store one operand every clock period, ... width by means of burst load and store operations. It remains to ...
  31. [31]
  32. [32]
    [PDF] Intel Itanium® Architecture Software Developer's Manual
    A key feature of the Itanium architecture is IA-32 instruction set compatibility. The Intel® Itanium® Architecture Software Developer's Manual provides a ...
  33. [33]
    An overview of RISC architecture
    It is becoming clear that most commercial RISC based processors are actually similar to CISC based processors in many ways. The first generations of RISC ...
  34. [34]
    RISC I: A REDUCED INSTRUCTION SET VLSI COMPUTER
    The static frequencies of RISC I instructions for nine typical C programs show that less than 20% of the instructions were loads and stores, and more than 50% ...
  35. [35]
    RISC on a Chip: David Patterson and Berkeley RISC-I
    Apr 23, 2023 · The first version of the RISC-I design included only 31 instructions. Late in the design process 8 additional load and store instructions, with ...
  36. [36]
    (PDF) MIPS: A microprocessor architecture - ResearchGate
    Aug 7, 2025 · MIPS is a new single chip VLSI microprocessor. It attempts to achieve high performance with the use of a simplified instruction set, similar to those found in ...
  37. [37]
    Computer MIPS and MFLOPS Speed Claims 1980 to 1996
    This document contains performance claims and estimates for more than 2000 mainframes, minicomputers, supercomputers and workstations, from around 120 suppliers
  38. [38]
    A Brief History of the MIPS Architecture - SemiWiki
    Dec 7, 2012 · John Hennessy at Stanford University led a team of researchers in ... R3000, was released in 1988. It was used primarily in SGI's ...
  39. [39]
    MIPS Goes Open Source - EE Times
    Dec 17, 2018 · Wave Computing (Campbell, Calif.) announced Monday (Dec. 17) that it is putting MIPS on open source, with MIPS Instruction Set Architecture (ISA) ...
  40. [40]
    First ARM processor powered up - Event - Computing History
    Apr 26, 1985 · The ARM1 processor was developed in just 18 months. Steve Furber defined the architecture while Sophie Wilson developed the instruction set. The ...
  41. [41]
    ARM core registers - Arm Developer
    This view provides 16 ARM core registers, R0 to R15, that include the Stack Pointer (SP), Link Register (LR), and Program Counter (PC). These registers are ...
  42. [42]
    Loads and stores - addressing - Arm Developer
    Pre-indexed addressing is like offset addressing, except that the base pointer is updated as a result of the instruction. In the preceding figure, X1 would have ...
  43. [43]
    Architecture history and extensions - Arm Developer
    The ARM architecture changed relatively little between the first test silicon in the mid-1980s through to the first ARM6 and ARM7 devices of the early 1990s ...
  44. [44]
    Jazelle direct bytecode execution support - Arm Developer
    The Jazelle extension provides architectural support for hardware acceleration of bytecode execution by a Java Virtual Machine (JVM).
  45. [45]
    About RISC-V International
    Celebrating 15 years of open innovation, RISC-V has grown from a research project at UC Berkeley into a global movement transforming the semiconductor industry.
  46. [46]
    [PDF] The RISC-V Instruction Set Manual - People @EECS
    May 31, 2016 · There are 31 general-purpose registers x1–x31, which hold integer values. Register x0 is hardwired to the constant 0. There is no hardwired ...
  47. [47]
    [PDF] Design of the RISC-V Instruction Set Architecture - People @EECS
    Jan 3, 2016 · Even so, the registers are of limited use to the kernel because they are not protected from user access. • Handling misaligned loads and stores ...
  48. [48]
    Confused about authoritative specification info - Google Groups
    May 13, 2024 · https://wiki.riscv.org/display/HOME/Ratified+Extensions. The V (Vector) extension was ratified in November 2021. I presume that this refers ...
  49. [49]
    The Future of RISC-V in Embedded Systems - RunTime Recruitment
    Jun 18, 2025 · Unlike proprietary architectures like ARM or x86, RISC-V is license-free, allowing anyone to implement, modify, and extend the ISA without ...
  50. [50]
    ARM Revenue and Growth Statistics (2024) - SignHouse
    Do smartphones use ARM processors? Arm CPUs are the leading smartphone processor IP on the market today. 95 per cent of premium smartphones are powered by Arm.
  51. [51]
    IoT/Embedded - RISC-V International
    RISC-V is the AI-native ISA that addresses the challenges of the IoT and embedded markets. Powering ultra-efficient MCUs to high-performance application ...
  52. [52]
    [PDF] The Need for Large Register Files in Integer Codes - Trevor Mudge
    Large register files have many advantages. If used effectively they can: 1) reduce memory traffic by removing load and store operations; 2) improve performance ...
  53. [53]
    A comparison of x86 and ARM architectures power efficiency
    ▻ ARM systems are more power efficient for SQL and static HTTP servers. ▻ x86 architecture is still more power efficient for floating point computation.
  54. [54]
    [PDF] Code Density Concerns for New Architectures
    There is a tradeoff, in that having fewer registers generates more loads/stores from spilling in load-store architectures. Virtual address of first instruction ...
  55. [55]
    Thumb Instruction Set - an overview | ScienceDirect Topics
    The Thumb instruction set achieves higher code density by using 16-bit instructions instead of the original 32-bit ARM instructions, resulting in a code size ...
  56. [56]
    Integrating register allocation and instruction scheduling for RISCs
    To achieve high performance in uniprocessor RISC systems, compilers must perform both register allocation to reduce memory references and instruction scheduling.
  57. [57]
    A reduced register file for RISC architectures
    A multiple-window register file has been shown to be very effective in reducing the memory traffic due to saving and restoring of registers in function.
  58. [58]
    [PDF] Branch Prediction and Multiple-Issue Processors
    Size of basic blocks limited to 4-7 instructions. • Delayed branches not a solution in multiple-issue processors. • Why? Hard to find independent ...
  59. [59]
    [PDF] the vliw-supercisc compiler: exploiting - D-Scholarship@Pitt
    Apr 28, 2006 · Several large scale projects, such as the Intel Itanium 2 VLIW processor, have failed in the market due to the lack of ability to leverage ...
  60. [60]
    (PDF) Itanium: a system implementor's tale - ResearchGate
    Oct 4, 2025 · Itanium is a fairly new and rather unusual architecture. Its defining feature is explicitly-parallel instruction-set computing (EPIC).