PA-RISC
PA-RISC (Precision Architecture - Reduced Instruction Set Computing) is a reduced instruction set computing (RISC) instruction set architecture (ISA) developed by Hewlett-Packard (HP) during the 1980s, designed for high-performance Unix-based workstations and servers.[1] It employs a load/store architecture with fixed 32-bit instructions, emphasizing simple operations, single-cycle execution where possible, and hardware efficiency to replace earlier stack-based and CISC designs like the HP 3000 and Motorola 68000.[1] The architecture evolved through three main versions: PA-RISC 1.0 (32-bit, introduced 1986), 1.1 (enhanced 32-bit with multimedia extensions, 1991), and 2.0 (64-bit with out-of-order execution, 1996). Together they powered over 16 processor implementations, from early TTL/NMOS chips to advanced CMOS designs like the PA-8900.[1][2]

Key technical features of PA-RISC include 32 general-purpose 32-bit registers (with 7 shadow registers for interrupts), 32 floating-point registers supporting 32-, 64-, and 128-bit formats, and control registers for memory management, privilege levels (0–3), and system state via the Processor Status Word (PSW).[2] Addressing supports up to 64-bit virtual spaces with page sizes of 4–32 KB, translation lookaside buffers (TLBs) for virtual-to-physical mapping, and mixed-endian capabilities for compatibility.[2] The instruction set comprises around 140–190 instructions across categories like arithmetic/logical operations (e.g., ADD, SH1ADD for multiplication primitives), branches with delay slots, floating-point computations compliant with IEEE standards, and system control for multiprocessing and cache coherence.[2] Notable enhancements include multimedia acceleration (MAX-1 and MAX-2 extensions) and support for symmetric multiprocessing (SMP), memory-mapped I/O, and optional coprocessors for tasks like debugging and performance monitoring.[1][2]

Historically, PA-RISC originated from HP Labs' Spectrum project under Joel Birnbaum (formerly of IBM's 801 RISC effort) and debuted in HP 9000 Series 800 systems in 1986, running HP-UX.[1] It saw peak adoption in the 1990s with processors like the PA-7100 (1992) and PA-8500 (1998, up to 440 MHz), scaling to multi-processor systems for enterprise computing.[3][4] By the early 2000s, HP collaborated with Intel to transition to Itanium (IA-64), a VLIW architecture influenced by PA-RISC concepts, marking the beginning of its decline.[1] HP completed its PA-RISC roadmap with the PA-8900 in 2005 and ceased sales of PA-RISC systems in 2008, with support for HP-UX 11i v3 ending on December 31, 2025.[5][6]

Overview
Definition and Origins
PA-RISC, also known as Precision Architecture, is a load/store Reduced Instruction Set Computing (RISC) instruction set architecture (ISA) developed by Hewlett-Packard (HP).[7] It emphasizes simplified instructions to enhance execution efficiency, distinguishing it from contemporary complex instruction set computing (CISC) designs.[8] Development commenced in 1982 under HP's internal Spectrum program, which represented the company's largest research and development effort to that point, aimed at creating a unified architecture for diverse computing needs.[7][8]

The origins of PA-RISC stemmed from HP's strategic push to replace aging CISC-based systems, such as the HP 3000 business computer, with a more scalable and performant alternative suitable for both workstations and servers.[7][8] By reducing instruction complexity, the architecture sought to minimize hardware overhead, enable faster pipelined execution, and support a broad spectrum of applications through optimized compiler techniques and hardware simplicity.[8] This motivation was driven by analyses of application workloads, which revealed that most operations could be handled efficiently with a smaller set of primitive instructions.[8]

PA-RISC was initially conceived as a 32-bit architecture, with built-in forward compatibility for 64-bit extensions to accommodate future growth in addressable memory and data processing demands.[7][8] The first hardware implementation, the TS-1 (a multi-board TTL logic system), was introduced in 1986, marking a key milestone in validating the design.[9][8]

At its core, PA-RISC incorporates three-operand instructions for flexible register-based operations, fixed-length 32-bit instruction encoding to simplify decoding, and branch delay slots to mitigate pipeline stalls and improve overall throughput.[7][8] These principles aligned with emerging RISC philosophies, prioritizing hardware simplicity and software optimization for high-performance computing.[8]

Key Features and Design Principles
PA-RISC incorporates space registers as a fundamental element of its virtual memory management. The architecture employs eight space registers (SR0 through SR7) to hold space identifiers that take part in virtual address translation across privilege levels and addressing modes. These registers support the organization of memory into up to eight 4 GB segments, with each space using 4 KB pages, allowing for flexible protection and isolation in multiprogrammed environments. An effective address is formed by concatenating a space register's contents with an offset from a general register, which streamlines translation for intraspace and interspace operations.[2]

From its inception, PA-RISC was engineered to accommodate both 32-bit and 64-bit addressing, providing forward compatibility for evolving memory demands. The architecture natively handles 32-bit absolute and virtual addresses, while extending to 64-bit virtual addressing through mechanisms like space registers and functions such as sign_ext_64 and zero_ext_64 for manipulating 64-bit quantities in double-word formats. In PA-RISC 1.1, this support was enhanced with 64-bit integer operations, allowing general registers to process double-word data via paired 32-bit registers, with virtual page numbers ranging from 36 to 52 bits and physical pages of 20 bits. PA-RISC 2.0 further refined this by introducing full 64-bit registers and a flat 64-bit virtual address space when the Processor Status Word (PSW) wide bit (W) is set, with offsets of up to 2^62 bytes.[2][10]

A core design principle of PA-RISC emphasizes compiler-driven optimizations to exploit hardware efficiency and minimize pipeline disruptions. Delayed branching executes the instruction in the branch delay slot before the control transfer takes effect, reducing stalls by allowing useful work to fill pipeline gaps. Nullified (annulled) instructions extend this by conditionally skipping execution based on branch outcomes, as with BL (branch and link) used with the ",n" nullification completer, so compilers can schedule operations without hardware intervention for condition resolution. These mechanisms, combined with support for static branch prediction hints in later versions, prioritize software control over complex hardware speculation to achieve high instruction throughput.[2][10]

PA-RISC introduces assist instructions to offload system-level and specialized tasks, integrating them with a modular execution model comprising five primary units: integer, load/store, floating-point, and two special function units (SFUs) for coprocessor tasks. Privilege transitions and interruption handling use instructions such as GATE (gateway) and RFI (return from interruption), while assist instructions that the hardware does not implement raise an assist emulation trap so that software can provide a fallback. For floating-point operations, dedicated instructions such as FADD, FSUB, and FDIV operate via the floating-point unit or coprocessor, supporting single-, double-, and quad-precision formats with up to 32 64-bit registers. The SFUs extend this for performance monitoring and debugging, ensuring the architecture's scalability across integer, memory, and vector-like workloads while maintaining a clean load/store separation.[2][10]

History
Early Development
In 1982, Hewlett-Packard launched the Spectrum program to develop a unified reduced instruction set computing (RISC) instruction set architecture (ISA) capable of supporting the company's diverse range of non-PC-compatible systems, from workstations to servers.[11] This initiative, led by key architects including Joel Birnbaum and Ruby B. Lee, aimed to create a scalable "precision architecture" that emphasized compiler optimization, load/store operations, and single-cycle instruction execution to achieve high performance across varying hardware implementations.[12] The program's focus on RISC principles allowed for simpler hardware designs while maintaining compatibility with existing HP software ecosystems.

Initial prototyping finalized the design of the TS-1 in 1984. This transistor-transistor logic (TTL) implementation operated at 8 MHz with a three-stage pipeline, separate 64 KB instruction and data caches, and support for up to 128 MB of memory; hardware production followed in 1986, and the TS-1 served as the foundational testbed for validating the 32-bit PA-RISC 1.0 ISA.[9] Subsequent implementations advanced fabrication technologies, transitioning from TTL to NMOS and then CMOS for lower power consumption and higher density.[9] These iterations prioritized 32-bit integer and floating-point operations, achieving early benchmarks around 4.5 million instructions per second (MIPS) in simulations, and addressed core RISC tenets like register-rich designs and delayed branching.[9]

The first commercial release arrived in 1986 with the HP 9000 Series 840, powered by the TS-1 implementation across six boards and shipping initially in November alongside an early version of HP-UX, a UNIX variant compliant with System V interfaces.[13] Priced at approximately $113,500 base, the system supported up to 112 MB of RAM and targeted computer-integrated manufacturing and engineering applications.[14]

Development challenges centered on transitioning from bulky TTL prototypes to compact VLSI processes, which required overcoming fabrication yield issues and optimizing for 32-bit data paths without compromising clock speeds or thermal management.[11] This shift enabled cost-effective scaling while delivering superior floating-point throughput compared to contemporary CISC architectures.[15]

Evolution and Version Milestones
The PA-RISC architecture originated with version 1.0 in 1986, establishing a baseline 32-bit instruction set architecture (ISA) designed for load/store operations, fixed 32-bit instructions, and support for up to 64-bit virtual addresses, implemented initially in the TS-1 prototype processor.[2] This foundational version emphasized simplicity, pipelining efficiency, and direct hardware implementation, with basic IEEE 754 floating-point operations.[2]

In 1990, PA-RISC evolved to version 1.1, introducing key enhancements including combined multiply/add floating-point instructions such as FMPYADD and FMPYSUB, and full compliance with IEEE 754 floating-point standards for single-, double-, and quad-precision formats with configurable exception handling.[2] These changes expanded the instruction set with floating-point graphics clip tests and improved coprocessor support, increased the page size to 4 KB, and enabled cache-coherent I/O, while maintaining compatibility with 1.0 software.[16] The PA-7000 series processors, released in 1991, were the first to implement this version, marking a shift toward broader application in workstations and servers.[16]

PA-RISC 2.0, introduced in 1996, represented a major advancement to a full 64-bit ISA with 32 general-purpose 64-bit registers and 32 floating-point registers, a flat 64-bit virtual address space, and weak memory ordering for improved performance.[10] It incorporated MAX-2 SIMD instructions for parallel halfword operations (e.g., HADD, HSUB) to accelerate media processing, along with enhanced branch prediction via a Branch Target Stack (BTS) for indirect branches and static/dynamic hints. It preserved compatibility for unprivileged software written for prior versions and continued to follow the IEEE 754-1985 floating-point standard.[10] The PA-8000, a superscalar processor launched in 1996, embodied these features as the first 2.0 implementation.[10]

Standardization efforts began in 1992 with the formation of the Precision RISC Organization (PRO), an independent group led by Hewlett-Packard to cross-license the architecture, develop compliance standards, and promote adoption beyond HP systems by partners like Hitachi and Convex.[17] A significant milestone came in 2005 with the PA-8900, the final major update, which provided a 16% performance increase over its predecessor and concluded HP's two-decade PA-RISC roadmap before the shift to Itanium.[18]

Architecture
Instruction Set
The PA-RISC instruction set architecture (ISA) employs a fixed 32-bit instruction format to facilitate efficient decoding and execution. An instruction is built around a 6-bit opcode that specifies the operation, two or three 5-bit register fields for source and destination operands (depending on the instruction type), and an immediate or displacement field (typically 5, 11, 14, or 21 bits) for constants or addresses.[2] This uniform structure supports a load/store architecture, where data processing occurs only in registers, and memory access is restricted to dedicated load and store instructions.[2]

Instructions are categorized into several functional groups, emphasizing simplicity and single-cycle execution for common operations. Load and store instructions handle memory access, such as LDW (load word) and STW (store word) for 32-bit integer data, along with floating-point variants like FLDWX (load floating-point word indexed) and FSTDX (store floating-point doubleword indexed).[2] Arithmetic operations include ADD and SUB, which perform integer addition and subtraction and can carry a condition that nullifies the following instruction.[2] Logical instructions encompass bitwise operations like AND and OR for manipulating register contents.[2] Floating-point instructions support single- and double-precision arithmetic, exemplified by FADD (floating-point add) and FMPY (floating-point multiply), as well as FSQRT (floating-point square root).[2] Control instructions manage program flow, including COMB (compare and branch) for conditional jumps based on register values and BE (branch external) for transfers to another space.[2]

Branch instructions incorporate optimization features to mitigate pipeline stalls, executing one delay slot instruction following the branch, with nullification (the ",n" completer, tracked through the PSW N-bit) allowing this slot to be skipped selectively based on branch outcomes.[2] This design, which covers unconditional branches (BL, BLR, BV, BLE) as well as conditional compare-and-branch forms (COMB, ADDB), enables compilers to fill delay slots productively while preserving semantic correctness.[2]

Special instructions address system-level and debugging needs. BREAK triggers a debugging trap for breakpoints or exceptions, while MTCTL (move to control register) writes the system control registers.[2] An assist coprocessor mechanism provides privileged instructions for operating system primitives, such as context switching or interrupt handling, executed via dedicated opcodes.[2] The PA-RISC 1.1 specification defines approximately 190 instructions in total, focusing on core RISC principles.[2][1]

PA-RISC 2.0 expands the instruction set with multimedia extensions in the MAX-2 category (e.g., HADD for parallel halfword addition) and performance-oriented features such as FMPYFADD, a fused multiply-add that combines multiplication and addition in a single instruction to reduce latency in floating-point computations.[10] These additions maintain the 32-bit format while enhancing support for vector-like processing and fused operations.[10]

Registers and Memory Model
The PA-RISC architecture employs a register file consisting of 32 general-purpose registers (GR0 through GR31), which are 32 bits wide in the PA 1.x versions and extended to 64 bits in PA 2.0 to support larger address spaces and integer operations.[2][10] GR0 is hardwired to zero and cannot be modified, serving as a constant source for computations, while GR30 functions as the stack pointer in standard calling conventions.[19] These registers handle integer arithmetic, logical operations, and address calculations, with instructions typically using them in a load-store manner.[2]

Floating-point operations utilize 32 dedicated registers (FR0 through FR31), each 64 bits wide, which can hold single-precision (32-bit) or double-precision (64-bit) IEEE 754 floating-point values; in PA 2.0, they also support quad-precision (128-bit) formats via even-odd pairing.[2][10] FR0 incorporates embedded status and exception fields for rounding modes, trap enables, and coprocessor configuration, enabling efficient handling of floating-point exceptions without dedicated control registers.[2]

Special-purpose registers include eight space registers (SR0 through SR7), which store space identifiers comprising 14-bit protection IDs and base address components to enforce memory protection and segmentation in virtual addressing.[2][20] Instructions select these via a 2- or 3-bit 's' field, with SR0 receiving the return space ID during inter-space branches and SR4–SR7 typically zeroed in kernel mode for short addressing.[20] Control registers (CR0 through CR31 in PA 2.0, with CR0–CR7 core in PA 1.x) manage system state, including CR0 for the recovery counter, CR8, CR9, CR12, and CR13 for protection IDs, CR11 for shift amounts, CR14 for the interruption vector address, and CR16 for the interval timer, facilitating interrupt handling and privilege-level transitions.[2][10][19]

The memory model is big-endian by default, with an optional little-endian mode via the processor status word (PSW) E-bit, and organizes memory as virtual address spaces selected through the eight space registers, providing isolation without traditional segmentation.[2][10] In PA 1.x, it supports a 32-bit virtual address space per segment (up to 4 GB), while PA 2.0 expands to 64-bit addressing (up to 16 exabytes) with the PSW W-bit enabled, using a two-level paged structure managed by a translation lookaside buffer (TLB) and hashed page tables for 4 KB to 64 MB pages.[2][10] Physical addresses extend to 44 bits in early implementations but scale to 64 bits, with access rights, reference bits, and cacheability controlled per page entry.[2]

Supported data types encompass signed and unsigned integers in 8-bit (byte), 16-bit (halfword), 32-bit (word), and 64-bit (doubleword) formats using two's complement for signed values, alongside IEEE 754 single-, double-, and quad-precision floating-point types.[2][10] PA 2.0 introduces packed formats, including 16-bit multimedia integers for saturation arithmetic and packed decimal (up to 31 BCD digits) for legacy applications, enhancing efficiency in vector-like operations without altering the core register width.[10]

Implementations
First-Generation Processors
The first-generation PA-RISC processors encompassed implementations of the 32-bit PA-RISC 1.0 and 1.1 instruction set architectures, transitioning from multi-chip TTL designs to single-chip VLSI solutions in CMOS technology. These early chips focused on establishing the architecture's viability for workstations and servers, emphasizing load/store design principles with separate integer and floating-point units. They featured in-order execution pipelines, with performance scaling from single-issue to basic superscalar capabilities in later variants.[9][21]

The inaugural implementation was the TS-1 processor, released by Hewlett-Packard in 1986 as a prototype for PA-RISC 1.0. Operating at 8 MHz, it utilized discrete TTL logic across six boards (totaling approximately 900 integrated circuits), with no on-chip cache and 128 KB of external instruction and data cache. This multi-board design supported a 27-bit physical address space (128 MB maximum memory) and was deployed in early HP 9000 Series 840 servers for testing the architecture's RISC principles, including fixed-length instructions and delayed branching. While not a single-die solution, it laid the groundwork for subsequent VLSI integrations.[9][14]

Following the TS-1, Hewlett-Packard introduced the CS-1 and RS-1 in 1986 as the first very-large-scale integration (VLSI) implementations of PA-RISC 1.0, marking a shift to more efficient NMOS and CMOS processes. Clocked at 10 MHz, these processors integrated a floating-point unit (FPU) on-chip for the first time, enabling basic scientific computing workloads alongside integer operations. The CS-1 handled control functions, while the RS-1 managed register storage, together forming a chipset that reduced board count compared to the TS-1 and improved power efficiency over TTL designs. These chips supported early HP 9000 systems and validated PA-RISC features such as branch delay slots in VLSI form.[9]

Advancing to higher performance, the PA7100 family represented a major milestone in first-generation PA-RISC 1.1 processors, debuting in 1992. Fabricated on a 0.8 μm CMOS process with about 850,000 transistors, the PA7100 ran at 60 MHz in a 504-pin ceramic package, featuring a 5-stage pipeline for in-order execution and off-chip level-1 caches (configurable up to 1 MB instruction and 2 MB data, 64-bit wide). It integrated integer ALU and FPU units, capable of issuing one integer or floating-point instruction per cycle, with a 120-entry fully associative TLB for virtual memory management. Power consumption was approximately 20 W for the 100 MHz variants, balancing performance for mid-range servers. The successor PA7200, introduced in 1994 at 75 MHz (and up to 120 MHz), enhanced this with two-way superscalar execution, allowing dual instruction dispatch (one integer, one floating-point) per cycle while maintaining in-order completion. It retained the 5-stage pipeline but added support for 2 loads and 1 store per cycle via dedicated pipes in the load/store unit, improving memory bandwidth for database and simulation applications. Caches remained off-chip, with minimum configurations of 8 KB instruction and 64 KB data, and the die measured 14 mm × 14 mm.[21][22][23]

A low-cost derivative, the PA7100LC, arrived in 1994 to target embedded and entry-level systems while upholding PA-RISC 1.1 compatibility. Clocked at up to 80 MHz (with 100 MHz options) on a 0.75 μm CMOS process, it integrated a 1 KB on-chip instruction cache and a memory/I/O controller, and it was the first in the family to offer bi-endian support, alongside MAX-1 multimedia extensions for audio/video processing.
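The kind of packed halfword arithmetic that multimedia extensions such as MAX-1 provide can be illustrated with a small software model. The sketch below is an assumption-laden illustration rather than the ISA's exact semantics: the function names `to_signed16` and `hadd_ss` are hypothetical, a 32-bit register is assumed to hold two 16-bit lanes, and signed saturation is assumed as the overflow behavior.

```python
def to_signed16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def hadd_ss(a: int, b: int, lanes: int = 2) -> int:
    """Add packed 16-bit lanes of a and b with signed saturation.

    Hypothetical model: lanes=2 mimics a 32-bit register,
    lanes=4 a 64-bit register.
    """
    result = 0
    for lane in range(lanes):
        shift = 16 * lane
        x = to_signed16(a >> shift)
        y = to_signed16(b >> shift)
        # Clamp instead of wrapping, so overflow sticks at the limits.
        total = max(-32768, min(32767, x + y))
        result |= (total & 0xFFFF) << shift
    return result


if __name__ == "__main__":
    # Upper lane: 0x7FFF + 1 clamps at 0x7FFF; lower lane: 1 + 1 = 2.
    print(hex(hadd_ss(0x7FFF0001, 0x00010001)))
```

A single call processes every lane of the packed word, which is the point of such extensions: one instruction replaces several scalar adds plus the explicit overflow checks that audio and video code would otherwise need.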
The microarchitecture retained the PA7100's 5-stage pipeline and in-order execution while offering two-way superscalar issue, and its on-chip integration reduced system cost, with external cache expandable to 2 MB data. Featuring 900,000 transistors on a 14.2 mm × 14.2 mm die in a 432-pin PGA package, it used a 64-bit memory bus with 480–600 MB/s throughput, making it suitable for cost-sensitive workstations.[24][25]

| Processor | Year | Clock (MHz) | Process | Key Features | Execution Model |
|---|---|---|---|---|---|
| TS-1 | 1986 | 8 | TTL (multi-board) | External 128 KB I/D cache, no FPU integration | Single-issue, in-order |
| CS-1/RS-1 | 1986 | 10 | NMOS/CMOS | Integrated FPU, first VLSI chipset | Single-issue, in-order |
| PA7100 | 1992 | 60 | 0.8 μm CMOS | Off-chip caches (up to 1 MB I/2 MB D), integrated ALU/FPU | Single-issue, in-order, 1 load/store per cycle |
| PA7200 | 1994 | 75 | 0.8 μm CMOS | Off-chip 8 KB I/64 KB D min caches, 2 loads/1 store per cycle | Two-way superscalar, in-order |
| PA7100LC | 1994 | 80 | 0.75 μm CMOS | Integrated 1 KB I-cache + MIOC, bi-endian, multimedia extensions | Two-way superscalar, in-order, 1 load/store per cycle |
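Two figures quoted in this section follow from simple arithmetic: the TS-1's 27-bit physical address space giving 128 MB, and the PA7100LC's 480–600 MB/s over a 64-bit memory bus. The sketch below checks them under stated assumptions: one transfer per bus clock, 1 MB/s taken as 10^6 bytes per second, and hypothetical helper names. The bus clock itself is not given in the text, so the derived values are only what the quoted numbers would imply.

```python
def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with a byte-addressed space of the given width."""
    return 1 << address_bits


def implied_bus_clock_mhz(bandwidth_mb_s: float, bus_width_bits: int) -> float:
    """Bus clock implied by a peak bandwidth.

    Assumes one transfer per clock and 1 MB = 10**6 bytes.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bandwidth_mb_s / bytes_per_transfer


if __name__ == "__main__":
    print(addressable_bytes(27) // (1024 * 1024), "MB")   # TS-1 physical memory limit
    print(implied_bus_clock_mhz(480, 64), "MHz")          # lower quoted bandwidth
    print(implied_bus_clock_mhz(600, 64), "MHz")          # upper quoted bandwidth
```

Under these assumptions the quoted range corresponds to a memory bus running at 60 to 75 MHz, that is, below the 80–100 MHz core clock.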
Second-Generation Processors
The second-generation PA-RISC processors implemented the PA-RISC 2.0 instruction set architecture, which extended the original design with 64-bit integer and floating-point support, enhanced privilege levels, and improved media instructions to enable more efficient high-performance computing. These processors shifted to sophisticated out-of-order execution microarchitectures, starting with the PA-8000 family in 1996, emphasizing superscalar designs for enterprise servers and workstations.[16]

The PA-8000, released in 1996, was Hewlett-Packard's first 64-bit PA-RISC 2.0 processor, featuring a four-way superscalar core with out-of-order execution managed by a 56-entry instruction reorder buffer that allowed reordering of up to four instructions per cycle. Fabricated on a 0.5 μm CMOS process with 3.8 million transistors on a 17.7 mm × 19.1 mm die, it operated at clock speeds of 160–180 MHz and connected to a high-bandwidth Runway bus. Its separate instruction and data L1 caches, up to 1 MB each, were off-chip; at 180 MHz, it achieved 11.8 SPECint95 and 20.2 SPECfp95.[26][27][28]

An interim upgrade, the PA-8200 (PCX-U+), was introduced in 1997 at 200–300 MHz on the same 0.5 μm process. It added an integrated L2 cache controller and improved power management over the PA-8000, with similar four-way superscalar out-of-order execution and the same off-chip L1 cache arrangement, supporting external L2 up to 2 MB for better hit rates in workstation applications.[29]

Succeeding the PA-8000, the PA-8500 and PA-8600 processors, introduced in 1998 and 1999 respectively, refined the microarchitecture with dual integer execution units for better throughput on integer workloads and enhanced branch prediction using a two-level adaptive predictor to reduce misprediction penalties.
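To make the idea of a two-level adaptive predictor concrete, here is a generic textbook sketch, not a model of HP's actual hardware. It assumes a single global history register whose recent outcomes index a table of 2-bit saturating counters; the class name, the 4-bit history length, and the initial counter state are all illustrative choices.

```python
class TwoLevelPredictor:
    """Generic two-level adaptive predictor (illustrative, not HP's design)."""

    def __init__(self, history_bits: int = 4) -> None:
        self.history_bits = history_bits
        self.history = 0                               # level 1: recent outcomes
        self.counters = [1] * (1 << history_bits)      # level 2: start weakly not-taken

    def predict(self) -> bool:
        # Counter values 2 and 3 mean "predict taken".
        return self.counters[self.history] >= 2

    def update(self, taken: bool) -> None:
        counter = self.counters[self.history]
        self.counters[self.history] = min(3, counter + 1) if taken else max(0, counter - 1)
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def misprediction_count(pattern, repeats, history_bits=4):
    """Run a repeating taken/not-taken pattern and count wrong predictions."""
    predictor = TwoLevelPredictor(history_bits)
    misses = 0
    for _ in range(repeats):
        for taken in pattern:
            if predictor.predict() != taken:
                misses += 1
            predictor.update(taken)
    return misses


if __name__ == "__main__":
    # A loop-like pattern: taken, taken, not taken, repeated.
    print(misprediction_count([True, True, False], repeats=100))
```

With this pattern the sketch mispredicts only during warm-up, because each distinct history ends up with its own counter. A single 2-bit counter without history would keep missing the not-taken branch on every pass.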
Both were built on a 0.25 μm CMOS process with 140 million transistors and a die size of 21.3 mm × 22.0 mm; the PA-8500 ran at 300–440 MHz, while the PA-8600 scaled to 400–550 MHz on the same Runway DDR bus at 2 GB/s peak bandwidth. They incorporated 0.5 MB instruction and 1 MB data L1 caches on-chip, paired with a 2 MB off-chip L2 cache, enabling higher clock rates and improved performance over the PA-8000 in server environments.[4][30][16]

The PA-8700 and PA-8800, launched in 2001 and 2003, further evolved the design with deeper pipelines and larger caches to support gigahertz clock speeds, including external L3 cache support in the PA-8700 for reduced latency in multiprocessor systems. The PA-8700 used a 0.18 μm silicon-on-insulator CMOS process, operating at 750 MHz with 2.25 MB on-chip L1 cache (0.75 MB instruction + 1.5 MB data). The PA-8800 adopted a 0.13 μm process for dual-core configurations at 900 MHz–1 GHz, attaching to the faster Itanium bus at 6.4 GB/s and incorporating simultaneous multithreading-like features for better resource utilization across threads, with 1.5 MB L1 cache (0.75 MB instruction + 0.75 MB data) per core and shared off-chip L2/L3 caches. These models prioritized scalability in symmetric multiprocessing setups, with the PA-8800's dual cores sharing resources to boost overall system throughput.[31][32][33]

The final high-end model, the PA-8900 introduced in 2005, represented the pinnacle of PA-RISC 2.0 implementations as an eight-way superscalar dual-core processor at 1.0–1.1 GHz on a 0.13 μm process with 317 million transistors and a 23.6 mm × 15.5 mm die. It featured expanded caching with 3 MB total L1 (0.75 MB instruction and 0.75 MB data per core) and a 64 MB shared L3 cache to handle demanding enterprise workloads, connecting via the 200 MHz Itanium 2 bus. This configuration delivered approximately 16% higher performance than the PA-8800, with SPECint95 scores of around 25 in optimized configurations. The PA-8900 served as HP's last major PA-RISC upgrade before transitioning to Itanium.[34][35][36]

Third-party implementations included Hitachi's HA8000 series in 1997, which integrated PA-RISC 2.0 cores into custom server designs for Japanese markets, and OKI's embedded variants like the OP32/50N developed in the mid-1990s for low-power applications, adapting the architecture for specialized control systems without the full superscalar complexity of HP's high-end chips.[37][38]

| Processor | Release Year | Clock Speed (MHz) | Process (μm) | Key Microarchitecture Features | Cache Configuration |
|---|---|---|---|---|---|
| PA-8000 | 1996 | 160–180 | 0.5 CMOS | 4-way superscalar, out-of-order, 56-entry reorder buffer | Off-chip L1, up to 1 MB I + 1 MB D |
| PA-8500 | 1998 | 300–440 | 0.25 CMOS | Dual integer units, improved branch prediction | 0.5 MB I + 1 MB D L1, 2 MB L2 |
| PA-8600 | 1999 | 400–550 | 0.25 CMOS | Enhanced PA-8500 core, higher frequency scaling | 0.5 MB I + 1 MB D L1, 2 MB L2 |
| PA-8700 | 2001 | 750 | 0.18 SOI CMOS | Deeper pipeline, L3 support for SMP | 0.75 MB I + 1.5 MB D L1 on-chip (2.25 MB total), external L2/L3 |
| PA-8800 | 2003 | 900–1000 | 0.13 CMOS | Dual-core, SMT-like threading, Itanium bus | 1.5 MB L1 per core (0.75 MB I + 0.75 MB D), shared off-chip L2/L3 |
| PA-8900 | 2005 | 1000–1100 | 0.13 CMOS | 8-way superscalar dual-core, massive L3 | 3 MB L1 total + 64 MB L3 |
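The out-of-order scheme described for the PA-8000 family, in which a reorder buffer lets instructions finish in any order, can be illustrated with a toy model. This is a minimal sketch under strong assumptions, not a model of the real pipeline: instructions have no dependencies, each starts executing in the cycle it enters the buffer, and retirement is strictly in program order at up to `width` instructions per cycle (the usual role of a reorder buffer). The function name `simulate` and its parameters are hypothetical; the defaults of 4 and 56 echo the figures quoted for the PA-8000.

```python
from collections import deque


def simulate(latencies, width=4, capacity=56):
    """Toy reorder buffer: returns (done_at, retired_at) cycle lists."""
    count = len(latencies)
    done_at = [None] * count       # cycle at which each result is ready
    retired_at = [None] * count    # cycle at which each instruction retires
    buffer = deque()               # instruction indices in program order
    next_fetch = 0
    cycle = 0
    while next_fetch < count or buffer:
        # Retire the oldest finished instructions, at most `width` per cycle.
        retired = 0
        while buffer and retired < width and done_at[buffer[0]] <= cycle:
            retired_at[buffer.popleft()] = cycle
            retired += 1
        # Fetch new instructions while the buffer has free entries.
        fetched = 0
        while next_fetch < count and fetched < width and len(buffer) < capacity:
            done_at[next_fetch] = cycle + latencies[next_fetch]
            buffer.append(next_fetch)
            next_fetch += 1
            fetched += 1
        cycle += 1
    return done_at, retired_at


if __name__ == "__main__":
    # A slow first instruction (say, a cache miss) followed by three fast ones.
    done, retired = simulate([5, 1, 1, 1])
    print("results ready at cycles:", done)
    print("retired at cycles:      ", retired)
```

In the example the three short instructions have their results ready at cycle 1, well before the long one at cycle 5, yet all four retire together once the oldest is done. A larger buffer lets more independent work proceed behind a stalled instruction, which is the motivation for a window as large as 56 entries.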