PDP-11 architecture
The PDP-11 architecture is a 16-bit CISC instruction set architecture developed by Digital Equipment Corporation (DEC) for its PDP-11 series of minicomputers, introduced in 1970 and produced through the late 1990s.[1] It features eight 16-bit general-purpose registers (R0–R7), with R7 serving as both a register and the program counter, enabling position-independent code and efficient stack-based operations.[1] The architecture supports a virtual address space of 64 KB (expandable to 128 KB with separate instruction and data spaces), physical addressing up to 4 MB in advanced models, and three privilege modes (kernel, supervisor, user) for multiprogramming and protection.[1]
Introduced with the PDP-11/20 model, the architecture revolutionized minicomputing by employing the UNIBUS—a single, bidirectional, asynchronous bus—for interconnecting CPU, memory, and peripherals, which simplified system design and scalability across low-end to high-end configurations.[2] Key evolutions included the PDP-11/45 in 1971 for multi-user support, the LSI-11 in 1975 integrating large-scale integration for cost-effective microcomputers, and the PDP-11/70 in 1975 as a performance leader with memory management and cache options.[1] By 1983, over 300,000 PDP-11 systems had been shipped, establishing it as a standard for scientific, industrial control, commercial data processing, and real-time applications.[1]
The PDP-11's instruction set comprises 73 base instructions, expandable to over 400 with optional extensions like the Extended Instruction Set (EIS) for multiplication/division, Floating-Point Processor (FP-11) for scientific computing, and Commercial Instruction Set (CIS11) for string and decimal operations.[1] It offers 12 addressing modes, including register, immediate, absolute, relative, autoincrement, autodecrement, indexed, and deferred variants, which enhance code density and execution efficiency—particularly the PC-relative modes for relocatable programs.[1] Interrupt and trap handling is prioritized for low latency, with vectors starting at memory location 0 and fast bus arbitration via the UNIBUS, supporting reentrant code and interrupt-driven I/O in multitasking environments.[1]
Memory management units in higher-end models provide paging, relocation, and protection, addressing up to 22 bits physically while maintaining a 16-bit virtual space, which facilitated operating systems like RT-11, RSX-11, and RSTS/E.[1] The architecture's byte-addressable memory and bit-field operations further supported diverse peripherals and networking via DECnet protocols.[3] Its influence extended to military standards and early UNIX implementations, underscoring its role in advancing accessible computing power.[2]
Processor Design
Registers
The PDP-11 architecture features eight 16-bit general-purpose registers, designated R0 through R7, which serve as the primary means for data storage, manipulation, and address computation within the processor.[4] These registers are integral to the CPU's operation, supporting a variety of tasks from arithmetic computations to memory addressing.[4]
Registers R0 through R5 function as versatile general-purpose registers, commonly employed as accumulators, counters, pointers, or index registers in software implementations.[4] R6 is dedicated as the stack pointer (SP), which points to the top of the current stack and operates exclusively in full-word (16-bit) units, incrementing or decrementing by 2 bytes to maintain even addressing alignment.[4] R7 serves as the program counter (PC), holding the 16-bit memory address of the next instruction to be fetched, and automatically increments by 2 after each word-aligned instruction access.[4]
Certain usage conventions govern register operations to optimize efficiency and compatibility across PDP-11 implementations. For instance, in multiply (MUL) and divide (DIV) instructions, specifying an even-numbered destination register (R) allows the 32-bit result to span R and the subsequent odd register (R+1), whereas an odd destination yields only a 16-bit low-order result.[5] Byte operations on registers do not inherently distinguish even from odd registers, but memory byte accesses align low bytes at even addresses and high bytes at odd addresses, influencing register loads and stores.[4] These registers interact with addressing modes, such as autoincrement and autodecrement, primarily through R0-R6 to compute effective addresses dynamically.[4]
Upon processor reset, initiated by the INIT signal, all general registers including the PC and SP are cleared to zero, providing a defined initial state for system startup.[4] During context switching, such as in response to interrupts or traps, the hardware automatically pushes the current PC and PSW onto the stack addressed by the SP and restores them upon return; the remaining general registers are saved and restored by software to ensure seamless task resumption. Some advanced models like the PDP-11/45 and 11/70 support dual register sets to accelerate this process without stack operations.[4]
Processor Status Word
The Processor Status Word (PSW) is a 16-bit register in the PDP-11 architecture that captures essential processor state information, including condition codes reflecting operation results, operating mode for privilege control, interrupt priority level, and a trace enable bit for debugging.[6] It resides at memory address 177776 (octal) and is integral to managing execution context across the PDP-11 family of processors.[7] The PSW's contents are preserved and restored during context switches to ensure reliable program flow.
The PSW's bit layout is standardized across PDP-11 implementations, with fields dedicated to mode control, priority, and status flags. In models supporting multiple privilege levels, such as the PDP-11/70, the structure includes previous and current mode indicators along with a register set selector. The following table outlines the bit assignments:
| Bits | Field | Description |
|---|---|---|
| 15-14 | Current Mode | 00: Kernel mode (full privileges); 01: Supervisor mode (intermediate privileges); 11: User mode (restricted access); 10: Invalid.[6] |
| 13-12 | Previous Mode | Encodes the mode active before an interrupt or trap, using the same values as current mode.[6] |
| 11 | Register Set | 0: Selects register set 0; 1: Selects register set 1 (for models with dual sets to support context switching).[6] |
| 10-8 | Reserved | Must be 0; unused in standard implementations.[7] |
| 7-5 | Interrupt Priority | 000: Level 0 (lowest, all interrupts enabled); 111: Level 7 (highest, no interrupts accepted).[6] |
| 4 | T (Trace) | 1: Enables trace trap after each instruction execution for debugging; 0: Disabled.[7] |
| 3 | N (Negative) | Condition code: 1 if most significant bit of result is 1 (negative for signed operations).[6] |
| 2 | Z (Zero) | Condition code: 1 if result is all zeros.[6] |
| 1 | V (Overflow) | Condition code: 1 if signed arithmetic overflow occurs.[6] |
| 0 | C (Carry) | Condition code: 1 if carry out from most significant bit (addition) or borrow needed (subtraction).[6] |
Early PDP-11 models, such as the PDP-11/40, simplify modes to kernel (00) and user (11), omitting supervisor mode.[7]
Operating modes enforce privilege separation to protect system integrity. Kernel mode grants unrestricted access to all instructions, including those for memory management and I/O, while user mode limits execution to non-privileged instructions and prevents direct hardware access.[6] Supervisor mode, available in advanced models, provides an intermediate level for operating system services without full kernel privileges.[6] Mode transitions are automatic and hardware-driven: on an interrupt or trap, the processor shifts to a higher-privilege mode (typically kernel), saving the prior mode in bits 13-12 and updating the current mode bits to reflect the new state.[6] Return to the original mode occurs via instructions that restore the saved PSW.
Condition codes in bits 3-0 are modified by arithmetic, logical, and shift instructions to indicate result characteristics, enabling conditional branching without explicit testing. For addition, the N bit sets if the result's most significant bit is 1, Z sets if the result is zero, V sets if operands of identical sign yield an opposite-sign result (indicating overflow), and C sets on carry from the most significant bit.[6] Subtraction follows analogous rules: N and Z based on the result, V sets if opposite-sign operands produce a result matching the source's sign, and C sets if borrow is required from beyond the most significant bit.[6] These rules support both signed and unsigned interpretations, with V relevant primarily for signed operations and C for unsigned.
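These rules can be made concrete with a short C sketch (a minimal model, not DEC code; the struct and helper names are this example's own, and operands are assumed to be 16-bit two's-complement values):

```c
#include <stdint.h>
#include <stdio.h>

/* Minimal model of PDP-11 condition-code setting for ADD and SUB. */
struct cc { int n, z, v, c; };

static struct cc add16(uint16_t src, uint16_t dst, uint16_t *out)
{
    uint32_t wide = (uint32_t)dst + src;
    uint16_t res  = (uint16_t)wide;
    struct cc f;
    f.n = (res >> 15) & 1;                            /* sign of result */
    f.z = (res == 0);                                 /* result all zeros */
    f.v = ((~(src ^ dst) & (src ^ res)) >> 15) & 1;   /* same-sign operands, opposite-sign result */
    f.c = (wide >> 16) & 1;                           /* carry out of bit 15 */
    *out = res;
    return f;
}

static struct cc sub16(uint16_t src, uint16_t dst, uint16_t *out)
{
    uint16_t res = (uint16_t)(dst - src);
    struct cc f;
    f.n = (res >> 15) & 1;
    f.z = (res == 0);
    f.v = (((src ^ dst) & (dst ^ res)) >> 15) & 1;    /* opposite-sign operands, result sign flips */
    f.c = (src > dst);                                /* borrow needed from beyond bit 15 */
    *out = res;
    return f;
}

int main(void)
{
    uint16_t r;
    struct cc f = add16(0x7FFF, 0x0001, &r);          /* 32767 + 1 overflows the signed range */
    printf("ADD: result=%04X N=%d Z=%d V=%d C=%d\n", r, f.n, f.z, f.v, f.c);
    f = sub16(0x0003, 0x0002, &r);                    /* 2 - 3 requires a borrow */
    printf("SUB: result=%04X N=%d Z=%d V=%d C=%d\n", r, f.n, f.z, f.v, f.c);
    return 0;
}
```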
During interrupts or traps, the processor hardware saves the current PSW (along with the program counter) onto the mode-appropriate stack—using the kernel stack pointer in kernel mode, for instance—to preserve the execution context.[6] A new PSW is then loaded from the corresponding vector location in low memory, which specifies the priority, mode (typically kernel), and initial condition codes for the handler.[7] Restoration pops the saved PSW and program counter from the stack upon handler completion.[6]
Instruction Execution Cycle
The PDP-11 processor operates on a classic fetch-decode-execute cycle. In the fetch phase, the instruction is loaded from memory at the address specified by the program counter (PC, R7), and the PC is incremented by 2 to point to the next instruction. The decode phase interprets the opcode (typically bits 15-12 or similar, depending on format) and any immediate operands or addressing modes. The execute phase performs the operation, which may involve calculating effective addresses, fetching operands, executing arithmetic/logic functions, and storing results. Addressing modes like autoincrement add microsteps for address adjustment.[7]
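A highly simplified C sketch of this cycle (a toy structure of this example's own design, decoding only register-to-register MOV plus HALT) shows how the PC-driven fetch and the opcode dispatch fit together:

```c
#include <stdint.h>
#include <stdio.h>

/* Toy fetch-decode-execute loop for a PDP-11-like machine.
   Only MOV Rn,Rm (register mode both operands) and HALT are decoded;
   everything else is reported as unimplemented. Illustrative, not an emulator. */
#define MEMWORDS 0x8000                 /* 64 KB = 32K words */
static uint16_t mem[MEMWORDS];
static uint16_t reg[8];                 /* R0-R5, SP (R6), PC (R7) */

static uint16_t fetch(void)
{
    uint16_t word = mem[reg[7] >> 1];   /* word-aligned fetch at the PC */
    reg[7] += 2;                        /* PC advances by 2 */
    return word;
}

int main(void)
{
    /* Tiny program: MOV R1,R2 (010102 octal), then HALT (000000). */
    mem[0] = 0010102;
    mem[1] = 0000000;
    reg[1] = 0x1234;
    reg[7] = 0;                         /* start execution at address 0 */

    for (;;) {
        uint16_t instr = fetch();                     /* fetch phase */
        if (instr == 0) {                             /* HALT */
            printf("HALT at PC=%06o, R2=%06o\n", reg[7], reg[2]);
            break;
        }
        if ((instr & 0170000) == 0010000 &&           /* decode: MOV, both operands in mode 0 */
            (instr & 0007000) == 0 && (instr & 0000070) == 0) {
            int src = (instr >> 6) & 7;               /* source register field */
            int dst = instr & 7;                      /* destination register field */
            reg[dst] = reg[src];                      /* execute */
        } else {
            printf("unimplemented opcode %06o\n", instr);
            break;
        }
    }
    return 0;
}
```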
Instruction timing varies by model, addressing mode, and memory type, measured in microseconds rather than fixed clock cycles due to the asynchronous UNIBUS. For example, in the PDP-11/40, a simple register-to-register move takes approximately 0.9 µs, while memory accesses add 1.5-3 µs depending on core or semiconductor memory. Complex instructions with multiple memory cycles, such as those using indexed or deferred modes, can take 5-10 µs or more. Appendix C of the PDP-11/40 manual provides detailed timings, such as effective address calculation adding 1.5 µs for register modes and up to 4.5 µs for autoincrement/decrement.[7]
Traps and interrupts are integrated seamlessly into the cycle. Traps (synchronous exceptions like overflow or odd-address faults) occur during execution and vector immediately after saving the PC and PSW to the stack. Interrupts (asynchronous, from devices via UNIBUS) are checked at the end of each instruction; if the device's priority exceeds the current IPL in the PSW, the processor saves state, sets kernel mode and appropriate IPL, and loads the handler's PC and PSW from a vector table in low memory starting at location 0 (fixed vectors such as 4, 10, and 14 octal for traps, with device interrupt vectors typically assigned from 60 octal upward). This ensures low-latency response in multitasking environments. The return from interrupt or trap restores the state via the RTI instruction.[7]
Compared to the original PDP-11/20 (1970), which lacked memory management and was confined to the base 16-bit (64 KB) address space, the PDP-11/40 (1972) added an optional MMU for 18-bit physical addressing and faster cycles via improved logic. The later LSI-11 (1975) implemented the architecture in a microprogrammed chip set for low-cost integration with the same basic cycle structure, while subsequent single-chip designs such as the J-11 performed register-to-register transfers in roughly 0.3 µs, offering better performance and integration while remaining compatible.[7]
Memory Organization
The PDP-11 architecture organizes data primarily into 8-bit bytes and 16-bit words, with memory being byte-addressable to support flexible access to these units.[4] Words consist of two consecutive bytes stored on even boundaries, and there is no native support for 32-bit data types without software extensions or multiple-word emulation.[4] This design emphasizes efficient handling of small, fixed-size integers and characters, aligning with the system's 16-bit word-oriented processor.
The architecture employs little-endian byte order, where the least significant byte of a multi-byte value occupies the lower memory address.[4] For example, in a 16-bit word, the low-order byte is stored first, followed by the high-order byte; this ordering applies consistently to integer representations and affects operations like data transfer between registers and memory.[8] Byte-swapping, facilitated by the SWAB instruction, exchanges the high and low bytes within a word to accommodate scenarios such as interfacing with big-endian peripherals or correcting misaligned data imports.[4]
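The byte ordering and the effect of SWAB can be illustrated with a short C sketch (illustrative only; `swab16` is this example's own helper, not a DEC routine):

```c
#include <stdint.h>
#include <stdio.h>

/* Little-endian storage of a 16-bit word and a SWAB-like byte exchange. */
static uint16_t swab16(uint16_t w)
{
    return (uint16_t)((w << 8) | (w >> 8));   /* exchange high and low bytes, as SWAB does */
}

int main(void)
{
    uint16_t word = 0xA1B2;                   /* high byte 0xA1, low byte 0xB2 */
    uint8_t bytes[2];

    /* PDP-11 (little-endian) layout: low-order byte at the lower (even) address. */
    bytes[0] = (uint8_t)(word & 0xFF);        /* even address holds 0xB2 */
    bytes[1] = (uint8_t)(word >> 8);          /* odd address holds 0xA1 */
    printf("stored bytes: [even]=%02X [odd]=%02X\n", bytes[0], bytes[1]);

    printf("SWAB result: %04X\n", swab16(word));   /* 0xB2A1 */
    return 0;
}
```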
Integer data uses two's complement representation for signed values. An 8-bit signed integer ranges from -128 to 127, while an unsigned 8-bit integer spans 0 to 255; similarly, a 16-bit signed integer covers -32,768 to 32,767, and an unsigned 16-bit integer extends from 0 to 65,535.[4] These formats enable straightforward arithmetic instructions, where byte-mode operations affect only the lower 8 bits of a register or memory location.[8]
Characters are handled as 8-bit bytes, typically using ASCII encoding, and can be processed individually via byte instructions like MOVB or in strings as contiguous sequences addressed by base location and length.[4] Bit fields, representing subsets of bits within a byte or word, are manipulated using bitwise instructions such as BIS, BIC, or BIT, which allow masking and testing without dedicated field-extraction hardware.[4] The little-endian ordering and byte-swapping capability ensure compatibility in bit-level operations, particularly when packing multiple characters or flags into a word.[8]
Physical Memory Characteristics
The PDP-11 architecture features a 16-bit address bus that supports a physical address space of up to 64 kilobytes (2^16 bytes), enabling direct addressing of memory without memory management extensions.[7] This base configuration limits systems to 32K 16-bit words or 64K 8-bit bytes, with the uppermost 4K words (addresses 160000 to 177777 octal) reserved for UNIBUS I/O device registers and peripherals.[7] Additionally, low memory addresses from 0 to 377 octal are reserved for interrupt and trap vectors, such as locations 4, 10, 14, 20, 24, 30, 34, and 250 octal for specific traps and errors.[7]
Early PDP-11 models primarily utilized magnetic core memory, which offered access times ranging from 1 to 4 microseconds, depending on the specific module like the MM11 or MF11 series.[3] These core memory systems featured destructive reads, requiring a write-after-read cycle that contributed to their relatively slower performance compared to later technologies. In contrast, later models transitioned to semiconductor RAM, including bipolar and MOS variants, achieving faster access times of approximately 0.5 microseconds or less, with cycle times as low as 480 to 700 nanoseconds in configurations like the PDP-11/44 or PDP-11/70.[3] Semiconductor memory modules, such as the MSV11 series, supported higher capacities and were asynchronous, with minimum timings of 100 nanoseconds for data setup in write operations.[9]
Parity and error detection were optional features across PDP-11 memory systems to enhance reliability, particularly in multi-user environments. Core memory typically included 1-bit parity per byte, while semiconductor options provided parity generation and detection, often with onboard indicators like LEDs for error signaling in modules such as MSV11-JD.[3] Advanced configurations in later models, like the PDP-11/84, incorporated error-correcting code (ECC) capable of correcting single-bit errors and detecting double-bit errors, integrated into MOS memory up to 4 MB.[9] Memory parity errors generally triggered processor traps, ensuring system integrity without halting operations unless configured otherwise.[7]
| Memory Type | Typical Access Time | Representative Models | Key Features |
|---|---|---|---|
| Magnetic Core | 1–4 µs | PDP-11/04, PDP-11/20 | Destructive read/write, parity per byte |
| Semiconductor (MOS/Bipolar) | 0.5 µs or less (480–700 ns cycle) | PDP-11/34A, PDP-11/70, PDP-11/84 | Non-destructive, optional ECC, higher density up to 4 MB |
Memory Management Units
The PDP-11 architecture incorporated optional Memory Management Units (MMUs) to enable address relocation, protection, and expansion beyond the inherent 16-bit (64 KB) physical address space, particularly in mid-range models like the PDP-11/45. In these systems, the MMU employed eight Page Address Registers (PARs) and eight corresponding Page Descriptor Registers (PDRs) to define eight variable-length segments for user-mode address translation. Each PAR stored a 12-bit Page Address Field (PAF) specifying the base physical address as a multiple of 32 words (64 bytes), allowing relocation to any location in up to 256 KB of physical memory on the UNIBUS. The associated PDR contained a 7-bit Page Length Field (PLF) to set the segment limit from 32 words up to 4,096 words (8 KB) in 64-byte increments, with base/limit checking performed on every virtual address access to trap violations such as out-of-bounds references. This segmentation scheme divided the 64 KB virtual address space into up to eight segments, each growing either upward or downward from its base, facilitating efficient relocation for single-user or early multi-tasking environments while adding approximately 150 ns to each memory cycle for translation.[7]
Later high-end models, such as the PDP-11/70, advanced the MMU design with support for larger physical address spaces and enhanced mapping capabilities through an integral memory management unit. The PDP-11/70 MMU maintained the eight-segment structure but extended the PAFs to 16 bits for 22-bit physical addressing, enabling mapping into up to 4 MB of main memory while retaining the same page structure of up to 8 KB per page built from 64-byte (32-word) blocks. Address translation used an eight-entry hardware register set per address space, where the virtual address's three high bits (15-13) indexed the active PAR/PDR pair, concatenating the PAF with the 13-bit displacement to form the physical address without software page tables. This direct-mapped approach, combined with separate mappings for instruction (I-space) and data (D-space), allowed split address spaces up to 128 KB per mode (kernel, supervisor, user), doubling effective capacity while preserving compatibility with earlier models. Without the MMU, systems were limited to 64 KB physical memory; with it, expansion to 256 KB was straightforward via 18-bit addressing, scaling further to 4 MB through 22-bit extended addressing.
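A schematic C version of this translation (a sketch under the assumptions stated above: eight PAR/PDR pairs per mode, 64-byte blocks, a 13-bit displacement; field names are this example's own, and access-control checks are omitted) makes the PAF concatenation explicit:

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch of PDP-11/70-style relocation: virtual address bits 15-13 select a
   PAR/PDR pair; the PAF (in 64-byte blocks) is added to the block number of
   the 13-bit displacement. Only length checking against the PLF is modeled. */
struct segment {
    uint16_t paf;   /* page address field, in 64-byte blocks (16 bits on 22-bit systems) */
    uint8_t  plf;   /* page length field: last valid 64-byte block of an upward-growing page */
};

static int translate(const struct segment par[8], uint16_t vaddr, uint32_t *paddr)
{
    unsigned seg   = (vaddr >> 13) & 7;     /* active PAR/PDR index */
    unsigned disp  = vaddr & 017777;        /* 13-bit displacement within the page */
    unsigned block = disp >> 6;             /* 64-byte block number */

    if (block > par[seg].plf)               /* length violation: abort/trap */
        return -1;
    *paddr = ((uint32_t)par[seg].paf + block) * 64u + (disp & 077);
    return 0;
}

int main(void)
{
    struct segment map[8] = {{0, 0}};
    map[1].paf = 0200;                      /* segment 1 relocated to physical 0200 * 64 */
    map[1].plf = 0177;                      /* full 8 KB page (128 blocks) */

    uint32_t pa;
    if (translate(map, 024006, &pa) == 0)   /* virtual address in segment 1 */
        printf("virtual 024006 -> physical %08o\n", (unsigned)pa);
    else
        printf("length violation\n");
    return 0;
}
```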
Protection in PDP-11 MMUs was integrated into the PDRs via a 3-bit Access Control Field (ACF) and additional status bits, providing granular control per segment without full page table indirection. The ACF supported modes including non-resident (ACF=000, aborting all accesses), read-only (ACF=001 or 010, trapping writes), and read/write (ACF=100, 101, or 110, allowing full data access), with an implicit execute permission for I-space fetches treated as reads. A dedicated Access bit (bit 7) and Written bit (bit 6) tracked usage for software-managed swapping, and access violations or non-resident references aborted through the memory management trap vector at 250 (octal), while mode-specific register sets (separate for kernel, supervisor, and user) enforced privilege separation. In the PDP-11/45, these bits enabled basic protection for up to 248 KB, preventing unauthorized writes or overflows in segmented programs. The PDP-11/70 extended this with per-space protection, where I-space ACFs controlled instruction fetches (effectively read/execute) and D-space handled data read/write, supporting multi-user isolation in systems like UNIX V6.[7]
Memory expansion beyond the base mapping in compatible models was achieved by widening the relocation path rather than altering the core 16-bit virtual space. Control bits in Memory Management Register 3 (MMR3) select 18-bit or 22-bit mapping and enable separate data space, allowing incremental expansion in 256 KB steps. In the PDP-11/45, relocation was limited to 18-bit addressing (up to 248 KB of memory plus the I/O page), while the PDP-11/70 natively supported 4 MB via 22-bit translation, with a Unibus map relocating the 18-bit DMA addresses of peripherals into the larger physical space. This capability was crucial for scaling from single-process limits to multi-megabyte shared memory, though it required careful configuration to avoid address overlaps.
Addressing Mechanisms
In the PDP-11 architecture, the register mode provides direct access to one of the eight general-purpose registers as the operand for an instruction. This mode, denoted as Rn where n ranges from 0 to 7, treats the contents of the specified register as the operand without requiring any memory fetch, enabling high-speed operations within the processor. Registers R0 through R5 serve as general-purpose registers for data manipulation, R6 functions as the stack pointer (SP), and R7 acts as the program counter (PC).[7][6]
The immediate mode, denoted as #n, supplies a constant operand embedded directly in the instruction stream, allowing instructions to operate on fixed values without referencing external memory locations. In this mode, the operand is fetched from the word immediately following the opcode in memory; for word instructions, a 16-bit constant is used, while byte instructions employ an 8-bit constant that is sign-extended to 16 bits. The mode leverages the PC (R7) to compute the effective address as the current PC value, after which the PC is incremented by 2 to skip over the constant and proceed to the next instruction.[8][7]
Both register and immediate modes are encoded using a 3-bit mode field within the instruction word, typically bits 5:3 in the PDP-11's variable-length instruction format. For register mode, this field is set to 000 (octal 0), with the adjacent 3-bit register specifier selecting Rn from 0 to 7. Immediate mode uses 010 (octal 2) in the mode field combined with the register specifier set to 7 (PC), effectively fetching the inline constant while advancing the instruction pointer. This compact encoding supports efficient operand specification in most PDP-11 instructions.[8][6]
A specialized form of immediate addressing appears in branch instructions, where PC-relative immediate operands enable position-independent code execution. These instructions include an 8-bit signed displacement field in the opcode word itself, which is interpreted as a word offset (multiplied by 2 for byte addressing) and added to the PC value after the current instruction fetch (PC + 2). This allows branches within a range of ±128 words (±256 bytes), facilitating compact control flow without altering the general immediate mode mechanism.[8][7]
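The branch-target arithmetic can be sketched in a few lines of C (a toy helper of this example's own; the displacement occupies the low byte of the branch opcode word):

```c
#include <stdint.h>
#include <stdio.h>

/* Compute the target of a PDP-11 branch: the low byte of the opcode word is a
   signed word offset, doubled and added to the updated PC (instruction address + 2). */
static uint16_t branch_target(uint16_t instr_addr, uint16_t opcode_word)
{
    int8_t offset = (int8_t)(opcode_word & 0xFF);   /* sign-extend the 8-bit displacement */
    return (uint16_t)(instr_addr + 2 + 2 * offset);
}

int main(void)
{
    /* BR .-4 assembled at address 001000 (octal): opcode 000400 | 0375 = 000775. */
    printf("target = %06o\n", branch_target(01000, 0000775));   /* prints 000774 */
    return 0;
}
```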
Autoincrement and Autodecrement Modes
The PDP-11 architecture includes autoincrement and autodecrement addressing modes to facilitate efficient sequential access to memory operands through general-purpose registers. In autoincrement mode, denoted as (Rn)+ where Rn is a general register, the contents of Rn serve as the effective address from which the operand is fetched; following the fetch, Rn is incremented by the operand size—1 for byte operations or 2 for word operations—to point to the next sequential location.[10] This post-adjustment timing enables straightforward traversal of data structures like arrays or lists in forward order, as exemplified by the assembler instruction MOVB (R4)+, R0, which loads a byte from the address in R4 into R0 and then increments R4 by 1.[11]
In contrast, autodecrement mode, denoted as -(Rn), first subtracts the operand size from Rn—1 for bytes or 2 for words—before using the updated value as the effective address to fetch the operand, allowing reverse sequential access.[10] This pre-adjustment is illustrated by MOV -(R4), R0, which decrements R4 by 2 (for a word) and then loads the word at the new address in R4 into R0.[11] These modes apply uniformly to any general register, enhancing code efficiency for pointer-based operations without requiring explicit adjustment instructions.
A key application of autoincrement mode involves the program counter (PC, register R7), where (PC)+ is implicitly used during instruction fetching: the PC holds the address of the current instruction word, which is fetched, and then the PC is incremented by 2 to advance to the next word in the instruction stream, ensuring sequential program execution.[10] This automatic adjustment by 2 occurs regardless of operand size in the context of instruction prefetch, maintaining alignment on word boundaries.
For stack operations, these modes are particularly valuable when using the stack pointer (SP, register R6), which grows downward in memory. Autodecrement mode with SP pushes operands onto the stack by first decrementing SP by 2 and then storing the value at the new address, as in MOV R1, -(SP); conversely, autoincrement mode pops operands by fetching from the current SP address and then incrementing SP by 2, as in MOV (SP)+, R1.[10] In both cases for SP, adjustments are fixed at 2 bytes to preserve word alignment, even for byte operands, supporting reliable subroutine calls and interrupt handling.[11]
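A brief C model of these stack conventions (hypothetical helper names; SP is modeled as a byte address into a small array) shows the order of the adjustment relative to the access:

```c
#include <stdint.h>
#include <stdio.h>

/* Model of PDP-11 stack push/pop via autodecrement/autoincrement on SP (R6).
   The stack grows toward lower addresses; adjustments are always 2 bytes. */
static uint8_t  mem[0x10000];
static uint16_t sp = 0x2000;                 /* arbitrary initial stack top */

static void push(uint16_t value)             /* MOV value, -(SP) */
{
    sp -= 2;                                 /* decrement first ... */
    mem[sp]     = (uint8_t)(value & 0xFF);   /* ... then store the low byte */
    mem[sp + 1] = (uint8_t)(value >> 8);     /* high byte at the odd address */
}

static uint16_t pop(void)                    /* MOV (SP)+, dst */
{
    uint16_t value = (uint16_t)(mem[sp] | (mem[sp + 1] << 8));  /* fetch first ... */
    sp += 2;                                 /* ... then increment */
    return value;
}

int main(void)
{
    push(0x1111);
    push(0x2222);
    uint16_t a = pop();
    uint16_t b = pop();
    printf("popped %04X then %04X, SP back to %04X\n", a, b, sp);
    return 0;
}
```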
Indexed and Indirect Modes
The PDP-11 architecture supports indirect addressing through deferred modes, which allow operands to be located at addresses specified indirectly via registers or memory, enabling flexible data structure access without embedding full addresses in instructions.[4] These modes complement direct register and immediate addressing by deferring the final operand fetch, while indexed modes add offset capabilities for array-like operations.[4]
In register deferred mode, denoted as @Rn or (Rn), the contents of the specified general-purpose register Rn serve directly as the effective address of the operand, which may point to memory or an I/O device.[4] The effective address calculation is straightforward: EA = contents of Rn, allowing Rn to act as a pointer for sequential lists or indirect jumps.[4] The 16-bit address held in Rn is a virtual address; on systems equipped with memory management units, such as the PDP-11/70, it is relocated to a physical address of up to 22 bits.[4]
Autoincrement deferred mode, indicated by @(Rn)+, combines indirection with post-increment for traversing linked structures like argument lists.[4] Here, the effective address is obtained by first fetching the operand address from the memory location pointed to by Rn (EA = contents of memory at address in Rn), after which Rn is incremented by 2 bytes to prepare for the next access.[4] The increment occurs regardless of whether the operation is byte- or word-sized, ensuring consistent pointer advancement for word-aligned addresses in memory.[4] As with register deferred mode, the resulting 16-bit virtual address is subject to relocation to a larger physical address on systems with memory management.[4]
Indexed mode, represented as X(Rn), facilitates access to elements offset from a base by adding the contents of register Rn (used as a base or index) to a 16-bit index word X stored in the word following the instruction.[4] The effective address is computed as EA = contents of Rn + X, where X is a signed 16-bit value, providing a straightforward way to address array entries or table offsets without altering the base register.[4] Because the index word is a full 16 bits, any location in the virtual address space can be reached, and the resulting address is subject to the same memory-management relocation as the other modes.[4]
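The effective-address calculations for these modes can be summarized in a small C sketch (illustrative only; just the three modes discussed here are handled, byte operations are ignored, and memory is a simple word array):

```c
#include <stdint.h>
#include <stdio.h>

/* Effective-address computation for three PDP-11 modes:
   mode 1: register deferred      (Rn)    EA = Rn
   mode 3: autoincrement deferred @(Rn)+  EA = mem[Rn], then Rn += 2
   mode 6: indexed                X(Rn)   EA = Rn + X (X is the next instruction word) */
static uint16_t mem[0x8000];      /* word-addressed by (byte address >> 1) */
static uint16_t reg[8];

static uint16_t read_word(uint16_t byte_addr) { return mem[byte_addr >> 1]; }

static uint16_t effective_address(int mode, int rn)
{
    switch (mode) {
    case 1:                                     /* (Rn) */
        return reg[rn];
    case 3: {                                   /* @(Rn)+ */
        uint16_t ea = read_word(reg[rn]);
        reg[rn] += 2;
        return ea;
    }
    case 6: {                                   /* X(Rn): index word follows the instruction */
        uint16_t x = read_word(reg[7]);
        reg[7] += 2;
        return (uint16_t)(reg[rn] + x);
    }
    default:
        return 0;                               /* other modes not modeled */
    }
}

int main(void)
{
    reg[2] = 01000;                             /* base address in R2 */
    reg[7] = 0100;                              /* PC points at an index word */
    mem[0100 >> 1] = 024;                       /* X = 24 (octal) offset */
    printf("indexed EA = %06o\n", effective_address(6, 2));   /* prints 001024 */
    return 0;
}
```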
Core Instruction Set
Double-Operand Arithmetic and Logic
The double-operand arithmetic and logic instructions in the PDP-11 architecture compute results between a source operand—addressable via any general addressing mode—and a destination operand, which is either a general register or a memory location, storing the outcome in the destination where applicable. These instructions follow a uniform 16-bit format consisting of a 4-bit opcode (bits 15-12), a 6-bit source effective address specifier (bits 11-6), and a 6-bit destination effective address specifier (bits 5-0). Most update the processor status word's condition codes (N for negative, Z for zero, V for overflow, C for carry/borrow) to reflect the operation's result, enabling conditional branching. Byte-mode variants operate on the low-order 8 bits of operands, indicated by setting bit 15 to 1 in the opcode field, while word mode uses 16-bit operands by default.[4]
Arithmetic operations include addition and subtraction in the core instruction set. The ADD instruction adds the source operand to the destination and replaces the destination with the sum, affecting all condition codes: N and Z based on the result's sign and zero status, V if signed overflow occurs, and C if an unsigned carry is generated. It uses octal opcode 06SSDD, where SS and DD denote the source and destination specifiers. The SUB instruction subtracts the source from the destination and stores the difference in the destination, similarly updating all condition codes (C indicates borrow). It employs octal opcode 16SSDD. Neither has a standard byte variant in the core set; byte arithmetic requires alternative sequences or extensions. For instance, ADD X, R2 with X addressed PC-relative assembles to 066702 (octal): opcode 06, source specifier 67 (index mode using the PC), and destination specifier 02 (register R2), followed by a word holding the offset to X.[4]
Multiplication and division form part of the Extended Instruction Set (EIS) and handle 32-bit intermediate results using even-odd pairs of general registers (R0-R1, R2-R3, etc.). The MUL instruction multiplies the 16-bit source by the 16-bit value in even register R, producing a 32-bit product stored with the high word in R and low word in R+1; it sets N if the result is negative, Z if zero, clears V, and sets C if the product cannot be represented in 16 bits. The opcode is 070RSS (octal), where R (0, 2, 4, or 6) identifies the pair and SS the source. The DIV instruction divides the 32-bit dividend in R:R+1 by the 16-bit source, placing the quotient (sign-extended) in R and remainder in R+1; it sets N and Z per the quotient, V on divide overflow or division by zero, and C on division by zero. Its opcode is 071RSS (octal). Both are word-only, with the destination fixed as a register pair.[4]
Logic operations support bit testing, clearing, and setting; there is no AND instruction that stores its result, that role being filled by BIC with a complemented mask, while BIS provides an inclusive OR. The BIT instruction computes the bitwise AND of source and destination solely to set condition codes (N and Z per result, V cleared, C unaffected), leaving operands unchanged; it uses octal opcode 03SSDD for words and 13SSDD for bytes. The BIC instruction clears selected bits by ANDing the destination with the bitwise complement of the source, storing the result in the destination and updating N and Z accordingly (V cleared, C unaffected); opcodes are 04SSDD (word) and 14SSDD (byte). The BIS instruction sets selected bits by ORing the source into the destination, with similar condition code effects; opcodes are 05SSDD (word) and 15SSDD (byte). These enable efficient bit-field manipulations, such as masking or flag setting in registers or memory.[4]
| Instruction | Octal Opcode (Word/Byte) | Operation | Condition Codes (N Z V C) |
|---|---|---|---|
| ADD | 06SSDD / N/A | dst ← src + dst | * * * * |
| SUB | 16SSDD / N/A | dst ← dst - src | * * * * |
| MUL | 070RSS / N/A | R:R+1 ← R × src | * * 0 * |
| DIV | 071RSS / N/A | R ← (R:R+1) / src; R+1 ← remainder | * * * * |
| BIT | 03SSDD / 13SSDD | CC ← src & dst (no store) | * * 0 - |
| BIC | 04SSDD / 14SSDD | dst ← dst & ~src | * * 0 - |
| BIS | 05SSDD / 15SSDD | dst ← dst OR src | * * 0 - |
* = set according to result; 0 = cleared; - = unaffected. SS and DD are 6-bit address mode fields; RSS specifies the register pair and source for MUL/DIV.[4]
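These bit-group operations map directly onto C bitwise expressions, as the following sketch shows (the condition-code handling mirrors the table above; the helper names are this example's own):

```c
#include <stdint.h>
#include <stdio.h>

/* BIT, BIC, and BIS modeled on 16-bit values. Each reports N and Z;
   V is cleared and C is left untouched, as in the table above. */
struct flags { int n, z; };

static struct flags set_nz(uint16_t result)
{
    struct flags f = { (result >> 15) & 1, result == 0 };
    return f;
}

int main(void)
{
    uint16_t mask = 0x00F0, dst = 0x00FF;
    struct flags f;

    f = set_nz(mask & dst);                      /* BIT: test only, dst unchanged */
    printf("BIT: N=%d Z=%d (dst stays %04X)\n", f.n, f.z, dst);

    dst = dst & (uint16_t)~mask;                 /* BIC: clear the masked bits */
    f = set_nz(dst);
    printf("BIC: dst=%04X N=%d Z=%d\n", dst, f.n, f.z);

    dst = dst | mask;                            /* BIS: set the masked bits */
    f = set_nz(dst);
    printf("BIS: dst=%04X N=%d Z=%d\n", dst, f.n, f.z);
    return 0;
}
```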
Single-Operand Operations
The PDP-11 architecture features a class of single-operand instructions that modify or test a single destination operand in memory or a register, while updating the processor's condition codes (N, Z, V, C) based on the result. These instructions use a compact format in which bits 15-6 of the instruction word specify the operation (octal 0050-0057 for the arithmetic/logic group and 0060-0063 for the shifts and rotates), and bits 5-0 define the addressing mode and register for the destination. Operands are either 16-bit words or 8-bit bytes, depending on the instruction variant (word by default, byte with bit 15 set).[7]
The arithmetic and logic modification instructions perform in-place operations on the operand, altering its value and setting condition codes to reflect the outcome for subsequent conditional branches. CLR (opcode 0050) sets the destination to zero, clearing N, V, and C while setting Z, useful for initialization. INC (0052) adds 1 to the operand, setting N if the result is negative, Z if zero, and V if arithmetic overflow occurs from the maximum positive value (077777 for words), with C unaffected. DEC (0053) subtracts 1, mirroring INC's condition code behavior but setting V on underflow from the minimum negative value (100000). COM (0051) performs a bitwise one's complement, inverting all bits and setting C to 1, with N and Z based on the result and V cleared. NEG (0054) computes the two's complement negation (effectively subtracting the operand from zero), setting C to 1 unless the operand was zero, and V if the result is the minimum negative value. These operations enable efficient scalar adjustments without requiring a separate source operand.[7]
TST (0057) provides a non-destructive test of the operand, leaving it unchanged while setting N if negative, Z if zero, and clearing V and C, allowing condition codes to reflect the operand's sign and zero status for control flow decisions. Shift instructions operate on a single operand by performing one-bit shifts or rotations, also updating condition codes comprehensively. ASL (0063) arithmetically shifts left, multiplying by 2 (zero fill on the right), setting N and Z by the result, V to the exclusive OR of the resulting N and C bits (indicating a sign change), and C to the original most significant bit. ASR (0062) shifts right arithmetically, dividing by 2 while preserving the sign bit (arithmetic fill), setting V to the exclusive OR of N and C and loading C from the least significant bit shifted out. ROL (0061) rotates left through the carry flag, combining shift and carry recirculation, with V again set to N XOR C after the operation. ROR (0060) rotates right similarly. These shifts support multiplication/division by powers of 2 and bit manipulation tasks like alignment or rotation for circular buffers.[7]
| Instruction | Octal Opcode | Operation Summary | Condition Codes (N Z V C) |
|---|---|---|---|
| CLR | 0050 | Set to 0 | 0 1 0 0 |
| INC | 0052 | Add 1 | Result-based; V on overflow; C unaffected |
| DEC | 0053 | Subtract 1 | Result-based; V on underflow; C unaffected |
| COM | 0051 | One's complement | Result-based; V=0; C=1 |
| NEG | 0054 | Two's complement | Result-based; V on min neg; C=1 unless result 0 |
| TST | 0057 | Test value | Operand-based; V=0; C=0 |
| ASL | 0063 | Shift left | Result-based; V=N XOR C; C=MSB out |
| ASR | 0062 | Shift right (sign extend) | Result-based; V=N XOR C; C=LSB out |
| ROL | 0061 | Rotate left thru C | Result-based; V=N XOR C; C=MSB out |
| ROR | 0060 | Rotate right thru C | Result-based; V=N XOR C; C=LSB out |
Bit manipulation in the single-operand context is limited, but instructions like these shifts enable extraction or setting of specific bits through repeated operations.[7]
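For illustration, the ASR and ROR behavior described above can be modeled in C (a sketch only; the carry is passed explicitly rather than read from a PSW, and V is left to the caller as N XOR C):

```c
#include <stdint.h>
#include <stdio.h>

/* One-bit arithmetic shift right (ASR) and rotate right through carry (ROR),
   modeled on a 16-bit word. Each returns the result and updates *carry. */
static uint16_t asr16(uint16_t w, int *carry)
{
    *carry = w & 1;                               /* LSB shifted out into C */
    return (uint16_t)((w >> 1) | (w & 0x8000));   /* sign bit is replicated */
}

static uint16_t ror16(uint16_t w, int *carry)
{
    int old_c = *carry;
    *carry = w & 1;                               /* LSB goes to C */
    return (uint16_t)((w >> 1) | (old_c << 15));  /* old C enters the MSB */
}

int main(void)
{
    int c = 0;
    uint16_t v = asr16(0x8005, &c);               /* negative value keeps its sign */
    printf("ASR: %04X C=%d\n", v, c);
    v = ror16(0x0001, &c);                        /* previous C (1) rotates into bit 15 */
    printf("ROR: %04X C=%d\n", v, c);
    return 0;
}
```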
Branch and Jump Instructions
The PDP-11 architecture provides a set of branch and jump instructions to manage program control flow by altering the program counter (PC), enabling both unconditional transfers and conditional branches based on the processor status word's condition codes (CC). These instructions support efficient short-range displacements for common control structures while allowing longer jumps through alternative addressing modes, promoting position-independent code via PC-relative addressing. Unconditional jumps include JMP, which loads the effective address (EA) directly into the PC, and JSR, which jumps to a subroutine by saving the return PC in a specified linkage register (the PC itself for purely stack-based calls, or R5 in the in-line argument-passing convention) before loading the EA into the PC.[6][8]
Branch instructions, including BR (unconditional branch) and the conditional forms BEQ (branch if equal, testing zero flag Z=1 from prior operations) and BNE (branch if not equal, Z=0), use an 8-bit signed displacement field in the instruction word. This displacement, interpreted as a word offset ranging from -128 to +127, is multiplied by 2 and added to the PC (which has been incremented past the current instruction) to compute the target address, yielding a byte range of -256 to +254.[6][8] The PC-relative nature of these branches ensures that code relocations do not require modification of the displacement values, as offsets remain constant relative to the instruction location.[6] These instructions do not alter the CC, relying instead on flags set by preceding arithmetic or logical operations.[8]
For branches exceeding the short displacement range, the architecture employs JMP or JSR with extended addressing modes, such as indexed (adding a register offset to a base address) or deferred indirect (loading from a memory-indirect location).[6] JMP supports these modes to reach any memory location without displacement limits, while JSR similarly uses them for subroutine entry points beyond ±128 words.[8] This combination allows flexible control flow, with short branches optimizing for speed in loops and decisions, and long jumps handling broader program navigation.[6]
Advanced Instructions
Subroutine and Stack Operations
The PDP-11 architecture supports subroutine calls through dedicated instructions that leverage the system stack for saving return addresses and managing linkage, enabling nested and reentrant code execution. The stack, pointed to by register 6 (R6, also known as the stack pointer or SP), operates on full 16-bit words and grows downward, from higher addresses toward lower ones, expanding linearly as items are pushed. This downward growth facilitates efficient parameter passing and local variable allocation without requiring complex memory management. Subroutines typically pass arguments via registers or the stack, with the calling convention preserving the linkage register's contents for return control flow.[12][13]
The Jump to Subroutine (JSR) instruction initiates a subroutine call by saving the current contents of a specified general register (the linkage register, often R5 or the PC itself) onto the stack via autodecrement of SP, then loading the program counter (PC, containing the return address) into that register, and finally transferring control to the destination address. The format is JSR Rn, <destination>, where Rn is the linkage register (any general register, including R7 (PC) with special behavior) and the destination may use any addressing mode except register mode. For example, JSR PC, SUBR pushes the return address onto the stack and jumps to the subroutine label SUBR, with execution resuming after the JSR upon return. Because return addresses are kept on the stack, calls may nest to any depth allowed by stack space; programs commonly employ a single linkage register for simplicity. The operation is equivalent to an explicit MOV Rn, -(SP) followed by MOV PC, Rn and a jump, ensuring the return address is preserved without altering the processor status word (PSW) during the call.[12][13]
Return from Subroutine (RTS) completes the call by loading the contents of the linkage register (now holding the return address) into the PC to resume execution, then popping the top stack word (the saved original register contents) back into that register via autoincrement of SP. The format is RTS Rn, matching the linkage register from the corresponding JSR, with an opcode of 00020R. For instance, following JSR R5, SUBR, an RTS R5 restores R5 and returns control, effectively reversing the JSR's stack modifications. This paired usage maintains stack integrity and register state across calls, with timing typically around 2.42 microseconds on early models like the PDP-11/40. Unlike interrupt returns, RTS does not restore the PSW, preserving the caller's execution environment.[12][13]
Stack manipulation for subroutine parameters and local frames often involves explicit pushes and pops using autodecrement and autoincrement modes on SP, such as MOV #value, -(SP) to push an argument or MOV (SP)+, R0 to pop a result. The MARK instruction, available on processors supporting the extended instruction set (e.g., PDP-11/45 and later), simplifies stack frame cleanup in the R5 calling convention: the caller pushes its old R5, the arguments, and a MARK n word (opcode 0064NN, where n is the number of argument words), points R5 at the MARK word, and calls with JSR PC; the subroutine returns with RTS R5, which executes the MARK from the stack, advancing SP past the arguments (SP ← PC + 2n), transferring control to the return address through R5, and popping the caller's original R5. In the MACRO-11 assembler, MARK facilitates structured stack allocation for subroutines, promoting clean entry and exit conventions without manual SP arithmetic. This instruction requires the stack page to be accessible in both instruction and data space on memory-managed systems.[12][13]
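The JSR/RTS register and stack movements can be traced with a short C model (a sketch using this example's own array-based machine state, not DEC microcode):

```c
#include <stdint.h>
#include <stdio.h>

/* Model of JSR Rn,dest and RTS Rn on a word-addressed toy machine:
   JSR pushes the linkage register, moves the return PC into it, and jumps;
   RTS moves the linkage register to PC and pops the old value back. */
static uint16_t mem[0x8000];
static uint16_t reg[8];                /* R6 = SP, R7 = PC */

static void jsr(int rn, uint16_t dest)
{
    reg[6] -= 2;
    mem[reg[6] >> 1] = reg[rn];        /* push the old linkage register */
    reg[rn] = reg[7];                  /* linkage register <- return address */
    reg[7]  = dest;                    /* jump */
}

static void rts(int rn)
{
    reg[7]  = reg[rn];                 /* return */
    reg[rn] = mem[reg[6] >> 1];        /* restore the linkage register */
    reg[6] += 2;
}

int main(void)
{
    reg[6] = 0x2000;                   /* stack pointer */
    reg[7] = 0x1004;                   /* PC already advanced past the JSR: the return address */
    reg[5] = 0xAAAA;                   /* caller's R5, to be preserved */

    jsr(5, 0x3000);                    /* JSR R5, 3000 */
    printf("in subroutine: PC=%04X R5(return)=%04X SP=%04X\n", reg[7], reg[5], reg[6]);

    rts(5);                            /* RTS R5 */
    printf("after return:  PC=%04X R5=%04X SP=%04X\n", reg[7], reg[5], reg[6]);
    return 0;
}
```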
Interrupt and Trap Handling
The PDP-11 architecture supports software-initiated traps through dedicated instructions that facilitate system calls, emulation, and input/output operations. These traps are synchronous events triggered explicitly by program execution, distinct from asynchronous hardware interrupts. The trap instructions include EMT (Emulator Trap), TRAP, and IOT (Input/Output Trap), each directing control to predefined locations in a vector table residing in low memory.[4][7]
The EMT instruction, with opcodes 104000 to 104377 (octal), serves for user-defined emulation or debugging routines, transferring the low byte of the instruction as parameter data to the handler.[4] It vectors to location 030 (octal) in the trap table, where the first word holds the handler's program counter (PC) and the second the program status word (PSW).[7] Similarly, the TRAP instruction, opcodes 104400 to 104777 (octal), enables general system calls or error handling, also passing the low byte as data, but vectors to 034 (octal).[4] The IOT instruction, opcode 000004 (octal), is used for I/O device signaling and errors, vectoring to 020 (octal).[7] These instructions push the current PSW and PC onto the stack before branching, preserving the execution context.[4]
Exceptions in the PDP-11 arise from addressing or execution errors during instruction processing, automatically invoking trap handlers via the vector table. A divide-by-zero or overflow condition in the EIS DIV instruction does not itself trap; the operation is aborted and the V and C condition codes are set for software to test.[4] Odd-address exceptions occur when word operands or instructions reference unaligned (odd) memory locations, causing a boundary error trap to vector 004 (octal).[7] Other exceptions, such as reserved instruction execution, vector to 010 (octal). These mechanisms ensure reliable error detection without halting the processor.[4]
The trap handling sequence begins with the processor saving the current PSW and PC onto the stack by decrementing the stack pointer (SP) twice and storing the values.[7] It then fetches the new PC from the vector address and the new PSW from the next word (vector + 2), updating the processor state and jumping to the handler routine.[4] Return from the handler uses the RTI (Return from Interrupt) or RTT (Return from Trap) instruction, which pops the saved PSW and PC from the stack to restore the prior context.[7] This process mirrors interrupt vectoring but occurs synchronously within the current instruction cycle.
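The save-and-vector sequence can be sketched in C as follows (a simplified model that keeps a single stack and treats the PSW as a bare 16-bit value; the vector address matches the TRAP vector given above, and mode/stack selection is omitted):

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch of the PDP-11 trap entry sequence: push PSW and PC, then load the
   new PC and PSW from the two-word vector; RTI reverses it. */
static uint16_t mem[0x8000];           /* word-addressed by (byte address >> 1) */
static uint16_t pc, psw, sp;

static void take_trap(uint16_t vector)
{
    sp -= 2; mem[sp >> 1] = psw;       /* push old PSW */
    sp -= 2; mem[sp >> 1] = pc;        /* push old PC (return address) */
    pc  = mem[vector >> 1];            /* new PC from the vector */
    psw = mem[(vector + 2) >> 1];      /* new PSW from vector + 2 */
}

static void rti(void)
{
    pc  = mem[sp >> 1]; sp += 2;       /* pop PC */
    psw = mem[sp >> 1]; sp += 2;       /* pop PSW */
}

int main(void)
{
    mem[0034 >> 1] = 0x4000;           /* TRAP vector: handler address */
    mem[0036 >> 1] = 0x00E0;           /* TRAP vector: handler PSW (priority 7) */
    pc = 0x1002; psw = 0x0000; sp = 0x2000;

    take_trap(0034);                   /* e.g. execution of a TRAP instruction */
    printf("in handler: PC=%04X PSW=%04X SP=%04X\n", pc, psw, sp);
    rti();
    printf("after RTI:  PC=%04X PSW=%04X SP=%04X\n", pc, psw, sp);
    return 0;
}
```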
Traps differ from priority-based interrupts in their initiation and timing: traps are deterministic and instruction-synchronous, using fixed low-memory vectors without arbitration, whereas interrupts are asynchronous device events resolved by processor priority levels (0-7).[4] This separation allows traps to handle internal software exceptions efficiently, independent of external interrupt priorities.[7]
Miscellaneous and Condition Code Instructions
The PDP-11 architecture includes a dedicated set of condition code instructions that allow programmers to directly manipulate the four condition code bits (N for negative, Z for zero, V for overflow, and C for carry) stored in the low byte of the processor status word (PSW). These instructions are essential for controlling program flow in scenarios where arithmetic results need explicit adjustment without performing operations on data. The CCC (Clear Condition Codes) instruction, with opcode 000257 (octal), clears all four condition code bits to zero, while the SCC (Set Condition Codes) instruction, with opcode 000277 (octal), sets all four to one; both are zero-operand instructions that do not affect the program counter beyond normal advancement and require no specific addressing mode.[8] Complementing these, selective bit manipulation is possible through instructions like CLC (opcode 000241, clears C), SEC (opcode 000261, sets C), CLV (opcode 000242, clears V), and CLZ (opcode 000244, clears Z), with analogous forms for N, enabling precise control over individual flags for conditional branching optimization.[14]
Miscellaneous utility instructions provide basic control over processor execution. The NOP (No Operation) instruction, opcode 000240 (octal), performs no action on registers, memory, or condition codes, serving primarily as a placeholder or to synchronize timing in assembly code; it advances the program counter by two bytes as with any instruction.[8] The HALT instruction, opcode 000000 (octal), terminates normal instruction execution, stopping the processor until an external reset or power cycle; in user or supervisor mode, it triggers a trap to location 000004 (octal), while in kernel mode it directly halts without trapping, making it useful for debugging or system shutdown but requiring privileged access in most operating environments.[15] Similarly, the WAIT instruction, opcode 000001 (octal), suspends processor activity until an interrupt occurs or the system is reset, preserving the current PSW and program counter; it has no effect on condition codes and is often used in interrupt-driven I/O routines to idle the CPU efficiently without privilege restrictions.[15]
The MTPS (Move to Processor Status Word) instruction, opcode 1064SS (octal) in single-operand format, loads the low byte of the source operand into the PSW, potentially altering the condition codes and priority level (the trace bit T cannot be set this way); it is partially privileged, allowing changes to the current priority level (PS<7:5>) only in kernel mode, while user and supervisor modes ignore those bits to prevent unauthorized escalation, thus enforcing protection in multitasking systems.[15]
Among the shift instructions, ASLB (Arithmetic Shift Left Byte), opcode 1063DD (octal), shifts the destination byte left by one bit, inserting zero into the least significant bit, with the most significant bit loaded into the carry flag (C) and N and Z set from the result; V is set to the exclusive OR of N and C, indicating a sign change for signed arithmetic. While functionally consistent, byte operations exhibit implementation-specific addressing quirks in autoincrement or autodecrement modes involving the stack pointer (R6): on some PDP-11 models (e.g., the 11/04), byte operations adjust the SP by 1 for strict byte access, but on others (e.g., the 11/23 and 11/24), the adjustment is by 2 bytes to enforce word alignment, potentially leading to off-by-one errors in stack-based code if not accounted for across hardware generations.[16] This inconsistency highlights a broader design compromise in the PDP-11's handling of byte versus word operations on even-addressed stacks, as documented in DEC's architecture handbooks.[4]
Optional Extensions
Extended Instruction Set
The Extended Instruction Set (EIS) for the PDP-11, introduced with the PDP-11/45 processor in 1972, extends the core integer instruction capabilities by adding support for multi-precision arithmetic, logical operations, and utility functions that operate on 16-bit or 32-bit data.[17] This set enhances efficiency for tasks requiring wider data manipulation, such as in scientific computing or systems programming, without relying on software emulation of multi-word operations.[13] The EIS instructions are implemented in hardware, reducing execution time compared to sequences of basic instructions, and are available as standard on higher-end models like the 11/45 and optional via modules like the KE11-E on earlier systems such as the 11/40.[18]
Key EIS instructions include arithmetic shifts for scaling values, exclusive OR for bit manipulation, and sign extension for converting values while preserving sign. For example, the ASH instruction performs an arithmetic shift on a 16-bit register, shifting bits left or right by a specified count (ranging from -32 to +31) while preserving the sign bit for right shifts.[18] Similarly, XOR computes the bitwise exclusive OR between a register and a destination operand, useful for toggling specific bits or implementing parity checks.[17] The SXT instruction sets the destination to 0 if the N flag in the PSW is clear or to -1 if set.[17] These operations update the condition codes (N, Z, V, C) based on the result, enabling conditional branching in programs.[18]
The EIS also supports 32-bit operations through register pairs, where even-odd register pairs (e.g., R0-R1) treat the even register as the high word and the odd as the low word. This is particularly evident in the ASHC (arithmetic shift combined) instruction, which shifts a 32-bit value across the pair, maintaining arithmetic integrity for multi-precision numbers.[18] For arithmetic, the MUL instruction multiplies two 16-bit signed integers, producing a 32-bit result stored in a register pair, while DIV divides a 32-bit signed dividend by a 16-bit divisor, yielding a 16-bit quotient and remainder in the pair; both handle overflow by setting the V flag and aborting if necessary.[18] These extensions apply to both signed and unsigned modes, depending on the operand interpretation, allowing flexible use in mixed-precision algorithms.[17]
The following table summarizes representative EIS instructions with their formats and functions:
| Instruction | Octal Opcode Format | Description |
|---|---|---|
| MUL | 070 RSS | Multiplies 16-bit source by register R (even), stores 32-bit signed result in R and R+1.[18] |
| DIV | 071 RSS | Divides 32-bit value in R and R+1 by 16-bit source, stores signed quotient in R and remainder in R+1.[18] |
| ASH | 072 RSS | Arithmetic shifts 16-bit register R by count in source (positive for left, negative for right).[18] |
| ASHC | 073 RSS | Arithmetic shifts 32-bit register pair R and R+1 by count in source.[18] |
| XOR | 074 RDD | Performs exclusive OR of register R with the destination, stores result in destination.[17] |
| SXT | 0067 DD | Sets destination to 0 if N=0 or -1 if N=1 based on PSW (single-operand; no source register).[17] |
These instructions collectively enable more compact and faster code for integer-heavy applications, forming a bridge between the basic 16-bit operations and more advanced extensions.[13]
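The register-pair convention for MUL and DIV can be illustrated with a short C sketch (illustrative only; `r` is an even register index into this example's own register array, and the condition codes are reduced to a single overflow indication):

```c
#include <stdint.h>
#include <stdio.h>

/* EIS-style MUL and DIV on an even/odd register pair: the even register holds
   the high-order word and the odd register the low-order word of the 32-bit value. */
static uint16_t reg[8];

static void eis_mul(int r, int16_t src)
{
    int32_t product = (int32_t)(int16_t)reg[r] * src;
    reg[r]     = (uint16_t)((uint32_t)product >> 16);   /* high word */
    reg[r + 1] = (uint16_t)product;                      /* low word */
}

static int eis_div(int r, int16_t src)                   /* returns nonzero when V would be set */
{
    int32_t dividend = ((int32_t)reg[r] << 16) | reg[r + 1];
    if (src == 0)
        return 1;                                        /* divide by zero: V and C set */
    int32_t quotient = dividend / src;
    if (quotient > 32767 || quotient < -32768)
        return 1;                                        /* quotient does not fit in 16 bits */
    reg[r]     = (uint16_t)quotient;
    reg[r + 1] = (uint16_t)(dividend % src);
    return 0;
}

int main(void)
{
    reg[0] = 1234;
    eis_mul(0, 1000);                                    /* MUL #1000, R0 */
    printf("MUL: R0=%04X R1=%04X (1,234,000)\n", (unsigned)reg[0], (unsigned)reg[1]);
    int v = eis_div(0, 1000);                            /* DIV #1000, R0 */
    printf("DIV: V=%d quotient R0=%u remainder R1=%u\n", v, (unsigned)reg[0], (unsigned)reg[1]);
    return 0;
}
```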
Floating-Point Instructions
The PDP-11 architecture provides floating-point support through two main implementations: the software-emulated Floating Instruction Set (FIS) and the hardware-based Floating Point Processor (FPP). The FIS, available as an option on processors such as the LSI-11/2 and PDP-11/03 via the KEV11 module, implements basic floating-point arithmetic using four instructions: FADD (octal 07500R), FSUB (octal 07501R), FMUL (octal 07502R), and FDIV (octal 07503R), where R is a general register serving as a pointer to operands on the stack. These instructions operate on single-precision (32-bit) values stored as pairs of 16-bit words, with the high-order word containing the sign and exponent, and the low-order word the mantissa; execution involves trapping to software routines if FIS is not present (vector 244).[4]
In contrast, the FPP, such as the FP11-A option for the PDP-11/35, is a dedicated hardware coprocessor that executes a broader set of up to 46 floating-point instructions in the octal range 170000-177777, including ADDF/ADDD (172(AC)FSRC for double-operand addition) and MULF/MULD (171(AC)FSRC for multiplication), all sharing the leading opcode 17 (octal). The FPP employs six 64-bit accumulators (AC0-AC5) for operations, allowing overlapped execution with the CPU, and supports the standard addressing modes, with mode 0 designating an accumulator. Instructions are categorized as double-operand (source and destination fields) or single-operand (destination only), with results affecting dedicated floating-point condition codes (FN, FZ, FV, FC) in the FPS status register.[4]
Floating-point data formats in both FIS and FPP adhere to a normalized representation with a hidden bit for efficiency. Single-precision format is 32 bits: 1 sign bit, an 8-bit excess-128 biased exponent (range -127 to +127, stored values 000-377 octal), and 23 explicit mantissa bits implying a 24-bit fraction (0.5 ≤ fraction < 1.0), stored across two 16-bit words. Double precision extends to 64 bits across four 16-bit words, using 1 sign bit, the same 8-bit exponent, and 55 explicit mantissa bits implying a 56-bit fraction, providing higher precision without widening the exponent range. The hidden bit (implied leading 1) is assumed for non-zero exponents during arithmetic, with normalization maintained by left-shifting the fraction until its most significant bit would be 1 and adjusting the exponent accordingly; a zero exponent denotes a value treated as zero, which carries no hidden bit.[4][19]
Normalization rules keep the fraction's leading 1 implicit (hidden): during arithmetic, leading zeros are shifted out of the fraction (up to 55 of them for double precision) until the implied 1 aligns, with the exponent decremented accordingly. Exponent overflow (a biased value that would exceed 377 octal) or underflow (a value that would fall below 000 octal) raises the corresponding floating-point exception, and the FPS register's mode bits control rounding versus chopping and single- versus double-precision selection. These mechanisms ensure consistent arithmetic across FIS software emulation and FPP hardware acceleration.[4]
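The single-precision layout (sign, excess-128 exponent, hidden-bit fraction) can be decoded with a short C sketch (an illustration of the format only; it ignores the undefined-variable and rounding cases, and the helper name is this example's own):

```c
#include <stdint.h>
#include <stdio.h>

/* Decode a PDP-11 single-precision (F-format) value from its two 16-bit words.
   Combined 32-bit layout: sign (bit 31), excess-128 exponent (bits 30-23),
   23 explicit fraction bits; a hidden leading 1 gives a 24-bit fraction with
   0.5 <= f < 1.0. A zero exponent is treated as zero here. */
static double decode_f(uint16_t high, uint16_t low)
{
    uint32_t bits = ((uint32_t)high << 16) | low;
    int      sign = (bits >> 31) & 1;
    int      exp  = (bits >> 23) & 0xFF;                /* excess-128 */
    uint32_t frac = bits & 0x7FFFFF;

    if (exp == 0)
        return 0.0;
    double value = (double)(frac | 0x800000) / 16777216.0;  /* restore hidden bit; divide by 2^24 */
    for (int e = exp - 128; e > 0; e--) value *= 2.0;        /* apply the unbiased exponent */
    for (int e = exp - 128; e < 0; e++) value /= 2.0;
    return sign ? -value : value;
}

int main(void)
{
    /* +1.0 in F-format: fraction 0.5, exponent +1 (stored 129): words 040200 000000 octal. */
    printf("decoded: %g\n", decode_f(040200, 000000));
    return 0;
}
```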
Commercial and Memory Access Extensions
The Commercial Instruction Set (CIS) extends the PDP-11 architecture to support efficient decimal arithmetic and string manipulation, primarily for business applications such as COBOL programming. Introduced in 1977, CIS provides hardware acceleration for operations on packed binary-coded decimal (BCD) numbers, where each byte holds two digits (nibbles) plus a sign, enabling up to 32-digit precision without conversion to binary. These instructions use two-word descriptors for operands—specifying length, data type (packed or zoned), and address—loaded into register pairs like R0-R1 for the source, R2-R3 for the second operand, and R4-R5 for the destination, allowing flexible register or memory addressing.[20]
Key decimal operations in CIS include addition and subtraction of packed decimals. The ADDP instruction adds two packed decimal strings, performing nibble-by-nibble BCD addition via a dedicated ALU, handling carries and translating signs (preferred positive: 1100 binary; negative: 1101 binary) while adjusting for odd-length strings by treating the low-order digit as unsigned. Similarly, SUBP subtracts packed decimals using BCD subtraction, managing borrows and signs to produce a result with the preferred sign format. Both instructions set condition codes—N for negative result, Z for zero, V for overflow (e.g., exceeding 31 digits or invalid signs), and clear C—and are suspendable for interrupts, preserving state in the processor status word (PSW) bit 8. These enhance COBOL performance by avoiding software emulation of decimal math.[20]
Division in CIS is handled by DIVP, which divides packed decimal operands, yielding a quotient in the destination while setting condition codes for negative (N), zero (Z), overflow (V), and divide-by-zero (C). It operates via microcode implementing successive shifts and subtractions, potentially requiring post-adjustment for overflow, and supports the same descriptor format as ADDP and SUBP. For COBOL's multi-precision arithmetic, such as the multiplication and division used in financial calculations like interest compounding, CIS pairs DIVP with MULP, which multiplies packed decimals using the same BCD hardware, producing results up to the operand length limits and setting appropriate condition codes. These operations integrate seamlessly with COBOL's decimal data types, reducing execution time for business logic.[20]
Access to the PSW, which contains condition codes, priority level, and mode bits, is facilitated by the MFPS and MTPS instructions in extended PDP-11 configurations. MFPS moves the low byte of the PSW (non-privileged bits: condition codes N, Z, V, C, and current priority) to a destination register or memory, while MTPS moves from a source to the PSW's low byte, allowing user-mode programs to read or set interrupt priorities and condition codes without full privileged access. These instructions are part of the limited extended instruction set on models like the PDP-11/44 and J-11-based systems, enabling efficient status manipulation in application code. However, modifications to privileged PSW bits (e.g., mode or trace) require kernel or supervisor mode.[21]
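A brief sketch of status access with these instructions follows; the priority value 340 octal (priority 7) is only an example, and on protected systems the privileged fields remain unaffected when executed from user mode.
MFPS R0 ; Copy the PSW low byte (condition codes and priority) into R0
MTPS #340 ; Load 340 octal into the PSW low byte, requesting priority 7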
Memory access extensions in the PDP-11 include the dedicated I/O page for device registers, occupying the top 8 KB of the address space: virtual addresses 160000 to 177777 (octal), which correspond to physical addresses 17760000 to 17777777 (octal) under 22-bit addressing. This mapping allows memory-like instructions to control peripherals without special I/O opcodes. Access to this page, as well as to kernel memory spaces, requires privileged modes (kernel or supervisor) to prevent user-mode interference with system resources; attempts from user mode trap to the operating system. In high-performance models like the PDP-11/83, fast memory access is achieved via the Private Memory Interconnect (PMI), a dedicated bus supporting up to 4 MB of high-speed RAM with 22-bit physical addressing and on-chip caching in the J-11 CPU, reducing latency for instruction and data fetches compared to standard UNIBUS memory. This extension, standard on the 11/83, operates under the same privilege controls, with MMU enforcement for protected spaces.[22][23]
System Integration
Interrupt System
The PDP-11 interrupt system supports eight hardware priority levels, ranging from 0 (lowest) to 7 (highest), which determine the processor's responsiveness to external events. These levels are encoded in bits 5 through 7 of the 16-bit Processor Status Word (PSW). An interrupt request is only serviced if its priority exceeds the current PSW level; for instance, setting the PSW priority to 7 disables all external interrupts while allowing internal operations to continue.[6][7]
Hardware interrupts originate from peripheral devices connected to the Unibus, which assert requests on dedicated Bus Request lines: BR4 through BR7, mapping directly to priorities 4 through 7, with BR7 offering the highest precedence. The Non-Processor Request (NPR) line provides an even higher priority for DMA transfers, arbitrated between bus cycles rather than only at instruction boundaries. Upon granting an interrupt, the processor automatically pushes the current program counter (PC) and PSW onto the stack addressed by R6 (the kernel stack on models with memory management) and performs vectored interrupt handling by loading new PC and PSW values from a dedicated table in low memory. This table occupies addresses 0 through 777 octal (512 bytes total), accommodating up to 128 four-byte entries; fixed trap vectors and standard device vectors occupy the lower addresses (e.g., 060 octal for console input), while floating vectors for additional devices are conventionally assigned from 300 octal upward. Each vector entry consists of two 16-bit words: the first specifies the starting address of the interrupt service routine, and the second provides the replacement PSW, often with an elevated priority to mask lower-level interrupts during servicing.[6][7][24]
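For example, a vector entry for a device assigned to location 100 octal (the address and the handler name DEVISR are illustrative) consists of two words:
. = 100 ; Vector address assigned to the device (illustrative)
.WORD DEVISR ; First word: entry point of the service routine
.WORD 340 ; Second word: replacement PSW, priority 7 to mask lower levels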
Autovectoring enables efficient device-initiated interrupts, where the requesting device supplies a 9-bit vector address directly on the Unibus during the interrupt acknowledge cycle, allowing the processor to fetch the corresponding entry without polling or software decoding. For systems with multiple devices at the same priority level, daisy-chaining resolves conflicts: each Bus Grant (BG) line forms a serial chain through the devices, passing the grant signal sequentially until accepted by the interrupting device, with physical proximity to the processor determining intra-level priority—the closest device responds first.[6][24][7]
Software vectors complement hardware autovectoring by permitting programmers to dynamically update table entries for custom handlers, such as reassigning device service routines or integrating with operating system dispatchers; the PSW within the vector typically sets a higher priority to protect the handler from nested lower-priority interrupts. This design ensures low-latency response to critical events while maintaining system stability through prioritized masking.[6][7]
The PDP-11 employs a memory-mapped input/output (I/O) architecture, where peripheral devices are addressed as if they were part of the main memory space, allowing the CPU to use standard data movement instructions for I/O operations. This design integrates I/O seamlessly with memory access, eliminating the need for dedicated I/O instructions and enabling efficient interaction between the processor and devices. The architecture supports two primary data transfer methods: programmed I/O, handled directly by the CPU, and direct memory access (DMA), which allows devices to transfer data independently to reduce CPU overhead. Synchronization between the CPU and devices relies on handshaking protocols and status polling to ensure reliable operation.[25]
In the PDP-11's memory-mapped scheme, I/O devices occupy the I/O page, the high 4K words of the address space: 16-bit addresses 160000₈ to 177777₈, which map to the top of the 18-bit physical space (760000₈ to 777777₈). This range, with many standard device registers at 177xxx₈ addresses, reserves locations for device control and status registers, such as 177560₈ for console terminal receiver status. The CPU accesses these addresses using ordinary instructions, treating device registers identically to memory locations; for instance, a MOV instruction can read from or write to a device's input/output buffer to initiate or complete a transfer. This approach simplifies programming by leveraging the existing instruction set for both memory and I/O operations, though it requires careful address management to avoid conflicts with physical memory.[7][25]
Programmed I/O in the PDP-11 involves the CPU actively managing data transfers by executing loops that poll device status and move data via instructions like MOV to or from device registers. The process typically begins with the CPU writing commands to a device's control register, followed by repeated checks of status bits (e.g., ready or done flags) in the I/O page to determine when data is available or an operation completes. For example, to input data from a device, the CPU might load a status register address into a register, test for a ready bit using TST or BIT instructions, and then execute a MOV to transfer the data once ready. This method suits low-speed devices but can burden the CPU for higher-throughput tasks, as the processor remains involved in every byte transferred. I/O devices may also trigger interrupts upon completion, signaling the CPU to handle the next step without constant polling.[7][25]
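The classic polled console output loop illustrates this pattern. The addresses 177564₈ and 177566₈ are the standard DL11 console transmitter status and buffer registers; the routine name PUTC is illustrative.
TPS = 177564 ; Transmitter status register (bit 7 = ready)
TPB = 177566 ; Transmitter buffer register
PUTC: TSTB @#TPS ; Test the status byte; the ready bit is its sign bit
BPL PUTC ; Loop while the ready bit is still clear
MOVB R0, @#TPB ; Write the character in R0, starting transmission
RTS PC ; Return to caller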
For high-performance needs, the PDP-11 architecture incorporates DMA channels, particularly in models like the 11/40, where devices can seize control of memory cycles to transfer blocks of data directly to or from memory without CPU intervention. DMA operates through non-processor requests (NPRs), granting devices priority access to memory for operations such as disk reads, with transfers occurring at full memory speed and minimal latency (e.g., up to 3.5 µs per cycle in the 11/40). The CPU programs the DMA controller by writing setup parameters to its registers in the I/O page, after which the device handles the transfer autonomously, notifying the CPU via interrupt when finished. This offloads the CPU, enabling concurrent processing during I/O.[7][25]
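The setup usually follows the pattern sketched below. The register names and addresses (DMAWC, DMABA, DMACS) and the buffer label BUF describe a hypothetical controller rather than a specific DEC device, since actual register layouts vary by peripheral; the two's-complement word count is a common DEC convention.
DMAWC = 174000 ; Word-count register (hypothetical address)
DMABA = 174002 ; Bus-address register (hypothetical)
DMACS = 174004 ; Control/status register (hypothetical; bit 0 = GO, bit 6 = interrupt enable)
MOV #-256., @#DMAWC ; Two's-complement word count: transfer 256 words
MOV #BUF, @#DMABA ; Starting memory address of the buffer
MOV #101, @#DMACS ; Set GO plus interrupt enable; the device proceeds via NPR cycles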
Handshaking and status polling form the core of synchronization in PDP-11 I/O, ensuring data integrity across varying device speeds. In programmed I/O, the CPU polls status registers, such as those containing bits for busy, done, or error conditions, using instructions like BIT to test flags before proceeding with data movement, preventing overruns or lost data. Handshaking involves mutual signaling between the CPU (or DMA controller) and device, where the initiator asserts a request and waits for an acknowledgment via status updates in the mapped registers, allowing asynchronous devices to pace the transfer. For DMA, handshaking extends to memory access protocols that confirm each cycle's completion before the next, maintaining order without CPU oversight. These mechanisms collectively enable robust I/O flows with interrupt-driven completion, tailored to the PDP-11's unbuffered, cycle-stealing design.[7][25]
Bus Interface
The PDP-11 architecture employed two primary bus standards for interconnecting the central processor, memory, and peripherals: the Unibus and the Q-Bus. The Unibus served as the foundational interconnect in early PDP-11 systems, providing a versatile, asynchronous interface that allowed modular expansion. It featured separate, non-multiplexed data and address paths: 16 bidirectional data lines (BUS D00 L to BUS D15 L) and 18 dedicated address lines (BUS A00 L to BUS A17 L). This configuration supported physical addressing up to 256 KB, with 248 KB allocated for memory and 8 KB reserved for I/O page access, enabling direct referencing of up to 128K words on even-byte boundaries. The Unibus operated asynchronously without fixed clock timing, achieving a maximum bandwidth of approximately 3 MB/s, limited by typical cycle times of 400-1000 ns depending on the operation and system configuration.[7][26]
In contrast, the Q-Bus (also known as the LSI-11 bus), introduced with the LSI-11 series, represented an evolution optimized for cost reduction and higher performance in compact, later-generation systems such as the PDP-11/23, /53, /73, and /83 and the MicroPDP-11 packages. Like the Unibus it is asynchronous, coordinating transfers through handshaking signals (e.g., BSYNC L and BRPLY L), but it reduces pin count by multiplexing address and data over shared BDAL lines, with a structure comprising 42 bidirectional lines and 2 unidirectional lines in its Q22 variant. It supported 22-bit physical addressing for up to 4 MB of memory space while retaining the 16-bit, 64 KB virtual address space, surpassing the Unibus in capacity while maintaining compatibility with PDP-11 instruction sets. The Q-Bus delivered bandwidths ranging from 1 to 4 MB/s, benefiting from block-mode transfers and overlapped operations, which made it suitable for faster CPU-memory and I/O interactions in LSI-based implementations; it also emphasized distributed control to minimize conflicts in shared environments.[9]
Bus arbitration in both standards ensured orderly access among multiple masters, such as the CPU and direct memory access (DMA) controllers, through a priority-based mechanism. On the Unibus, arbitration was partially distributed with daisy-chaining at each level; devices requested mastership via the non-processor request line (NPR L), which carried the highest priority and served DMA transfers, or via the bus request lines (BR7 L to BR4 L) for interrupt-level requests, with grants issued (NPG H, BG7 H to BG4 H) by a central arbitrator, typically the CPU or Unibus adapter. The selected master acknowledged via selection acknowledge (SACK L) to halt further grants. The Q-Bus employed a similar master-slave hierarchy with four interrupt priority levels and signals like BDMGI L (bus DMA grant in) and BDMGO L (bus DMA grant out) for chaining; it allowed overlapped arbitration during data transfers, enhancing efficiency, and likewise gave DMA the top priority to support block-mode operations. This structure prevented bus contention and supported up to seven masters on the Unibus or multiple masters in Q-Bus configurations.[7][26][9]
Data transfers on both buses followed standardized cycle types to handle read, write, and modify operations between masters and slaves. Common cycles included DATI (data in, for reading a 16-bit word or byte from slave to master) and DATO (data out, for writing a 16-bit word from master to slave), with typical timings of 450-680 ns for DATI and 580-680 ns for DATO on the Unibus, varying by model. Byte and paired variants like DATOB (data out, byte, preserving the unselected byte during writes) and DATIP (data in, pause, the read half of a read-modify-write sequence that holds the bus until the following write completes) extended functionality for partial-word and atomic access. The Q-Bus mirrored these with DATI, DATO(B), and added DATIO(B) for read-modify-write, plus block modes like DATBI (block read) and DATBO (block write) for DMA efficiency, often completing in under 1 µs. These protocols used interlocked control signals (BUS MSYN L and BUS SSYN L on the Unibus; BSYNC L and BRPLY L on the Q-Bus) to sequence transfers, ensuring data integrity across the 50-foot maximum bus length on the Unibus or shorter Q-Bus segments.[7][26][9]
Programming Aspects
MACRO-11 Assembly Syntax
MACRO-11 is the assembly language developed by Digital Equipment Corporation (DEC) for programming the PDP-11 minicomputer family, providing a mnemonic-based syntax for specifying instructions, data, and assembler directives.[27] The language emphasizes readability through symbolic representations while adhering to the PDP-11's 16-bit architecture constraints, with source statements typically structured in a fixed format starting with an optional label field followed by an opcode or pseudo-op and operands.[27]
Instruction mnemonics in MACRO-11 follow a general form of OPCODE SOURCE, DESTINATION for double-operand instructions, where the opcode is a short mnemonic abbreviation for the operation, and operands specify registers or memory locations using PDP-11 addressing modes such as register direct (Rn), immediate (#value), or autoincrement ((Rn)+).[27] For example, ADD R0, (R1)+ adds the contents of register R0 (the source) to the word addressed by register R1 (the destination), then increments R1 by 2 to point to the next word.[27] Mnemonics are predefined permanent symbols, with full lists including operations like MOV (move) and SUB (subtract) detailed in the assembler's reference appendices.[27] Addressing modes are denoted concisely in the operand fields, as covered in the PDP-11's addressing mechanisms.[27]
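A few representative operand forms, using arbitrary values and the illustrative label COUNT, are:
MOV #100, R0 ; Immediate: load the constant 100 (octal) into R0
MOV R0, COUNT ; Relative: store R0 at the location labeled COUNT
MOV (R2)+, R1 ; Autoincrement: fetch the word addressed by R2, then add 2 to R2
MOV @#177566, R3 ; Absolute: read the word at octal address 177566
CLR -(SP) ; Autodecrement: push a zero word onto the stack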
Labels in MACRO-11 serve as symbolic references to memory locations and are defined by placing the label name followed by a single colon (:) for local scope or a double colon (::) for global scope at the beginning of a statement.[27] Label names are built from letters, digits, dollar signs, and periods, must not begin with a digit, are significant in their first six characters, and must be unique within their scope to avoid assembler errors.[27] Local labels, used for temporary or block-specific references, take the form n$ where n is a number from 1 to 65535, automatically scoped within sections delimited by global labels, program-section directives like .PSECT, or macro expansions; within a macro definition, a dummy argument preceded by ? directs the assembler to generate a unique local label for each expansion.[27]
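The fragment below (symbol names are illustrative) contrasts the three label forms:
ENTRY:: MOV #10, R0 ; Global label, visible to other modules at link time
LOOP1: DEC R0 ; Ordinary label, local to this module
BNE LOOP1
10$: HALT ; Numbered local label, valid only within its local symbol block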
Pseudo-ops, or assembler directives beginning with a period, control data allocation and assembly behavior without generating machine code.[27] Common data pseudo-ops include .BYTE for reserving successive 8-bit bytes, .WORD for 16-bit words, and .EVEN to force the location counter to an even address by advancing it one byte if it is odd.[27] For instance, .BYTE 060, 65 allocates two bytes with octal values 60 and 65, while .WORD 177535, 100 reserves two words.[27] These directives support expressions for dynamic values and are essential for initializing data sections.[27]
Macros in MACRO-11 enable code reuse through user-defined blocks, declared with .MACRO followed by the macro name and formal (dummy) parameters, and terminated by .ENDM.[27] Parameters can be positional or keyword-based, with default values for optional ones, and macros support nesting up to 25 levels while generating unique local labels during expansion.[27] Invocation occurs by stating the macro name with actual arguments; for example, a simple increment macro INCX with parameter A, whose body is ADD #1, A, is invoked as INCX R0 and expands to ADD #1, R0, as illustrated below.[27]
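A minimal definition and invocation of that macro reads:
.MACRO INCX A ; Define macro INCX with formal parameter A
ADD #1, A ; Body: add one to whatever operand is supplied for A
.ENDM
INCX R0 ; Invocation; expands to ADD #1, R0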
Conditional assembly directives allow selective inclusion of source code based on expression evaluations, using constructs like .IF with a condition and argument to open a block, the subconditionals .IFF and .IFT (assembled when the enclosing condition is false or true, respectively), and .IIF for single-line immediate conditionals; blocks may be nested and are terminated by .ENDC, with .MEXIT available for early exit when the block appears inside a macro.[27] Conditions test relations such as equality to zero (EQ), greater than zero (GT), or symbol definition (DF), enabling features like debugging toggles or platform-specific code.[27] For example, .IF EQ expression ... .ENDC assembles the enclosed block only if the expression evaluates to zero.[27]
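A short debugging toggle illustrates the idea; the symbol DEBUG and the routine TRACE are illustrative.
DEBUG = 1 ; Set to 0 to assemble without tracing
.IF NE DEBUG ; Assemble the enclosed block only if DEBUG is nonzero
JSR PC, TRACE ; Call a trace routine (TRACE is hypothetical)
.ENDC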
Numeric constants in MACRO-11 default to octal radix (base 8), reflecting the PDP-11's historical use of octal notation for addresses and data.[27] The radix can be changed globally with the .RADIX pseudo-op specifying base 2, 8, 10, or 16, or temporarily with radix-control operators such as ^D for decimal (e.g., ^D123) or ^X for hexadecimal.[27] This flexibility aids in mixing notations while maintaining octal as the standard for core PDP-11 programming.[27]
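For instance, the following directives and operators mix radices (the labels TEN and MASK are illustrative):
.RADIX 10 ; Interpret subsequent numeric constants as decimal
TEN: .WORD 10 ; Stores decimal 10
.RADIX 8 ; Restore the octal default
MASK: .WORD ^D255 ; Temporary decimal operator: stores 255 decimal (377 octal)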
Sample Programs
To illustrate key features of the PDP-11 architecture, such as addressing modes, control flow, subroutine linkage, interrupt processing, and extended instructions, the following annotated examples use MACRO-11 syntax. Registers are denoted R0–R7, with R6 as the stack pointer (SP) and R7 as the program counter (PC), and addressing modes include autoincrement, written (Rn)+.
Simple Loop Using Autoincrement and Branch
A basic loop for clearing a block of memory demonstrates the autoincrement addressing mode, which automatically advances the register pointer after accessing the operand, combined with the Subtract One and Branch (SOB) instruction for efficient iteration control. The SOB decrements a register and branches if the result is nonzero, enabling compact loop structures without explicit comparison. The following example initializes R0 to point to a 10-word buffer and R1 to the count, then clears each word while incrementing the pointer.
BUFFER: .BLKW 10. ; Reserve ten (decimal) words of buffer
COUNT: .WORD 10. ; Loop count (decimal ten)
START: MOV #BUFFER, R0 ; R0 points to buffer start
MOV COUNT, R1 ; R1 = loop counter (10)
LOOP: CLR (R0)+ ; Clear word at R0, autoincrement R0 by 2
SOB R1, LOOP ; Decrement R1; branch to LOOP if R1 ≠ 0
This loop executes in a tight cycle, leveraging the PDP-11's register-oriented design for minimal memory access. The autoincrement mode (mode 2) adds 2 to R0 post-operation, aligning with the 16-bit word addressing.
Subroutine Call with Stack Frame
Subroutines on the PDP-11 use the Jump to Subroutine (JSR) instruction to push the return address onto the stack (via SP = R6) and transfer control, with Return from Subroutine (RTS) to restore it. For stack frames, R5 serves as the frame pointer (FP), allowing local variables and parameters to be accessed relative to a stable base. The MARK instruction can automate frame cleanup by adjusting the stack after return. The example below shows a call to a subroutine that adds two values passed via registers, establishing a basic frame.
MOV #5, R0 ; Parameter 1 in R0
MOV #3, R1 ; Parameter 2 in R1
JSR PC, ADD_SUB ; Call subroutine (pushes return PC onto the stack)
; Execution resumes here after RTS; the result is in R1
ADD_SUB: MOV R5, -(SP) ; Save caller's frame pointer on the stack
MOV SP, R5 ; Establish new frame: FP = current SP
SUB #2, SP ; Allocate one word for locals (if needed)
ADD R0, R1 ; Add parameters: result in R1
MOV R5, SP ; Deallocate locals by restoring SP to FP
MOV (SP)+, R5 ; Pop caller's frame pointer
RTS PC ; Return: pop the saved PC from the stack
This convention supports nested calls and parameter passing, with JSR PC pushing the return address and loading the subroutine address into PC. With the frame established at R5, locals sit at negative offsets such as -2(R5), while the saved return address lies at 2(R5) and any stack-passed arguments above it. For simpler cases without explicit frames, JSR R5, SUBR pushes the old R5 value, which RTS R5 restores.
Interrupt Handler Skeleton
The PDP-11 interrupt system automatically stacks the processor status word (PS) and PC onto the system stack (R6) upon vectoring to the handler address; the handler later executes RTI (Return from Interrupt) to restore them. Handlers must preserve any registers they use to maintain the interrupted program's state. The skeleton below defines a vector at location 300 (octal) holding the handler's entry address and a new PSW; the handler saves and restores R0–R4 and performs minimal processing (e.g., acknowledging a device).
. = 300 ; Interrupt vector location (example)
.WORD INTHND ; New PC: address of the service routine
.WORD 340 ; New PSW: priority 7, masking lower-level interrupts
INTHND: MOV R0, -(SP) ; Save R0–R4 on the stack
MOV R1, -(SP)
MOV R2, -(SP)
MOV R3, -(SP)
MOV R4, -(SP)
; Interrupt service code, e.g.:
; MOV @#177562, R0 ; Read the console receiver buffer, clearing its done flag (example device)
; Process data
MOV (SP)+, R4 ; Restore R4–R0
MOV (SP)+, R3
MOV (SP)+, R2
MOV (SP)+, R1
MOV (SP)+, R0
RTI ; Restore PC and PS from the stack
Vectors occupy fixed low-memory locations (0 through 777 octal), loaded by the operating system or loader, with floating vectors for user and optional devices conventionally assigned from 300 octal upward. This auto-stacking minimizes handler overhead, supporting vectored interrupts without software polling.
EIS Example: 32-Bit Multiply
The Extended Instruction Set (EIS) provides hardware support for 32-bit operations, including multiplication via the MUL instruction, which produces a double-word result from two 16-bit operands. When the destination register is even (e.g., R0), MUL treats it as the start of an even-odd pair, multiplying the source by the even register's contents and storing the 32-bit signed product across the pair (high-order word in the even register, low-order word in the next odd register). The example multiplies values in R2 (source) and R0 (multiplicand), yielding the product in R0–R1.
MOV #1234., R0 ; Multiplicand (decimal 1234), a 16-bit signed value
MOV #5678., R2 ; Multiplier (source), decimal 5678
MUL R2, R0 ; 32-bit multiply: R0–R1 = R0 * R2 (signed)
; Result: high 16 bits in R0, low 16 bits in R1
This avoids software emulation of fixed-point multiplication, with the hardware performing the signed arithmetic directly for scientific and commercial computations. When only a 16-bit result is required, an odd destination register may be specified, in which case just the low-order word of the product is stored.
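A companion sketch shows the matching EIS divide, assuming the 32-bit product from the example above is still in R0–R1; the divisor value and the error label DIVERR are illustrative.
MOV #100., R2 ; Divisor (decimal 100)
DIV R2, R0 ; Divide the 32-bit value in R0–R1 by R2: quotient to R0, remainder to R1
BVS DIVERR ; V is set on overflow or divide by zero (DIVERR is a hypothetical handler)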
The PDP-11 architecture exhibited significant performance variations across its model implementations, primarily driven by differences in clock frequencies and processing capabilities. Early models like the PDP-11/10 operated with an average instruction execution time of approximately 4.095 µs, reflecting the slower timing of its discrete logic design and core memory cycles around 900 ns to 1.2 µs.[28] In contrast, later models such as the PDP-11/83, based on the J-11 microprocessor chipset, achieved clock speeds of 15 MHz, enabling much faster execution with effective cycle times reduced to around 220 ns for non-stretched operations.[29] These advancements allowed the 11/83 to handle complex workloads more efficiently, though overall performance still depended on memory subsystem interactions and optional enhancements.
Memory access introduced notable overhead in the PDP-11 design, typically adding 1-2 cycles to instruction execution due to the Unibus arbitration and address translation processes. In systems without memory management units, this overhead was minimized, but enabling features like the KT11 added approximately 0.15 µs per cycle for relocation and protection checks.[30] For the PDP-11/70, a cache miss could extend read cycle times from 0.30 µs (hit) to 1.32 µs, emphasizing the impact on throughput for memory-intensive tasks.[6]
The base PDP-11 architecture lacked an onboard cache, relying instead on main memory speeds that limited overall performance to 0.33-0.5 mega-instructions per second under typical loads.[28] Later models introduced microcode optimizations and optional caches—such as the 2,048-byte bipolar cache in the PDP-11/70, achieving 80-95% hit rates—to mitigate bottlenecks, effectively doubling performance in cache-sensitive scenarios.[6] These enhancements, including overlapped instruction fetch and execution, addressed architectural limitations without altering the core instruction set.
Benchmark evaluations, such as the Whetstone test for floating-point performance, highlight these disparities; the PDP-11/10 achieved only 0.0129 MWIPS, while the PDP-11/70 with FP hardware reached 0.532 MWIPS, underscoring the benefits of faster clocks and specialized units for scientific computing.[31] Such metrics established the PDP-11's scalability but also revealed bottlenecks in uncached, low-clock configurations for demanding applications.