Interleaved memory
Interleaved memory is a computer architecture technique that divides main memory into multiple independent banks or modules, enabling concurrent access to different banks to increase memory bandwidth and mitigate access delays, particularly for sequential or pipelined data operations.[1] This organization allows the processor to initiate a new memory request to one bank while a previous request to another is still being serviced, overlapping access times and increasing the overall throughput of the memory system.[2] In an interleaved memory system, the physical address is partitioned into bits that select the specific word within a bank and bits that choose the bank itself, with a shared bus typically connecting all banks.[1] With K banks (often a power of 2, such as 4 or 8), the system can achieve a peak data transfer rate up to K times that of a single bank, since each bank operates independently with its own access cycle.[1] This is especially beneficial for burst transfers, such as cache-line fills, where consecutive addresses are distributed across banks to hide the inherent latency of dynamic random-access memory (DRAM).[1] Performance depends, however, on avoiding bank conflicts, in which multiple requests target the same bank simultaneously; conflicts can be managed through address mapping and scheduling.[2]

There are two primary types of interleaving: low-order interleaving, in which the least significant address bits determine the bank (placing consecutive words in different banks for fine-grained parallelism), and high-order interleaving, in which higher-order bits select the bank (grouping larger blocks of data per bank, suited to coarser access patterns).[1] Low-order interleaving is more common in modern systems because of its efficiency with sequential accesses, while high-order interleaving suits scenarios with localized data usage.[1] Both approaches combine multiple single-ported banks to approximate the behavior of a multiported memory without its added hardware complexity and cost.[2]

The concept of interleaved memory emerged in the 1960s from the need to bridge the growing speed gap between processors and memory in early supercomputers.[3] Pioneering implementations appeared in systems such as IBM's 7030 Stretch and Control Data Corporation's CDC 6600, designed by Seymour Cray, the latter employing 32-way interleaving of magnetic core memory to support high-bandwidth pipelined operations.[4] By the 1970s, interleaving was integral to machines such as the CDC 7600 and IBM System/360 models (e.g., the 85 and 91), whose designers used mathematical models of instruction and data address patterns, rather than assuming random accesses, to optimize bank organization.[3] Today, interleaving persists in multi-channel DRAM configurations, GPU memory hierarchies, and cache systems, continuing to address bandwidth demands in high-performance computing.[1]
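The bandwidth gain from overlapping bank accesses can be illustrated with a back-of-the-envelope timing model. The C sketch below is a minimal illustration under assumed parameters (a 4-cycle bank busy time, one new request issued per cycle, and at least as many banks as busy cycles); it is not a model of any particular memory device.

```c
/* Back-of-the-envelope timing model for a sequential burst (assumed
 * parameters, not any specific device): each bank is busy LATENCY
 * cycles per access, one new request can be issued per cycle, and
 * word i goes to bank i % BANKS.  With BANKS >= LATENCY, a bank is
 * always free again by the time the burst wraps back to it, so word i
 * can be issued at cycle i and completes at cycle i + LATENCY. */
#include <stdio.h>

#define LATENCY 4u  /* cycles a bank is busy per access (assumed) */
#define BANKS   4u  /* interleaved banks; BANKS >= LATENCY        */

int main(void)
{
    unsigned n = 8;                      /* burst length, e.g. one cache-line fill */
    unsigned serialized  = n * LATENCY;  /* one bank services all words in turn    */
    unsigned interleaved = (n - 1) + LATENCY; /* last word's completion cycle      */

    for (unsigned i = 0; i < n; i++)
        printf("word %u -> bank %u, issued cycle %2u, done cycle %2u\n",
               i, i % BANKS, i, i + LATENCY);
    printf("single bank : %2u cycles\n", serialized);   /* 32 */
    printf("interleaved : %2u cycles\n", interleaved);  /* 11 */
    return 0;
}
```

Under this model an 8-word burst completes in 11 cycles instead of the 32 a single bank would need, and the speedup approaches the ideal factor of K as bursts grow longer.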
Fundamentals

Definition and Principles
Interleaved memory is a technique in computer architecture that divides physical memory into multiple independent banks or modules, permitting simultaneous access to different addresses that map to separate banks. This organization enhances memory bandwidth by allowing parallel operations across banks, which is particularly useful for sequential or burst-mode data accesses.[5][6]

The core principle of interleaved memory is to distribute sequential addresses evenly across the banks so that fetches can be pipelined or performed concurrently, concealing the access latency of individual modules. Typically, the low-order bits of the address determine the bank selection, ensuring that consecutive addresses reside in different banks and can be accessed without conflict. This setup supports efficient parallelization in systems where memory requests arrive in a predictable pattern, such as vector processors or cache fill operations.[7][8][5]

For example, in a 4-bank interleaved system, address 0 maps to bank 0, address 1 to bank 1, address 2 to bank 2, address 3 to bank 3, address 4 back to bank 0, and so forth. This cyclic assignment illustrates the even distribution that optimizes burst accesses, as multiple consecutive words can be retrieved simultaneously from distinct banks.[6][7]

The mapping for bank selection in this low-order scheme is

$$\text{bank number} = A \bmod n,$$

where $A$ is the memory address and $n$ is the number of banks. To derive this formula, assume a total memory of $2^r$ words addressed by $r$-bit values, with $n = 2^k$ banks (a common power-of-two configuration for binary alignment), each containing $2^{r-k}$ words. The address $A$ can then be decomposed into higher-order bits representing the offset within a bank and the lower $k$ bits selecting the bank itself: $A = (\text{offset} \times n) + \text{bank}$. Extracting the bank thus yields $\text{bank} = A \bmod n$, which cycles addresses through the banks sequentially and ensures that no two consecutive addresses fall in the same bank. This derivation underpins the even load balancing essential to interleaving's effectiveness.[6][7][9]
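The decomposition above can be checked in a few lines of code. The following C sketch is illustrative only, with an assumed bank count of $n = 2^k = 4$; it extracts the bank number and in-bank offset from an address using the low $k$ bits and the remaining high bits, reproducing the 4-bank example above.

```c
/* Low-order interleaving: with n = 2^k banks, the low k address bits
 * select the bank and the remaining high bits give the word's offset
 * within that bank.  k = 2 (four banks) is an illustrative assumption. */
#include <stdio.h>

#define K     2u           /* bank-select bits (assumed) */
#define BANKS (1u << K)    /* n = 2^k = 4 banks          */

int main(void)
{
    for (unsigned a = 0; a < 8; a++) {
        unsigned bank   = a & (BANKS - 1); /* a mod n: low k bits */
        unsigned offset = a >> K;          /* a div n: high bits  */
        printf("address %u -> bank %u, offset %u\n", a, bank, offset);
    }
    return 0;
}
```

Because $n$ is a power of two, the modulo and division reduce to a bit mask and a shift, which is one reason power-of-two bank counts are favored in hardware.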
Types of Interleaving

Interleaved memory systems employ various strategies to map addresses across multiple banks or modules, with the primary distinction lying in which address bits are used for bank selection. Low-order interleaving assigns consecutive memory addresses to adjacent banks by using the least significant bits of the address for bank selection, for example taking the address modulo the number of banks to determine placement.[7] This approach is particularly effective for sequential access patterns, as it allows pipelined reads or writes to overlap across banks without conflicts, exploiting the temporal proximity of accesses in programs with stride-1 patterns such as cache block replacements.[7]

In contrast, high-order interleaving uses the most significant address bits to select banks, assigning larger contiguous blocks of addresses to each bank.[10] This method suits random or scattered access patterns, as it distributes non-consecutive addresses more evenly across banks, reducing the likelihood of simultaneous conflicts in workloads with low spatial locality.[3] By grouping addresses at a coarser granularity, high-order interleaving improves capacity efficiency in systems where access streams are unpredictable, such as multiprocessor environments with independent memory banks.[10]

Hybrid approaches combine elements of low- and high-order interleaving to balance performance in mixed workloads that exhibit both sequential and random access characteristics. For instance, group-based or vertical interleaving subdivides banks into sub-banks using intermediate address bits, allowing finer control over distribution while preserving some block-level locality.[11] These methods interleave selectively at multiple levels, such as grouping two or more lines per bank before applying low-order selection, to mitigate imbalances in access patterns and to optimize power and latency in diverse scenarios such as instruction caches.[11]

The table below summarizes the trade-offs between the two basic schemes; a code sketch contrasting the mappings follows it.

| Aspect | Low-Order Interleaving | High-Order Interleaving |
|---|---|---|
| Address Mapping | Least significant bits select bank; consecutive addresses in different banks.[3] | Most significant bits select bank; larger blocks in same bank.[3] |
| Pros | Excellent for sequential/pipelined accesses; reduces latency via overlap; power-efficient in fine-grained distribution.[7][11] | Better for random accesses; minimizes conflicts in scattered patterns; higher capacity utilization.[3][10] |
| Cons | Prone to conflicts in random accesses; potential bank thrashing.[3] | Inefficient for sequential patterns; concentrates accesses, increasing power density.[11] |
| Suitability | Sequential workloads, cache systems with stride-1 accesses.[7] | Random or multiprocessor workloads with low locality.[10] |
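The mappings compared in the table can also be contrasted directly in code. The sketch below uses an assumed toy configuration chosen for readability rather than taken from any real machine: a 16-word address space, 4 banks, and a group size of 2 words for a simple group-based hybrid. It prints the bank each scheme selects for every address.

```c
/* Bank selection under three schemes, on a toy configuration (assumed:
 * 16-word address space, 4 banks, hybrid group size of 2 words). */
#include <stdio.h>

#define K         2u            /* bank-select bits (assumed) */
#define BANKS     (1u << K)     /* 4 banks                    */
#define ADDR_BITS 4u            /* 16-word toy address space  */

/* Low-order: least significant bits pick the bank, so consecutive
 * addresses land in different banks. */
static unsigned low_order_bank(unsigned a)  { return a & (BANKS - 1); }

/* High-order: most significant bits pick the bank, so each bank holds
 * one contiguous block of addresses. */
static unsigned high_order_bank(unsigned a) { return a >> (ADDR_BITS - K); }

/* Group-based hybrid: intermediate bits pick the bank, so the bank
 * rotates only after every group of 2 consecutive words. */
static unsigned hybrid_bank(unsigned a)     { return (a >> 1) & (BANKS - 1); }

int main(void)
{
    puts("addr  low-order  high-order  hybrid");
    for (unsigned a = 0; a < (1u << ADDR_BITS); a++)
        printf("%4u  %9u  %10u  %6u\n",
               a, low_order_bank(a), high_order_bank(a), hybrid_bank(a));
    return 0;
}
```

In the output, low-order selection cycles through the banks word by word, high-order selection gives each bank one contiguous 4-word block, and the hybrid rotates banks every two words, matching the grouping behavior described above.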