Scratchpad memory

Scratchpad memory (SPM), also known as scratchpad RAM or local store, is a high-speed on-chip memory (typically SRAM) that is explicitly managed by software, allowing programmers or compilers to directly allocate and access data without hardware intervention. Unlike caches, which rely on automatic hardware mechanisms for data placement and eviction, SPM occupies a distinct address space with fixed access latencies, enabling predictable performance in time-critical applications. Gaining prominence as a viable alternative to caches in embedded systems during the early 2000s, SPM addresses the limitations of cache overheads by eliminating complex tag-comparison and miss-detection circuitry. It provides significant efficiency gains, including an average energy reduction of 40% and area savings of 34% compared to equivalent cache configurations, making it ideal for power-constrained devices such as mobile phones, embedded processors, and communication systems. In modern architectures, SPM remains prominent in multicore processors and specialized accelerators, particularly for deep neural networks, where explicit data management facilitates reuse buffers and minimizes off-chip memory accesses, achieving efficiency improvements of up to three orders of magnitude over traditional CPU-based processing. Its software-controlled nature supports compile-time optimizations and dynamic allocation techniques, enhancing applicability in embedded and domain-specific computing environments, including recent advancements in neural processing units and GPU architectures as of 2025.

Fundamentals

Definition and Characteristics

Scratchpad memory is a high-speed, software-managed on-chip memory (typically SRAM) that serves as temporary storage for data and instructions directly accessible by the processor core. Unlike caches, it lacks automatic hardware mechanisms for data placement and eviction, requiring explicit programmer or compiler control to load and unload content. This design positions scratchpad memory within the memory hierarchy between processor registers and main memory, facilitating low-latency access to critical program elements in real-time and resource-constrained systems.

Key characteristics of scratchpad memory include its fixed capacity, typically ranging from 1 to 512 KB, which supports direct addressing without the tag arrays or associativity found in caches. Access times are highly predictable, as there are no miss penalties or coherence overheads; every valid address within the scratchpad yields a deterministic hit latency, often comparable to or better than L1 cache access due to simplified circuitry. This predictability stems from the absence of hardware-managed replacement policies, making it particularly suitable for applications where timing guarantees are essential.

In distinction from general-purpose memory structures, scratchpad memory focuses on minimizing access latency and energy for frequently accessed data in power- and area-limited environments, such as embedded processors, by integrating seamlessly as a software-controlled level of the memory hierarchy. Its basic operational principle involves explicit data movement via software instructions or direct memory access (DMA), as illustrated in the sketch below, ensuring that only selected program segments reside on-chip at any time and enabling fully deterministic execution without the variability introduced by cache misses.
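
A minimal C sketch of the distinct-address-space principle follows. The base address, size, and memory map are hypothetical placeholders, not taken from any particular device; a real platform defines these in its datasheet or linker script.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory map: the SPM occupies a fixed, disjoint address
 * range alongside main memory (values are illustrative only). */
#define SPM_BASE 0x10000000u
#define SPM_SIZE 0x8000u                       /* assumed 32 KB SPM      */

static volatile int32_t *const spm = (volatile int32_t *)SPM_BASE;

/* Every access below is a guaranteed on-chip hit with fixed latency;
 * software decided what lives here, so there is nothing to miss on.
 * Caller must keep n <= SPM_SIZE / sizeof(int32_t). */
int32_t accumulate_spm(size_t n) {
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += spm[i];
    return sum;
}
```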

Historical Development

The concept of scratchpad memory originated in the late 1950s and early 1960s as a form of fast, modifiable working storage to support control functions in early computer systems. Honeywell pioneered its use with the H-800 system, announced in 1958 and first installed in 1960, which incorporated a 256-word core-based scratchpad for multiprogram control, enabling efficient task switching without relying solely on slower main memory. By 1965, Honeywell's Series 200 minicomputers integrated scratchpad memories of varying sizes (up to 64 locations) as working storage, offering access speeds 2 to 6 times faster than main memory to enhance throughput in business applications. A significant milestone came in 1966 with the Honeywell Model 4200 minicomputer, which utilized the TMC3162, a 16-bit TTL scratchpad memory developed by Transitron and second-sourced by multiple manufacturers including Fairchild and Sylvania; this marked one of the first commercial implementations of semiconductor scratchpad memory for high-speed storage needs.

The 1980s saw widespread proliferation of scratchpad memory in digital signal processors (DSPs) for real-time applications, driven by the need for deterministic performance in embedded systems. Texas Instruments' TMS320 series, launched in 1983, incorporated on-chip scratchpad RAM as auxiliary storage for temporary data, complementing program and data memories to enable high-speed filtering and processing without external memory delays. This design choice in the TMS32010 and subsequent models facilitated efficient algorithmic implementations in telecommunications and audio processing, establishing scratchpad as a staple in DSP architectures.

During the 1990s and 2000s, scratchpad memory expanded into embedded and multicore systems, particularly with the rise of power-constrained devices. A key example is the Cell Broadband Engine, designed starting in 2001 through the STI alliance (Sony, Toshiba, IBM), which featured 256 KB of local store per Synergistic Processing Unit (SPU) as explicitly managed scratchpad memory to support parallel workloads in gaming and scientific computing. This architecture, first shipped in Sony's PlayStation 3 in 2006, demonstrated scratchpad's efficacy in reducing memory latency for data-intensive operations across multiple cores.

Post-2010 developments have integrated scratchpad into graphics processing units (GPUs) and explored hybrid designs for improved energy efficiency. NVIDIA's GPU architectures, such as those in the Kepler series from 2012 onward, treat shared memory as a configurable scratchpad, allowing programmers to allocate on-chip storage explicitly for thread-block data sharing, enhancing performance in parallel compute tasks. Concurrent research has focused on hybrid cache-scratchpad systems, where portions of cache are dynamically repurposed as software-managed scratchpad to minimize energy consumption; for instance, adaptive schemes remap high-demand blocks to scratchpad, achieving up to 25% energy savings in embedded processors while maintaining hit rates.

Design and Operation

Software Management Techniques

Software management techniques for scratchpad memory (SPM) primarily involve explicit, compiler-directed, and dynamic strategies to allocate data and code, ensuring efficient use of this software-controlled on-chip storage. Explicit allocation requires programmers or compilers to specify placements using language directives, such as pragmas (e.g., #pragma scratchpad), or runtime application programming interfaces (APIs) that map variables or functions to SPM regions. This approach allows precise control over data placement based on access patterns, often formulated as an optimization problem solved via integer linear programming (ILP) to minimize access times by assigning global and stack variables to SPM while respecting capacity constraints; both the directive style and the basic ILP form are sketched at the end of this section. For instance, the ILP model uses binary decision variables to decide allocations, incorporating profile-guided access frequencies, and achieves up to 44% runtime reduction through distributed SPM management in embedded systems.

Compiler-based techniques leverage static analysis to automate SPM allocation, analyzing variable lifetimes, access frequencies, and interferences to map frequently accessed ("hot") data to SPM for performance gains. These methods profile program execution to identify liveness intervals and prioritize placements that reduce energy consumption, such as assigning basic blocks or functions to SPM banks, yielding up to 22% energy savings in embedded applications. Memory coloring extends this by modeling allocation as an interference graph where nodes represent data objects and edges denote overlapping live ranges; colors correspond to SPM "registers" of fixed sizes, resolved via standard coloring algorithms adapted from register allocation to handle conflicts and ensure non-overlapping assignments. This technique partitions SPM into alignment-based units, splits live ranges at loop boundaries for better fit, and improves allocation quality even for smaller SPM sizes, as demonstrated in benchmarks like "untoast", where it enhances utilization without manual intervention.

Dynamic allocation methods enable runtime adaptation, particularly in multitasking environments, using compiler-inserted copy instructions or operating system (OS) support to load and evict data based on heuristics like access costs and future usage predictions. These approaches construct a data-program relationship graph to track objects and greedily select transfers from off-chip memory to SPM at program points, avoiding overheads like cache tags while maintaining predictability. In pointer-based applications, SPM management can reduce execution time by 11-38% (average 31%) and DRAM accesses by 61% compared to static methods, with optimizations for dead data exclusion further lowering energy by up to 31%. OS-level management may involve adaptive loading via system calls, ensuring portability across varying workloads.

Tools and frameworks facilitate these techniques through integrated compiler passes and simulators. Compiler frameworks like LLVM incorporate SPM allocation passes that perform static analysis and graph-based optimizations during code generation, enabling seamless integration with build systems for hybrid memory management. For energy profiling, simulation tools such as CACTI model SPM access energies and leakage, providing estimates for design space exploration; CACTI computes capacitances and power based on technology parameters, supporting evaluations that confirm SPM's 20-30% lower energy than caches for equivalent sizes. Additionally, methods handling compile-time-unknown SPM sizes use binary search or OS queries within compiler flows to generate portable binaries, maintaining near-optimal allocations across hardware variants.
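
The C sketch below illustrates the two explicit-allocation styles named above. The pragma spelling and the allocator are illustrative placeholders, not any vendor's actual interface; a static pool stands in for a real SPM-backed runtime so the sketch is self-contained.

```c
#include <stddef.h>

/* Directive-based placement: the pragma spelling is illustrative only;
 * real toolchains use their own pragmas, attributes, or linker sections. */
#pragma scratchpad
static int coeffs[64];                          /* intended to live in SPM */

/* API-based placement: a hypothetical bump allocator over a static pool.
 * A real allocator would also handle alignment and freeing. */
#define SPM_POOL_BYTES 4096
static unsigned char spm_pool[SPM_POOL_BYTES];  /* imagine: linker-placed in SPM */
static size_t spm_used;

static void *spm_alloc(size_t bytes) {
    if (spm_used + bytes > SPM_POOL_BYTES)
        return NULL;                            /* SPM full: caller falls back */
    void *p = &spm_pool[spm_used];
    spm_used += bytes;
    return p;
}

void filter_init(void) {
    float *work = spm_alloc(256 * sizeof *work);
    if (!work)
        return;                                 /* e.g., fall back to main memory */
    work[0] = (float)coeffs[0];                 /* ...hot intermediates in SPM... */
}
```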
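
The ILP formulation can be sketched in its basic knapsack form (generic notation, not copied from a specific paper): let x_i mark whether data object i is placed in SPM, f_i its profiled access frequency, s_i its size, and t_spm < t_main the respective access latencies.

```latex
\min_{x} \; \sum_i f_i \bigl( x_i \, t_{\mathrm{spm}} + (1 - x_i)\, t_{\mathrm{main}} \bigr)
\quad \text{s.t.} \quad \sum_i x_i \, s_i \le C_{\mathrm{spm}}, \qquad x_i \in \{0, 1\}
```

Profile-guided frequencies f_i steer the objective toward hot objects, and the single capacity constraint C_spm is what makes this basic form equivalent to a 0/1 knapsack problem; richer models add constraints for stack frames, live-range overlap, or transfer costs.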

Performance Aspects

Advantages

Scratchpad memory provides deterministic access times, as data allocation is managed explicitly by software at compile time or run time, eliminating the variability introduced by cache misses and hit/miss resolution hardware. This fixed latency is particularly beneficial for hard real-time systems, where worst-case execution time (WCET) guarantees are essential; techniques for WCET-centric allocation can reduce execution times by 5-80% compared to cache-based approaches by ensuring predictable memory behavior.

In terms of energy efficiency, scratchpad memory consumes significantly less power than traditional caches due to the absence of tag lookups, comparators, and replacement mechanisms, with studies reporting average energy savings of 40% per access in embedded systems (for instance, 1.53 nJ for a 2 KB scratchpad versus 4.57 nJ for an equivalent cache; a worked reading of these figures follows below). These savings arise from the simpler access path and reduced overhead, making scratchpad memory ideal for power-constrained environments like battery-operated devices.

The hardware design of scratchpad memory is simplified, omitting complex caching logic such as tag arrays and replacement policies, which reduces die area by approximately 34% (e.g., 102,852 transistors for a 2 KB scratchpad versus 142,224 for a comparable cache) and allows more die area to be allocated to compute units. This streamlined design also contributes to overall performance improvements of up to 18% in CPU cycles for embedded benchmarks.

For bandwidth optimization, scratchpad memory enables high-throughput access to local data in parallel architectures, as direct addressing and DMA support facilitate efficient data movement without contention from global memory hierarchies; bandwidth-aware allocation techniques can achieve up to 4x performance gains by balancing space utilization and transfer rates in multi-core systems.
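
As illustrative arithmetic only (not an additional measurement), the 2 KB example above implies a per-access saving of about two-thirds:

```latex
\frac{E_{\text{cache}} - E_{\text{spm}}}{E_{\text{cache}}}
  = \frac{4.57\,\text{nJ} - 1.53\,\text{nJ}}{4.57\,\text{nJ}}
  \approx 0.665
```

The 40% figure quoted above is an average across benchmark and size configurations rather than this single data point.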

Disadvantages

Scratchpad memory imposes significant programming overhead due to its requirement for explicit software management of data placement and movement, unlike hardware-managed caches that operate transparently. This manual or compiler-assisted allocation process increases development complexity and time, as developers must analyze access patterns and insert code for loading and evicting data, which can be error-prone and non-portable across different memory configurations.

The limited capacity of scratchpad memory, often constrained to small sizes such as a few kilobytes in embedded systems, necessitates frequent data swapping between the scratchpad and slower off-chip memory for larger workloads, introducing performance overhead and reducing overall efficiency; the tiling sketch below shows the kind of staging code this forces. This size restriction relegates less frequently accessed data to off-chip memory, exacerbating latency in applications with extensive datasets.

Lack of transparency in scratchpad memory arises from the absence of automatic mechanisms like prefetching or eviction policies found in caches, placing the full burden of optimization on software and risking suboptimal utilization if tuning is inadequate. Without hardware support for automatic data movement or miss detection, programmers must explicitly handle all data transfers, which can lead to inefficiencies under unpredictable access patterns.

Scalability issues in multicore environments stem from scratchpad memory's challenges in maintaining data coherency across cores, as it lacks built-in hardware protocols and requires additional software layers for coherence management, complicating management as core counts increase. This results in potential incoherence between local scratchpad copies and shared global memory, hindering efficient data sharing in parallel workloads.
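
The swapping burden is visible even in a trivial loop. This C sketch tiles an array that exceeds an assumed 4 KB scratchpad; every copy step is work a cache would have done implicitly. The SPM size and linker-placed tile are hypothetical, and memcpy stands in for a DMA transfer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPM_BYTES 4096                          /* assumed scratchpad size */
#define TILE      (SPM_BYTES / sizeof(int32_t))

/* Hypothetical SPM-resident tile (e.g., via a linker-placed section). */
static int32_t spm_tile[TILE];

/* Scale a large array in SPM-sized tiles; the programmer, not the
 * hardware, must stage every tile into and out of the scratchpad. */
void scale_array(int32_t *data, size_t n, int32_t k) {
    for (size_t base = 0; base < n; base += TILE) {
        size_t len = (n - base < TILE) ? (n - base) : TILE;
        memcpy(spm_tile, data + base, len * sizeof *spm_tile);  /* swap in  */
        for (size_t i = 0; i < len; i++)
            spm_tile[i] *= k;
        memcpy(data + base, spm_tile, len * sizeof *spm_tile);  /* swap out */
    }
}
```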

Comparisons

With Cache Memory

Scratchpad memory and cache memory represent two distinct approaches to on-chip memory management in processor architectures. While caches are hardware-managed with automatic data placement and eviction policies such as least recently used (LRU), scratchpad memory requires explicit software control for data allocation and deallocation, often handled by the compiler or programmer. This software-centric paradigm in scratchpad memory allows for precise optimization of memory usage tailored to application needs, whereas caches rely on hardware heuristics that may not align perfectly with specific workloads.

In terms of access predictability, scratchpad memory provides guaranteed hit times since all allocated data resides directly in the memory without the need for tag comparisons or associative lookups, eliminating the risk of misses and related conflict effects where irrelevant data evicts useful content. Caches, by contrast, introduce variability in access latency due to potential misses, compulsory loads, and conflicts, which can lead to unpredictable execution times, particularly in real-time systems. Locked caches, a variant where specific lines are pinned to avoid eviction, improve predictability over standard caches but still incur overhead from tag lookups and potential mapping conflicts.

Regarding power consumption and area efficiency, scratchpad memory exhibits lower overhead because it lacks the tag arrays, comparators, and replacement logic required for caches, resulting in reduced energy per access (approximately 36% less energy for certain benchmarks compared to equivalent caches). Caches demand significantly more area, with direct-mapped or set-associative designs requiring up to five times the transistors of a scratchpad for the same capacity (e.g., 75,000 vs. 15,000 transistors for 128 bytes), and correspondingly more power: the 40% average energy saving reported for equivalent scratchpad configurations implies that a cache consumes roughly 67% more energy per access (1/0.6 ≈ 1.67). This makes scratchpad memory particularly advantageous in resource-constrained environments where minimizing static power and die space is critical.

Scratchpad memory is ideally suited for applications with predictable, computationally intensive tasks such as multimedia processing or digital signal processing, where software can statically map frequently accessed data to ensure consistent performance. In contrast, caches excel in general-purpose scenarios characterized by irregular access patterns, such as desktop or server workloads, where hardware handles dynamic locality without extensive programming intervention. These differences highlight scratchpad's role in optimizing for predictability and efficiency in specialized domains over the flexibility of caches.

With Other On-Chip Memories

Scratchpad memory differs from register files primarily in capacity and access characteristics. Register files typically provide limited capacity, often on the order of 128 to 512 bytes per core (equivalent to 32-128 32-bit registers), serving as the fastest on-chip storage for immediate operands. In contrast, scratchpad memory offers much larger capacities, ranging from 4 to 64 KB or more, enabling storage of larger data structures or temporary arrays that exceed register file limits. However, register files achieve near-zero access latency by being integrated directly into the execution pipeline, while scratchpad accesses incur latencies of 1-2 cycles due to their memory-like addressing and load/store operations.

Compared to the local store in the Cell processor, scratchpad memory shares the trait of being fully software-managed, requiring explicit data placement and transfers to avoid off-chip accesses. Both structures provide predictable, low-latency on-chip storage without hardware caching overheads. However, the Cell's local store is tightly integrated with its Synergistic Processing Elements (SPEs), limited to 256 KB per SPE and relying exclusively on DMA for data movement between main memory and the local store (a sketch of this flow appears at the end of this section), emphasizing streaming workloads. General-purpose scratchpad memory, by contrast, supports broader applicability across processor architectures, often allowing direct load/store instructions without mandatory DMA transfers, though it lacks the Cell's specialized vector processing optimizations.

Scratchpad memory also contrasts with shared L2 caches in terms of access scope and overheads. As a private per-core structure, scratchpad provides dedicated, low-latency access (typically 1-2 cycles) without contention from other cores, making it suitable for localized data reuse. Shared caches, however, serve multiple cores with higher average latencies (often 10-20 cycles) due to bank conflicts and directory-based coherency protocols, which introduce additional latency for maintaining data consistency across cores. Scratchpad designs thus eliminate coherency overheads, but they require compiler or programmer intervention for inter-core data sharing.

Emerging hybrid approaches integrate scratchpad memory with caching mechanisms to balance predictability and automation. For instance, designs like Stash enable software-managed scratchpad regions that are globally addressable like caches, supporting implicit data movement and lazy writebacks to reduce programming effort while preserving low-latency benefits. These hybrids have demonstrated up to 12% performance gains and 32% energy savings over pure cache or scratchpad systems in GPU workloads. More recent designs, such as COMPAD (2023) and M3D-MDA (2025), further integrate scratchpad and cache elements for improved efficiency in heterogeneous systems.
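
The Cell-style DMA flow referenced above looks roughly like the following SPU-side C sketch. It is written against the published spu_mfcio.h intrinsics (mfc_get, mfc_put, tag-status waits) but simplifies real constraints such as alignment rules, the 16 KB per-transfer limit, and tag management, so treat it as an outline rather than production Cell SDK code.

```c
#include <spu_mfcio.h>                            /* Cell SPU DMA intrinsics */

#define TAG 3                                     /* arbitrary DMA tag id    */
static volatile float buf[1024] __attribute__((aligned(128)));

void process_chunk(unsigned long long ea_in, unsigned long long ea_out) {
    mfc_get(buf, ea_in, sizeof buf, TAG, 0, 0);   /* main memory -> local store */
    mfc_write_tag_mask(1 << TAG);
    mfc_read_tag_status_all();                    /* block until the get lands  */

    for (int i = 0; i < 1024; i++)                /* compute entirely in the    */
        buf[i] *= 2.0f;                           /* 256 KB local store         */

    mfc_put(buf, ea_out, sizeof buf, TAG, 0, 0);  /* local store -> main memory */
    mfc_write_tag_mask(1 << TAG);
    mfc_read_tag_status_all();                    /* block until the put lands  */
}
```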

Applications

In Digital Signal Processors

Scratchpad memory found early adoption in digital signal processors (DSPs) during the 1980s, particularly in the Texas Instruments TMS320 series, where on-chip RAM served as a fast scratchpad for storing coefficients and buffers in audio and telecommunications applications. For instance, the TMS32010 and TMS32020 utilized their limited on-chip RAM (144 words for the TMS32010 and up to 544 words for the TMS32020) to hold coefficients for finite impulse response (FIR) filters (e.g., length-80 bandpass filters at 10 kHz sampling) and buffers for intermediate results in tasks like echo cancellation and speech coding. These implementations enabled efficient processing of audio signals, such as 128-tap digital voice echo cancellers compliant with CCITT G.165 standards and linear predictive coding (LPC) vocoders at 8 kHz sampling, by keeping critical data on-chip to minimize external memory accesses; a generic FIR sketch illustrating this layout appears at the end of this section.

In DSP architectures, scratchpad RAM is often integrated with dual-access ports to support simultaneous read and write operations, which is essential for tasks requiring high throughput. The TMS320C54x family, for example, features Dual-Access RAM (DARAM) blocks that allow two independent accesses per cycle, facilitating instruction fetch and operand access without conflicts in applications like filtering and buffering. This design appears across other generations, such as the TMS320C4x, where on-chip dual-access RAM enables efficient handling of operands in DSP algorithms, including matrix-vector multiplications and digital filters, by organizing memory into independent blocks for concurrent operations.

The use of scratchpad memory in DSPs significantly enhances performance by enabling low-power, high-throughput operations, such as fast Fourier transform (FFT) computations, while avoiding stalls from slower external dynamic random-access memory (DRAM). In the TMS320VC5505 DSP, for instance, on-chip scratchpad allocation for FFT data and twiddle factors supports 1024-point FFTs with active power consumption below 0.15 mW/MHz, allowing real-time processing in power-constrained environments like portable audio devices without external DRAM dependencies. This approach reduces energy overhead and latency, as demonstrated in early TMS32020 implementations where a 256-point FFT completed in 4.375 ms at 5 MHz entirely using on-chip RAM, prioritizing deterministic on-chip access over the unpredictability of external memory.

A notable example of scratchpad integration in modern DSPs is found in Analog Devices Blackfin processors, such as the ADSP-BF54x series, which include configurable 4K-byte scratchpad SRAM blocks within the Level 1 (L1) memory system for optimized data storage in real-time tasks. These blocks operate at full clock speed and can be allocated for stacks, local variables, or temporary buffers, with configuration options via the L1 Data Memory Controller to ensure non-cacheable, low-latency access excluded from direct memory access (DMA) channels. In Blackfin architectures, the scratchpad supports efficient execution of DSP operations like multiply-accumulate instructions and circular buffering through data address generators, enhancing throughput for applications in audio and video signal handling.

Recent advances (as of 2025) have extended scratchpad memory optimizations to modern applications, including AI-enhanced signal processing on embedded processors. For example, heterogeneous SRAM-based scratchpad designs have been proposed to balance reliability and energy efficiency in low-voltage operation, achieving up to 2x improvements in energy efficiency for applications like video decoding.
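
The coefficient-and-delay-line layout such DSP code relies on can be sketched in portable C. The section attribute is a placeholder for a toolchain-specific SPM placement mechanism, and the filter is plain C rather than any vendor's assembly; real DSP ports would use hardware multiply-accumulate and circular addressing instead of the explicit shift.

```c
#include <stdint.h>

#define TAPS 128

/* Placeholder for a toolchain-specific SPM placement mechanism. */
#define DSP_SPM __attribute__((section(".dspram")))

DSP_SPM static int16_t coeff[TAPS];       /* FIR coefficients in SPM     */
DSP_SPM static int16_t delay[TAPS];       /* delay line (history) in SPM */

/* One output sample of a TAPS-tap FIR filter; every operand access hits
 * the single-cycle scratchpad, so per-sample timing is deterministic. */
int16_t fir_step(int16_t x) {
    int32_t acc = 0;
    for (int i = TAPS - 1; i > 0; i--)    /* shift the delay line        */
        delay[i] = delay[i - 1];
    delay[0] = x;
    for (int i = 0; i < TAPS; i++)        /* multiply-accumulate         */
        acc += (int32_t)coeff[i] * delay[i];
    return (int16_t)(acc >> 15);          /* Q15 fixed-point scaling     */
}
```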

In Embedded and Multicore Systems

Scratchpad memory (SPM) serves as a compelling on-chip storage solution in embedded systems, particularly for computationally intensive applications where energy and area are paramount. Unlike caches, which rely on hardware-managed automatic replacement policies, SPM requires explicit software control for data and code placement, enabling designers to optimize for specific workloads. This approach has been shown to reduce energy consumption by an average of 40% compared to cache-based systems, primarily due to the absence of complex tag comparisons and associative lookups. Additionally, SPM offers a 46% reduction in area-time product, making it suitable for resource-constrained devices such as microcontrollers and digital signal processors.

In embedded real-time systems, SPM's deterministic access times enhance timing predictability, which is crucial for meeting hard deadlines without the variability introduced by cache misses or evictions. This predictability stems from SPM's fixed latency for all valid addresses, avoiding the non-deterministic behavior of caches in contended scenarios. Power savings further support its adoption, as SPM eliminates the energy overhead of cache coherence protocols, allowing for simpler hardware implementations that consume less dynamic power during accesses. For instance, dynamic SPM units have been proposed to adaptively manage allocation at runtime, balancing predictability with flexibility in evolving tasks.

Transitioning to multicore embedded systems, SPM extends its benefits to parallel architectures by facilitating efficient data sharing and locality management across cores, often in hybrid hierarchies combining SPM with caches or main memory. Runtime-guided management techniques leverage task dependencies to allocate data to SPM, overlapping transfers with computation (as in the double-buffering sketch below) and using locality-aware scheduling to minimize inter-core data movement. This results in performance improvements of up to 16% in 32-core configurations, alongside reductions in on-chip network traffic by 31% and power consumption by 22%, making SPM ideal for power-sensitive multicore SoCs in automotive and industrial applications.

Shared SPM designs with ownership mechanisms enable time-predictable inter-core communication in multicore systems, where cores temporarily own portions of the SPM via explicit ownership transfers to avoid contention. Such architectures ensure bounded worst-case execution times, critical for safety-critical embedded multicore platforms. Complementing this, scratchpad-centric operating systems (OS) for multicore environments arbitrate shared resources at the OS level, separating application logic from I/O operations temporally to achieve contention-free execution. These OS designs deliver up to 2.1× performance gains over traditional cache-based approaches while maintaining predictability for hard real-time tasks on commercial-off-the-shelf multicore hardware.

In multicore contexts, SPM also supports advanced features like data duplication and replication for fault tolerance, mitigating multi-bit upsets in radiation-prone environments without significant overhead. Optimal data allocation algorithms further enhance efficiency by solving placement problems in polynomial time for exclusive data copies across cores, reducing memory conflicts in concurrent software. Overall, these applications underscore SPM's role in enabling scalable, low-power multicore systems where predictability and energy efficiency outweigh the management complexity.
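
The overlap of transfers with computation mentioned above is typically realized with double buffering. In this C sketch, dma_start and dma_wait are hypothetical stand-ins for a platform's asynchronous DMA API, and the two SPM-resident buffers are assumed to be placed by the toolchain.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical asynchronous DMA API; real SoCs expose their own. */
int  dma_start(void *dst, const void *src, size_t bytes);   /* returns a tag */
void dma_wait(int tag);

#define TILE 512
static int32_t spm_buf[2][TILE];   /* two SPM-resident buffers (assumed)     */

/* Double buffering: while the core computes on one SPM buffer, the DMA
 * engine fills the other, hiding transfer latency behind computation.
 * Input is assumed to be ntiles full tiles long. */
void sum_tiles(const int32_t *in, size_t ntiles, int64_t *out) {
    int64_t sum = 0;
    if (ntiles == 0) { *out = 0; return; }
    int tag = dma_start(spm_buf[0], in, sizeof spm_buf[0]);
    for (size_t t = 0; t < ntiles; t++) {
        int cur = t & 1;
        dma_wait(tag);                               /* current tile ready   */
        if (t + 1 < ntiles)                          /* prefetch next tile   */
            tag = dma_start(spm_buf[!cur], in + (t + 1) * TILE,
                            sizeof spm_buf[0]);
        for (size_t i = 0; i < TILE; i++)            /* compute on current   */
            sum += spm_buf[cur][i];
    }
    *out = sum;
}
```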
As of 2025, recent advances in embedded multicore systems include interactive dynamic SPM management strategies that improve allocation for multi-threaded applications, achieving up to 30% energy savings through compiler-directed transfers in heterogeneous many-core architectures. Additionally, integration of non-volatile memory (NVM) with SPM has enhanced energy efficiency and persistence in IoT and automotive multicore SoCs.
