
In-memory processing

In-memory processing is a technique that stores and processes data directly in a system's main memory, such as random-access memory (RAM), rather than relying on slower disk or secondary storage, thereby eliminating I/O bottlenecks and enabling ultra-low latency operations. This supports high-throughput and real-time data handling by leveraging the speed of memory access, which is orders of magnitude faster than disk retrieval, often achieving over 100 times improvement in response time and overall system performance.

The origins of in-memory processing trace back to the 1980s, when early research explored memory-resident databases to overcome the limitations of disk-based relational systems, though adoption was limited by the high cost and limited capacity of RAM at the time. A resurgence occurred in the 2000s and 2010s, driven by dramatic declines in memory prices (dropping approximately tenfold every five years) and exponential increases in capacity, making it feasible to hold terabytes of data in memory for enterprise applications. This evolution has shifted system designs to optimize for memory hierarchies, including non-uniform memory access (NUMA) architectures that became mainstream around 2008.

Key benefits of in-memory processing include enhanced support for interactive queries, fault-tolerant stream processing, and scalable parallelism, which are critical for modern workloads like real-time fraud detection and large-scale analytics. Pioneering systems exemplify its application: SAP HANA integrates in-memory columnar storage for enterprise analytics, while systems such as VoltDB employ in-memory transaction processing for high-velocity workloads, and Apache Spark utilizes resilient distributed datasets to enable in-memory batch and stream processing across clusters. Challenges persist, such as ensuring data durability through techniques like logging and replication, but ongoing advancements in hardware features such as hardware transactional memory (HTM) continue to bolster its efficiency.

Fundamentals

Definition and Core Principles

In-memory processing refers to a computational paradigm that stores and manipulates data entirely within random-access memory (RAM) or non-volatile alternatives such as persistent memory, thereby eliminating the need for slower disk operations. This approach leverages the high-speed access characteristics of primary memory to enable real-time analysis and processing, distinguishing it from traditional systems that rely on secondary storage for persistence and retrieval.

At its core, in-memory processing is guided by principles that address fundamental limitations in conventional architectures. Data locality is a key tenet, embedding computation logic directly within or near memory arrays to minimize data movement between processors and storage, thus alleviating the bottleneck caused by the separation of processing and memory units. Additionally, it exploits parallel access patterns inherent in modern memory systems to handle concurrent operations efficiently, while navigating trade-offs between the high speed of volatile memory and the persistence requirements often met by non-volatile options.

Central to this paradigm are concepts such as the random-access nature of RAM, which supports constant-time O(1) retrieval for data elements regardless of their position in memory, and the distinction between volatile memory like DRAM, prioritized for its low latency, and non-volatile technologies such as NVRAM, which provide durability without frequent disk reliance. In a typical workflow, datasets are loaded into main memory at the outset, allowing subsequent computations to occur without round trips to secondary storage, thereby streamlining operations from ingestion to output.
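The following minimal Python sketch illustrates the workflow just described under simple assumptions: a small CSV dataset (the file name, record layout, and lookup keys are illustrative inventions, not from any particular system) is read from secondary storage once, kept resident in a hash map in RAM, and all subsequent lookups are served at in-memory speed with average O(1) cost.

import csv
import os
import tempfile
import time

# Create a small illustrative dataset on "disk" (a temporary CSV file).
rows = [(str(i), f"customer-{i}", i % 100) for i in range(100_000)]
with tempfile.NamedTemporaryFile("w", delete=False, newline="", suffix=".csv") as f:
    csv.writer(f).writerows(rows)
    path = f.name

# One-time load into main memory: a hash map keyed by record id.
with open(path, newline="") as f:
    in_memory_table = {rec_id: (name, int(score)) for rec_id, name, score in csv.reader(f)}

# Subsequent processing touches only RAM: constant-time lookups,
# with no further round trips to secondary storage.
start = time.perf_counter()
hits = sum(1 for key in ("42", "9999", "123456") if key in in_memory_table)
elapsed = time.perf_counter() - start
print(f"{hits} hits resolved from memory in {elapsed * 1e6:.1f} microseconds")

os.remove(path)  # clean up the temporary file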

Comparison to Traditional Disk-Based Processing

Traditional disk-based processing relies on hard disk drives (HDDs) or solid-state drives (SSDs) for persistent storage, where operations such as reads and writes involve mechanical or electronic I/O mechanisms that introduce significant latency. For HDDs, seek times typically range from 3 to 10 milliseconds due to the physical movement of read/write heads, while SSDs offer lower latencies of around 0.1 milliseconds for reads but still lag behind memory access speeds. In contrast, in-memory processing accesses data directly in random-access memory (RAM), achieving latencies in the range of 50 to 100 nanoseconds, which is orders of magnitude faster than disk operations. Bandwidth differences are equally pronounced: RAM, such as DDR4, provides typical throughputs of 19 to 25 GB/s, compared with 100-200 MB/s for HDDs and 0.5-3 GB/s for SSDs in sequential operations. Additionally, RAM is volatile and requires explicit persistence mechanisms for durability, whereas disks inherently provide non-volatile storage, motivating hybrid approaches in many systems.

In disk-based systems, data flow involves paging and swapping mechanisms managed by the operating system to handle memory shortages, where inactive pages are moved to disk and retrieved on demand, often buffered through layers like the OS page cache to mitigate frequent I/O. These processes create bottlenecks from data movement across buses, such as PCIe for SSDs, amplifying latency and reducing overall throughput compared to the seamless, bus-minimal access in in-memory environments.
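A back-of-envelope Python calculation using the representative figures quoted above (illustrative assumptions, not measurements) makes the gap concrete: the time to sequentially scan a 100 GB dataset at typical HDD, SSD, and DDR4 bandwidths, alongside per-access latency.

DATASET_GB = 100

media = {
    #            bandwidth (GB/s), access latency (s)
    "HDD":      (0.15,             5e-3),    # ~150 MB/s, ~5 ms seek
    "SATA SSD": (0.5,              100e-6),  # ~0.5 GB/s, ~0.1 ms read
    "DDR4 RAM": (22.0,             80e-9),   # ~19-25 GB/s, ~50-100 ns
}

for name, (bw_gbs, latency_s) in media.items():
    scan_seconds = DATASET_GB / bw_gbs
    print(f"{name:9s} access latency {latency_s * 1e9:12.0f} ns, "
          f"full sequential scan ~{scan_seconds:8.1f} s")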

Hardware Aspects

Processing-In-Memory (PIM) Architectures

Processing-in-memory (PIM) architectures integrate computational logic directly into or near memory arrays to alleviate the data movement bottleneck inherent in traditional systems, where data must shuttle between separate memory and processing units. This approach embeds accelerators or processing units within the memory structure, enabling computations to occur at the data's location and thereby minimizing the latency and energy costs associated with off-chip transfers. Pioneering work in this area, such as the foundational concepts outlined in early PIM proposals, has evolved to address bandwidth-intensive workloads by leveraging the inherent parallelism of memory banks.

PIM architectures are broadly categorized into near-memory processing, in-DRAM processing, and 3D-stacked variants. Near-memory processing places logic adjacent to DRAM chips, often on a separate logic die or buffer chip, allowing computations close to the data without altering the memory array itself; this includes accelerators connected via high-bandwidth interfaces like those in hybrid memory cubes. In-DRAM processing, in contrast, embeds simple operations directly within the DRAM periphery, utilizing existing circuitry such as sense amplifiers to perform bitwise or arithmetic tasks on activated rows or columns. For instance, row-wise operations activate multiple rows simultaneously to enable additions or multiplications by exploiting charge sharing and voltage sensing, while column-wise methods leverage bitline parallelism for matrix-vector computations. 3D-stacked PIM extends this by integrating logic layers beneath DRAM dies, as seen in high-bandwidth memory (HBM) configurations, which facilitate denser integration and higher throughput for parallel tasks.

Prominent implementations include UPMEM's architecture, which augments standard DDR4 DIMMs with embedded DRAM Processing Units (DPUs), general-purpose cores integrated at the bank level for in-situ execution of scalar and vector operations, and Samsung's Aquabolt-XL, an HBM2-PIM module that stacks processing logic with DRAM dies to support floating-point SIMD instructions directly in the memory stack. More recent advancements include Samsung's 2024 PIM enhancements, which achieved a 30% performance improvement in AI inference tasks. These technologies enable row- and column-wise vector mathematics, such as bulk bitwise operations or multiply-accumulate steps, by activating subarrays for parallel data manipulation without full data evacuation to the host processor.

At the hardware level, PIM architectures yield substantial energy reductions for data-intensive tasks, achieving 10-100x lower energy per operation compared to CPU-based offloading by eliminating repeated data fetches across the memory bus. This efficiency stems from localized computation, which cuts down on the power-hungry transfers that dominate in conventional systems, particularly for bandwidth-bound applications like graph analytics or neural network inference. Such PIM designs play a supportive role in accelerating machine learning workloads by enabling efficient tensor operations near data storage.
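The short sketch below is a purely conceptual software model (assuming NumPy is available) of the row-wise bulk bitwise operations described above. Real in-DRAM designs perform these operations with charge sharing across simultaneously activated rows; this code only mimics the data layout (bit vectors standing in for DRAM rows) and the row-wise semantics, so it illustrates the idea rather than any hardware implementation.

import numpy as np

ROW_BITS = 8192                      # one illustrative DRAM row of 8 Kb
rng = np.random.default_rng(0)

row_a = rng.integers(0, 2, ROW_BITS, dtype=np.uint8)  # operand row A
row_b = rng.integers(0, 2, ROW_BITS, dtype=np.uint8)  # operand row B

# Row-wise bulk AND/OR: every bit position is processed in parallel across
# the whole row, rather than streaming bits out to a CPU register file.
bulk_and = row_a & row_b
bulk_or = row_a | row_b

print("population count of A AND B:", int(bulk_and.sum()))
print("population count of A OR  B:", int(bulk_or.sum()))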

Supporting Memory Technologies

Dynamic random-access memory (DRAM) variants form the backbone of in-memory processing systems, with DDR4 and DDR5 standards providing higher data rates and capacities essential for server-based workloads. DDR5, for instance, introduces features like on-die error correction and improved power efficiency over DDR4, supporting larger memory footprints in data-intensive applications. Low-power DDR (LPDDR) variants, such as LPDDR5, are optimized for energy-constrained environments like edge devices, offering data rates up to 6.4 GT/s while maintaining compatibility with mobile and embedded systems. High-bandwidth memory (HBM) technologies, including HBM2 and HBM3, enable ultra-high throughput for GPU-accelerated in-memory processing, with HBM3 achieving over 1.2 TB/s bandwidth through 3D-stacked dies and wider interfaces. These are critical for parallel data access in compute-intensive scenarios, reducing bottlenecks in memory-bound operations. HBM3E further extends this with data rates up to 9.6 Gb/s per pin, enhancing scalability for AI-driven workloads.

Non-volatile memory options like Intel Optane, based on 3D XPoint technology, provided persistence capabilities alongside DRAM-like latencies, allowing data to survive power loss without the overhead of traditional storage, although production was discontinued in 2022, with final shipments in late 2025. Optane modules, with capacities up to 512 GB per module, integrated with DDR4 interfaces to extend effective memory size for in-memory databases and analytics. Advancements in persistent memory (PMEM) standards, such as NVDIMM, enable byte-addressable access to non-volatile storage with latencies in the tens to hundreds of nanoseconds, closely mimicking DRAM behavior. NVDIMM-N configurations, for example, combine DRAM caching with flash backing in standard DIMM form factors, supporting up to 128 GB or higher of persistent capacity per module. Emerging non-volatile technologies further push boundaries: resistive RAM (ReRAM) offers sub-microsecond read/write times through filament-based switching; magnetic RAM (MRAM), particularly spin-transfer torque variants, achieves similar access speeds with magnetic state retention; and phase-change RAM (PCRAM) uses material phase transitions for endurance exceeding 10^9 cycles at low latencies. These technologies address volatility and scaling limitations while pushing densities beyond those of conventional DRAM.

Scalability in data center environments is bolstered by memory pooling, which disaggregates memory resources for on-demand allocation across multiple servers, improving utilization from a typical 50% to over 80% in pooled configurations. Error-correcting code (ECC) integration in memory modules detects and corrects single-bit errors in DRAM, with advanced schemes like chipkill extending to multi-bit detection, ensuring data integrity in terabyte-scale in-memory deployments where error rates can reach 1-10% annually without ECC protection. As of 2025, Compute Express Link (CXL) emerges as a key trend for coherent memory expansion, enabling low-latency interconnects over PCIe 5.0/6.0 fabrics to pool and share memory across nodes with cache coherency support at up to 64 GT/s. CXL 3.x specifications introduce enhanced security and management for disaggregated systems, facilitating memory tiering in hyperscale data centers.
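The utilization benefit of memory pooling can be shown with simple arithmetic. The Python sketch below uses invented server counts, module sizes, and per-server working sets (assumptions for illustration only, not measurements) to contrast dedicated per-server DRAM with a CXL-style shared pool sized for the aggregate demand plus headroom.

servers = 8
dram_per_server_gb = 512
working_sets_gb = [180, 260, 90, 400, 150, 310, 220, 120]  # assumed per-server demand

# Dedicated (non-pooled): each server must be provisioned for its own peak,
# so memory stranded on lightly loaded servers cannot help the others.
dedicated_total = servers * dram_per_server_gb
dedicated_util = sum(working_sets_gb) / dedicated_total

# Pooled: memory is allocated on demand from a shared pool with ~20% headroom.
pooled_total = int(sum(working_sets_gb) * 1.2)
pooled_util = sum(working_sets_gb) / pooled_total

print(f"dedicated: {dedicated_total} GB provisioned, {dedicated_util:.0%} utilized")
print(f"pooled:    {pooled_total} GB provisioned, {pooled_util:.0%} utilized")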

Software Aspects

In-Memory Data Management Systems

In-memory data management systems encompass software platforms that store, process, and query data primarily in main memory to enable high-speed operations without relying on disk I/O for core functions. These systems include single-node in-memory databases (IMDBs) such as SAP HANA, a column-oriented in-memory platform supporting relational and analytical workloads, and Redis, an open-source key-value store optimized for caching and real-time applications. Distributed variants, like Apache Ignite and Hazelcast, pool RAM across clustered nodes to form scalable data grids, allowing for horizontal scaling and shared memory resources in large-scale environments.

Key features of these systems include support for ACID-compliant transactions maintained entirely in memory, often using mechanisms like snapshot isolation to ensure consistency without blocking concurrent reads. For instance, SAP HANA implements snapshot isolation through multi-version concurrency control (MVCC), enabling transactions to view a consistent snapshot from the transaction's start while avoiding locks on reads. Real-time querying is facilitated via SQL interfaces in systems like SAP HANA and Apache Ignite, which support ANSI SQL standards including joins and aggregations, or NoSQL APIs in Redis for flexible access. Data ingestion pipelines are integrated to handle streaming inputs, such as SAP HANA's Smart Data Integration (SDI) for real-time loading via JDBC/ODBC connectors, or change data capture (CDC) pipelines for syncing external sources.

Implementation details focus on efficient memory utilization and reliability, with strategies like hash tables for key-value operations in Redis, where data is indexed via hashed keys for O(1) average-time lookups. Fault tolerance is achieved through in-memory replication across nodes, as in Apache Ignite's partitioned and replicated caching modes or Hazelcast's replication, which mirror data without disk persistence to maintain availability during node failures. These systems often leverage hardware like persistent memory (PMEM) for extended capacity, though primary operations remain RAM-centric.

In IMDBs optimized for analytics, such as SAP HANA, column-store architectures are preferred over row-stores, as they store data vertically to enable compression and faster aggregation scans on large datasets, while row-stores suit transactional inserts by grouping related fields contiguously. This distinction allows hybrid setups where column-stores handle read-heavy analytical queries and row-stores manage write-intensive operations, all within the in-memory footprint.
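To make the MVCC-based snapshot isolation idea concrete, the following deliberately simplified Python sketch keeps a list of (commit timestamp, value) versions per key and resolves reads against the snapshot timestamp captured when the reading transaction began. The class name, API, and timestamps are illustrative assumptions; this is a conceptual model, not any product's implementation.

from collections import defaultdict


class MVCCStore:
    def __init__(self):
        self._versions = defaultdict(list)  # key -> [(commit_ts, value), ...] in commit order
        self._clock = 0                     # logical commit timestamp

    def begin(self):
        # Start a read transaction: capture the current snapshot timestamp.
        return self._clock

    def commit_write(self, key, value):
        # Install a new committed version without blocking readers.
        self._clock += 1
        self._versions[key].append((self._clock, value))

    def read(self, snapshot_ts, key):
        # Return the newest version committed at or before the reader's snapshot.
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


store = MVCCStore()
store.commit_write("balance:42", 100)
snapshot = store.begin()                        # reader's snapshot sees balance = 100
store.commit_write("balance:42", 75)            # a concurrent writer commits a newer version
print(store.read(snapshot, "balance:42"))       # -> 100 (consistent with the snapshot)
print(store.read(store.begin(), "balance:42"))  # -> 75 (a fresh snapshot sees the latest commit)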

Optimized Data Structures and Algorithms

In-memory processing relies on data structures optimized for random-access memory (RAM) to minimize latency and maximize throughput. Hash maps, adapted for RAM environments, employ techniques such as open addressing and careful bucket sizing to reduce probe sequences and improve locality. The load factor α, defined as α = n/k where n is the number of entries and k is the number of buckets, directly influences collision rates; a higher α increases the probability of collisions, leading to longer search times, while values around 0.7 balance space efficiency and performance in RAM-constrained systems. B+-trees, traditionally disk-oriented, are modified with cache-oblivious layouts that recursively decompose the tree to achieve near-optimal I/O complexity without knowledge of cache parameters, enabling efficient range queries and updates in hierarchical memory models. Vector databases for embeddings, such as those using approximate nearest neighbor search, store high-dimensional vectors in flat arrays or hierarchical navigable small world graphs to facilitate fast similarity computations entirely in memory.

Algorithms for in-memory processing emphasize in-place operations to conserve space and leverage cache hierarchies. In-place sorting variants of quicksort partition arrays without auxiliary storage, relying on pivot selection strategies like median-of-three to minimize worst-case recursion depth and cache misses during swaps. Parallel processing integrates single instruction, multiple data (SIMD) instructions to vectorize operations on contiguous memory blocks, accelerating tasks like data filtering or aggregation by processing multiple elements simultaneously with a single instruction. Garbage collection mechanisms, particularly in Java virtual machines (JVMs), are tuned for large in-memory heaps using collectors like G1, which employ region-based allocation and concurrent marking to reduce pause times and adapt to the high allocation rates typical of in-memory workloads.

Key optimizations address concurrency and cache efficiency in shared-memory settings. Cache line alignment ensures that frequently accessed data elements are padded to span distinct cache lines (typically 64 bytes), preventing false sharing, where unrelated variables on the same line cause unnecessary cache invalidations across cores. Lock-free data structures, such as compare-and-swap-based queues or stacks, enable concurrent access without locks by using atomic operations to resolve races, ensuring progress for at least one thread under contention and improving scalability in multi-threaded in-memory environments.
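A compact Python sketch of the hash-map ideas above appears below: an open-addressing table with linear probing (contiguous slots favor cache locality) that resizes once the load factor α = n/k crosses roughly 0.7. It is simplified for illustration under stated assumptions (no deletions, no tombstones) and is not meant as a production structure.

class OpenAddressingMap:
    _EMPTY = object()

    def __init__(self, capacity=8):
        self._keys = [self._EMPTY] * capacity
        self._values = [None] * capacity
        self._size = 0

    def _probe(self, key):
        # Linear probing: walk consecutive slots, which tend to share cache lines.
        i = hash(key) % len(self._keys)
        while self._keys[i] is not self._EMPTY and self._keys[i] != key:
            i = (i + 1) % len(self._keys)
        return i

    def _resize(self):
        # Double the bucket count and reinsert entries to restore a low load factor.
        old = [(k, v) for k, v in zip(self._keys, self._values) if k is not self._EMPTY]
        self._keys = [self._EMPTY] * (2 * len(self._keys))
        self._values = [None] * len(self._keys)
        self._size = 0
        for k, v in old:
            self.put(k, v)

    def put(self, key, value):
        if (self._size + 1) / len(self._keys) > 0.7:   # keep alpha at or below ~0.7
            self._resize()
        i = self._probe(key)
        if self._keys[i] is self._EMPTY:
            self._size += 1
        self._keys[i], self._values[i] = key, value

    def get(self, key):
        i = self._probe(key)
        return self._values[i] if self._keys[i] is not self._EMPTY else None


m = OpenAddressingMap()
for n in range(20):
    m.put(f"key{n}", n * n)
print(m.get("key7"), m.get("missing"))   # -> 49 None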

Applications and Use Cases

Enterprise and Business Applications

In enterprise environments, in-memory processing facilitates real-time analytics in the financial sector, particularly for fraud detection, where systems analyze transaction streams at high speeds to identify anomalies before they result in losses. For instance, in-memory databases like Oracle TimesTen enable fraud detection applications that process hundreds of thousands of transactions (approximately 185,000 per second), as demonstrated by the U.S. Postal Service's implementation of a 1.7 terabyte in-memory database for fraud detection in electronic payments. Similarly, platforms such as SAS Fraud Management leverage in-memory processing to score 100% of transactions in real time, achieving low-latency responses critical for preventing unauthorized activities.

In banking, in-memory computing supports risk modeling by accelerating complex simulations, such as value-at-risk (VaR) calculations and stress testing, which require processing vast datasets instantaneously. Tools like GridGain's in-memory computing platform have been adopted by financial firms, including Misys (now Finastra), to deliver rapid data access for regulatory-compliant risk assessments, reducing computation times from hours to minutes. This capability stems from the parallel processing of data entirely in RAM, enabling scalable Monte Carlo simulations without disk I/O bottlenecks.

Supply chain optimization benefits from in-memory analytics by enabling dynamic forecasting and inventory adjustments based on live data feeds. SAP HANA's in-memory computing, integrated into S/4HANA systems, supports real-time supply chain planning, allowing enterprises to optimize inventory and respond to disruptions with agility, as seen in its application for predictive demand modeling across global networks. In-memory data grids further enhance this by providing sub-millisecond query responses for live operational data, bridging operational and analytical workflows.

A key business impact of in-memory processing is the convergence of online transaction processing (OLTP) and online analytical processing (OLAP), traditionally siloed activities, allowing seamless real-time transaction handling alongside analytics. SAP HANA exemplifies this by storing transactional data in memory for both immediate updates and instant aggregations, enabling unified views that support data-driven decisions without separate systems. This convergence facilitates 360-degree customer views through rapid data integration from multiple sources, providing comprehensive profiles for personalized services; for example, Hazelcast's in-memory platform aggregates streaming customer interactions to deliver real-time insights for targeted marketing.

In retail, Redis serves as an in-memory caching layer for inventory management, ensuring synchronized stock levels across channels with sub-millisecond latency to prevent overselling. Retailers using Redis Enterprise achieve real-time availability updates, synchronizing data from stores, warehouses, and online platforms to optimize fulfillment. For integration with business intelligence tools, in-memory systems connect directly to platforms like Tableau, which uses its Hyper in-memory engine to visualize live data from sources such as SAP HANA, enabling interactive dashboards for operational monitoring without performance degradation.

Emerging Uses in AI and Edge Computing

In-memory processing has become integral to artificial intelligence (AI) applications, particularly in handling large-scale tensor operations during model training and inference. Frameworks like PyTorch leverage high-bandwidth memory (HBM) on GPUs to keep tensors in memory, minimizing data movement and enabling efficient parallel computations for deep learning workloads. For instance, HBM's high throughput supports the rapid access required for gradient computations in transformer models, reducing latency in training large language models. This approach addresses the memory wall in AI training by performing operations directly on in-memory data structures, as demonstrated in memory-efficient techniques like FlashAttention, which tile attention computations to fit within GPU memory limits.

Neuromorphic computing further advances in-memory processing for low-power AI by emulating brain-like architectures with processing-in-memory (PIM) elements. These systems integrate synaptic weights and computations within memory arrays, such as memristor-based crossbar networks, to achieve energy efficiencies orders of magnitude better than traditional setups. In neuromorphic designs, PIM enables event-driven processing where computations occur only on relevant data spikes, which is ideal for resource-constrained inference tasks. This supports ultra-low power operation, with applications in always-on systems that mimic neural dynamics.

In edge computing, in-memory processing facilitates IoT analytics by embedding PIM directly into sensor nodes, allowing on-device fusion of streaming data without offloading to the cloud. For example, processing-in-sensor accelerators handle binary-weight neural networks for tasks like on-device classification in environmental sensors, achieving high throughput with minimal energy. In autonomous vehicles, PIM architectures enable rapid sensor fusion from LiDAR, cameras, and radar by performing computations in local memory, reducing decision latency for obstacle avoidance. Crossbar arrays based on emerging materials like MoS2 further enhance this by supporting parallel matrix operations for localization.

As of 2025, in-memory processing integrates with federated learning to enable privacy-preserving model training across distributed edge devices, where local computations minimize communication overhead during model aggregation. Techniques like federated learning frameworks significantly reduce memory footprints through in-situ updates, supporting large models on resource-limited devices. Edge AI chips, such as Intel's Loihi 2, incorporate in-memory synapses to simulate millions of neurons on-chip, facilitating continual learning for adaptive robotics and other edge workloads. These advancements allow Loihi 2 to process spiking neural networks with 120 million synapses in embedded form factors, enabling low-latency inference at the edge.

Bandwidth constraints in edge environments, exacerbated by intermittent connectivity and high data volumes from sensors, are mitigated by PIM's colocation of compute and data, which keeps computations within memory to avoid costly transfers. This colocation significantly reduces effective bandwidth demands for inference tasks, preserving performance in bandwidth-starved scenarios like remote deployments. By embedding logic in memory hierarchies such as LPDDR DRAM, PIM ensures scalable edge AI without relying on external networks.
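The NumPy sketch below illustrates only the tiling idea behind memory-efficient attention mentioned above: instead of materializing the full seq_len x seq_len score matrix, queries are processed in blocks so that only a block x seq_len slice exists in memory at a time. Real FlashAttention additionally tiles over keys with an online softmax and fuses kernels on the GPU; the shapes, block size, and function name here are illustrative assumptions.

import numpy as np

def tiled_attention(q, k, v, block=128):
    seq_len, dim = q.shape
    out = np.empty_like(q)
    scale = 1.0 / np.sqrt(dim)
    for start in range(0, seq_len, block):
        qb = q[start:start + block]                   # (block, dim) query tile
        scores = (qb @ k.T) * scale                   # only a (block, seq_len) slice in memory
        scores -= scores.max(axis=1, keepdims=True)   # numerically stable softmax
        weights = np.exp(scores)
        weights /= weights.sum(axis=1, keepdims=True)
        out[start:start + block] = weights @ v
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal((1024, 64), dtype=np.float32)
k = rng.standard_normal((1024, 64), dtype=np.float32)
v = rng.standard_normal((1024, 64), dtype=np.float32)
print(tiled_attention(q, k, v).shape)   # -> (1024, 64)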

Benefits and Challenges

Performance Advantages

In-memory processing fundamentally improves latency by minimizing the delays inherent in traditional disk-based systems. The total processing time can be modeled as T_{total} = T_{compute} + T_{io}, where T_{compute} represents computation time and T_{io} denotes input/output overhead. In disk-based approaches, T_{io} dominates due to mechanical seek times averaging 4-10 milliseconds per operation, whereas in-memory systems reduce T_{io} to near zero, as access to RAM occurs in approximately 100 nanoseconds. This latency reduction, often by orders of magnitude from milliseconds to nanoseconds, enables sub-second response times critical for real-time scenarios.

Throughput gains are equally pronounced, particularly for analytical workloads involving complex queries. In-memory systems achieve 10-100x higher query speeds compared to disk-based alternatives by eliminating persistent I/O bottlenecks, allowing sustained processing of large datasets without paging delays. For instance, input/output operations per second (IOPS) in in-memory systems can reach millions, far surpassing the tens of thousands to hundreds of thousands typical of solid-state drives or the hundreds to thousands of hard disk drives. These metrics highlight scalability in big data environments, where distributed in-memory clusters handle petabyte-scale volumes across nodes while maintaining high throughput.

Energy efficiency further amplifies these advantages, as in-memory operations avoid power-intensive disk I/O, consuming fewer joules per operation. Disk accesses require significant power for mechanical components and controllers, often amounting to several watts per device, while memory accesses leverage low-power circuits. This results in overall lower energy consumption per operation, supporting efficient scaling without proportional power increases. Optimized in-memory data structures, such as hash tables, enhance these gains by further streamlining access patterns.
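A worked example of the T_{total} = T_{compute} + T_{io} model above, computed in Python with representative per-access figures (illustrative assumptions, not measurements): one million random record accesses, each needing 2 microseconds of CPU work, served either from disk or from RAM.

ACCESSES = 1_000_000
T_COMPUTE_PER_ACCESS = 2e-6          # 2 microseconds of CPU work per record
T_IO_DISK = 5e-3                     # ~5 ms average HDD seek per random access
T_IO_MEMORY = 100e-9                 # ~100 ns DRAM access

t_compute = ACCESSES * T_COMPUTE_PER_ACCESS

for label, t_io_per_access in (("disk-based", T_IO_DISK), ("in-memory", T_IO_MEMORY)):
    t_io = ACCESSES * t_io_per_access
    t_total = t_compute + t_io
    print(f"{label:10s} T_total = {t_total:10.2f} s (I/O share {t_io / t_total:.1%})")

Under these assumptions the disk-based run is dominated almost entirely by T_{io} (thousands of seconds), while the in-memory run is bounded by T_{compute}, which is the qualitative shift the model captures.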

Limitations and Mitigation Strategies

In-memory processing, while offering significant performance benefits, faces several inherent limitations that constrain its scalability and adoption. One primary drawback is the high cost of dynamic random-access memory (DRAM), which remains substantially more expensive than traditional disk storage. As of 2025, DRAM prices have surged by approximately 172% year-over-year due to surging demand from AI applications, with retail costs for DDR5 modules reaching around $3–5 per GB in consumer markets and higher for enterprise-grade server configurations, compared to hard disk drives (HDDs) at roughly $0.015–0.02 per GB. This cost disparity limits the feasibility of storing large datasets entirely in memory, particularly for applications requiring petabyte-scale data. Another critical limitation is data volatility, as DRAM loses all stored information upon power failure or system crash, necessitating frequent backups and recovery mechanisms that can introduce overhead and risk data loss in real-time scenarios. Additionally, capacity constraints persist, with current server nodes typically supporting only 1–8 terabytes of RAM per machine, insufficient for the exabyte-scale datasets common in big data environments. Large-scale in-memory systems also grapple with elevated power consumption and architectural complexities. Memory modules in database systems can account for a significant portion of total power draw in AI workloads, driven by the energy-intensive nature of DRAM refresh cycles and data movement, which exacerbates operational costs in data centers where electricity usage for such setups has risen to 4–5% of national totals in major economies. Hybrid disk-memory architectures, which attempt to blend in-memory speed with disk persistence, introduce further challenges, including increased software complexity for data tiering, synchronization overhead, and potential bottlenecks in I/O management that degrade overall efficiency. To mitigate these limitations, several strategies have been developed to balance performance with practicality. Tiered storage approaches address capacity and cost issues by keeping frequently accessed "hot" data in memory while offloading less-used "cold" data to cheaper disk or flash storage, enabling systems to handle larger workloads without fully provisioning expensive RAM. Data compression techniques, such as dictionary encoding commonly used in columnar in-memory databases, can reduce memory footprints by 5–10 times for structured data, allowing more information to fit within available capacity while minimizing power usage through reduced data transfers. Cloud bursting provides elastic scaling by dynamically provisioning additional cloud resources during peak loads, seamlessly extending on-premises in-memory clusters to handle surges without permanent over-provisioning. In 2025, emerging non-volatile memory (NVM) technologies like magnetoresistive RAM (MRAM) and resistive RAM (ReRAM) offer promising alternatives for persistence, combining DRAM-like speeds with non-volatility at lower costs than legacy options, thus reducing backup needs and enabling hybrid persistent in-memory designs.
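The tiered-storage mitigation described above can be sketched in a few lines of Python: a bounded "hot" in-memory tier with least-recently-used (LRU) eviction spills cold entries to a cheaper backing store, here represented by an ordinary dict standing in for disk or flash. Class names, tier sizes, and the eviction policy are illustrative assumptions, not a specific product's design.

from collections import OrderedDict


class TieredStore:
    def __init__(self, hot_capacity=3):
        self.hot = OrderedDict()   # small, fast tier (models RAM)
        self.cold = {}             # large, cheap tier (models disk or flash)
        self.hot_capacity = hot_capacity

    def put(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        if len(self.hot) > self.hot_capacity:
            cold_key, cold_value = self.hot.popitem(last=False)  # evict the LRU entry
            self.cold[cold_key] = cold_value

    def get(self, key):
        if key in self.hot:                 # hot hit: served from memory
            self.hot.move_to_end(key)
            return self.hot[key]
        if key in self.cold:                # cold hit: promote back to the hot tier
            value = self.cold.pop(key)
            self.put(key, value)
            return value
        return None


store = TieredStore(hot_capacity=3)
for sku in ("A", "B", "C", "D"):
    store.put(sku, f"record-{sku}")
print(list(store.hot), list(store.cold))   # "A" was evicted to the cold tier
print(store.get("A"))                      # a cold read promotes "A" back to the hot tier
print(list(store.hot), list(store.cold))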

History and Future Outlook

Evolution of the Technology

The evolution of in-memory processing originated in the 1950s with the adoption of magnetic-core memory in mainframe computers, which provided non-volatile, random-access storage directly accessible by the CPU, eliminating the slower rotational access of earlier technologies like magnetic drums. This shift enabled early forms of processing where data resided entirely in primary memory during computations, as seen in systems like the IBM System/360 series introduced in 1964. Pioneering implementations of processing-in-memory (PIM) concepts emerged in the 1960s, when simple logic functions were embedded within core memory arrays to perform calculations on-board, reducing interconnect complexity, weight, and power draw for space applications.

By the 1980s, the rise of relational database management systems (RDBMS) incorporated in-memory caching to mitigate disk I/O bottlenecks, allowing frequently accessed pages to be buffered in main memory for rapid retrieval and updates. Oracle's early database versions, starting from its 1979 release and evolving through the 1980s, featured buffer caches that dynamically managed data blocks in memory, significantly enhancing access speeds on hardware with increasing RAM capacities. Similar mechanisms appeared in other RDBMS such as IBM DB2, launched in 1983, where buffer pools optimized hit rates for query operations, laying the groundwork for hybrid storage models that prioritized in-memory performance.

The 2000s marked a pivotal expansion with the emergence of NoSQL databases tailored for distributed, high-velocity workloads, culminating in Redis's 2009 release as an open-source, in-memory key-value store that supported advanced data structures for caching and pub/sub messaging. This period transitioned into the 2010s with the commercial launch of SAP HANA in 2010, an in-memory, column-oriented platform that unified OLTP and OLAP processing by storing and querying data entirely in RAM, enabling real-time analytics on terabyte-scale datasets. The post-2010 surge in big data, driven by exponential growth in data volumes from web-scale sources and sensors, prompted a broader shift toward in-memory paradigms to achieve sub-second latencies unattainable with disk-centric systems. Apache Spark's stable 2014 release exemplified this integration, leveraging resilient distributed datasets (RDDs) for in-memory computation atop Hadoop, accelerating iterative algorithms by up to 100 times compared to disk-based MapReduce. Concurrently, PIM research advanced through U.S. government initiatives, including DARPA's exploration of memory-centric architectures to overcome the memory bottleneck. Seminal academic work, such as proposals for low-overhead PIM instructions at conferences like ISCA, demonstrated practical embeddings of computation in memory arrays, achieving 2-10x energy savings for data-intensive tasks without altering CPU designs.

As of 2025, the in-memory computing market is experiencing robust growth, valued at approximately USD 21 billion in 2024 and projected to reach USD 97 billion by 2034, reflecting a compound annual growth rate (CAGR) of around 16%. This expansion is driven by increasing demands for real-time processing in dynamic data environments. Key trends include deeper integration with hybrid cloud architectures, enabling seamless data mobility between on-premises and cloud resources to optimize performance and cost efficiency. Additionally, there is a notable rise in processing-in-memory (PIM) technologies at the edge, particularly for AI applications, where PIM chips reduce latency and energy consumption by embedding computation directly within memory arrays, as demonstrated by advancements from Samsung achieving up to 30% gains in inference tasks.
Adoption of in-memory processing has become widespread among enterprises, with over 70% of new applications leveraging open-source platforms such as Redis for caching and real-time data handling, and Apache Arrow for efficient in-memory columnar data interchange. For instance, 70% of Fortune 500 companies utilize Microsoft Fabric, a unified platform incorporating in-memory processing for high-speed data operations. This open-source dominance underscores the shift toward scalable, cost-effective solutions that support diverse workloads without vendor lock-in.

Looking ahead, in-memory processing is predicted to dominate AI workloads by 2030, as AI-driven data center capacity is expected to account for 70% of global demand, necessitating ultra-fast memory access for training and inference. Standardization of Compute Express Link (CXL) is anticipated to enable disaggregated memory pools, allowing dynamic allocation across servers and improving utilization in cloud and AI infrastructures, with up to 19% performance uplift in vector database searches. However, emerging threats from quantum computing challenge the encryption standards currently protecting in-memory data, with cryptographically relevant quantum computers projected to emerge around 2035, prompting an urgent transition to post-quantum cryptography. These trends and predictions are heavily influenced by the post-2020 data explosion fueled by IoT and 5G deployments, which have generated over 175 zettabytes of global data by 2025, with more than 20% requiring edge processing that in-memory systems efficiently handle.
