
In-memory processing

In-memory processing is a technique that stores and processes data directly in a system's main memory, such as random-access memory (RAM), rather than relying on slower disk or secondary storage, thereby eliminating I/O bottlenecks and enabling ultra-low latency operations. This supports high-throughput and real-time data handling by leveraging the speed of memory access, which is orders of magnitude faster than disk retrieval, often achieving over 100 times improvement in response time and overall system performance.

The origins of in-memory processing trace back to the 1980s, when early research explored memory-resident databases to overcome the limitations of disk-based relational systems, though adoption was limited by the high cost and limited capacity of RAM at the time. A resurgence occurred in the 2000s and 2010s, driven by dramatic declines in memory prices (dropping approximately tenfold every five years) and exponential increases in capacity, making it feasible to hold terabytes of data in memory for enterprise applications. This evolution has shifted system designs to optimize for memory hierarchies, including non-uniform memory access (NUMA) architectures that became mainstream around 2008.

Key benefits of in-memory processing include enhanced support for interactive queries, fault-tolerant stream processing, and scalable parallelism, which are critical for modern workloads like real-time fraud detection and large-scale analytics. Pioneering systems exemplify its application: SAP HANA integrates in-memory columnar storage for enterprise analytics, while systems such as VoltDB employ in-memory transaction processing for high-velocity workloads, and Apache Spark utilizes resilient distributed datasets to enable in-memory batch and stream processing across clusters. Challenges persist, such as ensuring data durability through techniques like logging and replication, but ongoing advancements in hardware features such as hardware transactional memory (HTM) continue to bolster its efficiency.

Fundamentals

Definition and Core Principles

In-memory processing refers to a computational paradigm that stores and manipulates data entirely within random-access memory (RAM) or non-volatile alternatives such as persistent memory, thereby eliminating the need for slower disk operations. This approach leverages the high-speed access characteristics of primary memory to enable real-time analysis and processing, distinguishing it from traditional systems that rely on secondary storage for persistence and retrieval.

At its core, in-memory processing is guided by principles that address fundamental limitations in conventional architectures. Data locality is a key tenet, embedding computation logic directly within or near memory arrays to minimize data movement between processors and storage, thus alleviating the bottleneck caused by the separation of processing and memory units. Additionally, it exploits parallel access patterns inherent in modern memory systems to handle concurrent operations efficiently, while navigating trade-offs between the high speed of volatile memory and the persistence requirements often met by non-volatile options.

Central to this paradigm are concepts such as the random-access nature of RAM, which supports constant-time O(1) retrieval for data elements regardless of their position in memory, and the distinction between volatile memory like DRAM, prioritized for its low latency, and non-volatile technologies such as NVRAM, which provide durability without frequent disk reliance. In a typical workflow, datasets are loaded into main memory at the outset, allowing subsequent computations to occur without round trips to secondary storage, thereby streamlining operations from ingestion to output.
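The following minimal Python sketch illustrates the workflow just described under simple assumptions: a small CSV dataset (the file name, record layout, and lookup keys are illustrative inventions, not from any particular system) is read from secondary storage once, kept resident in a hash map in RAM, and all subsequent lookups are served at in-memory speed with average O(1) cost.

import csv
import os
import tempfile
import time

# Create a small illustrative dataset on "disk" (a temporary CSV file).
rows = [(str(i), f"customer-{i}", i % 100) for i in range(100_000)]
with tempfile.NamedTemporaryFile("w", delete=False, newline="", suffix=".csv") as f:
    csv.writer(f).writerows(rows)
    path = f.name

# One-time load into main memory: a hash map keyed by record id.
with open(path, newline="") as f:
    in_memory_table = {rec_id: (name, int(score)) for rec_id, name, score in csv.reader(f)}

# Subsequent processing touches only RAM: constant-time lookups,
# with no further round trips to secondary storage.
start = time.perf_counter()
hits = sum(1 for key in ("42", "9999", "123456") if key in in_memory_table)
elapsed = time.perf_counter() - start
print(f"{hits} hits resolved from memory in {elapsed * 1e6:.1f} microseconds")

os.remove(path)  # clean up the temporary file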

Comparison to Traditional Disk-Based Processing

Traditional disk-based processing relies on hard disk drives (HDDs) or solid-state drives (SSDs) for persistent storage, where operations such as reads and writes involve mechanical or electronic I/O mechanisms that introduce significant latency. For HDDs, seek times typically range from 3 to 10 milliseconds due to the physical movement of read/write heads, while SSDs offer lower latencies of around 0.1 milliseconds for reads but still lag behind memory access speeds. In contrast, in-memory processing accesses data directly in random-access memory (RAM), achieving latencies in the range of 50 to 100 nanoseconds, which is orders of magnitude faster than disk operations. Bandwidth differences are equally pronounced: RAM, such as DDR4, provides typical throughputs of 19 to 25 GB/s, compared with 100-200 MB/s for HDDs and 0.5-3 GB/s for SSDs in sequential operations. Additionally, RAM is volatile and requires explicit persistence mechanisms for durability, whereas disks inherently provide non-volatile storage, motivating hybrid approaches in many systems.

In disk-based systems, data flow involves paging and swapping mechanisms managed by the operating system to handle memory shortages, where inactive pages are moved to disk and retrieved on demand, often buffered through layers like the OS page cache to mitigate frequent I/O. These processes create bottlenecks from data movement across buses, such as PCIe for SSDs, amplifying latency and reducing overall throughput compared to the seamless, bus-minimal access in in-memory environments.
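A back-of-envelope Python calculation using the representative figures quoted above (illustrative assumptions, not measurements) makes the gap concrete: the time to sequentially scan a 100 GB dataset at typical HDD, SSD, and DDR4 bandwidths, alongside per-access latency.

DATASET_GB = 100

media = {
    #            bandwidth (GB/s), access latency (s)
    "HDD":      (0.15,             5e-3),    # ~150 MB/s, ~5 ms seek
    "SATA SSD": (0.5,              100e-6),  # ~0.5 GB/s, ~0.1 ms read
    "DDR4 RAM": (22.0,             80e-9),   # ~19-25 GB/s, ~50-100 ns
}

for name, (bw_gbs, latency_s) in media.items():
    scan_seconds = DATASET_GB / bw_gbs
    print(f"{name:9s} access latency {latency_s * 1e9:12.0f} ns, "
          f"full sequential scan ~{scan_seconds:8.1f} s")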

Hardware Aspects

Processing-In-Memory (PIM) Architectures

Processing-in-memory (PIM) architectures integrate computational logic directly into or near memory arrays to alleviate the data movement bottleneck inherent in traditional systems, where data must shuttle between separate memory and processing units. This approach embeds accelerators or processing units within the memory structure, enabling computations to occur at the data's location and thereby minimizing the latency and energy costs associated with off-chip transfers. Pioneering work in this area, such as the foundational concepts outlined in early PIM proposals, has evolved to address bandwidth-intensive workloads by leveraging the inherent parallelism of memory banks.

PIM architectures are broadly categorized into near-memory processing, in-DRAM processing, and 3D-stacked variants. Near-memory processing places logic adjacent to DRAM chips, often on a separate logic die or buffer chip, allowing computations close to the data without altering the memory array itself; this includes accelerators connected via high-bandwidth interfaces like those in hybrid memory cubes. In-DRAM processing, in contrast, embeds simple operations directly within the DRAM periphery, utilizing existing circuitry such as sense amplifiers to perform bitwise or arithmetic tasks on activated rows or columns. For instance, row-wise operations activate multiple rows simultaneously to enable additions or multiplications by exploiting charge sharing and voltage sensing, while column-wise methods leverage bitline parallelism for matrix-vector computations. 3D-stacked PIM extends this by integrating logic layers beneath DRAM dies, as seen in high-bandwidth memory (HBM) configurations, which facilitate denser integration and higher throughput for parallel tasks.

Prominent implementations include UPMEM's architecture, which augments standard DDR4 DIMMs with embedded DRAM Processing Units (DPUs), general-purpose cores integrated at the bank level for in-situ execution of scalar and vector operations, and Samsung's Aquabolt-XL, an HBM2-PIM module that stacks processing logic with DRAM dies to support floating-point SIMD instructions directly in the memory stack. More recent advancements include Samsung's 2024 PIM enhancements, which achieved a 30% performance improvement in AI inference tasks. These technologies enable row- and column-wise vector mathematics, such as bulk bitwise operations or multiply-accumulate steps, by activating subarrays for parallel data manipulation without full data evacuation to the host processor.

At the hardware level, PIM architectures yield substantial energy reductions for data-intensive tasks, achieving 10-100x lower energy per operation compared to CPU-based offloading by eliminating repeated data fetches across the memory bus. This efficiency stems from localized computation, which cuts down on the power-hungry transfers that dominate in conventional systems, particularly for bandwidth-bound applications like graph analytics or neural network inference. Such PIM designs play a supportive role in accelerating machine learning workloads by enabling efficient tensor operations near data storage.
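The short sketch below is a purely conceptual software model (assuming NumPy is available) of the row-wise bulk bitwise operations described above. Real in-DRAM designs perform these operations with charge sharing across simultaneously activated rows; this code only mimics the data layout (bit vectors standing in for DRAM rows) and the row-wise semantics, so it illustrates the idea rather than any hardware implementation.

import numpy as np

ROW_BITS = 8192                      # one illustrative DRAM row of 8 Kb
rng = np.random.default_rng(0)

row_a = rng.integers(0, 2, ROW_BITS, dtype=np.uint8)  # operand row A
row_b = rng.integers(0, 2, ROW_BITS, dtype=np.uint8)  # operand row B

# Row-wise bulk AND/OR: every bit position is processed in parallel across
# the whole row, rather than streaming bits out to a CPU register file.
bulk_and = row_a & row_b
bulk_or = row_a | row_b

print("population count of A AND B:", int(bulk_and.sum()))
print("population count of A OR  B:", int(bulk_or.sum()))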

Supporting Memory Technologies

Dynamic random-access memory (DRAM) variants form the backbone of in-memory processing systems, with DDR4 and DDR5 standards providing higher data rates and capacities essential for server-based workloads. DDR5, for instance, introduces features like on-die error correction and improved power efficiency over DDR4, supporting larger memory footprints in data-intensive applications. Low-power DDR (LPDDR) variants, such as LPDDR5, are optimized for energy-constrained environments like edge devices, offering data rates up to 6.4 GT/s while maintaining compatibility with mobile and embedded systems. High-bandwidth memory (HBM) technologies, including HBM2 and HBM3, enable ultra-high throughput for GPU-accelerated in-memory processing, with HBM3 achieving over 1.2 TB/s bandwidth through 3D-stacked dies and wider interfaces. These are critical for parallel data access in compute-intensive scenarios, reducing bottlenecks in memory-bound operations. HBM3E further extends this with data rates up to 9.6 Gb/s per pin, enhancing scalability for AI-driven workloads.

Non-volatile memory options like Intel Optane, based on 3D XPoint technology, provided persistence capabilities alongside DRAM-like latencies, allowing data to survive power loss without the overhead of traditional storage, although production was discontinued in 2022, with final shipments in late 2025. Optane modules, with capacities up to 512 GB per module, integrated with DDR4 interfaces to extend effective memory size for in-memory databases and analytics. Advancements in persistent memory (PMEM) standards, such as NVDIMM, enable byte-addressable access to non-volatile storage with latencies in the tens to hundreds of nanoseconds, closely mimicking DRAM behavior. NVDIMM-N configurations, for example, combine DRAM caching with flash backing in standard DIMM form factors, supporting up to 128 GB or higher of persistent capacity per module. Emerging non-volatile technologies further push boundaries: resistive RAM (ReRAM) offers sub-microsecond read/write times through filament-based switching; magnetic RAM (MRAM), particularly spin-transfer torque variants, achieves similar access speeds with magnetic state retention; and phase-change RAM (PCRAM) uses material phase transitions for endurance exceeding 10^9 cycles at low latencies. These technologies address volatility and scaling limitations while pushing densities beyond those of conventional DRAM.

Scalability in data center environments is bolstered by memory pooling, which disaggregates memory resources for on-demand allocation across multiple servers, improving utilization from a typical 50% to over 80% in pooled configurations. Error-correcting code (ECC) integration in memory modules detects and corrects single-bit errors in DRAM, with advanced schemes like chipkill extending to multi-bit detection, ensuring data integrity in terabyte-scale in-memory deployments where error rates can reach 1-10% annually without ECC protection. As of 2025, Compute Express Link (CXL) emerges as a key trend for coherent memory expansion, enabling low-latency interconnects over PCIe 5.0/6.0 fabrics to pool and share memory across nodes with cache coherency support at up to 64 GT/s. CXL 3.x specifications introduce enhanced security and management for disaggregated systems, facilitating memory tiering in hyperscale data centers.
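The utilization benefit of memory pooling can be shown with simple arithmetic. The Python sketch below uses invented server counts, module sizes, and per-server working sets (assumptions for illustration only, not measurements) to contrast dedicated per-server DRAM with a CXL-style shared pool sized for the aggregate demand plus headroom.

servers = 8
dram_per_server_gb = 512
working_sets_gb = [180, 260, 90, 400, 150, 310, 220, 120]  # assumed per-server demand

# Dedicated (non-pooled): each server must be provisioned for its own peak,
# so memory stranded on lightly loaded servers cannot help the others.
dedicated_total = servers * dram_per_server_gb
dedicated_util = sum(working_sets_gb) / dedicated_total

# Pooled: memory is allocated on demand from a shared pool with ~20% headroom.
pooled_total = int(sum(working_sets_gb) * 1.2)
pooled_util = sum(working_sets_gb) / pooled_total

print(f"dedicated: {dedicated_total} GB provisioned, {dedicated_util:.0%} utilized")
print(f"pooled:    {pooled_total} GB provisioned, {pooled_util:.0%} utilized")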

Software Aspects

In-Memory Data Management Systems

In-memory data management systems encompass software platforms that store, process, and query data primarily in main memory to enable high-speed operations without relying on disk I/O for core functions. These systems include single-node in-memory databases (IMDBs) such as SAP HANA, a column-oriented in-memory platform supporting relational and analytical workloads, and Redis, an open-source key-value store optimized for caching and real-time applications. Distributed variants, like Apache Ignite and Hazelcast, pool RAM across clustered nodes to form scalable data grids, allowing for horizontal scaling and shared memory resources in large-scale environments.

Key features of these systems include support for ACID-compliant transactions maintained entirely in memory, often using mechanisms like snapshot isolation to ensure consistency without blocking concurrent reads. For instance, SAP HANA implements snapshot isolation through multi-version concurrency control (MVCC), enabling transactions to view a consistent snapshot from the transaction's start while avoiding locks on reads. Real-time querying is facilitated via SQL interfaces in systems like SAP HANA and Apache Ignite, which support ANSI SQL standards including joins and aggregations, or NoSQL APIs in Redis for flexible access. Data ingestion pipelines are integrated to handle streaming inputs, such as SAP HANA's Smart Data Integration (SDI) for real-time loading via JDBC/ODBC connectors, or change data capture (CDC) pipelines for syncing external sources.

Implementation details focus on efficient memory utilization and reliability, with strategies like hash tables for key-value operations in Redis, where data is indexed via hashed keys for O(1) average-time lookups. Fault tolerance is achieved through in-memory replication across nodes, as in Apache Ignite's partitioned and replicated caching modes or Hazelcast's replication, which mirror data without disk persistence to maintain availability during node failures. These systems often leverage hardware like persistent memory (PMEM) for extended capacity, though primary operations remain RAM-centric.

In IMDBs optimized for analytics, such as SAP HANA, column-store architectures are preferred over row-stores, as they store data vertically to enable compression and faster aggregation scans on large datasets, while row-stores suit transactional inserts by grouping related fields contiguously. This distinction allows hybrid setups where column-stores handle read-heavy analytical queries and row-stores manage write-intensive operations, all within the in-memory footprint.
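To make the MVCC-based snapshot isolation idea concrete, the following deliberately simplified Python sketch keeps a list of (commit timestamp, value) versions per key and resolves reads against the snapshot timestamp captured when the reading transaction began. The class name, API, and timestamps are illustrative assumptions; this is a conceptual model, not any product's implementation.

from collections import defaultdict


class MVCCStore:
    def __init__(self):
        self._versions = defaultdict(list)  # key -> [(commit_ts, value), ...] in commit order
        self._clock = 0                     # logical commit timestamp

    def begin(self):
        # Start a read transaction: capture the current snapshot timestamp.
        return self._clock

    def commit_write(self, key, value):
        # Install a new committed version without blocking readers.
        self._clock += 1
        self._versions[key].append((self._clock, value))

    def read(self, snapshot_ts, key):
        # Return the newest version committed at or before the reader's snapshot.
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


store = MVCCStore()
store.commit_write("balance:42", 100)
snapshot = store.begin()                        # reader's snapshot sees balance = 100
store.commit_write("balance:42", 75)            # a concurrent writer commits a newer version
print(store.read(snapshot, "balance:42"))       # -> 100 (consistent with the snapshot)
print(store.read(store.begin(), "balance:42"))  # -> 75 (a fresh snapshot sees the latest commit)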

Optimized Data Structures and Algorithms

In-memory processing relies on data structures optimized for random-access memory (RAM) to minimize latency and maximize throughput. Hash maps, adapted for RAM environments, employ techniques such as open addressing and careful bucket sizing to reduce probe sequences and improve locality. The load factor α, defined as α = n/k where n is the number of entries and k is the number of buckets, directly influences collision rates; a higher α increases the probability of collisions, leading to longer search times, while values around 0.7 balance space efficiency and performance in RAM-constrained systems. B+-trees, traditionally disk-oriented, are modified with cache-oblivious layouts that recursively decompose the tree to achieve near-optimal I/O complexity without knowledge of cache parameters, enabling efficient range queries and updates in hierarchical memory models. Vector databases for embeddings, such as those using approximate nearest neighbor search, store high-dimensional vectors in flat arrays or hierarchical navigable small world graphs to facilitate fast similarity computations entirely in memory.

Algorithms for in-memory processing emphasize in-place operations to conserve space and leverage cache hierarchies. In-place sorting variants of quicksort partition arrays without auxiliary storage, relying on pivot selection strategies like median-of-three to minimize worst-case recursion depth and cache misses during swaps. Parallel processing integrates single instruction, multiple data (SIMD) instructions to vectorize operations on contiguous memory blocks, accelerating tasks like data filtering or aggregation by processing multiple elements simultaneously with a single instruction. Garbage collection mechanisms, particularly in Java virtual machines (JVMs), are tuned for large in-memory heaps using collectors like G1, which employ region-based allocation and concurrent marking to reduce pause times and adapt to the high allocation rates typical of in-memory workloads.

Key optimizations address concurrency and cache efficiency in shared-memory settings. Cache line alignment ensures that frequently accessed data elements are padded to span distinct cache lines (typically 64 bytes), preventing false sharing, where unrelated variables on the same line cause unnecessary cache invalidations across cores. Lock-free data structures, such as compare-and-swap-based queues or stacks, enable concurrent access without locks by using atomic operations to resolve races, ensuring progress for at least one thread under contention and improving scalability in multi-threaded in-memory environments.
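A compact Python sketch of the hash-map ideas above appears below: an open-addressing table with linear probing (contiguous slots favor cache locality) that resizes once the load factor α = n/k crosses roughly 0.7. It is simplified for illustration under stated assumptions (no deletions, no tombstones) and is not meant as a production structure.

class OpenAddressingMap:
    _EMPTY = object()

    def __init__(self, capacity=8):
        self._keys = [self._EMPTY] * capacity
        self._values = [None] * capacity
        self._size = 0

    def _probe(self, key):
        # Linear probing: walk consecutive slots, which tend to share cache lines.
        i = hash(key) % len(self._keys)
        while self._keys[i] is not self._EMPTY and self._keys[i] != key:
            i = (i + 1) % len(self._keys)
        return i

    def _resize(self):
        # Double the bucket count and reinsert entries to restore a low load factor.
        old = [(k, v) for k, v in zip(self._keys, self._values) if k is not self._EMPTY]
        self._keys = [self._EMPTY] * (2 * len(self._keys))
        self._values = [None] * len(self._keys)
        self._size = 0
        for k, v in old:
            self.put(k, v)

    def put(self, key, value):
        if (self._size + 1) / len(self._keys) > 0.7:   # keep alpha at or below ~0.7
            self._resize()
        i = self._probe(key)
        if self._keys[i] is self._EMPTY:
            self._size += 1
        self._keys[i], self._values[i] = key, value

    def get(self, key):
        i = self._probe(key)
        return self._values[i] if self._keys[i] is not self._EMPTY else None


m = OpenAddressingMap()
for n in range(20):
    m.put(f"key{n}", n * n)
print(m.get("key7"), m.get("missing"))   # -> 49 None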

Applications and Use Cases

Enterprise and Business Applications

In enterprise environments, in-memory processing facilitates real-time analytics in the financial sector, particularly for fraud detection, where systems analyze transaction streams at high speeds to identify anomalies before they result in losses. For instance, in-memory databases like Oracle TimesTen enable fraud detection applications that process hundreds of thousands of transactions (approximately 185,000 per second), as demonstrated by the U.S. Postal Service's implementation of a 1.7 terabyte in-memory database for fraud detection in electronic payments. Similarly, platforms such as SAS Fraud Management leverage in-memory processing to score 100% of transactions in real time, achieving low-latency responses critical for preventing unauthorized activities.

In banking, in-memory computing supports risk modeling by accelerating complex simulations, such as value-at-risk (VaR) calculations and stress testing, which require processing vast datasets instantaneously. Tools like GridGain's in-memory computing platform have been adopted by financial firms, including Misys (now Finastra), to deliver rapid data access for regulatory-compliant risk assessments, reducing computation times from hours to minutes. This capability stems from the parallel processing of data entirely in RAM, enabling scalable Monte Carlo simulations without disk I/O bottlenecks.

Supply chain optimization benefits from in-memory analytics by enabling dynamic forecasting and inventory adjustments based on live data feeds. SAP HANA's in-memory computing, integrated into S/4HANA systems, supports real-time supply chain planning, allowing enterprises to optimize inventory and respond to disruptions with agility, as seen in its application for predictive demand modeling across global networks. In-memory data grids further enhance this by providing sub-millisecond query responses for live operational data, bridging operational and analytical workflows.

A key business impact of in-memory processing is the convergence of online transaction processing (OLTP) and online analytical processing (OLAP), traditionally siloed activities, allowing seamless real-time transaction handling alongside analytics. SAP HANA exemplifies this by storing transactional data in memory for both immediate updates and instant aggregations, enabling unified views that support data-driven decisions without separate systems. This convergence facilitates 360-degree customer views through rapid data integration from multiple sources, providing comprehensive profiles for personalized services; for example, Hazelcast's in-memory platform aggregates streaming customer interactions to deliver real-time insights for targeted marketing.

In retail, Redis serves as an in-memory caching layer for inventory management, ensuring synchronized stock levels across channels with sub-millisecond latency to prevent overselling. Retailers using Redis Enterprise achieve real-time availability updates, synchronizing data from stores, warehouses, and online platforms to optimize fulfillment. For integration with business intelligence tools, in-memory systems connect directly to platforms like Tableau, which uses its Hyper in-memory engine to visualize live data from sources such as SAP HANA, enabling interactive dashboards for operational monitoring without performance degradation.

Emerging Uses in AI and Edge Computing

In-memory processing has become integral to artificial intelligence (AI) applications, particularly in handling large-scale tensor operations during model training and inference. Frameworks like PyTorch leverage high-bandwidth memory (HBM) on GPUs to keep tensors in memory, minimizing data movement and enabling efficient parallel computations for deep learning workloads. For instance, HBM's high throughput supports the rapid access required for gradient computations in transformer models, reducing latency in training large language models. This approach addresses the memory wall in AI training by performing operations directly on in-memory data structures, as demonstrated in memory-efficient techniques like FlashAttention, which tile attention computations to fit within GPU memory limits.

Neuromorphic computing further advances in-memory processing for low-power AI by emulating brain-like architectures with processing-in-memory (PIM) elements. These systems integrate synaptic weights and computations within memory arrays, such as memristor-based crossbar networks, to achieve energy efficiencies orders of magnitude better than traditional setups. In neuromorphic designs, PIM enables event-driven processing where computations occur only on relevant data spikes, which is ideal for resource-constrained inference tasks. This supports ultra-low power operation, with applications in always-on systems that mimic neural dynamics.

In edge computing, in-memory processing facilitates IoT analytics by embedding PIM directly into sensor nodes, allowing on-device fusion of streaming data without offloading to the cloud. For example, processing-in-sensor accelerators handle binary-weight neural networks for tasks like on-device classification in environmental sensors, achieving high throughput with minimal energy. In autonomous vehicles, PIM architectures enable rapid sensor fusion from LiDAR, cameras, and radar by performing computations in local memory, reducing decision latency for obstacle avoidance. Crossbar arrays based on emerging materials like MoS2 further enhance this by supporting parallel matrix operations for localization.

As of 2025, in-memory processing integrates with federated learning to enable privacy-preserving model training across distributed edge devices, where local computations minimize communication overhead during model aggregation. Techniques like federated learning frameworks significantly reduce memory footprints through in-situ updates, supporting large models on resource-limited devices. Edge AI chips, such as Intel's Loihi 2, incorporate in-memory synapses to simulate millions of neurons on-chip, facilitating continual learning for adaptive robotics and other edge workloads. These advancements allow Loihi 2 to process spiking neural networks with 120 million synapses in embedded form factors, enabling low-latency inference at the edge.

Bandwidth constraints in edge environments, exacerbated by intermittent connectivity and high data volumes from sensors, are mitigated by PIM's colocation of compute and data, which keeps computations within memory to avoid costly transfers. This colocation significantly reduces effective bandwidth demands for inference tasks, preserving performance in bandwidth-starved scenarios like remote deployments. By embedding logic in memory hierarchies such as LPDDR DRAM, PIM ensures scalable edge AI without relying on external networks.
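The NumPy sketch below illustrates only the tiling idea behind memory-efficient attention mentioned above: instead of materializing the full seq_len x seq_len score matrix, queries are processed in blocks so that only a block x seq_len slice exists in memory at a time. Real FlashAttention additionally tiles over keys with an online softmax and fuses kernels on the GPU; the shapes, block size, and function name here are illustrative assumptions.

import numpy as np

def tiled_attention(q, k, v, block=128):
    seq_len, dim = q.shape
    out = np.empty_like(q)
    scale = 1.0 / np.sqrt(dim)
    for start in range(0, seq_len, block):
        qb = q[start:start + block]                   # (block, dim) query tile
        scores = (qb @ k.T) * scale                   # only a (block, seq_len) slice in memory
        scores -= scores.max(axis=1, keepdims=True)   # numerically stable softmax
        weights = np.exp(scores)
        weights /= weights.sum(axis=1, keepdims=True)
        out[start:start + block] = weights @ v
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal((1024, 64), dtype=np.float32)
k = rng.standard_normal((1024, 64), dtype=np.float32)
v = rng.standard_normal((1024, 64), dtype=np.float32)
print(tiled_attention(q, k, v).shape)   # -> (1024, 64)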

Benefits and Challenges

Performance Advantages

In-memory processing fundamentally improves latency by minimizing the delays inherent in traditional disk-based systems. The total processing time can be modeled as T_{total} = T_{compute} + T_{io}, where T_{compute} represents computation time and T_{io} denotes input/output overhead. In disk-based approaches, T_{io} dominates due to mechanical seek times averaging 4-10 milliseconds per operation, whereas in-memory systems reduce T_{io} to near zero, as access to RAM occurs in approximately 100 nanoseconds. This latency reduction, often by orders of magnitude from milliseconds to nanoseconds, enables sub-second response times critical for real-time scenarios.

Throughput gains are equally pronounced, particularly for analytical workloads involving complex queries. In-memory systems achieve 10-100x higher query speeds compared to disk-based alternatives by eliminating persistent I/O bottlenecks, allowing sustained processing of large datasets without paging delays. For instance, input/output operations per second (IOPS) in in-memory systems can reach millions, far surpassing the tens of thousands to hundreds of thousands typical of solid-state drives or the hundreds to thousands of hard disk drives. These metrics highlight scalability in big data environments, where distributed in-memory clusters handle petabyte-scale volumes across nodes while maintaining high throughput.

Energy efficiency further amplifies these advantages, as in-memory operations avoid power-intensive disk I/O, consuming fewer joules per operation. Disk accesses require significant power for mechanical components and controllers, often amounting to several watts per device, while memory accesses leverage low-power circuits. This results in overall lower energy consumption per operation, supporting efficient scaling without proportional power increases. Optimized in-memory data structures, such as hash tables, enhance these gains by further streamlining access patterns.
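A worked example of the T_{total} = T_{compute} + T_{io} model above, computed in Python with representative per-access figures (illustrative assumptions, not measurements): one million random record accesses, each needing 2 microseconds of CPU work, served either from disk or from RAM.

ACCESSES = 1_000_000
T_COMPUTE_PER_ACCESS = 2e-6          # 2 microseconds of CPU work per record
T_IO_DISK = 5e-3                     # ~5 ms average HDD seek per random access
T_IO_MEMORY = 100e-9                 # ~100 ns DRAM access

t_compute = ACCESSES * T_COMPUTE_PER_ACCESS

for label, t_io_per_access in (("disk-based", T_IO_DISK), ("in-memory", T_IO_MEMORY)):
    t_io = ACCESSES * t_io_per_access
    t_total = t_compute + t_io
    print(f"{label:10s} T_total = {t_total:10.2f} s (I/O share {t_io / t_total:.1%})")

Under these assumptions the disk-based run is dominated almost entirely by T_{io} (thousands of seconds), while the in-memory run is bounded by T_{compute}, which is the qualitative shift the model captures.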

Limitations and Mitigation Strategies

In-memory processing, while offering significant performance benefits, faces several inherent limitations that constrain its scalability and adoption. One primary drawback is the high cost of dynamic random-access memory (DRAM), which remains substantially more expensive than traditional disk storage. As of 2025, DRAM prices have surged by approximately 172% year-over-year due to surging demand from AI applications, with retail costs for DDR5 modules reaching around $3–5 per GB in consumer markets and higher for enterprise-grade server configurations, compared to hard disk drives (HDDs) at roughly $0.015–0.02 per GB. This cost disparity limits the feasibility of storing large datasets entirely in memory, particularly for applications requiring petabyte-scale data. Another critical limitation is data volatility, as DRAM loses all stored information upon power failure or system crash, necessitating frequent backups and recovery mechanisms that can introduce overhead and risk data loss in real-time scenarios. Additionally, capacity constraints persist, with current server nodes typically supporting only 1–8 terabytes of RAM per machine, insufficient for the exabyte-scale datasets common in big data environments. Large-scale in-memory systems also grapple with elevated power consumption and architectural complexities. Memory modules in database systems can account for a significant portion of total power draw in AI workloads, driven by the energy-intensive nature of DRAM refresh cycles and data movement, which exacerbates operational costs in data centers where electricity usage for such setups has risen to 4–5% of national totals in major economies. Hybrid disk-memory architectures, which attempt to blend in-memory speed with disk persistence, introduce further challenges, including increased software complexity for data tiering, synchronization overhead, and potential bottlenecks in I/O management that degrade overall efficiency. To mitigate these limitations, several strategies have been developed to balance performance with practicality. Tiered storage approaches address capacity and cost issues by keeping frequently accessed "hot" data in memory while offloading less-used "cold" data to cheaper disk or flash storage, enabling systems to handle larger workloads without fully provisioning expensive RAM. Data compression techniques, such as dictionary encoding commonly used in columnar in-memory databases, can reduce memory footprints by 5–10 times for structured data, allowing more information to fit within available capacity while minimizing power usage through reduced data transfers. Cloud bursting provides elastic scaling by dynamically provisioning additional cloud resources during peak loads, seamlessly extending on-premises in-memory clusters to handle surges without permanent over-provisioning. In 2025, emerging non-volatile memory (NVM) technologies like magnetoresistive RAM (MRAM) and resistive RAM (ReRAM) offer promising alternatives for persistence, combining DRAM-like speeds with non-volatility at lower costs than legacy options, thus reducing backup needs and enabling hybrid persistent in-memory designs.
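The tiered-storage mitigation described above can be sketched in a few lines of Python: a bounded "hot" in-memory tier with least-recently-used (LRU) eviction spills cold entries to a cheaper backing store, here represented by an ordinary dict standing in for disk or flash. Class names, tier sizes, and the eviction policy are illustrative assumptions, not a specific product's design.

from collections import OrderedDict


class TieredStore:
    def __init__(self, hot_capacity=3):
        self.hot = OrderedDict()   # small, fast tier (models RAM)
        self.cold = {}             # large, cheap tier (models disk or flash)
        self.hot_capacity = hot_capacity

    def put(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        if len(self.hot) > self.hot_capacity:
            cold_key, cold_value = self.hot.popitem(last=False)  # evict the LRU entry
            self.cold[cold_key] = cold_value

    def get(self, key):
        if key in self.hot:                 # hot hit: served from memory
            self.hot.move_to_end(key)
            return self.hot[key]
        if key in self.cold:                # cold hit: promote back to the hot tier
            value = self.cold.pop(key)
            self.put(key, value)
            return value
        return None


store = TieredStore(hot_capacity=3)
for sku in ("A", "B", "C", "D"):
    store.put(sku, f"record-{sku}")
print(list(store.hot), list(store.cold))   # "A" was evicted to the cold tier
print(store.get("A"))                      # a cold read promotes "A" back to the hot tier
print(list(store.hot), list(store.cold))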

History and Future Outlook

Evolution of the Technology

The evolution of in-memory processing originated in the 1950s with the adoption of magnetic-core memory in mainframe computers, which provided non-volatile, random-access storage directly accessible by the CPU, eliminating the slower rotational access of earlier technologies like magnetic drums. This shift enabled early forms of processing where data resided entirely in primary memory during computations, as seen in systems like the IBM System/360 series introduced in 1964. Pioneering implementations of processing-in-memory (PIM) concepts emerged in the 1960s, when simple logic functions were embedded within core memory arrays to perform calculations on-board, reducing interconnect complexity, weight, and power draw for space applications.

By the 1980s, the rise of relational database management systems (RDBMS) incorporated in-memory caching to mitigate disk I/O bottlenecks, allowing frequently accessed pages to be buffered in main memory for rapid retrieval and updates. Oracle's early database versions, starting from its 1979 release and evolving through the 1980s, featured buffer caches that dynamically managed data blocks in memory, significantly enhancing access speeds on hardware with increasing RAM capacities. Similar mechanisms appeared in other RDBMS such as IBM DB2, launched in 1983, where buffer pools optimized hit rates for query operations, laying the groundwork for hybrid storage models that prioritized in-memory performance.

The 2000s marked a pivotal expansion with the emergence of NoSQL databases tailored for distributed, high-velocity workloads, culminating in Redis's 2009 release as an open-source, in-memory key-value store that supported advanced data structures for caching and pub/sub messaging. This period transitioned into the 2010s with the commercial launch of SAP HANA in 2010, an in-memory, column-oriented platform that unified OLTP and OLAP processing by storing and querying data entirely in RAM, enabling real-time analytics on terabyte-scale datasets. The post-2010 surge in big data, driven by exponential growth in data volumes from web-scale sources and sensors, prompted a broader shift toward in-memory paradigms to achieve sub-second latencies unattainable with disk-centric systems. Apache Spark's stable 2014 release exemplified this integration, leveraging resilient distributed datasets (RDDs) for in-memory computation atop Hadoop, accelerating iterative algorithms by up to 100 times compared to disk-based MapReduce. Concurrently, PIM research advanced through U.S. government initiatives, including DARPA's exploration of memory-centric architectures to overcome the memory bottleneck. Seminal academic work, such as proposals for low-overhead PIM instructions at conferences like ISCA, demonstrated practical embeddings of computation in memory arrays, achieving 2-10x energy savings for data-intensive tasks without altering CPU designs.

As of 2025, the in-memory computing market is experiencing robust growth, valued at approximately USD 21 billion in 2024 and projected to reach USD 97 billion by 2034, reflecting a compound annual growth rate (CAGR) of around 16%. This expansion is driven by increasing demands for real-time processing in dynamic data environments. Key trends include deeper integration with hybrid cloud architectures, enabling seamless data mobility between on-premises and cloud resources to optimize performance and cost efficiency. Additionally, there is a notable rise in processing-in-memory (PIM) technologies at the edge, particularly for AI applications, where PIM chips reduce latency and energy consumption by embedding computation directly within memory arrays, as demonstrated by advancements from Samsung achieving up to 30% gains in inference tasks.
Adoption of in-memory processing has become widespread among enterprises, with over 70% of new applications leveraging open-source platforms such as Redis for caching and real-time data handling, and Apache Arrow for efficient in-memory columnar data interchange. For instance, 70% of Fortune 500 companies utilize Microsoft Fabric, a unified platform incorporating in-memory processing for high-speed data operations. This open-source dominance underscores the shift toward scalable, cost-effective solutions that support diverse workloads without vendor lock-in.

Looking ahead, in-memory processing is predicted to dominate AI workloads by 2030, as AI-driven data center capacity is expected to account for 70% of global demand, necessitating ultra-fast memory access for training and inference. Standardization of Compute Express Link (CXL) is anticipated to enable disaggregated memory pools, allowing dynamic allocation across servers and improving utilization in cloud and AI infrastructures, with up to 19% performance uplift in vector database searches. However, emerging threats from quantum computing challenge the encryption standards currently protecting in-memory data, with cryptographically relevant quantum computers projected to emerge around 2035, prompting an urgent transition to post-quantum cryptography. These trends and predictions are heavily influenced by the post-2020 data explosion fueled by IoT and 5G deployments, which have generated over 175 zettabytes of global data by 2025, with more than 20% requiring edge processing that in-memory systems efficiently handle.
