
eDRAM

Embedded Dynamic Random-Access Memory (eDRAM) is a semiconductor memory technology that integrates dynamic RAM cells directly onto the same integrated circuit die as logic or processor elements, enabling high-density on-chip caching with lower data-movement latency and energy than external DRAM modules. Unlike traditional static RAM (SRAM), which uses multiple transistors per cell and retains data without refresh, eDRAM employs a single-transistor DRAM cell that requires periodic refreshing to retain data, but achieves significantly higher density (for example, 0.029 µm² per bit versus 0.108 µm² for SRAM in 22 nm Intel implementations) at the cost of more complex integration. This embedding permits wider internal buses and faster access, making eDRAM particularly suitable for performance-critical applications such as last-level caches in high-end processors.

Developed primarily through advancements in the 1990s and early 2000s, eDRAM gained prominence via IBM's innovations, with the company integrating it into its logic-based processes starting from the third generation at the 130 nm node and offering specialized macros for general-purpose, area-optimized, and high-speed uses. IBM's POWER7 (45 nm, 2010) marked a notable milestone with 32 MB of on-chip eDRAM L3 cache per chip, scaling to 96 MB in the POWER8 (22 nm, 2013), delivering up to 3 TB/s of L3 bandwidth across 12 cores at 4 GHz while occupying only a third of the area of an equivalent SRAM cache. Intel adopted eDRAM for on-package caching in its Haswell generation (22 nm, 2013), providing 128 MB of shared L4 cache with 102.4 GB/s throughput at about 1 W, though it later shifted toward SRAM-dominant designs in newer architectures. Beyond servers, eDRAM has seen use in gaming consoles such as the Xbox 360 and Nintendo Wii U, where NEC and Renesas implemented it for efficient video memory buffering.

Key advantages of eDRAM include its superior density and cost-effectiveness for large caches, enabling terabyte-per-second-class bandwidth in compact dies, alongside lower dynamic energy per access due to proximity to logic, though standby power can be higher without optimized retention modes. Challenges involve reconciling DRAM's refresh overhead with logic processes, necessitating custom architectures such as destructive-read sensing for 3.3 ns cycle times, and managing variability in retention time across process nodes. Despite these hurdles, eDRAM remains a viable option for systems-on-chip (SoCs) demanding balanced performance, power, and area, as seen in IBM's z15 mainframe (14 nm, 2019) with 960 MB of eDRAM L4 cache per drawer, serving applications where cache hierarchy efficiency directly impacts overall throughput.

Overview

Definition and principles

Embedded dynamic random-access memory (eDRAM) is a form of dynamic random-access memory (DRAM) integrated directly onto the same die, or within the same package, as the associated logic circuitry, such as a processor or an application-specific integrated circuit (ASIC). This on-chip integration distinguishes eDRAM from discrete DRAM chips, which function as standalone components interfaced externally via buses or packages. By embedding the memory alongside processing elements, eDRAM facilitates tighter coupling between computation and storage, optimizing for applications that require high-bandwidth, low-latency access.

At its core, eDRAM operates on the same principles as conventional DRAM, relying on capacitor-based storage cells to hold data as electrical charge. Each bit is represented in a memory cell where a charged capacitor signifies a logic '1' and a discharged one a logic '0'; however, inherent charge leakage through the capacitor's dielectric and surrounding materials causes the stored value to degrade over time, typically within microseconds to milliseconds. To counteract this, eDRAM requires periodic refresh cycles, during which the data is read, amplified, and rewritten to the cell, embodying the "dynamic" nature of the technology. This refresh mechanism preserves data integrity but introduces power and performance overhead.

The fundamental building block of eDRAM is the 1T1C (one-transistor, one-capacitor) cell, comprising a single access transistor connected to a storage capacitor. During a read operation, the transistor connects the capacitor to a bitline, producing a small voltage signal that is detected and amplified by a sense amplifier to determine the stored value; the sense amplifier then restores the full charge level back to the capacitor. Write operations similarly charge or discharge the capacitor through the access transistor under control of voltages applied to the bitline and wordline. This structure enables high-density storage, with typical cell areas ranging from 20 F² to 30 F² in logic processes, where F denotes the minimum feature size, balancing density with manufacturability. A key advantage of eDRAM's integration is the substantial reduction in latency compared to off-chip DRAM, as data transfer avoids the delays and bottlenecks of external interfaces, potentially achieving access latencies in the nanosecond range for on-die accesses.
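To make the F²-normalized cell area concrete, the sketch below converts cell area into an approximate macro bit density at a given feature size. The 25 F² eDRAM figure falls in the 20-30 F² range quoted above, while the 140 F² SRAM cell and the 50% array-efficiency factor are rough assumptions included only for comparison.

```python
# Rough density comparison of 1T1C eDRAM vs. 6T SRAM cells.
# Assumptions (illustrative only): eDRAM cell ~25 F^2, 6T SRAM cell ~140 F^2,
# and ~50% array efficiency to account for sense amps, decoders, and wiring.

def bits_per_mm2(cell_area_f2: float, feature_nm: float, array_eff: float = 0.5) -> float:
    """Approximate usable bit density for a memory macro."""
    f_um = feature_nm * 1e-3                   # feature size in micrometres
    cell_um2 = cell_area_f2 * f_um ** 2        # cell area in um^2
    cells_per_mm2 = 1e6 / cell_um2             # raw cells per mm^2
    return cells_per_mm2 * array_eff

F_NM = 22  # example node
edram = bits_per_mm2(25, F_NM)
sram = bits_per_mm2(140, F_NM)
print(f"eDRAM ~{edram / 8 / 2**20:.1f} MB/mm^2, SRAM ~{sram / 8 / 2**20:.1f} MB/mm^2, "
      f"ratio ~{edram / sram:.1f}x")
```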

Basic operation

The basic operation of eDRAM relies on the 1T1C cell structure, in which data is stored as charge on a capacitor accessed via a single transistor, requiring periodic refresh to maintain integrity against charge leakage. Operations are managed through cycles involving word lines for row selection and bit lines for data transfer, supported by sense amplifiers that detect and amplify small voltage signals. These processes provide reliable access in embedded environments, with typical on-chip access latencies ranging from 1 to 5 ns.

In the read operation, the bit line is first precharged to an intermediate reference voltage, often VDD/2, to enable differential sensing. The corresponding word line is then activated, connecting the storage capacitor to the bit line and causing charge sharing that produces a small voltage differential proportional to the stored charge. A sense amplifier latches onto this differential, amplifying it to full rail levels (0 V for logic '0' or VDD for logic '1') to output the data. Since this process is destructive, partially discharging the capacitor regardless of the stored value, the sense amplifier immediately restores the original charge by driving the bit line back to the appropriate voltage while the word line remains active, completing the restoration step.

The write operation similarly begins with precharging and equalization of the bit line and associated circuitry to minimize noise. The word line is activated to open the access transistor, allowing the bit line, driven to the desired voltage level (0 V for '0' or VDD for '1'), to overwrite the capacitor's charge directly through charge transfer. Once the capacitor reaches the target voltage, the word line is deactivated to isolate the cell, ensuring the new value is stored stably. This overwrites any prior content without requiring a prior read, though equalization steps help balance any residual differentials across paired bit lines in differential architectures.

Refresh operations are essential to counteract subthreshold leakage through the access transistor and dielectric leakage in the capacitor, which gradually erode stored charge. Typically every 10-100 µs, depending on process and temperature, all cells in a row or entire bank undergo a non-data-accessing read cycle: the word line activates to share charge onto the precharged bit line, the sense amplifier detects and amplifies the signal, and the data is immediately rewritten to restore full charge levels. This distributed refresh, often row-by-row across multiple banks, prevents data loss, with the overhead consuming a significant share of bandwidth and power in eDRAM designs (often 5-15% or more), though optimized high-performance implementations can reduce it below 5%.

eDRAM arrays are structured into independent banks, each containing a grid of cells addressed via row decoders (selecting word lines) and column decoders (connecting bit lines to sense amplifiers and I/O paths), enabling parallel access to multiple banks for improved throughput. In dense arrays, potential half-select issues, where non-targeted cells in an activated row experience unintended charge disturbance from shared word line voltage or bit line coupling, are mitigated through techniques such as hierarchical bit line segmentation and local sense amplifiers, which limit signal propagation and reduce capacitive loading on global lines.
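The read charge-sharing step can be illustrated numerically using the standard relation ΔV = (V_cell − V_pre) · C_cell / (C_cell + C_BL). In the sketch below, the cell and bit-line capacitances, supply voltage, and sense-amplifier offset are assumed illustrative values rather than figures from any particular eDRAM macro; the decayed-cell case shows why refresh must occur before the differential falls under the sense margin.

```python
# Illustrative read charge-sharing calculation for a 1T1C eDRAM cell.
# Assumed values: 20 fF storage capacitor, 80 fF bit-line capacitance,
# 1.0 V supply, bit line precharged to VDD/2, 20 mV sense-amp offset.

C_CELL = 20e-15      # storage capacitor (F)
C_BL   = 80e-15      # bit-line capacitance (F)
VDD    = 1.0         # supply voltage (V)
V_PRE  = VDD / 2     # precharge level (V)
SA_OFFSET = 0.020    # worst-case sense-amplifier input offset (V)

def bitline_swing(v_cell: float) -> float:
    """Bit-line voltage change after word-line activation (charge sharing)."""
    return (v_cell - V_PRE) * C_CELL / (C_CELL + C_BL)

for label, v_cell in [("logic '1' (full charge)", VDD),
                      ("logic '1' (decayed to 0.7 V)", 0.7),
                      ("logic '0'", 0.0)]:
    dv = bitline_swing(v_cell)
    ok = abs(dv) > SA_OFFSET
    print(f"{label:28s}: dV = {dv * 1e3:+6.1f} mV  "
          f"({'resolvable' if ok else 'below sense margin'})")
```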

History

Early development

The development of embedded dynamic random-access memory (eDRAM) emerged as an extension of standalone DRAM technology, which was first commercialized in the late 1960s and early 1970s. The Intel 1103, introduced in 1970, marked a pivotal advancement in discrete DRAM chips, providing denser memory for minicomputers and calculators by replacing earlier magnetic-core memory systems. By the early 1980s, researchers recognized the limitations of off-chip memory access in integrated circuits, where the growing disparity between processor speeds and memory latency (later termed the "memory wall" in 1995) created significant performance bottlenecks. This challenge drove initial efforts to integrate DRAM cells directly into logic processes, aiming to enable on-chip caching and reduce inter-chip communication delays.

In the 1980s, IBM led pioneering research on embedding DRAM into complementary metal-oxide-semiconductor (CMOS) logic fabrication, focusing on compatible process technologies. A key innovation was the integration of trench capacitors, which allowed storage nodes to be built alongside logic transistors without requiring separate high-temperature steps that could degrade device performance. Early attempts nonetheless faced substantial challenges in process compatibility, particularly the differing thermal budgets: logic transistors demanded lower-temperature annealing to preserve dopant profiles, while capacitors needed higher temperatures for reliable dielectric formation. IBM's work in this era laid the groundwork for monolithic integration, demonstrating small-scale embedded memory arrays in experimental chips by the mid-1980s.

The 1990s saw the maturation of these efforts through prototypes that showcased feasible eDRAM implementations. In the early 1990s, prototypes incorporating embedded DRAM arrays were developed in a 0.5 μm process, validating the approach for ASIC applications. By the mid-1990s, advancements enabled the first demonstrations of 1 Mb eDRAM macros in 0.25 μm processes, achieving densities suitable for on-chip caches while maintaining compatibility with standard logic flows. These prototypes highlighted eDRAM's potential to address the memory wall by providing faster, lower-latency access compared to external DRAM, though yield and area efficiency remained hurdles for broader adoption.

Commercial adoption

The commercial adoption of eDRAM began in the early 2000s within consumer electronics, particularly gaming consoles, where it provided high-bandwidth on-chip memory for graphics rendering. Renesas and NEC supplied eDRAM for the Wii's GPU, integrating several megabits to support real-time rendering, while ATI's Xenos GPU in Microsoft's Xbox 360 (2005) featured 10 MB of eDRAM as a frame buffer to enhance performance in high-definition gaming. These early implementations demonstrated eDRAM's viability for embedded applications requiring dense, fast memory integrated with logic processes.

IBM pioneered widespread adoption in processors starting with the POWER7 in 2010, fabricated at 45 nm SOI, which incorporated 32 MB of eDRAM as shared L3 cache per eight-core chip to deliver balanced multi-threaded performance in servers. This was followed by the POWER8 in 2014 at 22 nm, expanding to 96 MB of eDRAM L3 cache on-chip plus 128 MB of off-chip L4 eDRAM, enabling terabyte-scale bandwidth for memory-intensive workloads. IBM extended eDRAM use to mainframe processors, with the z13 (2015) featuring 64 MB of L3 eDRAM shared by all cores per chip and the z14 (2017) scaling to 128 MB of shared L3 eDRAM per processor unit SCM, supporting mission-critical workloads.

Intel integrated eDRAM into its processors for consumer and workstation markets in the mid-2010s. The Haswell architecture (2013, 22 nm) included 128 MB of eDRAM on a separate die for Iris Pro graphics in select models, acting as a victim cache to boost integrated GPU performance by up to 2x in bandwidth-limited scenarios. Broadwell (2015, 14 nm) refined this in the i7-5775C, retaining 128 MB of eDRAM L4 connected via a high-speed OPIO interface, which improved hit rates and iGPU frame rates in gaming by 20-30% over non-eDRAM variants.

Adoption peaked in the early-to-mid 2010s, driven by eDRAM's density advantages over SRAM at nodes above 28 nm, but declined through the decade as FinFET scaling improved SRAM cell sizes and reduced the integration complexity gap. Intel discontinued eDRAM after Broadwell due to yield challenges at 14 nm, higher costs, and sufficient DDR4 bandwidth mitigating the need for on-package cache. IBM shifted Power processors to SRAM-only L3 in POWER10 (2020, 7 nm) for simpler fabrication, though mainframes retained eDRAM longer: the z15 (2019, 14 nm) used up to 960 MB of shared L4 eDRAM per drawer and 256 MB of L3 eDRAM per processor unit SCM, aggregating to several GB in multi-chip configurations for AI-accelerated workloads. Following the z15, eDRAM use in mainframes declined, with the z16 (2022) discontinuing eDRAM in the Telum processor in favor of SRAM-based caches. As of 2025, eDRAM adoption remains limited to specialized embedded designs, with some foundries offering IP blocks at advanced nodes, though a broader resurgence in accelerators and HPC has not materialized amid shifts to alternative memory technologies.

Technology

Cell architecture

The primary building block of eDRAM is the 1T1C cell, comprising a single access transistor connected to a storage capacitor that holds the charge representing a data bit. This transistor, typically an n-type MOSFET, controls access to the capacitor via the word line, while the bit line facilitates read and write operations. The capacitor design is critical for density and retention, with two main variants: deep trench capacitors (DTCs) etched vertically into the substrate and stacked capacitors (STCs) built above the transistor layer. DTCs are favored in eDRAM for their superior integration with logic processes, as they are fabricated early in the flow, prior to transistor formation, allowing placement below the active device layer without disrupting subsequent high-k metal-gate or back-end-of-line steps. In contrast, STCs offer flexibility in advanced nodes but require careful management of thermal budgets and aspect ratios to avoid impacting logic transistor characteristics.

eDRAM arrays are organized in a folded bit-line arrangement to enhance noise immunity by reducing differential noise between complementary bit lines, which are physically adjacent and precharged to a mid-rail voltage. This architecture pairs bit lines such that a word line connects a cell to one line while the other serves as a reference, minimizing coupling noise compared to open bit-line layouts. Sense amplifiers, essential for detecting the small voltage differentials developed by a cell (typically 100-200 mV), are shared across multiple arrays, often staggered at the edges of subarrays, to optimize area, with select signals enabling connection to the appropriate array during operations. Word-line drivers, boosted to higher voltages for full access-transistor turn-on, incorporate isolation devices between subarrays to mitigate disturb effects, such as charge leakage in adjacent cells from repeated activations or voltage coupling. These elements collectively support dense layouts while maintaining compatibility with FinFET or SOI logic transistors.

To support scaling below 10 nm, eDRAM capacitors rely on high-k dielectrics like ZrO₂ (with k ≈ 40) in multilayer stacks, often combined with Al₂O₃ barriers to reduce leakage while preserving capacitance density; for instance, ZrO₂-based metal-insulator-metal structures in STCs or high-k metal-filled storage nodes in DTCs provide improved capacitance over traditional SiO₂/Si₃N₄ equivalents. In research for advanced nodes such as 7 nm, cell capacitance targets remain around 20-30 fF to ensure adequate signal margins against leakage and noise, balancing retention with area constraints. Large-scale arrays, with macros typically up to 16 Mb or composed into larger caches, integrate error-correcting codes (ECC) at the subarray or macro level to enhance reliability by detecting and correcting single- or multi-bit errors induced by process variations or alpha particles. Recent research (as of 2024) highlights scaling challenges for traditional 1T1C eDRAM below 10 nm, prompting exploration of capacitorless gain-cell eDRAM (GC-eDRAM) variants, which use parasitic capacitances for storage and offer better compatibility with advanced logic processes such as FinFET at 7 nm and below, as demonstrated in heterogeneous integrations such as Si-MoS₂ eDRAM for ultra-low-power applications.
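As a rough feel for why deep trenches and high-k dielectrics are needed to reach the 20-30 fF target, the sketch below estimates trench capacitance from the parallel-plate relation C ≈ k·ε₀·A/t applied to the trench sidewall area. The trench depth, diameter, and dielectric thickness are invented round numbers, so the outputs are only order-of-magnitude illustrations.

```python
# Illustrative deep-trench capacitor estimate: C ~ k * eps0 * A / t_dielectric,
# approximating the trench as a cylinder and using its sidewall + bottom area.
import math

EPS0 = 8.854e-12          # vacuum permittivity (F/m)

def trench_capacitance(depth_um: float, diameter_nm: float,
                       k: float, t_diel_nm: float) -> float:
    """Approximate capacitance (F) of one deep-trench storage capacitor."""
    r = diameter_nm * 1e-9 / 2
    depth = depth_um * 1e-6
    area = 2 * math.pi * r * depth + math.pi * r ** 2   # sidewall + bottom
    return k * EPS0 * area / (t_diel_nm * 1e-9)

# Assumed illustrative geometry: 3 um deep, 100 nm wide trench, 5 nm dielectric.
for name, k in [("SiO2/Si3N4 (k~5)", 5), ("ZrO2-based high-k (k~40)", 40)]:
    c = trench_capacitance(depth_um=3.0, diameter_nm=100, k=k, t_diel_nm=5.0)
    print(f"{name:26s}: ~{c * 1e15:.1f} fF per cell")
```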

Fabrication and integration

The fabrication of eDRAM begins with a standard logic baseline process, augmented by specialized steps to form deep trench capacitors on the same die as the logic circuitry. Trench formation typically involves reactive-ion etching (RIE) to create high-aspect-ratio structures in the substrate, enabling dense capacitor arrays. Following etching, high-k dielectrics such as HfO₂ are deposited conformally using atomic layer deposition (ALD) to line the trenches, ensuring uniform insulation and capacitance. These steps require 5 to 10 additional mask layers beyond the core logic flow, primarily for trench definition, dielectric and electrode deposition, and node contact formation, and they integrate after the front-end-of-line (FEOL) transistor fabrication but before back-end-of-line (BEOL) metallization.

A primary integration challenge arises from mismatched thermal requirements between eDRAM and logic components. Capacitor formation in eDRAM often necessitates high-temperature annealing around 800 °C to crystallize dielectrics and optimize electrical properties, which exceeds the thermal budget of advanced logic processes, limited to below 600 °C to preserve strain engineering in channel materials like SiGe for mobility enhancement. To address this, processes sequence eDRAM steps early in the flow, prior to the most temperature-sensitive transistor features, or employ low-temperature alternatives such as plasma-enhanced ALD for dielectrics. In advanced nodes, hybrid bonding facilitates multi-die eDRAM configurations, allowing separate fabrication of memory and logic dies before direct Cu-to-Cu and oxide-to-oxide bonding, mitigating monolithic thermal conflicts while enabling heterogeneous integration.

Deep trench defects, such as sidewall roughness or incomplete fills, pose yield risks because defect densities in capacitor arrays are higher than in logic regions, potentially reducing overall chip yields by 10-20% without mitigation. Redundancy schemes, including spare rows and columns in the eDRAM macro, repair these faults after testing, improving effective yields to over 90% in production. Scaling eDRAM to sub-14 nm nodes, including integration with FinFET or GAAFET logic, remains viable through refined ALD processes, though it introduces approximately 15-25% area overhead relative to pure logic dies due to the added capacitor structures. Foundries provide eDRAM as modular intellectual property (IP) within process design kits (PDKs), enabling designers to integrate macros via standardized libraries in EDA design flows.
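The benefit of spare rows and columns can be sketched with a simple Poisson yield model in which a macro is usable as long as its random defect count does not exceed the number of repairable faults. The defect rate and spare counts below are assumed purely for illustration, not values from any foundry.

```python
# Illustrative Poisson yield model showing how row/column redundancy lifts
# effective eDRAM macro yield. All parameters are assumed for illustration.
from math import exp, factorial

def macro_yield(defects_per_macro: float, repairable: int) -> float:
    """P(macro usable) = P(defect count <= number of repairable faults)."""
    lam = defects_per_macro
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(repairable + 1))

LAMBDA = 1.5            # expected random defects per eDRAM macro (assumed)
for spares in (0, 2, 4, 8):
    y = macro_yield(LAMBDA, spares)
    print(f"{spares:2d} repairable faults -> macro yield ~{y * 100:5.1f}%")
```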

Characteristics

Performance metrics

eDRAM exhibits significantly higher density than SRAM at equivalent process nodes, typically 4-6 times greater, due to its simpler 1T1C cell structure compared to the 6T configuration of SRAM. For instance, IBM's eDRAM implementation achieves a cell size of 0.026 μm² in 22 nm SOI technology, contrasting with 0.144 μm² for a performance-density-balanced SRAM cell at the same node, enabling the integration of large on-chip caches exceeding 100 MB, such as the hundreds of megabytes of eDRAM cache in IBM's mainframe processors. This density scaling has supported cache sizes up to 128 MB in Intel's Broadwell processors, where eDRAM serves as an L4 cache to handle high-capacity working sets without excessive area overhead.

Read latency in eDRAM ranges from 1-40 ns, reflecting its position as a high-speed memory suitable for last-level caches. A 14 nm eDRAM macro demonstrates a 1 ns access time, while Intel's Broadwell eDRAM L4 reports a load-to-use latency of approximately 36.6 ns at 3.8 GHz. Bandwidth capabilities scale with integration, reaching up to 102.4 GB/s per watt via optimized interfaces like Intel's OPIO, and aggregate figures as high as 3 TB/s of L3 bandwidth across 12 cores in IBM's 22 nm POWER8 processor. Row activation time approximates 10-15 ns, governed by the time required to charge the sense amplifiers from the cells, akin to conventional DRAM operation but optimized for on-chip proximity.

eDRAM density has followed scaling trends similar to standalone DRAM, roughly doubling every 2-3 years with process shrinks, as seen in the progression from 0.026 μm² cells at 22 nm to 0.0174 μm² at 14 nm, facilitating ever-larger embedded caches. Access times, however, are approaching a 1-2 ns practical floor, constrained by the periodic refresh cycles needed to counteract charge leakage in the storage capacitor, which becomes more pronounced at advanced nodes and limits further reductions despite transistor scaling. In cache hierarchies, eDRAM achieves hit rates exceeding 80% for L3 and L4 levels, with Broadwell demonstrating an estimated 84% hit rate in SPEC CPU2017 workloads like omnetpp, contributing to overall system efficiency. Benchmarks from Intel's Broadwell platform show performance improvements in cache-sensitive applications due to the expanded eDRAM L4 cache, highlighting its impact on throughput.
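Connecting the quoted cell sizes to cache-level area, the sketch below computes the approximate array area of a 128 MB cache from the 0.026 µm² eDRAM and 0.144 µm² SRAM figures cited above; the 55% array-efficiency factor covering sense amplifiers, decoders, and ECC is an assumed overhead, so the absolute areas are indicative only.

```python
# Raw array area for a 128 MB last-level cache using the cell sizes quoted
# in the text (0.026 um^2 eDRAM, 0.144 um^2 SRAM at 22 nm). The array
# efficiency factor is an assumed illustrative overhead.

CACHE_MB = 128
BITS = CACHE_MB * 8 * 2**20        # data bits, ignoring tags/ECC for simplicity
ARRAY_EFF = 0.55                   # assumed fraction of macro area that is cells

def cache_area_mm2(cell_um2: float) -> float:
    return BITS * cell_um2 / ARRAY_EFF / 1e6   # um^2 -> mm^2

edram_area = cache_area_mm2(0.026)
sram_area = cache_area_mm2(0.144)
print(f"eDRAM: ~{edram_area:.0f} mm^2, SRAM: ~{sram_area:.0f} mm^2 "
      f"(~{sram_area / edram_area:.1f}x larger)")
```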

Power and reliability

eDRAM's power profile combines static power dominated by periodic refresh operations with dynamic power from read and write accesses. The static power arises primarily from the refresh mechanisms needed to counteract charge leakage in storage capacitors, typically consuming around 16 pW per bit at elevated temperatures such as 85 °C in logic-compatible designs, with up to 24% static power savings compared to power-gated SRAM in certain modes. Dynamic access energy is generally in the range of 0.4 to 1.2 pJ per bit for read and write operations, benefiting from the single-transistor-plus-capacitor structure that minimizes switching overhead compared to multi-transistor alternatives. For large arrays, eDRAM's refresh overhead can remain below the high subthreshold leakage prevalent in SRAM cells during standby. Recent advancements as of 2024 include gain-cell eDRAM designs achieving retention times over 1 s in low-power modes and heterogeneous variants improving retention by up to five orders of magnitude.

Reliability in eDRAM is challenged by data retention limitations and susceptibility to soft errors, necessitating robust mitigation strategies. Retention times vary from 10 ms to over 300 ms, strongly influenced by operating temperature and supply voltage; for instance, one 180 nm gain-cell design maintains 306 ms at 25 °C but drops to 9.5 ms at 85 °C due to accelerated junction and subthreshold leakage. Soft error rates are typically low, at under 4 FIT per Mb from cosmic-ray-induced events, but they can escalate under voltage scaling, where reduced node capacitance lowers the critical charge threshold for upsets. Mitigation relies on error-correcting codes (ECC) for multi-bit detection and correction, combined with periodic scrubbing to refresh affected rows and prevent error accumulation.

Capacitor leakage currents pose a key reliability concern, driven by junction, dielectric, and subthreshold leakage paths that erode stored charge and contribute to 1-2% annual yield degradation in production arrays due to process-induced defects. These leaks are exacerbated at scaled nodes, leading to variable retention across cells and requiring built-in redundancy or adaptive refresh to maintain data integrity. Advances such as dual-port gain cells address power contention during simultaneous operations, achieving up to a 28% reduction in refresh power by enabling dual-row activation that triples effective retention without increasing refresh frequency. Voltage scaling from 0.6 V to 1.0 V enables low-power modes in eDRAM, trading retention stability for energy savings; however, operating below 0.7 V significantly shortens retention times and heightens soft-error vulnerability by diminishing the charge margin against thermal noise and radiation strikes. This necessitates careful calibration of refresh intervals and ECC strength to balance power efficiency with reliability, particularly in temperature-variable environments.
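To put the per-bit figures in context, the sketch below scales the quoted 16 pW/bit retention power and a mid-range 0.8 pJ/bit access energy up to a 32 MB array; the array size and the 50 GB/s sustained bandwidth are assumptions chosen only to show the relative magnitudes of static and dynamic power.

```python
# Back-of-envelope eDRAM power: static (retention/refresh) vs. dynamic (access).
# Per-bit figures follow the text (16 pW/bit retention, ~0.8 pJ/bit access);
# the 32 MB array size and 50 GB/s access bandwidth are assumed for illustration.

ARRAY_MB = 32
BITS = ARRAY_MB * 8 * 2**20

RETENTION_W_PER_BIT = 16e-12       # W per bit at 85 C (from text)
ACCESS_J_PER_BIT = 0.8e-12         # J per bit accessed (mid-range from text)
BANDWIDTH_GBPS = 50                # assumed sustained access bandwidth (GB/s)

static_w = BITS * RETENTION_W_PER_BIT
dynamic_w = BANDWIDTH_GBPS * 1e9 * 8 * ACCESS_J_PER_BIT
print(f"Static (retention): ~{static_w * 1e3:.1f} mW for {ARRAY_MB} MB")
print(f"Dynamic @ {BANDWIDTH_GBPS} GB/s: ~{dynamic_w * 1e3:.1f} mW")
```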

Applications

In processors and SoCs

eDRAM serves as a key component in modern processors, particularly for implementing large last-level caches (L3 or L4) in multi-core CPUs, where its higher density allows significantly expanded on-chip capacity compared to SRAM. For example, IBM's POWER7 processor incorporates a 32 MB shared L3 cache using eDRAM, enabling efficient data sharing among its eight cores in server environments. Similarly, IBM's POWER8 processor features up to 96 MB of eDRAM-based L3 cache per chip, supporting demanding workloads by providing low-latency access to larger datasets. In Intel's Haswell architecture, select variants like the Core i7-4950HQ integrate 128 MB of eDRAM as an L4 cache, augmenting the L3 cache to handle complex computations with fewer memory stalls.

In graphics processing units (GPUs), eDRAM has been utilized to enhance caching and other memory-intensive operations in high-end integrated designs. Intel's Haswell-based Iris Pro Graphics 5200 employs its 128 MB of eDRAM as a victim cache for textures and shaders, improving rendering by keeping more data on-chip and minimizing accesses to slower system memory. This integration allows GPUs to achieve higher effective bandwidth for graphics pipelines, particularly in scenarios involving large texture datasets.

Within system-on-chip (SoC) designs, eDRAM facilitates unified memory architectures in both mobile and server contexts, enabling seamless data access across CPU, GPU, and accelerators. In server SoCs, IBM's z15 mainframe (preceding the z16) relies on eDRAM for its L3 cache, optimizing for mainframe workloads through dense, on-chip storage that supports coherent sharing. For mobile SoCs, NEC Electronics' µPD809400 integrates 8 Mb of eDRAM to support VGA acceleration, providing unified buffering for video and image processing in portable devices. These implementations leverage eDRAM's 3-4x higher density over SRAM to create 2-4x larger on-chip memory pools, improving bandwidth in data-intensive tasks like inference by localizing data access.

In coherent multi-core systems, eDRAM's advantages extend to bandwidth optimization, where its on-chip placement reduces off-chip traffic by capturing more temporal locality and minimizing DRAM controller contention in bandwidth-bound applications. As of 2025, trends in cache design emphasize hybrid SRAM-eDRAM hierarchies, combining SRAM's low latency for critical paths with eDRAM's capacity for bulk storage, as explored in emerging architectures for large language model (LLM) serving and inference, such as co-designed KV caching systems achieving up to 3.9× speedup. This approach balances performance and efficiency in SoCs targeting these workloads, with ongoing research demonstrating energy savings through selective partitioning.

In consumer electronics

eDRAM has found notable application in gaming consoles, where its integration supports high-bandwidth requirements for graphics processing. In the Xbox 360, released in 2005, the Xenos graphics processor incorporates 10 MB of eDRAM as a dedicated frame buffer, enabling efficient anti-aliasing and resolution-independent rendering without taxing the main system memory. This embedded approach allowed a compact design while delivering up to 500 MHz performance in an 80 Mb macro. In later embedded GPUs for consumer devices, eDRAM variants have been employed to handle intermediate rendering tasks, such as framebuffer captures and scratch memory for CPU-intensive operations, enhancing overall system responsiveness in compact form factors.

Small eDRAM buffers have also been integrated in earlier GPUs and SoCs to accelerate rendering and data movement, particularly in power-limited environments. These buffers provide efficient on-chip storage for tasks like image buffering in display and imaging pipelines, reducing the need for external memory accesses. Automotive electronic control units (ECUs) leverage eDRAM for on-chip caching of frequently accessed data, enabling quick retrieval in safety-critical systems without relying on off-chip memory. By embedding memory directly into the SoC, eDRAM can reduce overall package size compared to discrete solutions, streamlining integration in space-constrained consumer products. In power-constrained devices such as wearables and mobile handsets, eDRAM's lower access energy contributes to extended battery life by minimizing the I/O and interface overheads associated with external DRAM, making it particularly suitable for always-on applications where efficient on-chip storage optimizes overall system power.

Comparisons

To SRAM

Embedded dynamic random-access memory (eDRAM) offers significantly higher density than static random-access memory (SRAM), typically 4-5 times that of a conventional 6T SRAM cell due to its simpler 1T1C structure, which enables larger on-chip caches at reduced area overhead. This density advantage translates to a lower cost per bit for eDRAM in large arrays exceeding 32 MB, making it preferable for substantial last-level caches where SRAM's larger cell size becomes prohibitively expensive. In contrast, SRAM remains superior for small, high-speed structures like register files and L1/L2 caches, where its compact layout at minimal capacities avoids eDRAM's integration complexities.

Regarding performance, SRAM provides sub-1 ns access times without refresh requirements, ensuring static stability and predictable latency ideal for critical paths in cache hierarchies. eDRAM, however, exhibits access latencies typically in the range of 10-40 ns and incurs overhead from the periodic refresh cycles needed to maintain charge in its dynamic cells, potentially degrading throughput in latency-sensitive scenarios, though multi-bank designs can minimize this impact. On power efficiency, eDRAM can demonstrate advantages at scale due to lower leakage in large arrays despite refresh costs, while SRAM's static leakage consumes more power per bit in extensive deployments.

These trade-offs mean eDRAM becomes advantageous beyond a few megabytes, yielding net area and cost savings over SRAM and motivating designs that combine both for optimized hierarchies, such as Intel's Broadwell processors, which pair SRAM-based L3 caches with an eDRAM L4 extension to balance speed and capacity. Fundamentally, SRAM's static retention contrasts with eDRAM's dynamic refresh demands, allowing SRAM dominance in upper cache levels for latency-critical accesses while positioning eDRAM effectively in lower levels where capacity outweighs its marginal latency penalties. As of 2025, ongoing research continues to refine gain-cell eDRAM architectures for low-power and accelerator applications, further enhancing density and energy efficiency.
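The break-even claim can be illustrated with a toy area model: eDRAM pays a fixed per-macro overhead for refresh logic and sense amplifiers but uses a much smaller cell, so it overtakes SRAM once capacity amortizes that overhead. The 2 mm² fixed overhead below is an invented value; the per-bit cell areas are the 22 nm figures quoted elsewhere in this article.

```python
# Toy area model for the SRAM/eDRAM break-even capacity. All constants are
# illustrative assumptions, not measured values.

SRAM_UM2_PER_BIT = 0.144     # 6T SRAM cell area (from the text, 22 nm)
EDRAM_UM2_PER_BIT = 0.029    # 1T1C eDRAM cell area (from the text, 22 nm)
EDRAM_FIXED_MM2 = 2.0        # assumed fixed overhead: refresh logic, sense amps

def area_mm2(capacity_mb: float, um2_per_bit: float, fixed_mm2: float = 0.0) -> float:
    bits = capacity_mb * 8 * 2**20
    return bits * um2_per_bit / 1e6 + fixed_mm2

for mb in (1, 2, 4, 8, 32, 128):
    s = area_mm2(mb, SRAM_UM2_PER_BIT)
    e = area_mm2(mb, EDRAM_UM2_PER_BIT, EDRAM_FIXED_MM2)
    winner = "eDRAM" if e < s else "SRAM"
    print(f"{mb:4d} MB: SRAM {s:6.1f} mm^2, eDRAM {e:6.1f} mm^2 -> {winner}")
```

Under these assumed numbers the crossover lands at a few megabytes, consistent with the qualitative claim above; with different fixed overheads the exact break-even point shifts accordingly.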

To standalone DRAM

Embedded dynamic random-access memory (eDRAM) differs significantly from standalone DRAM in how it is integrated, as it is fabricated directly on the same die as the logic circuitry, such as processors or systems-on-chip (SoCs), whereas standalone DRAM requires separate off-chip modules connected via high-speed interfaces like DDR5. This on-die placement eliminates the need for extensive I/O pads, packaging interconnects, and dedicated external memory interfaces, reducing system complexity and board space, but it adds fabrication cost to the logic process, estimated at 10-20% higher die area overhead for the eDRAM macros. In contrast, standalone DRAM is more cost-effective for capacities exceeding 1 GB due to optimized commodity processes, though it incurs higher overall system costs from packaging and interface overheads.

One of the primary advantages of eDRAM over standalone DRAM is its substantially lower latency, typically 1-5 ns for on-chip access compared to 20-50 ns for off-chip DDR5 modules, representing a 5-10x reduction due to the absence of signal propagation delays across package boundaries. This proximity also yields higher effective bandwidth, as eDRAM avoids losses from interconnect parasitics and bus contention, achieving up to 50-100x improvement in some architectures through wide internal buses without turnaround penalties. For example, Intel's Broadwell eDRAM implementation delivered over 50 GB/s of bandwidth with latencies around 37 ns, outperforming contemporary DDR3 off-chip memory in system benchmarks.

eDRAM's retention time is tuned for cache-like roles, often limited to 40-100 μs by logic-process leakage, much shorter than the 64 ms standard for standalone DRAM, necessitating more frequent refreshes but allowing refresh operations to be hidden without impacting performance. At the system level, eDRAM mitigates the "memory wall" by providing die-level proximity that reduces access stalls, with examples showing up to 2x higher throughput for on-chip versus off-chip memory in processor workloads, while standalone DRAM excels in scalability for large main-memory pools but at the expense of 20-30% higher power from interface signaling.
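Although eDRAM's microsecond-scale retention implies far more frequent refresh than commodity DRAM's 64 ms interval, the per-bank time overhead can remain small because eDRAM banks are small and refresh concurrently. The sketch below estimates the busy fraction from retention time, rows per bank, and row-cycle time; all three values are assumed for illustration.

```python
# Illustrative refresh-overhead estimate for eDRAM vs. standalone DRAM.
# Fraction of time a bank is busy refreshing = rows * row_cycle / retention.
# Rows-per-bank and row-cycle times are assumed values for illustration.

def refresh_overhead(retention_s: float, rows_per_bank: int, row_cycle_s: float) -> float:
    return rows_per_bank * row_cycle_s / retention_s

cases = [
    ("eDRAM bank (100 us retention)", 100e-6, 128, 5e-9),
    ("eDRAM bank (40 us retention)",  40e-6,  128, 5e-9),
    ("Standalone DRAM bank (64 ms)",  64e-3,  8192, 50e-9),
]
for name, retention, rows, trc in cases:
    ov = refresh_overhead(retention, rows, trc)
    print(f"{name:32s}: ~{ov * 100:4.1f}% of bank time spent refreshing")
```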
