Compute Express Link
Compute Express Link (CXL) is an open-standard, cache-coherent interconnect that provides high-speed, low-latency connections between processors, accelerators, and memory devices, primarily in data centers and high-performance computing environments.[1] Built on the physical layer of PCI Express (PCIe), CXL maintains memory coherency between the CPU and attached devices, enabling resource pooling, sharing, and disaggregation for demanding workloads such as artificial intelligence, machine learning, and big data analytics.[1] By reducing software complexity, minimizing redundant memory management, and lowering system costs, CXL improves overall performance and scalability in heterogeneous computing systems.[1]
The CXL Consortium, the industry organization that develops the technology, was formed in March 2019 with founding members including Alibaba, Cisco, Dell EMC, Facebook, Google, Hewlett Packard Enterprise, Huawei, Intel, and Microsoft; Intel contributed the foundational technology, which it had developed to address the limitations of traditional interconnects such as PCIe for coherent accelerator integration.[2] The consortium officially incorporated in September 2019 and has since grown to include major players such as AMD, ARM, NVIDIA, Samsung, and SK Hynix.[3]
The initial CXL 1.0 specification was released in March 2019, introducing the core protocols for I/O, caching, and memory access at link rates of up to 32 GT/s; it was followed by CXL 1.1 in September 2019 with refinements for device types including accelerators and memory buffers.[4] Subsequent releases have expanded CXL's capabilities. CXL 2.0, launched in November 2020, added support for memory pooling via multi-logical devices, single-level switching, and peer-to-peer direct memory access while retaining 32 GT/s speeds.[4] CXL 3.0, released in August 2022, doubled the link rate to 64 GT/s using the PCIe 6.0 physical layer, introduced multi-level switching, enhanced coherency with larger flit sizes, and enabled fabric-wide memory sharing and peer-to-peer access for greater system composability.[4] CXL 3.1, issued in November 2023, further improved fabric management with APIs for port-based routing (PBR) switches, added host-to-host communication via Global Integrated Memory, and strengthened security through a Trusted-Execution-Environment Security Protocol, alongside memory expander enhancements for reliability and metadata support.[5] The latest version, CXL 3.2, was released on December 3, 2024; it optimizes memory device monitoring and management, extends OS and application functionality, and bolsters security compliance with Trusted Security Protocol tests, all while ensuring backward compatibility.[6]
CXL operates through three primary protocols multiplexed over PCIe links: CXL.io for device discovery, configuration, and standard I/O operations; CXL.cache for low-latency coherent caching of host memory by devices; and CXL.mem for direct, coherent memory load/store access, allowing devices to appear as part of the host's memory address space.[7] These protocols support three device types (Type 1: accelerators with a cache; Type 2: devices with both a cache and local memory; Type 3: memory expanders), enabling flexible integration without proprietary interfaces.[4] As adoption grows, CXL is positioned to transform data center architectures by enabling dynamic resource allocation, reducing over-provisioning, and accelerating innovation in composable infrastructure.[1]
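The relationship between device types and protocols can be summarized with a short illustrative model. The C sketch below is purely didactic: the enum values, helper function, and output format are hypothetical and do not correspond to identifiers in the CXL specification or in any real driver API.
```c
/* Illustrative model of CXL device types and the protocols each one uses.
 * The names below are hypothetical, not taken from the CXL specification
 * or any real driver interface. */
#include <stdio.h>

enum cxl_protocol {
    CXL_IO    = 1 << 0,  /* discovery, configuration, standard I/O (PCIe-like) */
    CXL_CACHE = 1 << 1,  /* device coherently caches host memory */
    CXL_MEM   = 1 << 2,  /* host performs coherent load/store to device memory */
};

/* Protocols used by each device type:
 * Type 1: accelerator with a cache but no host-managed memory,
 * Type 2: accelerator with both a cache and device-attached memory,
 * Type 3: memory expander with device-attached memory only. */
static unsigned protocols_for_type(int device_type)
{
    switch (device_type) {
    case 1: return CXL_IO | CXL_CACHE;
    case 2: return CXL_IO | CXL_CACHE | CXL_MEM;
    case 3: return CXL_IO | CXL_MEM;
    default: return 0;
    }
}

int main(void)
{
    for (int t = 1; t <= 3; t++) {
        unsigned p = protocols_for_type(t);
        printf("Type %d: %s%s%s\n", t,
               (p & CXL_IO)    ? "CXL.io "    : "",
               (p & CXL_CACHE) ? "CXL.cache " : "",
               (p & CXL_MEM)   ? "CXL.mem"    : "");
    }
    return 0;
}
```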
Overview
Definition and Purpose
Compute Express Link (CXL) is an open industry-standard, cache-coherent interconnect designed to connect central processing units (CPUs) with accelerators, memory expansion devices, and other components in computing systems.[1][8] It enables low-latency, high-bandwidth data transfer while maintaining memory coherency across connected devices, addressing the limitations of traditional input/output (I/O) interconnects in modern data centers.[8] Built on the physical layer of Peripheral Component Interconnect Express (PCIe), CXL extends PCIe to support coherent memory access without requiring a separate fabric.[8] The primary purposes of CXL are to facilitate disaggregated computing, in which resources such as memory and compute can be pooled and allocated dynamically across systems; to enable memory expansion and pooling beyond the capacity limits of individual nodes; to support accelerator offloading for tasks such as artificial intelligence and high-performance computing; and to promote heterogeneous computing architectures that integrate diverse processors and devices.[1][8] These objectives aim to create unified memory spaces that reduce the complexity of the software stacks managing distributed resources, ultimately lowering system costs and enhancing performance in scalable environments.[1]
Key benefits of CXL include reduced data-movement overhead through direct coherent access, which minimizes copying between non-coherent memory spaces; improved resource utilization by allowing shared access to idle or underutilized components; and enhanced scalability for data centers, surpassing the constraints of PCIe-only deployments through efficient resource composability.[8]
At its core, CXL's cache coherency model relies on hardware mechanisms: the host issues snoop requests that change the coherence state of lines held in device caches, ensuring data consistency, and devices with local memory can use a bias-based model in which a bias table records, per page of device-attached memory, whether the host or the device currently has optimized access, reducing unnecessary coherency traffic.[8] Devices implement a simple MESI (Modified, Exclusive, Shared, Invalid) state machine while the host orchestrates overall coherence, providing low-latency sharing without software intervention.[8]
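As a rough illustration of the coherency model described above, the following C sketch shows a device-side cache line moving through MESI states in response to host-initiated snoops. It is a minimal sketch assuming a generic snoop-invalidate/snoop-data model; the type names, snoop categories, and function are hypothetical and do not reproduce the actual CXL.cache message set.
```c
/* Simplified illustration of the MESI state machine a CXL.cache device
 * maintains for each cached line, reacting to host-initiated snoops.
 * The snoop types and names are generic illustrations, not CXL opcodes. */
#include <stdbool.h>
#include <stdio.h>

enum mesi_state { INVALID, SHARED, EXCLUSIVE, MODIFIED };

enum snoop_type {
    SNOOP_INVALIDATE,  /* host wants exclusive ownership: device must drop the line */
    SNOOP_DATA,        /* host wants a readable copy: device may keep a shared copy */
};

struct cache_line {
    enum mesi_state state;
    bool dirty_data_returned;  /* set when modified data is written back to the host */
};

/* Handle a host snoop: update the line's state and note whether dirty data
 * had to be returned so the host regains a consistent view of memory. */
static void handle_snoop(struct cache_line *line, enum snoop_type snoop)
{
    line->dirty_data_returned = (line->state == MODIFIED);

    switch (snoop) {
    case SNOOP_INVALIDATE:
        line->state = INVALID;   /* host takes ownership of the line */
        break;
    case SNOOP_DATA:
        line->state = SHARED;    /* both sides may now read the line */
        break;
    }
}

int main(void)
{
    struct cache_line line = { .state = MODIFIED };
    handle_snoop(&line, SNOOP_INVALIDATE);
    printf("state=%d, wrote back dirty data=%d\n",
           line.state, line.dirty_data_returned);
    return 0;
}
```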
Relationship to PCIe
Compute Express Link (CXL) uses the physical layer (PHY) of PCI Express (PCIe) 5.0 and later generations for its electrical signaling and transmission characteristics, enabling integration with existing infrastructure. This includes support for lane configurations from x1 to x16, which align directly with PCIe standards, so high-bandwidth interconnects require no new cabling or slot designs. By adopting the PCIe PHY, CXL retains backward compatibility with earlier PCIe generations, such as PCIe 4.0, through degraded link modes that keep mixed environments operational.[9]
A primary distinction between CXL and PCIe lies in the protocol stack: CXL extends the PCIe transaction layer with additional cache-coherent protocols, CXL.cache for device-initiated coherency and CXL.mem for host-managed memory access, while leaving the underlying physical medium unchanged. This layering allows CXL devices to reuse standard PCIe cables, connectors, and slots, enabling cost-effective deployment in data centers and servers. Unlike pure PCIe, which handles only non-coherent I/O transactions, CXL's protocols provide shared memory semantics between accelerators and hosts, enhancing system-level resource pooling without disrupting PCIe compatibility.[10][9]
CXL includes compatibility modes that allow devices to revert to pure PCIe operation for non-coherent workloads. This is achieved through Flex Bus negotiation during link training, in which the link auto-detects CXL support and falls back if CXL-specific features are unsupported. CXL devices therefore enumerate initially as PCIe devices and switch modes after negotiation, ensuring interoperability with legacy PCIe endpoints and minimizing deployment risk in heterogeneous systems.[9]
The evolution of CXL tracks PCIe advancements: CXL 3.0 is tied specifically to the PCIe 6.0 PHY to support higher signaling rates, and the specification includes forward-compatibility mechanisms for future iterations, such as provisions for mixed-speed fabrics and extensible flit structures that can accommodate evolving PCIe electrical standards without obsoleting prior deployments.[11]
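The fallback behavior can be pictured with a simplified model of the negotiation outcome. The C sketch below assumes a toy capability structure for each link partner; the real Flex Bus mechanism negotiates protocol support during PCIe link training, and none of the structure or function names here come from the specification.
```c
/* Simplified model of the Flex Bus outcome: if both link partners advertise
 * CXL, the link trains in CXL mode; otherwise it falls back to plain PCIe.
 * All names are illustrative, not taken from the CXL specification. */
#include <stdbool.h>
#include <stdio.h>

struct port_caps {
    bool supports_cxl_io;
    bool supports_cxl_cache;
    bool supports_cxl_mem;
};

enum link_mode { LINK_PCIE, LINK_CXL };

/* CXL.io is mandatory in CXL mode; CXL.cache and CXL.mem are enabled only if
 * both sides advertise them. A partner with no CXL support yields a link that
 * simply comes up as standard PCIe. */
static enum link_mode negotiate(const struct port_caps *host,
                                const struct port_caps *device,
                                struct port_caps *enabled)
{
    if (!host->supports_cxl_io || !device->supports_cxl_io) {
        *enabled = (struct port_caps){ false, false, false };
        return LINK_PCIE;  /* fallback: non-coherent PCIe operation */
    }
    enabled->supports_cxl_io    = true;
    enabled->supports_cxl_cache = host->supports_cxl_cache && device->supports_cxl_cache;
    enabled->supports_cxl_mem   = host->supports_cxl_mem   && device->supports_cxl_mem;
    return LINK_CXL;
}

int main(void)
{
    struct port_caps host = { true, true, true };
    struct port_caps legacy_ssd = { false, false, false };  /* plain PCIe endpoint */
    struct port_caps enabled;
    enum link_mode mode = negotiate(&host, &legacy_ssd, &enabled);
    printf("link mode: %s\n", mode == LINK_CXL ? "CXL" : "PCIe");
    return 0;
}
```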
History and Standardization
Formation of the CXL Consortium
In March 2019, Intel announced the development of Compute Express Link (CXL) technology, an open standard interconnect designed to enable high-speed, coherent communication between processors, accelerators, and memory devices.[12] Shortly thereafter, the CXL Consortium was formally established as an open industry association to drive this initiative forward.[12] The founding members included Alibaba, Cisco, Dell EMC, Facebook, Google, Hewlett Packard Enterprise, Huawei, Intel, and Microsoft, representing a broad coalition of technology leaders committed to advancing data-centric computing architectures.[12]
The primary purpose of the CXL Consortium is to develop and maintain specifications for CXL, ensuring standardization that promotes interoperability among multi-vendor hardware components.[12] This includes facilitating compliance testing programs and fostering ecosystem growth through education, demonstrations, and collaboration among members to address challenges in memory-intensive workloads such as AI and high-performance computing. By focusing on cache-coherent protocols over PCIe infrastructure, the consortium aims to break down traditional memory walls and enable disaggregated, scalable systems.[13]
To broaden its scope, the CXL Consortium integrated assets from complementary organizations. In November 2021, following a memorandum of understanding signed in 2020, the Gen-Z Consortium agreed to transfer its specifications and intellectual property to CXL, enhancing capabilities for fabric management and multi-host, scalable interconnect topologies. The transfer, completed in early 2022, unified efforts around coherent fabrics, with approximately 80% of Gen-Z members joining CXL to support vendor-neutral, pooled-resource environments.[14] Subsequently, in August 2022, the OpenCAPI Consortium signed a letter of intent to transfer its specifications to CXL, incorporating support for Power architecture-based coherent accelerators and expanding compatibility across diverse processor ecosystems. These integrations positioned CXL as a comprehensive standard for multi-vendor fabrics, reducing fragmentation in the industry.[15]
The CXL Consortium continues to grow, reflecting widespread industry adoption and collaborative governance through its board of directors and working groups.[3] This expansion underscores the consortium's role in promoting an open ecosystem for next-generation computing infrastructure.[3]
Specification Releases
The Compute Express Link (CXL) specifications have evolved through successive releases managed by the CXL Consortium, adding protocol, security, and scalability enhancements and tracking the underlying PCIe standards to meet growing demands in data-centric computing environments. Each version builds on its predecessors, maintaining backward compatibility while expanding the capabilities of coherent interconnects between hosts, accelerators, and memory devices.[16]
The initial CXL 1.0 specification, released in March 2019, established the foundational protocols for low-latency, cache-coherent connections over the PCIe 5.0 physical layer at up to 32 GT/s. It defined three core protocols (CXL.io for I/O semantics and device management, CXL.cache for coherent device caching of host memory, and CXL.mem for host-managed device memory), enabling direct CPU access to accelerator-attached memory for expansion and offload scenarios without custom interfaces. This release focused on single-host topologies, supporting basic memory expansion and coherency to reduce latency in heterogeneous compute systems.[17][16]
CXL 1.1, released in September 2019, refined the 1.0 foundation with errata corrections, compliance clarifications, and initial security enhancements. Key additions included Integrity and Data Encryption (IDE) support via the Secure PCIe (SPIe) protocol, providing end-to-end confidentiality and integrity for CXL.mem and CXL.cache transactions without performance overhead, along with improved device discovery and power-management primitives. These updates ensured robust protection against tampering in accelerator and memory expansion use cases while aligning with emerging PCIe ecosystem requirements.[17][9]
The CXL 2.0 specification, released on November 10, 2020, marked a significant expansion while retaining 32 GT/s speeds. It added support for CXL switches to enable multi-device fan-out, multi-host memory pooling with dynamic resource allocation and migration, and persistent memory integration for resilient data storage. These features enabled scalable topologies for disaggregated memory, allowing efficient allocation across hosts in rack-scale environments and improving utilization in cloud and edge deployments.[18][19]
CXL 3.0, released in August 2022, doubled the link rate to 64 GT/s in alignment with PCIe 6.0, without adding latency, and introduced advanced fabric architectures for larger-scale deployments. Major enhancements included multi-level switching for complex topologies of up to thousands of nodes, extended end-to-end data integrity, and improved fabric management protocols for dynamic discovery and routing. This version emphasized efficient peer-to-peer communication and coherency across large memory pools, supporting AI and HPC workloads that require massive shared memory.[20][21]
Building on 3.0, the CXL 3.1 specification, released on November 14, 2023, introduced refinements for reliability and efficiency in large fabrics. It enhanced error handling with advanced fault isolation and recovery mechanisms, improved power management for dynamic scaling in energy-constrained environments, and added Trusted Execution Environment (TEE) support for secure enclaves. Fabric extensions enabled better multi-host orchestration and reduced latency in error-prone scenarios, optimizing sustained performance in data centers.[5][4]
The CXL 3.2 specification, released on December 3, 2024, further optimized integration with PCIe 6.0/6.1 while focusing on memory device enhancements. It streamlined memory-device monitoring with the CXL Hot-Page Monitoring Unit (CHMU) for tiered memory, reduced fabric overhead, extended IDE for broader security compliance, and improved OS-level visibility into device health and telemetry. These updates minimize latency in pooled environments and bolster resilience for AI-driven applications.[22][6] A conceptual sketch of hot-page tracking appears after the summary table below.
| Version | Release Date | Link Rate | Major Additions |
|---|---|---|---|
| CXL 1.0 | March 2019 | 32 GT/s | Core protocols (CXL.io, CXL.cache, CXL.mem); accelerator/memory support |
| CXL 1.1 | September 2019 | 32 GT/s | IDE/SPIe security; errata and compliance fixes |
| CXL 2.0 | November 2020 | 32 GT/s | Single-level switching, multi-host memory pooling, persistent memory support |
| CXL 3.0 | August 2022 | 64 GT/s | Multi-level fabrics, end-to-end integrity, large topologies |
| CXL 3.1 | November 2023 | 64 GT/s | Error handling, power management, TEE security |
| CXL 3.2 | December 2024 | 64 GT/s | PCIe 6.x optimizations, reduced fabric overhead, enhanced monitoring |
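The hot-page monitoring concept referenced for CXL 3.2 can be illustrated conceptually. The C sketch below only demonstrates the general idea of counting per-page accesses and flagging hot pages as candidates for promotion from CXL-attached memory to local DRAM; the page count, threshold, epoch handling, and all names are hypothetical and are not drawn from the CHMU definition in the specification.
```c
/* Conceptual illustration of hot-page tracking for tiered memory, in the
 * spirit of what a hot-page monitoring unit enables: count accesses to
 * pages of CXL-attached (far-tier) memory and report pages whose count
 * exceeds a threshold so system software can migrate them to local DRAM.
 * All names, sizes, and thresholds here are hypothetical. */
#include <stdint.h>
#include <stdio.h>

#define NUM_PAGES      8       /* tiny tracked region for the example */
#define HOT_THRESHOLD  4       /* accesses per epoch that mark a page "hot" */

static uint32_t access_count[NUM_PAGES];

static void record_access(unsigned page)
{
    if (page < NUM_PAGES)
        access_count[page]++;
}

/* At the end of an epoch, report hot pages and reset the counters. */
static void end_epoch(void)
{
    for (unsigned p = 0; p < NUM_PAGES; p++) {
        if (access_count[p] >= HOT_THRESHOLD)
            printf("page %u is hot (%u accesses): candidate for promotion\n",
                   p, access_count[p]);
        access_count[p] = 0;
    }
}

int main(void)
{
    unsigned trace[] = { 2, 2, 5, 2, 2, 7, 5, 2 };  /* synthetic access trace */
    for (unsigned i = 0; i < sizeof trace / sizeof trace[0]; i++)
        record_access(trace[i]);
    end_epoch();   /* page 2 exceeds the threshold and is reported */
    return 0;
}
```
In a real system, the operating system or a fabric manager would read such telemetry from the device and decide whether and when to migrate pages between tiers.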