
Compute Express Link

Compute Express Link (CXL) is an open-standard, cache-coherent interconnect technology that enables high-speed, low-latency connections between processors, accelerators, and memory devices, primarily in data center and cloud environments. Built on the physical layer of PCI Express (PCIe), CXL maintains memory coherency across the CPU and attached devices, facilitating resource pooling, sharing, and disaggregation to support demanding workloads such as artificial intelligence, machine learning, and analytics. By reducing software complexity, minimizing redundant memory management, and lowering system costs, CXL enhances overall performance and scalability in heterogeneous computing systems.

The CXL Consortium, an industry organization dedicated to advancing the technology, was formed in March 2019 with founding members including Alibaba, Cisco, Dell EMC, Facebook, Google, Hewlett Packard Enterprise, Huawei, Intel, and Microsoft; Intel contributed the foundational technology, which was initially developed to address limitations in traditional interconnects like PCIe for coherent accelerator integration. The consortium officially incorporated in September 2019 and has since grown to include major players such as AMD, ARM, NVIDIA, Samsung, and SK Hynix.

The initial CXL 1.0 specification was released in March 2019, introducing core protocols for I/O, caching, and memory access at up to 32 GT/s link rates, followed by CXL 1.1 in September 2019 with refinements for device types including accelerators and memory buffers. Subsequent releases have expanded CXL's capabilities: CXL 2.0, launched in November 2020, added support for memory pooling via multi-logical devices, single-level switching, and link-level Integrity and Data Encryption (IDE) while maintaining 32 GT/s speeds. CXL 3.0, released in August 2022, doubled the data rate to 64 GT/s using PCIe 6.0, introduced multi-level switching, enhanced coherency with larger flit sizes, and enabled fabric-wide memory sharing and peer-to-peer access for greater system scalability. CXL 3.1, issued in November 2023, further improved fabric management with support for port-based routing switches, host-to-host communication via Global Integrated Memory (GIM), and security through the Trusted Execution Environment Security Protocol (TSP), alongside memory expander enhancements for reliability and metadata support. The latest version, CXL 3.2, was released on December 3, 2024, optimizing memory device monitoring and management, extending OS and application functionality, and bolstering security compliance with trusted security protocol tests, all while ensuring full backward compatibility.

CXL operates through three primary protocols multiplexed over PCIe links: CXL.io for device discovery, configuration, and standard I/O operations; CXL.cache for low-latency caching of host memory by devices; and CXL.mem for direct, coherent memory load/store access, allowing device memory to appear as part of the host's memory address space. These protocols support three device types—Type 1 for accelerators with caching, Type 2 for devices with both cache and local memory, and Type 3 for memory expanders—enabling flexible integration without proprietary interfaces. As adoption grows, CXL is positioned to transform data center architectures by enabling dynamic resource allocation, reducing over-provisioning, and accelerating innovation in composable infrastructure.

Overview

Definition and Purpose

Compute Express Link (CXL) is an open industry-standard cache-coherent interconnect designed to connect central processing units (CPUs) with accelerators, memory expansion devices, and other components in computing systems. It enables low-latency, high-bandwidth data transfer while maintaining memory coherency across connected devices, addressing the limitations of traditional input/output (I/O) interconnects in modern data centers. Built on the physical layer of PCI Express (PCIe), CXL extends PCIe capabilities to support coherent memory access without requiring separate fabrics. The primary purposes of CXL include facilitating disaggregated computing, where resources like memory and compute can be pooled and allocated dynamically across systems; enabling memory expansion and pooling to overcome capacity constraints in individual nodes; supporting accelerator offloading for tasks such as artificial intelligence and high-performance computing; and promoting heterogeneous architectures that integrate diverse processors and devices seamlessly. These objectives aim to create unified memory spaces that reduce the complexity of software stacks managing distributed resources, ultimately lowering system costs and enhancing performance in scalable environments. Key benefits of CXL encompass reduced data movement overhead through direct coherent access, which minimizes copying between non-coherent memory spaces; improved resource utilization by allowing shared access to idle or underutilized components; and enhanced scalability for data centers, surpassing the constraints of PCIe-only deployments by enabling efficient resource composability. At its core, CXL's coherency model relies on hardware mechanisms, including host-initiated snooping protocols in which the host sends requests to change coherence states in device caches to ensure data consistency, and, in advanced implementations like CXL 3.0, bias tables that track ownership states across multiple devices for optimized traffic management. This model supports a simple MESI (Modified, Exclusive, Shared, Invalid) state machine on devices while the host orchestrates overall coherency, providing low-latency sharing without software intervention.
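
The state machine mentioned above can be illustrated with a minimal sketch. The following C program models MESI-style transitions a device cache might track as the host issues snoops; the event names and the `other_sharers` flag are illustrative, not taken from the CXL specification.

```c
/* Minimal MESI state-machine sketch (illustrative only; not from the CXL spec).
 * Models the per-cacheline states a CXL.cache-capable device maintains while
 * the host orchestrates snoops to keep the shared coherency domain consistent. */
#include <stdio.h>

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;
typedef enum { LOCAL_READ, LOCAL_WRITE, SNOOP_READ, SNOOP_INVALIDATE } event_t;

static const char *name(mesi_t s) {
    static const char *n[] = { "Invalid", "Shared", "Exclusive", "Modified" };
    return n[s];
}

/* Apply one event to a cacheline and return its next state.
 * other_sharers tells a local read whether another agent already caches the line. */
static mesi_t next_state(mesi_t s, event_t e, int other_sharers) {
    switch (e) {
    case LOCAL_READ:
        return (s == INVALID) ? (other_sharers ? SHARED : EXCLUSIVE) : s;
    case LOCAL_WRITE:
        return MODIFIED;                 /* host grants ownership via snoops first */
    case SNOOP_READ:
        return (s == INVALID) ? INVALID : SHARED;   /* M/E/S degrade to Shared */
    case SNOOP_INVALIDATE:
        return INVALID;                  /* dirty data is written back before dropping */
    }
    return s;
}

int main(void) {
    mesi_t line = INVALID;
    line = next_state(line, LOCAL_READ, 0);         /* device reads host memory */
    printf("after local read:  %s\n", name(line));  /* Exclusive */
    line = next_state(line, LOCAL_WRITE, 0);
    printf("after local write: %s\n", name(line));  /* Modified */
    line = next_state(line, SNOOP_READ, 1);         /* host snoop on a CPU access */
    printf("after snoop read:  %s\n", name(line));  /* Shared */
    return 0;
}
```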

Relationship to PCIe

Compute Express Link (CXL) leverages the physical layer (PHY) of PCI Express (PCIe) 5.0 and later generations for its electrical signaling and transmission characteristics, enabling seamless integration with existing infrastructure. This includes support for lane configurations ranging from x1 to x16, which align directly with PCIe standards to facilitate high-bandwidth interconnects without requiring new cabling or slot designs. By adopting the PCIe PHY, CXL ensures backward compatibility with prior PCIe generations, such as PCIe 4.0 and earlier, through degraded link modes that maintain operational viability in mixed environments. A primary distinction between CXL and PCIe lies in the upper protocol layers, where CXL extends the PCIe transaction layer by introducing additional cache-coherent protocols—CXL.cache for device-initiated coherency and CXL.mem for host-managed device memory—while preserving the underlying physical medium unaltered. This layering allows CXL devices to reuse standard PCIe cables, connectors, and slots, promoting cost-effective deployment in data centers and servers. Unlike pure PCIe, which focuses on non-coherent I/O transfers, CXL's protocols enable coherent load/store semantics across accelerators and hosts, enhancing system-level resource pooling without disrupting PCIe compatibility. CXL incorporates compatibility modes that permit devices to revert to pure PCIe operation for non-coherent workloads, achieved through Flex Bus negotiation during link training, where the link auto-detects CXL capability and falls back to PCIe if CXL-specific features are unsupported. This fallback ensures interoperability with legacy PCIe endpoints, as CXL devices enumerate as PCIe devices initially and switch modes post-training. Such provisions minimize deployment risks in heterogeneous systems. The evolution of CXL aligns closely with PCIe advancements, with CXL 3.0 specifically tying to the PCIe 6.0 PHY to support higher signaling rates while incorporating forward-compatibility mechanisms for future iterations. These include provisions for mixed-speed fabrics and extensible flit structures that accommodate evolving PCIe electrical standards, ensuring long-term scalability without obsoleting prior deployments.
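
The fallback behavior described above can be summarized as a simple decision made during link training. The sketch below is a hypothetical simplification of that Flex Bus mode choice; the structures and function names are invented for illustration and do not reflect the specification's training-set encodings.

```c
/* Hypothetical sketch of the Flex Bus mode decision during link training:
 * if both ends advertise CXL (alternate protocol negotiation succeeds), the
 * link carries CXL.io/.cache/.mem; otherwise it falls back to plain PCIe. */
#include <stdbool.h>
#include <stdio.h>

typedef enum { MODE_PCIE, MODE_CXL } link_mode_t;

typedef struct {
    bool advertises_cxl;   /* derived from alternate-protocol training sets */
    bool wants_cache;      /* device requests CXL.cache */
    bool wants_mem;        /* device requests CXL.mem */
} port_caps_t;

static link_mode_t negotiate(port_caps_t host, port_caps_t dev) {
    if (host.advertises_cxl && dev.advertises_cxl)
        return MODE_CXL;   /* enumerates as PCIe first, then operates in CXL mode */
    return MODE_PCIE;      /* legacy endpoint: non-coherent PCIe operation only */
}

int main(void) {
    port_caps_t host         = { true, true, true };
    port_caps_t legacy_nic   = { false, false, false };
    port_caps_t cxl_expander = { true, false, true };   /* Type 3: CXL.mem only */

    printf("legacy NIC   -> %s\n",
           negotiate(host, legacy_nic) == MODE_CXL ? "CXL" : "PCIe fallback");
    printf("CXL expander -> %s\n",
           negotiate(host, cxl_expander) == MODE_CXL ? "CXL" : "PCIe fallback");
    return 0;
}
```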

History and Standardization

Formation of the CXL Consortium

In March 2019, Intel announced the development of Compute Express Link (CXL) technology, an open standard interconnect designed to enable high-speed, coherent communication between processors, accelerators, and memory devices. Shortly thereafter, the CXL Consortium was formally established as an open industry association to drive this initiative forward. The founding members included Alibaba, Cisco, Dell EMC, Facebook, Google, Hewlett Packard Enterprise, Huawei, Intel, and Microsoft, representing a broad coalition of technology leaders committed to advancing data-centric computing architectures. The primary purpose of the CXL Consortium is to develop and maintain specifications for CXL, ensuring interoperability among multi-vendor components. This includes facilitating compliance testing programs and fostering ecosystem growth through education, demonstrations, and collaboration among members to address challenges in memory-intensive workloads such as artificial intelligence and machine learning. By focusing on cache-coherent protocols over PCIe infrastructure, the consortium aims to break down traditional memory walls and enable disaggregated, scalable systems. To broaden its scope, the CXL Consortium integrated assets from complementary organizations. In November 2021, following a memorandum of understanding signed in 2020, the Gen-Z Consortium transferred its specifications and intellectual property to CXL, enhancing capabilities for fabric management and multi-host, scalable interconnect topologies. This merger, completed in early 2022, unified efforts around coherent fabrics, with approximately 80% of Gen-Z members joining CXL to support vendor-neutral, pooled resource environments. Subsequently, in August 2022, the OpenCAPI Consortium signed a letter of intent to transfer its specifications to CXL, incorporating support for Power architecture-based coherent accelerators and expanding compatibility across diverse processor ecosystems. These integrations positioned CXL as a comprehensive standard for multi-vendor coherent fabrics, reducing fragmentation in the industry. The CXL Consortium continues to grow, reflecting widespread industry adoption and collaborative governance through its board of directors and working groups. This expansion underscores the consortium's role in promoting an open ecosystem for next-generation computing infrastructure.

Specification Releases

The Compute Express Link (CXL) specifications have evolved through successive releases managed by the CXL Consortium, introducing enhancements in protocols, bandwidth, security, and alignment with underlying PCIe standards to address growing demands in data-centric computing environments. Each version builds on prior ones, maintaining backward compatibility while expanding capabilities for coherent interconnects between hosts, accelerators, and memory devices.

The initial CXL 1.0 specification, released in March 2019, established the foundational protocols for low-latency, cache-coherent connections over PCIe 5.0 physical layers at up to 32 GT/s. It defined three core protocols—CXL.io for I/O semantics and device management, CXL.cache for device caching of host memory, and CXL.mem for host-managed device memory—enabling direct CPU access to accelerator-attached memory for expansion and offload scenarios without requiring custom interfaces. This release focused on single-host topologies, supporting basic memory expansion and coherency to reduce data-movement overhead in heterogeneous compute systems.

CXL 1.1, released in September 2019, refined the 1.0 foundation with errata corrections, compliance-testing clarifications, and improved device discovery and configuration primitives. These updates ensured robust interoperability for accelerator and memory expansion use cases while aligning with emerging PCIe ecosystem requirements.

The CXL 2.0 specification, released on November 10, 2020, marked a significant expansion by introducing fabric-level capabilities while retaining 32 GT/s speeds. It added support for CXL switches to enable multi-device fan-out, multi-host memory sharing through dynamic resource pooling and migration, persistent memory integration for resilient storage-class memory, and link-level Integrity and Data Encryption (IDE) for confidentiality and integrity of CXL traffic. These features facilitated scalable topologies for disaggregated memory, allowing efficient allocation across hosts in rack-scale environments and enhancing utilization in cloud and edge deployments.

CXL 3.0, released in August 2022, doubled the data rate to 64 GT/s aligned with PCIe 6.0, without increasing latency, and advanced fabric architectures for larger-scale deployments. Major enhancements included multi-level switching for complex topologies up to thousands of nodes, end-to-end data integrity extensions, and improved fabric management protocols for dynamic discovery and routing. This version emphasized peer-to-peer communication efficiency and coherency in expansive memory pools, supporting AI and HPC workloads requiring massive capacity and bandwidth.

Building on 3.0, the CXL 3.1 specification, released on November 14, 2023, introduced refinements for reliability and efficiency in large fabrics. It enhanced error handling with advanced fault isolation and recovery mechanisms, improved power management for dynamic scaling in energy-constrained environments, and added Trusted Execution Environment Security Protocol (TSP) support for secure enclaves. Fabric extensions enabled better multi-host scalability and reduced downtime in error-prone scenarios, optimizing for sustained performance in data centers.

The CXL 3.2 specification, released on December 3, 2024, further optimized integration with PCIe 6.0/6.1 while focusing on memory device enhancements. It reduced fabric overhead through streamlined units like the CXL Hot-Page Monitoring Unit (CHMU) for tiered memory, extended TSP for broader security compliance, and improved OS-level visibility into device health and telemetry. These updates minimized management overhead in pooled environments and bolstered resilience for AI-driven applications.
| Version | Release Date | Link Rate | Major Additions |
|---|---|---|---|
| CXL 1.0 | March 2019 | 32 GT/s | Core protocols (CXL.io, CXL.cache, CXL.mem); accelerator/memory support |
| CXL 1.1 | September 2019 | 32 GT/s | Errata and compliance fixes; improved device discovery |
| CXL 2.0 | November 2020 | 32 GT/s | Switching, multi-host pooling, persistent memory, IDE security, fabric basics |
| CXL 3.0 | August 2022 | 64 GT/s | Multi-level fabrics, end-to-end integrity, large topologies |
| CXL 3.1 | November 2023 | 64 GT/s | Enhanced error handling, power management, TSP (TEE) security |
| CXL 3.2 | December 2024 | 64 GT/s | PCIe 6.x optimizations, reduced fabric overhead, enhanced monitoring |

Architecture and Protocols

Core Protocols

Compute Express Link (CXL) employs three core protocols—CXL.io, CXL.cache, and CXL.mem—that collectively enable non-coherent I/O operations alongside coherent caching and memory sharing between hosts and devices, all multiplexed over a shared PCIe physical link. These protocols build on the PCIe physical and link layers while introducing specialized mechanisms for coherency and memory access, ensuring low-latency, high-bandwidth interactions in disaggregated computing environments.

CXL.io provides a PCIe-compatible path for device enumeration, configuration, and non-coherent I/O transactions, utilizing standard PCIe transaction layer packets (TLPs) such as memory reads/writes and completions, along with ordering rules and error reporting via the Advanced Error Reporting (AER) mechanism. It supports discovery through PCIe configuration space and handles power management signaling via vendor-defined messages, making it mandatory for all CXL devices to ensure compatibility with existing PCIe ecosystems. Enhancements in later specifications, such as hot-plug support and secondary mailboxes for event logging, further streamline device management without altering its core PCIe semantics.

CXL.cache implements a cache coherency protocol that allows CXL devices to cache host memory lines, using a snoop-based model with support for directory-based alternatives to maintain consistency across the system. It operates via dedicated request, response, and data channels in both host-to-device and device-to-host directions, supporting operations such as reads, writes, invalidations, and snoops that adhere to Modified-Exclusive-Shared-Invalid (MESI) coherency states. A bias mechanism—host-biased for host-local access or device-biased for accelerator-local workloads—determines ownership, with snoop filters or directories tracking cacheline states at granularities from 64 bytes to 4 KB; this enables devices to evict or write back cachelines efficiently, reducing latency for repeated accesses.

CXL.mem facilitates direct load and store operations to device-attached memory, mapping it into the host's physical address space via Host-managed Device Memory (HDM) decoders that define base addresses, sizes, and interleaving across up to eight devices. It uses transactional message classes—including master-to-subordinate (M2S) requests and subordinate-to-master (S2M) responses—for coherent access, with back-invalidation snoops in directory-based modes to handle multi-host sharing; quality-of-service telemetry, such as device load (DevLoad) indicators, optimizes traffic under overload. This protocol supports speculative reads and persistent flush operations, ensuring data durability in pooled memory scenarios.

The three protocols are multiplexed over the shared PCIe link using an arbiter/multiplexer (ARB/MUX) with weighted round-robin arbitration, interleaving traffic at flit boundaries (68-byte flits in earlier versions or 256-byte flits in CXL 3.0) to maximize throughput while preserving per-protocol crediting and ordering. In CXL 3.x specifications, link integrity is enhanced with CRC fields sized by flit mode—8 bytes for standard 256-byte flits and 12 bytes (two 6-byte CRCs) for latency-optimized mode, using mode-specific polynomials—while retaining a 16-bit CRC (polynomial 0x1F053) for 68-byte flits, alongside optional end-to-end CRC (ECRC) and retry buffers to handle detected errors without data loss. Coherency domains are established across hosts and devices through integrated snoop filters, directory structures, and bias policies in CXL.cache and CXL.mem, creating shared visibility where devices participate as coherent agents in the host's coherency domain.
For instance, host-biased domains prioritize CPU access with device snoops for invalidations, while device-biased domains allow accelerators to hold exclusive cachelines; multi-level switching in CXL 3.0 extends these domains to fabric-scale pooling, supporting up to 4,096 ports with port-based routing and logical device identifiers for isolation. This framework ensures atomicity and consistency without software intervention for core operations.

CXL adopts the physical layer (PHY) from PCI Express (PCIe) for serializer/deserializer (SerDes) signaling, enabling compatibility with existing infrastructure while supporting high-speed data transmission. This reuse includes the PCIe 5.0 PHY for CXL 1.1 and 2.0, operating at 32 GT/s per lane, and the PCIe 6.0 PHY for CXL 3.0 and later, reaching 64 GT/s per lane using PAM-4 modulation. The physical layer handles electrical signaling, clocking, and lane management across x1 to x16 configurations, ensuring reliable bit-level transfer without requiring new cabling or connectors.

At the link layer, CXL employs flit-based framing to encapsulate packets for transmission over the physical medium. In CXL 1.0 through 2.0, flits are 68 bytes (544 bits), consisting of a 16-bit protocol ID, payload slots (typically four 16-byte slots for CXL.cache or CXL.mem), and a 16-bit CRC for error detection. CXL 3.0 expands this to 256-byte flits in standard and latency-optimized modes to improve efficiency at higher speeds, incorporating additional headers while supporting backward compatibility with the smaller 68-byte flits. This framing allows multiplexing of the CXL.io, CXL.cache, and CXL.mem protocols over the shared link, with flow control units (flits) ensuring ordered delivery.

Link training and equalization in CXL leverage the PCIe Link Training and Status State Machine (LTSSM), which progresses through states such as Detect, Polling, and Configuration before reaching L0 for active operation. CXL extends this with specific sequences for protocol negotiation and coherency initialization, such as alternate protocol negotiation during the Configuration state to switch from PCIe mode to CXL mode if both endpoints support it. Equalization adapts to channel losses using preset values and feedback, ensuring reliable signaling up to the supported speeds without added latency.

Error detection and correction mechanisms enhance reliability, particularly at higher data rates. CXL inherits PCIe replay buffers and ACK/NAK protocols from the data link layer to retransmit corrupted packets. In CXL 3.x, forward error correction (FEC) becomes mandatory, using low-latency Reed-Solomon codes to correct bit errors in PAM-4 signaling, with FEC bytes integrated into flits to maintain throughput without frequent replays. These features collectively achieve low bit error rates, supporting mission-critical applications.

CXL supports flexible topologies starting with point-to-point connections between hosts and devices in CXL 1.0 and 1.1. CXL 2.0 introduces single-level switching for fan-out to multiple devices and basic fabrics, while CXL 3.0 and later enable multi-tiered switch fabrics with up to 4,096 devices, using port-based routing headers in flits for fabric management and load balancing. This evolution allows scalable disaggregated systems while adhering to PCIe physical-layer constraints.
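
To make the HDM decoder concept above concrete, the following is a simplified sketch of interleaved address translation: a host physical address inside a decoder's window is mapped to a target device and a device-local offset. The structure fields and function names are illustrative assumptions, not the specification's register layout.

```c
/* Illustrative HDM-decoder-style address translation (not the CXL spec's
 * exact register layout): a host physical address that falls in the decoder's
 * range is interleaved across N devices at a fixed granularity, yielding a
 * target device and a device-local offset (DPA). */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t base;          /* start of the HDM range in host physical space */
    uint64_t size;          /* total range size covered by the decoder */
    unsigned ways;          /* interleave ways: 1, 2, 4, or 8 devices */
    uint64_t granularity;   /* interleave granularity in bytes, e.g. 256 B */
} hdm_decoder_t;

/* Returns 0 on success, -1 if the address is outside the decoded range. */
static int hdm_translate(const hdm_decoder_t *d, uint64_t hpa,
                         unsigned *device, uint64_t *dpa) {
    if (hpa < d->base || hpa >= d->base + d->size)
        return -1;
    uint64_t off   = hpa - d->base;
    uint64_t chunk = off / d->granularity;      /* which interleave chunk */
    *device = (unsigned)(chunk % d->ways);      /* round-robin across devices */
    *dpa    = (chunk / d->ways) * d->granularity + (off % d->granularity);
    return 0;
}

int main(void) {
    hdm_decoder_t dec = { .base = 0x100000000ULL,   /* window starts at 4 GiB */
                          .size = 4ULL << 30,       /* 4 GiB window */
                          .ways = 4, .granularity = 256 };
    unsigned dev; uint64_t dpa;
    if (hdm_translate(&dec, 0x100000500ULL, &dev, &dpa) == 0)
        printf("HPA 0x100000500 -> device %u, DPA 0x%llx\n",
               dev, (unsigned long long)dpa);
    return 0;
}
```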

Device Classes

Type 1 Devices

Type 1 devices in Compute Express Link (CXL) are defined as accelerators that feature a coherent cache but lack host-managed local memory, using the CXL.cache protocol to enable coherent access to the host's memory and allowing them to perform compute operations directly on host-resident data without the need for local storage. This design facilitates efficient offloading of specialized tasks from the host CPU, maintaining a unified memory coherency domain across the system. Key characteristics of Type 1 devices include their optimization for compute-intensive workloads that benefit from low-latency access to host memory, such as data analytics, cryptographic offload, and networking functions. By including a coherent cache without local memory, these devices reduce hardware complexity, cost, and power consumption while leveraging the host's DRAM for all data operations, enabling coherent load/store semantics with latencies under 200 nanoseconds in typical configurations. They are particularly suited for scenarios where the accelerator processes data streams or performs inline operations without requiring persistent local state. Representative examples of Type 1 devices include SmartNICs for network processing and FPGA-based accelerators, such as those implemented using Intel's Agilex series with integrated CXL IP cores configured for cache-only access. Custom application-specific integrated circuits (ASICs), like those used in storage controllers for compression or encryption offload, also exemplify this class, where the device accesses host data via CXL.cache without embedding its own memory. These implementations demonstrate the versatility of Type 1 devices in disaggregated environments. Integration of Type 1 devices occurs through enumeration as standard PCIe devices augmented with CXL extensions, ensuring compatibility with existing PCIe ecosystems while adding coherency features. This allows for seamless discovery and configuration during system boot, with support for hot-plug operations in CXL fabrics that enable dynamic scaling across up to 4,096 nodes via port-based routing. The use of the CXL.cache protocol in these integrations provides coherent host memory access while keeping protocol details transparent to software.

Type 2 Devices

Type 2 devices in the Compute Express Link (CXL) architecture integrate local memory and a coherent cache, enabling them to participate fully in the system coherency domain through the CXL.cache protocol, which supports snooping and maintains consistent states across the host and device using mechanisms like the MESI (Modified, Exclusive, Shared, Invalid) protocol. These devices combine CXL.io for basic I/O and discovery, CXL.cache for coherent caching and device-initiated requests to host memory, and CXL.mem for host access to device-attached memory, all layered over the PCIe physical layer via the Flex Bus architecture. They feature a device coherency engine (DCOH) that manages bias-based coherency modes—host-biased for high-throughput host access or device-biased for low-latency local operations—ensuring data consistency without explicit copying. Host-managed device memory (HDM) is mapped into the system's coherent address space, with capabilities for up to two HDM ranges configurable via decoder controls, allowing the host to treat device memory as an extension of its own. Key characteristics of Type 2 devices include support for local processing with coherent load/store semantics, where the device can process data locally and expose its memory to the host for unified access, reducing data copies in compute-intensive workloads. This makes them particularly suitable for accelerators like GPUs and NICs that require both high-bandwidth local storage (e.g., DDR or HBM) and coherent interaction with system memory, enabling scenarios such as offloaded AI inference or network acceleration without data-transfer stalls. The devices report memory sizes in 64 KB to 1 GB granules and support features like snoop filters for efficient coherency tracking, dirty-cacheline handling, and mandatory writeback/invalidate operations to preserve data integrity during bias transitions. Recommended latencies, such as 50 ns for snoop-miss responses and 80 ns for memory reads, guide implementations to balance performance and power. Representative examples include AMD's Versal Premium Series Gen 2 adaptive SoCs, which integrate a comprehensive CXL 3.1 subsystem for FPGA-based acceleration with local memory and full coherency support, enabling configurable offload in data center environments. Advanced smart NICs, such as those in Broadcom's Stingray family with embedded processors for packet processing, leverage CXL Type 2 features to provide coherent acceleration with shared memory semantics, as demonstrated in interoperability tests. In CXL 3.0 and later specifications, Type 2 devices enable peer-to-peer communication within fabrics, allowing direct transfers between devices without host mediation, supported by local HDM decoders for address translation and routing. This extends coherency to multi-host topologies, with snoop filters optimizing traffic in scaled environments and 256-byte flits for improved efficiency.
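
The bias-based coherency model described above can be sketched as a simple per-page policy. The code below is a hypothetical illustration of what a DCOH-style bias flip conceptually does (host writes back and invalidates before a page becomes device-biased); the names and structure are not the specification's implementation.

```c
/* Illustrative sketch of bias-based coherency for a Type 2 device (names and
 * structure are hypothetical). Pages in host bias route device accesses
 * through the host's coherency engine; pages in device bias let the
 * accelerator access its local memory directly at low latency. */
#include <stdio.h>

typedef enum { HOST_BIAS, DEVICE_BIAS } bias_t;

typedef struct {
    bias_t bias;      /* current bias state tracked per page */
    int host_cached;  /* host may hold cachelines from this page */
} page_state_t;

/* Device-side access: flip to device bias first, which requires the host to
 * write back and invalidate any cachelines it holds for the page. */
static void device_access(page_state_t *p) {
    if (p->bias == HOST_BIAS) {
        if (p->host_cached) {
            printf("  host writes back/invalidates its copies\n");
            p->host_cached = 0;
        }
        p->bias = DEVICE_BIAS;
        printf("  page flipped to device bias: direct local access\n");
    } else {
        printf("  already device-biased: direct local access\n");
    }
}

/* Host-side access flips the page back to host bias so the host resolves the
 * access through the normal CXL.mem/CXL.cache flows. */
static void host_access(page_state_t *p) {
    if (p->bias == DEVICE_BIAS) {
        p->bias = HOST_BIAS;
        printf("  page flipped to host bias: host-managed coherent access\n");
    }
    p->host_cached = 1;
}

int main(void) {
    page_state_t page = { HOST_BIAS, 1 };
    printf("accelerator touches page:\n");       device_access(&page);
    printf("accelerator touches page again:\n"); device_access(&page);
    printf("CPU reads result:\n");               host_access(&page);
    return 0;
}
```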

Type 3 Devices

Type 3 devices in Compute Express Link (CXL) are dedicated memory expanders that provide additional memory capacity to host processors via the CXL.mem protocol, enabling coherent access without incorporating compute engines or caching mechanisms. These devices function primarily as passive extensions to system memory, allowing hosts to treat the attached capacity as a seamless part of the local address space through load/store operations. Unlike other CXL device types, Type 3 implementations focus exclusively on memory pooling and sharing, supporting disaggregated architectures in data centers where memory resources can be dynamically allocated across multiple hosts. Key characteristics of Type 3 devices include support for both volatile memory, such as DDR5 DRAM, and persistent memory types reminiscent of Intel Optane, which retain data across power cycles for applications requiring durability. In CXL 3.0, granular memory allocation is facilitated by fabric-managed dynamic capacity mechanisms, which enable fine-grained slicing and sharing of memory within fabric-attached configurations, optimizing utilization in large-scale pools. These devices leverage the CXL.mem protocol for host-initiated access, ensuring cache-coherent transactions over PCIe-based links. Prominent examples of Type 3 devices include Micron's CZ120 and CZ122 CXL memory modules, which offer capacities up to 256 GB per module using DDR5 and can be pooled to achieve 1 TB or more in multi-device setups, as demonstrated in 2023 deployments. Samsung's CXL memory module and memory appliance solutions, such as the CMM-B series, provide scalable capacity for AI workloads, with orchestration software for dynamic pooling across servers. Additionally, Liqid's composable systems utilize CXL to create shared memory pools of up to 100 TB across 32 servers, enabling resource disaggregation without application modifications. More recent examples include Montage Technology's CXL 3.1 memory expander controllers (2025). Security in Type 3 devices is enhanced by CXL's Integrity and Data Encryption (IDE) features, which provide encryption and integrity protection for data in transit across memory pools, mitigating risks such as unauthorized access or tampering in multi-tenant environments. This includes AES-GCM-based encryption and integrity checks at the flit level, ensuring secure coherent sharing without compromising performance.
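
As a rough illustration of the pooling idea above, the toy sketch below hands out a Type 3 expander's capacity to hosts in fixed extents and reclaims it on release. The extent size, capacity, and interface are invented for illustration and do not correspond to any fabric-manager API.

```c
/* Toy sketch of pooling a Type 3 expander's capacity across hosts (a
 * simplified illustration of dynamic capacity allocation, not a real
 * fabric-manager interface). Capacity is handed out in fixed extents. */
#include <stdio.h>

#define EXTENT_GB   16          /* allocation granule, in GiB (illustrative) */
#define TOTAL_GB    256         /* expander capacity (illustrative) */
#define NUM_EXTENTS (TOTAL_GB / EXTENT_GB)

static int owner[NUM_EXTENTS];  /* 0 = free, otherwise host id */

/* Allocate `gb` of capacity to `host`; returns GiB actually granted. */
static int alloc_capacity(int host, int gb) {
    int need = (gb + EXTENT_GB - 1) / EXTENT_GB, granted = 0;
    for (int i = 0; i < NUM_EXTENTS && granted < need; i++)
        if (owner[i] == 0) { owner[i] = host; granted++; }
    return granted * EXTENT_GB;
}

static void release_host(int host) {
    for (int i = 0; i < NUM_EXTENTS; i++)
        if (owner[i] == host) owner[i] = 0;
}

int main(void) {
    printf("host 1 gets %d GiB\n", alloc_capacity(1, 96));
    printf("host 2 gets %d GiB\n", alloc_capacity(2, 128));
    release_host(1);                      /* host 1 returns its share to the pool */
    printf("host 3 gets %d GiB\n", alloc_capacity(3, 96));
    return 0;
}
```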

Implementations

Hardware Implementations

Intel's 4th Generation Xeon Scalable processors, codenamed Sapphire Rapids and launched in 2023, introduced hardware support for CXL 1.1, allowing coherent sharing of memory resources across PCIe-connected devices in servers. The subsequent 5th Generation Xeon processors, including Emerald Rapids released in late 2023, advanced CXL support with enhanced fabric topologies, Type 3 memory devices, and improved switching for disaggregated systems. The 6th Generation Xeon Scalable processors, codenamed Granite Rapids, further improved CXL 2.0 capabilities with up to 136 PCIe 5.0 lanes, supporting larger memory capacities and multi-socket configurations for AI and HPC workloads. AMD complemented this with its EPYC 9004 series (Genoa) processors, available since November 2022, which integrate CXL support to expand memory capacity beyond traditional DIMM limits in server environments. The EPYC 9005 series (Turin), released in October 2024, extended this with up to 192 Zen 5 cores, 12 DDR5-6400 memory channels, and continued CXL 2.0 support for denser memory expansion.

Switch and fabric hardware has emerged to support CXL's multi-host and pooled resource features. Astera Labs' Aries PCIe/CXL Smart DSP Retimers, rolled out in 2023 for CXL 2.0 compliance, extend signal reach up to three times in AI and cloud infrastructures while maintaining low latency for PCIe Gen5 and CXL links. Broadcom's PEX series switches, designed for high-speed interconnects, incorporate CXL 2.0 compatibility to facilitate robust fabric extensions in server racks. Marvell's Structera CXL portfolio, announced in 2024 and demonstrating interoperability in 2025, enables scalable memory expansion with up to 8 TB of additional capacity per device using DDR4/DDR5 modules.

Memory expanders and accelerators represent key Type 3 and Type 1 implementations. SK Hynix introduced a 512 GB CXL-based computational memory solution (CMS) prototype in 2022 using DDR5 for pooling. By 2025, CXL memory appliances using Type 3 devices have demonstrated up to 8 TB of capacity in rack-scale configurations for AI workloads. GPU integrations, such as AMD's Instinct MI300 series accelerators, leverage CXL 2.0 over PCIe 5.0 for efficient data sharing in accelerated computing platforms.

Ecosystem progress includes interoperability demonstrations at the 2024 Open Compute Project (OCP) Global Summit, where vendors including Astera Labs, AMD, and Samsung showcased CXL 2.0 memory expansion using 5th Gen EPYC processors and DDR5 modules for deep learning recommendation model (DLRM) applications. Further demos at OCP 2025 highlighted rack-scale innovations with Astera Labs' Leo CXL memory controllers. Volume shipments of CXL hardware for data centers commenced in 2024 and continued to ramp up in 2025, driven by hyperscale demand for pooled memory in AI infrastructure.

Software and OS Support

The Linux kernel provides robust support for Compute Express Link (CXL) devices through its dedicated CXL subsystem, which was initially introduced in version 5.18 in 2022 to enable basic device enumeration and configuration. Initial support for CXL 2.0 features, including fabric topology and host-managed device memory (HDM) decoders, was added progressively starting in version 5.19, with fuller capabilities in 6.1 later that year, allowing for dynamic region provisioning and coherency across multi-host environments. CXL 3.0 capabilities, such as advanced fabric management and switching for larger-scale deployments, have been progressively integrated starting with kernel 6.10 in 2024 and continuing in subsequent releases such as 6.12, enhancing scalability for AI and HPC workloads. Fabric and device management in user space is facilitated by tools like cxl-cli, part of the ndctl project, which offers command-line utilities for device provisioning, health monitoring, and label management on CXL memory expanders.

Support in other operating systems remains more limited compared to Linux. Microsoft has integrated CXL into its Azure cloud platforms since 2023, leveraging research prototypes like the Pond memory pooling system to enable disaggregated memory sharing across virtual machines, though native Windows driver support relies on PCIe extensions rather than a dedicated CXL driver stack for fabric operations. As of November 2025, macOS lacks official support for CXL devices, with compatibility limited to older x86-based systems via PCIe but no integration for memory coherency or hot-plug on Apple silicon platforms, restricting use to development environments.

Key libraries and firmware interfaces underpin CXL software ecosystems, with ACPI-based enumeration enabling operating systems to discover and configure devices through standard identifiers and tables such as ACPI0016 for host bridges, ensuring seamless integration without proprietary firmware dependencies. User-space access is supported via libraries such as libcxl for management interfaces and emerging extensions in libfabric (OFI), which are planned to incorporate CXL fabric protocols for high-performance data movement in distributed applications. Security is addressed through the CXL Trusted Execution Environment Security Protocol (TSP), defined in the CXL 3.1 specification, which provides attestation, encryption, and access controls to protect data in transit and at rest across memory pools.

Despite these advancements, CXL software faces challenges in firmware management and operational reliability. Firmware updates are essential for maintaining cache coherency in multi-device topologies, as version mismatches can lead to inconsistent states requiring reboots for recovery. In virtualized environments, hot-plug handling poses difficulties, with protocols supporting managed hot-add and removal but necessitating complex orchestration to preserve VM isolation and avoid latency spikes during resource migration.
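
Assuming a Linux kernel with the CXL subsystem enabled, a minimal sketch of device discovery is simply listing the entries the subsystem exposes under sysfs. The path below is the subsystem's standard location, but attribute layout varies by kernel version, so this only prints device names.

```c
/* Minimal sketch: enumerate CXL devices exposed by the Linux CXL subsystem
 * via sysfs (assumes a kernel with CONFIG_CXL_BUS; attribute layout varies
 * by kernel version, so this only lists device names). */
#include <dirent.h>
#include <stdio.h>

int main(void) {
    const char *path = "/sys/bus/cxl/devices";
    DIR *d = opendir(path);
    if (!d) {
        perror("no CXL subsystem visible (is CONFIG_CXL_BUS enabled?)");
        return 1;
    }
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        if (de->d_name[0] == '.')
            continue;                     /* skip . and .. */
        /* Names look like mem0, decoder0.0, root0, port1, region0, ... */
        printf("%s\n", de->d_name);
    }
    closedir(d);
    return 0;
}
```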

Performance Characteristics

Bandwidth and Throughput

Compute Express Link (CXL) leverages the physical layer of PCI Express (PCIe) to achieve high-bandwidth data transfers, with theoretical limits determined by the underlying PCIe generation and lane configuration. For CXL 2.0, which utilizes PCIe 5.0, a x16 link provides up to 64 GB/s of bandwidth in each direction, yielding 128 GB/s bidirectional. CXL 3.x, based on PCIe 6.0, doubles this capacity to 128 GB/s per direction or 256 GB/s bidirectional for x16 configurations, enabling greater data movement in disaggregated systems. Protocol overhead in CXL arises from flit-based framing, where data packets include headers and CRC, reducing link efficiency to approximately 92-94% depending on sync-header usage. In CXL 3.0 and later, larger 256-byte flits—compared to 68-byte flits in earlier versions—improve efficiency by minimizing header overhead relative to data, approaching 90% overall utilization in practice. Fabric scaling introduces additional impacts, such as switch-induced latency that can limit per-link throughput in multi-hop topologies, though switch and flit-level optimizations mitigate this. CXL coherent protocols incur 10-20% higher power than non-coherent PCIe for similar bandwidth levels. Real-world throughput for CXL memory pooling, as measured in 2024 benchmarks, typically achieves 50-64 GB/s effective per x16 link, representing 80-100% of theoretical limits under balanced read-write workloads. For instance, evaluations on CXL memory devices show peak reads reaching 64 GB/s in single-host configurations, with writes at 74-93% of that rate due to coherency overhead, while interleaved pooling across multiple devices sustains 55-61 GB/s in AI scenarios. In pooled systems, CXL scales through multi-lane links and fabric topologies, such as tiered switches supporting up to 32 hosts per pool, enabling aggregate throughput exceeding 1 TB/s across rack-scale deployments. For example, configurations with multiple x16 CXL 3.x devices interconnected via switches can deliver several TB/s collectively, facilitating efficient memory sharing in data centers without per-link bottlenecks dominating overall capacity.
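
The figures above follow from simple arithmetic: raw per-direction bandwidth is roughly the transfer rate times the lane count divided by eight, scaled by a protocol-efficiency factor. The sketch below computes this; the efficiency values are illustrative assumptions, not measured numbers.

```c
/* Back-of-the-envelope bandwidth estimate for a CXL x16 link: raw rate from
 * the PCIe generation's GT/s, scaled by an assumed protocol efficiency.
 * The efficiency factors are illustrative, not measured values. */
#include <stdio.h>

static double link_gbytes_per_s(double gt_per_s, int lanes, double efficiency) {
    double raw = gt_per_s * lanes / 8.0;   /* GB/s per direction, pre-overhead */
    return raw * efficiency;
}

int main(void) {
    /* Assumed flit/protocol efficiencies (illustrative). */
    double eff_68B  = 0.92;   /* 68-byte flits, CXL 1.x/2.0 */
    double eff_256B = 0.94;   /* 256-byte flits, CXL 3.x */

    printf("CXL 2.0 (PCIe 5.0, 32 GT/s) x16: %.1f GB/s effective per direction\n",
           link_gbytes_per_s(32.0, 16, eff_68B));
    printf("CXL 3.x (PCIe 6.0, 64 GT/s) x16: %.1f GB/s effective per direction\n",
           link_gbytes_per_s(64.0, 16, eff_256B));
    return 0;
}
```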

Latency

Compute Express Link (CXL) introduces timing overheads for coherent operations relative to native PCIe, primarily due to the additional protocol layers for cache coherence and memory semantics. For CXL.cache snoops, specification targets indicate a round-trip latency of approximately 50 ns, enabling low-latency cache-coherent access compared to non-coherent I/O transactions. Subsequent versions of the CXL specification have focused on mitigating these overheads through architectural optimizations. In CXL 3.0, per-switch-hop fabric latency is reduced to approximately 50-70 ns via optimized routing and flit-based encoding that minimizes serialization delays in multi-hop topologies. For Type 3 memory expander devices, memory access latency typically adds 50-100 ns over local DRAM, positioning CXL-attached memory as a viable extension similar to a remote NUMA node. Several factors contribute to these latency characteristics in CXL systems. Flit encoding, introduced to support efficient packetization on the PCIe physical layer, imposes a minor overhead of 2-5 ns per transaction due to alignment and efficiency trade-offs in latency-optimized modes. Coherency protocol handshakes, involving snoop requests and acknowledgments across the link, account for a significant portion of the delay, as they ensure data consistency without software intervention. Additionally, error correction mechanisms, such as forward error correction (FEC) in PCIe 6.0-based CXL 3.0 links, introduce delays of under 2 ns to detect and correct transmission errors, enhancing reliability at the expense of minimal added latency. Recent benchmarks from hardware evaluations highlight these overheads in practical scenarios, with CXL memory controllers adding approximately 200 ns of latency for CXL.mem operations. In tests using real CXL memory devices, CXL.mem loads exhibited an average latency of approximately 140 ns, compared to 70-80 ns for local DDR5 memory, demonstrating a roughly 2x overhead primarily from the extended protocol path. When traversing CXL fabrics with multiple hops, each additional switch contributes about 50 ns, underscoring the importance of topology design for latency-sensitive workloads.
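
A rough way to reason about these numbers is to add the quoted components together: a direct-attached CXL.mem load plus roughly 50 ns per additional switch hop. The sketch below does only that arithmetic, using the approximate figures cited above; it is illustrative, not a measurement.

```c
/* Rough latency arithmetic using the approximate figures quoted above
 * (~70-80 ns local DDR5, ~140 ns direct-attached CXL.mem loads, ~50 ns per
 * additional switch hop). Illustrative only, not a measurement. */
#include <stdio.h>

static double cxl_latency_ns(double direct_ns, int switch_hops, double per_hop_ns) {
    return direct_ns + switch_hops * per_hop_ns;
}

int main(void) {
    double local_ddr5 = 75.0;    /* midpoint of the 70-80 ns range */
    double direct_cxl = 140.0;   /* average CXL.mem load latency quoted above */

    printf("local DDR5:              %.0f ns\n", local_ddr5);
    printf("direct-attached CXL.mem: %.0f ns (%.1fx local)\n",
           cxl_latency_ns(direct_cxl, 0, 50.0), direct_cxl / local_ddr5);
    printf("CXL through 2 hops:      %.0f ns\n",
           cxl_latency_ns(direct_cxl, 2, 50.0));
    return 0;
}
```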

Applications

Data Center and Cloud Computing

In data centers and cloud environments, Compute Express Link (CXL) facilitates resource disaggregation by enabling coherent, low-latency sharing of memory and accelerators across servers, allowing operators to allocate resources dynamically and reduce underutilization. This approach addresses key challenges in hyperscale infrastructures, where traditional server-bound memory often leads to stranding—unused capacity that inflates costs without delivering value. By pooling resources at rack scale, CXL supports scalable architectures that optimize total cost of ownership (TCO) while maintaining performance for diverse workloads.

Memory pooling with CXL, leveraging Type 3 devices for expanded capacity, enables dynamic allocation of DRAM across multiple servers, improving utilization by 20-30% in pooling scenarios through disaggregation and tiering. In hyperscale deployments, this reduces waste from idle DRAM, with studies showing potential savings of 12% in overall DRAM demand for pools spanning 32 sockets when allocating 50% of capacity to shared tiers. Hyperscalers advanced this in 2024 by introducing a Hyperscale CXL Tiered Memory Expander specification at the Open Compute Project (OCP), incorporating inline compression such as a 2:1 ratio to halve media costs for cold data tiers and enable incremental expansion without full DDR5 upgrades. Such pilots demonstrate how CXL pooling minimizes power consumption by displacing higher-power memory types, targeting the roughly 50% of data that remains unused in the prior minute.

For storage acceleration, CXL-attached NVMe drives support disaggregated architectures by providing near-DRAM performance in pooled setups, allowing servers to reach vast, shared pools with latencies in the low hundreds of nanoseconds for memory accesses and higher for storage I/O due to device characteristics. This integration offloads storage processing from host CPUs, enhancing efficiency in systems where traditional PCIe limits scalability. CXL's virtualization features enable seamless virtual machine (VM) migration through coherent memory sharing, permitting live transfers of memory pages across hosts without halting operations, which is critical for high-availability cloud services. In containerized environments, this extends to orchestration platforms like Kubernetes via emerging extensions that manage tiered memory allocation, ensuring consistent performance during workload shifts in multi-tenant setups. Major cloud providers are adopting CXL for rack-scale systems to achieve cost savings, with Microsoft Azure identifying up to 25% memory stranding in production clusters and deploying pooled designs that reclaim approximately 25% of stranded memory while meeting latency targets under 200 ns. These implementations, evaluated across 8- to 16-socket pools, balance hardware overhead with ROI, prioritizing multi-headed devices over switches to avoid negative returns in public datacenters.
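
To illustrate the stranding argument above, the toy model below compares static per-server provisioning against a smaller local tier backed by a shared pool. The demand figures, tier sizes, and resulting savings are invented for illustration and are not the utilization or savings figures cited above.

```c
/* Toy model of memory stranding vs. pooling (illustrative numbers only):
 * with static per-server provisioning each server must hold peak demand,
 * while a shared CXL pool only needs the aggregate overflow across servers. */
#include <stdio.h>

int main(void) {
    /* Hypothetical per-server memory demand in GB at one point in time. */
    double demand[] = { 300, 120, 450, 200, 90, 380, 250, 160 };
    int n = (int)(sizeof demand / sizeof demand[0]);
    double per_server_provisioned = 512.0;     /* static DIMM capacity each */

    double static_total = n * per_server_provisioned, used = 0;
    for (int i = 0; i < n; i++)
        used += demand[i];

    /* Pooled design: keep a smaller local tier and back overflow with a pool. */
    double local_tier = 256.0, pool_needed = 0;
    for (int i = 0; i < n; i++)
        if (demand[i] > local_tier)
            pool_needed += demand[i] - local_tier;
    double pooled_total = n * local_tier + pool_needed;

    printf("static provisioning: %.0f GB installed, %.0f GB used (%.0f%% util)\n",
           static_total, used, 100.0 * used / static_total);
    printf("pooled provisioning: %.0f GB installed (%.0f%% less DRAM)\n",
           pooled_total, 100.0 * (1.0 - pooled_total / static_total));
    return 0;
}
```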

AI and High-Performance Computing

Compute Express Link (CXL) enables accelerator offload through Type 1 and Type 2 devices, facilitating shared access to GPUs and TPUs while minimizing data copy overhead in large-scale models such as large language models (LLMs). By leveraging CXL's cache-coherent protocol over PCIe, tensors can be offloaded directly from accelerators to expanded memory pools, bypassing traditional CPU-mediated transfers that introduce latency. For instance, in GPU environments, CXL integration allows dynamic memory allocation to accelerators, enhancing utilization for memory-intensive workloads and more than doubling the speed of batch inference compared to non-coherent alternatives. This approach is particularly beneficial for models exceeding on-device high-bandwidth memory (HBM) capacity, where CXL-attached DRAM serves as a low-latency extension. In high-performance computing (HPC), CXL supports scalable fabrics that interconnect multiple nodes for exascale systems, enabling coherent memory sharing across topologies that scale to thousands of devices. These fabrics, introduced in CXL 3.x specifications, allow for dynamic resource pooling in distributed environments, addressing bandwidth limitations in traditional HPC interconnects like InfiniBand or Ethernet. For exascale computing, CXL fabrics provide the foundation for extending systems targeting multi-petaflop performance, with support for up to 4,096 nodes in fabric-managed configurations. Key use cases in AI and HPC involve distributed training with CXL-pooled memory, where accelerators access a shared global address space to optimize tensor operations. In large language model (LLM) training, techniques like tensor offloading to CXL memory enable ZeRO-style optimizations, improving throughput in distributed setups by reducing inter-node data movement. Benchmarks from 2024 demonstrate this in multi-GPU clusters, where pooled CXL memory cuts overhead in collective operations like all-reduce, improving scaling efficiency for models with billions of parameters. Type 2 devices, such as CXL-enabled GPUs, further integrate this pooling seamlessly into the fabric. Looking ahead, CXL 3.x extensions, including fabric management in 3.1, support enhanced coherency for AI-specific workloads in edge HPC environments, with demonstrations at events like Supercomputing highlighting memory pooling for AI/HPC. These advancements target hybrid AI-HPC setups where edge nodes require low-power, coherent acceleration without full data center infrastructure.
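
The tensor-offloading idea above amounts to a tiering policy: keep hot tensors in on-device HBM and push cold state, such as ZeRO-partitioned optimizer state, to a larger CXL pool. The sketch below is a toy placement policy; the thresholds, tensor sizes, and access counts are invented for illustration and do not come from any framework.

```c
/* Toy tiering sketch for tensor offloading (illustrative policy only):
 * frequently accessed tensors stay in on-device HBM, colder ones are
 * placed in a larger CXL-attached memory pool. */
#include <stdio.h>

typedef enum { TIER_HBM, TIER_CXL } tier_t;

typedef struct {
    const char *name;
    double size_gb;
    double accesses_per_step;   /* how often a training step touches it */
} tensor_t;

static tier_t place(const tensor_t *t, double *hbm_free_gb, double hot_threshold) {
    if (t->accesses_per_step >= hot_threshold && *hbm_free_gb >= t->size_gb) {
        *hbm_free_gb -= t->size_gb;
        return TIER_HBM;
    }
    return TIER_CXL;            /* cold or overflow tensors go to the CXL pool */
}

int main(void) {
    tensor_t tensors[] = {
        { "activations",     24.0, 50.0 },
        { "weights",         40.0, 10.0 },
        { "optimizer_state", 80.0,  1.0 },   /* ZeRO-style: rarely touched */
    };
    double hbm_free = 80.0;     /* GB of on-device HBM (illustrative) */
    for (int i = 0; i < 3; i++) {
        tier_t t = place(&tensors[i], &hbm_free, 5.0);
        printf("%-16s -> %s\n", tensors[i].name, t == TIER_HBM ? "HBM" : "CXL pool");
    }
    printf("HBM remaining: %.0f GB\n", hbm_free);
    return 0;
}
```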
