Remote direct memory access

Remote direct memory access (RDMA) is a networking technology that enables the direct transfer of data between the memory of two computers over a network, bypassing the CPU, operating system kernel, and context switches on both endpoints to achieve low-latency and high-throughput communication. This is facilitated by specialized network interface controllers (NICs), known as RNICs, which place data directly into application buffers using protocols such as the Remote Direct Memory Access Protocol (RDMAP). At its core, RDMA operates through operations such as RDMA Write (a one-sided transfer into remote memory), RDMA Read (a one-sided fetch of remote data), and Send (two-sided messaging), which can provide reliable, ordered delivery over underlying transports while enforcing memory protection via steering tags (STags) to prevent unauthorized access. The technology minimizes data copies and CPU involvement by leveraging direct data placement (DDP), allowing applications to specify exact memory locations for transfers without intermediate buffering.

RDMA is implemented across several standards: InfiniBand, a channel-based switched fabric designed for high-performance computing (HPC) that natively supports RDMA semantics for low-latency interconnects between servers, storage, and GPUs; RDMA over Converged Ethernet (RoCE), which extends RDMA capabilities to standard Ethernet networks at Layers 2 and 3 for scalable deployments; and iWARP (Internet Wide Area RDMA Protocol), which maps RDMA over TCP/IP for compatibility with existing Ethernet infrastructure. These implementations, governed by bodies such as the InfiniBand Trade Association (IBTA) and the Internet Engineering Task Force (IETF), ensure interoperability and evolving features such as enhanced telemetry and higher port densities in recent specifications.

RDMA's key advantages include low latency (often just a few microseconds), high bandwidth (up to hundreds of Gbit/s), and near-zero CPU overhead, making it essential for demanding applications such as artificial intelligence training, high-performance computing, distributed storage (e.g., NVMe-oF), and cloud-scale clustering. By offloading network processing to hardware, it enhances scalability in modern data centers, where InfiniBand and RoCE together power 73% of the TOP500 supercomputers as of November 2024.

Fundamentals

Definition and Core Concepts

Remote Direct Memory Access (RDMA) is a networking technology that enables direct data transfers between the main memory of networked computers without involving the central processing unit (CPU), operating system (OS), or traditional network stack on either host. This approach offloads data movement to the network interface hardware, allowing applications to access remote memory as if it were local, thereby achieving the low-latency and high-throughput communication essential for high-performance computing and data centers. At its core, RDMA incorporates zero-copy networking, where data is transferred directly from the application memory of one host to the application memory of another without intermediate buffering or copying in the kernel or user space. It also features kernel bypass, permitting user-level applications to interact directly with the network hardware, eliminating OS overhead during transfers. RDMA operations are categorized into single-sided and two-sided types: single-sided operations, such as RDMA Write and RDMA Read, allow the initiator to specify both local and remote memory buffers while bypassing the remote CPU entirely, including for completion notification; in contrast, two-sided operations, like Send and Receive, require both endpoints to post buffers and involve explicit coordination, resembling traditional message passing. RDMA extends the principles of traditional local direct memory access (DMA), where peripheral devices access host memory independently of the CPU, to remote scenarios across a network, enabling similar efficiency over distances. Unlike conventional TCP/IP networking, which relies on multiple data copies through the kernel and incurs significant CPU involvement for processing packets, RDMA minimizes these bottlenecks to deliver superior performance in bandwidth-intensive applications. The basic architecture relies on RDMA-enabled network interface cards (RNICs), specialized hardware that independently manages memory registration, queue processing, and data transfers without host intervention.

Key Operational Principles

Remote Direct Memory Access (RDMA) enables efficient data transfer by placing incoming or outgoing data directly into the memory buffers of user-space applications on remote hosts, bypassing the operating system kernel to eliminate the overhead of copying data through kernel space. This direct placement is achieved through hardware support in RDMA-capable network interface controllers (RNICs), which manage transfers independently of the CPU. A key enabler of this process is the avoidance of kernel involvement, which prevents the costly context switches and interrupts that characterize traditional TCP/IP networking; instead, the RNIC handles packet processing, error detection, and retransmissions at the hardware level.

To support secure and predictable direct access, applications must register specific memory regions with the RNIC prior to their use in RDMA operations. This registration process pins the buffers in physical memory, mapping virtual addresses to physical ones and preventing paging or swapping that could disrupt access, while also establishing protection domains to enforce access permissions. Pinning ensures that the RNIC can translate and validate addresses without software intervention, maintaining the zero-copy nature of transfers.

At the core of RDMA's asynchronous operation model are queue pairs (QPs), each comprising a send queue (SQ) for outgoing work and a receive queue (RQ) for incoming work. Applications post work requests (WRs) to the SQ to initiate sends, RDMA writes, or RDMA reads, and to the RQ to provide buffers for incoming sends, with the RNIC dequeuing and executing these requests asynchronously. This queue-based mechanism allows for efficient batching and pipelining of operations, decoupling application logic from low-level network handling. Work request completion is signaled through completion queues (CQs), which the application monitors via polling or event notification to retrieve completion queue elements (CQEs) containing status, opcode, and byte count details. CQs enable scalable, low-overhead notification without relying on interrupts, supporting high-throughput scenarios by allowing multiple QPs to share a single CQ.

RDMA defines transport semantics to balance reliability, ordering, and overhead. The Reliable Connection (RC) service establishes a dedicated connection between QPs, guaranteeing in-order, exactly-once delivery and flow control through hardware acknowledgments and retransmissions. In contrast, the Unreliable Datagram (UD) service provides connectionless, best-effort delivery akin to UDP, with no ordering or reliability guarantees but minimal setup overhead, making it well suited to lightweight messaging. This hardware-centric design yields significant latency reductions, approximated as

\text{RDMA latency} \approx \text{RNIC processing time} + \text{network transit time},

typically under 5 μs end-to-end for small messages in local clusters, versus over 100 μs for equivalent TCP/IP transfers involving kernel traversal and buffering.
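
The following minimal sketch, written against the libibverbs C API, illustrates this queue-based flow. It assumes a Reliable Connection queue pair (qp), its completion queue (cq), and a registered buffer with its memory region (buf, mr) have already been created and connected, and that the remote peer has pre-posted a matching receive; names and error handling are illustrative rather than a complete program.

```c
/* Minimal sketch of RDMA's asynchronous queue model using libibverbs.
 * Assumes qp is a connected RC queue pair, cq is its completion queue,
 * and buf/mr describe an already-registered local buffer. */
#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

int send_and_wait(struct ibv_qp *qp, struct ibv_cq *cq,
                  struct ibv_mr *mr, void *buf, size_t len)
{
    /* Describe the local buffer with a scatter/gather entry. */
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,            /* local key from registration */
    };

    /* Post a signaled Send work request to the send queue; for a two-sided
     * Send the remote side must already have posted a Receive WR. */
    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.wr_id      = 1;                 /* application-chosen identifier */
    wr.sg_list    = &sge;
    wr.num_sge    = 1;
    wr.opcode     = IBV_WR_SEND;
    wr.send_flags = IBV_SEND_SIGNALED; /* request a completion entry */

    if (ibv_post_send(qp, &wr, &bad_wr))
        return -1;

    /* Busy-poll the completion queue until the CQE for this WR arrives. */
    struct ibv_wc wc;
    int n;
    do {
        n = ibv_poll_cq(cq, 1, &wc);
    } while (n == 0);

    if (n < 0 || wc.status != IBV_WC_SUCCESS)
        return -1;                     /* completion reported an error */
    return 0;
}
```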

History and Development

Origins and Early Standards

Remote direct memory access (RDMA) emerged in the 1990s as a response to performance bottlenecks in high-performance computing (HPC) environments, particularly in cluster-based supercomputing systems where traditional network interfaces incurred high latency due to operating system involvement in data transfers. These bottlenecks limited scalability in parallel applications, such as scientific simulations and large-scale data processing, by introducing overheads from context switches and data copying between user and kernel spaces. Early research in user-level networking, including projects at UC Berkeley, highlighted the need for direct hardware access to memory without CPU intervention to achieve low-latency, high-bandwidth communication in distributed systems.

A pivotal early standardization effort was the Virtual Interface Architecture (VIA), a software specification developed by Compaq, Intel, and Microsoft to enable protected, user-level networking over system area networks (SANs). Released in version 1.0 on December 16, 1997, VIA provided abstractions for zero-copy data transfers and remote memory operations, aiming to reduce communication latency for HPC clusters and enterprise applications like transaction processing. By allowing applications to directly manage network interfaces via virtual interfaces and completion queues, VIA addressed key limitations of kernel-mediated networking, influencing subsequent RDMA designs.

Building on VIA's concepts, the InfiniBand Trade Association (IBTA) was formed in August 1999 by industry leaders including Compaq, Dell, Hewlett-Packard, IBM, Intel, Microsoft, and Sun Microsystems, together with Mellanox, to develop a unified architecture for high-speed interconnects in HPC and data centers. The InfiniBand Architecture (IBA) specification version 1.0 was released in October 2000, defining a switched-fabric protocol with native support for RDMA operations such as send/receive and direct memory writes and reads that bypass CPU and OS involvement. Initial hardware implementations followed shortly, with Mellanox shipping the first InfiniBand devices, such as the InfiniBridge MT21108 host channel adapter, in January 2001, enabling practical deployment in supercomputing clusters. Intel contributed significantly to early development through its involvement in the IBTA and silicon design efforts.

Evolution and Adoption Milestones

The standardization of iWARP by the Internet Engineering Task Force (IETF) in 2007 marked an early milestone in extending RDMA capabilities over standard TCP/IP networks, with RFC 5040 defining the core Remote Direct Memory Access Protocol (RDMAP) and related specifications (RFC 5041–5044) enabling direct data placement and framing over reliable transports. This laid the groundwork for Ethernet-based RDMA implementations, broadening accessibility beyond proprietary fabrics. In 2010, the InfiniBand Trade Association (IBTA) released the initial RoCE specification (v1), integrating RDMA semantics directly into Ethernet frames to leverage existing Ethernet switching infrastructure. This was followed by RoCE v2 in September 2014, which added routable UDP/IP encapsulation to support Layer 3 network traversal, enhancing scalability in multi-subnet environments.

Operating system adoption accelerated RDMA's integration into mainstream computing. Linux kernels began supporting NFS over RDMA in the late 2000s, with the initial NFS/RDMA client implementation in version 2.6.24 (January 2008) and server support in 2.6.25 (April 2008), enabling efficient file system operations over RDMA fabrics. Microsoft introduced native RDMA support in Windows Server 2012 via SMB Direct, allowing low-CPU file sharing over RDMA-capable adapters for storage and clustering workloads. Virtualization platforms followed suit, with VMware integrating paravirtual RDMA (PVRDMA) in vSphere 6.5 (October 2016), permitting virtual machines to access RDMA hardware for high-throughput networking.

By the 2010s, RDMA had achieved widespread adoption in high-performance computing (HPC), powering a majority of TOP500 supercomputers through InfiniBand and emerging Ethernet variants, driven by demands for low-latency interconnects in scientific simulations and data processing. Market momentum surged in the late 2010s with the proliferation of 100 Gbit/s RDMA hardware from vendors such as Mellanox and Broadcom, enabling cost-effective scaling for enterprise clusters and reducing bottlenecks in bandwidth-intensive applications. In April 2020, NVIDIA completed its $7 billion acquisition of Mellanox, deepening RDMA integration with GPU technologies for AI and HPC workloads. Post-2020, RDMA adoption boomed in data centers, fueled by AI workloads requiring ultra-low-latency data movement, with the RDMA networking market expanding from approximately $1 billion prior to 2021 to over $6 billion in 2023. As of June 2024, RDMA-based networks powered over 90% of the TOP500 supercomputers, with the market projected to exceed $22 billion by 2028 on the strength of AI demand. Intel's Omni-Path Architecture, announced in November 2014 and commercially released in 2015, emerged as a cost-competitive alternative to InfiniBand, offering 100 Gbit/s throughput with lower latency and power consumption for HPC fabrics, and remained in use into the 2020s despite the discontinuation of further Intel generations.

Protocols and Implementations

InfiniBand and RoCE

InfiniBand serves as a foundational protocol for remote direct memory access (RDMA), defined as a channel-based interconnect architecture that employs a switched fabric topology to enable high-speed connectivity between servers and storage systems. This topology facilitates scalable, point-to-point communication with minimal latency, supporting data rates up to 800 Gbit/s via the eXtended Data Rate (XDR) standard as of 2025. InfiniBand ensures lossless data transmission through credit-based flow control, where receivers grant credits to senders to manage buffer usage and prevent packet drops. The architecture is governed by specifications from the InfiniBand Trade Association (IBTA), with ongoing revisions, such as Volume 1 Release 2.0 in 2025, that enhance switch density, scalability, and memory placement for reduced latency.

RoCE, or RDMA over Converged Ethernet, adapts InfiniBand's RDMA capabilities to standard Ethernet networks, allowing efficient, low-latency data transfers over conventional Ethernet fabrics. It exists in two versions: RoCE v1, which operates solely at Ethernet Layer 2 within a single broadcast domain, and RoCE v2, which is routable across Layer 3 networks using UDP/IP encapsulation for broader scalability. Like InfiniBand, RoCE requires a lossless network environment, achieved on Ethernet through mechanisms such as Priority Flow Control (PFC) to avoid packet loss and maintain performance.

The primary differences between InfiniBand and RoCE lie in their underlying hardware and deployment: InfiniBand utilizes dedicated native fabric hardware, providing optimized, purpose-built performance, whereas RoCE overlays RDMA functionality onto existing Ethernet infrastructure, leveraging commodity switches and RDMA-capable NICs for cost-effective integration. Despite these distinctions, both protocols share the IBTA Verbs API, a standardized interface for managing RDMA operations such as queue pairs and work requests. RoCE standards emerged as IBTA supplements, with the initial RoCE specification released in 2010 and RoCE v2 formalized in 2014 to address the routing limitations of the earlier version.
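
Because InfiniBand and RoCE expose the same Verbs interface, an application can query a device's port attributes to learn which link layer it is running on. The hedged sketch below uses standard libibverbs calls to list local RDMA devices and report whether port 1 is native InfiniBand or Ethernet (RoCE); it assumes at least one RDMA device is present and abbreviates error handling.

```c
/* Sketch: enumerate RDMA devices with libibverbs and report whether each
 * one's first port runs over native InfiniBand or Ethernet (RoCE). */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num_devices = 0;
    struct ibv_device **devices = ibv_get_device_list(&num_devices);
    if (!devices)
        return 1;

    for (int i = 0; i < num_devices; i++) {
        struct ibv_context *ctx = ibv_open_device(devices[i]);
        if (!ctx)
            continue;

        struct ibv_port_attr port;
        if (ibv_query_port(ctx, 1, &port) == 0) {   /* port numbers start at 1 */
            const char *ll =
                port.link_layer == IBV_LINK_LAYER_ETHERNET   ? "Ethernet (RoCE)" :
                port.link_layer == IBV_LINK_LAYER_INFINIBAND ? "InfiniBand" :
                                                               "unspecified";
            printf("%s: link layer %s\n",
                   ibv_get_device_name(devices[i]), ll);
        }
        ibv_close_device(ctx);
    }
    ibv_free_device_list(devices);
    return 0;
}
```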

iWARP, Omni-Path, and Emerging Standards

iWARP, or Internet Wide Area RDMA Protocol, is a standards-based implementation of RDMA that operates over TCP/IP, enabling direct memory access across standard Ethernet networks without requiring specialized lossless fabrics. Defined by the Internet Engineering Task Force (IETF) in 2007, iWARP consists of a layered protocol stack comprising the Remote Direct Memory Access Protocol (RDMAP) for RDMA operations, the Direct Data Placement (DDP) protocol for efficient data transfer into application buffers, and Marker PDU Aligned Framing (MPA) for TCP framing to ensure reliable delivery. This approach leverages existing Ethernet infrastructure, avoiding the priority flow control and other enhancements mandated by protocols like RoCE, but introduces higher protocol overhead due to TCP's reliability mechanisms, such as acknowledgments and retransmissions. iWARP's design prioritizes compatibility with conventional networks, making it suitable for enterprise and wide-area deployments where fabric modifications are impractical.

Omni-Path Architecture (OPA), introduced by Intel in 2014, is a proprietary high-performance interconnect designed specifically for scalability in high-performance computing (HPC) environments, offering RDMA capabilities with low latency and high bandwidth. OPA supports data rates up to 100 Gbit/s per port in its initial generation, with optimizations for message rates exceeding 10 million messages per second and end-to-end latencies under 1 microsecond, positioning it as a cost-effective alternative to InfiniBand for large-scale clusters. The architecture employs the Omni-Path Interface (OPI) specification, which defines a standardized electrical and protocol interface for host fabric adapters and switches, facilitating interoperability among components while emphasizing power efficiency and fabric manageability. Although plans for a 200 Gbit/s second-generation OPA were announced, Intel discontinued development in 2019, shifting focus to other interconnect technologies. OPA's fabric supports up to thousands of nodes with features like adaptive routing and congestion control, enhancing reliability in HPC workloads.

Emerging standards are extending RDMA's reach into Ethernet-centric and software-defined environments, particularly for AI and cloud-scale applications. The Ultra Ethernet Consortium (UEC), formed in 2023 by industry leaders including AMD, Broadcom, Cisco, Intel, Meta, and Microsoft, is developing an open Ethernet-based specification optimized for AI and HPC, featuring the Ultra Ethernet Transport (UET) protocol as a modern RDMA alternative to RoCE. UEC 1.0, released in June 2025, introduces RDMA enhancements with intelligent low-latency transport, congestion control tailored for high-throughput training, and IP-routable packet structures to support massive-scale clusters without proprietary fabrics. Complementing hardware advancements, Soft-RoCE provides a software-emulated RDMA implementation over standard Ethernet, allowing systems without dedicated RDMA hardware to perform direct memory transfers via kernel drivers such as the Linux rxe module. This emulation layer maps RDMA verbs onto the kernel's UDP/IP stack, enabling testing and deployment in virtualized or legacy environments, with performance approaching hardware solutions for smaller-scale use cases. These developments reflect ongoing IETF and industry efforts to evolve RDMA standards for broader interoperability and efficiency in diverse networking ecosystems.

Technical Mechanisms

Memory Access and Data Transfer

Remote Direct Memory Access (RDMA) supports several core operations for efficient data transfer between nodes, categorized into one-sided and two-sided semantics. One-sided operations, such as RDMA Read and RDMA Write, enable direct access to remote memory without involving the remote CPU, allowing the initiator to pull data (RDMA Read) from or push data (RDMA Write) to a specified remote memory region using a provided remote key (rkey). Two-sided operations, including Send and Receive, function like message passing, where the sender posts a Send work request to deliver data into a pre-posted Receive buffer on the remote side, requiring coordination and CPU involvement on both ends for completion signaling. Additionally, atomic operations, such as compare-and-swap and fetch-and-add, provide one-sided mechanisms for synchronized remote updates, where the RNIC performs the operation atomically and returns the result to the initiator without remote CPU intervention.

Before performing RDMA operations, applications must register memory regions to enable safe direct access by the RDMA network interface card (RNIC). This registration process pins the specified virtual memory pages in physical memory to prevent paging, maps virtual addresses to physical ones for the RNIC, and assigns permissions such as local read/write or remote access types (e.g., remote read, write, or atomic). Upon successful registration, the RNIC generates a local key (lkey) for the application to reference the region in local operations and a remote key (rkey) to share with remote peers, which serves as an access control token to validate and authorize incoming remote requests. This pinning ensures data integrity during transfers but consumes system resources, as registered regions remain fixed in physical memory until deregistered.

Data transfer begins when the initiator application posts a work request (WR) describing the transfer, including the remote virtual address, length, and rkey for one-sided verbs, to the send queue of a queue pair (QP), a paired set of queues forming a communication endpoint. The RNIC then autonomously processes the WR, driving its direct memory access (DMA) engines to move data directly between the local and remote memory regions and bypassing the host CPU for both data movement and protocol processing in one-sided operations. For two-sided operations, the remote side must have a corresponding Receive WR posted, after which the RNIC signals completion via completion queues for notification and error handling. This flow achieves low-latency transfers by offloading all network protocol and memory handling to hardware.

RDMA's efficiency stems from minimal protocol overhead: theoretical maximum throughput approaches the link rate scaled by the ratio of payload to total packet size, which accounts for headers and encapsulation. For instance, on 100 Gbit/s links, RDMA protocols commonly achieve approximately 95% efficiency under optimal conditions with large payloads and lossless networks, approaching line rate while reducing CPU utilization to near zero.
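
As an illustration of these steps, the sketch below registers a local buffer and posts a one-sided RDMA Write using libibverbs. It assumes a protection domain (pd) and a connected Reliable Connection queue pair (qp) already exist and that the peer's remote virtual address and rkey were exchanged out of band (for example, over a Send/Receive exchange or a TCP control channel); the function and its parameters are illustrative.

```c
/* Sketch: register a local buffer and post a one-sided RDMA Write.
 * Assumes pd, a connected RC qp, and the peer's remote_addr/rkey exist. */
#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

int rdma_write_example(struct ibv_pd *pd, struct ibv_qp *qp,
                       void *buf, size_t len,
                       uint64_t remote_addr, uint32_t rkey)
{
    /* Registration pins the pages and yields lkey/rkey. Remote read/write
     * flags are deliberately omitted here so peers cannot touch this region;
     * the target of the write must have registered its own buffer with
     * IBV_ACCESS_REMOTE_WRITE. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;

    struct ibv_sge sge = {
        .addr = (uintptr_t)buf, .length = (uint32_t)len, .lkey = mr->lkey
    };

    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;  /* one-sided: no remote WR needed */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;
    wr.wr.rdma.remote_addr = remote_addr;        /* target virtual address */
    wr.wr.rdma.rkey        = rkey;               /* remote access token */

    if (ibv_post_send(qp, &wr, &bad_wr)) {       /* RNIC DMAs the data itself */
        ibv_dereg_mr(mr);
        return -1;
    }
    /* The caller should poll the QP's completion queue before reusing or
     * deregistering the buffer; the region stays pinned until ibv_dereg_mr. */
    return 0;
}
```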

APIs and Queue Management

The Verbs API, standardized by the InfiniBand Trade Association (IBTA) in the InfiniBand Architecture Specification, provides a user-space programming interface for RDMA operations on InfiniBand and RoCE networks. It enables applications to directly manage hardware resources and initiate transfers without kernel involvement, supporting functions for memory registration, work request posting, and completion polling. This API is foundational for high-performance networking, allowing developers to implement efficient data movement semantics.

Central to the Verbs API are functions for posting and completing work requests, such as ibv_post_send() to enqueue send operations on a queue pair's send queue and ibv_post_recv() for receive operations on the receive queue. Completion events are retrieved via ibv_poll_cq(), which dequeues work completion structures containing status, opcode, and byte counts from a completion queue. These mechanisms enable asynchronous, non-blocking operation, with signaled work requests generating completion entries that can optionally trigger event notifications.

Queue management encompasses the creation and configuration of the core RDMA resources: queue pairs (QPs), completion queues (CQs), and protection domains (PDs). Queue pairs, created with ibv_create_qp(), represent bidirectional communication endpoints comprising a send queue for outgoing work requests and a receive queue for incoming ones; they support transport types such as Reliable Connection (RC) or Unreliable Datagram (UD). Completion queues, allocated via ibv_create_cq(), hold entries for finished work requests from one or more QPs, with polling removing entries to track progress. Protection domains, obtained through ibv_alloc_pd(), enforce isolation by grouping QPs, memory regions, and other handles, restricting access to authorized resources and preventing unauthorized reads or writes. These resources are destroyed with corresponding deallocation functions such as ibv_destroy_qp() and ibv_destroy_cq() upon teardown.

Work completions (WCs) are handled by applications polling CQs, where each WC includes fields for status, vendor error codes, and completion flags indicating success or failure. Multiple QPs can share a CQ for efficiency, but overflow triggers an IBV_EVENT_CQ_ERR event if the queue fills without being polled. Protection domains integrate with memory registration to validate access rights during operations, ensuring that faults such as invalid keys result in controlled completion errors rather than crashes.

The primary library implementing the Verbs interface on Linux is libibverbs, part of the OpenFabrics Enterprise Distribution (OFED), which abstracts hardware-specific drivers for InfiniBand, RoCE, and iWARP. It supports user-space direct access via the ib_uverbs kernel module, enabling low-latency operations. On Windows, the Network Direct Kernel Provider Interface (NDKPI), an NDIS extension, delivers a Verbs-like interface for RDMA, allowing independent hardware vendors to implement kernel-mode support for protocols like RoCE and iWARP. Verbs extensions for Ethernet, such as those in libibverbs-rocee, facilitate RDMA over converged networks by mapping InfiniBand semantics onto Ethernet transports.

Error handling in the Verbs API centers on completion status codes within WCs, with IBV_WC_SUCCESS denoting successful completion and codes like IBV_WC_LOC_QP_OP_ERR or IBV_WC_REM_INV_REQ_ERR signaling local or remote errors such as queue underflow or invalid requests. On reliable transports, negative acknowledgments (NAKs) from the remote side are reflected in WC status codes: a remote operational error NAK results in IBV_WC_REM_OP_ERR without retry, while receiver-not-ready (RNR) and sequence-error NAKs are retried in hardware and surface as IBV_WC_RNR_RETRY_EXC_ERR or IBV_WC_RETRY_EXC_ERR only when retry limits are exceeded; unreliable modes like UD provide no such recovery, leaving retransmission to the application. Events like IBV_EVENT_QP_FATAL indicate irrecoverable errors, requiring resource recreation.
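
A hedged sketch of this resource lifecycle is shown below: it allocates a protection domain, a completion queue, and a Reliable Connection queue pair with libibverbs, and drains completions while checking their status codes. Connection establishment (exchanging QP numbers and transitioning the QP through the INIT, RTR, and RTS states) is omitted, and the endpoint structure and queue depths are illustrative choices rather than requirements.

```c
/* Sketch: allocate the core Verbs resources (PD, CQ, QP) for an RC endpoint
 * and check completion status codes when draining the CQ. */
#include <stdio.h>
#include <infiniband/verbs.h>

struct endpoint {
    struct ibv_pd *pd;
    struct ibv_cq *cq;
    struct ibv_qp *qp;
};

int endpoint_create(struct ibv_context *ctx, struct endpoint *ep)
{
    ep->pd = ibv_alloc_pd(ctx);                      /* protection domain */
    if (!ep->pd)
        return -1;

    ep->cq = ibv_create_cq(ctx, 256, NULL, NULL, 0); /* room for 256 CQEs */
    if (!ep->cq)
        return -1;

    struct ibv_qp_init_attr attr = {
        .send_cq = ep->cq,          /* send and receive work share one CQ */
        .recv_cq = ep->cq,
        .qp_type = IBV_QPT_RC,      /* Reliable Connection transport */
        .cap = { .max_send_wr = 128, .max_recv_wr = 128,
                 .max_send_sge = 1, .max_recv_sge = 1 },
    };
    ep->qp = ibv_create_qp(ep->pd, &attr);
    return ep->qp ? 0 : -1;
}

/* Drain available completions, reporting failures by status code. */
void drain_completions(struct ibv_cq *cq)
{
    struct ibv_wc wc;
    while (ibv_poll_cq(cq, 1, &wc) > 0) {
        if (wc.status != IBV_WC_SUCCESS)
            fprintf(stderr, "wr_id %llu failed: %s\n",
                    (unsigned long long)wc.wr_id,
                    ibv_wc_status_str(wc.status));
    }
}

void endpoint_destroy(struct endpoint *ep)
{
    if (ep->qp) ibv_destroy_qp(ep->qp);
    if (ep->cq) ibv_destroy_cq(ep->cq);
    if (ep->pd) ibv_dealloc_pd(ep->pd);
}
```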

Applications and Use Cases

High-Performance Computing and Storage

Remote Direct Memory Access (RDMA) plays a pivotal role in high-performance computing (HPC) clusters by enabling efficient, low-latency data transfers that bypass the operating system kernel, allowing direct access to remote memory. This capability is particularly valuable for Message Passing Interface (MPI) implementations, such as MVAPICH2, which leverage RDMA over networks like InfiniBand to support scalable communication in supercomputing environments. RDMA provides the low-latency messaging essential for tightly coupled applications, reducing communication overhead in distributed simulations and scientific computations. Since the early 2000s, RDMA-enabled interconnects like InfiniBand have been integral to supercomputers, powering a significant portion of the world's fastest systems and contributing to their high TOP500 rankings.

In storage systems, RDMA underpins NVMe over Fabrics (NVMe-oF), extending the high-speed, low-latency NVMe protocol across networked block storage environments. NVMe-oF utilizes RDMA transports to enable direct memory-to-memory data movement between hosts and storage arrays, minimizing CPU involvement and supporting transports such as NVMe over InfiniBand and NVMe over RoCE. This architecture delivers scalable I/O performance for large-scale, data-intensive workloads, with RDMA ensuring efficient handling of the small-block random accesses common in HPC storage.

Parallel file systems like Lustre integrate RDMA through the LNet networking layer to optimize I/O operations in HPC clusters, enabling zero-copy transfers and full bandwidth utilization for distributed data access. In conjunction with workload managers such as SLURM, Lustre's RDMA support facilitates high-throughput parallel I/O, aggregating performance across multiple object storage servers (OSSs) to handle massive datasets from scientific simulations. Deployments in large clusters achieve aggregate throughputs exceeding 100 GB/s, scaling nearly linearly with additional storage targets to meet the demands of petabyte-scale environments.

A notable application is CERN's Large Hadron Collider (LHC) computing infrastructure, where RDMA enhances data movement in experiment readout systems such as ATLAS and SND@HL-LHC. By implementing RDMA in front-end electronics and event buffers, these systems achieve efficient, high-bandwidth transfers of collision data across distributed processing nodes, supporting real-time analysis of the terabytes generated per second.
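
The sketch below shows how an HPC application can express one-sided communication through MPI's remote memory access (RMA) interface, which implementations such as MVAPICH2 can map onto RDMA Read and Write when an RDMA-capable fabric is available; the mapping itself is left to the MPI library, and the example is a minimal illustration meant to be run with at least two ranks (e.g., mpirun -np 2).

```c
/* Sketch: MPI one-sided communication (RMA), which RDMA-aware MPI libraries
 * can implement with RDMA Write. Rank 0 puts a value directly into the
 * memory window exposed by rank 1; no matching receive is posted there. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = 0.0;                 /* memory exposed to remote access */
    MPI_Win win;
    MPI_Win_create(&local, sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);              /* open an access epoch */
    if (rank == 0) {
        double value = 42.0;
        MPI_Put(&value, 1, MPI_DOUBLE, 1 /* target rank */,
                0 /* displacement */, 1, MPI_DOUBLE, win);
    }
    MPI_Win_fence(0, win);              /* close the epoch; data is visible */

    if (rank == 1)
        printf("rank 1 received %.1f via one-sided put\n", local);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```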

AI/ML and Cloud Environments

In artificial intelligence (AI) and machine learning (ML) workflows, Remote Direct Memory Access (RDMA) plays a pivotal role in distributed training frameworks by enabling efficient communication primitives such as all-reduce operations and parameter server exchanges. In TensorFlow, RDMA accelerates distributed training by integrating verbs-based transports into its communication runtime for low-latency data exchanges during gradient aggregation in parameter servers, reducing communication overhead compared to traditional TCP/IP stacks. Similarly, PyTorch leverages the NCCL backend for RDMA-supported all-reduce operations in distributed data parallel (DDP) training, enabling efficient gradient synchronization across nodes without CPU intervention and improving scalability for large models. These mechanisms allow RDMA to handle the intensive, bursty traffic patterns inherent in synchronous training paradigms.

A key enabler in these setups is GPU-direct transfer via GPUDirect RDMA, which permits direct data movement between GPUs across networked nodes, bypassing host CPU and memory copies to minimize latency. This technology, integrated into the CUDA toolkit, supports RDMA fabrics such as InfiniBand and RoCE, enabling up to 10x gains in data throughput for neural network training on large datasets. By facilitating memory-to-memory transfers at line-rate speeds, GPUDirect RDMA ensures that workloads maintain high efficiency during collective operations like all-reduce.

In cloud environments, RDMA integration with orchestration platforms like Kubernetes enhances AI/ML scalability through specialized drivers such as DraNet, a 2025 implementation from Google that uses the Dynamic Resource Allocation (DRA) API to dynamically attach high-performance RDMA interfaces to pods for demanding workloads. This allows seamless provisioning of RDMA resources alongside accelerators in Google Kubernetes Engine (GKE), optimizing networking for distributed training without manual configuration. Complementing this, Alibaba's Stellar platform introduced Para-Virtualized Direct Memory Access (PVDMA) in 2025, enabling on-demand memory pinning and dynamic allocation in virtualized setups to support RDMA in multi-tenant environments with minimal overhead.

Exemplifying practical deployments, GPU clusters utilize RoCE-based RDMA for multi-node training, as seen in configurations with NVIDIA DGX A100 systems connected via 200 Gbit/s Ethernet fabrics to enable GPU-direct collectives in Kubernetes-orchestrated environments. These setups support efficient scaling of training jobs across dozens of nodes, with RDMA ensuring lossless, low-latency synchronization for frameworks like PyTorch. The growing adoption of RDMA in AI infrastructure is underscored by market projections estimating the RDMA networking segment for AI/ML to exceed $22 billion by 2028, driven by surging demand for high-throughput interconnects.

RDMA's benefits in these domains are particularly evident in handling petabyte-scale datasets during distributed training, where its microsecond-scale latency and high bandwidth enable rapid iteration over massive inputs without stalling GPU compute resources. For instance, in environments processing exabyte-level AI corpora, RDMA facilitates efficient shuffling and aggregation, reducing time-to-accuracy by minimizing bottlenecks in multi-node setups.
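
For illustration, the hedged sketch below performs the kind of all-reduce that DDP-style training relies on, using NCCL's C API directly; when ranks span multiple nodes and an RDMA-capable fabric is present, NCCL can carry this collective over InfiniBand or RoCE (including GPUDirect RDMA) without extra application code. A single-rank communicator is used here purely to keep the example self-contained, and the buffer size and names are arbitrary.

```c
/* Sketch: the all-reduce collective used for gradient synchronization,
 * expressed with NCCL. In real multi-node training each rank runs on its
 * own GPU/node and NCCL selects an RDMA transport when the fabric allows. */
#include <stdio.h>
#include <cuda_runtime.h>
#include <nccl.h>

int main(void)
{
    const int nranks = 1, rank = 0;      /* one rank for illustration only */
    const size_t count = 1 << 20;        /* 1M floats standing in for gradients */

    ncclUniqueId id;
    ncclComm_t comm;
    cudaStream_t stream;
    float *grads;

    ncclGetUniqueId(&id);                /* normally broadcast to all ranks */
    cudaSetDevice(rank);
    cudaMalloc((void **)&grads, count * sizeof(float));
    cudaStreamCreate(&stream);
    ncclCommInitRank(&comm, nranks, id, rank);

    /* Sum gradients across ranks in place; with multiple nodes this is the
     * step that runs over RDMA-capable links without host copies. */
    ncclAllReduce(grads, grads, count, ncclFloat, ncclSum, comm, stream);
    cudaStreamSynchronize(stream);

    ncclCommDestroy(comm);
    cudaFree(grads);
    printf("all-reduce complete\n");
    return 0;
}
```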

Advantages and Limitations

Performance Benefits

Remote Direct Memory Access (RDMA) provides significant performance advantages over traditional TCP/IP networking, primarily through its ability to bypass the operating system kernel and CPU involvement in data transfers. This kernel bypass enables ultra-low latency, with round-trip times as low as approximately 2 μs in high-speed RDMA networks using modern network interface cards (NICs). In contrast, TCP/IP-based communication in data centers typically incurs latencies of 50-100 μs for small messages due to protocol processing and context-switching overheads. For remote memory accesses across nodes, RDMA maintains latencies under 10 μs, making it ideal for latency-sensitive workloads.

RDMA also delivers throughput close to line-rate speeds, such as 200 Gbit/s or up to 800 Gbit/s in recent implementations, with minimal overhead from zero-copy semantics that eliminate intermediate data buffering. Bandwidth efficiency in RDMA transfers often reaches 90-95% of the physical link rate, as demonstrated in micro-benchmarks using tools like the InfiniBand perftest suite. Offloading protocol processing to the RNIC further reduces CPU utilization to less than 1% during large data transfers, compared to 50-90% overhead in TCP/IP operations, freeing resources for application processing. In terms of scalability, RDMA supports the creation of millions of queue pairs (QPs) per node in large clusters, enabling massive parallelism without proportional increases in latency or resource contention. This capability, combined with low CPU overhead, contributes to improved energy efficiency in hyperscale environments.
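
A worked example makes the efficiency figure concrete. Assuming a RoCE v2 RDMA Write carrying a 4096-byte payload per packet, with representative per-packet overhead of roughly 38 bytes of Ethernet framing (preamble, header, FCS, and interframe gap), 20 bytes of IPv4, 8 bytes of UDP, 12 bytes of InfiniBand base transport header, 16 bytes of RDMA extended transport header, and 4 bytes of ICRC (about 98 bytes in total), the theoretical wire efficiency is

\text{efficiency} = \frac{\text{payload}}{\text{payload} + \text{overhead}} = \frac{4096}{4096 + 98} \approx 0.977,

so a 100 Gbit/s link could carry at most roughly 97.7 Gbit/s of application data; the 90-95% figures observed in practice reflect additional costs such as acknowledgments and imperfect pipelining.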

Challenges and Drawbacks

Remote Direct Memory Access (RDMA) relies on specialized hardware, particularly RDMA-capable network interface cards (RNICs) such as the NVIDIA ConnectX series, which are essential for enabling direct memory transfers without CPU involvement. These RNICs incorporate dedicated microarchitectural resources, such as on-chip caches and processing units, to handle RDMA operations, distinguishing them from standard Ethernet NICs that lack such capabilities. While RDMA-capable NICs were historically more expensive, recent models are priced comparably to standard Ethernet NICs thanks to advancements in chipsets.

Implementing RDMA adds considerable complexity to network configuration, as RoCE deployments demand a lossless Ethernet fabric to prevent packet drops that would degrade performance. This necessitates enabling mechanisms such as Priority-based Flow Control (PFC) to pause traffic on specific priorities and Explicit Congestion Notification (ECN) to signal impending congestion, ensuring end-to-end reliability without retransmissions. Misconfiguration of these features can lead to issues such as PFC deadlocks or pause storms, complicating deployment in shared infrastructures. Furthermore, RDMA's software ecosystem exhibits limited compatibility across operating systems; on Windows, native support is confined to Windows Server editions and Windows Pro for Workstations via features like SMB Direct. Many applications require custom libraries such as libibverbs or WinOF for integration, restricting widespread adoption beyond specialized deployments.

Scalability poses notable hurdles in large-scale deployments, where the finite resources of RNICs, such as queue pair (QP) and completion queue (CQ) contexts, can become exhausted under high connection counts, leading to cache misses and stalled processing. For instance, in clusters with thousands of nodes, wide access patterns across numerous QPs trigger frequent misses in the RNIC's internal context memory (ICM), exacerbating resource contention. Debugging these issues is further hampered by the scarcity of comprehensive tools; traditional network diagnostics often fall short, requiring specialized approaches such as simulated-annealing-based anomaly detection or custom telemetry to isolate microarchitectural bottlenecks.

Additional drawbacks include vendor lock-in, especially with InfiniBand implementations, where the ecosystem is dominated by a few providers such as NVIDIA, limiting flexibility and increasing dependency on specific hardware and firmware updates. Migrating existing TCP/IP-based applications to RDMA involves substantial refactoring to use verbs interfaces and handle differences in reliability semantics, often necessitating gateways or hybrid stacks that introduce overhead and risk. These factors collectively raise the barrier to entry for RDMA adoption in diverse computing environments.

Security and Future Directions

Security Risks and Mitigations

Remote Direct Memory Access (RDMA) introduces significant security risks due to its design, which enables direct memory manipulation between endpoints while bypassing traditional operating system protections such as firewalls and privilege checks. This one-sided communication model allows remote initiators to read from or write to a target's memory without involving the target's CPU, potentially exposing sensitive data if access controls fail. For instance, predictable remote keys (rkeys) in certain RDMA network interface cards (NICs), such as the Mellanox ConnectX series, can be exploited to gain unauthorized access to protected memory regions, leading to data theft or corruption.

A notable vulnerability is the exposure of rkeys, which serve as access permissions for memory regions but can be intercepted or guessed in insecure setups, enabling attackers to perform unauthorized reads or writes even across trusted connections. Additionally, RDMA's lack of native encryption in base protocols like RoCE (RDMA over Converged Ethernet) and iWARP exposes data in transit to eavesdropping, particularly in shared cloud environments where lateral movement by compromised nodes is a concern. Side-channel attacks further compound these issues; in multi-tenant setups with shared NICs, timing differences from page table entry misses or memory registration operations can leak information via covert channels. Availability threats are exemplified by denial-of-service (DoS) attacks such as LoRDMA, identified in 2024, which exploits interactions between Priority Flow Control (PFC) and Data Center Quantized Congestion Notification (DCQCN) using low-rate burst traffic to degrade legitimate RDMA flows. This attack coordinates short bursts from multiple bots to trigger PFC pauses, misleading congestion control and causing up to 56% performance loss on victim flows across multiple hops, even with minimal direct contention.

To mitigate these risks, RDMA implementations rely on built-in hardware and protocol features, including protection domains (PDs) that isolate resources and limit the scope of memory regions to specific queue pairs (QPs), preventing unauthorized access across connections. Strict QP policies enforce access controls by binding operations to specific PDs and using type-2 memory windows that pin permissions to queue pair numbers (QPNs), reducing the attack surface from key exposure. For confidentiality and integrity in transit, IPsec can be layered over RoCE and iWARP to provide encryption and authentication, dropping spoofed packets while integrating with existing standards, though it adds overhead and is not natively applicable to InfiniBand fabrics. Advanced mitigations include programmable network defenses such as Bedrock, which enable source authentication and fine-grained access control directly in the data plane to counter unauthorized RDMA operations without centralized mediation. Hardware enhancements, such as secure boot in RDMA-capable NICs like NVIDIA BlueField DPUs, verify firmware integrity during initialization to prevent tampered components from introducing vulnerabilities. IETF discussions, including early drafts on DDP and RDMA security concerns, have addressed issues such as handle predictability to bolster protocol robustness.

Future Directions

In recent years, innovations in RDMA have focused on enhancing virtualization support, offloading, and scalability to meet the demands of AI and distributed systems. Alibaba's Stellar network introduces Para-Virtualized Direct Memory Access (PVDMA), enabling on-demand memory pinning that reduces overhead in virtualized environments by dynamically allocating RDMA-accessible memory without persistent pinning, improving efficiency for AI workloads in multi-tenant clouds.
Similarly, the ROS2 object storage system offloads RDMA-based storage operations to NVIDIA BlueField-3 SmartNICs, separating control and data paths to achieve low-latency I/O for AI training while minimizing host CPU involvement and preserving compatibility with existing storage interfaces. Microsoft's SRC protocol decouples queue pairs (QPs) from network connections, introducing lightweight reliable streams that scale to thousands of connections per node, addressing QP exhaustion in large-scale RDMA deployments and boosting throughput in disaggregated memory systems.

Emerging trends highlight RDMA's expansion beyond traditional data centers. Patents filed since 2023 describe RDMA over cellular networks, allowing direct memory transfers between user equipment and edge servers for ultra-low-latency edge applications. The Ultra Ethernet Consortium, formed in 2023, is developing an RDMA-compatible transport for Ethernet-based fabrics, replacing legacy RoCE with scalable protocols that support massive GPU clusters without proprietary dependencies. In edge computing, RDMA optimizations such as status-byte-assisted congestion feedback reduce congestion in multi-access edge computing (MEC) environments, enabling real-time data processing for mobile-connected devices with minimal latency.

Market growth underscores RDMA's integration into containerized ecosystems, with Google's DraNet in 2025 providing a Kubernetes-native driver for dynamic RDMA resource allocation, simplifying high-performance networking for AI/ML workloads via declarative APIs and GPUDirect RDMA support. Projections indicate RDMA networks will reach 800 Gbit/s and higher speeds by 2026, driven by AI demand and advances in optical interconnects. Looking ahead, hybrid RDMA-TCP protocols such as SMC-R and Jakiro facilitate broader adoption by combining RDMA's low latency with TCP's compatibility in virtual private clouds, enabling seamless migration of legacy applications without full infrastructure overhauls. Quantum-safe encryption is also emerging as a priority, with efforts as of late 2025 to integrate post-quantum cryptography into high-speed RDMA networks to protect against future quantum threats in AI and edge deployments.
