Zero-copy
Zero-copy is a performance optimization technique in computer operating systems and networking that eliminates or minimizes redundant data copying between user space and kernel space during input/output (I/O) operations, allowing data to be transferred directly from sources like disk storage to destinations such as network sockets.[1] This approach reduces the number of context switches between user and kernel modes and conserves CPU cycles and memory bandwidth by avoiding intermediate buffer copies.[1] In traditional I/O workflows, data undergoes multiple copies—for instance, direct memory access (DMA) transfers data from disk to a kernel buffer, then to a user-space buffer, back to a kernel socket buffer, and finally to the network protocol engine—resulting in up to four copies and four context switches per transfer.[1] Zero-copy mechanisms address this inefficiency by mapping or exchanging buffers directly; for example, the Linux sendfile() system call enables file data in the kernel's page cache to be sent over a socket without user-space involvement, reducing operations to two context switches and potentially zero CPU-initiated copies.[2] Similarly, the splice() system call moves data between file descriptors via kernel pipes without user-space copying, supporting applications like web servers for efficient static content delivery.[2]
Key benefits include significant throughput improvements and lower resource utilization; benchmarks show zero-copy transfers completing up to 65% faster than traditional methods for large files, such as reducing the time to send a 1 GB file from 18,399 ms to 8,537 ms.[1] Early frameworks, like the zero-copy I/O extensions developed for UNIX systems such as Solaris in the mid-1990s, demonstrated over 40% gains in network throughput for file transfers and more than 20% reductions in CPU usage.[3] Modern implementations extend this to languages like Java via the FileChannel.transferTo() method, which leverages underlying OS zero-copy primitives for non-blocking I/O.[1] While primarily used in networking and file I/O, zero-copy principles also apply to specialized domains like GPU computing[4] and persistent memory systems.[5]
Fundamentals
Definition and Motivation
Zero-copy refers to a class of techniques in computer systems that enable the transfer of data between memory buffers or from a source to a destination without requiring intermediate copies mediated by the CPU, thereby minimizing unnecessary data movement and overhead.[1] These methods leverage hardware and software mechanisms to pass references to data locations rather than duplicating the content, allowing direct access by the involved components.[1]
The concept emerged prominently in the 1990s as a response to performance bottlenecks in high-throughput applications, such as web servers serving static content, where traditional input/output (I/O) operations incurred significant inefficiencies due to multiple data copies across kernel and user space boundaries.[1] In a classic file-to-network transfer scenario, data would typically undergo four copies: DMA from disk to kernel buffer, kernel buffer to user buffer, user buffer to kernel socket buffer, and DMA from kernel socket buffer to the network interface card or protocol engine.[1] This multiplicity of copies, combined with frequent context switches between user and kernel modes, strained CPU resources and memory bandwidth in increasingly data-intensive environments.[1]
By eliminating these redundant operations, zero-copy techniques deliver key benefits, including reduced CPU cycles dedicated to data movement, lower memory bandwidth consumption, and decreased overall latency in I/O paths.[1] For instance, in kernel-to-user transfers, the number of context switches can be reduced from four to two, allowing the system to handle higher data rates with less overhead.[1] Early hardware implementations of such principles date back to the 1960s, exemplified by the IBM OS/360's channel subsystem, which enabled programs to instruct direct data transfers between files or devices and main storage without ongoing CPU intervention, offloading I/O processing to autonomous channels that fetched command words and managed transfers independently.[6]
Core Principles
Zero-copy techniques operate within the context of distinct memory address spaces in modern operating systems, where user-space applications are isolated from kernel space to ensure security and stability. User-space memory is directly accessible to applications but not to kernel components without explicit mediation, while kernel space handles privileged operations like I/O. Crossing this boundary traditionally requires data copying to prevent unauthorized access, but this incurs significant overhead from context switches—transitions between user and kernel modes that involve saving and restoring processor state, potentially taking thousands of CPU cycles each.[7][1]
The operational scope of zero-copy encompasses data exchanges between user-space processes, inter-process communications, and user-kernel interactions, contrasting sharply with traditional copy-based I/O, where data is duplicated multiple times across buffers. In copy-based methods, data from an I/O device (e.g., a disk) is first loaded into a kernel buffer via direct memory access (DMA), then copied to a user buffer upon a system call, and later copied back to a kernel socket buffer for transmission—resulting in up to four copies and multiple context switches per operation. Zero-copy addresses this by enabling direct access to data without redundant duplication, which is particularly beneficial in data-intensive scenarios like network transfers or file serving, where traditional approaches can consume 20-40% of CPU cycles on copying alone.[3][1]
At its core, zero-copy relies on pointer passing or descriptor-based transfers, allowing the kernel or hardware to reference user-provided buffers directly rather than copying their contents. For instance, techniques like memory mapping (mmap) share virtual memory pages between user and kernel spaces, enabling the kernel to access user buffers via page tables without data movement. Descriptor-based methods, such as buffer exchanges, pass ownership of data structures (e.g., fast buffers) between spaces, minimizing CPU involvement in transfers. This shifts the burden to hardware or optimized kernel paths, reducing copies to as few as two (device to kernel, kernel to device) while limiting context switches to two per I/O cycle.[3][1]
To illustrate the two flows (a system-call-level sketch of both paths follows these lists):
Traditional Copy-Based Flow (e.g., file read to network send):
- DMA transfers data from disk to kernel buffer (1st copy).
- System call copies data from kernel buffer to user buffer (2nd copy, 1st context switch pair).
- Application processes data in user space.
- Another system call copies data from user buffer to kernel socket buffer (3rd copy, 2nd context switch pair).
- DMA transfers from kernel socket buffer to network interface card (4th copy overall).
Zero-Copy Flow (the same transfer using kernel-referenced buffers):
- DMA transfers data from disk directly to kernel buffer or mapped user view (no initial copy to user).
- Kernel references the buffer (via pointer or descriptor) and performs any necessary processing without user-space involvement.
- DMA transfers from the same kernel-referenced buffer to network interface (eliminates 2-3 intermediate copies, only 1-2 context switches).
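The difference between the two flows is visible at the system-call level. The following C sketch is illustrative only: error handling is abridged, and the file descriptor, socket descriptor, and length are assumed to have been set up elsewhere. The first function pushes every chunk through an intermediate user-space buffer; the second keeps the data in kernel space via sendfile().

```c
#include <sys/sendfile.h>
#include <unistd.h>

/* Traditional path: each chunk is copied kernel->user by read() and
 * user->kernel by write(), with two system calls per chunk. */
static void copy_based_send(int file_fd, int sock_fd, size_t len)
{
    char buf[64 * 1024];                     /* intermediate user-space buffer */
    while (len > 0) {
        size_t want = len < sizeof buf ? len : sizeof buf;
        ssize_t n = read(file_fd, buf, want);
        if (n <= 0)
            break;                           /* error handling abridged */
        write(sock_fd, buf, (size_t)n);      /* copies buf back into the kernel */
        len -= (size_t)n;
    }
}

/* Zero-copy path: the kernel moves page-cache data to the socket
 * directly; no user-space buffer is touched. */
static void zero_copy_send(int file_fd, int sock_fd, size_t len)
{
    off_t offset = 0;
    while (len > 0) {
        ssize_t n = sendfile(sock_fd, file_fd, &offset, len);
        if (n <= 0)
            break;                           /* error handling abridged */
        len -= (size_t)n;
    }
}
```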
Hardware Support
Direct Memory Access
Direct Memory Access (DMA) is a hardware mechanism that enables input/output (I/O) devices, such as network interface cards (NICs) or storage controllers, to transfer data directly to or from system memory without requiring continuous central processing unit (CPU) intervention.[8] A dedicated DMA controller manages these transfers by arbitrating access to the memory bus, allowing the CPU to perform other tasks concurrently.[9] This contrasts with programmed I/O (PIO), where the CPU actively polls the device status and handles each byte or block of data transfer, resulting in high CPU overhead and no zero-copy capability due to the necessary data movement under CPU control.[10]
In the context of zero-copy techniques, DMA plays a pivotal role by permitting the operating system kernel to configure the DMA controller to read from or write to user-space buffers directly, thereby eliminating intermediate CPU-mediated copies between kernel and user memory spaces.[11] For instance, a NIC's DMA engine can pull data straight from an application's buffer in user memory for transmission over the network, passing only buffer descriptors rather than copying the data payload.[12] This hardware-level bypass ensures that data remains in its original location, supporting efficient protocol stack processing where pointers to buffers are exchanged instead of duplicating contents.[11]
A significant advancement in DMA was provided by the IBM System/360 architecture introduced in 1964, where channel programs enabled I/O devices to execute sequences of control words for data transfers to main memory with minimal CPU involvement, laying the groundwork for zero-copy I/O operations.[13] These early channels operated as specialized processors optimized for high-speed, concurrent data movement, achieving rates up to 5,000,000 characters per second without tying up the CPU.[13] Over time, DMA evolved with bus standards like PCI Express (PCIe), where modern devices use DMA engines integrated into the PCIe fabric to perform peer-to-peer transfers between peripherals and host memory, further enhancing zero-copy efficiency in contemporary systems.[14]
A key advancement in DMA for zero-copy is scatter-gather functionality, which allows the controller to handle non-contiguous memory buffers by processing a list of descriptor entries specifying scattered source or destination locations.[15] This mitigates issues from memory fragmentation, where user buffers may not be physically contiguous, enabling direct DMA operations on disparate pages without requiring the kernel to consolidate them via CPU copies.[16] In practice, the kernel provides the DMA engine with a gather list for transmission or a scatter list for reception, ensuring seamless zero-copy transfers even for fragmented application data.[16]
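Scatter-gather itself is a property of the DMA hardware and its descriptor format, but the same idea is visible at the system-call boundary through vectored I/O. The sketch below is a user-space analogue rather than a device descriptor layout, and the buffer names are hypothetical: the application hands the kernel a list of non-contiguous fragments so no consolidating copy is needed before transmission.

```c
#include <sys/uio.h>

/* Illustrative sketch: vectored (scatter-gather) output. The kernel
 * receives a list of non-contiguous buffers and can pass an equivalent
 * gather list down to a DMA engine, instead of forcing the application
 * to coalesce the fragments into one contiguous copy first. */
ssize_t send_fragmented(int sock_fd, char *hdr, size_t hdr_len,
                        char *body, size_t body_len)
{
    struct iovec iov[2] = {
        { .iov_base = hdr,  .iov_len = hdr_len  },   /* fragment 1 */
        { .iov_base = body, .iov_len = body_len },   /* fragment 2 */
    };
    return writev(sock_fd, iov, 2);   /* one call, no coalescing copy */
}
```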
Memory Management Techniques
The Memory Management Unit (MMU) plays a central role in enabling zero-copy operations by facilitating virtual-to-physical address translation through page tables, which allow multiple processes or components to access the same physical memory pages without duplicating data. Page tables maintain mappings that associate virtual addresses in a process's address space with physical memory locations, ensuring that shared regions can be referenced directly rather than copied. This mechanism supports efficient memory sharing across address spaces, as the MMU hardware resolves translations on the fly, avoiding the need for explicit data movement.[17]
Key techniques leveraging the MMU for zero-copy include memory-mapped files and copy-on-write mechanisms. Memory-mapped files map file contents directly into a process's virtual address space via page table entries, enabling file I/O operations to treat disk data as in-memory objects without intermediate buffering or copying. This approach integrates seamlessly with the MMU's translation process, allowing read and write access through standard memory instructions while the kernel handles paging from storage to physical memory as needed. Complementing this, copy-on-write (COW) enables inter-process memory sharing by initially mapping multiple processes to the same physical pages; modifications trigger a copy only for the affected page, preserving the original shared state without upfront duplication. COW relies on MMU protections to detect writes and update page tables dynamically, minimizing overhead in scenarios like process forking or snapshotting.
An extension of MMU functionality critical for secure zero-copy is the Input-Output Memory Management Unit (IOMMU), which provides address translation and memory protection for DMA transfers. The IOMMU allows I/O devices to access user-space memory directly while enforcing isolation, preventing unauthorized access and enabling zero-copy I/O without compromising system security.[18]
Advanced applications extend these principles to heterogeneous computing environments, such as the Heterogeneous System Architecture (HSA), introduced in 2012 by the HSA Foundation, with founding members including AMD.[19] HSA provides a unified virtual address space across CPU and GPU, allowing both processors to access shared memory regions through coherent page mappings without explicit copies or synchronization barriers. This architecture uses MMU extensions and system-level memory coherence to enable seamless data sharing, reducing latency in compute-intensive workloads.
Despite these benefits, MMU-based zero-copy techniques introduce trade-offs, particularly around security and resource management. Page pinning, often required to prevent swapping of shared pages during direct access (e.g., integrating with DMA for I/O), can lead to denial-of-service risks by exhausting swappable memory if overused, as pinned pages remain locked in physical RAM. Additionally, granting the kernel direct access to user-space pages for mapping or sharing raises security concerns, as vulnerabilities in translation or pinning logic could expose sensitive data or enable unauthorized modifications across protection domains.[20][21]
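A minimal sketch of MMU-backed sharing on a POSIX system, assuming a small local file named data.bin (a hypothetical input): mapping it with MAP_PRIVATE lets reads go straight to the shared page-cache pages, while the first store to a page triggers a copy-on-write of only that page.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDONLY);     /* hypothetical input file */
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0)
        return 1;

    /* MAP_PRIVATE: pages are shared with the page cache until written. */
    char *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return 1;

    printf("first byte: %d\n", p[0]);        /* read: no user-space copy of the file */
    p[0] = 42;                               /* write: kernel copies just this page */

    munmap(p, (size_t)st.st_size);
    close(fd);
    return 0;
}
```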
Software Implementations
Operating System Interfaces
In Linux, the sendfile() system call enables efficient transfer of data from a file descriptor to a socket descriptor entirely within kernel space, avoiding user-space copies and leveraging direct memory access (DMA) for hardware support. Introduced in kernel version 2.2 (January 1999), it supports file-to-socket operations and was designed to optimize network servers like web proxies by reducing CPU overhead.[22] The splice() system call, added in kernel 2.6.17 (June 2006), extends zero-copy capabilities to pipe-based data movement between arbitrary file descriptors, such as pipes or sockets, by using kernel buffers without user-space intervention.[23] More recently, io_uring, introduced in kernel 5.1 (May 2019), provides an asynchronous interface for zero-copy I/O operations, including ring buffers shared between user and kernel space to minimize copies for both reads and writes.[24]
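Because splice() requires one end of the transfer to be a pipe, a common zero-copy pattern relays data from a file into a pipe and from the pipe to a socket, all inside the kernel. The following sketch assumes file_fd and sock_fd are already open; error handling and partial-transfer retries are abridged.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Illustrative sketch: relay a file to a socket via splice(). The pipe
 * acts as the in-kernel buffer, so no data crosses into user space. */
static int splice_file_to_socket(int file_fd, int sock_fd, size_t len)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    while (len > 0) {
        /* file -> pipe: the kernel moves page references where possible */
        ssize_t n = splice(file_fd, NULL, p[1], NULL, len, SPLICE_F_MOVE);
        if (n <= 0)
            break;
        /* pipe -> socket */
        ssize_t m = splice(p[0], NULL, sock_fd, NULL, (size_t)n,
                           SPLICE_F_MOVE | SPLICE_F_MORE);
        if (m <= 0)
            break;
        len -= (size_t)m;
    }
    close(p[0]);
    close(p[1]);
    return 0;
}
```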
Among other Unix-like systems, FreeBSD implements sendfile() since version 3.0 (October 1998), which transfers file data directly to a socket using kernel-side copying and supports optional headers/trailers for enhanced flexibility in network applications. NetBSD achieves similar zero-copy effects through mmap() combined with write(), mapping files into user space for direct transmission without additional buffering.[25] Solaris introduced zero-copy networking in the mid-1990s via system calls like sendfile(), sendfilev(), and write()/writev() with mmap(), utilizing virtual memory remapping and hardware checksumming to eliminate data copies in TCP transmissions.[26] macOS, inheriting from BSD, provides sendfile() for sending files over sockets without user-space involvement, ensuring compatibility with high-performance I/O patterns.[27]
On Windows, the TransmitFile() function in the Winsock API, available since Windows 2000 (February 2000), facilitates zero-copy transmission of file data over connected sockets by relying on the kernel's cache manager to handle retrieval and sending without user-mode copies.[28]
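A minimal Winsock sketch, assuming s is a connected SOCKET and hFile an open file HANDLE; passing zero byte counts asks the kernel to choose its own transfer sizes and send the file to its end.

```c
#include <winsock2.h>
#include <mswsock.h>
#pragma comment(lib, "Mswsock.lib")
#pragma comment(lib, "Ws2_32.lib")

/* Illustrative sketch: transmit an entire file over a connected socket
 * without copying the data through user space. */
BOOL send_whole_file(SOCKET s, HANDLE hFile)
{
    /* nNumberOfBytesToWrite = 0 sends to end of file;
     * nNumberOfBytesPerSend = 0 lets the kernel pick block sizes. */
    return TransmitFile(s, hFile, 0, 0, NULL, NULL, 0);
}
```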
Applications
Networking Protocols
In networking protocols, zero-copy techniques optimize data transfer by minimizing or eliminating redundant memory copies between user space and kernel space, or across network stack layers, thereby reducing latency and CPU overhead. A prominent example is the use of the sendfile system call in TCP/IP sockets, which enables direct transfer of data from a file descriptor to a socket descriptor without intermediate buffering in user space. This mechanism is particularly beneficial for web servers serving static content, as it leverages kernel-level operations to bypass traditional read-write cycles. For instance, Apache HTTP Server supports EnableSendfile, which activates the kernel's sendfile support to deliver files directly from the filesystem to the network interface, improving efficiency for high-throughput scenarios. Similarly, Nginx employs the sendfile directive to achieve zero-copy transfers, reducing context switches and memory usage during static file delivery, which contributes to its performance in handling concurrent connections.[22][29][30]
Remote Direct Memory Access (RDMA) represents a hardware-accelerated zero-copy approach in networking, allowing direct data movement between the memory of two networked hosts without involving the CPU or operating system kernel on either end. RDMA operates over protocols like InfiniBand, introduced in the early 2000s, and its Ethernet-compatible variant, RDMA over Converged Ethernet (RoCE), which bypasses the traditional TCP/IP stack to enable kernel-bypass transfers. This is widely adopted in high-performance computing (HPC) environments, where RDMA-based Message Passing Interface (MPI) implementations over InfiniBand use zero-copy protocols for large-message transfers, achieving low-latency communication critical for parallel applications. For example, RDMA Write and Read operations pin buffers in memory and transfer data directly, eliminating copies and supporting scalable cluster interconnects in supercomputing.[31][32]
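At the API level this is typically expressed through verbs such as those of libibverbs. The heavily abridged sketch below assumes a protection domain and a connected queue pair already exist and that the peer's buffer address and rkey were exchanged out of band during connection setup; completion polling is omitted.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch: post a one-sided RDMA Write. The NIC reads the
 * registered (pinned) local buffer and places it in the remote host's
 * memory with no CPU copies on either side. */
int rdma_write_example(struct ibv_qp *qp, struct ibv_pd *pd,
                       void *buf, size_t len,
                       uint64_t remote_addr, uint32_t rkey)
{
    /* Register the local buffer so the NIC can DMA directly from it. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;

    struct ibv_sge sge = {
        .addr   = (uint64_t)(uintptr_t)buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof wr);
    wr.opcode              = IBV_WR_RDMA_WRITE;   /* one-sided write */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;
    wr.wr.rdma.remote_addr = remote_addr;         /* peer buffer, exchanged earlier */
    wr.wr.rdma.rkey        = rkey;

    return ibv_post_send(qp, &wr, &bad_wr);       /* completion polling omitted */
}
```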
Other modern protocols incorporate zero-copy elements to enhance efficiency. HTTP/2's binary framing layer decomposes messages into frames for multiplexing over a single connection, facilitating implementations that avoid unnecessary data copies by processing headers and payloads in contiguous buffers. This framing supports zero-copy optimizations in servers, where data can be handed off directly to the network stack without reformatting overhead. Similarly, QUIC, the transport protocol underlying HTTP/3, offers stream-based zero-copy potential through its multiplexed streams over UDP, allowing applications to read and write data without intermediate buffering via APIs like those in LSQUIC, which skip kernel-user copies for stream processing. These features enable efficient handling of concurrent flows, such as in web applications requiring low-latency responses.[33][34]
In practice, zero-copy networking reduces latency in latency-sensitive domains through direct buffer handoff. For high-frequency trading, kernel-bypass techniques with zero-copy RDMA or user-space stacks minimize data movement delays in market data ingestion and order execution, enabling microsecond-scale responses over low-latency networks. In video streaming, encrypted network stacks enhanced with zero-copy I/O allow frames to be transferred to the network by direct DMA, avoiding CPU copies and supporting high-throughput delivery without intermediate buffering. These examples highlight how zero-copy integrates with operating system interfaces, such as splice() for pipe-based transfers, to streamline end-to-end data paths.[35][36]
File and Storage I/O
In file and storage I/O, zero-copy techniques eliminate unnecessary data copies between user space and kernel space, enabling direct access to storage contents and reducing CPU overhead for high-throughput operations. A foundational mechanism is the mmap() system call in Unix-like systems, which maps a file or device into a process's virtual address space, allowing applications to read and write data via memory operations rather than explicit read() or write() calls that involve kernel-user buffer transfers. This mapping leverages the kernel's page cache, where file data is already buffered, permitting shared access without additional copying; changes to the mapped region are transparently propagated back to the file upon synchronization.[37]
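A minimal POSIX sketch of this pattern, using a hypothetical path and assuming a writable file of at least four bytes: loads and stores on the mapping operate on the same physical pages the kernel uses to cache the file, so no read()/write() buffer copy takes place.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Illustrative sketch: edit a file in place through the page cache. */
int touch_header(const char *path)
{
    int fd = open(path, O_RDWR);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0)
        return -1;

    char *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(p, "HDR1", 4);                    /* store directly into cached pages */
    msync(p, (size_t)st.st_size, MS_SYNC);   /* push dirty pages back to storage */

    munmap(p, (size_t)st.st_size);
    close(fd);
    return 0;
}
```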
Databases exemplify mmap's utility in zero-copy file I/O. In SQLite, memory-mapped I/O is enabled through the PRAGMA mmap_size directive, which specifies the database file portion to map, providing direct pointers to pages via the xFetch() VFS method and bypassing the xRead() copy path. This results in faster query performance for I/O-bound workloads by sharing pages with the OS page cache, though it primarily benefits reads over writes due to transaction safety requirements; for instance, mapping 256 MiB can accelerate sequential scans while conserving application RAM.[38]
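In application code this is a single pragma. The sketch below, using the public SQLite C API against a hypothetical example.db, requests the 256 MiB mapping window mentioned above (268435456 bytes).

```c
#include <sqlite3.h>
#include <stdio.h>

int main(void)
{
    sqlite3 *db;
    if (sqlite3_open("example.db", &db) != SQLITE_OK)   /* hypothetical database file */
        return 1;

    /* Ask SQLite to memory-map up to 256 MiB of the database file so page
     * reads go through xFetch() pointers instead of xRead() copies. */
    char *err = NULL;
    if (sqlite3_exec(db, "PRAGMA mmap_size = 268435456;",
                     NULL, NULL, &err) != SQLITE_OK) {
        fprintf(stderr, "mmap_size: %s\n", err);
        sqlite3_free(err);
    }

    sqlite3_close(db);
    return 0;
}
```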
For modern storage like SSDs and NVMe devices, Linux's direct I/O mode—activated by the O_DIRECT flag on file descriptors—bypasses the page cache to transfer data directly between user-provided buffers and the underlying storage via bio structures, avoiding both kernel buffering and user-kernel copies. This is particularly effective for large, sequential workloads where cache pollution is undesirable, as it minimizes memory bandwidth usage and enables hardware-accelerated I/O; the iomap_dio_rw() implementation handles alignment constraints (e.g., page-sized offsets) to ensure efficient submission to block devices like NVMe controllers.[39]
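Direct I/O imposes alignment requirements on the buffer, the file offset, and the transfer length. The minimal Linux sketch below assumes 4096-byte alignment satisfies the device (the real constraint is the device's logical block size), so the buffer is allocated with posix_memalign().

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Illustrative sketch: read the first 1 MiB of a file with O_DIRECT.
 * The kernel DMAs data from the device straight into the aligned user
 * buffer, bypassing the page cache entirely. */
int read_direct(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;

    void *buf = NULL;
    size_t len = 1 << 20;                        /* 1 MiB, a multiple of 4096 */
    if (posix_memalign(&buf, 4096, len) != 0) {  /* alignment required by O_DIRECT */
        close(fd);
        return -1;
    }

    ssize_t n = read(fd, buf, len);              /* device -> user buffer, no cache copy */

    free(buf);
    close(fd);
    return (n < 0) ? -1 : 0;
}
```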
Virtualization further extends zero-copy to guest-host file sharing in hypervisors like KVM. The virtiofs shared file system, built on FUSE and integrated since Linux kernel 5.4, enables guests to access host directories with local semantics; its experimental Direct Access (DAX) feature maps file contents from the host's page cache into guest memory windows, supporting zero-copy reads and writes without data duplication across the hypervisor boundary.[40]