
Zero-copy

Zero-copy is a performance optimization technique in computer operating systems and networking that eliminates or minimizes redundant data copying between memory areas during input/output (I/O) operations, allowing data to be transferred directly from sources such as files to destinations such as network sockets. This approach reduces the number of context switches between user and kernel modes and conserves CPU cycles and memory bandwidth by avoiding intermediate buffer copies. In traditional I/O workflows, data undergoes multiple copies—for instance, direct memory access (DMA) transfers data from disk to a kernel buffer, then to a user-space buffer, back to a kernel socket buffer, and finally to the network protocol engine—resulting in up to four copies and four context switches per transfer. Zero-copy mechanisms address this inefficiency by mapping or exchanging buffers directly; for example, the sendfile() system call enables file data in the kernel's page cache to be sent over a socket without user-space involvement, reducing operations to two context switches and potentially zero CPU-initiated copies. Similarly, the splice() system call moves data between file descriptors via pipes without user-space copying, supporting applications like web servers for efficient static content delivery.

Key benefits include significant throughput improvements and lower resource utilization; benchmarks show zero-copy transfers completing up to 65% faster than traditional methods for large files, such as reducing the time to send a 1 GB file from 18,399 ms to 8,537 ms. Early frameworks, like the zero-copy I/O extensions developed for UNIX systems such as Solaris in the mid-1990s, demonstrated over 40% gains in throughput for file transfers and more than 20% reductions in CPU usage. Modern implementations extend this to languages like Java via the FileChannel.transferTo() method, which leverages underlying OS zero-copy primitives for non-blocking I/O. While primarily used in networking and file I/O, zero-copy principles also apply to specialized domains like GPU computing and virtualized systems.

Fundamentals

Definition and Motivation

Zero-copy refers to a class of techniques in computer systems that enable the transfer of data between memory buffers, or from a source to a destination, without requiring intermediate copies mediated by the CPU, thereby minimizing unnecessary data movement and overhead. These methods leverage hardware and software mechanisms to pass references to memory locations rather than duplicating the data itself, allowing direct access by the components involved.

The concept emerged prominently in the 1990s as a response to bottlenecks in high-throughput applications, such as web servers serving static content, where traditional input/output (I/O) operations incurred significant inefficiencies due to multiple data copies across kernel and user space boundaries. In a classic file-to-network transfer scenario, data would typically undergo four copies: DMA from disk to a kernel buffer, kernel buffer to user buffer, user buffer to kernel socket buffer, and DMA from the socket buffer to the network interface card or protocol engine. This multiplicity of copies, combined with frequent context switches between user and kernel modes, strained CPU resources and memory bandwidth in increasingly data-intensive environments. By eliminating these redundant operations, zero-copy techniques deliver key benefits, including reduced CPU cycles dedicated to data movement, lower memory bandwidth consumption, and decreased overall latency in I/O paths. For instance, in kernel-to-user transfers, the number of context switches can be reduced from four to two, allowing the system to handle higher request rates with less overhead.

Early hardware implementations of such principles date back to the 1960s, exemplified by the I/O channel subsystem of IBM's System/360 and OS/360, which enabled programs to instruct direct transfers between files or devices and main memory without ongoing CPU intervention, offloading I/O processing to autonomous channels that fetched channel command words and managed transfers independently.

Core Principles

Zero-copy techniques operate within the context of distinct address spaces in modern operating systems, where user-space applications are isolated from kernel-space to ensure security and stability. User-space memory is directly accessible to applications but not to kernel components without explicit mediation, while kernel-space handles privileged operations like device I/O. Crossing this boundary traditionally requires copying data to prevent unauthorized access, but this incurs significant overhead from context switches—transitions between user and kernel modes that involve saving and restoring processor states, potentially taking thousands of CPU cycles each.

The operational scope of zero-copy encompasses data exchanges between user-space processes, inter-process communications, and user-kernel interactions, contrasting sharply with traditional copy-based I/O where data is duplicated multiple times across buffers. In copy-based methods, data from an I/O device (e.g., a disk) is first loaded into a kernel buffer via direct memory access (DMA), then copied to a user buffer upon a read() call, and later copied back to a kernel socket buffer for transmission—resulting in up to four copies and multiple context switches per operation. Zero-copy addresses this by enabling direct access to data without redundant duplications, particularly beneficial in data-intensive scenarios like network transfers or file serving, where traditional approaches can consume 20–40% of CPU cycles on copying alone.

At its core, zero-copy relies on pointer passing or descriptor-based transfers, allowing the kernel or hardware to reference user-provided buffers directly rather than copying their contents. For instance, techniques like memory mapping (mmap) share pages between user and kernel spaces, enabling the kernel to access user memory via page tables without data movement. Descriptor-based methods, such as buffer exchanges, pass ownership of data structures (e.g., fast buffers) between address spaces, minimizing CPU involvement in transfers. This shifts the burden to hardware or optimized kernel paths, reducing copies to as few as two (disk to kernel buffer, kernel buffer to NIC) while limiting context switches to two per I/O cycle. To illustrate the difference, the two flows proceed as follows.

Traditional copy-based flow (e.g., file read to network send):
  1. DMA transfers data from disk to a kernel buffer (1st copy).
  2. The kernel copies data from the kernel buffer to a user buffer (2nd copy, 1st context-switch pair).
  3. The application processes the data in user space.
  4. Another system call copies data from the user buffer to a kernel socket buffer (3rd copy, 2nd context-switch pair).
  5. DMA transfers data from the socket buffer to the network interface card (4th copy overall).
Zero-copy flow (e.g., using sendfile() or splice()):
  1. DMA transfers data from disk directly to the kernel page cache or a mapped user view (no initial copy into user space).
  2. The kernel references the buffer (via pointer or descriptor) and performs any necessary processing without user-space involvement.
  3. DMA transfers data from the same kernel-referenced buffer to the network interface (eliminating two to three intermediate copies, with only one or two context switches).
These mechanisms can improve throughput by over 40% and reduce CPU utilization significantly in high-volume I/O workloads.
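To make the contrast concrete, the following minimal C sketch shows both paths on Linux. It assumes file_fd and sock_fd are already-open descriptors and abbreviates error handling, so it illustrates the flows above rather than serving as production code.

    #include <sys/types.h>
    #include <sys/sendfile.h>
    #include <unistd.h>

    /* Traditional path: data crosses the user/kernel boundary twice. */
    ssize_t copy_loop(int file_fd, int sock_fd, char *buf, size_t buflen) {
        ssize_t n, total = 0;
        while ((n = read(file_fd, buf, buflen)) > 0) {   /* kernel -> user copy */
            if (write(sock_fd, buf, n) != n)             /* user -> kernel copy */
                return -1;
            total += n;
        }
        return total;
    }

    /* Zero-copy path: the kernel moves page-cache data straight to the socket. */
    ssize_t zero_copy(int file_fd, int sock_fd, size_t count) {
        off_t offset = 0;
        return sendfile(sock_fd, file_fd, &offset, count); /* no user-space buffer */
    }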

Hardware Support

Direct Memory Access

Direct memory access (DMA) is a hardware mechanism that enables input/output (I/O) devices, such as network interface cards (NICs) or storage controllers, to transfer data directly to or from system memory without requiring continuous central processing unit (CPU) intervention. A dedicated DMA controller manages these transfers by arbitrating access to the memory bus, allowing the CPU to perform other tasks concurrently. This contrasts with programmed I/O (PIO), where the CPU actively polls the device status and handles each byte or block of data, resulting in high CPU overhead and no zero-copy capability because all data movement occurs under CPU control.

In the context of zero-copy techniques, DMA plays a pivotal role by permitting the operating system to configure the DMA controller to read from or write to user-space buffers directly, thereby eliminating intermediate CPU-mediated copies between kernel and user spaces. For instance, a NIC's DMA engine can pull data straight from an application's buffer in main memory for transmission over the network, passing only buffer descriptors rather than copying the data. This hardware-level bypass ensures that data remains in its original location, supporting efficient processing where pointers to buffers are exchanged instead of duplicating contents.

A significant advancement in DMA was provided by the IBM System/360 architecture introduced in 1964, where channel programs enabled I/O devices to execute sequences of channel command words for transfers to main memory with minimal CPU involvement, laying the groundwork for zero-copy I/O operations. These early channels operated as specialized processors optimized for high-speed, concurrent data movement, achieving rates up to 5,000,000 characters per second without tying up the CPU. Over time, DMA evolved with bus standards like Peripheral Component Interconnect Express (PCIe), where modern devices use DMA engines integrated into the PCIe fabric to perform transfers between peripherals and host memory, further enhancing zero-copy efficiency in contemporary systems.

A key advancement in DMA for zero-copy is scatter-gather functionality, which allows the controller to handle non-contiguous buffers by processing a list of descriptor entries specifying scattered source or destination locations. This mitigates issues from memory fragmentation, where user buffers may not be physically contiguous, enabling direct operations on disparate pages without requiring the kernel to consolidate them via CPU copies. In practice, the kernel provides the DMA engine with a gather list for transmission or a scatter list for reception, ensuring seamless zero-copy transfers even for fragmented application data.
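At the software interface, scatter-gather transfers surface as vectored I/O. The hedged C sketch below uses the standard writev() call to hand the kernel a list of non-contiguous buffers in one operation, mirroring how a gather list lets a DMA engine transmit fragmented data without consolidation; the descriptor and buffers are assumed to be supplied by the caller.

    #include <sys/types.h>
    #include <sys/uio.h>

    /* Gather a protocol header and a payload from separate buffers into one
     * writev() call; the kernel walks the iovec list instead of requiring the
     * application to memcpy both pieces into one contiguous staging buffer. */
    ssize_t send_header_and_body(int sock_fd,
                                 const char *hdr, size_t hdr_len,
                                 const char *body, size_t body_len) {
        struct iovec iov[2];
        iov[0].iov_base = (void *)hdr;   iov[0].iov_len = hdr_len;
        iov[1].iov_base = (void *)body;  iov[1].iov_len = body_len;
        return writev(sock_fd, iov, 2);  /* one syscall, two scattered sources */
    }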

Memory Management Techniques

The memory management unit (MMU) plays a central role in enabling zero-copy operations by facilitating virtual-to-physical address translation through page tables, which allow multiple processes or components to access the same physical pages without duplicating data. Page tables maintain mappings that associate virtual addresses in a process's address space with physical memory locations, ensuring that shared regions can be referenced directly rather than copied. This mechanism supports efficient sharing across address spaces, as the MMU hardware resolves translations on the fly, avoiding the need for explicit data movement.

Key techniques leveraging the MMU for zero-copy include memory-mapped files and copy-on-write mechanisms. Memory-mapped files map file contents directly into a process's address space via page table entries, enabling file I/O operations to treat disk data as in-memory objects without intermediate buffering or copying. This approach integrates seamlessly with the MMU's translation process, allowing read and write access through standard memory instructions while the kernel handles paging from storage to physical memory as needed. Complementing this, copy-on-write (COW) enables inter-process memory sharing by initially mapping multiple processes to the same physical pages; modifications trigger a copy only for the affected page, preserving the original shared state without upfront duplication. COW relies on MMU protections to detect writes and update page table entries dynamically, minimizing overhead in scenarios like process forking or snapshotting.

An extension of MMU functionality critical for secure zero-copy is the input-output memory management unit (IOMMU), which provides address translation and memory protection for DMA transfers. The IOMMU allows I/O devices to access user-space memory directly while enforcing isolation, preventing unauthorized access and enabling zero-copy I/O without compromising system security.

Advanced applications extend these principles to heterogeneous computing environments, such as the Heterogeneous System Architecture (HSA), introduced in 2012 by the HSA Foundation, whose founding members included AMD and ARM. HSA provides a unified virtual address space across CPU and GPU, allowing both processors to access shared memory regions through coherent page mappings without explicit copies or synchronization barriers. This architecture uses MMU extensions and system-level coherence to enable seamless data sharing, reducing latency in compute-intensive workloads.

Despite these benefits, MMU-based zero-copy techniques introduce trade-offs, particularly around memory pinning and security. Page pinning, often required to prevent eviction of shared pages during direct DMA, can lead to denial-of-service risks by exhausting swappable memory if overused, as pinned pages remain locked in physical memory. Additionally, granting devices direct access to user-space pages for DMA raises security concerns, as vulnerabilities in translation or pinning logic could expose sensitive data or enable unauthorized modifications across protection domains.
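As an illustration of memory-mapped file access, the hedged C sketch below maps a file read-only and walks it through ordinary pointer dereferences, so the kernel shares page-cache pages with the process instead of copying into a private read() buffer. The file path is a placeholder.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("/var/data/example.bin", O_RDONLY);  /* illustrative path */
        if (fd < 0) return 1;

        struct stat st;
        if (fstat(fd, &st) < 0) return 1;

        /* Map the file; pages are shared with the kernel page cache. */
        const unsigned char *p =
            mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) return 1;

        unsigned long sum = 0;
        for (off_t i = 0; i < st.st_size; i++)
            sum += p[i];                  /* direct access, no read() copies */
        printf("checksum: %lu\n", sum);

        munmap((void *)p, st.st_size);
        close(fd);
        return 0;
    }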

Software Implementations

Operating System Interfaces

In Linux, the sendfile() system call enables efficient transfer of data from a file descriptor to a socket descriptor entirely within kernel space, avoiding user-space copies and leveraging direct memory access (DMA) for the underlying data movement. Introduced in kernel version 2.2 (January 1999), it supports file-to-socket operations and was designed to optimize servers like web proxies by reducing CPU overhead. The splice() system call, added in kernel 2.6.17 (June 2006), extends zero-copy capabilities to pipe-based data movement between arbitrary file descriptors, such as pipes or sockets, by using kernel pipe buffers without user-space intervention. More recently, io_uring, introduced in kernel 5.1 (May 2019), provides an asynchronous interface for zero-copy I/O operations, including ring buffers shared between user and kernel space to minimize copies for both reads and writes.

Among other Unix-like systems, FreeBSD has implemented sendfile() since version 3.0 (October 1998), which transfers file data directly to a socket in kernel space and supports optional headers/trailers for enhanced flexibility in network applications. NetBSD achieves similar zero-copy effects through mmap() combined with write(), mapping files into user space for direct transmission without additional buffering. Solaris introduced zero-copy networking in the mid-1990s via system calls like sendfile(), sendfilev(), and write()/writev() with mmap(), utilizing virtual memory remapping and hardware checksumming to eliminate data copies in TCP transmissions. macOS, inheriting from BSD, provides sendfile() for sending files over sockets without user-space involvement, ensuring compatibility with high-performance I/O patterns.

On Windows, the TransmitFile() function in the Winsock API, available since Windows 2000 (February 2000), facilitates zero-copy transmission of file data over connected sockets by relying on the kernel's cache manager to handle file retrieval and sending without user-mode copies.
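Because splice() requires a pipe at one end of each transfer, a typical zero-copy relay moves data through an intermediate pipe. The following hedged Linux C sketch (descriptors assumed open, error paths abbreviated) keeps the payload in kernel pipe buffers end to end:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    /* Relay up to 'count' bytes from in_fd to out_fd through a pipe;
     * the payload stays in kernel buffers the whole time. */
    ssize_t relay(int in_fd, int out_fd, size_t count) {
        int pipefd[2];
        ssize_t moved = 0;
        if (pipe(pipefd) < 0)
            return -1;
        while (count > 0) {
            ssize_t n = splice(in_fd, NULL, pipefd[1], NULL,
                               count, SPLICE_F_MOVE);
            if (n <= 0) break;
            /* Drain the pipe into the destination descriptor. */
            ssize_t m = splice(pipefd[0], NULL, out_fd, NULL,
                               n, SPLICE_F_MOVE);
            if (m < 0) { moved = -1; break; }
            moved += m;
            count -= (size_t)n;
        }
        close(pipefd[0]);
        close(pipefd[1]);
        return moved;
    }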

Applications

Networking Protocols

In networking protocols, zero-copy techniques optimize data transfer by minimizing or eliminating redundant memory copies between user and kernel space, or across network stack layers, thereby reducing latency and CPU overhead. A prominent example is the use of the sendfile system call with TCP/IP sockets, which enables direct transfer of data from a file descriptor to a socket descriptor without intermediate buffering in user space. This mechanism is particularly beneficial for web servers serving static content, as it leverages kernel-level operations to bypass traditional read-write cycles. For instance, the Apache HTTP Server supports the EnableSendfile directive, which activates the kernel's sendfile support to deliver files directly from the filesystem to the network interface, improving efficiency for high-throughput scenarios. Similarly, nginx employs the sendfile directive to achieve zero-copy transfers, reducing context switches and memory usage during static file delivery, which contributes to its performance in handling concurrent connections.

Remote Direct Memory Access (RDMA) represents a hardware-accelerated zero-copy approach in networking, allowing direct data movement between the memories of two networked hosts without involving the CPU or operating system on either end. RDMA operates over protocols like InfiniBand, introduced in the early 2000s, and its Ethernet-compatible variant, RDMA over Converged Ethernet (RoCE), which bypasses the traditional TCP/IP stack to enable kernel-bypass transfers. This is widely adopted in high-performance computing (HPC) environments, where RDMA-based Message Passing Interface (MPI) implementations over InfiniBand use zero-copy protocols for large-message transfers, achieving low-latency communication critical for parallel applications. For example, RDMA Write and Read operations pin buffers in memory and transfer data directly, eliminating copies and supporting scalable cluster interconnects in supercomputing.

Other modern protocols incorporate zero-copy elements to enhance efficiency. HTTP/2's binary framing layer decomposes messages into frames for multiplexing over a single connection, facilitating implementations that avoid unnecessary data copies by processing headers and payloads in contiguous buffers. This framing supports zero-copy optimizations in servers, where data can be handed off directly to the network stack without reformatting overhead. Similarly, QUIC, the transport protocol underlying HTTP/3, offers stream-based zero-copy potential through its multiplexed streams over UDP, allowing applications to read and write data without intermediate buffering via APIs like those in LSQUIC, which skip intermediate copies for stream processing. These features enable efficient handling of concurrent flows, such as in applications requiring low-latency responses.

In practice, zero-copy networking reduces overhead in latency-sensitive domains through direct buffer handoff. For high-frequency trading, kernel-bypass techniques with zero-copy RDMA or user-space stacks minimize data movement delays in market data ingestion and order execution, enabling microsecond-scale responses over low-latency networks. In video streaming, protocols enhanced with zero-copy I/O in encrypted stacks allow direct transfers of video frames to the network, avoiding CPU copies and supporting high-throughput delivery without buffering artifacts. These examples highlight how zero-copy integrates with operating system interfaces, such as splice() for pipe-based transfers, to streamline end-to-end data paths.
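At the configuration level, these kernel primitives are exposed as simple server directives. A minimal, illustrative nginx snippet enabling the sendfile path might look as follows (the listener and document root are placeholders, not recommendations):

    # nginx: serve static content via the kernel's sendfile() path
    http {
        sendfile   on;    # zero-copy handoff of file data to the socket
        tcp_nopush on;    # batch headers with file data (TCP_CORK on Linux)

        server {
            listen 80;
            root   /srv/www;   # illustrative document root
        }
    }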

File and Storage I/O

In file and storage I/O, zero-copy techniques eliminate unnecessary data copies between user and kernel buffers, enabling direct access to storage contents and reducing CPU overhead for high-throughput operations. A foundational mechanism is the mmap() system call in Unix-like systems, which maps a file or device into a process's address space, allowing applications to read and write data via memory operations rather than explicit read() or write() calls that involve kernel-user buffer transfers. This mapping leverages the kernel's page cache, where data is already buffered, permitting shared access without additional copying; changes to the mapped region are transparently propagated back to the file upon synchronization.

Databases exemplify mmap's utility in zero-copy file I/O. In SQLite, memory-mapped I/O is enabled through the PRAGMA mmap_size directive, which specifies the database file portion to map, providing direct pointers to pages via the xFetch() VFS method and bypassing the xRead() copy path. This results in faster query performance for I/O-bound workloads by sharing pages with the OS page cache, though it primarily benefits reads over writes due to transaction safety requirements; for instance, mapping 256 MiB can accelerate sequential scans while conserving application RAM.

For modern storage like SSDs and NVMe devices, Linux's direct I/O mode—activated by the O_DIRECT flag on file descriptors—bypasses the page cache to transfer data directly between user-provided buffers and the underlying storage via bio structures, avoiding both intermediate kernel buffering and user-kernel copies. This is particularly effective for large, sequential workloads where cache pollution is undesirable, as it minimizes memory usage and enables hardware-accelerated I/O; the iomap_dio_rw() implementation handles alignment constraints (e.g., page-sized offsets) to ensure efficient submission to block devices like NVMe controllers.

Virtualization further extends zero-copy to guest-host file sharing in hypervisors like KVM. The virtiofs shared file system, built on FUSE and integrated since Linux 5.4, enables guests to access host directories with local semantics; its experimental Direct Access (DAX) feature maps file contents from the host's page cache into guest memory windows, supporting zero-copy reads and writes without data duplication across the hypervisor boundary.
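A hedged C sketch of the O_DIRECT path described above: the user buffer must meet device alignment constraints (assumed here to be 4096 bytes, though the real requirement depends on the device and filesystem), which posix_memalign() satisfies.

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define ALIGN 4096   /* assumed block/page alignment for this device */

    /* Read one aligned block from 'path', bypassing the page cache. */
    ssize_t read_direct(const char *path, void **out) {
        void *buf = NULL;
        if (posix_memalign(&buf, ALIGN, ALIGN) != 0)   /* aligned user buffer */
            return -1;
        int fd = open(path, O_RDONLY | O_DIRECT);      /* skip the page cache */
        if (fd < 0) { free(buf); return -1; }
        ssize_t n = pread(fd, buf, ALIGN, 0);          /* device DMAs into the user buffer */
        close(fd);
        *out = buf;
        return n;
    }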

Emerging Use Cases

In cloud computing and containerized environments, zero-copy techniques optimize data handling during migrations in platforms such as Docker and Kubernetes, particularly for containers with large working sets. The ZeroCopy framework leverages file system metadata to migrate buffered container data without direct memory transfers, integrating with Docker 19.03 via CRIU pre-copy mechanisms, which reduces data transmission by about six times, downtime by 31.8%, and eviction time by 21.5% relative to prior methods like Bonfire and FirepanIF. This approach minimizes bandwidth consumption and service disruption, enabling seamless scaling in dynamic setups. Technologies like VirtioFS further support zero-copy mounts by providing direct filesystem access from containers to host storage, lowering overhead in clusters for persistent data operations.

In machine learning and AI workloads, zero-copy enables efficient tensor data transfers between CPU and GPU, addressing bottlenecks in large-scale training. CUDA Unified Memory, with optimizations such as memory advise hints (introduced in CUDA 8.0 in 2016), allows automatic migration of data via virtual addressing, eliminating explicit copies and expanding effective GPU memory capacity. For instance, in PyTorch-based training, it supports larger mini-batch sizes—from 16 to 32 for BERT-Large models—while reducing training time by 9.4% through prefetching and reduced page faults. Comparable advancements in Heterogeneous System Architecture (HSA) facilitate zero-copy sharing across heterogeneous processors, enhancing tensor pipelines in multi-device systems.

At the edge and in 5G networks, zero-copy implementations deliver low-latency packet processing for gateways and base stations in data pipelines. eBPF combined with XDP in the Linux kernel enables in-kernel packet handling for User Plane Functions (UPF), achieving 10-11 million packets per second on six cores with under 70% CPU usage and less than 0.5% packet loss. This adaptive datapath, as in the upf-bpf project, decouples rules per session to cut injection times by 96% (from 27 ms to 1 ms), supporting high-volume streams without kernel modifications. Such mechanisms ensure ultra-reliable low-latency communication for enterprise and non-public networks.

Recent developments in Linux 6.x kernels, starting from 2023, bolster asynchronous zero-copy for cloud-native applications through io_uring features like zero-copy receive (ZC Rx). This registers userspace buffers with a pinned page pool, integrating with the networking stack via NAPI polling and sk_buff marking, to enable high-throughput receive paths without data duplication. In benchmarks on 25 Gbps NICs like the Broadcom BCM57504, it yields superior performance in tools such as iperf3 for hybrid control and data planes. Enhancements in kernel 6.10 further improve zerocopy send coalescing and bundle support, reducing overhead in scalable I/O scenarios.
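The Unified Memory workflow can be sketched with host-side CUDA runtime calls in C. This is a minimal illustration under stated assumptions (device 0, a placeholder 1 MiB buffer, omitted kernel launches), not a tuned training pipeline:

    #include <cuda_runtime.h>

    int main(void) {
        const size_t bytes = 1 << 20;       /* 1 MiB placeholder tensor */
        float *data = NULL;

        /* One allocation visible to both CPU and GPU via virtual addressing. */
        cudaMallocManaged((void **)&data, bytes, cudaMemAttachGlobal);

        /* Advise hint (CUDA 8.0+): treat the region as read-mostly so replicas
         * can be kept near each processor instead of migrating pages back and
         * forth. */
        cudaMemAdvise(data, bytes, cudaMemAdviseSetReadMostly, 0);

        /* Optionally stage pages on device 0 before kernels run, avoiding
         * demand page faults without an explicit cudaMemcpy. */
        cudaMemPrefetchAsync(data, bytes, 0, 0);

        /* ... launch kernels that read and write 'data' directly ... */

        cudaDeviceSynchronize();
        cudaFree(data);
        return 0;
    }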

Advantages and Challenges

Performance Benefits

Zero-copy techniques significantly enhance system efficiency by eliminating unnecessary data copies between kernel and user spaces, thereby reducing CPU overhead associated with memory operations. In traditional I/O paradigms, data transfer involves multiple copies—typically two between kernel buffers and user space—which consume substantial CPU cycles for memcpy operations; zero-copy avoids these, potentially reducing CPU utilization by up to 52% in kernel-bypass scenarios for I/O-intensive workloads like database operations. For instance, benchmarks on web-caching applications such as Memcached demonstrate throughput improvements of up to 41% under high network loads, attributed to minimized copying and reductions in cache misses exceeding 10%.

Scalability benefits are particularly evident in high-throughput environments, where zero-copy enables servers to sustain line rates like 10 Gbps without CPU saturation, as the kernel handles data movement directly via DMA, freeing processor resources for application logic. Empirical studies on file serving show speedups of 2–3× compared to conventional read/write methods; for example, transparent zero-copy frameworks achieve up to 2.3× faster performance in production applications by streamlining I/O paths. Additionally, zero-copy cuts memory usage by avoiding redundant buffers, with reductions up to 40% in embedded systems, which indirectly lowers overall power consumption by up to 30% in resource-constrained mobile and edge devices.

A key efficiency gain lies in fewer context switches: traditional I/O requires at least four switches (two for read and two for write), whereas zero-copy mechanisms like sendfile reduce this to two, halving the overhead and enabling higher concurrency in servers. These reductions collectively allow systems to handle larger workloads with the same hardware, as seen in microbenchmarks where zero-copy yields up to 3.8× speedup for multi-copy I/O patterns.

Limitations and Trade-offs

While zero-copy techniques offer significant advantages in data transfer, they introduce notable security risks by granting devices direct access to user-space buffers, thereby expanding the attack surface for potential exploits. For instance, the use of pinned pages in zero-copy I/O can lead to temporal vulnerabilities where IOMMU mappings remain exposed longer than necessary due to delayed unmapping for performance reasons, allowing malicious I/O devices to access unintended data regions. Additionally, shared memory regions between kernel and user space heighten the risk of data corruption or unauthorized access, particularly from untrusted network interface cards (NICs).

Implementation complexity is another key trade-off, as zero-copy requires meticulously aligned buffers and sophisticated ownership tracking to prevent data races or invalid accesses, placing a substantial management burden on developers. This often necessitates careful error handling for scenarios like buffer misalignment or access conflicts, which can complicate application design and coordination between sender and receiver endpoints. Furthermore, zero-copy is generally unsuitable for small or random I/O operations, where the need for precise buffer alignment (e.g., via O_DIRECT flags) may inadvertently require fallback copies, undermining the technique's benefits.

Compatibility issues limit zero-copy's portability across operating systems, as mechanisms like sendfile or splice are primarily optimized for environments such as Linux and FreeBSD, often requiring kernel modifications or hardware-specific support that is not universally available. In heterogeneous setups, applications may need to detect support at runtime and revert to traditional copying, reducing reliability. Overhead from initial setup, such as page locking or pinning, can also negate gains for short transfers; for example, I/O under 16 KB may suffer up to 30% throughput loss due to tracking and fault-handling costs.
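The portability and fallback concerns above are commonly handled by probing at runtime. A hedged C sketch (the errno values checked are typical unsupported-operation indicators on Linux) attempts sendfile() and reverts to a buffered loop:

    #include <errno.h>
    #include <sys/types.h>
    #include <sys/sendfile.h>
    #include <unistd.h>

    /* Try the zero-copy path first; fall back to buffered copying when the
     * kernel, filesystem, or descriptor type does not support sendfile(). */
    ssize_t send_file_portable(int sock_fd, int file_fd, size_t count) {
        off_t off = 0;
        ssize_t n = sendfile(sock_fd, file_fd, &off, count);
        if (n >= 0)
            return n;
        if (errno != EINVAL && errno != ENOSYS)
            return -1;                     /* genuine I/O error */

        /* Fallback: classic read()/write() loop with a user-space buffer. */
        char buf[65536];
        ssize_t total = 0;
        while (total < (ssize_t)count) {
            ssize_t r = read(file_fd, buf, sizeof buf);
            if (r <= 0) break;
            if (write(sock_fd, buf, r) != r) return -1;
            total += r;
        }
        return total;
    }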
