mmap
mmap is a POSIX system call in Unix-like operating systems that maps files, devices, or shared memory objects into a process's virtual address space, allowing the mapped content to be accessed directly via memory operations rather than traditional file I/O calls.[1] This mechanism, known as memory-mapped I/O, enables efficient data transfer by leveraging the operating system's virtual memory subsystem to handle paging and caching.[2]
A fully functional mmap system call first appeared in SunOS 4.0 (1988) and in 4.4BSD (1993); it has since become a standard feature of POSIX.1-2001 and later revisions, including POSIX.1-2008. In Linux, the kernel added an mmap2 variant of the underlying system call (standard since kernel 2.4) for better large-file support, though the user-space interface remains consistent.[2] Defined in the <sys/mman.h> header, it is part of the Single UNIX Specification and is widely supported across BSD derivatives, Linux distributions, and other Unix variants.[3]
Key parameters of mmap include the suggested starting address (typically NULL for kernel selection), the length of the mapping (which must be greater than zero and is rounded up to a whole number of pages), protection flags (e.g., PROT_READ, PROT_WRITE, PROT_EXEC), mapping flags (e.g., MAP_SHARED for shared changes or MAP_PRIVATE for copy-on-write), a file descriptor for the object to map, and a page-aligned offset into that object.[1] On success, it returns a pointer to the mapped region; failure yields MAP_FAILED and sets errno.[2] Extensions such as MAP_ANONYMOUS for non-file-backed allocations, widely supported though not required by POSIX, enhance its versatility for dynamic memory needs.[2]
One primary advantage of mmap is reduced overhead: it eliminates the need for explicit read/write system calls and associated data copies between kernel and user space after initial mapping, as the virtual memory manager handles access transparently, potentially incurring page faults only on demand.[4] This makes it particularly efficient for large files or random access patterns, outperforming sequential I/O in scenarios like database operations or image processing.[5] Additionally, MAP_SHARED enables inter-process communication by allowing multiple processes to share the same physical memory pages, conserving resources in multi-process environments such as web servers.[4] However, it requires careful management of synchronization and permissions to avoid race conditions or security issues.[1]
Fundamentals
Definition and Purpose
The mmap system call is a POSIX-standard interface that establishes a mapping between a process's virtual address space and a file, device, shared memory object, or typed memory object, allowing the process to treat the mapped content as if it were directly accessible in RAM.[1] This mechanism integrates disk or device data into the process's memory layout without requiring explicit data transfers, leveraging the operating system's virtual memory subsystem.[2]
Understanding mmap requires familiarity with virtual memory, a core operating system feature that abstracts physical memory limitations by providing each process with an illusion of contiguous address space larger than available RAM. Virtual memory operates through paging, where the address space is divided into fixed-size pages (typically 4 KB on many systems), and physical memory into corresponding frames; pages not actively needed can be swapped to disk, with the kernel handling on-demand loading via page faults.[6] This foundation enables mmap to fault in mapped pages lazily, only allocating and populating physical memory when accessed.
The primary purpose of mmap is to facilitate efficient random access to large files or memory regions without the overhead of repeated read or write system calls, which would otherwise involve explicit buffer management and data copying between kernel and user space.[5] It also supports inter-process shared memory, where multiple processes can map the same object to synchronize data exchange atomically, and enables anonymous mappings for dynamic memory allocation backed by swap space rather than files.[1] By design, mmap promotes lazy loading, where the kernel populates pages on first access, optimizing resource use for sparse or sequential access patterns.[2]
Key benefits include enhanced performance through kernel-managed paging, which minimizes user-space intervention and exploits hardware efficiencies like direct memory access; elimination of extra data copies inherent in traditional I/O, as mapped regions appear directly in the process's address space; and support for atomic updates in shared mappings, ensuring consistency across processes without additional locking in many cases.[7] These advantages make mmap particularly valuable for applications handling large datasets, such as databases or scientific computing, where file-backed mappings treat disk content as virtual RAM and anonymous mappings provide efficient heap-like allocation.[5]
Basic Mechanics
When the mmap system call is invoked, the kernel creates a new mapping by reserving a contiguous range of virtual addresses within the calling process's address space, typically starting at a page-aligned address chosen by the kernel if not specified by the user. This reservation does not allocate physical memory pages upfront; instead, the kernel records the mapping details in the process's memory descriptor without committing resources immediately, enabling efficient use of virtual address space.[2]
Physical pages are allocated lazily through demand paging: upon the first access to a virtual address in the mapped range, the processor's Memory Management Unit (MMU) detects an invalid page table entry, triggering a page fault that interrupts the process and transfers control to the kernel. The kernel then resolves the fault by allocating a physical page, populating the corresponding page table entry with the appropriate mapping—translating the virtual address to a physical location or file offset—and resuming execution, allowing the user process to proceed without direct involvement in memory allocation or translation. This mechanism integrates mmap seamlessly with the operating system's virtual memory subsystem, where the kernel maintains page tables to facilitate MMU-driven address translation and enforce access protections.[2]
The kernel's central role extends to ongoing management, as it handles all virtual-to-physical address translations for the mapped region via the MMU, ensuring that subsequent accesses bypass user-level intervention while applying hardware-accelerated lookups through the page tables. For unmapping, the munmap system call directs the kernel to remove the specified virtual address range from the process's mappings, freeing the reserved address space and, if applicable, writing back any modified (dirty) pages to their backing store before invalidating the page table entries to prevent further access.[2][8]
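The following minimal sketch (illustrative, not drawn from the cited sources) shows this lifecycle for a private anonymous mapping: the mmap() call only reserves virtual addresses, the first store to each page faults it in, and munmap() releases the range:
#include <sys/mman.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    size_t len = 16 * 1024 * 1024;              /* 16 MiB of virtual address space */
    long page = sysconf(_SC_PAGESIZE);
    /* Reserve the range; no physical pages are allocated yet. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        exit(EXIT_FAILURE);
    }
    /* The first store to each page triggers a page fault; the kernel
       allocates and zero-fills a physical page on demand. */
    for (size_t off = 0; off < len; off += (size_t)page)
        p[off] = 1;
    /* Remove the mapping and release the reserved address range. */
    if (munmap(p, len) == -1)
        perror("munmap");
    return 0;
}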
Mapping Types
File-Backed Mappings
File-backed mappings in the mmap system call associate a region of a process's virtual memory with a portion of a file or device, allowing the file's contents to be accessed as if they were in memory. These mappings are established using a valid file descriptor (fd) obtained from opening the file, which serves as the backing store for the mapped region. The mapping starts at a specified file offset, which must be a multiple of the system's page size to align with memory management requirements, and covers a length determined by the length parameter.[2][9]
Such mappings support various protection modes defined by POSIX flags, including read-only (PROT_READ), read-write (PROT_READ | PROT_WRITE, requiring the file to be opened in O_RDWR mode), and shared or private behaviors controlled by MAP_SHARED or MAP_PRIVATE. In shared mappings (MAP_SHARED), modifications to the mapped memory are immediately reflected in the underlying file and visible to other processes mapping the same file. Private mappings (MAP_PRIVATE), in contrast, employ copy-on-write semantics, where writes create a private copy of the page without altering the original file, ensuring isolation while initially sharing the file's data. The file offset parameter enables precise control over which part of the file is mapped, facilitating targeted access without loading the entire file into memory.[2][9]
Operations on file-backed mappings treat the mapped address range as ordinary memory, enabling direct reads and writes without explicit system calls for I/O. For shared mappings, any write to the mapped region propagates changes to the backing file upon page synchronization, such as via msync or process exit, while reads lazily load file data into physical memory on demand. The length parameter allows partial mappings of large files, mapping only the necessary segments to conserve virtual address space and enable efficient access to specific portions. This approach supports both sequential and random access patterns, bypassing traditional buffering mechanisms in the standard I/O library.[2][9]
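As a minimal sketch of this in-place style of file access (assuming a writable regular file named data.bin, which is a placeholder), a program can map the file with MAP_SHARED, modify it through ordinary stores, and force write-back with msync():
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    /* "data.bin" is a placeholder; any writable regular file will do. */
    int fd = open("data.bin", O_RDWR);
    if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }
    struct stat sb;
    if (fstat(fd, &sb) == -1 || sb.st_size == 0) {
        fprintf(stderr, "cannot map: fstat failed or file is empty\n");
        exit(EXIT_FAILURE);
    }
    /* MAP_SHARED: stores through the pointer update the file itself. */
    char *p = mmap(NULL, sb.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }
    p[0] = 'X';                                /* in-place modification */
    if (msync(p, sb.st_size, MS_SYNC) == -1)   /* force write-back now */
        perror("msync");
    munmap(p, sb.st_size);
    close(fd);
    return 0;
}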
File-backed mappings are particularly suited for scenarios involving large files where performance is critical, such as processing voluminous datasets without loading them entirely into RAM. For instance, in image processing applications, they allow direct manipulation of pixel data in memory-mapped files, optimizing memory usage for medical imaging software that merges and analyzes high-resolution volumes. Similarly, they facilitate efficient handling of log files, enabling append-only writes and random reads for analysis in server environments, reducing I/O overhead compared to buffered file operations. These use cases leverage the kernel's paging to handle files exceeding available physical memory, providing transparent demand-paging for scalability.[9][10]
Despite their advantages, file-backed mappings have limitations tied to the underlying storage system. They require filesystem support for memory mapping; if the filesystem does not provide this (e.g., certain network or special filesystems), the mmap call fails with ENODEV. Additionally, if the backing file is removed after mapping but before unmapping, the mapping persists and continues to function normally, with reads and writes operating on the preserved inode data. Applications must handle such scenarios appropriately to maintain robustness.[2][9]
Anonymous Mappings
Anonymous mappings in the mmap() system call provide a mechanism to map regions of a process's virtual address space to anonymous memory not backed by any file, enabling efficient allocation of memory without file system involvement.[11] These mappings are created by specifying the MAP_ANONYMOUS flag (or its synonym MAP_ANON); portable code sets the file descriptor to -1 and the offset to 0, since some implementations require these values even though others ignore the file descriptor.[11] The resulting memory region is of the specified length in bytes and can be configured as private (MAP_PRIVATE) or, on some systems, shared (MAP_SHARED) to control write behavior and visibility across processes.[2]
The contents of anonymous mappings are initialized to zero upon creation, with pages faulted in and zero-filled on first access to ensure this state without immediate full allocation.[11] Unlike file-backed mappings, which provide persistent storage tied to a file for data durability across process lifecycles, anonymous mappings lack any backing store and thus do not persist data after process termination.[2] This zero-initialization occurs lazily via the kernel's handling of page faults, optimizing performance by deferring actual memory commitment until needed.[2]
In terms of operations, anonymous mappings function similarly to dynamic allocation via malloc() but enforce page-aligned boundaries, making them suitable for allocating large, contiguous blocks of memory that exceed typical heap limits or require specific alignment.[2] They are particularly advantageous for avoiding the overhead of heap fragmentation and metadata management in the C library, while providing direct access to the virtual memory system for scalability in high-performance applications.[2]
Common use cases include implementing custom heaps or stacks within a process, creating temporary buffers for computation-intensive tasks, and establishing shared memory segments for inter-process communication.[2] Shared anonymous mappings (MAP_ANONYMOUS | MAP_SHARED) have been supported in Linux since kernel 2.4 and allow sharing only between related processes, such as a parent and its forked children; MAP_ANONYMOUS itself is an extension not specified by POSIX. For portable POSIX shared memory between unrelated processes, shm_open() creates a named shared memory object in a memory-based filesystem (e.g., tmpfs), which can then be mapped with MAP_SHARED for equivalent behavior, with a name that simplifies management, as sketched below.[11][12][2] This approach bypasses on-disk file system overhead entirely, offering faster setup and lower latency than file-backed alternatives for transient data sharing.[2]
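A minimal sketch of the shm_open() approach follows; the object name /example_region is a placeholder, and older glibc versions require linking with -lrt:
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    /* "/example_region" is a placeholder object name. */
    int fd = shm_open("/example_region", O_CREAT | O_RDWR, 0600);
    if (fd == -1) { perror("shm_open"); exit(EXIT_FAILURE); }
    size_t len = 4096;
    if (ftruncate(fd, len) == -1) {            /* size the new object */
        perror("ftruncate");
        exit(EXIT_FAILURE);
    }
    /* Any process that opens the same name and maps it with MAP_SHARED
       sees the same memory. */
    int *shared = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (shared == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }
    shared[0] = 42;                            /* visible to other mappers */
    munmap(shared, len);
    close(fd);
    shm_unlink("/example_region");             /* remove the name when done */
    return 0;
}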
Visibility and Synchronization
Memory Visibility
In memory-mapped regions established via the mmap system call, visibility of modifications depends on the mapping type specified by the MAP_SHARED or MAP_PRIVATE flags. For MAP_SHARED mappings, write operations modify the underlying memory object, making changes immediately visible to all other processes that have mapped the same object.[13] In contrast, MAP_PRIVATE mappings employ a copy-on-write mechanism, where any write triggers the kernel to create a private copy of the affected page for the modifying process; consequently, such changes remain local to that process and do not propagate to the underlying object or become visible to other processes.[13][2]
Several factors influence visibility in these mappings, particularly the role of the file system page cache in file-backed scenarios. When a file is mapped with MAP_SHARED, initial reads populate the page cache from the file's on-disk content, and subsequent writes update the cache directly; other processes accessing the same mapping observe these updates through shared cache pages without immediate disk involvement.[2] POSIX provides consistency guarantees such that changes to MAP_SHARED mappings are reflected in the underlying object, ensuring visibility across mappings, though persistence to stable storage requires additional synchronization to handle caching effects.[13]
Cross-process observation varies with the mapping's backing. In file-backed MAP_SHARED mappings, visibility occurs through the shared underlying file: modifications propagate via the page cache and become observable by other processes mapping or accessing the file.[2] Anonymous MAP_SHARED mappings are shared only with related processes (such as children created after the mapping); unrelated processes typically achieve the same effect by mapping a shared memory object obtained via shm_open(), where the kernel mediates the sharing directly and changes become visible immediately without file system intermediation.[2]
Within a single process, multi-threaded access to mapped memory requires synchronization primitives such as mutexes, memory barriers, or atomic operations to ensure that updates are visible and ordered across threads, as unsynchronized accesses may be reordered or cached by the compiler or hardware.[13]
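As an illustrative sketch (assuming C11 <stdatomic.h> and POSIX threads; compile with -pthread), atomic operations on an integer placed inside a mapped region make concurrent updates well defined and visible across threads:
#include <sys/mman.h>
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
static _Atomic int *counter;                   /* lives inside the mapped region */
static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++)
        atomic_fetch_add(counter, 1);          /* visible and ordered across threads */
    return NULL;
}
int main(void) {
    counter = mmap(NULL, sizeof *counter, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (counter == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }
    atomic_init(counter, 0);
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d\n", atomic_load(counter));   /* prints 200000 */
    munmap((void *)counter, sizeof *counter);
    return 0;
}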
Synchronization Mechanisms
The msync() system call provides a mechanism to explicitly synchronize modifications made to a memory-mapped region with its backing store, ensuring that dirty pages—those altered since mapping—are flushed to the underlying file or device. This function takes the starting address of the region, its length, and flags specifying the synchronization behavior: MS_SYNC blocks until the write-back completes; MS_ASYNC schedules the flush without blocking the calling process (on Linux kernels since 2.6.19, MS_ASYNC is effectively a no-op because the kernel tracks dirty pages automatically); and MS_INVALIDATE asks that other mappings of the same file be invalidated so they observe the freshly written data. Applications use msync() to guarantee timely persistence of changes, particularly in scenarios requiring durability, such as database operations or before process termination.[14][15]
For shared mappings (MAP_SHARED), the kernel's writeback mechanism flushes modified pages to the backing file asynchronously, and dirty pages are normally written back eventually even after munmap() or process termination; however, neither unmapping nor exit guarantees when this happens, so without an explicit msync() there is no assurance of immediate or ordered writes, and a system crash can leave partial updates on disk. For reliability in critical applications, msync() is therefore recommended before unmapping or exiting. In private mappings (MAP_PRIVATE), modifications are never propagated to the backing store, and unmapping simply discards them.[2][16][15]
To coordinate concurrent access by multiple processes to the same mapped file, applications rely on advisory locking mechanisms applied to the underlying file descriptor, such as record locks via fcntl() or whole-file locks via flock(). These POSIX-compliant locks allow processes to acquire exclusive or shared locks on file regions, preventing overlapping writes and enabling mutual exclusion without kernel-enforced mandatory locking, which is not supported for mapped files in most implementations. For example, fcntl() with F_SETLKW can lock specific byte ranges corresponding to the mapped area, signaling intent to other processes to wait or avoid access.
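The following sketch (the helper name lock_mapped_range is illustrative) shows how an fcntl() record lock could be taken on the byte range of the file that backs a mapping before modifying it through memory:
#include <fcntl.h>
#include <unistd.h>
/* Hypothetical helper: take an exclusive advisory lock on the byte range
   of the backing file before modifying the corresponding mapped region. */
int lock_mapped_range(int fd, off_t start, off_t len) {
    struct flock fl;
    fl.l_type   = F_WRLCK;     /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start  = start;       /* byte offset of the mapped region in the file */
    fl.l_len    = len;         /* 0 would mean "to end of file" */
    /* F_SETLKW blocks until the lock can be acquired. */
    return fcntl(fd, F_SETLKW, &fl);
}
/* To release, set fl.l_type = F_UNLCK and call fcntl(fd, F_SETLK, &fl). */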
The mmap() interface itself offers no built-in support for atomic operations or fine-grained locking on the mapped memory, meaning that concurrent reads and writes from multiple processes or threads can lead to data races or torn updates without additional safeguards. Developers must implement synchronization externally, often combining file locks with higher-level primitives like semaphores or mutexes for shared memory regions, to achieve thread-safety and consistency in multi-process environments. This layered approach is common in high-performance applications but requires careful design to avoid performance overhead from contention.[17]
System Call Interface
Parameters and Flags
The mmap system call in POSIX-compliant systems establishes a mapping between a process's virtual address space and a file, shared memory object, or anonymous memory region, with its parameters defining the mapping's location, size, access permissions, and behavior. The core parameters include addr, which specifies the desired starting address for the mapping (typically set to NULL to allow the kernel to choose an optimal location, or a specific page-aligned address as a hint); len, the number of bytes to map (must be greater than zero and typically rounded up to the nearest page boundary); prot, a bitwise OR combination of protection flags controlling access rights; flags, additional options dictating the mapping type and properties; fildes, the file descriptor of the underlying object (ignored for anonymous mappings); and off, the offset within the object from which to start the mapping (must be a multiple of the system page size). These parameters allow fine-grained control over how memory is allocated and accessed, ensuring compatibility with the process's address space constraints.[18][2]
Protection flags, specified via the prot parameter, define the allowable operations on the mapped pages and are enforced by the operating system at runtime. Common flags include PROT_READ for read access, PROT_WRITE for write access, PROT_EXEC for execute access, and PROT_NONE to prohibit all access; these can be combined bitwise (e.g., PROT_READ | PROT_WRITE for read-write permissions), but the combination must align with the underlying file's open mode and system policies. Violations of these protections, such as attempting to write to a read-only mapping, trigger a segmentation fault (SIGSEGV signal) or a page fault handled by the kernel, potentially leading to process termination if unhandled. This enforcement mechanism ensures memory safety and prevents unauthorized access.[18][2]
The flags parameter configures the mapping's sharing, fixity, and backing type, influencing inter-process interactions and resource usage. MAP_SHARED enables changes to the mapping to propagate to the underlying object and be visible to other processes mapping the same object, facilitating inter-process communication. In contrast, MAP_PRIVATE performs copy-on-write operations, where modifications remain local to the process without affecting the original object, providing isolated views of the data. MAP_FIXED requires the mapping to occur exactly at the address specified in addr, overriding the kernel's placement decision (useful for precise address control but risky if the region is already in use). For anonymous mappings, which lack a file backing and are initialized to zero, the MAP_ANONYMOUS flag is used (with fildes set to -1), supporting heap-like allocations without disk involvement; this flag is a common extension to the POSIX standard, widely supported in implementations such as Linux and BSD derivatives. Other flags like MAP_FIXED_NOREPLACE (Linux-specific, since kernel 4.17) prevent overwriting existing mappings, adding safety for address-sensitive applications. Flags must include exactly one of MAP_SHARED or MAP_PRIVATE, and invalid combinations result in failure.[18][2][16]
Error conditions arise from invalid or incompatible parameter values, with the system setting errno to indicate the failure reason upon returning MAP_FAILED. Common errors include EINVAL for invalid arguments, such as a zero-length mapping, non-page-aligned offset, incompatible prot and flags (e.g., lacking MAP_SHARED or MAP_PRIVATE), or an unaligned addr; ENOMEM when insufficient virtual address space or physical memory is available, often due to resource limits like RLIMIT_AS or RLIMIT_DATA; EACCES if the file descriptor lacks required permissions (e.g., write access denied for PROT_WRITE); EBADF for an invalid file descriptor (unless MAP_ANONYMOUS is specified); and ENOTSUP for unsupported protection combinations or features on the underlying object. In Linux, additional errors like EOVERFLOW occur for offset or length overflows on 32-bit systems with large files, while EPERM may arise for privileged operations such as executable mappings on no-execute filesystems. These conditions ensure robust error handling in applications using memory mappings.[18][2]
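A minimal sketch of this error-handling pattern follows; the wrapper name map_file_region and the chosen protection and flags are illustrative, not part of the standard:
#include <sys/types.h>
#include <sys/mman.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
/* Hypothetical wrapper: map a file region read-write and report why it failed. */
void *map_file_region(int fd, size_t len, off_t off) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, off);
    if (p == MAP_FAILED) {
        /* errno distinguishes the cause: EINVAL for a zero length or an offset
           that is not page-aligned, EACCES when the descriptor was not opened
           for the requested access, ENOMEM when address-space or memory limits
           are exhausted, ENODEV when the filesystem does not support mapping,
           EBADF for an invalid descriptor. */
        fprintf(stderr, "mmap failed: %s\n", strerror(errno));
        return NULL;
    }
    return p;
}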
Return Values and Errors
Upon successful completion, the mmap() function returns a pointer to the start of the mapped memory region, represented as a void * type. This address serves as the base for accessing the mapped area; portable applications should avoid assumptions about its numeric value or its placement within the virtual address space.[1][2]
In the event of failure, mmap() returns the constant MAP_FAILED, defined as (void *) -1 in <sys/mman.h>, and sets the global variable errno to indicate the specific error condition. Common errors include EACCES, which occurs when the file descriptor lacks the necessary read or write permissions (e.g., attempting PROT_WRITE on a read-only file); ENODEV, signaling that the underlying device or filesystem does not support memory mapping; EINVAL, triggered by invalid parameters such as a zero-length mapping or misaligned offset; ENOMEM, due to insufficient virtual address space or physical memory; and EBADF for an invalid file descriptor. Other errors like EAGAIN (resource limits exceeded for locking) or EOVERFLOW (offset plus length exceeds the file's maximum offset) may also arise depending on the implementation. Applications must explicitly check for MAP_FAILED after the call and examine errno to handle failures appropriately.[1][2]
Post-call validation is essential to ensure the mapping's integrity. Developers should verify that the returned address is not MAP_FAILED and, if using MAP_FIXED, confirm it matches the requested addr (which must be page-aligned). For file-backed mappings, if the specified length exceeds the file size, the full length is mapped. However, the portion of the last page beyond the file end is zero-filled, and accessing pages entirely beyond the file end results in a SIGBUS signal. Applications should check the file size (e.g., via fstat()) before accessing to avoid SIGBUS on regions beyond EOF.[1][16][2]
Portability considerations for the return value are particularly relevant across architectures with differing address space sizes. On 64-bit systems, the returned pointer can reside in a vast virtual address space (up to 2^64 bytes theoretically, though limited in practice), while 32-bit processes—even on 64-bit kernels—operate within a constrained 4 GB space, potentially leading to ENOMEM if no suitable address range is available; using addr = NULL (without MAP_FIXED) enhances compatibility by allowing the kernel to select a suitable address. Implementations may vary in handling large mappings, with some 32-bit environments requiring explicit 64-bit variants like mmap64() to support offsets beyond 32 bits.[1][2]
Usage Examples
C Programming Language
In the C programming language, the mmap() function, defined in <sys/mman.h>, enables processes to map files or devices into their virtual address space, treating them as arrays for direct access. This interface, part of the POSIX standard, takes parameters including the desired starting address (typically NULL for kernel selection), mapping length, protection flags (e.g., PROT_READ), sharing flags (e.g., MAP_PRIVATE), a file descriptor, and an offset. On success, it returns a pointer to the mapped region; on failure, it returns MAP_FAILED and sets errno.[16]
A basic example demonstrates mapping a file in read-only mode. The program opens a file, determines its size using fstat(), maps it with PROT_READ and MAP_PRIVATE flags for private copy-on-write behavior, accesses the data via pointer dereference, and unmaps it with munmap(). The following code snippet, adapted from examples in The Linux Programming Interface, reads and prints the contents of a file to stdout:
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
if (argc != 2) {
fprintf(stderr, "Usage: %s <file>\n", argv[0]);
exit(EXIT_FAILURE);
}
int fd = open(argv[1], O_RDONLY);
if (fd == -1) {
perror("open");
exit(EXIT_FAILURE);
}
struct stat sb;
if (fstat(fd, &sb) == -1) {
perror("fstat");
close(fd);
exit(EXIT_FAILURE);
}
if (sb.st_size == 0) {
printf("File is empty\n");
close(fd);
exit(EXIT_SUCCESS);
}
void *addr = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
if (addr == MAP_FAILED) {
perror("mmap");
close(fd);
exit(EXIT_FAILURE);
}
if (write(STDOUT_FILENO, addr, sb.st_size) != sb.st_size) {
perror("write");
munmap(addr, sb.st_size);
close(fd);
exit(EXIT_FAILURE);
}
if (munmap(addr, sb.st_size) == -1) {
perror("munmap");
}
close(fd);
exit(EXIT_SUCCESS);
}
This approach replaces traditional read() calls by providing direct memory access, with the kernel handling paging on demand.[19]
For an advanced example illustrating shared visibility, consider mapping a region with MAP_SHARED before forking a child process; changes by one process are visible to the other due to the shared backing store. The following snippet, adapted from The Linux Programming Interface, uses an anonymous mapping (no file descriptor) to share an integer between parent and child, demonstrating how the child's increment is observed by the parent after wait():
#include <sys/wait.h>
#include <sys/mman.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
int *addr;
pid_t pid;
addr = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_ANONYMOUS, -1, 0);
if (addr == MAP_FAILED) {
perror("mmap");
exit(EXIT_FAILURE);
}
*addr = 1; /* Initialize shared value */
pid = fork();
if (pid == -1) {
perror("fork");
exit(EXIT_FAILURE);
}
if (pid == 0) { /* Child */
sleep(1); /* Ensure parent prints first */
printf("Child read: *addr = %d\n", *addr);
(*addr)++; /* Modify shared value */
printf("Child wrote: *addr = %d\n", *addr);
exit(EXIT_SUCCESS);
}
/* Parent */
printf("Parent read: *addr = %d\n", *addr);
if (wait(NULL) == -1) {
perror("wait");
exit(EXIT_FAILURE);
}
printf("Parent after child: *addr = %d\n", *addr); /* Sees 2 */
if (munmap(addr, sizeof(int)) == -1) {
perror("munmap");
}
exit(EXIT_SUCCESS);
}
For file-backed shared mappings, replace MAP_ANONYMOUS and -1 with a valid file descriptor and offset of 0, ensuring the file is opened with appropriate permissions. This setup highlights inter-process communication without explicit synchronization primitives like semaphores.[20]
Best practices for using mmap() in C emphasize robust error handling and portability. Always check the return value against MAP_FAILED and use perror() or strerror(errno) to diagnose issues such as ENOMEM (insufficient memory) or EINVAL (invalid parameters). For files exceeding 2 GB on 32-bit systems, employ 64-bit off_t offsets (enabled via the _FILE_OFFSET_BITS=64 compilation flag), and ensure that the offset passed to mmap() is a multiple of the page size (typically 4 KB, obtainable via sysconf(_SC_PAGESIZE)) to avoid EINVAL. Avoid the MAP_FIXED flag unless absolutely required, as it forces a specific address and can lead to conflicts with the dynamic loader or other mappings, reducing portability across systems. Additionally, call munmap() to release mappings promptly, especially in long-running processes, to free virtual address space and allow the kernel to reclaim resources.[16][2]
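A hypothetical helper illustrating the offset-alignment practice might round the requested file offset down to a page boundary and compensate in the returned pointer:
#include <sys/mman.h>
#include <unistd.h>
/* Hypothetical helper: map `length` bytes starting at an arbitrary file offset
   by rounding the offset down to a page boundary, as mmap requires, and
   returning a pointer adjusted to the requested position. */
static void *map_at_offset(int fd, off_t offset, size_t length,
                           void **base, size_t *maplen) {
    long page = sysconf(_SC_PAGESIZE);
    off_t aligned = offset - (offset % page);   /* round down to page boundary */
    size_t delta  = (size_t)(offset - aligned);
    *maplen = length + delta;
    *base = mmap(NULL, *maplen, PROT_READ, MAP_PRIVATE, fd, aligned);
    if (*base == MAP_FAILED)
        return NULL;
    return (char *)*base + delta;               /* pointer to the requested offset */
}
The caller later releases the region with munmap(*base, *maplen).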
Compared to standard I/O functions like fread() and fwrite(), which operate on buffered streams and are optimized for sequential access, mmap() offers advantages for non-sequential patterns by enabling direct pointer-based reads and writes without repeated lseek() or buffer management overhead. For instance, random access to file regions becomes as simple as array indexing, potentially reducing system call latency for scattered I/O in applications like databases, though fread() may suffice and be simpler for purely linear traversal of small files. This makes mmap() particularly suitable for sparse or irregular access, where the kernel's demand-paging defers physical loads until needed.[16][21]
Database Implementations
In database systems, memory-mapped file operations via mmap enable storage engines to map database files directly into a process's virtual address space, facilitating efficient access without explicit read or write system calls. This approach is particularly prominent in embedded and lightweight databases, such as SQLite, where the mmap Virtual File System (VFS) layer, introduced in version 3.7.17, uses the xFetch and xUnfetch methods to map pages of the database file into memory, allowing direct pointer access for queries.[22] By providing zero-copy access to file contents, mmap in SQLite supports seamless integration with the operating system's page cache, enabling queries to operate on mapped data without kernel-user space data transfers.[22]
The primary benefits of mmap in database implementations include reduced I/O overhead, especially for operations like index scans that involve sequential or random access patterns, as the OS handles paging transparently. This mechanism also allows databases larger than available RAM to function effectively, as only actively accessed pages are loaded into memory, leveraging the OS's demand-paging for scalability. In SQLite, for instance, configuring the mmap size via PRAGMA mmap_size (up to a default maximum of 64 GiB) optimizes performance for large datasets by minimizing explicit caching logic in the database engine itself.[22]
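As a brief sketch (assuming a database file named example.db and linking against libsqlite3), the limit can be set per connection from C:
#include <sqlite3.h>
#include <stdio.h>
int main(void) {
    sqlite3 *db;
    if (sqlite3_open("example.db", &db) != SQLITE_OK) {   /* placeholder filename */
        fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
        return 1;
    }
    /* Ask SQLite to use up to 256 MiB of memory-mapped I/O on this connection;
       0 disables it, and the effective limit is capped by the compile-time
       SQLITE_MAX_MMAP_SIZE setting. */
    char *err = NULL;
    if (sqlite3_exec(db, "PRAGMA mmap_size=268435456;", NULL, NULL, &err) != SQLITE_OK) {
        fprintf(stderr, "pragma failed: %s\n", err);
        sqlite3_free(err);
    }
    sqlite3_close(db);
    return 0;
}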
Prominent examples of mmap-based database engines include the Lightning Memory-Mapped Database (LMDB), a B+-tree key-value store that maps its entire database file into memory for direct access, eliminating the need for a separate page cache or buffer manager. LMDB achieves ACID-compliant transactions through a copy-on-write strategy on data pages, ensuring read consistency across multiple processes and threads without locks for readers, while writes are serialized to maintain durability via mmap's persistence guarantees. Similarly, MongoDB's WiredTiger storage engine, the default since version 3.2, employs memory-mapped files for I/O operations in its block manager, batching file system interactions to enhance throughput and support compression alongside caching for high-concurrency workloads.[23][24]
Despite these advantages, mmap introduces challenges in database environments, particularly for concurrent transactions and crash recovery. Ensuring transactional safety requires careful synchronization, as the OS may flush modified (dirty) pages to disk unpredictably, potentially persisting uncommitted changes; this necessitates explicit calls to msync to force durability, but incomplete msync operations during crashes can leave partial updates, complicating recovery. Databases like LMDB mitigate this with single-writer semantics and shadow paging, while others integrate write-ahead logging (WAL) to track committed operations separately from the mapped files, adding overhead for replay during recovery but ensuring atomicity. In multi-threaded or multi-process scenarios, additional protocols—such as copy-on-write or reader-writer locks—are needed to prevent corruption from concurrent modifications to shared mappings.[17]
History and Implementations
Development History
The mmap system call originated as part of enhancements to virtual memory management in the Berkeley Software Distribution (BSD) of Unix, first documented in the 4.2BSD System Manual released in August 1983. This specification aimed to provide support for memory-mapped files and inter-process shared memory, addressing limitations in earlier Unix versions that lacked efficient mechanisms for mapping file contents directly into a process's address space. The design drew inspiration from pioneering work on demand-paged virtual memory in the 1970s, including Unix's adoption of paging in Version 6 (1975), and earlier systems like Multics, which introduced memory-mapped I/O concepts in the late 1960s to enable hierarchical addressing and efficient file access.[25]
Development of the mmap interface was led by the University of California, Berkeley's Computer Systems Research Group (CSRG), with key contributions from team members including Bill Joy, whose functional implementation shipped in SunOS 4.0 (1988), marking one of the earliest practical deployments.[26] Further refinements occurred during the 4.3BSD release in 1986, where the interface was elaborated in architectural documents to support sparse address spaces and shared libraries, though full kernel implementation in BSD awaited 4.4BSD in 1993.[27][28] mmap was implemented in the Linux kernel starting with version 0.98.2 in 1992, enabling its use in open-source Unix-like environments.[29]
Early adoption extended beyond BSD, with mmap integrated into AT&T's System V Release 4 (SVR4) in 1988, influencing a wide range of commercial Unix derivatives and promoting its use for high-performance I/O in applications like databases.[2] Standardization efforts culminated in its inclusion in POSIX.1-2001, ensuring portability across compliant systems, while extensions such as mmap64 for handling files larger than 2 GB were added in subsequent large file support (LFS) specifications to accommodate growing storage needs.[2]
The mmap system call is defined in the POSIX.1-2001 standard, providing core functionality for mapping files or devices into a process's virtual address space across compliant systems such as Linux, macOS, and BSD variants. These platforms support essential parameters including address hint, length, protection modes (PROT_READ, PROT_WRITE, PROT_EXEC), and flags like MAP_SHARED for visible modifications across processes or MAP_PRIVATE for copy-on-write behavior. Anonymous mappings (without a backing file) are widely available via MAP_ANONYMOUS, though not strictly required by POSIX, enabling allocation of private or shared memory regions. Compliance ensures portability for basic file-backed and anonymous mappings, with updates to shared mappings propagated to the underlying file or other processes as specified.
Variations arise in extended flag support. In Linux, additional flags enhance functionality, such as MAP_NORESERVE, which creates mappings without reserving swap space, useful for large sparse files to avoid overcommitment penalties. Another Linux-specific extension is MAP_HUGETLB (introduced in kernel 2.6.32), which allocates memory using huge pages (typically 2MB or 1GB) to reduce translation lookaside buffer (TLB) overhead in high-performance applications; this requires pre-allocated huge pages via kernel configuration. macOS and FreeBSD adhere more closely to POSIX with fewer extensions, supporting flags like MAP_ALIGNED for superpage alignments in FreeBSD (since version 9) but lacking Linux's MAP_NORESERVE or direct huge page mapping without filesystem mounts. For instance, FreeBSD introduces MAP_NOSYNC to disable asynchronous writes, optimizing for embedded or low-latency scenarios.[2][30][31]
On Windows, there is no direct mmap equivalent; instead, the Win32 API uses CreateFileMapping to create a file mapping object (backed by a file or the paging file), followed by MapViewOfFile to map a view into the process address space. This two-step process differs from Unix mmap's single call, requiring explicit handle management and lacking native support for anonymous mappings without involving the paging file via INVALID_HANDLE_VALUE in CreateFileMapping. Protection and offset specifications are similar, but Windows emphasizes section objects for inter-process sharing, with no direct equivalent to MAP_PRIVATE's copy-on-write for files.[32][33]
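A minimal sketch of the two-step Win32 pattern, mapping a read-only view of a placeholder file named data.bin, might look as follows:
#include <windows.h>
#include <stdio.h>
int main(void) {
    /* "data.bin" is a placeholder file name. */
    HANDLE file = CreateFileA("data.bin", GENERIC_READ, FILE_SHARE_READ,
                              NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE) { fprintf(stderr, "CreateFile failed\n"); return 1; }
    /* Step 1: create a file mapping object (the backing section object). */
    HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READONLY, 0, 0, NULL);
    if (mapping == NULL) { CloseHandle(file); return 1; }
    /* Step 2: map a view into the address space (analogous to mmap). */
    const char *view = MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
    if (view != NULL) {
        /* ... read the file contents through the pointer ... */
        UnmapViewOfFile(view);             /* analogous to munmap */
    }
    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}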
Android provides partial mmap support through its Bionic libc, implementing the POSIX interface for file and anonymous mappings to enable memory-efficient I/O in resource-constrained mobile environments. However, Bionic's implementation omits some advanced features like full huge page support and may exhibit variations in behavior due to Android's customized kernel, such as stricter overcommit limits to prevent OOM kills. In embedded systems, mmap availability depends on the kernel; Linux-based embedded platforms support it, but systems without a memory management unit (MMU) or global virtual file system (VFS) often lack shared mappings (MAP_SHARED), restricting use to private, process-local views to avoid synchronization overhead.[34][35]