Unix domain socket
A Unix domain socket, also known as a local socket or IPC socket, is a data communications endpoint for exchanging data between processes executing on the same host machine, utilizing the AF_UNIX (or AF_LOCAL) address family defined in the POSIX standard.[1] Introduced in the 4.2BSD release of the Unix operating system in 1983, these sockets provide an efficient mechanism for local inter-process communication (IPC) that leverages the filesystem namespace for addressing via the sockaddr_un structure.[2][3]
Unix domain sockets support three primary types: SOCK_STREAM for reliable, connection-oriented communication; SOCK_DGRAM for connectionless datagram exchange (reliable in practice, since delivery occurs entirely within the kernel); and SOCK_SEQPACKET for sequenced, reliable packet transmission (the latter available on Linux since kernel 2.6.4).[4] They can operate in unnamed mode for simple peer-to-peer connections, be bound to a filesystem pathname (where the socket appears as a special file type), or use the Linux-specific abstract namespace, which does not create visible filesystem entries.[4] The address structure includes a family field set to AF_UNIX and a pathname component of variable length, typically up to 108 bytes on Linux systems, though POSIX does not mandate a fixed size.[3][4]
Key features distinguish Unix domain sockets from network sockets, including the ability to pass file descriptors, process credentials, and security contexts via ancillary data mechanisms like SCM_CREDENTIALS (introduced in Linux 2.2).[4] Socket options such as SO_PASSCRED, SO_PEERCRED, and SO_PASSSEC enable credential-based authentication, while pathname-bound sockets are additionally subject to access control through filesystem permissions.[4] While POSIX standardizes the core interface, implementations like Linux extend functionality with non-portable features, making Unix domain sockets a foundational IPC tool in Unix-like systems for applications requiring low-overhead, secure local data exchange.[1][4]
Introduction
Definition and Purpose
Unix domain sockets, also known as AF_UNIX or AF_LOCAL sockets, are a form of inter-process communication (IPC) in Unix-like operating systems that enables processes executing on the same host to exchange data by leveraging the file system namespace for addressing, thereby avoiding involvement of the network protocol stack.[4][5] This mechanism treats sockets as special files within the filesystem, allowing communication endpoints to be identified via pathnames or abstract identifiers, which facilitates direct, kernel-mediated data transfer between local processes.[4] The primary purpose of Unix domain sockets is to support efficient, low-overhead communication for applications requiring local IPC, such as client-server architectures confined to a single machine, where the absence of network layering reduces latency and resource usage compared to TCP/IP sockets over loopback interfaces.[4] By operating entirely within the kernel's local domain, they minimize context switches and data copying, making them suitable for high-performance scenarios in shared-host environments.[6] Key characteristics of Unix domain sockets include bidirectional data flow, support for both stream-oriented semantics (reliable, ordered byte streams) and datagram-based semantics (unordered, message-preserving packets), and seamless integration with the POSIX socket application programming interface for portability across compliant systems.[4][5] These features enable reliable local messaging without the complexities of network addressing or routing. Common use cases encompass database servers like PostgreSQL or MySQL interacting with local clients via socket paths such as /tmp/.s.PGSQL.5432, as well as inter-daemon communication in Unix-like systems for tasks like service coordination and event notification.[7]
Historical Development
Unix domain sockets originated in the Berkeley Software Distribution (BSD) Unix variants, specifically introduced in 4.2BSD released in August 1983 by the Computer Systems Research Group (CSRG) at the University of California, Berkeley.[8] Developed as an extension to the Berkeley sockets API, they were designed to facilitate efficient interprocess communication (IPC) on the same host, addressing the limitations of traditional Unix pipes, which were unidirectional and required communicating processes to share a common ancestor.[8] In 4.2BSD, pipes themselves were reimplemented using pairs of connected Unix domain stream sockets, enhancing the overall IPC framework with bidirectional, reliable communication channels.[8] The CSRG team, funded in part by DARPA for networking research, drove this innovation to unify local and network IPC under a single socket paradigm, with key contributions from developers like Bill Joy and the broader Berkeley networking group.[8] Adoption extended beyond BSD when AT&T incorporated the socket interface into Unix System V Release 4 (SVR4), released in 1989, allowing portability of BSD socket-based applications to System V derivatives and promoting wider use in commercial Unix systems.[9] Standardization followed when the sockets interface, including the AF_UNIX address family, was incorporated into the POSIX.1-2001 specification (IEEE Std 1003.1-2001), ensuring cross-platform compatibility in POSIX-compliant systems.[10] Subsequent enhancements included Linux-specific features like abstract sockets, introduced in kernel version 2.2 in January 1999, which allow sockets without filesystem entries for improved performance and cleanup.[4] In 4.2BSD, Unix domain sockets initially supported SOCK_STREAM for stream-oriented communication and SOCK_DGRAM for connectionless, message-preserving transfers. SOCK_SEQPACKET, providing reliable, ordered delivery with boundary preservation, was added in subsequent BSD releases and POSIX standards.[8] These developments solidified Unix domain sockets as a core IPC mechanism in Unix-like operating systems.[4]
Core Concepts
Socket Types and Semantics
Unix domain sockets support three main socket types, each defining distinct communication semantics within the local inter-process communication framework. The SOCK_STREAM type establishes a reliable, bidirectional, connection-oriented byte stream, ensuring ordered delivery without message boundaries, akin to a virtual pipe between processes on the same host.[11] This type requires explicit connection setup via system calls like connect(), providing flow control and error handling to prevent data loss or duplication.[4] In contrast, the SOCK_DGRAM type enables connectionless, datagram-oriented communication, where each message is treated as an atomic unit with preserved boundaries but without guaranteed ordering or reliability in the POSIX specification; however, many implementations, including Linux, ensure reliable delivery since operations occur entirely within the kernel, eliminating network-related losses.[11][4] Datagrams in this type undergo no fragmentation, as the entire message is delivered atomically if the receiving buffer can accommodate it, and no connection establishment is needed for send or receive operations.[4] The SOCK_SEQPACKET type combines aspects of the previous two, offering reliable, connection-oriented, sequenced packet streams that maintain message boundaries and deliver data in the exact order sent, using flags like MSG_EOR to delineate records.[11][4]

A key feature across these types is support for ancillary data, which allows transmission of metadata beyond the primary payload; notably, SCM_RIGHTS enables the passing of open file descriptors between processes over the socket, facilitating resource sharing without serialization.[12][4] Unlike IP-based sockets, Unix domain sockets operate exclusively on the local machine, bypassing the network stack entirely, which obviates the need for routing, address resolution, or checksum computations, resulting in lower overhead and higher performance for intra-host communication.[4] This local scope ensures all semantics are enforced by the kernel without external protocol layers.[11]
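The boundary-preserving behavior of SOCK_DGRAM can be observed with a connected pair created by socketpair(2). The following is a minimal sketch, assuming only standard POSIX interfaces, with most error checks trimmed for brevity:

```c
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
    int sv[2];
    char buf[64];
    ssize_t n;

    /* A connected pair of Unix domain datagram sockets. */
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1) {
        perror("socketpair");
        return 1;
    }

    /* Two sends produce two distinct datagrams (error checks omitted). */
    send(sv[0], "first", 5, 0);
    send(sv[0], "second", 6, 0);

    /* Each recv() returns exactly one message, boundaries intact. */
    n = recv(sv[1], buf, sizeof(buf), 0);
    printf("datagram 1: %.*s\n", (int)n, buf);   /* prints "first" */
    n = recv(sv[1], buf, sizeof(buf), 0);
    printf("datagram 2: %.*s\n", (int)n, buf);   /* prints "second" */

    /* With SOCK_STREAM the same two sends could arrive as a single
       11-byte read: a byte stream carries no message boundaries. */
    close(sv[0]);
    close(sv[1]);
    return 0;
}
```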
Addressing Mechanism
Unix domain sockets operate within the AF_UNIX address family, also known as AF_LOCAL in POSIX standards, which enables local interprocess communication without relying on network protocols.[13][4] Addresses in this family are identified using the filesystem namespace or, on Linux, an abstract namespace, distinguishing them from network sockets that use IP addresses and ports.[4] Unlike Internet domain sockets, Unix domain addressing does not involve port numbers; instead, endpoints are referenced by pathnames or abstract identifiers that must remain unique within their respective namespace to ensure proper connection resolution.[4]

The core address structure is defined as struct sockaddr_un, which includes a family field and a path component.[3] Specifically, it consists of sa_family_t sun_family, set to AF_UNIX or AF_LOCAL, followed by char sun_path[], an array for storing the address identifier.[3][4] In POSIX implementations, the size of sun_path varies but is typically sufficient for paths up to 92–108 characters; Linux defines a maximum of 108 bytes via UNIX_PATH_MAX.[3][4] For pathname-based addressing, sun_path holds a null-terminated string representing a filesystem path, and the full address length is calculated as offsetof(struct sockaddr_un, sun_path) + strlen(sun_path) + 1.[4]
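A sketch of this address construction follows; the helper name make_unix_addr is illustrative, and the caller would supply a path such as the hypothetical /tmp/demo.sock:

```c
#include <stddef.h>      /* offsetof */
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill *addr with a pathname address and return the exact length to
   pass to bind() or connect(), using the formula above. */
static socklen_t make_unix_addr(struct sockaddr_un *addr, const char *path)
{
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strncpy(addr->sun_path, path, sizeof(addr->sun_path) - 1);
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path)
                       + strlen(addr->sun_path) + 1);
}
```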
Pathname sockets bind to actual filesystem entries, which appear as special files (type S_IFSOCK) and persist until explicitly removed, allowing resolution through standard directory lookups.[4] This ties the socket's visibility to the file system's structure and permissions, enabling processes to locate endpoints via path traversal.[4] In contrast, Linux's abstract namespace provides a non-persistent alternative where the address begins with a null byte (\0) in the first position of sun_path, followed by an arbitrary sequence of bytes that do not create or rely on filesystem objects.[4] These abstract addresses exist solely in kernel memory, vanishing when all references to the socket are closed, and their length excludes the leading null byte in binding operations.[4]
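A corresponding sketch for a Linux abstract address (the helper and the name it receives are illustrative) differs only in the leading null byte and the length calculation, which excludes any trailing terminator:

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Build a Linux abstract address: the leading null byte marks the
   abstract namespace. Assumes strlen(name) fits within sun_path. */
static socklen_t make_abstract_addr(struct sockaddr_un *addr, const char *name)
{
    size_t n = strlen(name);
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    addr->sun_path[0] = '\0';             /* abstract-namespace marker */
    memcpy(addr->sun_path + 1, name, n);  /* raw bytes, no NUL required */
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + n);
}
```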
During binding, the kernel ensures address uniqueness within the chosen namespace to prevent conflicts, rejecting attempts to bind duplicate pathnames or abstract identifiers.[4] Pathnames must be encoded as null-terminated strings, while abstract names use raw bytes without termination, allowing flexible but namespace-specific identification.[4] This mechanism supports efficient local routing, as the kernel directly maps addresses to process descriptors without intermediate network layers.[4]
Implementation Details
System Calls for Creation and Management
Unix domain sockets are created using the socket() system call with the address family AF_UNIX (also known as AF_LOCAL), specifying a socket type such as SOCK_STREAM for reliable, connection-oriented communication, SOCK_DGRAM for reliable datagram service, or SOCK_SEQPACKET for sequenced packet transmission, and a protocol value of 0 to select the default protocol.[11][4] The call returns a file descriptor on success or -1 on failure, with common errors including EAFNOSUPPORT if the address family is unsupported, EMFILE or ENFILE if file descriptor limits are reached, and EPROTONOSUPPORT if the specified protocol is invalid for the family.[11][4]
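A minimal creation sequence following this pattern might read:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>

int main(void) {
    /* AF_UNIX with the desired type; protocol 0 selects the default. */
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1) {
        perror("socket");   /* e.g. EMFILE, ENFILE, EAFNOSUPPORT */
        exit(EXIT_FAILURE);
    }
    /* ... bind(), connect(), etc. ... */
    return 0;
}
```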
To assign a local address to the socket, the bind() system call is used, taking the socket file descriptor, a pointer to a sockaddr_un structure containing the AF_UNIX family and the address (such as a pathname or abstract namespace identifier, as detailed in the Addressing Mechanism section), and the length of the address structure.[14][4] On failure, it returns -1 and sets errno to values like EADDRINUSE if the address is already in use, ENOENT if a pathname component does not exist, or EACCES if permissions are insufficient for the filesystem path.[14][4]
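A sketch of binding to a pathname address follows; the helper name and path are placeholders, and unlinking a stale socket file first is a common way to avoid EADDRINUSE from a previous run:

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Bind fd to a pathname address such as the hypothetical /tmp/demo.sock. */
static int bind_unix(int fd, const char *path)
{
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    unlink(path);                        /* ignore result: may not exist */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        perror("bind");                  /* EADDRINUSE, ENOENT, EACCES, ... */
        return -1;
    }
    return 0;
}
```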
For server-side operations, after binding, the listen() call marks the socket as passive, specifying a backlog parameter to limit the queue of incoming connections, typically up to the system-defined SOMAXCONN limit.[15] It returns 0 on success or -1 with errors such as EINVAL for an invalid backlog or ENOTSOCK if the descriptor is not a socket.[15] The accept() call then extracts the next pending connection from the queue, creating a new connected socket descriptor while optionally retrieving the peer's address; it blocks until a connection arrives unless the socket is in non-blocking mode, in which case it returns EAGAIN or EWOULDBLOCK if the queue is empty.[16] Common errors include EMFILE for process file descriptor exhaustion and ECONNABORTED if the connection attempt was aborted.[16]
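The passive-side calls can be combined as in the following sketch; the backlog of 8 is arbitrary, with SOMAXCONN as the usual ceiling:

```c
#include <stdio.h>
#include <sys/socket.h>

/* After a successful bind(): mark the socket passive and accept one client. */
static int serve_one(int listen_fd)
{
    if (listen(listen_fd, 8) == -1) {
        perror("listen");                 /* EINVAL, ENOTSOCK, ... */
        return -1;
    }
    int conn_fd = accept(listen_fd, NULL, NULL);  /* blocks until a client arrives */
    if (conn_fd == -1)
        perror("accept");                 /* EMFILE, ECONNABORTED, ... */
    return conn_fd;                       /* new descriptor for this connection */
}
```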
Client-side connection establishment uses the connect() system call on an unbound socket, providing the server's sockaddr_un address and length to link the two endpoints.[17][4] It returns 0 on success for connection-mode sockets or -1 with errno set to ECONNREFUSED if the server rejects the connection, ENOENT if the server's pathname does not exist, or EISCONN if already connected.[17][4]
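A corresponding client-side sketch, again with a placeholder helper name and path:

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Connect an unbound AF_UNIX socket to a server's pathname address. */
static int connect_unix(int fd, const char *path)
{
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        perror("connect");   /* ECONNREFUSED, ENOENT, ... */
        return -1;
    }
    return 0;
}
```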
Data transfer on connected Unix domain sockets can employ send() and recv() for message-oriented operations or the generic read() and write() system calls on the socket descriptor, with send() transmitting up to the specified length and returning the number of bytes sent or -1 on errors like ENOTCONN if unconnected or EPIPE if the peer has closed the connection.[18][4] Similarly, recv() receives data into a buffer, returning the byte count, 0 on orderly shutdown, or -1 with errors such as ECONNRESET if the connection is reset.[19] For datagram sockets, these calls handle individual messages, while stream sockets treat data as a continuous byte stream.[19][4]
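The return-value conventions described above might be handled as in this sketch for a connected stream socket:

```c
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send a short message and read the reply, distinguishing the three
   recv() outcomes: data, orderly shutdown, and error. */
static void exchange(int fd)
{
    char buf[128];
    if (send(fd, "hello", 5, 0) == -1) {
        perror("send");                /* ENOTCONN, EPIPE, ... */
        return;
    }
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    if (n > 0)
        printf("received %zd bytes\n", n);
    else if (n == 0)
        puts("peer performed an orderly shutdown");
    else
        perror("recv");                /* ECONNRESET, ... */
}
```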
To manage connection lifecycle, shutdown() disables send, receive, or both directions on a connected socket via the how parameter (SHUT_RD, SHUT_WR, or SHUT_RDWR), returning 0 on success or -1 with ENOTCONN if unconnected.[20] Finally, close() deallocates the socket descriptor, potentially blocking briefly if the SO_LINGER option is set to ensure data transmission, and returns 0 on success or -1 with EBADF for an invalid descriptor.[21]
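One common lifecycle pattern, sketched below, is a half-close: the writer calls shutdown() with SHUT_WR so the peer's recv() returns 0, drains any remaining replies, and only then closes the descriptor:

```c
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Half-close idiom: stop sending, drain replies, release the descriptor. */
static void finish(int fd)
{
    char buf[128];
    if (shutdown(fd, SHUT_WR) == -1)
        perror("shutdown");            /* ENOTCONN, ... */
    while (recv(fd, buf, sizeof(buf), 0) > 0)
        ;                              /* discard until peer closes */
    close(fd);                         /* deallocate the descriptor */
}
```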
An important ancillary feature of Unix domain sockets is the ability to pass file descriptors between processes using control messages with sendmsg() and recvmsg(), where the msghdr structure's msg_control buffer carries ancillary data at level SOL_SOCKET with type SCM_RIGHTS for file descriptors, limited typically to 253 per message to avoid reference counting issues.[22][4] These calls return the number of bytes sent or received on success, or -1 with errors like ETOOMANYREFS if too many descriptors are passed or EBADF for invalid ones; credentials can also be passed using SCM_CREDENTIALS if the SO_PASSCRED socket option is enabled.[22][4]
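The standard CMSG macro idiom for passing a single descriptor might be sketched as follows, with error handling trimmed; note that the control-message level is SOL_SOCKET:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Pass one open file descriptor across a connected Unix domain socket. */
static int send_fd(int sock, int fd_to_send)
{
    char dummy = '*';                        /* at least one byte of payload */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {                                  /* correctly aligned buffer */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type  = SCM_RIGHTS;
    cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));
    return sendmsg(sock, &msg, 0) == -1 ? -1 : 0;
}
```

The receiving side uses recvmsg() with a similarly sized control buffer and reads the descriptor back out of CMSG_DATA(); the kernel installs it as a new, valid descriptor in the receiver's file table.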
Kernel-Level Handling
In the Linux kernel, Unix domain sockets are managed through specialized data structures that extend the generic socket framework. The core structure is struct sock, which provides common socket functionality including receive and send queues, such as sk_receive_queue for buffering incoming data in the form of sk_buff objects. This is extended by struct unix_sock, embedded within struct sock, which includes members like addr for the socket address, path for filesystem associations, peer for the connected counterpart, and inq_len to track input queue length. These structures enable efficient local communication without the overhead of network protocols.[23]
Data flow in Unix domain sockets relies on kernel-managed buffers to handle transmission between processes. Incoming data is queued in the sender's output buffer and then moved to the receiver's sk_receive_queue, where it awaits user-space retrieval via system calls like recvmsg. Due to the local nature of these sockets, zero-copy optimizations are feasible; for instance, file descriptors passed via ancillary data (SCM_RIGHTS) reference shared kernel file descriptions without duplicating content, and direct memory transfers between process address spaces minimize copying. Pathname-based sockets associate with filesystem inodes via the sockfs pseudo-filesystem, allowing the kernel to enforce access through inode permissions during binding and connection.[4][24]
Protocol handling for Unix domain sockets bypasses traditional transport layers, enabling direct process-to-process delivery. The kernel links connected sockets through the peer pointer in struct unix_sock, avoiding any address-resolution overhead, and control messages can carry ancillary data such as process credentials (e.g., SCM_CREDENTIALS with PID, UID, and GID) alongside the payload. Supported types include stream (SOCK_STREAM), datagram (SOCK_DGRAM), and sequential packet (SOCK_SEQPACKET), with data queued per-message for datagrams to preserve boundaries. Cleanup occurs automatically: abstract sockets (those without a filesystem path) are removed from the kernel's address space upon the last close of all references, while pathname sockets require explicit unlinking from the filesystem to remove their inode association, preventing lingering entries.[4][23]
Practical Usage
Pathname vs. Abstract Sockets
Unix domain sockets support two primary addressing modes: pathname-based and abstract. Pathname sockets bind to a filesystem path, creating a visible socket file that persists until explicitly unlinked, while abstract sockets use a namespace independent of the filesystem, remaining invisible and automatically cleaned up when the last reference is closed.[4] Pathname sockets require specifying a filesystem path during the bind operation, such as /tmp/mysock, which results in the creation of a special socket file in the directory. This file is marked with the socket file type (S_IFSOCK) and its creation requires the process to have write access to the parent directory. The socket file persists even after the creating process exits, as long as it is not removed via unlink, although connections can only be accepted while a process holds the bound socket open. For creation, a process typically creates a temporary directory if needed (e.g., using mkdir), binds the socket to the path with the bind system call, and ensures cleanup by unlinking the path after use to avoid leaving orphaned files.[4]
Abstract sockets, a Linux-specific extension not available in standard BSD implementations, are addressed by a path prefixed with a null byte (e.g., \0mysock), preventing any filesystem interaction or visibility. These sockets do not create files, bypassing filesystem overhead and permissions like umask, and are automatically removed when all file descriptors referencing them are closed, eliminating manual cleanup needs. To create one, the address structure for bind includes the null-prefixed string without a terminating null, limiting the effective path length to the socket address size minus the family field (typically up to 108 bytes on Linux).[4]
The choice between modes involves trade-offs in visibility, security, and performance. Pathname sockets offer discoverability, as they appear in directory listings (e.g., via ls) and leverage filesystem permissions for access control, making them suitable for scenarios requiring explicit endpoint identification. In contrast, abstract sockets provide enhanced privacy by avoiding filesystem exposure and slightly faster binding due to no disk I/O, but they lack discoverability and rely on out-of-band coordination for endpoint sharing, which can complicate multi-process setups.[4][25]
Security and Permissions
Unix domain sockets are treated as special files in the filesystem when bound to a pathname, inheriting the standard Unix permission model where access is controlled by mode bits such as those set via chmod(2). For instance, connecting to a stream socket or sending datagrams requires write permission on the socket file itself, while search (execute) permission is needed on all directories in the path prefix. These permissions default to all bits enabled except those masked by the process's umask and can be modified using chmod(2), for example to a permissive mode such as 0666 that lets any local user connect, or a restrictive mode such as 0600 that limits access to the owner.[4][26][27]
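For instance, a server might tighten the socket file immediately after bind(2), as in this sketch (the path is a placeholder):

```c
#include <stdio.h>
#include <sys/stat.h>

/* Restrict a pathname socket so only the owner can connect
   (connecting requires write permission on the file). */
static void restrict_socket(const char *path)
{
    if (chmod(path, S_IRUSR | S_IWUSR) == -1)   /* mode 0600 */
        perror("chmod");
}
```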
Ownership of pathname-bound sockets follows the creator's user ID (UID) and group ID (GID), which can be altered with chown(2). During connection attempts via connect(2), the kernel verifies that the connecting process has the necessary permissions against the socket's ownership and mode bits, denying access with EACCES if write permission on the socket or search permission on the path components is lacking. This model ensures that only authorized processes can establish communication, with the superuser able to bypass certain restrictions.[4][28][29]
Advanced security features allow explicit credential passing to enhance authentication. Setting the SO_PASSCRED socket option enables the receiver to obtain the sender's process ID (PID), UID, and GID through an SCM_CREDENTIALS ancillary data message attached to messages via sendmsg(2). This mechanism, which requires kernel verification of the sender's credentials, permits fine-grained access control based on the peer's identity, though privileged processes can forge credentials. Additionally, SO_PEERCRED retrieves the peer's credentials for connected stream sockets. Abstract sockets, lacking filesystem ties, ignore these ownership and permission changes, relying instead on other mechanisms like network namespaces.[4][30]
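On Linux, retrieving the peer's credentials from a connected stream socket might be sketched as follows:

```c
#define _GNU_SOURCE          /* exposes struct ucred in glibc */
#include <stdio.h>
#include <sys/socket.h>

/* Query the kernel-verified credentials of the connected peer. */
static void show_peer(int conn_fd)
{
    struct ucred cred;
    socklen_t len = sizeof(cred);
    if (getsockopt(conn_fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) == -1) {
        perror("getsockopt(SO_PEERCRED)");
        return;
    }
    printf("peer pid=%ld uid=%ld gid=%ld\n",
           (long)cred.pid, (long)cred.uid, (long)cred.gid);
}
```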
Pathname-bound sockets are susceptible to race conditions during creation and binding, where an attacker might exploit timing gaps in filesystem operations to access or hijack the socket path. To mitigate such races when creating intermediate directories, use mkdir(2), which fails atomically with EEXIST if the pathname already exists, so the existence check and the creation cannot be interleaved by an attacker. Abstract sockets circumvent filesystem-based attacks entirely by not creating persistent files, thus avoiding permission-related vulnerabilities tied to the directory structure.[4][31]