
Netlink

Netlink is a socket-based inter-process communication (IPC) protocol in the Linux kernel that facilitates the exchange of information between kernel space and user-space processes, as well as among user-space processes, primarily for network configuration, routing, firewall rules, and system events such as audit logs and updates. It operates using datagram-oriented sockets in the AF_NETLINK address family, supporting both unicast and multicast messaging through a standardized message format that includes a header (nlmsghdr) and a variable-length payload, often structured with type-length-value (TLV) attributes for flexibility.

Originally developed as a replacement for the rigid ioctl() system calls and inspired by BSD 4.4 routing sockets, Netlink evolved from an early character device called "Skiplink" implemented by Alan Cox in Linux kernel version 1.3.31 in 1995, before being redesigned as a full socket interface by Alexey Kuznetsov in version 2.1.68 in 1997, with the stable socket-based version appearing in Linux 2.2 in 1999. The protocol was formally described in RFC 3549 in 2003 as an IP services protocol, highlighting its role in kernel-user space interactions for forwarding engines and control plane components, though it remains primarily a Linux interface and is not standardized beyond that document.

Key features of Netlink include support for multiple protocol families, such as NETLINK_ROUTE for routing tables, NETLINK_FIREWALL for netfilter, NETLINK_AUDIT for security auditing, and NETLINK_GENERIC for extensible subsystems (introduced in 2005), which allow dynamic registration and introspection without fixed structures. It enables asynchronous notifications from the kernel to user space via multicast groups, optional message acknowledgments, and atomic operations for tasks like route additions, but it is not connection-oriented or guaranteed reliable, and it may drop messages under memory pressure or on errors. Libraries like libnl provide user-space APIs to simplify interactions, making Netlink a foundational component for tools such as iproute2 for network management.

Overview

Definition and Purpose

Netlink is a socket-based inter-process communication (IPC) mechanism in the Linux kernel, utilizing the address family AF_NETLINK to enable bidirectional data transfer between user-space processes and kernel modules. This interface operates as a datagram-oriented service, primarily using SOCK_RAW or SOCK_DGRAM types, and supports unicast for direct messaging and multicast for group-based distribution. By leveraging a standardized message format, Netlink allows user-space applications to interact with kernel subsystems in a structured yet adaptable manner, without relying on device-specific file descriptors.

The primary purpose of Netlink is to let user space configure and monitor kernel subsystems, with a particular emphasis on networking features such as routing tables, firewall rules, and interface statistics. It keeps messages flexible by employing a type-length-value (TLV) format for payloads, which avoids the need for rigid C structures and supports extensibility through additional attributes or multipart messages. This design promotes efficient data exchange for tasks like updating kernel state from user-space tools or retrieving dynamic information, such as events or policy changes.

Netlink was developed as a replacement for the legacy ioctl() interface, addressing its key limitations including poor extensibility, lack of bidirectional support, and challenges in portability across architectures due to fixed command definitions. Unlike ioctl(), which is inherently synchronous and user-initiated, Netlink supports asynchronous operations where the kernel can proactively send notifications to subscribed user-space processes. These kernel-generated messages complement user-initiated requests, enabling event handling and reducing polling overhead in applications like network daemons. Various Netlink protocol families, such as NETLINK_ROUTE for routing and NETLINK_AUDIT for security auditing, extend its applicability across different kernel domains.

Key Features and Advantages

Netlink scales to many listeners through its support for multicast groups, enabling multiple user-space processes to receive kernel-generated events efficiently without requiring individual unicast transmissions. The nl_groups bit mask passed to the bind(2) system call covers up to 32 multicast groups per family, which facilitates one-to-many communication for scenarios like routing updates or device notifications. This multicast capability reduces the overhead of repeated kernel-to-user messaging compared to traditional one-to-one interfaces.

A core aspect of Netlink's flexibility lies in its variable-length message format, which employs a type-length-value (TLV) encoding for attributes in the payload. This design allows for extensible payloads where new attributes can be added without disrupting backward compatibility, as receivers can ignore unknown types while processing known ones. The header includes an nlmsg_len field to delineate variable sizes, and multipart messages are flagged with NLM_F_MULTI and terminated by NLMSG_DONE, which accommodates complex data exchanges while leaving room for protocol evolution.

Unlike unidirectional system calls, Netlink enables true bidirectional communication, permitting the kernel to initiate messages to user space for asynchronous notifications, such as event-driven updates. This duplex nature, built on a full-duplex socket model, allows the kernel to push updates without user-space polling, addressing limitations in mechanisms where user space must always trigger interactions.

Netlink's advantages include strong portability across different architectures, as it leverages the standard BSD socket API without reliance on architecture-specific structures. It supports asynchronous operation through mechanisms like select(2) and poll(2) on the socket descriptor, enabling non-blocking I/O for efficient event handling. Integration with familiar socket functions, such as socket(2), sendmsg(2), and recvmsg(2), simplifies development by reusing established networking primitives.

In comparison to ioctl(), Netlink avoids the pitfalls of fixed-format C structures that often lead to user-kernel mismatches across versions or architectures, offering instead a structured, extensible alternative that eliminates the need for new ioctl commands when extending functionality. It also proves more efficient than reading or writing to /proc or /sys filesystems for dynamic data, as these file-based methods lack native support for large transfers or event notification, often limiting payloads to a single page and requiring constant polling. These attributes collectively position Netlink as a preferred mechanism for modern systems handling networking and configuration tasks.

History and Development

Origins in Linux Kernel

Netlink evolved from an early character device called "Skiplink" implemented by Alan Cox in Linux kernel version 1.3.31 in 1995. Alexey Kuznetsov, building upon Cox's work, redesigned it as a full socket interface in version 2.1.68 in 1997 to provide a more extensible alternative to traditional mechanisms for network management, and the stable socket-based version was introduced in Linux kernel version 2.2, released in 1999. This development occurred at the Institute for Nuclear Research of the Russian Academy of Sciences (INR RAS), where Kuznetsov contributed significantly to Linux networking features.

The primary motivations for creating Netlink stemmed from the limitations of existing interfaces, such as ioctl() calls, which were rigid and primarily synchronous, lacking support for bidirectional communication or efficient data transfer for network statistics and configurations. Netlink addressed these issues by offering a standardized, asynchronous interface and reduced the need for custom kernel modifications. It drew inspiration from BSD's routing sockets (AF_ROUTE), extending their concepts to support service control and forwarding separation in a more flexible manner, allowing user-space processes to query and modify routing tables dynamically.

Early adoption of Netlink centered on its integration with the iproute2 suite of tools, also authored by Kuznetsov, which replaced older utilities like route and ifconfig for advanced network management. These tools leveraged Netlink's NETLINK_ROUTE family to interact with the kernel's networking subsystem, enabling efficient routing and traffic control without relying on deprecated interfaces. Initially, Netlink focused on networking tasks, such as route manipulation and interface configuration, but was architected with a modular family system to support broader kernel-user space interactions beyond network services. This design laid the groundwork for its expansion into other domains, though its core emphasis remained on providing a dependable mechanism for network-related operations.

Evolution and Milestones

Netlink's evolution in the early 2000s saw significant expansions within the Linux kernel, transitioning from its initial focus on routing to broader subsystem integrations. By version 2.4.6, the NETLINK_FIREWALL family was introduced to enable kernel-to-user-space communication for IPv4 packet handling in firewall operations, allowing user-space applications to receive and process packets from the netfilter framework. Similarly, starting with 2.6.6, the NETLINK_AUDIT family provided an interface to the Linux Audit Subsystem, facilitating the transmission of audit records from the kernel to user space for security logging and compliance monitoring.

A pivotal advancement came in kernel 2.6.14, which added the socket options NETLINK_ADD_MEMBERSHIP and NETLINK_DROP_MEMBERSHIP for joining and leaving multicast groups by number, enabling efficient one-to-many messaging for event notifications across multiple user-space listeners. This was complemented by the launch of Generic Netlink in kernel 2.6.15, which standardized user-defined protocol families through dynamic registration and a simplified interface, reducing the need for custom kernel patches and promoting extensibility for new subsystems.

Standardization efforts formalized Netlink's role as an IP services protocol via RFC 3549 in July 2003, which detailed its use for intra-kernel and kernel-user-space messaging while emphasizing its TLV-based message format for flexibility. IPv6 support was integrated through the NETLINK_ROUTE6 family starting in kernel 2.2, allowing user-space management of IPv6 routing tables and addresses alongside IPv4 equivalents. To ease user-space development, the libnl library suite was released in the mid-2000s, providing high-level C APIs for Netlink socket handling, message construction, and protocol-specific abstractions, with initial versions like libnl-1.0 appearing around 2005 to abstract raw socket complexities.

Experimental support for Netlink extended beyond Linux in 2022, when FreeBSD added an initial implementation of the protocol (per RFC 3549) to its kernel, enabling compatibility for tools like iproute2 and improving cross-platform networking utilities. In recent years, up to 2025, Netlink has seen enhancements in security and adoption in modern environments. Its role in container orchestration has grown, with networking plugins such as those using the Container Network Interface (CNI) relying on Netlink for route manipulation and interface configuration, for example in Multus-based setups. Netlink is also used alongside eBPF: programs such as XDP and traffic-control classifiers are attached to network devices through NETLINK_ROUTE messages, as done by tools like bpftool.

Architecture

Socket-Based Interface

Netlink provides a socket-based interface that leverages the standard BSD socket API, enabling user-space applications to communicate with the kernel in a manner consistent with conventional network programming models. This design replaces traditional ioctl() calls with a more extensible socket-oriented approach, supporting bidirectional data exchange for tasks such as network configuration.

The address family for Netlink sockets is specified as AF_NETLINK during socket creation, distinguishing it from other families like AF_INET used for IPv4 networking. Within this family, protocols are identified by numeric constants; for example, NETLINK_ROUTE, assigned the value 0, is dedicated to routing table modifications, link status updates, and related networking events. Socket types supported include SOCK_RAW and SOCK_DGRAM, both treated equivalently by the Netlink implementation as datagram-oriented, allowing raw access to messages without protocol-specific processing layers.

Netlink sockets are scoped to the network namespace in which they are created, which provides isolated communication environments in containerized systems. The Netlink port ID (nl_pid) in a socket address identifies a socket rather than a process, so processes that share the same numeric PID in different namespaces still use distinct communication channels. Error handling follows standard socket conventions, utilizing errno codes such as ENOBUFS to signal receive queue overflows, which requires applications to implement buffer resizing or message resynchronization to recover from data loss.

As ordinary file descriptors, Netlink sockets integrate naturally with Linux's I/O multiplexing facilities, permitting the use of select(2), poll(2), or epoll(7) to monitor multiple sockets concurrently for efficient event-driven programming. Messages exchanged over these sockets adhere to a structured format comprising headers and payloads, as defined in the Netlink message protocol.

Communication Mechanisms

Netlink employs unicast for direct, point-to-point message delivery between the kernel and user-space processes, utilizing port IDs specified in the nl_pid field of the sockaddr_nl structure. In user space, the kernel automatically assigns the process ID as the nl_pid for the initial socket opened by an application, while subsequent sockets receive unique identifiers to enable targeted routing. This mechanism delivers messages to specific recipients without involving intermediaries.

For broader dissemination, Netlink supports multicast through group subscriptions. The nl_groups bitmask in the socket address covers the first 32 groups of a protocol family (for example RTMGRP_LINK for link notifications), and the NETLINK_ADD_MEMBERSHIP socket option joins a group by number. Subscriptions let the kernel broadcast events, such as route changes or link status updates, to all interested parties within a group. Only processes with root privileges or the CAP_NET_ADMIN capability can send messages to these groups, though reception is often permitted for other users in families like NETLINK_ROUTE.

Asynchronous notifications facilitate event-driven communication, where the kernel initiates messages to user space for real-time updates, such as an interface going up or down, using functions like netlink_unicast on the kernel side to target specific ports. These notifications are crucial for tracking dynamic state and are commonly used in standard families such as NETLINK_ROUTE. Dump operations enable iterative data retrieval, such as querying routing tables, by setting the NLM_F_DUMP flag in messages; the kernel responds with a sequence of messages, and user space can request acknowledgments via NLM_F_ACK to confirm receipt of its requests.

Queue management in Netlink involves per-socket receive and send buffers that queue incoming and outgoing messages, with default sizes of 212992 bytes (208 KiB) on 64-bit Linux 2.6 and later to balance performance and memory usage. These buffers are configurable using socket options like SO_RCVBUF and SO_SNDBUF (or library wrappers such as nl_socket_set_buffer_size in libnl), allowing applications to increase sizes for high-volume scenarios and prevent overflows, which would otherwise trigger ENOBUFS errors and message drops. Proper sizing is essential for maintaining communication integrity under load, as undelivered messages can lead to incomplete state synchronization between kernel and user space.
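The subscription and buffer-sizing steps described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name open_link_monitor and its rcvbuf_bytes parameter are hypothetical, and the fallback definition of SOL_NETLINK is an assumption for C libraries whose headers do not expose it.

```c
#include <assert.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270  /* assumption: value used by Linux when libc lacks it */
#endif

/* Hypothetical helper: open a NETLINK_ROUTE socket, ask for a larger receive
 * buffer, and join the link-notification group.
 * Returns the file descriptor, or -1 on failure (errno is set). */
int open_link_monitor(int rcvbuf_bytes)
{
    assert(rcvbuf_bytes > 0);

    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return -1;

    /* nl_pid = 0 lets the kernel choose a unique port ID. */
    struct sockaddr_nl addr = { .nl_family = AF_NETLINK };
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto fail;

    /* The kernel may clamp the requested size to its configured maximum. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &rcvbuf_bytes, sizeof(rcvbuf_bytes)) < 0)
        goto fail;

    /* NETLINK_ADD_MEMBERSHIP takes a group number, not a bitmask. */
    int group = RTNLGRP_LINK;
    if (setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
                   &group, sizeof(group)) < 0)
        goto fail;

    return fd;

fail:
    close(fd);
    return -1;
}
```

A caller would typically pass the returned descriptor to poll(2) or epoll(7) and read link notifications with recvmsg(2) as they arrive.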

Message Format

Basic Packet Structure

Netlink messages are structured as a byte stream beginning with a fixed 16-byte header known as struct nlmsghdr, which provides essential metadata for routing, sequencing, and processing the message. The header fields are defined as follows:
| Field | Type | Size (bytes) | Description |
|---|---|---|---|
| nlmsg_len | __u32 | 4 | Total length of the message in bytes, including the header and payload; must be at least the size of the header (16 bytes). The next message in a buffer starts at this length rounded up to a multiple of 4. |
| nlmsg_type | __u16 | 2 | Message type indicating the content or purpose, such as RTM_NEWROUTE for a new route notification or NLMSG_ERROR for error reporting. |
| nlmsg_flags | __u16 | 2 | Control flags directing message handling, for example NLM_F_REQUEST to indicate a user-space request to the kernel or NLM_F_MULTI for multipart messages. |
| nlmsg_seq | __u32 | 4 | Sequence number assigned by the sender to track message order and match requests with responses. |
| nlmsg_pid | __u32 | 4 | Port ID of the sender, typically the process ID (PID) in user space or 0 for kernel-generated messages. |
This structure ensures 4-byte alignment across the entire message, achieved through padding macros like NLMSG_ALIGN(). Multi-byte header fields are carried in host byte order rather than network byte order, since messages are normally exchanged on the same machine. The nlmsghdr format has remained fixed since its introduction in Linux kernel 2.2, with protocol evolution handled through flag extensions rather than header modifications. The payload, which carries protocol-specific data such as type-length-value attributes, immediately follows the header and has a variable length of nlmsg_len - sizeof(struct nlmsghdr), followed by padding up to the next 4-byte boundary if necessary.
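The length and alignment rules above can be illustrated with the standard macros from <linux/netlink.h>. The helper nl_write_msg below is a hypothetical name introduced only for this sketch; it writes a header followed by an opaque payload and reports how much aligned space the message occupies.

```c
#include <assert.h>
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write one message (header + opaque payload) into buf.
 * Returns the aligned space consumed, which is the offset at which the next
 * message would start, or 0 if the message does not fit. */
size_t nl_write_msg(void *buf, size_t buflen, uint16_t type, uint16_t flags,
                    uint32_t seq, const void *payload, size_t payload_len)
{
    assert(buf != NULL);
    if (payload_len > buflen)
        return 0;

    size_t len = NLMSG_LENGTH(payload_len);  /* header + payload, unpadded */
    size_t space = NLMSG_ALIGN(len);         /* rounded up to 4 bytes */
    if (space > buflen)
        return 0;

    memset(buf, 0, space);                   /* also zeroes the padding */
    struct nlmsghdr *nh = buf;
    nh->nlmsg_len = (uint32_t)len;
    nh->nlmsg_type = type;
    nh->nlmsg_flags = flags;
    nh->nlmsg_seq = seq;
    nh->nlmsg_pid = 0;                       /* caller may set its port ID */
    if (payload_len > 0)
        memcpy(NLMSG_DATA(nh), payload, payload_len);
    return space;
}
```

With a 5-byte payload, nlmsg_len is 16 + 5 = 21 while the message occupies 24 bytes of buffer space, which is the difference between NLMSG_LENGTH() and NLMSG_SPACE().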

Attributes and Payload Encoding

Netlink messages encapsulate their variable payload using a sequence of attributes, which provide a flexible, extensible mechanism for conveying kernel-specific data beyond the fixed message header. These attributes follow the header and are parsed independently, allowing for optional or conditional inclusion of information without altering the core message structure. The attribute format adheres to a length-type-value (LTV) scheme, where each attribute is self-describing and aligned for efficient processing.

Each attribute is represented by a 4-byte header followed by its value and any necessary padding. The header consists of a 16-bit length field (nla_len), which specifies the size of the attribute in bytes (including the header and value but not the trailing padding, with a minimum value of 4), and a 16-bit type field (nla_type), which identifies the attribute's semantic meaning within the context of the Netlink family (e.g., RTA_DST for a destination address in routing messages). The value portion immediately follows the header and can hold data of varying types, such as integers, strings, or binary blobs, depending on the attribute's policy. To maintain alignment, the entire attribute, including its value, is padded to the next 4-byte boundary using zero bytes, ensuring that subsequent attributes start at a multiple of 4 bytes from the message's beginning. This padding is calculated using the NLA_ALIGN macro, which rounds up to the nearest 4-byte multiple.

Multi-byte values within attributes are encoded in host byte order (little-endian on most architectures) by default, facilitating direct use in kernel and user-space code. However, for attributes whose payload is kept in network byte order, such as some network-related fields, the NLA_F_NET_BYTEORDER flag (bit 14 of the type field) indicates that the payload is big-endian. The type field's lower 14 bits (bits 0-13) define the actual type, with bit 14 used for the NLA_F_NET_BYTEORDER flag and bit 15 for the NLA_F_NESTED flag, providing this metadata without redefining types.

Attributes support nesting to represent complex, hierarchical data structures. When the NLA_F_NESTED flag (bit 15) is set in the type field, the attribute's value is interpreted as a container holding its own sequence of sub-attributes, each following the standard LTV format. This enables multi-level encoding, such as route metrics nested inside a containing attribute, without fixed-size limitations. Nested attributes must still adhere to overall alignment and length rules, and parsers recursively process them until the container's length is exhausted.

Error handling in Netlink leverages attributes for detailed feedback. Responses of type NLMSG_ERROR include a payload with a standard nlmsgerr structure containing an errno value to indicate failure, such as EINVAL for invalid arguments. Since Linux kernel 4.12, if the NETLINK_EXT_ACK option is enabled via setsockopt(2), the error payload can include additional TLV attributes providing contextual details, such as offsets pointing to malformed data, enhancing debugging without altering the basic error format.

Modern extensions to Netlink attributes accommodate advanced use cases, including the transmission of opaque binary data. The NLA_BINARY type treats the value as an opaque byte sequence of variable length, subject to policy-defined minimum and maximum sizes, enabling the direct passage of unstructured payloads.
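As a rough illustration of the attribute layout described above, the following sketch appends attributes to a message and then walks them. The helper names nl_put_attr and nl_find_attr are hypothetical (libraries such as libnl and libmnl provide their own equivalents); only struct nlattr and the NLA_* macros come from <linux/netlink.h>.

```c
#include <assert.h>
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: append one attribute at the end of the message.
 * maxlen is the size of the buffer holding nh. Returns 0, or -1 if no room. */
int nl_put_attr(struct nlmsghdr *nh, size_t maxlen, uint16_t type,
                const void *data, uint16_t len)
{
    assert(nh != NULL && data != NULL);
    if (len > UINT16_MAX - NLA_HDRLEN)
        return -1;                               /* nla_len is only 16 bits */

    size_t off = NLMSG_ALIGN(nh->nlmsg_len);     /* where the attribute starts */
    size_t need = (size_t)NLA_HDRLEN + len;      /* header + value, unpadded */
    if (off + NLA_ALIGN(need) > maxlen)
        return -1;

    struct nlattr *nla = (struct nlattr *)((char *)nh + off);
    nla->nla_len = (uint16_t)need;               /* padding is not counted */
    nla->nla_type = type;
    memcpy((char *)nla + NLA_HDRLEN, data, len);
    memset((char *)nla + need, 0, NLA_ALIGN(need) - need);  /* zero padding */
    nh->nlmsg_len = (uint32_t)(off + NLA_ALIGN(need));
    return 0;
}

/* Hypothetical helper: find the first attribute of the given type.
 * fixed_hdr_len is the size of the family-specific header that precedes
 * the attributes (0 if there is none). */
const struct nlattr *nl_find_attr(const struct nlmsghdr *nh,
                                  size_t fixed_hdr_len, uint16_t type)
{
    assert(nh != NULL);
    size_t off = NLMSG_HDRLEN + NLMSG_ALIGN(fixed_hdr_len);
    while (off + NLA_HDRLEN <= nh->nlmsg_len) {
        const struct nlattr *nla =
            (const struct nlattr *)((const char *)nh + off);
        if (nla->nla_len < NLA_HDRLEN || off + nla->nla_len > nh->nlmsg_len)
            break;                               /* malformed attribute */
        if ((nla->nla_type & NLA_TYPE_MASK) == type)  /* strip flag bits */
            return nla;
        off += NLA_ALIGN(nla->nla_len);          /* skip value and padding */
    }
    return NULL;
}
```

A 5-byte value yields nla_len = 9 but advances the message by 12 bytes, which shows why parsers step with NLA_ALIGN(nla_len) rather than nla_len itself.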

Families and Protocols

Netlink standard families consist of predefined numeric protocol identifiers that enable user-space applications to interact with core kernel subsystems through dedicated communication channels. These families are assigned fixed IDs in the kernel's user-space API headers, ensuring consistent addressing across kernel versions. The kernel enforces access to these families based on process capabilities, such as CAP_NET_ADMIN for sending messages or joining multicast groups, while certain families like NETLINK_ROUTE and NETLINK_KOBJECT_UEVENT permit reception by non-privileged users under specific conditions.

The NETLINK_ROUTE family, with ID 0, serves as the primary interface for managing networking configurations, including routing tables, network interfaces, and neighbor tables, allowing user-space tools to query and update these elements in real time. This family is central to tools like iproute2 for handling link events and address assignments without requiring direct kernel modifications.

Historically, the NETLINK_FIREWALL family (ID 3) facilitated the transport of IPv4 packets from the netfilter subsystem to user space for processing firewall rules, integrating with early user-space queueing mechanisms; however, it has been deprecated and left unused since Linux 3.5, with modern firewall operations shifting to the NETLINK_NETFILTER family. Current nf_tables integrations rely on NETLINK_NETFILTER (ID 12) to manage netfilter rules, logging, and packet flows efficiently.

The NETLINK_SELINUX family (ID 7) handles event notifications from the SELinux security module, enabling user-space monitoring of security policy enforcements and access decisions. Complementing this, the NETLINK_AUDIT family (ID 9), introduced in Linux 2.6.6, supports the transmission of audit records and logging messages from the kernel's auditing subsystem, allowing tools to track system calls, file accesses, and policy violations for compliance. NETLINK_KOBJECT_UEVENT (ID 15), available since Linux 2.6.10, broadcasts kernel-generated events related to device hotplugging and kobject management to user space, facilitating dynamic detection and response in device managers like udev. NETLINK_GENERIC (ID 16), introduced in Linux 2.6.15, provides a foundational framework for extending Netlink with custom protocols while maintaining compatibility with the standard message format. An additional standard family, NETLINK_RDMA (ID 20), added in Linux 3.0, supports management and monitoring of remote direct memory access (RDMA) resources, particularly for InfiniBand subsystems, enabling user-space control over high-performance networking fabrics. Enhanced RDMA management features, including net namespace support, were integrated starting in Linux 5.3.

Generic Netlink, identified by the protocol constant NETLINK_GENERIC (ID 16), is a framework within the Netlink subsystem that facilitates the creation of custom operations and families for kernel-user space communication. It enables dynamic registration of user-defined protocols, allowing subsystems to extend Netlink beyond predefined standards without requiring new top-level protocol numbers. In the kernel, a custom family is registered using the genl_register_family() function, which allocates resources and assigns a family ID dynamically; user space looks the ID up by family name. Families can also define multicast groups for notifications, so multiple user-space processes can subscribe to events from the kernel.

Protocol definitions for custom Generic Netlink families are structured around operations (ops) specified in a genl_ops array, where the kernel side defines commands and their handlers, such as a doit callback for single-request handling. Each operation includes a policy defined via struct nla_policy to validate incoming Netlink attributes, so that malformed payloads are rejected during message processing. For instance, the ethtool Netlink interface utilizes a custom Generic Netlink family named "ethtool" to manage hardware configuration tasks like querying link modes or setting channel counts on network interfaces, demonstrating practical extensibility for device-specific interactions. Commands are numbered from 0 to 255, providing ample space for commands while supporting both requests and notifications.

Message handling in custom protocols emphasizes dumps for iterative retrieval and notifications for asynchronous updates. Dump operations, flagged with NLM_F_DUMP, invoke a dumpit callback to generate multi-message responses terminated by NLMSG_DONE, ideal for listing resources. Notifications leverage multicast groups defined in the family structure, enabling efficient event broadcasting to subscribed sockets. To simplify implementation of these protocols in user space, libraries such as libmnl provide minimalistic abstractions for constructing, parsing, and validating messages, reducing boilerplate for attribute handling and sequence tracking without imposing heavy dependencies. This approach keeps custom Generic Netlink protocols lightweight and reusable across applications.
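Because Generic Netlink family IDs are assigned at registration time, user space usually asks the control family to resolve a family name first. The sketch below only builds that request in memory, using constants from <linux/genetlink.h>; the function name genl_build_getfamily is hypothetical, and the header version value of 1 is an assumption mirroring common user-space practice.

```c
#include <assert.h>
#include <linux/genetlink.h>
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a CTRL_CMD_GETFAMILY request asking the
 * Generic Netlink control family for the family called `name`.
 * Returns the message length, or 0 if the name or buffer is unsuitable. */
size_t genl_build_getfamily(void *buf, size_t buflen, const char *name,
                            uint32_t seq)
{
    assert(buf != NULL && name != NULL);

    size_t name_len = strlen(name) + 1;          /* include the trailing NUL */
    if (name_len > GENL_NAMSIZ)
        return 0;

    size_t attr_len = NLA_HDRLEN + name_len;     /* attribute header + value */
    size_t total = NLMSG_LENGTH(GENL_HDRLEN) + NLA_ALIGN(attr_len);
    if (total > buflen)
        return 0;

    memset(buf, 0, total);
    struct nlmsghdr *nh = buf;
    nh->nlmsg_len = (uint32_t)total;
    nh->nlmsg_type = GENL_ID_CTRL;               /* fixed ID of the control family */
    nh->nlmsg_flags = NLM_F_REQUEST;
    nh->nlmsg_seq = seq;

    struct genlmsghdr *gh = NLMSG_DATA(nh);      /* Generic Netlink header */
    gh->cmd = CTRL_CMD_GETFAMILY;
    gh->version = 1;                             /* assumption, see lead-in */

    struct nlattr *nla = (struct nlattr *)((char *)gh + GENL_HDRLEN);
    nla->nla_len = (uint16_t)attr_len;
    nla->nla_type = CTRL_ATTR_FAMILY_NAME;
    memcpy((char *)nla + NLA_HDRLEN, name, name_len);
    return total;
}
```

Sending this buffer on a NETLINK_GENERIC socket should produce a reply whose CTRL_ATTR_FAMILY_ID attribute carries the numeric ID to place in nlmsg_type for subsequent requests to that family.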

Usage and API

Socket Creation and Binding

To create a Netlink socket in user space, an application calls the socket() system call with the address family AF_NETLINK, typically using the socket type SOCK_RAW for raw access to Netlink messages, and specifies a protocol identifier corresponding to the desired Netlink family (e.g., NETLINK_ROUTE for routing information). This returns a file descriptor fd upon success, which serves as the handle for subsequent operations on the socket.

The socket address is represented by the struct sockaddr_nl, which includes the field nl_family set to AF_NETLINK to indicate the address family. The nl_pid field specifies the unicast identifier: a value of 0 allows the kernel to auto-assign a port ID, while a non-zero value (typically the process ID of the calling application) sets an explicit user-space port ID for targeted communication. Additionally, the nl_groups field is a bitmask used to subscribe to multicast groups, enabling reception of broadcast messages from the kernel to multiple listeners (defaulting to 0 for no subscriptions).

Binding the socket to a local address is performed using the bind() system call: bind(fd, (struct sockaddr *) &addr, sizeof(addr)), where addr is a struct sockaddr_nl instance configured as described. This step associates the socket with the specified port ID and multicast group subscriptions; if nl_pid is 0, the kernel automatically assigns a unique port ID to the socket. A socket needs a bound port ID, and any group subscriptions, in place before it can receive messages addressed to it.

Many Netlink operations that modify kernel state require elevated privileges, specifically the CAP_NET_ADMIN capability or an effective user ID of 0 (root), particularly for binding to restricted groups or performing administrative actions. However, certain families support non-privileged access for read-only reception, such as NETLINK_KOBJECT_UEVENT for kernel event notifications, allowing unprivileged processes to bind and receive events.

Common errors during socket creation include EAFNOSUPPORT if the address family is not supported, or EPROTONOSUPPORT for unsupported protocol values in the third argument to socket(). Binding may fail with EADDRINUSE if the requested port ID is already in use, or EPERM if the process lacks the necessary CAP_NET_ADMIN capability for the operation.
```c
#include <linux/netlink.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open and bind a NETLINK_ROUTE socket; returns the fd, or -1 on error. */
int open_route_socket(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);  /* routing family */
    if (fd < 0)
        return -1;  /* e.g., EAFNOSUPPORT if AF_NETLINK is unavailable */

    struct sockaddr_nl addr = {0};
    addr.nl_family = AF_NETLINK;
    addr.nl_pid = getpid();  /* or 0 for kernel auto-assignment */
    addr.nl_groups = 0;      /* bitmask for multicast subscriptions */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        /* e.g., EADDRINUSE if the port ID is taken, EPERM for restricted groups */
        close(fd);
        return -1;
    }
    return fd;
}
```
This example illustrates the basic sequence, with error handling for privilege-related failures.
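When nl_pid is left at 0, the port ID chosen by the kernel can be read back with getsockname(2). The following sketch shows one way to do that; the helper name nl_open_autobind is hypothetical and introduced only for illustration.

```c
#include <assert.h>
#include <linux/netlink.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a Netlink socket for `protocol`, let the kernel
 * pick the port ID, and report that ID through *portid.
 * Returns the file descriptor, or -1 on failure (errno is set). */
int nl_open_autobind(int protocol, uint32_t *portid)
{
    assert(portid != NULL);

    int fd = socket(AF_NETLINK, SOCK_RAW, protocol);
    if (fd < 0)
        return -1;

    /* nl_pid = 0 requests automatic assignment; no group subscriptions. */
    struct sockaddr_nl addr = { .nl_family = AF_NETLINK };
    socklen_t len = sizeof(addr);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }

    *portid = addr.nl_pid;  /* the port ID the kernel actually assigned */
    return fd;
}
```

Opening two sockets this way in the same process yields two different port IDs, which is why the port ID should be treated as a socket identifier rather than assumed to equal getpid().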

Message Transmission and Reception

Netlink messages are transmitted from user space to the kernel or other processes using the sendmsg() system call, which operates on a bound Netlink socket. The call requires a struct msghdr that includes an array of struct iovec elements pointing to the message components, typically starting with a struct nlmsghdr header followed by the payload in TLV-encoded attributes. The msg_name field of the msghdr is set to a struct sockaddr_nl specifying the destination, such as port ID 0 for the kernel, while msg_namelen is sizeof(struct sockaddr_nl). Flags in the nlmsghdr can include NLM_F_REQUEST for queries or NLM_F_ACK to request an acknowledgment from the recipient. For example, to send a routing query, the sequence number (nlmsg_seq) is incremented per message, and the port ID (nlmsg_pid) in the header is set to the sender's port ID (typically the process ID or the bound socket's nl_pid) to identify the origin:
```c
#include <linux/netlink.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one prepared request (nlmsg_len and nlmsg_type already set) to the kernel. */
ssize_t send_request(int fd, struct nlmsghdr *nh, unsigned int *sequence_number)
{
    nh->nlmsg_pid = getpid();  /* Sender's port ID */
    nh->nlmsg_seq = ++*sequence_number;
    nh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;

    struct iovec iov = { nh, nh->nlmsg_len };
    struct sockaddr_nl sa = { .nl_family = AF_NETLINK, .nl_pid = 0 };  /* Target: kernel */
    struct msghdr msg = { .msg_name = &sa, .msg_namelen = sizeof(sa),
                          .msg_iov = &iov, .msg_iovlen = 1 };
    return sendmsg(fd, &msg, 0);  /* -1 with errno set on failure */
}
```
This ensures reliable delivery attempts, though Netlink is datagram-oriented and may drop messages under memory pressure. Reception occurs via recvmsg(), which populates the provided iovec buffer with incoming messages, often in batches. A typical buffer size is 8192 bytes to accommodate multiple messages without truncation, and the call returns the total bytes received. Messages are validated using the NLMSG_OK() macro, which checks if nlmsg_len fits within the remaining buffer size, and processed iteratively with NLMSG_NEXT() to advance to the next message:
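```c
#include <linux/netlink.h>
#include <sys/socket.h>

/* Receive one batch of messages.
 * Returns 1 at NLMSG_DONE, 0 to keep reading, -1 on error. */
int receive_batch(int fd)
{
    _Alignas(struct nlmsghdr) char buf[8192];
    struct iovec iov = { buf, sizeof(buf) };
    struct sockaddr_nl sa;
    struct msghdr msg = { .msg_name = &sa, .msg_namelen = sizeof(sa),
                          .msg_iov = &iov, .msg_iovlen = 1 };
    int len = recvmsg(fd, &msg, 0);
    if (len < 0)
        return -1;  /* e.g., ENOBUFS after a receive-queue overflow */

    for (struct nlmsghdr *nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
         nh = NLMSG_NEXT(nh, len)) {
        if (nh->nlmsg_type == NLMSG_DONE)
            return 1;
        if (nh->nlmsg_type == NLMSG_ERROR) {
            struct nlmsgerr *err = (struct nlmsgerr *)NLMSG_DATA(nh);
            if (err->error != 0)
                return -1;  /* err->error holds a negative errno */
            continue;       /* error == 0 is an acknowledgment */
        }
        /* Process payload */
    }
    return 0;
}
```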
Multipart messages, flagged with NLM_F_MULTI, are terminated by NLMSG_DONE. Message sequencing relies on the nlmsg_seq field to correlate requests with responses, while nlmsg_pid in the header identifies the sender's port ID for user-space messages. Kernel-originated notifications carry nlmsg_pid 0, while replies to a request typically echo the requester's port ID; the source address filled in by recvmsg() has nl_pid 0 for messages from the kernel, which lets user-space applications filter by origin. Acknowledgments, requested via NLM_F_ACK in the sent message, prompt the kernel to reply with an NLMSG_ERROR message containing a struct nlmsgerr payload; an error value of 0 indicates success, while negative errno values denote failures like EOPNOTSUPP for unsupported operations.

In multi-threaded contexts, sequence numbers must be managed so that concurrent senders do not produce mismatches, often using atomic or mutex-protected counters, as unsynchronized shared increments can lead to validation failures during response matching.

To close a Netlink socket, invoke close(fd) on the file descriptor, which releases resources and stops further communication. For multicast subscriptions, unsubscription is achieved by rebinding the socket with nl_groups set to 0 in the struct sockaddr_nl, or by using the NETLINK_DROP_MEMBERSHIP socket option for an individual group, which leaves the groups without closing the socket.

Applications

Networking and Routing

Netlink plays a central role in networking by enabling user-space applications to configure and monitor routing tables and network interfaces through the NETLINK_ROUTE family. This family facilitates bidirectional communication between the kernel and user-space processes, allowing for the addition, deletion, and querying of routes as well as the management of device states and addresses.

In routing operations, user-space tools send messages to the kernel via sockets bound to NETLINK_ROUTE. For instance, the RTM_GETROUTE message retrieves routing table entries; when the destination length (rtm_dst_len) and source length (rtm_src_len) fields in the struct rtmsg header are set to 0, the kernel returns all entries for the specified routing table in a multi-part dump response. To add a new route, the RTM_NEWROUTE message is used, accompanied by attributes such as RTA_GATEWAY, which specifies the next-hop gateway as a protocol address. These operations ensure that routing configurations, including IPv4 and IPv6 tables, can be dynamically updated without restarting processes.

Interface management leverages similar message types for device lifecycle events and configurations. The RTM_NEWLINK message announces or configures network interfaces, using the struct ifinfomsg to set or clear flags such as IFF_UP (set to bring an interface online, cleared to take it offline), often with attributes such as IFLA_IFNAME for the device name or IFLA_MTU for the maximum transmission unit. Address assignment occurs via RTM_NEWADDR, where struct ifaddrmsg defines the family (e.g., AF_INET) and the interface index, and attributes like IFA_ADDRESS provide the address to bind to that interface. These mechanisms allow precise control over interface states and configurations.

The iproute2 utility suite serves as the primary user-space frontend for these Netlink interactions, with commands like ip route for managing routes and ip link for interface operations.
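As a rough illustration of the RTM_NEWROUTE request described above, the following sketch fills in a struct rtmsg and appends RTA_DST and RTA_GATEWAY attributes for an IPv4 route. The struct route_req layout and the helpers add_attr and build_ipv4_route_add are hypothetical names introduced here, and the flag, scope, and protocol values are plausible assumptions rather than a statement of what any particular tool sends.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Request layout: netlink header, rtmsg header, then room for attributes. */
struct route_req {
    struct nlmsghdr nh;
    struct rtmsg    rt;
    char            attrs[64];
};

/* Hypothetical helper: append one rtattr and grow nlmsg_len accordingly. */
static int add_attr(struct route_req *req, unsigned short type,
                    const void *data, unsigned short len)
{
    assert(req != NULL && data != NULL);
    size_t off = NLMSG_ALIGN(req->nh.nlmsg_len);
    size_t rta_len = RTA_LENGTH(len);
    if (off + RTA_ALIGN(rta_len) > sizeof(*req))
        return -1;                          /* attribute would not fit */

    struct rtattr *rta = (struct rtattr *)((char *)req + off);
    rta->rta_type = type;
    rta->rta_len  = (unsigned short)rta_len;
    memcpy(RTA_DATA(rta), data, len);
    req->nh.nlmsg_len = (uint32_t)(off + RTA_ALIGN(rta_len));
    return 0;
}

/* Hypothetical helper: build "add IPv4 route dst/prefix_len via gw". */
static int build_ipv4_route_add(struct route_req *req, const char *dst,
                                unsigned char prefix_len, const char *gw,
                                uint32_t seq)
{
    struct in_addr dst_addr, gw_addr;
    if (inet_pton(AF_INET, dst, &dst_addr) != 1 ||
        inet_pton(AF_INET, gw, &gw_addr) != 1)
        return -1;

    memset(req, 0, sizeof(*req));
    req->nh.nlmsg_len   = NLMSG_LENGTH(sizeof(struct rtmsg));
    req->nh.nlmsg_type  = RTM_NEWROUTE;
    /* Assumed flags: create the route, fail if it exists, ask for an ACK. */
    req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL | NLM_F_ACK;
    req->nh.nlmsg_seq   = seq;

    req->rt.rtm_family   = AF_INET;
    req->rt.rtm_dst_len  = prefix_len;
    req->rt.rtm_table    = RT_TABLE_MAIN;
    req->rt.rtm_protocol = RTPROT_BOOT;       /* assumed origin marker */
    req->rt.rtm_scope    = RT_SCOPE_UNIVERSE;
    req->rt.rtm_type     = RTN_UNICAST;

    if (add_attr(req, RTA_DST, &dst_addr, sizeof(dst_addr)) < 0 ||
        add_attr(req, RTA_GATEWAY, &gw_addr, sizeof(gw_addr)) < 0)
        return -1;
    return 0;
}
```

The finished request would be sent with send() or sendmsg() on a NETLINK_ROUTE socket, passing req.nh.nlmsg_len as the length; modifying the routing table normally requires the CAP_NET_ADMIN capability.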
With iproute2, for example, ip route add 192.168.1.0/24 via 10.0.0.1 translates to an RTM_NEWROUTE message with the appropriate gateway attribute, while ip link set eth0 up issues an RTM_NEWLINK to toggle the flags. Similarly, querying the ARP table involves sending an RTM_GETNEIGH message with struct ndmsg specifying the family (e.g., AF_INET for IPv4 entries), prompting the kernel to dump neighbor entries including link-layer addresses via attributes like NDA_LLADDR.

Kernel notifications for link changes, such as interface up/down events, are delivered asynchronously via multicast groups in NETLINK_ROUTE, where user-space processes join groups like RTNLGRP_LINK (1) using setsockopt() with NETLINK_ADD_MEMBERSHIP to receive RTM_NEWLINK or RTM_DELLINK messages.

Netlink's design supports efficient handling of large routing tables through a dump-and-acknowledge mechanism, where requests flagged with NLM_F_DUMP (which combines NLM_F_ROOT for full table dumps and NLM_F_MATCH for filtering) elicit multi-part responses marked with NLM_F_MULTI and ending in NLMSG_DONE. User-space applications can additionally request acknowledgments via NLM_F_ACK to confirm that a modifying request was processed. Retrieving an extensive dataset such as a full route dump in one batched exchange avoids the overhead of individual queries and keeps the approach scalable in high-volume networking environments.

Other Kernel-User Interactions

Netlink extends beyond core networking to facilitate kernel-user interactions in device management and security contexts, leveraging dedicated protocol families for event notifications and policy management.

In device management, the NETLINK_KOBJECT_UEVENT family, introduced in Linux 2.6.10, enables the kernel to broadcast hotplug events to userspace, such as USB insertions or removals, allowing dynamic handling of hardware changes. Userspace tools subscribe to this multicast socket to receive uevent messages containing attributes like the action type (e.g., "add" or "remove") and subsystem details. For instance, the udev daemon subscribes to NETLINK_KOBJECT_UEVENT via a raw Netlink socket, processing these events to create or remove /dev entries and trigger associated rules for initialization. This mechanism replaced earlier hotplug handlers, providing a scalable IPC path for managing diverse peripherals without polling.

For security applications, Netlink supports interactions with the packet filtering, auditing, and SELinux subsystems. The NETLINK_NETFILTER family, available since Linux 2.6.14, allows userspace tools to configure and query netfilter rules, including updates to nf_tables chains for packet filtering. The nft command-line utility compiles rules into Netlink messages using libraries like libnftnl, transmitting them to the kernel for enforcement, while the kernel responds with status via extended ACKs. Similarly, NETLINK_AUDIT, introduced in Linux 2.6.6, delivers audit logs from the kernel to userspace daemons like auditd, which opens a NETLINK_AUDIT socket to receive records of security-relevant events such as file accesses or syscall invocations. NETLINK_SELINUX, since Linux 2.6.4, notifies userspace of SELinux policy reloads and enforcing-mode changes, enabling tools to invalidate cached access decisions when the policy changes.

Additional uses include monitoring and diagnostics, where Generic Netlink enables custom protocols for tools like ethtool. The ethtool utility employs a Generic Netlink family named "ethtool" to query and set network interface parameters, such as link modes or offload features, bypassing ioctl() for more flexible attribute-based exchanges. In emerging cloud-native environments of the 2020s, Netlink underpins container orchestration plugins, including those for device passthrough in Kubernetes, extending its role in dynamic resource management across virtualized workloads.
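To make the uevent subscription described above more concrete, here is a small sketch. It assumes the format the kernel emits on multicast group 1 of NETLINK_KOBJECT_UEVENT: a summary string of the form action@devpath followed by NUL-separated KEY=VALUE strings, with no struct nlmsghdr in front. The helper names open_uevent_socket and uevent_get are hypothetical, and messages rebroadcast by a device manager on other groups may be framed differently.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Hypothetical helper: open a uevent socket subscribed to kernel group 1. */
static int open_uevent_socket(void)
{
    int fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
    if (fd < 0)
        return -1;

    struct sockaddr_nl sa = { .nl_family = AF_NETLINK, .nl_groups = 1 };
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: look up key in a uevent payload of len bytes.
 * Returns a pointer to the NUL-terminated value inside buf, or NULL.
 */
static const char *uevent_get(const char *buf, size_t len, const char *key)
{
    assert(buf != NULL && key != NULL);
    size_t klen = strlen(key);
    const char *p = buf;
    const char *end = buf + len;

    while (p < end) {
        const char *nul = memchr(p, '\0', (size_t)(end - p));
        if (nul == NULL)
            break;                       /* unterminated tail: ignore it */
        if ((size_t)(nul - p) > klen &&
            memcmp(p, key, klen) == 0 && p[klen] == '=')
            return p + klen + 1;
        p = nul + 1;                     /* advance to the next string */
    }
    return NULL;
}
```

After a recv() on the socket, a caller could use uevent_get(buf, n, "ACTION") and uevent_get(buf, n, "SUBSYSTEM") to decide how to react to the event.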
