kqueue
kqueue is a scalable event notification interface in Unix-like operating systems that enables applications to efficiently monitor kernel events across diverse sources, including file descriptor I/O, signals, timers, asynchronous I/O completions, process state changes, and filesystem modifications, using a unified, filter-based mechanism to register interests and retrieve notifications without the performance limitations of earlier interfaces like select() or poll().[1] Introduced in FreeBSD 4.1 in 2000 by developer Jonathan Lemon, it provides a generic API centered on two primary system calls—kqueue(), which creates an event queue returning a file descriptor, and kevent(), which atomically adds, deletes, or fetches events via structured change and event lists—allowing for level-triggered notifications and extensibility to new event types through kernel filters without altering the user-space interface.[1][2]
Designed to address the scalability issues of traditional polling methods, which require scanning entire descriptor sets and incur high overhead with growing numbers of monitored objects, kqueue maintains event state in the kernel, enabling constant-time operations for inactive items and reducing system call frequency, thus supporting high-performance applications like web servers handling thousands of concurrent connections.[1] Key innovations include support for multiple event filters (e.g., EVFILT_READ for readable data, EVFILT_VNODE for file attribute changes, EVFILT_PROC for process forks/exits), user-defined events, and detailed event data such as pending byte counts, making it suitable for both network and system monitoring tasks.[2]
Originally implemented in FreeBSD, kqueue has been adopted across various BSD derivatives and related systems, including NetBSD, OpenBSD, DragonFly BSD, and macOS (based on Darwin), where it serves as a core component for asynchronous programming and event-driven architectures.[1][3][4][5]
Introduction
Definition and Purpose
kqueue is a kernel-level facility in Unix-like operating systems that provides a scalable mechanism for monitoring multiple sources for various events, including I/O readiness on file descriptors, signals, timers, and filesystem changes.[1] It operates by allowing applications to register interest in these events through kernel-maintained data structures known as knotes, which aggregate notifications efficiently without the per-process limitations common in earlier systems.[6] Introduced in FreeBSD 4.1, kqueue serves as a generic event delivery system, enabling the kernel to notify user-space processes of conditions based on customizable filters.[1]
The primary purpose of kqueue is to address the scalability challenges faced by high-performance applications, particularly network servers handling thousands of connections, which older interfaces like select() and poll() could not manage efficiently due to their linear scanning of file descriptors.[1] By the late 1990s and early 2000s, the growth of internet applications demanded a more robust solution for multiplexing diverse event types without performance degradation, as traditional mechanisms imposed significant overhead in terms of CPU cycles and system call frequency.[1] kqueue overcomes these limitations by supporting arbitrary event registrations across processes and delivering notifications in a way that scales to large numbers of monitored objects, making it ideal for event-driven programming in servers and other I/O-intensive workloads.[6]
In its basic workflow, an application first creates an event queue in the kernel to serve as a notification channel.[1] It then registers knotes for specific events on file descriptors or other sources, associating user-defined data with each for context.[1] Finally, the application polls the queue using a single system call to retrieve any pending events, allowing efficient batch processing of changes without repeated invocations for each monitored item.[6] This design minimizes kernel-user space transitions and supports high-throughput scenarios by enabling the kernel to buffer and coalesce events internally.[1]
Key Features and Benefits
kqueue offers significant scalability by supporting the monitoring of thousands of file descriptors and events without the linear time complexity limitations inherent in older mechanisms like select() or poll(), allowing applications to handle large numbers of concurrent connections efficiently up to system-imposed limits such as the per-user RLIMIT_KQUEUES resource limit.[1][2] This design enables robust performance in environments with high event volumes, where traditional polling would incur prohibitive CPU and memory overhead.
A core strength of kqueue lies in its broad support for diverse event types, extending far beyond basic I/O readiness to include file status changes via the EVFILT_VNODE filter, signal notifications with EVFILT_SIGNAL, process lifecycle events through EVFILT_PROC, asynchronous I/O completion using EVFILT_AIO, and even user-defined events via the extensible EVFILT_USER filter.[1][2] This versatility allows developers to unify monitoring of multiple system conditions within a single interface, simplifying application logic for complex, event-driven architectures.
kqueue achieves high efficiency through O(1) time complexity for event registration, deletion, and notification operations, which minimizes CPU cycles compared to the repeated scanning required by polling-based systems.[1] It further provides fine-grained control via level-triggered notifications by default, edge-triggered behavior using the EV_CLEAR flag to reset the event after retrieval, and one-time notifications via EV_ONESHOT to disable the event after delivery, enabling precise handling of event persistence.[2] These attributes collectively reduce system call overhead and enhance responsiveness in non-blocking I/O scenarios.
The benefits of kqueue are particularly pronounced in high-concurrency applications, such as web servers and databases, where it facilitates improved throughput by efficiently dispatching events without blocking threads, as demonstrated in benchmarks showing superior performance over select() and poll() under load.[1][7] This has contributed to its adoption in systems like macOS for scalable networking tasks.[1]
History and Development
kqueue was first implemented in the FreeBSD operating system as part of version 4.1, which was released on July 27, 2000.[8] The facility was primarily developed by Jonathan Lemon, a contributor to the FreeBSD project, who authored the initial system calls and documentation.[1] Lemon's work focused on enhancing the kernel's ability to handle event notifications efficiently, particularly for applications requiring high scalability in network and I/O operations.[1]
The development of kqueue was motivated by the inherent limitations of earlier event notification mechanisms like select() and poll(), which exhibited poor scalability as the number of file descriptors increased, due to their O(N) time complexity and requirements for repeated traversals of descriptor lists.[1] These systems also involved redundant memory copies between user and kernel space, making them inefficient for modern network applications that needed to monitor thousands of connections simultaneously.[1] Additionally, select() and poll() were restricted primarily to file descriptor-based I/O events and lacked native support for other sources such as signals, asynchronous I/O completions, or filesystem changes, prompting the need for a more versatile solution within the FreeBSD kernel.[1]
Key design goals for kqueue included providing a unified, scalable interface capable of handling diverse event sources without the performance bottlenecks of prior methods, while ensuring simplicity for adoption and reliability through level-triggered notifications that avoided silent failures.[1] This approach was inspired by earlier research on event notification, such as the "get next event" API proposed by Banga, Mogul, and Druschel, but was specifically optimized for integration into the BSD kernel architecture.[1]
In its early releases, kqueue was integrated directly into the FreeBSD kernel via the kqueue() and kevent() system calls, with initial support centered on basic I/O filters such as EVFILT_READ and EVFILT_WRITE for monitoring sockets and files.[1] This foundational implementation allowed applications to register interest in events efficiently and retrieve them in batches, marking a significant advancement in FreeBSD's support for scalable event-driven programming.[1]
Adoption Across Unix-like Systems
Following its initial introduction in FreeBSD 4.1, kqueue saw continued enhancements in subsequent FreeBSD releases, including the addition of timer support via the EVFILT_TIMER filter in the 5.x series around 2003–2004, enabling periodic or one-shot notifications for time-based events.[9] These improvements built on the core scalability features, allowing broader application in high-performance scenarios without altering the fundamental event queue mechanism.
kqueue was adopted into Apple's Darwin kernel, forming the basis for event notification in macOS and iOS; it has been available since Mac OS X 10.3 Panther, released in 2003, providing scalable I/O handling that underpinned later frameworks like Grand Central Dispatch, introduced in Mac OS X 10.6 Snow Leopard. This integration has persisted, with kqueue remaining central to modern macOS and iOS for efficient monitoring of file, socket, and process events across Apple's ecosystem.[10]
Other BSD variants incorporated kqueue shortly after its FreeBSD origins, with OpenBSD adding support in version 2.9 released in June 2001, evolving through version 3.4 in 2003 with minor variations such as adjusted filter flags for vnode events to align with its security-focused kernel policies.[4] NetBSD integrated it in version 2.0 released in December 2004, supporting events for sockets, files, processes, and signals through a port of the original implementation.[11] DragonFly BSD, which forked from FreeBSD 4.8-STABLE in 2003, has included kqueue since its first release in July 2004.[12]
Linux lacks native kqueue support. Compatibility is available only through user-space layers such as libkqueue, while portable libraries such as libevent abstract over whichever native mechanism the platform provides. kqueue's design nonetheless influenced the development of epoll in the Linux kernel starting with version 2.5 in 2002, which shares its goals of efficient, scalable event multiplexing for file descriptors. As of 2025, kqueue remains stable across supporting systems, with ongoing kernel refinements for security, such as mitigations for race conditions in event handling addressed in FreeBSD 14.0 released in November 2023 and subsequent advisories.[13][14]
Architecture
Event Queues and Knotes
In kqueue, the event queue serves as a per-process kernel object that functions as a notification channel for monitoring various system events. It is created through the kqueue system call, which returns a file descriptor representing the queue; this descriptor is not inherited by child processes upon forking. The queue maintains registered events and delivers notifications when those events occur, enabling scalable event handling without the limitations of earlier mechanisms like select or poll.[1][2]
At the core of the event queue are knotes, which are internal kernel data structures that encapsulate individual event registrations. Each knote links the monitored data structure (such as a file descriptor or signal), the associated filter for evaluating activity, the parent kqueue, and connections to related knotes. Knotes are dynamically allocated and stored within the queue's substructures, including an active list for ready events, a hash table for non-descriptor identifiers, and an array indexed by descriptors. This design allows knotes to represent a wide range of event sources efficiently.[1]
The registration process for events involves applications submitting a changelist of kevent structures via the kevent system call to add, change, or delete knotes. Upon an EV_ADD flag, the kernel's register function allocates a new knote using the (ident, filter) tuple as a unique key and invokes the filter's attach routine to associate it with the target object. Changes or deletions similarly update or detach existing knotes, ensuring the queue reflects the application's current interests. Various filters, such as those for file descriptors or signals, define the specific conditions monitored by each knote.[1][2]
When an event triggers activity on a monitored object, the kernel's notification flow begins with a call to the knote function, which evaluates the relevant filters on the object's knote list. Qualifying knotes are marked active and moved to the queue's ready list. The kqueue_scan function then dequeues these knotes, revalidates them against their filters, and copies the event details to the application's event list for retrieval via kevent. This process ensures timely and accurate delivery of events while minimizing unnecessary wakeups.[1]
Event queues and knotes incorporate limits and management mechanisms to prevent resource exhaustion. Per-user limits on the number of kqueues are enforced via the RLIMIT_KQUEUES resource limit, while system-wide constraints apply, such as the kern.kq_calloutmax sysctl for the maximum number of timers. The queue's internal structures, including the active list, hash table, and descriptor array, expand dynamically without fixed upper bounds, though overall file descriptor limits (kern.maxfiles) indirectly cap usage. Knotes are automatically freed upon queue closure or object detachment, facilitating efficient cleanup. For custom event handling, such as user-defined notifications, the EVFILT_USER filter allows applications to inject events directly into the queue, aiding in scenarios like inter-thread signaling.[1][2][15]
Filters and Event Types
kqueue supports a variety of filters to monitor different types of events across file descriptors, processes, signals, timers, and user-defined conditions, each selected by the filter field in a struct kevent. These filters allow applications to register interest in specific occurrences, such as data availability or process state changes, using shared parameters like fflags for filter-specific flags, data for numeric values (e.g., byte counts or statuses), and udata as an opaque user pointer for associating custom data.[16]
The EVFILT_READ filter detects when input is available on a descriptor, such as sockets or pipes, triggering when data exceeds the low-water mark (customizable via NOTE_LOWAT in fflags) or upon EOF conditions like shutdowns, with data reporting bytes available or offset to EOF.[16] The EVFILT_WRITE filter monitors output readiness, activating when buffer space is available for writing, with data indicating remaining space in buffers for sockets, pipes, or fifos, and setting EV_EOF if the reader disconnects.[16] For asynchronous I/O, EVFILT_AIO watches completion of aio requests, using data to convey error status from aio_error(), tied to sigevent conditions without additional fflags.[16]
File and directory changes are handled by EVFILT_VNODE, which supports fflags like NOTE_DELETE for deletions, NOTE_EXTEND for size increases, NOTE_RENAME for renames, and others for attributes, opens, closes, links, reads, writes, or revocations, returning triggered events in fflags upon notification.[16] Process-related events use EVFILT_PROC, monitoring forks (NOTE_FORK), executions (NOTE_EXEC), and exits (NOTE_EXIT with exit status in data). Setting NOTE_TRACK follows the process across fork(): the child is registered automatically and its event arrives with NOTE_CHILD set and the parent PID in data, while NOTE_TRACKERR is returned when the kernel cannot attach an event to the child.[16] The EVFILT_SIGNAL filter captures signal deliveries, counting occurrences in data and automatically applying edge-triggering semantics.[16]
Timers are managed via EVFILT_TIMER, where data specifies the interval in milliseconds (or other units via fflags like NOTE_SECONDS or NOTE_NSECONDS), supporting periodic firing unless modified by EV_ONESHOT or absolute time (NOTE_ABSTIME).[16] For user-defined events, EVFILT_USER enables posting custom notifications from user space by setting NOTE_TRIGGER in fflags. The lower 24 bits of fflags are available for application-defined flags, and control operations such as NOTE_FFNOP, NOTE_FFAND, NOTE_FFOR, and NOTE_FFCOPY determine how newly supplied flag bits are combined with the stored ones.[16]
Event notification operates in level-triggered mode by default, continuously reporting the current state (e.g., data remains available until read), but can switch to edge-triggered mode via the EV_CLEAR flag, which deasserts the event after one retrieval to notify only on state transitions.[16] This distinction is particularly useful for filters like signals or timers, where EV_CLEAR prevents repeated notifications without changes.[16] Edge cases include zero-length reads on EVFILT_READ, which may still trigger if underlying data is present but not fully consumed, and automatic removal of knotes associated with closed descriptors to avoid dangling events.[16]
API
Core Functions
The primary interface to kqueue consists of a small set of system calls that enable the creation, management, and monitoring of event queues in kernel space.[2] The kqueue() system call is used to create a new kernel event queue, returning a small integer file descriptor representing the queue upon success; its prototype is int kqueue(void);, and it returns -1 on failure, with errno set accordingly.[2] This descriptor serves as the handle for all subsequent operations on the queue, allowing applications to monitor various events such as file descriptor readiness or process state changes.[2]
The core manipulation of events is handled by the kevent() system call, which multiplexes the tasks of adding, modifying, deleting events on the queue, and retrieving pending events that are ready for processing.[2] Its prototype is int kevent(int kq, const struct kevent *changelist, int nchanges, struct kevent *eventlist, int nevents, const struct timespec *timeout);, where kq is the queue descriptor, changelist and nchanges specify events to register or modify, eventlist and nevents receive returned events, and timeout controls the blocking duration (NULL for indefinite wait).[2] The call returns the number of events placed in eventlist or -1 on error.[2]
Central to these operations is the struct kevent data structure, which encapsulates event details for both input changes and output events.[2] It includes fields such as ident (an identifier like a file descriptor), filter (the type of event filter), flags (action indicators like EV_ADD for addition or EV_DELETE for removal), fflags (filter-specific flags), data (filter-specific data, such as bytes available), and udata (opaque user data).[2] The EV_SET macro is typically used to populate this structure before passing it to kevent().[2] Event flags, such as those for adding or deleting knotes, are detailed in subsequent sections.[2]
To destroy a kqueue, applications close the file descriptor using the standard close() system call, which deallocates the queue and removes all associated events referencing open file descriptors.[2]
Error handling in these functions follows POSIX conventions, with common errno values including EBADF for an invalid queue descriptor, EINTR for interruption by a signal, ENOMEM for insufficient kernel memory (in kqueue()), EMFILE for exceeding per-process file descriptor limits, and EINVAL for invalid arguments like negative list lengths (in kevent()).[2] If kevent() encounters an error during event processing, it may set the EV_ERROR flag in the returned kevent structure, with the specific error code in the data field.[2]
Event Operations and Flags
In the kqueue system, event operations are primarily managed through flags passed to the kevent() function, which allows for adding, deleting, modifying, and retrieving events associated with kernel notes (knotes). These flags control the lifecycle and behavior of events within the queue, enabling efficient monitoring of various I/O and system conditions. The core kevent() prototype, int kevent(int kq, const struct kevent *changelist, int nchanges, struct kevent *eventlist, int nevents, const struct timespec *timeout);, facilitates these operations by processing changes from the changelist and returning triggered events in the eventlist.[17]
The add operation registers a new event using the EV_ADD flag, which appends the knote to the kqueue and implicitly enables it unless modified otherwise; combining EV_ADD with EV_ENABLE explicitly permits the event to be returned by kevent() when triggered. This mechanism supports initial setup for monitoring file descriptors, signals, or other filters without duplicating existing entries, as re-adding modifies the parameters of any matching knote.[17]
Deletion removes events via the EV_DELETE flag, which detaches the knote from the kqueue, with automatic cleanup occurring on the last close of associated file descriptors; alternatively, EV_DISABLE suspends reporting of the event without removing it, allowing later reactivation. These operations ensure precise control over active monitoring, preventing unnecessary notifications during temporary pauses.[17]
Modifying an existing knote involves reusing EV_ADD to update its parameters, often combined with flags like EV_ONESHOT for one-time notification—where the event is deleted after retrieval—or EV_CLEAR to reset its state post-retrieval, which is particularly useful for state-transition filters. Such changes enable dynamic adjustments to event sensitivity without full recreation.[17]
Upon retrieval, flags in the returned struct kevent indicate event status: EV_EOF signals a filter-specific end-of-file condition, while EV_ERROR denotes an error with the errno value stored in the data field, allowing applications to handle failures gracefully. These retrieval indicators provide context for processing without additional system calls.[17]
Filter-specific flags further refine behavior; for instance, in the EVFILT_READ filter, NOTE_LOWAT sets a low-water mark threshold in the data field, triggering notifications only when sufficient data is available, overriding the default to optimize for buffered I/O scenarios.[17]
Timeout handling in kevent() uses the timespec parameter to differentiate blocking from non-blocking polls: a non-NULL timespec specifies a maximum wait interval, with a zero-valued structure effecting an instantaneous poll, whereas NULL enables indefinite blocking until an event occurs. This flexibility supports both responsive and efficient long-running applications.[17]
Usage and Implementations
Basic Programming Examples
To illustrate basic usage of kqueue for event notification, consider a simple C program that monitors standard input (file descriptor 0) for readability using the EVFILT_READ filter.[1] This example creates a kqueue, registers the event, polls indefinitely for occurrences, and handles basic errors by checking return values from system calls.[18]
The following code snippet demonstrates this setup:
```c
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* strerror() */
#include <unistd.h>

int main(void) {
    int kq, nev;
    struct kevent event;
    struct kevent eventlist[1];
    int fd = STDIN_FILENO; /* File descriptor to monitor (stdin) */

    /* Create the kqueue */
    kq = kqueue();
    if (kq == -1) {
        perror("kqueue");
        exit(EXIT_FAILURE);
    }

    /* Set up the event for readability on the file descriptor */
    EV_SET(&event, fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
    if (kevent(kq, &event, 1, NULL, 0, NULL) == -1) {
        perror("kevent (add)");
        close(kq);
        exit(EXIT_FAILURE);
    }

    /* Poll for events with infinite timeout */
    while ((nev = kevent(kq, NULL, 0, eventlist, 1, NULL)) != -1) {
        if (nev > 0) {
            /* Process the event */
            if (eventlist[0].flags & EV_ERROR) {
                fprintf(stderr, "Error on event: %s\n",
                        strerror((int)eventlist[0].data));
                break;
            }
            if (eventlist[0].filter == EVFILT_READ) {
                printf("Data available for reading on fd %d\n",
                       (int)eventlist[0].ident);
                fflush(stdout); /* keep stdio output ordered before write() */
                /* In a real application, read the data here */
                char buf[1024];
                ssize_t n = read(fd, buf, sizeof(buf));
                if (n > 0) {
                    write(STDOUT_FILENO, buf, (size_t)n);
                } else if (n == 0) {
                    printf("EOF on fd %d\n", fd);
                    break;
                } else {
                    perror("read");
                    break;
                }
            }
        }
    }
    if (nev == -1) {
        perror("kevent (poll)");
    }

    /* Cleanup */
    close(kq);
    return EXIT_SUCCESS;
}
```
This program initializes a kqueue descriptor with kqueue(), which returns -1 on failure, prompting error reporting via perror() and program exit.[18] It then registers a kevent structure using EV_SET() to monitor the specified file descriptor for read events, invoking kevent() with the changelist to add the event; a return value of -1 indicates failure, handled similarly with perror().[1] The main loop calls kevent() with no changelist and an eventlist of size 1, using a NULL timeout for blocking indefinitely until an event occurs or an error arises.[18] Upon receiving events (nev > 0), it checks for EV_ERROR flags to report kernel errors via strerror() on the event's data field, and processes the read event by echoing input data or detecting EOF.[19] Finally, the kqueue descriptor is closed with close() to release resources.[1]
To compile this program, include the <sys/event.h> header for kqueue structures and functions; no special linking is typically required as these are standard system calls available in the C library.[18] For instance, use cc example.c -o example on a supported system like FreeBSD.[19]
kqueue received full support in FreeBSD starting with version 4.1, released in 2000, where it was introduced as a scalable event notification mechanism.[1] The implementation allows tuning via sysctls such as kern.kq_calloutmax, which sets the system-wide limit on the number of timers that can be registered across all kqueues.[2] Additionally, the RLIMIT_KQUEUES resource limit controls the maximum number of kqueues per user, helping manage resource usage in multi-process environments.[2] In FreeBSD 14.0 (released November 2023), timerfd(2) was added for Linux compatibility, with kqueue's EVFILT_TIMER recommended for native use, alongside enhanced process visibility controls that support secure kqueue monitoring in jailed environments.[13]
On macOS, kqueue has been available since Mac OS X version 10.3 (Panther) in 2003, providing compatibility with BSD-derived systems.[20] It integrates deeply with libdispatch, Apple's Grand Central Dispatch framework, which uses kqueue under the hood to handle asynchronous event sources like file descriptors and timers, enabling higher-level concurrent programming without direct kevent calls.[21]
NetBSD adopted kqueue with version 2.0 in 2004, offering a similar API to FreeBSD for monitoring events on files, processes, and signals.[3] OpenBSD included support starting from version 2.9 in 2001, maintaining API compatibility but with security-focused restrictions on certain filters, such as limited asynchronous I/O monitoring (available only on UFS file systems).[4] In OpenBSD, EVFILT_USER for user-defined events was added in 2025.[22]
User-space libraries abstract kqueue for cross-platform development on BSD and macOS systems. Libevent provides a wrapper that selects kqueue as the backend on supported platforms, offering a unified API for event handling across operating systems.[23] Similarly, libev uses kqueue when available on BSD derivatives, prioritizing it for efficient polling in event-driven applications. Node.js, through its libuv library, employs kqueue as the backend for asynchronous I/O on macOS and BSD, facilitating JavaScript-based server-side event loops.
Portability challenges arise on non-supporting systems like Linux, where libraries such as libevent and libev fall back to epoll for equivalent functionality. As of 2025, no major deprecations of kqueue have occurred across supported platforms.[13]
Comparisons and Alternatives
With Traditional Mechanisms (select/poll)
The select() system call, a traditional mechanism for I/O multiplexing in Unix-like systems, suffers from significant limitations that hinder its scalability. It relies on fixed-size bitmasks (fd_set) to track file descriptors, with the maximum number typically capped at FD_SETSIZE, often 1024, beyond which monitoring fails without recompiling with a larger value.[24] Additionally, select() exhibits O(n) time complexity, where n is the number of file descriptors, as the kernel scans the entire list on each invocation to check for readiness, leading to inefficiencies with growing descriptor counts.[1]
Introduced to address some of select()'s shortcomings, the poll() system call uses a dynamic array of pollfd structures, eliminating the fixed FD_SETSIZE limit and allowing monitoring of arbitrarily many descriptors. However, poll() retains the O(n) scanning overhead, requiring the full list of descriptors to be resubmitted with every call, which involves repeated user-kernel memory copies and kernel-side traversal, making it similarly unscalable for high-descriptor workloads.[1]
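The resubmission and scanning cost described above is visible even in a small poll() program. The sketch below is illustrative only: the helper name `poll_find_ready` and the constant `NPIPES` are hypothetical, and a real server would hold far more descriptors.

```c
#include <poll.h>
#include <unistd.h>

#define NPIPES 8   /* illustrative size; a busy server may hold thousands */

/* Hypothetical helper: creates NPIPES pipes, makes exactly one readable, and
 * returns the index that poll() reports as ready (or -1 on failure). */
int poll_find_ready(int writer_index)
{
    int fds[NPIPES][2];
    struct pollfd pfds[NPIPES];
    int created = 0;
    int ready_index = -1;

    for (int i = 0; i < NPIPES; i++) {
        if (pipe(fds[i]) == -1)
            break;
        created++;
        pfds[i].fd = fds[i][0];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    if (created == NPIPES && writer_index >= 0 && writer_index < NPIPES &&
        write(fds[writer_index][1], "x", 1) == 1) {
        /* The whole array is handed to the kernel on every call ... */
        int n = poll(pfds, NPIPES, 0);
        /* ... and the caller must then scan it to find the ready entries. */
        for (int i = 0; n > 0 && i < NPIPES; i++) {
            if (pfds[i].revents & POLLIN) {
                ready_index = i;
                break;
            }
        }
    }

    for (int i = 0; i < created; i++) {
        close(fds[i][0]);
        close(fds[i][1]);
    }
    return ready_index;
}
```

Both the kernel-side check and the user-side loop touch every entry even though only one descriptor has data, which is the per-call cost that grows with the number of monitored descriptors.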
In contrast, kqueue overcomes these issues through its stateful design, featuring persistent registrations of file descriptors (known as knotes) that remain in the kernel without resubmission, enabling O(1) event retrieval for ready descriptors only.[1] This avoids the O(n) scanning and memory copy penalties of select() and poll(), imposes no artificial limit on descriptor count beyond system resources like memory, and supports efficient handling of thousands of concurrent events.[1]
Performance evaluations underscore kqueue's advantages in server environments. In benchmarks using the thttpd web server with 500 requests per second and up to 10,000 idle connections, kqueue maintained low response times and CPU usage, while poll() failed to scale beyond approximately 600 idle connections due to excessive overhead.[1] Similarly, in a web proxy cache test with 100 active connections and up to 4,000 cold connections, kqueue exhibited constant CPU utilization, whereas select() saturated around 2,000 connections.[1] These results demonstrate kqueue's ability to manage roughly 10 times more connections with substantially less CPU expenditure compared to traditional mechanisms.
For use cases, select() and poll() remain suitable for small-scale applications or legacy code handling fewer than a few hundred file descriptors, where their simplicity outweighs scalability concerns.[1] Kqueue, however, is preferred for high-performance servers managing thousands of clients, such as web proxies or network daemons, to achieve efficient resource utilization and avoid bottlenecks.[1]
With Linux epoll
kqueue and Linux's epoll both offer scalable mechanisms for monitoring file descriptors and delivering I/O event notifications, addressing the limitations of earlier interfaces like select and poll by avoiding O(n) scanning of file descriptor sets.[25][26] Both support level-triggered (default) and edge-triggered modes, where level-triggering notifies as long as the condition persists and edge-triggering signals only on state changes, enabling efficient handling of high-volume connections without redundant wakeups.[25][26] epoll was introduced in Linux kernel 2.5.44 in 2002, building on concepts similar to kqueue, which debuted in FreeBSD 4.1 in 2000.[25][1]
Despite these parallels, the APIs diverge significantly, posing challenges for cross-platform code portability. epoll employs a split model with epoll_ctl() for adding, modifying, or deleting interests on file descriptors and epoll_wait() for retrieving events, requiring multiple system calls for common operations like batch updates.[25] In contrast, kqueue unifies these actions in the kevent() call, allowing simultaneous registration, modification, deletion, and polling of events in a single invocation, which reduces overhead for dynamic workloads.[26] Additionally, epoll is confined to file descriptor-based I/O events and lacks native support for timers or signals, necessitating auxiliary mechanisms like timerfd or signalfd for such notifications.[25] kqueue, however, provides dedicated filters (EVFILT_TIMER for timers and EVFILT_SIGNAL for signals), enabling a more comprehensive event model without extra file descriptors.[26]
Event mappings exist between the two, such as kqueue's EVFILT_READ approximating epoll's EPOLLIN for readability and EVFILT_WRITE to EPOLLOUT for writability, but nuances arise in user data handling.[26][25] kqueue associates udata—a user-defined pointer or value—with each knote (event filter instance), offering finer-grained flexibility for per-event customization across diverse filter types.[26] epoll's epoll_data_t, while similarly user-definable, is tied to the file descriptor in the interest set, limiting its granularity to I/O contexts.[25]
Performance benchmarks show the two interfaces as largely comparable for core I/O multiplexing tasks, with throughput scaling similarly under high connection loads on their respective platforms.[27] However, kqueue demonstrates advantages in scenarios involving non-I/O events like timers or signals, owing to its unified API that minimizes system call volume.[27] Cross-platform libraries such as libevent mitigate portability issues by abstracting both backends, selecting epoll on Linux and kqueue on BSD/macOS systems to provide a consistent interface.[23]
On Linux, io_uring represents a more recent advancement introduced in kernel 5.1 (2019), offering a completion-based asynchronous I/O interface that further reduces system calls through submission and completion queues, supporting multishot operations for efficient event batching as of 2025.[28] While io_uring builds on epoll concepts for scalability, it provides broader support for file, network, and device I/O without traditional polling limitations, positioning it as an alternative to both epoll and kqueue in high-performance applications. Kqueue lacks a direct equivalent in BSD/macOS but continues to be optimized for similar use cases.
In terms of adoption, epoll dominates on Linux-based servers due to its integration in the kernel and widespread use in high-performance networking stacks.[25] kqueue prevails in BSD derivatives (FreeBSD, NetBSD, OpenBSD, DragonFly BSD) and Apple's macOS ecosystem, powering event-driven applications in those environments.[26][1] As of 2025, no notable efforts toward API convergence or cross-implementation have emerged, leaving developers reliant on abstraction layers for multi-platform support.[28]