Keepalive
A keepalive (KA) is a mechanism in computer networking used to detect whether a connection between two endpoints remains active during periods of inactivity, by sending periodic probes or messages to prevent premature termination of idle links.[1] This feature helps identify dead connections without relying solely on data transmission, reducing the risk of resource waste from undetected failures.[2] In the Transmission Control Protocol (TCP), keepalives are an optional implementation that probes idle connections by transmitting segments with no data (or minimal payload) after a configurable interval, typically defaulting to at least two hours of inactivity.[1] The sending endpoint sets the sequence number to one less than the next expected value, and if no acknowledgment is received after several probes, the connection may be considered failed, though implementers must not assume immediate death upon a single failure.[3] TCP keepalives are configurable per connection, default to disabled to avoid unnecessary overhead, and are particularly useful in scenarios like long-lived sessions where network issues might otherwise go unnoticed.[1] At higher protocol layers, keepalive concepts extend to applications such as HTTP, where persistent connections (also known as HTTP keep-alive) allow a single TCP connection to handle multiple requests and responses, improving efficiency by avoiding repeated handshakes.[4] In HTTP/1.1, persistence is the default unless explicitly closed via the Connection: close header, while HTTP/1.0 required the Connection: keep-alive header to enable it.[4] This reduces latency and bandwidth usage in web communications, though it demands careful management to handle timeouts and proxy interactions.[5] Keepalives appear in other protocols too, such as SIP for signaling support and DNS for EDNS0 options to manage idle timeouts.[6][7]
Overview and Fundamentals
Definition and Purpose
Keepalive is a network mechanism that involves sending periodic signals or probes across an idle connection to verify the responsiveness of the remote endpoint and prevent applications from hanging indefinitely on broken links.[8] These probes, typically small packets with no application data, elicit acknowledgments from the peer to confirm ongoing connectivity without requiring actual data transmission. The core purpose of keepalive is to enable timely detection of connection failures, such as silent peer crashes, network partitions, firewall-induced timeouts, or expiration of NAT bindings, allowing for graceful error handling, prompt resource cleanup, and avoidance of resource exhaustion from stale connections. By proactively checking liveness during periods of inactivity, keepalive mechanisms address scenarios where standard data acknowledgments are absent, ensuring that failures are identified before they lead to prolonged hangs or undetected dead connections.[8] Key benefits include reduced server resource waste by terminating unresponsive connections efficiently and enhanced reliability for long-lived sessions, such as those in SSH for remote access or database connections for persistent queries.[9][10] TCP keepalive provides the foundational implementation of this concept within the TCP protocol.
Historical Development
The concept of keepalive mechanisms emerged in the early development of TCP to manage idle connections in the ARPANET, where long-lived sessions could lead to resource exhaustion if network failures went undetected. Basic ideas for maintaining connection state during idle periods were outlined in RFC 793 (1981), which recommended periodic retransmissions every two minutes for zero-window conditions to ensure reliable reporting of window updates.[11] These foundational concepts addressed the need for probing inactive links without a fully specified keepalive procedure. A dedicated TCP keepalive mechanism was formalized as an optional extension in RFC 1122 (1989), requiring implementations to support configurable probes sent after an idle period (defaulting to at least two hours) to detect broken connections and prevent indefinite resource holds, particularly in server applications.[12] This standardization responded to issues identified in earlier ARPANET and Unix-based implementations, where idle connections consumed kernel resources without automatic cleanup. Influential early adoptions occurred in Unix systems, notably with the introduction of configurable keepalives in Berkeley Software Distribution (BSD) 4.2 in 1983, which highlighted problems like hung sockets in networked applications and prompted broader configurability. Keepalive adoption expanded through POSIX standards in the 1990s, with IEEE Std 1003.1g (developed mid-1990s, published 1998) defining socket options like SO_KEEPALIVE for portable control over idle detection across Unix-like systems.[13] As networking evolved, keepalive concepts shifted from TCP-centric implementations to application-layer adaptations with the proliferation of web protocols in the 1990s, enabling persistent connections in protocols like HTTP to reduce overhead. 
Enhancements for improved idle detection appeared in RFC 5482 (2009), which introduced the TCP User Timeout Option to allow peers to negotiate longer timeouts and better handle intermittent connectivity.[14] Recent specifications in RFC 9293 (2022), updating the core TCP protocol, reaffirmed keepalive as an optional, application-controlled feature with the same probing mechanics and configurability requirements from RFC 1122, ensuring compatibility while emphasizing its role in resource management.[15]
TCP Keepalive
Probe Mechanism
TCP keepalive probes are specialized packets designed to test the viability of an idle TCP connection without altering its state. These probes consist of empty acknowledgment (ACK) packets, carrying no data payload, which ensures that the segment length (SEG.LEN) is set to 0. The ACK flag is set in the TCP header, and the sequence number (SEG.SEQ) is specifically chosen as the sender's next sequence number minus one (SND.NXT - 1), positioning it just outside the current window to elicit a response without advancing the sequence space if the probe is lost.[16] The transmission of a keepalive probe occurs over the existing TCP connection using the same socket established for the original session, targeting the peer endpoint when the connection has been idle—meaning no data or ACK segments have been exchanged—for a prolonged period. Upon receipt, a live receiver processes the probe as a standard ACK and responds with its own ACK segment, acknowledging the probe's sequence number and confirming the connection's health. This response reaffirms the connection state without requiring any application-level intervention.[16] If the receiver has closed or forgotten the connection, it may respond with a reset (RST) segment instead, signaling to the sender that the connection is invalid and prompting an immediate transition to the CLOSED state, often followed by notifying the application. In cases where no response is received to the probe, the absence of acknowledgment triggers the TCP timeout mechanism, which, after subsequent handling, can lead to connection failure detection and termination, potentially via a FIN or RST segment depending on the context. The probe's design ensures it does not interfere with normal data flow, as its sequence number avoids consuming window space or altering ongoing transmissions.[16]
Algorithm Details
The TCP keepalive algorithm operates on idle connections to detect potential failures without disrupting normal data flow. When a TCP connection enters an idle state—defined as no data or acknowledgment packets exchanged for a specified period—the mechanism initiates a series of probes to verify the peer's responsiveness. These probes consist of empty segments (no data payload) sent from the local host to the remote host, expecting an acknowledgment (ACK) in return. The process begins only after the connection has been idle for a configurable timeout period, governed by the parameter tcp_keepalive_time, which defaults to 7200 seconds (2 hours) in Linux implementations.[17][18]
Upon expiration of the idle timer, the first keepalive probe is transmitted while the connection remains in the ESTABLISHED state. If an ACK is received from the peer, the connection is deemed alive, and the idle timer is reset to its full duration (tcp_keepalive_time), restarting the countdown from zero. This ensures that any renewed activity or successful probe response prevents unnecessary probing. However, if no ACK arrives within the subsequent probe interval (tcp_keepalive_intvl, defaulting to 75 seconds in Linux), the probe is retransmitted. This retry process continues up to a maximum number of attempts specified by tcp_keepalive_probes (defaulting to 9 in Linux). Probes are sent exclusively in the ESTABLISHED state, as keepalive is designed for ongoing, idle connections rather than those in transition.[16][17][18]
If all probes fail without an ACK, the algorithm concludes that the connection is dead, triggering termination. This failure handling typically results in the local stack closing the socket, which may transition the connection to CLOSE_WAIT before full closure or initiate an abort sequence, depending on the implementation. To prevent excessive resource consumption or the "Sorcerer's Apprentice Syndrome" (where repeated unacknowledged probes exacerbate network issues), the number of retries is strictly limited, ensuring the process halts after the maximum probes. The total time to detect a dead connection can be approximated mathematically as:
Total detection time ≈ tcp_keepalive_time + tcp_keepalive_probes × tcp_keepalive_intvl
Using Linux defaults, this yields approximately 7200 + 9 × 75 = 7875 seconds before closure.[17][18][16]
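The arithmetic above can be sketched as a small helper, using the Linux default values named in the text:

```python
def keepalive_detection_time(idle, probes, interval):
    """Approximate worst-case time (seconds) from last activity
    until a dead connection is declared by the keepalive algorithm."""
    return idle + probes * interval

# Linux defaults: tcp_keepalive_time=7200, tcp_keepalive_probes=9,
# tcp_keepalive_intvl=75
print(keepalive_detection_time(7200, 9, 75))  # → 7875
```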
Configuration Parameters
TCP keepalive can be configured at the socket level using the setsockopt() system call to enable the feature and adjust its parameters for individual connections. The SO_KEEPALIVE socket option must first be set to enable keepalive probes on a specific socket, after which platform-specific options like TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT can be tuned on Linux systems for IPv4 sockets. These options control the idle time before the first probe (TCP_KEEPIDLE), the interval between subsequent probes (TCP_KEEPINTVL), and the maximum number of unacknowledged probes before the connection is considered dead (TCP_KEEPCNT).[17][19]
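A minimal sketch of this per-socket configuration, using Python's socket module (the option names mirror the C-level constants described above; the numeric values are purely illustrative, not recommendations):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Enable keepalive probing on this socket (portable across platforms).
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# Linux-specific tuning of the three per-socket parameters.
# Guarded with hasattr so the sketch still runs where these
# constants are unavailable (e.g., macOS exposes different names).
if hasattr(socket, "TCP_KEEPIDLE"):
    # Idle seconds before the first probe is sent.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 600)
    # Seconds between unanswered probes.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60)
    # Unacknowledged probes before the connection is declared dead.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)
```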
At the system-wide level, TCP keepalive parameters are adjustable via kernel settings on Linux, primarily through the /proc/sys/net/ipv4/ directory or the sysctl command. For instance, the tcp_keepalive_time parameter sets the default idle timeout before keepalive probes begin, while tcp_keepalive_intvl and tcp_keepalive_probes correspond to the probe interval and count, respectively; these can be persistently configured by editing /etc/sysctl.conf and applying changes with sysctl -p.[20][21] On Windows, system-wide configuration is achieved by modifying registry keys under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, such as KeepAliveTime, which specifies the interval in milliseconds between keepalive transmissions for idle connections, requiring a system reboot or service restart to take effect.[22][23]
By default, TCP keepalive is disabled on most systems to avoid unnecessary overhead, particularly for short-lived connections where probes could introduce latency or resource consumption without benefit; explicit enabling via SO_KEEPALIVE is thus required to customize intervals and prevent premature resource use.[17] Tools like sysctl on Linux (e.g., sysctl net.ipv4.tcp_keepalive_time=600) provide a straightforward interface for runtime adjustments, allowing administrators to balance connection reliability against performance costs.[20] In mobile devices, frequent keepalive probes can significantly impact battery life by repeatedly waking the Wi-Fi module, so longer intervals are often recommended to minimize power drain while maintaining connection integrity.[24]
Behavioral Variations Across Systems
TCP keepalive implementations exhibit notable differences in default parameters across major operating systems, affecting the timing and reliability of dead connection detection. In Linux, the default configuration sets an idle timeout of 7200 seconds (2 hours) before initiating probes, followed by an interval of 75 seconds between probes, with a maximum of 9 probes, resulting in a total detection window of approximately 2 hours and 11 minutes after idleness begins.[25] These values are adjustable system-wide using sysctl parameters such as net.ipv4.tcp_keepalive_time, net.ipv4.tcp_keepalive_intvl, and net.ipv4.tcp_keepalive_probes. Windows employs a similar 2-hour idle timeout by default via the KeepAliveTime registry value set to 7,200,000 milliseconds, but uses a much shorter probe interval of 1 second, with 10 retransmissions (fixed in Windows Vista and later), leading to faster detection within about 10 seconds once probing starts—contrasting Linux's longer overall process. This behavior is governed by the TCP/IP stack settings in the registry under HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, where parameters like KeepAliveInterval influence retransmission timing.[26] Systems derived from BSD, such as macOS and FreeBSD, align closely with Linux defaults, featuring a 2-hour idle period (net.inet.tcp.keepidle = 7200 seconds) and 75-second probe intervals (net.inet.tcp.keepintvl = 75 seconds), though FreeBSD and macOS often limit probes to 8, yielding a probing phase of 10 minutes and a system-imposed maximum detection time around 2 hours and 10 minutes. In mobile operating systems like Android and iOS, which build on Linux and BSD kernels respectively, the base TCP keepalive defaults mirror their parent systems but are frequently disabled or aggressively shortened in practice to conserve battery life, with connections often suspended in background modes to prevent unnecessary probing traffic. 
Network environments introduce additional variations, as firewalls and Network Address Translation (NAT) devices may silently drop keepalive probes if not explicitly configured to forward them, causing false positives in connection failure detection even when the underlying link is intact. In IPv6 deployments, some TCP stacks exhibit subtle differences, such as altered handling of keepalive packets in firewall rules due to the absence of widespread NAT, though certain implementations retain IPv4-like behaviors that can lead to inconsistent probe traversal across hybrid networks. These system and network factors underscore the need for application-level overrides where defaults prove inadequate.
Keepalive in Higher-Level Protocols
HTTP Persistent Connections
HTTP persistent connections, also known as keep-alive connections, enable the reuse of a single TCP connection for multiple HTTP requests and responses, thereby reducing the overhead associated with establishing new connections for each exchange. This mechanism was introduced in HTTP/1.1 through the "Connection" header, where the absence of "Connection: close" implies persistent behavior by default, allowing the client and server to maintain the socket open after a response.[27] The original explicit "Connection: keep-alive" header from HTTP/1.0 was retained for compatibility but became optional in HTTP/1.1, as specified in RFC 2616 published in 1999.[28] In operation, a client initiates a persistent connection by sending an HTTP request without the "Connection: close" header, and the server responds accordingly while keeping the TCP socket open for subsequent requests on the same connection. This allows sequential or pipelined requests to reuse the established connection, with the server typically closing it after an idle period to free resources; idle timeouts commonly range from 5 to 60 seconds, depending on server configuration, such as Apache's default of 5 seconds. 
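The version-dependent persistence defaults described above can be expressed as a small predicate (a sketch, not a full header parser; it considers only the Connection header value and ignores case variations in header names):

```python
def is_persistent(http_version, headers):
    """Decide whether a connection stays open after a response,
    per the HTTP/1.0 and HTTP/1.1 rules described above."""
    connection = headers.get("Connection", "").lower()
    if http_version == "HTTP/1.1":
        # Persistent by default unless explicitly closed.
        return connection != "close"
    if http_version == "HTTP/1.0":
        # Persistent only when explicitly requested.
        return connection == "keep-alive"
    return False

print(is_persistent("HTTP/1.1", {}))                            # → True
print(is_persistent("HTTP/1.1", {"Connection": "close"}))       # → False
print(is_persistent("HTTP/1.0", {"Connection": "keep-alive"}))  # → True
```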
For longer-idle scenarios, underlying TCP keepalive probes may detect and close dead connections, though HTTP semantics primarily govern short-term persistence.[29] The concept evolved significantly in HTTP/2, defined in RFC 7540 (2015), which mandates a single persistent connection per origin and introduces multiplexing to interleave multiple request-response streams over that connection, eliminating the need for explicit keep-alive headers.[30] HTTP/2 uses PING frames to measure round-trip time and maintain connection liveness, serving an implicit keepalive function without requiring separate probes.[31] HTTP/3, built on QUIC (RFC 9000), further integrates persistence by design, using a single QUIC connection for multiplexing streams and supporting connection migration while avoiding TCP's head-of-line blocking. Persistent connections offer key benefits by eliminating the repeated TCP three-way handshake overhead, which typically adds 100-200 milliseconds of latency per new connection depending on network round-trip time, thus improving overall page load times for resource-heavy pages.[32] However, implementations often cap the maximum number of requests per connection to prevent resource exhaustion, such as 100 requests in Apache HTTP Server configurations or similar limits in browser handling to balance performance and stability.
Application-Layer Implementations
In database systems such as MySQL and PostgreSQL, keepalive functionality is typically managed through a combination of server-side configuration parameters that close idle connections and client-side mechanisms that send periodic probes to detect and prevent drops. In MySQL, the server-side wait_timeout variable specifies the duration after which an idle connection is terminated, with a default value of 28,800 seconds (8 hours), allowing detection of disconnected clients without explicit pings from the server. Client libraries, such as MySQL Connector/J, support validation through lightweight ping operations to verify connection liveness before queries, often invoked periodically (e.g., every 30 seconds in custom implementations) to maintain long-running sessions.[33] Similarly, PostgreSQL relies on server parameters like tcp_keepalives_idle for TCP-level probing, but client drivers like Npgsql implement application-layer keepalives by initiating periodic ping roundtrips to the server, ensuring timely detection of idle client drops in extended connections.[34][10]
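The client-side pattern described above — ping only once the link has been idle long enough — can be sketched generically. The KeepaliveTracker class and the 30-second interval are illustrative (the interval matches the figure mentioned in the text); the actual ping would be a driver call such as a lightweight ping in Connector/J:

```python
import time

class KeepaliveTracker:
    """Track idle time on a long-lived connection and decide
    when a lightweight application-layer ping is due."""

    def __init__(self, interval=30.0):
        self.interval = interval
        self.last_activity = time.monotonic()

    def record_activity(self):
        # Any real query or response counts as activity and resets the timer.
        self.last_activity = time.monotonic()

    def ping_due(self, now=None):
        now = time.monotonic() if now is None else now
        return now - self.last_activity >= self.interval

tracker = KeepaliveTracker(interval=30.0)
print(tracker.ping_due(tracker.last_activity + 5))   # → False
print(tracker.ping_due(tracker.last_activity + 31))  # → True
```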
Real-time communication protocols extend keepalive concepts with explicit frame-based mechanisms to sustain bidirectional links over potentially unreliable networks. WebSockets, defined in RFC 6455, utilize Ping (opcode 0x9) and Pong (opcode 0xA) control frames sent periodically by either endpoint to confirm the peer's availability and prevent intermediate proxies from closing idle tunnels; the recipient must respond with a Pong immediately upon receiving a Ping.[35] In the MQTT protocol (version 5.0), clients establish a keepalive interval during connection (ranging from 0 to 65535 seconds), within which they must send a PINGREQ packet at least once, prompting the broker to reply with a PINGRESP to affirm the link's vitality and enable disconnection detection if no response arrives.[36]
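As an illustration of the frame-based mechanism, a server-to-client WebSocket Ping frame can be assembled by hand from the RFC 6455 framing layout (unmasked, as required for server-sent frames; a real endpoint would of course use a WebSocket library rather than raw bytes):

```python
def build_ping_frame(payload=b""):
    """Build an unmasked WebSocket Ping frame (RFC 6455, opcode 0x9).
    Control-frame payloads must be 125 bytes or fewer."""
    assert len(payload) <= 125
    header = bytes([0x80 | 0x9,      # FIN bit set, opcode 0x9 (Ping)
                    len(payload)])   # MASK bit clear, 7-bit payload length
    return header + payload

print(build_ping_frame().hex())       # → '8900'
print(build_ping_frame(b"hi").hex())  # → '89026869'
```

The peer answers with a Pong frame, which differs only in its opcode (0xA, so a first byte of 0x8A) and must echo the Ping's payload.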
Messaging protocols like AMQP, as implemented in systems such as RabbitMQ, incorporate heartbeat frames to monitor broker-client connections and mitigate silent failures. RabbitMQ negotiates a heartbeat timeout during connection establishment, with a default value of 60 seconds; absent any frame (including heartbeats) within this period, the peer assumes the connection is lost and closes it, using simple heartbeat method invocations (AMQP 0-9-1 frame type 8) as low-overhead probes.
Beyond standardized protocols, many applications deploy custom keepalive logic at the application layer, employing timers to dispatch dummy or lightweight messages (e.g., empty payloads or status queries) at fixed intervals to verify endpoint reachability, frequently paired with exponential backoff strategies for retrying failed probes to balance reliability and resource use. These approaches draw inspiration from TCP keepalive principles to enhance fault tolerance in distributed systems.
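The timer-plus-backoff pattern can be sketched as a delay generator (the base delay, growth factor, cap, and retry count here are illustrative, not drawn from any particular system):

```python
def backoff_delays(base=1.0, factor=2.0, cap=60.0, retries=6):
    """Yield exponentially growing retry delays for failed keepalive
    probes, capped to avoid unbounded waits between attempts."""
    delay = base
    for _ in range(retries):
        yield min(delay, cap)
        delay *= factor

print(list(backoff_delays()))  # → [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
```

A probing loop would sleep for each yielded delay after a failed probe and declare the endpoint unreachable once the generator is exhausted, mirroring how TCP keepalive caps its own probe count.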