Handshake (computing)
In computing, a handshake is a protocol dialogue between two systems for identifying and authenticating themselves to each other, or for synchronizing their operations with each other.[1] This process ensures reliable communication by establishing parameters such as data rates, error handling, and security measures before full data exchange begins. Handshakes are essential across various computing domains, including hardware interfaces, network protocols, and cryptographic exchanges, preventing issues like data loss or unauthorized access.
In hardware contexts, handshaking typically involves the exchange of control signals over dedicated lines to coordinate data transfer between devices, such as in serial communication ports. For instance, Request to Send (RTS) and Clear to Send (CTS) signals enable hardware flow control, allowing a sender to pause transmission if the receiver is not ready. A link wired this way uses five wires: transmit (TX), receive (RX), RTS, CTS, and ground.[2] This mechanism contrasts with software handshaking, which relies on embedded data characters like XON/XOFF for flow control without additional hardware.
In networking, handshakes are critical for establishing connections in protocols like TCP and TLS. The TCP three-way handshake, defined in the original protocol specification, involves a client sending a SYN packet, the server responding with SYN-ACK, and the client replying with ACK to synchronize sequence numbers and confirm bidirectional reachability.[3] Similarly, the TLS handshake negotiates encryption parameters and authenticates parties, culminating in shared session keys for secure communication, as outlined in the TLS 1.3 standard.[4] These processes underpin much of modern internet traffic, ensuring reliability and confidentiality.
Overview
Definition
In computing, a handshake refers to an initial exchange of signals or messages between two devices, programs, or systems to establish, negotiate, or verify the parameters necessary for subsequent communication or data transfer. This process ensures that both parties are ready and capable of interacting under agreed-upon conditions, such as transmission speed or protocol compatibility.[5][6]
Key characteristics of a computing handshake include its bidirectional nature, which facilitates synchronization between the communicating entities; negotiation of operational parameters, potentially encompassing data rates, authentication credentials, or session keys; and incorporation of mechanisms for detecting errors or incompatibilities during the setup phase. These elements collectively prepare a stable channel, preventing mismatches that could lead to failed interactions. Unlike unilateral signals, handshakes require mutual confirmation to proceed.[7][8]
The term "handshake" derives from the human physical gesture symbolizing agreement or trust, and its analogy was adopted in early computing for data communication systems and network protocols.[9][10]
A handshake differs from a simple acknowledgment (ACK), which is a basic, one-way confirmation of data receipt during an active session; in contrast, a handshake constitutes a structured, multi-step sequence dedicated to initial connection establishment rather than ongoing verification. In protocols like TCP, handshakes play a foundational role in synchronizing endpoints before data exchange begins.[11][12]
Purpose and Importance
Handshakes in computing primarily serve to establish reliable connections between communicating entities, such as devices or software processes, by synchronizing their operational states and confirming mutual readiness for data exchange.[7] This process allows the parties to negotiate essential parameters, including data transfer rates, encoding formats, and error-checking mechanisms, ensuring compatibility before any substantive communication occurs.[10] Additionally, handshakes facilitate authentication to verify the identities of participants and detect potential incompatibilities early, preventing wasted resources on mismatched interactions.[13]
The benefits of handshakes are substantial, as they reduce transmission errors by incorporating mechanisms for detection and correction, such as acknowledgments, thereby enhancing overall data integrity.[10] By verifying identities and enabling encryption negotiation, handshakes bolster security against unauthorized access, while also promoting efficient resource allocation through synchronized flow control that avoids overwhelming slower components.[13] Furthermore, they support backward compatibility, allowing newer systems to interoperate with legacy hardware or software by dynamically adjusting to supported capabilities.[7]
Despite these advantages, handshakes introduce drawbacks, including overhead in terms of time and bandwidth due to the multiple signal exchanges required, which can introduce latency in high-speed environments.[7] They are also susceptible to certain risks, such as resource exhaustion attacks that exploit the state-holding nature of the process, exemplified by SYN flooding where incomplete handshakes consume server memory and processing capacity.[14] Failure modes, like timeouts or mismatched responses, can lead to abrupt connection drops, necessitating retry mechanisms that further amplify overhead.[15]
In modern computing, handshakes are indispensable for the scalability of distributed systems, where myriad devices must interoperate seamlessly across networks.[13] Their role is particularly vital in Internet of Things (IoT) ecosystems and cloud environments, enabling secure, synchronized data flows among heterogeneous devices while supporting real-time applications in industrial automation and beyond.[13] Without effective handshaking, the reliability and efficiency of these expansive, interconnected infrastructures would be severely compromised.[10]
Types of Handshakes
Software Handshakes
Software handshakes, also known as software flow control, involve the use of special control characters embedded within the data stream to coordinate data transmission between devices, without requiring dedicated hardware control lines. This method relies on in-band signaling over the existing transmit and receive data lines, typically in serial communications.[16][17]
The primary mechanism uses ASCII control characters: XON (Transmit On, DC1, hexadecimal 0x11) to resume transmission and XOFF (Transmit Off, DC3, hexadecimal 0x13) to pause it. When a receiving device's buffer approaches capacity, it sends an XOFF character to the sender, which halts data flow until an XON is received. This process ensures synchronization and prevents buffer overflows, though it can be less reliable if control characters are corrupted or misinterpreted as data.[16][18]
Software handshakes operate at the physical or data link layer of the OSI model and are commonly applied in asynchronous serial interfaces, such as RS-232 connections between computers and peripherals like printers or modems. They are particularly useful in scenarios with limited cabling, as no extra wires are needed beyond TX, RX, and ground. However, they introduce potential latency from processing control characters and risk data disruption if the protocol lacks error checking.[16]
Compared to hardware handshakes, software methods offer flexibility for software-configurable systems but may incur higher CPU overhead for parsing control characters. In some setups, both can be combined, with hardware taking precedence if enabled.[19]
The origins of software handshaking date to the early 1960s, coinciding with the development of ASCII (1963) and asynchronous teletypewriter systems, where control characters were used to manage transmission over telephone lines. It became standardized in serial communication protocols and remains in use for legacy and embedded systems.[16][20]
Hardware Handshakes
Hardware handshakes involve signal-based exchanges between devices using dedicated control pins or electrical lines at the physical or data link layer, enabling direct coordination without reliance on higher-layer software protocols.[21] These mechanisms ensure reliable data flow by signaling device readiness and managing transmission timing through voltage level changes. In standards like RS-232, +3 V to +15 V on a data line represents logic 0 (SPACE) and -3 V to -15 V represents logic 1 (MARK), while control lines such as RTS and CTS are asserted at the positive level.[22] This approach is particularly suited to environments where precise, low-level synchronization is required to prevent data loss or buffer overflows.[23]
Common mechanisms include the Request-to-Send (RTS) and Clear-to-Send (CTS) signals, which provide hardware flow control in serial interfaces. In this protocol, the data terminal equipment (DTE), such as a computer, asserts the RTS line (pin 4 on the 25-pin connector) to indicate readiness to transmit data, prompting the data circuit-terminating equipment (DCE), like a modem, to assert CTS (pin 5) when it can receive, thereby initiating data transfer.[21] Additional signals, such as Data Terminal Ready (DTR) and Data Set Ready (DSR), may complement RTS/CTS by confirming overall connection status before communication begins.[22] These voltage-driven interchanges allow for simple synchronization without embedding control information in the data stream itself.[23]
Hardware handshakes find primary application in point-to-point serial links, such as RS-232 connections between computers and peripherals like printers or sensors, where software cannot reliably predict timing due to variable processing delays.[21] For instance, in environmental control systems interfacing with devices like thermostats, RTS/CTS ensures half-duplex communication proceeds only when both ends are prepared, avoiding transmission errors in unreliable timing scenarios.[23] This is essential for legacy and embedded systems relying on direct electrical signaling for basic coordination.[22]
Compared to software handshakes, hardware methods offer lower latency through immediate electrical responses, eliminating the need for in-band control characters and reducing CPU overhead.[24] They impose no additional data overhead, making them efficient for real-time flow control, though their scope is limited to straightforward negotiations such as readiness signaling rather than complex parameter exchanges like baud rate or parity settings.[21] In hybrid systems, hardware handshakes can integrate with software approaches to provide layered control for more robust communication.[24]
The origins of hardware handshaking trace back to the 1960s, evolving from teletypewriter systems that required reliable signal coordination for early data transmission over telephone lines. These mechanisms were formalized in the EIA RS-232 standard, first published in 1962 by the Electronic Industries Association to standardize interfaces between data terminals and modems, with subsequent revisions ensuring compatibility across serial ports.[21][22]
Handshakes in Networking Protocols
TCP Three-Way Handshake
The TCP three-way handshake is a fundamental mechanism in the Transmission Control Protocol (TCP) for establishing a reliable, connection-oriented communication session between a client and a server. It consists of three sequential steps, Synchronize (SYN), Synchronize-Acknowledge (SYN-ACK), and Acknowledge (ACK), designed to synchronize sequence numbers and verify bidirectional reachability, so that both endpoints can reliably exchange data without prior state assumptions. This process prevents old or duplicate packets from previous connections from interfering with new ones, as each side independently generates and confirms its initial sequence number (ISN). Defined in the original TCP specification, the handshake operates at the transport layer and is essential for TCP's reliability features, such as ordered delivery and error recovery.[25]
The process begins with the client (active opener) initiating a connection by sending a SYN segment to the server (passive opener). This SYN segment carries the client's 32-bit ISN, chosen to be hard to predict, and sets the SYN control flag; because the ACK flag is clear, the acknowledgment number is not yet meaningful. Upon receipt, the server responds with a SYN-ACK segment, which includes its own 32-bit ISN, sets both SYN and ACK flags, and specifies an acknowledgment number equal to the client's ISN plus one (ACK = client's ISN + 1) to confirm receipt of the SYN; the server's SYN likewise consumes one sequence number on the server's side. Finally, the client sends an ACK segment acknowledging the server's ISN plus one (ACK = server's ISN + 1), completing the handshake and transitioning both endpoints to the ESTABLISHED state, where data transmission can begin. Each segment may carry TCP options, and in ordinary use the SYN and SYN-ACK segments carry no application data, keeping the handshake focused on connection setup.[25][26][27]
During the handshake, several key parameters are exchanged to optimize the connection. 
The Maximum Segment Size (MSS) is advertised via a TCP option in the SYN and SYN-ACK segments, allowing each side to inform the other of its maximum receivable data payload size (typically derived from the interface MTU minus headers). It is not formally negotiated; each side states its own value for the peer to respect. The receive window, a 16-bit field in the TCP header, is present in every segment, including the SYN and SYN-ACK, and enables flow control by indicating the amount of buffer space available for incoming data. Additionally, support for Selective Acknowledgments (SACK) can be proposed through the SACK-Permitted option in the SYN segment, permitting the receiver to acknowledge non-contiguous blocks of data later in the session if both sides send the option during the handshake. These parameters ensure efficient data transfer tailored to the network path.[28][29][30]
Sequence number synchronization relies on a simple rule: each endpoint generates a 32-bit ISN at the start, and the SYN segment advances the sequence by one (the SYN flag consumes one sequence number), so subsequent data begins at ISN + 1. Acknowledgments confirm this by setting ACK = peer's ISN + 1 in the response. This mutual confirmation (the client acknowledging the server's ISN + 1 in the final ACK, the server having already acknowledged the client's) ensures both sides agree on the starting point for byte-stream numbering, preventing desynchronization. 
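As a minimal sketch of the bookkeeping described above, the following Python function lists the SEQ and ACK values of the three segments for a given pair of initial sequence numbers. The function name `three_way_handshake` and the tuple layout are illustrative assumptions, not part of any TCP implementation; a real stack tracks far more state (options, windows, timers).

```python
MOD = 2 ** 32  # sequence numbers live in a 32-bit space and wrap around


def three_way_handshake(isn_c, isn_s):
    """Return (name, SEQ, ACK) for the SYN, SYN-ACK and ACK segments.

    isn_c and isn_s are the client's and server's initial sequence numbers.
    The ACK value of the first SYN is None because the ACK flag is not set.
    """
    syn = ("SYN", isn_c % MOD, None)
    # The server acknowledges the client's SYN, which consumed one number.
    syn_ack = ("SYN-ACK", isn_s % MOD, (isn_c + 1) % MOD)
    # The client's next sequence number is ISN_c + 1; it acknowledges the
    # server's SYN in the same way.
    ack = ("ACK", (isn_c + 1) % MOD, (isn_s + 1) % MOD)
    return [syn, syn_ack, ack]


# Hypothetical ISNs chosen for readability; real ISNs are hard to predict.
for name, seq, ack in three_way_handshake(100, 300):
    print(name, "SEQ =", seq, "ACK =", ack)
```

With the hypothetical ISNs 100 and 300, the SYN carries SEQ 100, the SYN-ACK carries SEQ 300 and ACK 101, and the final ACK carries SEQ 101 and ACK 301. The modulo keeps the arithmetic correct when an ISN sits at the top of the 32-bit range.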
In mathematical terms, for client $\text{ISN}_c$ and server $\text{ISN}_s$:
$$
\begin{aligned}
\text{SYN:} &\quad \text{SEQ} = \text{ISN}_c, \quad \text{CTL} = \text{SYN} \\
\text{SYN-ACK:} &\quad \text{SEQ} = \text{ISN}_s, \quad \text{ACK} = \text{ISN}_c + 1, \quad \text{CTL} = \text{SYN, ACK} \\
\text{ACK:} &\quad \text{SEQ} = \text{ISN}_c + 1, \quad \text{ACK} = \text{ISN}_s + 1, \quad \text{CTL} = \text{ACK}
\end{aligned}
$$
The 32-bit space wraps around after $2^{32}$ bytes, but the handshake's design minimizes collision risks through ISN randomization.[31][32][33]
A notable security concern with the three-way handshake is vulnerability to SYN flooding attacks, where an attacker sends numerous SYN packets without completing the handshake, exhausting the server's half-open connection queue and denying service to legitimate clients. This exploits the server's need to allocate resources (like state and buffers) upon receiving a SYN, potentially leading to resource depletion. A common mitigation is the use of SYN cookies, a cryptographic technique where the server encodes state information into the ISN of the SYN-ACK without storing per-connection data. When the final ACK arrives, the server verifies the cookie echoed in its acknowledgment number and restores state only for valid connections, reducing memory usage during floods.[34][35]
The three-way handshake was originally standardized in RFC 793 in 1981 as part of the core TCP specification. Subsequent updates have maintained its core mechanics while enhancing compatibility, such as adaptations for IPv6 transport in RFC 2460, which ensure the handshake functions identically over IPv6 addresses without altering the sequence synchronization process. RFC 9293, published in 2022, obsoletes RFC 793 and consolidates later clarifications, including those on options handling.[36]
TLS Handshake
The TLS handshake is a multi-phase cryptographic exchange protocol that establishes a secure communication session between a client and a server over a reliable transport layer, such as TCP, by authenticating the parties, negotiating cryptographic parameters, and deriving symmetric session keys for subsequent encrypted data transfer.[37] This process ensures confidentiality, integrity, and authenticity while preventing eavesdropping and tampering during session initiation.[38]
The handshake begins with the ClientHello message, in which the client specifies supported TLS versions (e.g., up to 1.3), a list of preferred cipher suites defining encryption algorithms and key exchange methods, a 32-byte random nonce for freshness, and optional extensions such as supported key exchange groups (e.g., secp256r1 or X25519).[39] The server responds with a ServerHello message selecting the highest mutually supported version and cipher suite, its own 32-byte random nonce, and relevant extensions, followed by its X.509 certificate chain for public-key authentication.[40] The key exchange phase then occurs, typically using ephemeral Diffie-Hellman (DHE) or elliptic curve Diffie-Hellman (ECDHE) for forward secrecy: the client and server exchange public values and each computes the same shared secret (the premaster secret in TLS 1.2 and earlier) without transmitting it directly. As of November 2025, hybrid post-quantum key exchanges, which combine classical methods like X25519 with post-quantum algorithms such as ML-KEM (derived from Kyber), are widely adopted for quantum-resistant security, with over 50% of traffic on major platforms using such protections. In TLS 1.2 and earlier, an alternative RSA-based exchange has the client encrypt the premaster secret with the server's public key from the certificate; this mode offers no forward secrecy and was removed in TLS 1.3.[41][42] The handshake concludes with Finished messages from both parties, each containing a message authentication code (MAC) computed over the entire handshake transcript using the newly derived keys to verify integrity and prevent replay attacks.[43]
Central to the TLS handshake are concepts from public-key cryptography, which facilitate initial server (and optionally client) authentication via digital certificates, transitioning to efficient symmetric cryptography for the session's bulk encryption using algorithms like AES.[44] Extensions such as Server Name Indication (SNI) allow the client to specify the target hostname in the ClientHello, enabling virtual hosting on shared IP addresses.[45] Key derivation combines the shared secret with values from the handshake. In TLS 1.2, the premaster secret and the client and server random nonces are fed through a pseudorandom function (PRF) based on HMAC (HMAC-SHA256 by default), while TLS 1.3 employs HKDF (HMAC-based key derivation function) keyed to a hash of the handshake transcript.[46] For instance, in TLS 1.3, the client's first application traffic secret is derived as:
$$
\text{client\_application\_traffic\_secret\_0} = \text{HKDF-Expand-Label}(\text{Master Secret}, \text{"c ap traffic"}, \text{Transcript-Hash}(\text{ClientHello} \ldots \text{server Finished}), \text{Hash.length})
$$
where HKDF-Expand-Label applies HKDF-Expand to the given secret with a label and a context taken from the handshake transcript hash.[47]
The protocol evolved from the insecure Secure Sockets Layer (SSL) 1.0, proposed by Netscape in 1994 and never released due to vulnerabilities, through SSL 2.0 (1995, flawed authentication) and SSL 3.0 (1996, basis for TLS), to TLS 1.0 (1999, RFC 2246), which standardized the protocol with changes such as an HMAC-based MAC.[48] Subsequent versions addressed weaknesses: TLS 1.1 (2006, RFC 4346) added explicit IVs and other mitigations for attacks on CBC mode, TLS 1.2 (2008, RFC 5246) introduced flexible signature algorithms and AEAD ciphers, and TLS 1.3 (2018, RFC 8446) streamlined the handshake to a single round-trip (1-RTT) by integrating key exchange earlier and mandating forward secrecy.[49][38][37]
TLS incorporates security features like Perfect Forward Secrecy (PFS), achieved through ephemeral key exchanges (e.g., ECDHE) that ensure compromise of long-term keys does not expose past sessions, and resistance to downgrade attacks via explicit version negotiation in extensions and authenticated checks in the ServerHello random field or Finished MACs.[40] These mechanisms build on an underlying reliable transport like TCP to focus solely on cryptographic security.[50]
SMTP Handshake
The SMTP handshake is the initial command-based exchange in the Simple Mail Transfer Protocol (SMTP) that establishes a session between a client Mail Transfer Agent (MTA) and a server MTA for email delivery, enabling capability negotiation and host identification before message transfer begins.[51] This process occurs at the application layer over a TCP connection and differs from lower-layer handshakes by relying on textual commands and responses rather than binary flags or cryptographic keys. It ensures both parties agree on protocol features, such as support for extended capabilities, while identifying the domains involved in the transaction.[52]
The handshake begins when the client initiates a TCP connection to the server on port 25, prompting the server to issue a 220 "Service ready" greeting, which may include the server's domain and software details.[53] The client then sends an EHLO (Extended SMTP) command followed by its fully qualified domain name (FQDN) or address literal, signaling support for SMTP extensions; alternatively, it uses the simpler HELO command for basic SMTP without extensions.[54] In response, the server replies with a multiline 250 "OK" status, listing supported extensions, such as AUTH for authentication or STARTTLS for opportunistic TLS encryption, in a parameterized format that allows the client to select compatible features.[55] If EHLO fails or extensions are unavailable, the client falls back to HELO, and the server confirms with a single 250 response, clearing any prior state.[54]
Key parameters in the handshake include the client's domain identification via EHLO or HELO, which tells the server which host is connecting, and the server's advertised features, such as 8BITMIME for transporting 8-bit text content without alteration or PIPELINING for sending multiple commands without waiting for individual responses to improve efficiency.[52] These extensions are registered with IANA and defined in separate RFCs, allowing modular enhancement of the base protocol.[56] Following the greeting exchange, the client may negotiate authentication using AUTH parameters or initiate encryption with STARTTLS if advertised. STARTTLS upgrades the existing session to TLS; once the TLS negotiation completes, the client typically issues EHLO again, because the extensions offered over the protected channel may differ.[57]
Error handling during the handshake involves standardized reply codes; for instance, a 421 "Service not available" response indicates the server is busy or shutting down, prompting the client to close the connection and retry later, typically with a backoff interval.[58] Other errors, like 500 for syntax issues or 503 for a bad command sequence, reject the offending command, and the reply class distinguishes temporary (4xx) from permanent (5xx) failures to guide client behavior.[59]
The SMTP handshake is standardized in RFC 5321, published in 2008, which consolidates and updates the original RFC 821 from 1982 by incorporating Extended SMTP (ESMTP) mechanisms for modern extensions while maintaining backward compatibility.[60][61] This evolution ensures robust session setup across diverse email infrastructures. In the broader email flow, the handshake establishes sender and receiver identities through domain parameters, paving the way for subsequent MAIL FROM and RCPT TO commands that specify paths without re-verifying basics.[62]
For illustration, a typical EHLO exchange might appear as follows:
C: EHLO client.example.com
S: 250-server.example.com Hello client.example.com
S: 250-8BITMIME
S: 250-PIPELINING
S: 250 STARTTLS
In each reply line, a hyphen after the code marks a continuation and a space marks the final line. This format highlights the server's multiline response, enabling the client to proceed with feature-aware commands.[63]
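A client has to turn such a multiline 250 reply into a set of usable capabilities. The following Python sketch shows one way to do that, under stated assumptions: the function name `parse_ehlo_reply` is hypothetical, the reply lines are passed without the "S: " prefix used in the transcript, and only the 250 success case is handled.

```python
def parse_ehlo_reply(lines):
    """Parse a multiline EHLO reply into (greeting, extensions).

    lines: reply lines such as "250-PIPELINING" or "250 STARTTLS".
    Returns the greeting text from the first line and a dict that maps each
    advertised extension keyword (upper-cased) to its list of parameters.
    """
    greeting = None
    extensions = {}
    for index, line in enumerate(lines):
        code, separator, text = line[:3], line[3:4], line[4:]
        if code != "250":
            raise ValueError(f"unexpected reply code in {line!r}")
        # A hyphen marks a continuation line, a space marks the last line.
        is_last = index == len(lines) - 1
        if separator != (" " if is_last else "-"):
            raise ValueError(f"bad continuation marker in {line!r}")
        if index == 0:
            greeting = text  # first line: server domain and greeting
        else:
            keyword, *params = text.split()
            extensions[keyword.upper()] = params
    return greeting, extensions


# Sample reply lines for illustration only.
reply = [
    "250-server.example.com Hello client.example.com",
    "250-8BITMIME",
    "250-PIPELINING",
    "250 STARTTLS",
]
greeting, extensions = parse_ehlo_reply(reply)
print(greeting)
print(sorted(extensions))
```

With the sample lines, the parsed keywords are 8BITMIME, PIPELINING, and STARTTLS, so a client could check `"STARTTLS" in extensions` before attempting the upgrade. Keeping parameters as a list leaves room for extensions that carry arguments.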