USB communications
Universal Serial Bus (USB) is an industry-standard serial bus specification for connecting computers and electronic devices, enabling bidirectional communication, power delivery, and hot-plugging of peripherals such as keyboards, mice, storage drives, and displays.[1] Developed initially in the mid-1990s by a consortium including Compaq, DEC, IBM, Intel, Microsoft, NEC, and Nortel, USB was designed to replace disparate legacy ports like serial, parallel, and PS/2 with a unified, low-cost interface that supports plug-and-play functionality and automatic device configuration.[2]
The USB architecture employs a tiered-star topology centered on a host controller (typically in a PC or hub), which manages communication with up to 127 devices through hubs that extend connectivity and support multiple data rates.[2] Devices connect via differential signaling over twisted-pair wires (D+ and D- lines for data, plus VBUS for power and ground), with enumeration occurring upon attachment to assign addresses and configure endpoints based on device descriptors.[2] Communication follows a host-centric, polling-based protocol using packets structured into tokens (to initiate transactions), data (for payload), and handshakes (for acknowledgment), organized within 1 ms frames (or microframes for high-speed modes).[3]
USB communications support four primary transfer types: control (for setup and configuration), interrupt (for low-latency input like keyboards), bulk (for reliable, high-volume data like printers), and isochronous (for time-sensitive streams like audio/video), ensuring efficient bandwidth allocation across low-speed (1.5 Mbps), full-speed (12 Mbps), and higher modes.[2] The standard has evolved through multiple generations for increased performance: USB 1.0 (1996, up to 12 Mbps), USB 1.1 (1998, refinements), USB 2.0 (2000, 480 Mbps high-speed), USB 3.0 (2008, 5 Gbps SuperSpeed), USB 3.1/3.2 (2013/2017, up to 20 Gbps), USB4 (2019, up to 40 Gbps with Thunderbolt 3 integration), and USB4 Version 2.0 (2022, up to 80 Gbps), with 240 W power delivery via USB Type-C connectors.[4][5] Backward compatibility across versions allows seamless integration, while advancements like USB Power Delivery enhance charging capabilities beyond data transfer.[4]
Physical Layer (USB PHY)
Electrical Specifications
USB electrical specifications establish the foundational parameters for signal integrity, power distribution, and physical connectivity, ensuring reliable communication across various device classes and versions. Voltage levels for power delivery begin with the baseline of 5 V nominal for USB 2.0, where host ports must supply VBUS between 4.75 V and 5.25 V, while bus-powered devices operate down to 4.40 V to accommodate voltage drops.[6] Data signaling voltages are designed for compatibility with standard logic levels; for full-speed operation (12 Mbps), signals on the D+ and D- lines use 3.3 V logic levels, with high states defined as 2.8 V to 3.6 V and low states as 0 V to 0.3 V relative to ground.[7] These specifications support low-power devices drawing up to 100 mA (one unit load) and high-power devices drawing up to 500 mA (five unit loads) once the host has configured them; no USB Power Delivery negotiation is involved.[8]
Advanced power capabilities extend these limits through USB Power Delivery (USB PD), which negotiates higher voltages and currents over USB Type-C connectors. USB PD 2.0 and 3.0 support fixed voltages of 5 V, 9 V, 15 V, and 20 V, enabling up to 100 W (20 V at 5 A), while USB PD 3.1 introduces 28 V, 36 V, and 48 V options for power levels reaching 240 W (48 V at 5 A).[9] For data signaling in high-speed mode (480 Mbps in USB 2.0), the specification shifts to low-swing, current-driven differential signaling with a nominal level of about 400 mV referenced close to ground rather than to the 3.3 V rail; the small swing permits fast edges, and differential reception keeps noise susceptibility low compared to single-ended signaling.[10]
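The power figures quoted above follow directly from P = V × I. The short sketch below reproduces that arithmetic; the dictionary and function names are illustrative only and are not part of any USB specification or library.

```python
# Voltage/current pairs quoted in the text above (volts, amps).
# The labels are illustrative, not official profile names.
POWER_LEVELS = {
    "USB 2.0 high-power port": (5.0, 0.5),
    "USB PD 2.0/3.0 maximum": (20.0, 5.0),
    "USB PD 3.1 maximum": (48.0, 5.0),
}


def watts(volts: float, amps: float) -> float:
    """Return electrical power in watts for a given voltage and current."""
    return volts * amps


if __name__ == "__main__":
    for label, (volts, amps) in POWER_LEVELS.items():
        print(f"{label}: {volts:g} V x {amps:g} A = {watts(volts, amps):g} W")
```

Running it shows the jump from 2.5 W on a basic USB 2.0 port to 100 W and 240 W under USB PD.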
Cable and connector requirements further define electrical performance. USB employs twisted-pair wiring for the differential D+ and D- signals, requiring a characteristic impedance of 90 Ω ±15% to minimize reflections and maintain signal quality.[11] Connector evolution reflects the need for smaller form factors and versatility: initial USB 1.x and 2.0 standards used rectangular Type-A plugs for hosts and squarer Type-B for devices, followed by Mini-A/B and Micro-A/B variants for portable applications in the 2000s; USB Type-C, released in August 2014, introduced a compact, reversible 24-pin design that supports all USB speeds. Type-C is optional for single-lane USB 3.1 and 3.2 operation but required for dual-lane USB 3.2 and for USB4.[12]
To mitigate electromagnetic interference (EMI), USB cables incorporate dual-layer shielding (typically aluminum foil for high-frequency coverage and a tinned copper braid, covering at least 65% of the cable surface, for low-frequency protection), with the shield grounded at both connector ends to chassis or signal ground.[13] This grounding path drains induced currents, preventing radiation or reception of noise that could corrupt data. Cable length limits are imposed to preserve eye diagram margins and round-trip delay: the propagation-delay budget limits full-speed and high-speed cables to about 5 m per segment, while low-speed cables are limited to about 3 m.[14]
Signaling Rates and Framing
USB communications employ various signaling rates to accommodate different performance needs, evolving from the initial USB 1.x standards to support higher bandwidths. Low-speed mode operates at 1.5 Mb/s, primarily for devices like keyboards and mice requiring minimal data throughput. Full-speed mode achieves 12 Mb/s, suitable for early peripherals such as printers and scanners. High-speed mode, introduced in USB 2.0, reaches 480 Mb/s, enabling faster transfers for storage and imaging devices.[6] SuperSpeed in USB 3.0 provides 5 Gb/s, while subsequent iterations like USB 3.2 extend to 10 Gb/s and 20 Gb/s using multi-lane configurations. USB4, released in 2019, supports up to 40 Gb/s, with version 2.0 in 2022 doubling this to 80 Gb/s via advanced encoding.[15][16]
In USB 2.0 modes, data is encoded using non-return-to-zero inverted (NRZI), where a transition represents a 0 and the absence of a transition a 1, ensuring sufficient signal edges for clock recovery. To maintain synchronization and prevent long runs of identical bits that could cause clock drift, bit stuffing inserts a 0 after every six consecutive 1s, limiting run lengths to six bits. This mechanism adds up to 16.7% overhead but preserves reliable clock extraction without dedicated clock lines.[6]
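The two rules above (a 0 toggles the line, a 1 holds it, and a 0 is stuffed after six consecutive 1s) can be sketched in a few lines. This is a minimal illustration, assuming bits are given as a list of 0/1 integers in transmission order and that the bus idles in the J state; the function names are hypothetical.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1s."""
    out, run = [], 0
    for bit in bits:
        out.append(bit)
        run = run + 1 if bit == 1 else 0
        if run == 6:
            out.append(0)  # stuffed bit guarantees a line transition
            run = 0
    return out


def nrzi_encode(bits, idle="J"):
    """Map data bits to J/K line states: a 0 toggles the state, a 1 holds it."""
    state, states = idle, []
    for bit in bits:
        if bit == 0:
            state = "K" if state == "J" else "J"
        states.append(state)
    return "".join(states)


if __name__ == "__main__":
    print(nrzi_encode([0, 0, 0, 0, 0, 0, 0, 1]))  # sync data bits
    print(bit_stuff([1] * 8))
```

Feeding the sync data bits 00000001 through `nrzi_encode` yields the familiar KJKJKJKK pattern, and a run of 1s never produces more than six unchanged bit times on the wire once `bit_stuff` has been applied.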
Framing in USB 2.0 low- and full-speed modes begins with an 8-bit sync pattern, written KJKJKJKK in line-state notation (the NRZI encoding of the data bits 00000001), which provides a unique sequence for receiver alignment and phase locking. At low and full speed the end-of-packet (EOP) is signaled by a single-ended zero (SE0) state lasting two bit times, followed by one bit time of J; in high-speed mode, where the idle bus is itself at SE0, EOP is instead an 8-bit NRZI pattern sent without bit stuffing, so the receiver recognizes the deliberate stuffing violation as the packet boundary. Inter-packet delays enforce a minimum of two bit times at full-speed and low-speed to let the bus settle and prepare for the next transmission, preventing overlap and aiding recovery.[6]
Clock tolerance is critical for maintaining bit synchronization across devices. For full-speed, the clock must stay within ±0.25% (±2500 ppm) of the nominal 12 MHz; high-speed tightens this to ±500 ppm (±0.05%), because the shorter bit time leaves less margin for drift.[6]
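To compare the two tolerances on a common scale, it helps to convert percent to parts per million and compute the allowed frequency window. The helper names below are illustrative, and the 480 MHz figure is simply the high-speed bit rate treated as a clock frequency.

```python
def ppm_from_percent(percent: float) -> float:
    """Convert a tolerance in percent to parts per million (1% = 10,000 ppm)."""
    return percent * 1e4


def frequency_window(nominal_hz: float, tolerance_ppm: float):
    """Return the (lowest, highest) frequency allowed by a +/- ppm tolerance."""
    delta = nominal_hz * tolerance_ppm / 1e6
    return nominal_hz - delta, nominal_hz + delta


if __name__ == "__main__":
    full_speed_ppm = ppm_from_percent(0.25)
    print("full-speed:", full_speed_ppm, "ppm", frequency_window(12e6, full_speed_ppm))
    print("high-speed:", 500, "ppm", frequency_window(480e6, 500))
```

The full-speed window works out to 12 MHz ± 30 kHz, while the high-speed window is 480 MHz ± 240 kHz, a tolerance five times tighter in relative terms.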
USB 3.0 and later shift from NRZI to more efficient encodings for SuperSpeed and beyond, using 8b/10b in USB 3.0 to map 8-bit data to 10-bit symbols for DC balance and transition density, combined with a scrambler based on a 16-bit LFSR (polynomial X^{16} + X^5 + X^4 + X^3 + 1) to reduce electromagnetic interference. Framing evolves to ordered sets like TS1/TS2 for training and synchronization, with headers including CRC for integrity and EOP marked by EPF symbols followed by idle sequences. Clock tolerance tightens to ±300 ppm, with spread-spectrum clocking that down-spreads the clock by up to 5000 ppm for EMI mitigation.[15]
Higher USB 3.x rates (10–20 Gb/s) adopt 128b/132b encoding to minimize overhead to 3%, supporting denser data packing. USB 4 introduces PAM-3 signaling in its version 2.0 for 80 Gb/s operation, using three amplitude levels at a 25.6 Gbaud rate per lane to achieve higher throughput without increasing pin count, while maintaining backward compatibility through protocol tunneling. This evolution ensures scalable, reliable data flow across USB generations.[16]
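The overhead percentages for the line codes mentioned here are simple ratios of coded bits to data bits. The sketch below computes them and the resulting payload rate; the function names are illustrative, and protocol overhead above the line code (headers, link commands, flow control) is deliberately ignored.

```python
def coding_overhead_percent(data_bits: int, coded_bits: int) -> float:
    """Share of transmitted bits that carry no payload, in percent."""
    return 100.0 * (coded_bits - data_bits) / coded_bits


def payload_rate_gbps(line_rate_gbps: float, data_bits: int, coded_bits: int) -> float:
    """Line rate scaled by the coding efficiency data_bits / coded_bits."""
    return line_rate_gbps * data_bits / coded_bits


if __name__ == "__main__":
    print("8b/10b overhead:", coding_overhead_percent(8, 10), "%")
    print("128b/132b overhead:", round(coding_overhead_percent(128, 132), 2), "%")
    print("5 Gb/s with 8b/10b:", payload_rate_gbps(5, 8, 10), "Gb/s")
    print("10 Gb/s with 128b/132b:", round(payload_rate_gbps(10, 128, 132), 2), "Gb/s")
```

With 8b/10b a 5 Gb/s link carries at most 4 Gb/s of data, whereas 128b/132b keeps a 10 Gb/s link at roughly 9.7 Gb/s.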
Signaling States
USB communications employ differential signaling over the D+ and D- twisted-pair lines to transmit data reliably, using voltage differences to represent logical states while minimizing electromagnetic interference.[6] In USB 1.x and 2.0 full-speed (12 Mb/s) and low-speed (1.5 Mb/s) modes, three primary signaling states (J, K, and SE0, or single-ended zero) define the bus behavior. The J state serves as the idle condition for full-speed, where D+ is high (approximately 2.8–3.6 V) and D- low (0–0.3 V); a differential receiver only needs D+ to exceed D- by about 200 mV to register this state, so the driven levels leave a wide margin.[6] The K state, used for signaling transitions and data encoding, inverts this polarity: D+ low and D- high, yielding D+ < D-.[6] SE0 occurs when both lines are driven low (<0.3 V), suppressing differential signaling for control purposes like resets or end-of-packet (EOP) delineation.[6]
Low-speed mode inverts the J and K polarities relative to full-speed to accommodate devices with pull-up resistors on D- instead of D+, enabling speed detection via the host's measurement of line idle states.[6] In low-speed, the J state has D- high and D+ low, while K has D- low and D+ high, using the same voltage levels as full speed with the roles of the two lines reversed.[6] These states facilitate device attachment detection: a full-speed device's 1.5 kΩ pull-up on D+ pulls the idle line to the full-speed J state, whereas a low-speed device's pull-up on D- produces what a full-speed observer would read as idle K, prompting the host to adjust signaling accordingly.[6]
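The polarity rules for J, K, and SE0 can be summarized as a small lookup. The sketch below assumes each line has already been reduced to a boolean "high" or "low" reading; the function names and the "invalid" label for both lines high are illustrative choices, not terms taken from the text above.

```python
def line_state(d_plus_high: bool, d_minus_high: bool, low_speed: bool = False) -> str:
    """Classify a D+/D- reading as J, K, or SE0 for full-speed or low-speed."""
    if not d_plus_high and not d_minus_high:
        return "SE0"
    if d_plus_high and d_minus_high:
        return "invalid"  # both lines high is not a defined data state
    d_plus_dominant = d_plus_high  # exactly one line is high at this point
    if low_speed:
        return "K" if d_plus_dominant else "J"
    return "J" if d_plus_dominant else "K"


def attached_speed(d_plus_high: bool, d_minus_high: bool) -> str:
    """Infer device speed from which line the device's pull-up holds high at idle."""
    if d_plus_high and not d_minus_high:
        return "full-speed"
    if d_minus_high and not d_plus_high:
        return "low-speed"
    return "no device or indeterminate"


if __name__ == "__main__":
    print(line_state(True, False), line_state(True, False, low_speed=True))
    print(attached_speed(False, True))
```

The same electrical reading (D+ high, D- low) is J on a full-speed link and K on a low-speed link, which is exactly what lets the host tell the two device types apart at attach time.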
High-speed capability (480 Mb/s) in USB 2.0 is negotiated via the chirp protocol during connection, where devices initially operate in full-speed compatibility mode.[6] Upon reset, a high-speed-capable device signals its ability by driving a Chirp K (D- high for 1–7 ms) on the bus, prompting the host or hub to respond with a Chirp J/K sequence—alternating J and K states every 40–60 μs for at least three cycles—to confirm high-speed support and switch the downstream port to high-speed electrical characteristics.[6] If the chirp fails or is not detected, the device reverts to full-speed operation.[6]
Line transitions between these states manage bus control and power states. A reset is initiated by the host driving SE0 for at least 10 ms (though devices may begin reset detection after 2.5 μs of SE0), reinitializing the device and downstream ports.[6] Suspend occurs after 3 ms without bus activity (a continuous idle J state), reducing device power draw to ≤500 μA while keeping the connection alive.[6] Resume is signaled by driving the K state: the host holds it for at least 20 ms to awaken the bus, whereas a remote-wakeup-enabled device drives it for 1 to 15 ms, after which the upstream hub or host takes over the resume signaling; either way the bus then returns to active signaling.[6]
Error conditions, such as babble, arise when a device transmits beyond its allotted time, potentially disrupting the bus.[6] Hubs detect babble via timeouts, such as continuous activity exceeding 1 ms after EOP or failure to receive a start-of-frame (SOF) token within a frame interval (1 ms for full-speed), triggering port disablement to isolate the fault.[6] This ensures bus integrity by enforcing transaction timeouts and preventing indefinite signaling.[6]
Transmission Process
In USB communications, the host initiates all transmissions on the bus, employing token-based access control to prevent collisions by scheduling packets and ensuring exclusive use of the medium. This host-centric approach eliminates the need for carrier sensing or arbitration among devices, as the host dictates the timing and sequence of all bus activity.[17]
The transmission of a packet commences with a sync field, which aligns the receiver's bit clock to the transmitter's. In full-speed mode (12 Mbit/s), the sync field consists of an 8-bit pattern encoded as KJKJKJKK using NRZI signaling, where the final two K states signal the start of the packet identifier. Following the sync, the packet identifier (PID) is transmitted as an 8-bit value, comprising 4 data bits that define the packet function and their bitwise complement for error detection, allowing the receiver to decode the packet's role in the transaction. The data payload, if present, follows the PID, carrying up to 1023 bytes in full-speed isochronous transfers or 64 bytes in control and bulk transfers, with a bit stuffed after every six consecutive 1s to maintain clock recovery. A cyclic redundancy check (CRC) then verifies integrity: a 5-bit CRC5 for token packets or a 16-bit CRC16 for data packets, computed over the preceding fields to detect errors. The packet concludes with an end-of-packet (EOP) delimiter, signaled by driving both data lines low (SE0 state) for 2 bit times (approximately 160-175 ns at full speed), followed by a return to the idle J state for 1 bit time.[18]
Signal quality at the receiver is verified during compliance testing through eye pattern analysis, which assesses voltage and timing margins against the eye opening required for each speed. This keeps the overall bit error rate (BER) below the recommended limit of 10^{-12} for high-speed (480 Mbit/s) operation, providing robust error performance over typical cable lengths. After the transmitter releases the bus (post-EOP), the bus idles in the J state for an inter-packet delay of at least two bit times to allow line settling and prevent contention; at full speed the receiver must then begin its response within roughly 7.5 bit times as seen by the host.[18]
For a representative full-speed OUT token packet (used to initiate host-to-device data transfer), the bit sequence is as follows:
Sync (8 bits): K J K J K J K K
PID (8 bits, OUT token): 1000 0111 (PID 0001, then its complement 1110, each sent LSB first; byte value 0xE1)
ADDR (7 bits): [device address, e.g., 0000001]
ENDP (4 bits): [endpoint number, e.g., 0001]
CRC5 (5 bits): [computed over ADDR and ENDP, e.g., 11010 for the values above]
EOP: SE0 (≥2 bit times) + J (1 bit time)
Excluding the EOP, this sequence has a fixed length of 32 bits (8 sync + 8 PID + 7 ADDR + 4 ENDP + 5 CRC5) and exemplifies the PHY-level framing before any data follows in a subsequent packet. If the receiver detects a CRC error or a corrupted PID during validation, it ignores the packet and returns no handshake at all; the host notices the missing response when its bus timeout expires and retries the transmission, and repeated failures (typically after three attempts) may lead to transaction abortion. A NAK handshake, by contrast, is reserved for a correctly received packet that the device cannot service yet.[18]
Protocol Layer Basics
Packet Structure Overview
USB packets follow a standardized format to ensure reliable data transmission across the bus. Every packet begins with an 8-bit synchronization (SYNC) field for low- and full-speed operations, consisting of the data bits 00000001 (which appear on the wire as KJKJKJKK after NRZI encoding) to align the receiver's clock with the transmitter's; high-speed packets use a 32-bit SYNC field for improved synchronization at faster rates.[6] Following the SYNC is the 8-bit packet identifier (PID) field, which encodes the packet type using a 4-bit code in the lower nibble, complemented by its bitwise inverse in the upper nibble so that a corrupted PID can be detected: if the two nibbles are not complements, the packet is discarded. For instance, the PID for an OUT token packet is 0001 in the lower nibble and 1110 in the upper (binary 11100001 or 0xE1).[17][19]
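The PID check described above is easy to express in code. The following sketch builds the 8-bit PID from a 4-bit code, validates a received byte, and lists the bits in the order they would appear on the wire, assuming the usual least-significant-bit-first transmission; all function names are illustrative.

```python
def make_pid(code: int) -> int:
    """Build the 8-bit PID: code in the low nibble, its complement in the high nibble."""
    if not 0 <= code <= 0xF:
        raise ValueError("PID code must fit in 4 bits")
    return ((~code & 0xF) << 4) | code


def pid_is_valid(pid_byte: int) -> bool:
    """True when the high nibble is the bitwise complement of the low nibble."""
    return (pid_byte & 0xF) == (~(pid_byte >> 4) & 0xF)


def pid_wire_bits(pid_byte: int):
    """Bits in transmission order, assuming LSB-first serialization."""
    return [(pid_byte >> i) & 1 for i in range(8)]


if __name__ == "__main__":
    out_pid = make_pid(0b0001)
    print(hex(out_pid), pid_is_valid(out_pid), pid_wire_bits(out_pid))
    print(pid_is_valid(0xE0))  # single flipped bit is rejected
```

For the OUT token this gives 0xE1, and flipping any single bit of the byte breaks the complement relationship, so the receiver drops the packet.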
The PID is succeeded by variable fields specific to the packet type, such as a 7-bit device address and 4-bit endpoint number for token packets, or payload data for data packets, allowing addressing of up to 127 devices and 16 endpoints per device. These fields are followed by a cyclic redundancy check (CRC) for integrity verification: 5 bits covering the address and endpoint in token packets, or 16 bits over the data payload in data packets. The packet concludes with an end-of-packet (EOP) delimiter, signaled by driving both data lines low (single-ended zero, SE0) for at least two bit times, followed by a transition to the idle state (J state) for one bit time to indicate completion.[6][17]
Packet payload sizes vary by USB version and transfer type to balance bandwidth and error handling. In USB 1.x specifications, low-speed devices support a maximum of 8 bytes per packet, while full-speed control transfers use an 8-byte setup packet, with data packets up to 64 bytes for bulk and interrupt endpoints. USB 2.0 extended this in high-speed mode, allowing up to 512 bytes for bulk transfers and 1024 bytes for isochronous transfers to accommodate higher throughput demands.[20][21] Historically, the USB 1.0 specification released in 1996 defined the initial packet format with these low- and full-speed limits, emphasizing simplicity for early peripherals; the USB 2.0 specification in 2000 introduced high-speed enhancements to support broader multimedia applications.[6][22]
In USB 3.x SuperSpeed modes, packets are no longer delimited by the SYNC and PID fields of USB 2.0; they are framed by ordered sets of 8b/10b control symbols, and the data symbols between them are scrambled to mitigate electromagnetic interference (EMI) by breaking up repetitive bit patterns. Scrambling XORs the data with the output of a linear feedback shift register (LFSR) with the polynomial G(X) = X^{16} + X^5 + X^4 + X^3 + 1, reset on comma symbols; DC balance comes from the 8b/10b code itself, and data integrity is unaffected because the receiver applies the same XOR to recover the original bytes. CRC lengths extend to 32 bits for enhanced error detection on data payloads, supporting packet sizes up to 1024 bytes, with bursts of several packets for larger transfers.[23]
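A keystream scrambler built on this polynomial can be sketched as follows. The sketch assumes a seed of 0xFFFF after reset and that each data byte is XORed with eight LFSR output bits, least significant bit first; both are assumptions made for illustration, the function names are hypothetical, and details such as which symbols bypass scrambling are not modeled.

```python
def lfsr_bits(count: int, seed: int = 0xFFFF):
    """Yield `count` output bits of a 16-bit LFSR for X^16 + X^5 + X^4 + X^3 + 1."""
    state = seed
    for _ in range(count):
        out = (state >> 15) & 1
        state = (state << 1) & 0xFFFF
        if out:
            state ^= 0x0039  # feedback into bits 5, 4, 3 and 0
        yield out


def scramble(data: bytes, seed: int = 0xFFFF) -> bytes:
    """XOR each byte with 8 keystream bits (LSB first). Applying it twice restores the data."""
    keystream = lfsr_bits(8 * len(data), seed)
    result = bytearray()
    for byte in data:
        key = 0
        for bit_index in range(8):
            key |= next(keystream) << bit_index
        result.append(byte ^ key)
    return bytes(result)


if __name__ == "__main__":
    plain = bytes(4)  # a long run of zeros, the worst case for EMI
    coded = scramble(plain)
    print(coded.hex(), scramble(coded) == plain)
```

Because the operation is a plain XOR with a deterministic keystream, the receiver descrambles by running the identical function from the same seed.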
Token Packets
Token packets in USB communications serve as host-initiated signals to manage and direct data transactions on the bus, specifying the recipient device, endpoint, and transfer type without carrying payload data themselves. These packets are essential for coordinating communication between the host and peripherals, ensuring orderly access to the shared bus. In the USB protocol, token packets precede data or handshake packets in most transactions, providing the framework for control, bulk, interrupt, and isochronous transfers.[6]
The general structure of a token packet includes a synchronization field (Sync), an 8-bit packet identifier (PID), a 7-bit device address (ADDR), a 4-bit endpoint number (ENDP), and a 5-bit cyclic redundancy check (CRC5) for integrity verification, followed by an end-of-packet delimiter. The Sync field, consisting of 8 bits in full- and low-speed modes or 32 bits in high-speed mode, aligns the receiver's clock and marks the packet's start. The PID encodes the token subtype in its lower 4 bits, with the upper 4 bits serving as a bitwise complement for error detection. ADDR identifies the target device (ranging from 0 to 127), while ENDP specifies the endpoint (0 to 15, with endpoint 0 reserved for control). The CRC5, computed using the polynomial G(X) = X^5 + X^2 + 1, covers the ADDR and ENDP fields to detect transmission errors, yielding a residual of 01100 in binary for error-free packets.[6]
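The CRC5 computation and the residual check mentioned above can be sketched with a bit-serial shift register. The sketch assumes the conventional USB procedure: the register starts at all ones, fields are fed least significant bit first, and the inverted register is appended most significant bit first. The function names are illustrative.

```python
def crc5_register(bits, register: int = 0b11111) -> int:
    """Shift bits through the CRC5 register for G(X) = X^5 + X^2 + 1."""
    for bit in bits:
        feedback = bit ^ (register >> 4)
        register = (register << 1) & 0b11111
        if feedback:
            register ^= 0b00101
    return register


def lsb_first(value: int, width: int):
    """Return `width` bits of `value`, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def token_field_bits(addr: int, endp: int):
    """ADDR (7 bits) + ENDP (4 bits) + CRC5 (5 bits) in transmission order."""
    fields = lsb_first(addr, 7) + lsb_first(endp, 4)
    crc = crc5_register(fields) ^ 0b11111  # transmitted CRC is the inverted register
    crc_bits = [(crc >> i) & 1 for i in range(4, -1, -1)]  # MSB first
    return fields + crc_bits


if __name__ == "__main__":
    bits = token_field_bits(addr=1, endp=1)
    print("".join(map(str, bits)))
    print(format(crc5_register(bits), "05b"))  # residual seen by the receiver
```

A receiver runs the same register over all 16 bits and accepts the token only if the final register value equals the residual 01100.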
The OUT token, identified by PID 0xE1 (binary 11100001), initiates a data transfer from the host to the device, directing the specified endpoint to prepare for incoming data. Similarly, the IN token, with PID 0x69 (binary 01101001), requests data from the device to the host, prompting the endpoint to transmit its response. Both OUT and IN tokens share the same ADDR and ENDP fields, enabling precise targeting in non-isochronous transfers. The SETUP token, using PID 0x2D (binary 00101101), is structurally identical to OUT but signals the initial phase of a control transfer, where the host sends setup data to configure or query the device. In high-speed USB 2.0 and later, the PING token (PID 0xB4, binary 10110100) probes a device's buffer availability for OUT transactions without transmitting data, reducing bus overhead by avoiding unnecessary retries in flow-controlled bulk or control transfers.[6]
For bus synchronization, the Start of Frame (SOF) token, encoded with PID 0xA5 (binary 10100101), is broadcast periodically by the host: every 1 ms in full-speed mode to mark frame boundaries or every 125 μs in high-speed mode for microframes. Unlike other tokens, SOF carries an 11-bit frame number in place of the 7-bit ADDR and 4-bit ENDP fields, which maintains timing synchronization across devices and aids bandwidth allocation and error recovery. Token packets play a pivotal role in structuring USB transactions by defining their initiation and direction.[6]
| Token Type | PID (Hex) | Purpose | Key Fields |
|---|---|---|---|
| OUT | 0xE1 | Host-to-device data initiation | ADDR (7 bits), ENDP (4 bits) |
| IN | 0x69 | Device-to-host data request | ADDR (7 bits), ENDP (4 bits) |
| SETUP | 0x2D | Control transfer setup phase | ADDR (7 bits), ENDP (4 bits) |
| PING | 0xB4 | High-speed buffer probe | ADDR (7 bits), ENDP (4 bits) |
| SOF | 0xA5 | Frame/microframe synchronization | Frame number (11 bits) |
Data Packets
Data packets in USB communications serve as the primary mechanism for transferring payload data between the host and devices, following an initiating token packet. These packets encapsulate the actual data bytes intended for endpoints, ensuring reliable delivery through built-in sequencing and error detection. Unlike token packets, which only initiate transactions, data packets carry variable-length payloads protected by cyclic redundancy checks (CRC). Their direction—either OUT (host-to-device) or IN (device-to-host)—is determined by the preceding token packet, aligning with the transaction's intent as defined in endpoint descriptors.[6][24]
In USB 2.0 and earlier, data packets begin with an 8-bit Packet Identifier (PID) that alternates between DATA0 (binary 11000011, 0xC3) and DATA1 (binary 01001011, 0x4B) to maintain sequencing integrity. This toggle mechanism, often referred to as the data toggle bit, allows both sender and receiver to track expected packets; a mismatch indicates a missing or repeated packet, triggering retransmission or error handling at the protocol layer. The payload follows the PID and is always a whole number of bytes, from 0 up to the maximum packet size specified in the endpoint descriptor: for example, up to 512 bytes for high-speed bulk, 1,024 bytes for high-speed isochronous, and 1,023 bytes for full-speed isochronous endpoints. A 16-bit CRC (CRC16) is appended to the payload for error detection, computed using the polynomial X^{16} + X^{15} + X^2 + 1, covering the entire data field to verify integrity upon receipt.[6]
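The CRC16 over a data payload can be computed bytewise with the bit-reflected form of this polynomial (0xA001), which matches least-significant-bit-first transmission. The sketch below assumes the conventional parameters (register preset to 0xFFFF, result inverted); how the 16 bits are serialized onto the wire is not modeled, and the function name is illustrative.

```python
def crc16_usb(payload: bytes) -> int:
    """CRC16 for X^16 + X^15 + X^2 + 1, processed LSB first (reflected 0xA001)."""
    crc = 0xFFFF
    for byte in payload:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc ^ 0xFFFF


if __name__ == "__main__":
    print(hex(crc16_usb(b"123456789")))
    print(hex(crc16_usb(b"")))  # zero-length data packet
```

A zero-length payload yields a CRC of 0x0000, which is why a zero-length DATA0 packet appears on a bus analyzer as the PID byte followed by two zero bytes.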
USB 3.x introduces enhancements for higher throughput and link-layer reliability, expanding data packets to include a dedicated Data Packet Header (DPH) preceding the payload. The DPH, typically 16 bytes in SuperSpeed (with extensions in later generations), incorporates fields such as endpoint number, data length, and a header sequence number (initially 3 bits, modulo-8, for SuperSpeed), enabling ordered delivery and recovery from transmission errors through retransmission protocols. The payload supports up to 1024 bytes per packet, with bursts enabling larger effective transfers to accommodate SuperSpeed+ bandwidth, while still enforcing per-endpoint maximum sizes and byte-aligned padding via zero-filling if the data is shorter. Error protection upgrades to a 32-bit CRC (CRC32) over the payload, using the polynomial 0x04C11DB7, which provides stronger detection for larger data blocks and supports retry mechanisms in the link layer.[24]
Handshake Packets
Handshake packets in USB communications are short response packets transmitted by the receiver (typically a device or hub) to indicate the status of a transaction following a token and optional data packet. They play a crucial role in providing feedback on data reception success, errors, or readiness, enabling reliable flow control without the overhead of full data payloads. These packets are defined in the USB 2.0 specification, where they form the handshake phase of transactions in control, bulk, and interrupt transfers, but are absent in isochronous transfers due to their time-sensitive nature.[25]
The structure of a handshake packet is minimal to ensure low latency: it consists of an 8-bit synchronization (SYNC) field, an 8-bit packet identifier (PID) field, and an end-of-packet (EOP) marker, totaling approximately 3 bytes in transmission. The PID field encodes the specific response type using a 4-bit code followed by its bitwise complement for basic error detection, eliminating the need for a cyclic redundancy check (CRC) due to the packet's brevity and non-data-carrying purpose. Unlike data packets, handshake packets contain no address, endpoint, or payload fields, focusing solely on status signaling.[25]
The ACK (PID 0xD2) handshake confirms successful reception of a data packet without errors, including correct CRC validation if applicable, and is issued by the receiver to acknowledge integrity in control, bulk, and interrupt transactions. In contrast, the NAK (PID 0x5A) signals that the device is temporarily unable to transmit or receive data, such as when an endpoint buffer is not ready, prompting the host to retry later; this is commonly used in bulk transfers for flow control to avoid overwhelming slower devices. The STALL (PID 0x1E) indicates a more severe condition, such as a halted endpoint due to a permanent error or unsupported request, requiring the host to issue a Clear Feature command to resume operations. Introduced in USB 2.0 for high-speed operations, the NYET (PID 0x96) response denotes "not yet": in a high-speed bulk or control OUT transaction it means the data was accepted but the endpoint has no room for another packet, and in a split transaction it means the transaction translator has not finished. The host then probes with PING or repeats the complete-split instead of resending data blindly.[25]
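The four handshake PIDs can be decoded with a small table, applying the PID complement check first since handshakes carry no CRC. The reaction strings are loose paraphrases of the descriptions above, not normative host-controller behavior, and every name in the sketch is illustrative.

```python
HANDSHAKE_PIDS = {0xD2: "ACK", 0x5A: "NAK", 0x1E: "STALL", 0x96: "NYET"}

# Hypothetical host-side reactions, paraphrasing the text above.
HOST_REACTION = {
    "ACK": "transaction complete; move on",
    "NAK": "endpoint busy; retry later",
    "STALL": "endpoint halted; clear the halt before retrying",
    "NYET": "not ready yet; try again shortly",
}


def decode_handshake(pid_byte: int):
    """Return the handshake name, or None for a corrupt or non-handshake PID."""
    low, high = pid_byte & 0xF, (pid_byte >> 4) & 0xF
    if low != (~high & 0xF):
        return None  # complement check failed: treat as if nothing was received
    return HANDSHAKE_PIDS.get(pid_byte)


if __name__ == "__main__":
    for pid in (0xD2, 0x5A, 0x1E, 0x96, 0xD3):
        name = decode_handshake(pid)
        print(hex(pid), name, HOST_REACTION.get(name, "ignore"))
```

A byte such as 0xD3 fails the complement check and is ignored, which the host experiences as a timeout rather than as an explicit error response.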
Handshake packets facilitate essential flow control mechanisms in USB, particularly preventing buffer overflows in devices with limited resources by using NAK and NYET to throttle data rates dynamically during transactions. For instance, in bulk OUT transfers, repeated NAKs from the device signal backpressure until buffers are cleared, ensuring reliable delivery without data loss. Their integration into transaction flows, such as the status stage of control transfers, allows hosts to detect and respond to issues promptly, maintaining overall bus stability. The absence of CRC in these packets underscores their role as lightweight status indicators, relying instead on the inherent robustness of the USB physical layer and PID complement for detection of transmission errors.[25]
Advanced Protocol Features
Special Packets
In USB communications, special packets handle exceptional scenarios such as legacy speed compatibility, transaction splitting in mixed-speed environments, error indication at the physical layer, and power management transitions. These packets deviate from standard token, data, and handshake formats to support synchronization, error recovery, and efficiency in evolving USB generations.[6][26]
The PRE (Preamble) packet, defined in the USB 2.0 specification, tells hubs on a full-speed bus segment that the traffic that follows is low-speed (1.5 Mb/s) and intended for low-speed devices. It consists of an 8-bit SYNC field at full-speed followed by an 8-bit PRE PID (binary 00111100 or 0x3C); unlike other packets it is not terminated by an EOP. Upon detection, hubs must drive a 'J' state on enabled low-speed ports within four full-speed bit times and prepare their repeaters for low-speed signaling, ensuring downstream packets propagate correctly to low-speed endpoints. This packet precedes all low-speed packets sent downstream by the host (e.g., OUT, IN, SETUP tokens), with a minimum setup interval of four full-speed bit times before the subsequent low-speed SYNC. Full-speed devices other than hubs simply ignore the PRE PID and the low-speed traffic that follows it.[6]
SSPLIT and CSPLIT packets, introduced in USB 2.0 for high-speed hubs, facilitate split transactions to bridge high-speed (480 Mb/s) hosts with full-speed or low-speed devices attached via transaction translators (TTs). The SSPLIT (Start-Split) packet initiates the transaction by queuing a request or data in the TT during one microframe, while the CSPLIT (Complete-Split) packet retrieves the response or status in a subsequent microframe. Both are special token packets that share a single SPLIT PID (binary 01111000 or 0x78) and are told apart by a flag inside the token. After the SYNC and PID, the token carries a 7-bit hub address, a 1-bit start/complete (SC) flag (0 for start, 1 for complete), a 7-bit port number, a 1-bit speed (S) flag (0 for full-speed, 1 for low-speed), a 1-bit end (E) or reserved (U) flag, a 2-bit endpoint type (ET: 00 control, 01 isochronous, 10 bulk, 11 interrupt), and a 5-bit CRC; the device address and endpoint number travel in the ordinary token that follows the split token. These fields route the transaction to the specific hub and port, with the host controller managing the split invisibly to devices. Hubs buffer and translate the transaction, returning handshakes (e.g., ACK, NAK, STALL) via CSPLIT based on TT status.[6]
In USB 3.0 and later, physical layer (PHY) errors such as disparity violations or invalid 8b/10b decoding are signaled to enable link-level recovery and maintain a bit error rate below 1 in 10^12 for data and 1 in 10^20 for headers. These errors are handled through mechanisms like the LBAD (Link Bad) link command, which the receiving port returns to its link partner in response to header errors (e.g., CRC-5/CRC-16 failures), and the SUB symbol (K28.4), which substitutes invalid symbols during reception. Upon detection of a PHY error, the receiver may queue an LBAD for transmission, potentially triggering retransmission, flow control adjustments, or entry into Recovery state if errors persist. Hubs validate incoming headers and discard erroneous packets, restoring buffer credits only for valid ones, while transmitters may abort via DPPABORT ordered sets if babble or timeout occurs. This mechanism integrates with CRC fields in all packets for end-to-end integrity.[26]
USB 3.x link power management moves an idle link from the active U0 state into the low-power states U1 (fast exit latency ~1 µs) and U2 (slower exit, 2 µs to 65 ms), reducing power while preserving quick recovery. Entry is requested with link commands rather than data packets: each 8-symbol link command consists of a 4-symbol LCSTART sequence (three SLC symbols and one EPF) followed by a 2-symbol Link Command Word (11-bit command + 5-bit CRC-5) that is sent twice. Commands such as LGO_U1 or LGO_U2 request entry (with no data payload), LAU accepts, LXU rejects, and LPMA confirms after acceptance, governed by timers like PM_LC_TIMER (3 µs) and PM_ENTRY_TIMER (6 µs). A Link Management Packet (LMP) of subtype Set Link Function carries the Force_LinkPM_Accept bit, which tells a port to accept such requests rather than reject them. The host enables device-initiated entry through the U1_ENABLE (selector 48) and U2_ENABLE (selector 49) features, whose state is reported by GetStatus, and configures hub port timeouts using SetPortFeature (e.g., PORT_U1_TIMEOUT: 1-127 µs steps). Hubs propagate requests but reject them if traffic is pending, buffering packets until U0 resumption, and integrate with Latency Tolerance Messages (LTM) for BELT values (default 1 ms) to optimize idle tolerance. This ensures energy efficiency without disrupting isochronous flows.[26]
Speed Negotiation
Speed negotiation in USB communications establishes the operating speed between a host and device during initial connection and subsequent events, ensuring compatibility across low-speed (1.5 Mb/s), full-speed (12 Mb/s), high-speed (480 Mb/s), SuperSpeed (5 Gb/s), and higher rates in later versions. This process begins with a bus reset and uses specific signaling sequences to detect and confirm the highest mutually supported speed, preventing mismatches that could lead to communication failures. The negotiation is version-specific but builds on shared principles of reset detection and response handshakes, with fallback mechanisms to lower speeds if higher capabilities are not verified.[6][26]
In USB 2.0, the initial connection starts with a reset sequence where the host or hub drives a Single-Ended Zero (SE0) state on the differential pair for at least 10 ms (T_DRST ≥ 10 ms). High-speed capable devices detect this reset after a minimum of 2.5 µs (T_FILTSE0) and respond with a chirp-K signal lasting 1 to 7 ms (T_UCH ≥ 1.0 ms, T_UCHEND ≤ 7.0 ms) to indicate high-speed capability. The host or hub then continues the high-speed handshake by sending an alternating sequence of chirp-K and chirp-J signals (each 40-60 µs, T_DCHBIT), starting within 100 µs (T_WTDCH) after the device's chirp-K ends and continuing until shortly before the reset ends (T_DCHSE0, at most 500 µs). Once the device has detected at least three chirp-K/chirp-J pairs (K-J-K-J-K-J), it disconnects its D+ pull-up, enables its high-speed terminations, and enters the high-speed default state. If the handshake succeeds, the link operates at 480 Mb/s with 125 µs microframes; otherwise, the device remains in full-speed operation, signaled by its 1.5 kΩ pull-up on the D+ line, with 1 ms frames. This process is detailed in Section 7.1.7.5 of the USB 2.0 specification.[6]
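The outcome of the chirp handshake can be modeled with a minimal Python sketch. The names `Chirp`, `count_valid_kj_pairs`, and `negotiated_speed` are hypothetical, and the sketch assumes that the device requires three correctly timed chirp-K/chirp-J pairs from the host before switching to high speed; it checks only the T_DCHBIT window quoted above and ignores the other timing parameters.

```python
from dataclasses import dataclass

# Per-chirp duration window (T_DCHBIT) in microseconds, as quoted in the text.
T_DCHBIT_MIN_US = 40
T_DCHBIT_MAX_US = 60
# Assumption: the device needs three K-J pairs (K-J-K-J-K-J) to accept high speed.
MIN_KJ_PAIRS = 3


@dataclass(frozen=True)
class Chirp:
    state: str          # "K" or "J"
    duration_us: float  # measured duration of this chirp


def count_valid_kj_pairs(chirps):
    """Count consecutive, correctly timed K-J pairs from the start of the sequence."""
    pairs = 0
    i = 0
    while i + 1 < len(chirps):
        k, j = chirps[i], chirps[i + 1]
        timing_ok = all(
            T_DCHBIT_MIN_US <= c.duration_us <= T_DCHBIT_MAX_US for c in (k, j)
        )
        if k.state != "K" or j.state != "J" or not timing_ok:
            break  # the sequence is broken; stop counting
        pairs += 1
        i += 2
    return pairs


def negotiated_speed(device_chirped_k, host_chirps):
    """Return the speed the link would settle on after reset in this toy model."""
    if device_chirped_k and count_valid_kj_pairs(host_chirps) >= MIN_KJ_PAIRS:
        return "high-speed"
    return "full-speed"
```

A device that never sends its own chirp-K, or a host sequence that is cut short, leaves the link at full speed in this model, which mirrors the fallback described above.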
For USB 3.x, SuperSpeed detection integrates with the USB 2.0 reset but extends it using Low-Frequency Periodic Signaling (LFPS) at approximately 20 MHz to probe for enhanced capabilities after the full-speed fallback. During the post-reset phase (0-1 ms, tCheckSuperSpeedOnReset), the host sends LFPS bursts (duration ≥100 µs, inter-burst gap ≥2 ms) in the Polling.LFPS substate of the Link Training and Status State Machine (LTSSM). A SuperSpeed-capable device responds with at least two LFPS bursts, prompting the host to send four more, completing the handshake within 12 ms and transitioning to SSK/SSJ states (SuperSpeed equivalents of K/J) within 20 ns after LFPS cessation. Successful negotiation establishes a 5 Gb/s dual-simplex link after training sequences (TS1/TS2 ordered sets) and Port Capability Link Management Packets (LMPs) confirm the speed, with fallback to USB 2.0 if no response is detected within 360 ms timeout. LFPS also supports entry/exit from low-power states like U1/U2 (latencies ~1-2 µs) and U3 suspend (up to 20 ms), where resume triggers partial re-negotiation via LFPS bursts (≥10 µs for U3 wakeup) followed by abbreviated link training to restore the prior speed without full reset. This is outlined in Sections 6.9, 7.5.4.3, and 10.16.2 of the USB 3.0 specification.[26]
Re-negotiation occurs during suspend/resume cycles or in USB Type-C configurations involving port swaps, ensuring speed adaptation to changing conditions. On resume from USB suspend (e.g., U3 state), devices monitor CC pin voltages (vRd) and use LFPS to signal wakeup, followed by CC re-detection and link re-training to re-establish the negotiated speed, maintaining power delivery (e.g., 1.5 A or 3 A) during the transition as per USB suspend rules. In USB Type-C, port role swaps (PR_Swap for power roles or DR_Swap for data roles) via USB Power Delivery (PD) messaging prompt re-evaluation: after swap stabilization, the system re-detects connection via CC pins, reconfigures data lanes, and performs enumeration or abbreviated training to renegotiate speed, potentially falling back if cable or device limits are exceeded. This process, detailed in Sections 4.5, 4.6.1, and 5.4.4.2 of the USB Type-C Release 2.0 specification, avoids full resets by leveraging existing PD contracts.[27]
USB4 (specification released 2019) extends negotiation to integrate PCIe and Thunderbolt protocols, supporting up to 40 Gb/s via dual-lane aggregation (20 Gb/s aggregate operation is mandatory, 40 Gb/s is optional). Initial connection uses the Sideband Channel (two-wire interface) alongside LFPS for link initialization, where the Connection Manager enumerates routers and configures paths after basic LFPS handshakes establish electrical idle and receiver detection. Speed is negotiated dynamically during this phase, tunneling USB 3.x, PCIe (e.g., via DN/UP adapters for tree maintenance), and DisplayPort over the USB Type-C link, with bandwidth allocation (e.g., 2/3 to USB 3.x, 1/3 to PCIe initially) adjusted via control packets. Events like hot-plug or role swaps trigger re-entry into discovery, renegotiating up to 40 Gb/s using sideband signaling for management without interrupting tunneled traffic. This integration is described in the USB4 Version 1.0 system overview from the USB Implementers Forum.
USB 3.x and Later Enhancements
USB 3.0 introduced the SuperSpeed physical layer (PHY), which employs separate differential pairs for transmit (TX) and receive (RX) paths to enable full-duplex communication at up to 5 Gbit/s, contrasting with the half-duplex nature of prior USB versions. This design utilizes 8b/10b encoding to ensure DC balance and reliable clock recovery over copper cables up to 3 meters in length.[28] The link layer in USB 3.0 adds header packets (HPs) that precede data or handshake packets, carrying essential routing and control information, while implementing credit-based flow control to manage buffer availability—devices advertise credits for incoming headers, preventing overflow with up to four unacknowledged headers in a circular buffer. Retry mechanisms further enhance reliability by retransmitting errored headers via link commands like LRTY (Link Retry).[29]
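The credit-based flow control described above can be illustrated with a minimal Python sketch. The class and method names are hypothetical, and the model tracks only a count of header credits (four, following the text); it does not represent the actual link commands or buffer contents.

```python
class HeaderCreditCounter:
    """Toy model of a transmitter's view of the receiver's header buffer credits."""

    def __init__(self, max_credits: int = 4) -> None:
        self.max_credits = max_credits
        self.available = max_credits

    def try_send_header(self) -> bool:
        """Consume one credit if any remain; return False when the sender must wait."""
        if self.available == 0:
            return False
        self.available -= 1
        return True

    def credit_returned(self) -> None:
        """Record that the receiver has freed one header buffer slot."""
        if self.available >= self.max_credits:
            raise RuntimeError("credit returned with no header outstanding")
        self.available += 1
```

In this model a fifth header cannot be sent until one credit comes back, which is the overflow protection the paragraph describes.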
At the protocol layer, USB 3.0 enhances bulk transfers with streams, allowing up to 65,536 (2^16) logical streams per endpoint, advertised as a MaxStreams exponent of up to 16, to multiplex independent data flows without endpoint proliferation, improving efficiency for applications like mass storage. Isochronous transfers gain support for precise timing via Isochronous Timestamp Packets (ITPs), broadcast by the host every 125 μs to synchronize real-time data streams such as audio or video, replacing USB 2.0's Start of Frame packets. Power management introduces link states U0 (active), U1/U2 (low-power idle with fast/slow exit latencies of <10 μs and <100 μs, respectively), and U3 (suspend), with Low-Frequency Periodic Signaling (LFPS) bursts facilitating entry and exit without full link retraining.[30][29][31]
Subsequent revisions build on these foundations for higher performance. USB 3.1 Gen 2 doubles the speed to 10 Gbit/s with refined equalization and shifts from 8b/10b to 128b/132b encoding, reducing line-coding overhead from 20% to ~3% for greater effective throughput, while USB 3.2 introduces Gen 2x2 for 20 Gbit/s via dual 10 Gbit/s lanes. USB4, released in 2019 and updated to version 2.0 in 2022, supports up to 80 Gbit/s per direction symmetrically (160 Gbit/s aggregate bidirectional) or asymmetrically up to 120 Gbit/s in one direction with 40 Gbit/s in the other, enabling dynamic bandwidth allocation. It introduces tunneling for multiple protocols, including USB 3.2, DisplayPort (up to version 2.0 for 8K displays), and PCIe (up to Gen 4 for external GPUs or storage), with intelligent routing via a connection-oriented architecture. USB4 also incorporates role swapping for data and power roles, allowing seamless host-device transitions in dual-role ports.[32][33][34]
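The overhead figures for the two line codes follow from simple arithmetic, shown here as a short Python sketch. The function names are illustrative, and the calculation covers line-coding overhead only; packet framing, headers, and link commands reduce real throughput further.

```python
def line_code_overhead(payload_bits: int, block_bits: int) -> float:
    """Fraction of transmitted bits that carry no payload."""
    return 1 - payload_bits / block_bits


def effective_rate_gbps(raw_gbps: float, payload_bits: int, block_bits: int) -> float:
    """Payload bit rate left after line coding, in Gbit/s."""
    return raw_gbps * payload_bits / block_bits


# 8b/10b: 8 payload bits in every 10 transmitted bits (20% overhead).
overhead_8b10b = line_code_overhead(8, 10)
rate_5g = effective_rate_gbps(5, 8, 10)          # 4.0 Gbit/s

# 128b/132b: 128 payload bits in every 132 transmitted bits (about 3% overhead).
overhead_128b132b = line_code_overhead(128, 132)
rate_10g = effective_rate_gbps(10, 128, 132)     # roughly 9.7 Gbit/s
```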
Transactions and Data Transfers
Control Transactions
Control transactions in USB communications, also referred to as control transfers, serve as the primary mechanism for host-device interaction, enabling device enumeration, configuration, status inquiries, and command issuance over endpoint 0 (the default control endpoint). These transfers are inherently bidirectional, allowing data flow in either direction as needed, and are essential for initializing and managing USB devices before higher-level data transfers commence. Unlike other transfer types, control transactions guarantee reliable delivery through handshakes and error protocols, ensuring robust communication for critical setup operations.[35]
A control transaction follows a structured three-phase sequence: the SETUP phase, an optional DATA phase, and the STATUS phase. In the SETUP phase, the host transmits a SETUP token packet followed by a DATA0 data packet carrying the 8-byte request, structured as follows: bmRequestType (1 byte, specifying direction, type, and recipient), bRequest (1 byte, the request code), wValue (2 bytes, request-specific value), wIndex (2 bytes, index or language ID), and wLength (2 bytes, number of bytes in the DATA phase). This phase always occurs and defines the parameters for the entire transfer. The DATA phase, if required by the request, follows and can transfer up to 65,535 bytes of data (as specified by wLength), split across one or more data packets that begin with DATA1 and then alternate between DATA0 and DATA1, with the direction determined by bmRequestType. Finally, the STATUS phase confirms successful completion with a zero-length data packet sent in the opposite direction of the DATA phase (or from device to host if there is no DATA phase), accompanied by an ACK handshake from the recipient to indicate acceptance.[36][37]
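The 8-byte request layout can be made concrete with a short Python sketch that packs the five fields in little-endian order using the standard `struct` module. The helper names are hypothetical. The example values (request code 6 for Get Descriptor, descriptor type 1 for the device descriptor in the high byte of wValue, and a length of 18 bytes) are the standard ones from the USB 2.0 specification rather than values given in the text above.

```python
import struct


def build_setup_packet(bm_request_type: int, b_request: int,
                       w_value: int, w_index: int, w_length: int) -> bytes:
    """Pack the five SETUP fields into the 8-byte little-endian payload."""
    return struct.pack("<BBHHH", bm_request_type, b_request,
                       w_value, w_index, w_length)


def decode_bm_request_type(value: int) -> dict:
    """Split bmRequestType into direction (bit 7), type (bits 6..5), recipient (bits 4..0)."""
    return {
        "direction": "device-to-host" if value & 0x80 else "host-to-device",
        "type": ("standard", "class", "vendor", "reserved")[(value >> 5) & 0x3],
        "recipient": value & 0x1F,
    }


# Get Descriptor (Device): read the 18-byte device descriptor from the device.
packet = build_setup_packet(0x80, 0x06, 0x0100, 0x0000, 18)
print(packet.hex())  # 8006000100001200
```

Because the request asks the device to return data, the DATA phase runs device-to-host and the STATUS phase is a zero-length packet from the host.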
Standard control requests, defined in the USB specification, include operations such as Get Descriptor (retrieves device, configuration, or interface descriptors), Set Address (assigns a unique address to the device after initial detection), and Clear Feature (disables a specific feature like endpoint halt or remote wakeup). These requests are used primarily during enumeration, the device initialization process, which begins with a bus reset issued by the host to set the device to its default address (0) and speed, followed by a sequence of control transactions to gather descriptors and configure the device. For instance, after reset, the host issues Get Descriptor to obtain the device descriptor, then Set Address to give the device the unique address used for all further communication.[38][39]
Error handling in control transactions ensures reliability; if a device receives an invalid request or cannot process it (e.g., due to unsupported features or resource constraints), it responds with a STALL handshake packet in the DATA or STATUS phase, signaling the host to abandon the transfer. On the default control endpoint this stall condition clears automatically when the device receives the next SETUP packet, so the host does not need to issue a CLEAR_FEATURE request before sending a new request. Separately, the host retries transaction errors such as CRC failures or missing responses, typically up to 3 times, before aborting the transfer, and the transfer as a whole is bounded by a timeout (typically 5 seconds).[36][40]
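How a host might react to each outcome of a transaction attempt can be sketched in Python. This is a simplified model, not a host controller implementation: the function name and the outcome labels are hypothetical, `"ERR"` stands for any transaction error such as a CRC failure or a missing response, and the limit of three errors follows the figure given above. A NAK is treated as "not ready, try again" and does not count as an error.

```python
def run_transaction(outcomes, max_errors: int = 3):
    """Walk through successive attempt outcomes and report how the transaction ends.

    Returns a (result, attempts_used) tuple, where result is one of
    "completed", "stalled", "aborted", or "pending".
    """
    errors = 0
    for attempt, outcome in enumerate(outcomes, start=1):
        if outcome == "ACK":
            return ("completed", attempt)
        if outcome == "STALL":
            return ("stalled", attempt)   # device signals it cannot proceed
        if outcome == "NAK":
            continue                      # flow control only; retry later
        errors += 1                       # CRC failure, timeout, and similar
        if errors >= max_errors:
            return ("aborted", attempt)
    return ("pending", len(outcomes))     # no final outcome observed yet
```

For example, `run_transaction(["NAK", "NAK", "ACK"])` completes on the third attempt, while three error outcomes in a row abort the transaction.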
In USB 3.x and later specifications, control transactions retain the core three-phase structure but incorporate SuperSpeed enhancements, including extended 16-byte packet headers for improved flow control, credit management, and sequence numbering to support higher throughput and error recovery without altering the fundamental SETUP-DATA-STATUS flow. These extended headers encapsulate the original USB 2.0 packet types (tokens, data, and handshakes) within a SuperSpeed envelope for backward compatibility and enhanced performance.
Bulk and Interrupt Transactions
Bulk transfers in USB are designed for reliable, non-time-critical movement of large data volumes, such as file transfers or printer data, without reserving bandwidth in advance.[41] A bulk OUT transaction begins with the host sending an OUT token packet to specify the endpoint, followed by a data packet containing the payload, and concludes with a handshake packet from the device acknowledging receipt (ACK), indicating successful reception, or negative acknowledgment (NAK) if the device is not ready, triggering a retry by the host.[21] Similarly, a bulk IN transaction starts with an IN token from the host, prompting the device to send a data packet, which the host then acknowledges with a handshake; retries ensure guaranteed delivery until completion or error.[41] This retry mechanism on NAK responses provides error recovery without upper-layer protocol involvement, distinguishing bulk from other transfer types.[6]
Interrupt transfers mirror the structure of bulk transfers, using token, data, and handshake packets, but are intended for small, periodic data exchanges from devices requiring timely polling, such as keyboards, mice, or game controllers reporting status changes.[42] The host polls the interrupt endpoint at an interval defined by the device during enumeration, typically ranging from 1 to 255 milliseconds for full-speed connections, ensuring low-latency notification without dedicating continuous bandwidth.[43] Like bulk, interrupt IN transactions involve the host issuing an IN token, the device responding with data if available (or with a NAK handshake if none), and a host handshake acknowledging any data received; OUT variants follow the reverse flow.[41] Both directions support flow control via NAK handshakes to delay transfers until the endpoint's buffer is ready, while STALL handshakes signal endpoint errors or halts, requiring host intervention to clear.[21]
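The polling interval is carried in the endpoint descriptor's bInterval field, and its interpretation depends on bus speed. The Python sketch below converts it to microseconds; the function name is hypothetical. The full-speed case follows the 1 to 255 ms range given above, while the high-speed formula, 2^(bInterval-1) microframes of 125 μs each with bInterval from 1 to 16, comes from the USB 2.0 endpoint descriptor definition rather than from the text.

```python
def interrupt_poll_period_us(speed: str, b_interval: int) -> int:
    """Convert an interrupt endpoint's bInterval to a polling period in microseconds."""
    if speed == "full":
        if not 1 <= b_interval <= 255:
            raise ValueError("full-speed bInterval must be 1..255")
        return b_interval * 1000                 # units of 1 ms frames
    if speed == "high":
        if not 1 <= b_interval <= 16:
            raise ValueError("high-speed bInterval must be 1..16")
        return 2 ** (b_interval - 1) * 125       # units of 125 us microframes
    raise ValueError(f"unsupported speed: {speed!r}")
```

A full-speed keyboard with bInterval 10 is polled every 10 ms, and a high-speed endpoint with bInterval 4 is polled every 8 microframes, or 1 ms.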
The maximum packet size for bulk and interrupt endpoints varies by USB speed and transfer type: up to 64 bytes at full speed (12 Mbps) for both; 512 bytes for bulk and up to 1024 bytes for interrupt at high speed (480 Mbps); 1024 bytes for both at SuperSpeed (5 Gbps).[41][6]
| USB Speed | Maximum Packet Size (Bulk) | Maximum Packet Size (Interrupt) |
|---|---|---|
| Full Speed | 64 bytes | 64 bytes |
| High Speed | 512 bytes | 1024 bytes |
| SuperSpeed | 1024 bytes | 1024 bytes |
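The table translates directly into the number of data packets needed for a transfer of a given length. The following Python sketch encodes the table as a lookup; the dictionary and function names are hypothetical, and the count covers data packets only, leaving out tokens, handshakes, and any retries.

```python
import math

# Maximum packet sizes in bytes, taken from the table above.
MAX_PACKET_SIZE = {
    ("full", "bulk"): 64,
    ("full", "interrupt"): 64,
    ("high", "bulk"): 512,
    ("high", "interrupt"): 1024,
    ("super", "bulk"): 1024,
    ("super", "interrupt"): 1024,
}


def data_packets_needed(speed: str, transfer_type: str, length: int) -> int:
    """Number of data packets required to carry `length` payload bytes."""
    if length <= 0:
        raise ValueError("length must be positive")
    return math.ceil(length / MAX_PACKET_SIZE[(speed, transfer_type)])
```

A 4096-byte bulk transfer therefore takes 64 data packets at full speed but only 8 at high speed and 4 at SuperSpeed.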
In USB 3.x and later, bulk transfers are enhanced with streams, allowing multiple logical data streams to multiplex over a single endpoint using stream identifiers in packet headers, reducing protocol overhead from frequent endpoint switches and improving throughput for applications like mass storage.[44]
Isochronous Transactions
Isochronous transactions in USB communications are optimized for real-time applications requiring bounded latency and predictable bandwidth, such as streaming audio and video, where data delivery timing takes precedence over guaranteed reliability. These transfers ensure that data is made available at consistent intervals, typically every 1 ms frame in full-speed mode or every 125 μs microframe in high-speed mode (low-speed devices do not support isochronous transfers), without mechanisms for error acknowledgment or retransmission.[45]
The structure of an isochronous transaction is limited to an IN or OUT token packet followed immediately by a data packet, omitting any handshake response to minimize latency and preserve bus timing. Upon detection of errors, such as CRC failures, affected data packets are silently dropped by the hardware, with no retry or notification to the host, allowing the transfer to proceed to the next scheduled interval without disruption.[21][46][47]
Bandwidth for isochronous endpoints is pre-allocated by the host during device enumeration based on the endpoint descriptor's specifications, ensuring dedicated capacity for the transfer's duration. In high-speed USB, up to 80% of the bus bandwidth may be reserved for periodic transfers like isochronous, preventing contention from other traffic types and supporting sustained data rates necessary for multimedia streams. High-speed operation divides each 1 ms frame into eight 125 μs microframes, enabling finer scheduling granularity and higher effective throughput for isochronous data, such as up to three transactions per microframe in advanced configurations.[48][21][49]
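The scheduling figures above can be checked with a few lines of integer arithmetic in Python. The function names are illustrative. The 1024-byte maximum packet size for high-speed isochronous endpoints comes from the USB 2.0 specification rather than from the paragraph above, and the budget calculation counts raw bus bytes, ignoring bit stuffing and packet overhead.

```python
MICROFRAMES_PER_SECOND = 8000          # eight 125 us microframes per 1 ms frame
HIGH_SPEED_BITS_PER_SECOND = 480_000_000


def iso_throughput_bytes_per_s(max_packet: int, transactions_per_microframe: int) -> int:
    """Peak payload rate of one isochronous endpoint."""
    return max_packet * transactions_per_microframe * MICROFRAMES_PER_SECOND


def periodic_budget_bytes_per_microframe(percent_reserved: int = 80) -> int:
    """Raw bytes per microframe available to periodic transfers at high speed."""
    bits_per_microframe = HIGH_SPEED_BITS_PER_SECOND // MICROFRAMES_PER_SECOND
    return bits_per_microframe // 8 * percent_reserved // 100


peak = iso_throughput_bytes_per_s(1024, 3)        # 24,576,000 bytes/s
budget = periodic_budget_bytes_per_microframe()   # 6,000 of 7,500 raw bytes
```

A single endpoint using three 1024-byte transactions per microframe needs 3,072 bytes of payload per microframe, which fits within the 6,000-byte periodic budget in this simplified accounting.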
USB 3.x SuperSpeed enhancements introduce Isochronous Timestamp Packets (ITPs), broadcast periodically to provide precise timing references across the bus topology, which supports synchronization in multi-device scenarios like audio clock recovery for seamless playback. These timestamps help align device clocks with the host, mitigating drift in applications requiring tight temporal coordination. Typical use cases include webcams for video capture and speakers or microphones for audio transmission, where the maximum data payload reaches 1024 bytes per packet in SuperSpeed mode to accommodate high-resolution streams.[50][51][45][52]