End-to-end delay in computer networks refers to the total time required for a data packet to travel from its source to its destination, encompassing the sum of delays incurred across all traversed links and nodes. This includes processing delay (the time to examine and forward the packet at each intermediate router), queuing delay (the waiting time in buffers due to congestion), transmission delay (the time to serialize the packet onto the link, calculated as packet length divided by link capacity), and propagation delay (the time for the signal to physically traverse the medium, dependent on distance and signal speed).[1]

As a fundamental quality-of-service (QoS) metric, end-to-end delay profoundly influences network performance, especially in applications demanding low latency, such as real-time video conferencing, online gaming, and industrial control systems, where excessive delays can degrade user experience or system reliability.[2] Accurate measurement and prediction of end-to-end delay are essential for optimizing routing algorithms, congestion control mechanisms, and resource allocation to meet application-specific requirements.[3] In heterogeneous networks, including wired, wireless, and multi-hop topologies, delay models help evaluate technology suitability for desired QoS levels, guiding decisions on bandwidth provisioning and protocol design.[4]

Empirical analyses reveal that end-to-end delays exhibit substantial variability, often characterized by "peak-to-peak" fluctuations where maximum delays significantly exceed minima, primarily due to dynamic factors like traffic bursts, queueing dynamics, and path asymmetries between forward and reverse directions.[5] For instance, one-way transit times in the Internet can vary on timescales from milliseconds to seconds, with queueing delays showing low correlation across paths (e.g., correlations as low as 0.006) and influences from diurnal traffic patterns.[6][5] In wireless sensor and multi-hop networks, additional contributors such as medium access contention and error-prone channels further amplify delays, necessitating cross-layer optimizations for probabilistic analysis and bound estimation.[7] As of 2025, research emphasizes machine learning techniques, such as recurrent neural networks and transformers, for probabilistic end-to-end delay forecasting and mitigation in 5G and emerging 6G infrastructures.[8]
Overview and Fundamentals
Definition
End-to-end delay is the total time required for a data packet or signal to traverse from the source application layer to the destination application layer in a communication network, including all intermediate network hops and endpoint processing. This encompasses the entire path from the originating application through the network stack, routers, and links, to the receiving application, without considering return traffic.[9]

It differs from round-trip time (RTT), which measures the duration for a packet to travel from source to destination and back to the source, effectively doubling the one-way traversal under symmetric conditions; end-to-end delay is typically synonymous with one-way delay, focusing solely on the unidirectional journey. The concept of end-to-end delay was formalized during the development of early packet-switched networks, such as the ARPANET in the late 1960s and 1970s, where Leonard Kleinrock's queueing theory models analyzed message delays to optimize network performance.

Mathematically, end-to-end delay is expressed as the sum of individual hop delays plus application-layer processing times at the source and destination:

d_{\text{end-to-end}} = \sum_{i=1}^{N} d_{\text{hop},i} + d_{\text{app,source}} + d_{\text{app,dest}}

where N is the number of hops, d_{\text{hop},i} is the delay at hop i, and the application terms account for endpoint computations.[6] This aggregate includes contributions from various network delay components along the path.
Importance in Communication Systems
End-to-end delay profoundly influences network performance metrics such as throughput, reliability, and interactivity. In TCP-based communications, higher delay increases the round-trip time (RTT), which directly limits achievable throughput according to the seminal Mathis model for TCP congestion avoidance, where throughput \approx \frac{MSS}{RTT \sqrt{p}} for maximum segment size MSS and packet loss probability p. This relationship means that elevated delays reduce the effective bandwidth, particularly in high bandwidth-delay product scenarios, leading to suboptimal resource utilization and potential unfairness among flows with varying propagation times. For UDP-based applications, end-to-end delay variations manifest as jitter, disrupting the timely delivery of packets and causing issues like audio choppiness or video desynchronization in real-time streams, as observed in performance evaluations of UDP over packet-switched networks.[10][11][12]

The criticality of low end-to-end delay is especially evident in real-time applications, where excessive latency degrades user experience and functionality. In Voice over IP (VoIP) systems, a one-way end-to-end delay exceeding 150 ms impairs natural conversation flow, as callers perceive echoes or unnatural pauses, per the ITU-T G.114 recommendation for acceptable voice transmission time.[13] In low-latency live video streaming, such as interactive broadcasts, end-to-end delays exceeding 400 ms under ideal conditions can lead to noticeable stalls and rebuffering, frustrating viewers and increasing abandonment rates in scenarios where content delivery networks target sub-400 ms latency.[14] These thresholds underscore how delay directly correlates with quality-of-experience (QoE) metrics, making it a key determinant for interactive and multimedia services.

Inadequate broadband infrastructure, including latency issues, contributes to economic losses by exacerbating the digital divide and hindering productivity in affected regions, as highlighted in ITU analyses of connectivity gaps.[15] These impacts manifest in reduced e-commerce efficiency, slowed remote work, and limited access to digital services, amplifying inequalities in developing economies.

With the advent of 5G and Internet of Things (IoT) ecosystems, the importance of minimizing end-to-end delay has intensified, targeting sub-millisecond latencies for emerging applications. In 5G-enabled IoT, delays below 1 ms are essential for autonomous systems like vehicle-to-everything (V2X) communications, enabling reliable real-time decision-making in safety-critical scenarios such as collision avoidance. This evolution shifts focus from mere connectivity to ultra-reliable low-latency communication (URLLC), where even minor delays can cascade into system failures, driving innovations in network slicing and edge computing to meet these stringent requirements.[16][17]
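To make the throughput relationship concrete, the sketch below evaluates the Mathis approximation numerically; the MSS, RTT, and loss values are illustrative assumptions, not figures from the cited evaluations.

```python
# Sketch: Mathis et al. approximation for steady-state TCP throughput,
# throughput ~ MSS / (RTT * sqrt(p)). All parameter values are illustrative.
import math

def tcp_throughput_bps(mss_bytes: float, rtt_s: float, loss_prob: float) -> float:
    """Approximate TCP congestion-avoidance throughput in bits per second."""
    return (mss_bytes * 8) / (rtt_s * math.sqrt(loss_prob))

# Doubling the RTT halves the achievable throughput at the same loss rate.
for rtt_ms in (20, 40, 80):
    rate = tcp_throughput_bps(mss_bytes=1460, rtt_s=rtt_ms / 1000, loss_prob=1e-3)
    print(f"RTT {rtt_ms:3d} ms -> ~{rate / 1e6:.1f} Mbit/s")
```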
Delay Components
Propagation Delay
Propagation delay refers to the time required for an electromagnetic signal to travel from the sender to the receiver across the physical transmission medium, determined solely by the distance and the signal's propagation speed in that medium. This delay is independent of the network's traffic load or processing at intermediate nodes, representing a fundamental physical limit in communication systems.[1]

The propagation delay \tau_p is calculated using the formula \tau_p = \frac{d}{v}, where d is the physical distance between sender and receiver, and v is the velocity of propagation, given by v = \frac{c}{n} with c as the speed of light in vacuum (3 \times 10^8 m/s) and n as the refractive index of the medium. For example, in optical fiber with n \approx 1.5, v \approx 2 \times 10^8 m/s, resulting in approximately 1 ms of delay per 200 km of distance.[18][19]

The propagation speed varies by medium type, influencing the delay for a given distance. In fiber optic cables, signals propagate at about 2 \times 10^8 m/s due to the refractive index of glass. Copper cables, such as twisted-pair or coaxial, achieve similar effective speeds of around 2 \times 10^8 m/s (approximately 0.66c), limited by the dielectric properties of the insulation. Wireless transmission in free space or air approaches the speed of light at 3 \times 10^8 m/s, minimizing delay over the same distance compared to guided media.[20][21]

A notable example of high propagation delay occurs in geostationary satellite links, where the satellite orbits at an altitude of 35,786 km, yielding a delay of about 120 ms for the ground-to-satellite hop alone, or roughly 240–250 ms for a complete ground-to-ground traversal (up and down), due to the signal's travel through vacuum. This irreducible minimum delay forms a baseline component of the overall end-to-end delay in such systems.[22][23]
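The formula above lends itself to a direct calculation. The following sketch reproduces the fiber and geostationary figures quoted in this section, using the constants stated above.

```python
# Sketch: propagation delay tau_p = d / v, with v = c / n.
C = 3e8  # speed of light in vacuum, m/s

def propagation_delay_s(distance_m: float, refractive_index: float = 1.0) -> float:
    """Time for a signal to cover distance_m in a medium with the given index."""
    return distance_m / (C / refractive_index)

# 200 km of fiber (n ~ 1.5) -> ~1 ms, matching the example in the text.
print(f"fiber, 200 km: {propagation_delay_s(200e3, 1.5) * 1e3:.2f} ms")
# Geostationary hop (~35,786 km through vacuum) -> ~119 ms one way.
print(f"GEO hop:       {propagation_delay_s(35_786e3, 1.0) * 1e3:.0f} ms")
```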
Transmission Delay
Transmission delay refers to the time required to serialize and push all the bits of a data packet onto the physical transmission medium from the sender's network interface.[24] This component of network delay is inherent to the link-layer operation and depends solely on the packet's size and the link's data rate, independent of the distance traveled or other traffic.[25]

The transmission delay T_{trans} for a packet is given by the formula:

T_{trans} = \frac{L}{R}

where L is the packet length in bits and R is the bandwidth of the link in bits per second.[24] For example, transmitting a standard 1500-byte Ethernet packet (12,000 bits) over a 10 Mbps link results in a transmission delay of 1.2 ms.[26]

This delay occurs at each hop along the network path, accumulating additively in multi-hop scenarios. Larger packets tend to increase the per-packet transmission delay but enhance efficiency by amortizing fixed header overhead over more payload data; for instance, Ethernet's standard maximum transmission unit (MTU) of 1500 bytes balances these trade-offs to optimize throughput without excessive serialization time.[27][28]

Transmission delay dominates performance in low-bandwidth environments, such as legacy dial-up or narrowband links, but becomes negligible on modern high-speed infrastructures like 100 Gbps fiber optic connections, where it drops to about 0.12 μs for a 1500-byte packet.[29] As one component in the total end-to-end delay, transmission delay sums across all traversed links to influence overall packet delivery time.[24]
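A minimal sketch of the T_{trans} = L/R calculation, reproducing the 10 Mbps and 100 Gbps figures above; the 1 Gbps row is added for comparison.

```python
# Sketch: transmission delay T_trans = L / R (packet bits over link rate).
def transmission_delay_s(packet_bytes: int, link_bps: float) -> float:
    """Time to serialize a packet of packet_bytes onto a link of link_bps."""
    return packet_bytes * 8 / link_bps

# 1500-byte packet: 1.2 ms on 10 Mbps, ~0.12 us on 100 Gbps.
for label, rate in (("10 Mbps", 10e6), ("1 Gbps", 1e9), ("100 Gbps", 100e9)):
    t = transmission_delay_s(1500, rate)
    print(f"{label:>9}: {t * 1e6:10.3f} us")
```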
Queuing Delay
Queuing delay refers to the time a packet spends waiting in the output queue of a router or switch before it can be transmitted onto the next link.[30] This delay arises when incoming traffic exceeds the immediate transmission capacity, causing packets to accumulate in finite buffers.[31]

A fundamental model for analyzing queuing delay is the M/M/1 queue, which assumes Poisson arrivals at rate \lambda and exponential service times at rate \mu > \lambda. In this model, the average delay experienced by a packet in the system is given by \frac{1}{\mu - \lambda}.[1]

Little's Law complements this analysis by relating the average number of packets in the queue L to the arrival rate \lambda and average delay W, via the equation L = \lambda W, providing a tool to predict queue buildup under steady-state conditions.[32]

Several factors influence queuing delay, including bursty traffic patterns where short-term spikes in arrival rates overwhelm link capacity, leading to temporary queue growth.[33] Over-subscription, where aggregate input bandwidth exceeds output capacity, exacerbates this by creating persistent contention for shared resources.[34] In tail-drop queuing disciplines, which discard arriving packets when buffers fill, worst-case scenarios occur during congestion collapse, where synchronized losses trigger global throughput reductions and prolonged queue oscillations.[35]

Queuing delay can range from 0 ms in an empty queue to several seconds in severely overloaded networks, introducing significant variability to overall end-to-end delay. Protocols like TCP Reno indirectly mitigate this by detecting queue overflows through packet losses and reducing the sender's congestion window to alleviate buffer pressure.[36]
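A small sketch tying the two results together: it computes the M/M/1 mean system delay 1/(\mu - \lambda) and applies Little's Law; the arrival and service rates are illustrative assumptions.

```python
# Sketch: M/M/1 mean system delay W = 1 / (mu - lambda) and Little's law
# L = lambda * W, under the Poisson-arrival assumptions described in the text.
def mm1_mean_delay_s(arrival_rate: float, service_rate: float) -> float:
    """Mean time a packet spends in an M/M/1 system (queue + service)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable when lambda >= mu")
    return 1.0 / (service_rate - arrival_rate)

lam, mu = 800.0, 1000.0        # packets/s; 80% utilization (illustrative)
w = mm1_mean_delay_s(lam, mu)  # 1 / (1000 - 800) = 5 ms
l = lam * w                    # Little's law: ~4 packets in the system
print(f"mean delay {w * 1e3:.1f} ms, mean occupancy {l:.1f} packets")
```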
Processing Delay
Processing delay refers to the time required by a network node, such as a router or switch, to examine a packet's header, perform necessary computations like routing lookups, modify the packet if needed, and prepare it for forwarding.[37] This delay is deterministic and occurs for every packet at each hop, independent of network load.[38]

Key components of processing delay include the routing table lookup, which in modern application-specific integrated circuits (ASICs) typically takes 10-100 nanoseconds due to fast memory access times.[39] Additional time is spent on header validation and serialization preparation, contributing to a total processing delay of 1-10 microseconds per hop in hardware-based routers.[39] For example, low-latency switches achieve latencies below 500 nanoseconds in cut-through mode.[39]

Processing delay varies significantly between hardware and software implementations; hardware routers using ASICs process packets in the microsecond range, while software routers can incur delays in the millisecond range due to general-purpose CPU overhead.[37] In BGP routing, larger routing table sizes require more memory but generally do not increase per-packet lookup times in optimized hardware, as long as updates are infrequent.[40]

Although often negligible individually, processing delay accumulates over multiple hops in a path, contributing a small but consistent portion to the overall end-to-end delay.[38] IPv6 introduces slight additional overhead from processing extension headers, which can route packets to slower processing paths and increase latency compared to IPv4.[41]
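Taken together, the four components sum to the per-hop nodal delay, and these nodal delays sum along the path. The sketch below illustrates that accumulation; all hop parameters are invented for illustration.

```python
# Sketch: end-to-end delay as the sum of per-hop nodal delays
# (processing + queuing + transmission + propagation). Hop values are
# illustrative assumptions, not measurements.
from dataclasses import dataclass

@dataclass
class Hop:
    proc_s: float    # processing delay at the node
    queue_s: float   # queuing delay in the output buffer
    packet_bytes: int
    link_bps: float
    distance_m: float
    signal_mps: float = 2e8  # guided media, roughly 2/3 c

    def nodal_delay_s(self) -> float:
        trans = self.packet_bytes * 8 / self.link_bps
        prop = self.distance_m / self.signal_mps
        return self.proc_s + self.queue_s + trans + prop

# Three hypothetical hops along a path.
path = [
    Hop(5e-6, 2e-3, 1500, 100e6, 50e3),
    Hop(5e-6, 8e-3, 1500, 1e9, 800e3),
    Hop(5e-6, 1e-3, 1500, 100e6, 20e3),
]
total = sum(hop.nodal_delay_s() for hop in path)
print(f"end-to-end delay: {total * 1e3:.2f} ms")
```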
Measurement Methods
Direct Measurement
Direct measurement of end-to-end delay involves actively sending probe packets across a network path and capturing their transit times to obtain empirical data on latency. These methods typically rely on timestamping at the source and destination, often requiring synchronized clocks for precision, and focus on real-time observation rather than modeling. Common approaches approximate one-way delay from round-trip time (RTT) measurements or use specialized protocols for unidirectional assessment, providing insights into the total delay as the sum of propagation, transmission, queuing, and processing components.

One fundamental technique is the use of the ping utility, which employs ICMP echo request and reply messages to measure RTT—the time from sending a packet to receiving its response.[42] This RTT serves as an approximation for twice the one-way delay under the assumption of symmetric forward and reverse paths, where the one-way delay is estimated as half the RTT value.[43] Ping is simple and widely available, offering millisecond-level resolution suitable for basic end-to-end latency checks in IP networks.[44]

For a more granular view, traceroute (or tracert on Windows) breaks down the end-to-end path into per-hop latencies by sending packets with incrementally increasing time-to-live (TTL) values, prompting ICMP time-exceeded responses from intermediate routers.[45] Each hop's RTT is recorded, allowing summation to approximate the total end-to-end delay, though it introduces additional overhead due to multiple probe packets per hop.[46] This method helps identify delay hotspots along the route but can be slower and less precise for overall end-to-end metrics compared to direct probes.[47]

Tools like iPerf enable direct measurement of end-to-end delay over UDP streams by generating controlled traffic between a client and server, reporting statistics such as mean, minimum, and maximum one-way latency in milliseconds with microsecond resolution.[48] iPerf's UDP mode simulates application-like traffic, capturing delay jitter and packet loss alongside latency, making it valuable for assessing performance in bandwidth-constrained or real-time scenarios.[49]

Packet analyzers such as Wireshark facilitate delay measurement by capturing packets with high-resolution timestamps, typically at the interface level, enabling post-capture analysis of transit times between endpoints.[50] Timestamp precision can reach microseconds when using GPS-synchronized clocks or hardware timestamping, though accuracy depends on the operating system's capture mechanism and may introduce minor offsets from kernel processing.[51] This approach is passive in analysis but requires active traffic generation for end-to-end traces.

To ensure accurate one-way delay measurements, clock synchronization protocols like the Network Time Protocol (NTP) are essential, aligning sender and receiver clocks to UTC with sub-millisecond accuracy in well-configured setups.[52] NTP exchanges timestamped packets to compute offsets and delays, mitigating errors from clock skew in distributed measurements.[53]

For more precise one-way delay assessment, the Two-Way Active Measurement Protocol (TWAMP), defined in RFC 5357, extends one-way probing by incorporating bidirectional timestamps from both sender and reflector, allowing calculation of unidirectional delays without assuming path symmetry.[54] TWAMP supports microsecond-level granularity and is widely used in service provider networks for performance monitoring, including delay variation and loss.[55]

Direct measurement techniques have inherent limitations, including the assumption of symmetric paths for RTT-based one-way approximations, which fails in asymmetric routing scenarios leading to inaccurate estimates.[56] Additionally, probe traffic introduces measurement overhead that can artificially inflate delays or congest low-bandwidth links, necessitating careful test design to minimize interference.[57] TWAMP addresses some issues through standardization but still requires synchronized clocks for reliability.[54]
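The RTT-halving approximation used by ping can be illustrated with a minimal UDP echo exchange; the sketch below runs the reflector locally for self-containment, and the symmetric-path caveat discussed above applies unchanged to real paths.

```python
# Sketch: estimating one-way delay as RTT / 2, as ping does, using a UDP
# echo over loopback. On real paths, forward and reverse delays may differ,
# so this is only an approximation under the symmetric-path assumption.
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 9999  # hypothetical local endpoint

def echo_server(ready: threading.Event) -> None:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as srv:
        srv.bind((HOST, PORT))
        ready.set()
        data, addr = srv.recvfrom(2048)
        srv.sendto(data, addr)  # reflect the probe back unchanged

ready = threading.Event()
threading.Thread(target=echo_server, args=(ready,), daemon=True).start()
ready.wait()

with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as cli:
    t0 = time.perf_counter()
    cli.sendto(b"probe", (HOST, PORT))
    cli.recvfrom(2048)
    rtt = time.perf_counter() - t0

print(f"RTT {rtt * 1e6:.0f} us -> one-way estimate {rtt / 2 * 1e6:.0f} us")
```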
Estimation Techniques
Estimation techniques for end-to-end delay involve mathematical modeling and simulation to predict delays indirectly, without requiring direct network probing or full access to the infrastructure. These methods provide bounds or approximations based on traffic characteristics and network parameters, enabling proactive network design and management. Network calculus, a deterministic approach, uses arrival curves to characterize traffic burstiness and service curves to model minimum guarantees, yielding upper bounds on delays across network paths.[58]

In network calculus, the worst-case delay bound for a flow with burstiness b and sustained rate r, served by a link with minimum rate R > r and zero latency, is given by \frac{b}{R - r}, representing the time to clear the burst at the excess service rate. For probabilistic estimates, queueing theory approximations like the Kingman formula for a G/G/1 queue provide the mean waiting time W_q \approx \frac{c_a^2 + c_s^2}{2} \cdot \frac{\rho}{1 - \rho} \cdot \frac{1}{\mu}, where c_a^2 and c_s^2 are squared coefficients of variation for arrival and service times, \rho is utilization, and \mu is the service rate; end-to-end delays can be aggregated over hops using such approximations. Simulation tools like ns-3 model network topologies and protocols to estimate delays by replaying traffic scenarios and computing statistics such as average and maximum end-to-end latency.[58][59][60]

In software-defined networking (SDN), controllers estimate end-to-end delays using flow table statistics and topology information to optimize routing decisions without real-time measurements. Machine learning approaches, such as neural networks trained on historical traffic data, predict delays by analyzing patterns in metrics like link utilization and packet loss, often sourced from protocols like SNMP for input features. These predictions support dynamic resource allocation in data centers.[61][62]

Such techniques are particularly valuable in cloud networking, where Google's B4 wide-area network employs SDN-based path selection informed by delay estimates to balance loads and enhance overall performance, achieving throughput gains of up to 100% over traditional routing and indirectly reducing queuing delays. Estimates can be validated against direct measurements for accuracy in deployment.[63]
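As a worked instance of the Kingman approximation, the sketch below sums per-hop G/G/1 waiting-time estimates along a path; the per-hop traffic parameters are illustrative assumptions.

```python
# Sketch: Kingman's G/G/1 approximation for mean waiting time,
# W_q ~ ((c_a^2 + c_s^2) / 2) * (rho / (1 - rho)) * (1 / mu),
# summed over hops for a rough end-to-end estimate. Values are illustrative.
def kingman_wq_s(ca2: float, cs2: float, rho: float, mu: float) -> float:
    """Approximate mean queuing wait for a single G/G/1 server."""
    if not 0 <= rho < 1:
        raise ValueError("utilization must be in [0, 1)")
    return ((ca2 + cs2) / 2) * (rho / (1 - rho)) * (1 / mu)

# Hypothetical 3-hop path: (c_a^2, c_s^2, utilization, service rate pkts/s).
hops = [(1.5, 1.0, 0.7, 10_000), (2.0, 1.2, 0.85, 8_000), (1.0, 1.0, 0.5, 12_000)]
total_wait = sum(kingman_wq_s(*h) for h in hops)
print(f"estimated end-to-end queuing wait: {total_wait * 1e3:.2f} ms")
```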
Influencing Factors and Mitigation
Network and Traffic Factors
Network topology significantly influences end-to-end delay through the number of hops packets must traverse. In a star topology, data typically passes through a central hub, resulting in fewer hops—often just two between endpoints—which minimizes propagation and processing overhead compared to mesh topologies where paths may involve multiple intermediate nodes, increasing the diameter and potential delay. For instance, full mesh networks provide direct connections but scale poorly, while partial meshes can introduce variable hop counts that elevate average end-to-end delays, particularly in dense IoT environments.[64][65]

Routing protocols further modulate delay during network changes. The Open Shortest Path First (OSPF) protocol, a widely used link-state routing method, experiences convergence transients that add temporary delays as routers recalculate paths after failures or updates, typically ranging from 50 to 200 milliseconds in enterprise networks before stabilizing. These transients arise from the time required for link-state advertisements to propagate and shortest-path computations to complete across the topology.[66]

Traffic load variability is a primary driver of fluctuating end-to-end delays, with peak-hour spikes often causing substantial increases due to resource contention. During high-traffic periods, such as business hours, network utilization can surge, leading to peak-to-peak delay variations where maximum latencies exceed minima by factors of 10 or more in operational Internet paths. This variability is exacerbated in shared infrastructures, where bursty traffic from applications like video streaming or cloud services overwhelms links.[57][67]

Protocol overhead also contributes to delay differences across transport layers. Transmission Control Protocol (TCP) incurs higher latency than User Datagram Protocol (UDP) due to its reliability mechanisms, including acknowledgments that impose an additional delay of approximately half the round-trip time (RTT) per segment while awaiting confirmations. In contrast, UDP's connectionless nature avoids such overhead, enabling lower end-to-end delays in latency-sensitive scenarios like real-time gaming, though at the cost of potential packet loss.[68]

In emerging 5G networks, slicing introduces configurable virtual networks that can achieve sub-millisecond end-to-end delays (often under 1 ms for low-latency slices), with variation depending on resource allocation and slice type (e.g., URLLC vs. eMBB). This variability stems from dynamic resource isolation, where slices compete for radio access and core network capacity, potentially amplifying delays under heterogeneous loads.[69]

Edge computing mitigates some delay influences by localizing data processing near the source, thereby reducing the effective path length and associated propagation times in distributed systems. By offloading computations from distant cloud servers to edge nodes, it can cut end-to-end delays by orders of magnitude in applications like autonomous vehicles, where milliseconds matter.[70]

A notable characteristic of Internet paths is asymmetry, where one-way delays from sender to receiver can differ from the reverse by up to 20%, as observed in measurements of commercial networks, leading to inconsistent round-trip experiences. These factors, including topology and traffic dynamics, can briefly amplify queuing delays on bottleneck links during transients.[71]
Strategies for Delay Reduction
Strategies for reducing end-to-end delay in networks primarily target buffering and scheduling mechanisms, path optimization techniques, protocol enhancements, and specialized implementations in emerging technologies like 5G. These approaches address key delay components such as queuing and propagation by prioritizing critical traffic, selecting efficient routes, and streamlining data transmission processes.[72]

Buffering and scheduling strategies focus on mitigating queuing delays through active queue management (AQM) algorithms and priority-based handling. The Controlled Delay (CoDel) AQM algorithm monitors the minimum queue delay over a recent interval and drops packets when this delay exceeds a target threshold, significantly reducing queuing delays in congested scenarios compared to traditional drop-tail queues.[73] CoDel's design avoids the need for parameter tuning, making it suitable for dynamic networks, and has been standardized in RFC 8289; a simplified sketch of its control law appears at the end of this subsection. For real-time applications like Voice over IP (VoIP), priority queuing assigns higher precedence to voice packets, ensuring they experience minimal wait times in routers and reducing overall end-to-end delay by limiting exposure to lower-priority traffic bursts.[74] This technique is particularly effective in enterprise networks where VoIP traffic must maintain low latency for acceptable call quality.[75]

Path optimization techniques minimize propagation and routing delays by directing traffic along shorter or less congested paths. Multiprotocol Label Switching Traffic Engineering (MPLS-TE) enables explicit routing, where label-switched paths are precomputed to meet bandwidth and delay constraints, optimizing resource allocation and reducing end-to-end delay in backbone networks.[76] Content Delivery Networks (CDNs), such as Akamai, further cut propagation delays by caching content at edge servers closer to users, achieving global latency savings of approximately 100 ms through reduced round-trip times and minimized peering congestion.[77] These methods are widely adopted for web and streaming services to enhance user experience under varying traffic loads.

Protocol tweaks at the transport layer streamline connection establishment and multiplexing to lower setup and transmission delays. The QUIC protocol, standardized in RFC 9000, integrates TLS 1.3 encryption and supports multiplexing over a single connection, reducing the handshake from 1-3 round-trip times (RTTs) in TCP/TLS to 0-1 RTT via 0-RTT resumption, thereby decreasing initial latency for web applications. QUIC's UDP-based design also mitigates head-of-line blocking, improving overall throughput and delay in lossy networks.[78]

In 5G networks, Ultra-Reliable Low-Latency Communications (URLLC) mode implements delay reduction through mini-slot scheduling and beamforming, targeting end-to-end latencies of 1 ms while maintaining 99.999% reliability. These features were introduced in 3GPP Release 16 in 2020, enhanced in Releases 17 and 18 (up to 2024), with further developments in Release 19 as of 2025 for industrial and vehicular applications.[79][80]
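The following is the simplified CoDel sketch referenced above. It captures only the core control law from RFC 8289 (begin dropping once the minimum sojourn time stays above the target for a full interval, then space drops by interval/\sqrt{count}) and omits the full state machine, ECN marking, and re-entry details of real implementations.

```python
# Heavily simplified sketch of CoDel's control idea (after RFC 8289): track
# packet sojourn times; once they stay above TARGET for a full INTERVAL,
# start dropping, with successive drops spaced by INTERVAL / sqrt(count).
import math

TARGET = 0.005    # 5 ms acceptable standing queue delay
INTERVAL = 0.100  # 100 ms sliding window

class CoDelSketch:
    def __init__(self) -> None:
        self.first_above_time = 0.0
        self.drop_next = 0.0
        self.dropping = False
        self.count = 0

    def should_drop(self, sojourn_s: float, now_s: float) -> bool:
        """Decide whether to drop a packet that waited sojourn_s in the queue."""
        if sojourn_s < TARGET:
            # Delay is acceptable again: leave the dropping state.
            self.first_above_time = 0.0
            self.dropping = False
            return False
        if self.first_above_time == 0.0:
            # Delay just crossed the target: start the grace period.
            self.first_above_time = now_s + INTERVAL
            return False
        if not self.dropping and now_s >= self.first_above_time:
            # Delay stayed high for a full interval: enter the dropping state.
            self.dropping = True
            self.count = 0
            self.drop_next = now_s
        if self.dropping and now_s >= self.drop_next:
            self.count += 1
            self.drop_next = now_s + INTERVAL / math.sqrt(self.count)
            return True
        return False
```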
Applications and Analysis
Role in Real-Time Systems
In real-time systems, end-to-end delay plays a critical role as a key constraint in time-sensitive applications, where exceeding tolerable thresholds can compromise functionality and safety. For Voice over Internet Protocol (VoIP), the ITU-T G.114 recommendation specifies that one-way delays should not exceed 150 ms to maintain acceptable conversational quality, as higher latencies degrade perceived naturalness and increase user frustration.[81] Similarly, online gaming demands low end-to-end delays of 50-100 ms to ensure responsive interactions, preventing lag that disrupts competitive play and immersion.[82] In telemedicine, particularly for applications involving haptic feedback, delays must remain below 20 ms to avoid perceptible asynchrony between user actions and system responses, enabling precise remote procedures.[83]

Challenges in managing end-to-end delay within these systems often stem from jitter, the variation in packet arrival times, which can lead to out-of-order packet delivery and require additional buffering that further increases latency.[84] Clock skew in distributed real-time systems exacerbates these issues by introducing discrepancies in time synchronization across nodes, complicating accurate delay measurements and event ordering essential for coordinated operations.[85] To address such constraints, systems like WebRTC employ jitter buffers and strategic delay management to optimize performance in peer-to-peer communications.[86]

In 5G networks, ultra-reliable low-latency communication (URLLC) targets end-to-end delays below 1 ms for applications like factory automation and remote control, supporting industrial IoT with high reliability.[87] A notable example of stringent requirements appears in vehicle-to-everything (V2X) communications for autonomous vehicles, where ETSI standards mandate end-to-end latencies below 10 ms for critical use cases such as cooperative collision avoidance to enable reliable safety maneuvers.[88]
Performance Modeling
Performance modeling of end-to-end delay in computer networks typically treats the total delay as the aggregate of individual component delays encountered along the packet's path, including propagation, transmission, processing, and queuing delays. This additive decomposition allows for modular analysis, where the overall delay is modeled as D = \sum_{i=1}^{n} D_i, with each D_i representing the delay at hop i. Such models are foundational in queueing theory applied to networks, enabling predictions of average and distributional properties of end-to-end performance.[1]

Stochastic models further refine this by incorporating randomness in arrivals and service times, often assuming Poisson processes for packet arrivals to capture bursty traffic patterns common in IP networks. Under the M/M/1 queueing assumption per hop, the expected queuing delay integrates into the end-to-end model, yielding closed-form expressions for mean delay as E[D] = \sum_{i} \left( \frac{1}{\mu_i - \lambda_i} \right), where \lambda_i is the arrival rate and \mu_i the service rate at hop i. For variance, when component delays are independent, the total delay variance is the sum of individual variances:

\mathrm{Var}(D) = \sum_{i=1}^{n} \mathrm{Var}(D_i)

This property, derived from the independence of random variables in stochastic processes, facilitates jitter analysis critical for real-time applications.[89]

Simulation tools like OMNeT++ are employed to validate these models in multi-hop scenarios, where analytical tractability diminishes due to complex topologies and interactions. OMNeT++'s INET framework supports discrete-event simulations that compute end-to-end delay statistics, such as mean lifetime per packet, across wireless or wired networks with realistic protocols like TCP/IP. These simulations reveal how multi-hop routing amplifies delay variability.[9]

For large-scale networks, fluid models approximate packet-level dynamics with continuous flows, reducing computational complexity while preserving key behaviors like congestion buildup. These deterministic or stochastic fluid approximations solve differential equations to predict end-to-end delay distributions.[90] In advanced applications, machine learning techniques, such as LSTM networks, enhance modeling by predicting delay anomalies like sudden spikes from historical traces; for instance, attention-enhanced LSTMs in mobile edge computing achieve up to 90% prediction accuracy at high signal-to-noise ratios.[91]

End-to-end delay modeling has informed key standards, notably RFC 3390, which uses the bandwidth-delay product to justify increasing TCP's initial window for high-latency paths, balancing throughput and delay. Recent 2025 IETF drafts extend this to AI-driven networks, incorporating delay bounds in deterministic services for AI inference and edge computing.[92]
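As a check on the additive mean and variance formulas above, the following Monte Carlo sketch samples independent per-hop M/M/1 system delays (exponentially distributed with rate \mu_i - \lambda_i, a standard M/M/1 result) and compares empirical statistics with the closed forms; the per-hop rates are illustrative assumptions.

```python
# Sketch: Monte Carlo check of the additive model D = sum(D_i) with
# independent per-hop M/M/1 system delays (exponential with rate mu - lambda).
import random
import statistics

hops = [(800.0, 1000.0), (600.0, 900.0), (400.0, 700.0)]  # (lambda, mu) per hop

samples = []
for _ in range(100_000):
    # M/M/1 system time is exponentially distributed with rate mu - lambda.
    samples.append(sum(random.expovariate(mu - lam) for lam, mu in hops))

analytic_mean = sum(1 / (mu - lam) for lam, mu in hops)
analytic_var = sum(1 / (mu - lam) ** 2 for lam, mu in hops)

print(f"mean: simulated {statistics.mean(samples) * 1e3:.2f} ms, "
      f"analytic {analytic_mean * 1e3:.2f} ms")
print(f"var:  simulated {statistics.variance(samples) * 1e6:.2f} ms^2, "
      f"analytic {analytic_var * 1e6:.2f} ms^2")
```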