
Clos network

A Clos network is a multistage interconnection network that provides non-blocking connectivity between large numbers of inputs and outputs using smaller crossbar switches arranged in multiple stages, originally designed to reduce the total number of crosspoints in telephone switching systems. The architecture was first devised by Edson Erwin in 1938 and later formalized by Charles Clos, a researcher at Bell Laboratories, in his seminal 1953 paper "A Study of Non-Blocking Switching Networks," published in the Bell System Technical Journal. Under specified conditions, the architecture ensures that any input can connect to any output without interference, making it highly efficient for circuit-switched environments.

The core structure of a Clos network typically consists of three stages: an ingress stage of input switches, a middle stage of interconnecting switches, and an egress stage of output switches. In a standard symmetric configuration for N × N connectivity, the ingress and egress stages each comprise m switches with n ports (N = m × n), while the middle stage has r switches, each with m × m crosspoints; full-mesh connections link every ingress switch to every middle switch and every middle switch to every egress switch. To achieve strict non-blocking behavior, in which any unused input can connect to any unused output without reconfiguration, r must be at least 2n - 1, as proven by Clos's theorem; for rearrangeably non-blocking networks, where existing paths can be rerouted to free connections, r ≥ n suffices. This design scales efficiently, since adding stages or switches increases capacity without proportional growth in complexity. Key advantages of Clos networks include fault tolerance through redundant paths and cost-effectiveness compared to single large crossbar switches, which would require N² crosspoints versus the Clos network's approximately N²/n for large n. In the original telephony context, these properties minimized hardware costs and improved reliability for handling voice traffic. The architecture supports both circuit and packet switching, with non-blocking guarantees reducing latency and congestion in high-demand scenarios.

In modern applications, Clos networks have been adapted for data center fabrics, particularly in the leaf-spine topology (a folded variant of the three-stage design), where leaf switches connect to servers or endpoints and spine switches handle inter-leaf routing to support massive east-west traffic in data centers and hyperscale environments. This evolution, prominent since the 2010s, enables horizontal scaling by adding leaf or spine layers, often up to five or seven stages for global infrastructures, and integrates with protocols like Ethernet and VXLAN for overlay networking. Companies like Google and Meta deploy Clos-based designs for their predictability and performance in AI workloads and cloud services.

History and Background

Invention and Original Purpose

The concept of the Clos network was invented by Edson Erwin in 1938 and patented in 1941 (US Patent 2,244,004). Charles Clos, an engineer at Bell Telephone Laboratories, formalized the architecture in the early 1950s to address the challenges of building scalable and cost-effective telephone exchanges amid the rapid expansion of telephony services following World War II. During this period, the Bell System experienced significant growth in subscriber demand, with millions of new telephone lines installed annually, necessitating larger switching systems capable of handling increased call volumes without proportional cost increases. In his seminal 1953 paper, "A Study of Non-Blocking Switching Networks," published in the Bell System Technical Journal, Clos outlined the motivation to minimize the number of crosspoints (the electromechanical contact points essential for routing calls) while ensuring non-blocking connectivity in telephone switching arrays. Single-stage crossbar switches, the prevailing technology at the time, suffered from high costs due to their requirement of approximately N² crosspoints for N inputs and outputs, making them impractical for large-scale urban exchanges serving thousands of lines. The core innovation of the Clos network was a multi-stage architecture composed of smaller crossbar switches interconnected across input, middle, and output stages to interconnect inputs and outputs more efficiently. Clos introduced notation in which n represents the number of inputs (or outputs) per switch in the input and output stages and m denotes the number of middle-stage switches, allowing a total of N = n × k connections (with k being the number of input/output stage switches) while drastically reducing the overall crosspoint count; for instance, a three-stage network for N = 36 with n = 6 and m = 11 middle switches required only 1,188 crosspoints, compared to 1,296 for a single-stage equivalent. This design was specifically tailored for circuit-switched telephone systems, enabling reliable path establishment from any idle inlet to any idle outlet irrespective of existing connections.

Development and Key Milestones

In the 1960s and 1970s, Clos networks transitioned from analog applications to digital switching systems, integrating with time-division multiplexing (TDM) techniques and early stored-program control architectures to handle digitized voice traffic more efficiently. This shift was driven by advancements in pulse-code modulation (PCM) and the need for scalable digital exchanges. During the 1980s and 1990s, Clos networks gained prominence in asynchronous transfer mode (ATM) switching fabrics, where their multistage design supported high-speed packetized data for emerging broadband services. Major telecommunications vendors implemented Clos-based ATM switches to meet the demands of broadband Integrated Services Digital Network (ISDN) extensions, enabling nonblocking connections for variable-rate traffic. A key theoretical advancement in this era involved adapting Clos structures for optical implementations using wavelength-division multiplexing (WDM), first explored in research prototypes around 2000 to leverage fiber-optic capacities for terabit-scale routing. From the 2000s onward, Clos networks experienced a revival in packet-switched environments, particularly within data center infrastructures, where their scalability addressed the explosion of Ethernet-based traffic. In the 2010s, hyperscale operators adopted Clos-derived leaf-spine topologies for nonblocking Ethernet fabrics; for instance, Cisco's Nexus series and Arista's EOS platforms deployed multi-tier Clos designs supporting up to hundreds of thousands of ports with low latency, powering hyperscale data centers. As of 2025, Clos networks continue to evolve through integration with software-defined networking (SDN) controllers and AI-driven optimization algorithms, enhancing dynamic path selection and load balancing in AI training clusters and cloud deployments. These advancements, often realized in optical Clos variants, enable traffic engineering in environments handling exabyte-scale data flows for machine-learning workloads.

Fundamental Topology

Three-Stage Architecture

The three-stage Clos network is a multistage switching architecture designed to connect N inputs to N outputs, where N = n², using smaller crossbar switches arranged in input, middle, and output stages. The input stage consists of n switches, each of size n × m, providing n inputs and m outputs per switch. The middle stage comprises m switches, each of size n × n. The output stage includes n switches, each of size m × n, with m inputs and n outputs per switch. Interconnections between stages are structured as full bipartite graphs: each of the n input-stage switches connects to all m middle-stage switches via dedicated links, and similarly, each middle-stage switch connects to all n output-stage switches. This arrangement enables signal flow from any input through a selected middle switch: a connection is established by activating a crosspoint in the appropriate input switch to route to a middle switch, then from that middle switch to the target output switch, and finally to the desired output port. The permutation-based interconnections ensure that multiple alternate paths exist between stages, facilitating connectivity from any input to any output under suitable conditions.

The total number of crosspoints in the network is 3mn², accounting for n × (n × m) in the input stage, m × (n × n) in the middle stage, and n × (m × n) in the output stage. This yields a complexity of O(N^{3/2}), a significant scaling advantage over the O(N²) required for a monolithic crossbar switch of size N × N, as the Clos design distributes the switching across smaller, more manageable components. For example, consider a Clos network with n = 4 and m = 5, supporting N = 16 ports. There are 4 input switches (each 4 × 5), 5 middle switches (each 4 × 4), and 4 output switches (each 5 × 4), for a total of 240 crosspoints. A simple routing path might connect input port 1 (on the first input switch) to output port 3 (on the second output switch) by selecting the third middle switch: activate the crosspoint from input port 1 to the third middle-stage output in the first input switch, then the crosspoint to the second output switch in the third middle switch, and finally the crosspoint from the second input to output port 3 in the second output switch. To achieve strict-sense nonblocking operation in such a network, m must be at least 2n - 1.
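To make the path-selection step concrete, the following Python sketch (an illustration written for this article, with assumed class and variable names, not code from any cited source) models the interstage links of a small Clos network and picks the first middle switch that is idle toward both the caller's ingress switch and the callee's egress switch:

```python
# Illustrative model of middle-switch selection in a symmetric three-stage
# Clos network: n edge switches of size n x m and m middle switches.

class ClosNetwork:
    def __init__(self, n, m):
        self.n, self.m = n, m
        self.ingress_busy = set()   # (ingress_switch, middle_switch) links in use
        self.egress_busy = set()    # (middle_switch, egress_switch) links in use

    def connect(self, in_port, out_port):
        """Route input port -> output port; ports are numbered 0 .. n*n - 1."""
        ing, egr = in_port // self.n, out_port // self.n
        for mid in range(self.m):
            if (ing, mid) not in self.ingress_busy and \
               (mid, egr) not in self.egress_busy:
                self.ingress_busy.add((ing, mid))
                self.egress_busy.add((mid, egr))
                return mid          # index of the middle switch carrying the call
        return None                 # blocked: no middle switch free on both sides

net = ClosNetwork(n=4, m=5)
print(net.connect(0, 11))  # input port 0 -> output port 11 via middle switch 0
```

As links fill, connect eventually returns None for some pairs; that outcome is exactly the blocking that the m ≥ 2n - 1 condition rules out.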

Parameters and Scaling

In the symmetric three-stage Clos network, the primary parameters are n, the number of endpoints attached to each ingress or egress switch, and m, the number of switches in the middle stage. There are n ingress switches and n egress switches, yielding a total of N = n² endpoints or ports. This parameterization assumes full-mesh connectivity between stages, with each ingress switch linking to all m middle switches via dedicated links, and similarly for the egress stage. The total number of crosspoints k is derived directly from the switch sizes across stages: the n ingress switches each require n × m crosspoints, the m middle switches each require n × n crosspoints, and the n egress switches each require m × n crosspoints, resulting in k = 3n²m. Compared to a monolithic crossbar switch needing N² = n⁴ crosspoints, the Clos design offers significant savings for large N. As N scales with increasing n, crosspoint efficiency improves asymptotically; for instance, with m on the order of n to maintain low blocking, k ≈ 3n³, reducing the relative complexity to O(1/n) of the crossbar's n⁴. A key trade-off arises in selecting m: larger values decrease blocking probability by providing more paths but raise cost through additional crosspoints and links. For N = 256 (n = 16), setting m = 17 yields k = 3 × 256 × 17 = 13,056 crosspoints, versus 65,536 for an equivalent crossbar, a reduction by a factor of about 5. In contemporary deployments, the parameter n is adapted to the switch radix (the aggregate port count), enabling high bandwidth to the middle stage and supporting scalable fabrics built from devices with 32–128 ports or more.
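A few lines of Python reproduce the crosspoint comparison (a sketch under the symmetric N = n² parameterization above; choosing m = n + 1 mirrors the m = 17 example):

```python
# Crosspoint counts for a symmetric three-stage Clos (k = 3 n^2 m) versus a
# monolithic N x N crossbar (N^2 with N = n^2).

def clos_crosspoints(n, m):
    # n ingress switches (n x m) + m middle switches (n x n) + n egress (m x n)
    return 3 * n * n * m

for n in (8, 16, 32):
    m = n + 1                      # middle stage sized "on the order of n"
    N = n * n
    crossbar = N * N
    clos = clos_crosspoints(n, m)
    print(f"N={N:5d}: Clos {clos:10,} vs crossbar {crossbar:13,} "
          f"({crossbar / clos:.1f}x saving)")
# n=16 reproduces the 13,056 vs 65,536 figures quoted above.
```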

Nonblocking Conditions

Strict-Sense Nonblocking

Strict-sense nonblocking refers to the property of a Clos network in which a connection can always be established between any idle input and any idle output without disrupting existing connections or requiring rearrangements, irrespective of the current traffic pattern. This ensures the network supports full connectivity under all possible occupancy conditions, making it ideal for deterministic performance guarantees. In a three-stage Clos network with ingress and egress stages each comprising m switches of size n × r (N = m × n) and r middle-stage switches each of size m × m, the condition for strict-sense nonblocking is r ≥ 2n - 1. This theorem, established by Charles Clos in 1953, minimizes the number of crosspoints while preventing blocking. The minimum value arises from the need to accommodate the worst-case scenario without conflicts. The proof relies on a pigeonhole-style counting argument applied to middle-stage switch usage: consider establishing a new connection from an input switch to an output switch; in the adversarial case, n - 1 other inputs on the source switch and n - 1 other outputs on the destination switch are already connected, potentially occupying up to 2n - 2 distinct middle switches. With r = 2n - 1, at least one middle switch remains available for the new path, avoiding overlap. This result originated in the context of circuit-switched telephone systems, where Clos aimed to design efficient crossbar alternatives for handling simultaneous voice calls with 100% throughput assurance. The architecture reduced crosspoint requirements compared to single-stage networks, enabling scalable deployment in early electronic switching exchanges. The strict nonblocking condition can thus be stated as r = 2n - 1 for the minimum number of middle-stage switches in a balanced three-stage Clos network.
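The counting argument can be checked mechanically. The short sketch below (our own construction of the adversarial case, not code from the 1953 paper) places the n - 1 existing ingress-side calls and the n - 1 egress-side calls on disjoint middle switches and reports which middle switches remain free:

```python
# Worst case behind Clos's r >= 2n - 1 bound: the existing calls occupy
# n - 1 middle switches on the ingress side and n - 1 *different* middle
# switches on the egress side.

def free_middle_switches(n, r):
    used_by_ingress = set(range(n - 1))             # middles 0 .. n-2
    used_by_egress = set(range(n - 1, 2 * n - 2))   # middles n-1 .. 2n-3
    return set(range(r)) - used_by_ingress - used_by_egress

n = 4
print(free_middle_switches(n, r=2 * n - 1))  # {6}: exactly one middle remains
print(free_middle_switches(n, r=2 * n - 2))  # set(): the new call is blocked
```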

Rearrangeably Nonblocking

In a rearrangeably nonblocking Clos network, any set of connections between idle inputs and idle outputs can be established, potentially by rearranging the paths of some existing connections, as long as the number of middle-stage switches r satisfies r ≥ n, where n is the number of ports per ingress or egress switch. This property ensures that the network supports full connectivity for any valid request, albeit with possible disruptions to ongoing paths that must be rerouted transparently. The theoretical basis for rearrangeability in three-stage Clos networks is the Slepian-Duguid theorem, which demonstrates that when r ≥ n a complete assignment of paths exists: the required connections are modeled as the edges of a bipartite multigraph between ingress and egress switches, with middle-stage switches treated as the colors of a proper edge coloring. Hall's marriage theorem guarantees a system of distinct representatives for the subsets of inputs and outputs, ensuring no subset of ingress switches requires more middle-stage links than are available, so any permutation can be realized after rearrangement. The minimum is therefore r = n. To implement rearrangements, a centralized controller typically computes new path assignments by iteratively solving bipartite matching problems across the stages, often using algorithms like Hopcroft-Karp for efficiency in finding augmenting paths that resolve conflicts. For instance, in a Clos network with n = 4 and r = 4, suppose existing connections route traffic from ingress switch 1 to egress switch 2 via middle switch 3 and from ingress switch 2 to egress switch 1 via middle switch 4, and a new request from ingress 1 to egress 1 finds no middle switch simultaneously free toward both sides. The controller can resolve this by swapping the middle-stage assignments, rerouting the first connection via middle switch 4 and the second via middle switch 3, freeing a path for the new connection while preserving all prior endpoints. Compared to strict-sense nonblocking Clos networks, which require r ≥ 2n - 1 to avoid any rearrangements and thus need roughly twice as many middle-stage switches (and roughly twice the crosspoints in the middle stage), the rearrangeable variant halves this middle-stage complexity at the cost of control overhead for dynamic recomputation.
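The swap can be illustrated with a toy model. The sketch below (our own minimal construction with n = 2 and r = 2, where blocking genuinely occurs; it is not the Slepian-Duguid algorithm itself) shows a request that is blocked until one existing connection is moved to a different middle switch:

```python
# Toy rearrangement at the r = n boundary. A middle switch may carry at most
# one connection per ingress switch and one per egress switch.

conns = {(1, 2): 0, (2, 1): 1}   # (ingress, egress) -> middle switch index
R = 2                            # r = n = 2 middle switches

def busy(conns, *, ingress=None, egress=None):
    return {m for (i, e), m in conns.items()
            if (ingress is not None and i == ingress)
            or (egress is not None and e == egress)}

def free_common(conns, ingress, egress):
    blocked = busy(conns, ingress=ingress) | busy(conns, egress=egress)
    return [m for m in range(R) if m not in blocked]

# New request ingress 1 -> egress 1 finds no common free middle switch:
print(free_common(conns, 1, 1))   # [] -> blocked without rearrangement

# Moving (2, 1) from middle 1 to middle 0 is legal (middle 0 carries nothing
# from ingress 2 or to egress 1) and frees middle 1 for the new request:
conns[(2, 1)] = 0
print(free_common(conns, 1, 1))   # [1] -> the new connection can be set up
```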

Blocking

Probability Approximations

In Clos networks that are underprovisioned (i.e., with fewer middle-stage switches than required for nonblocking operation), exact computation of blocking probabilities is complex due to the combinatorial explosion of possible states. Approximate methods provide practical estimates under assumptions of random, uniform traffic. One seminal approach is the Lee approximation, introduced by C. Y. Lee in 1955 for analyzing multistage switching networks. For a three-stage Clos network, the approximation assumes that the m middle switches are independent, with each interstage link occupied with probability p = a/m, where a is the offered load per edge switch in Erlangs. The probability that a specific two-link path through a middle switch is available is (1 - p)², so the blocking probability for a random attempt is P_b \approx \left[1 - (1 - p)^2\right]^m, where p ≈ nρ/m for a symmetric network with per-inlet occupancy ρ (the load of n inlets spread across m interstage links). This captures the probability that all m potential paths are blocked.

A more refined method is the Jacobaeus approximation, from Carl Jacobaeus's 1950 work on congestion in link systems. It accounts for dependencies by considering the number of busy inputs i and busy outputs j on the relevant ingress and egress switches (0 ≤ i, j ≤ n - 1). Under random assignment of connections to middle switches, a new request is blocked only if the i middle switches busy on the ingress side and the j busy on the egress side together cover all m, giving the conditional blocking probability \beta_{ij} = \binom{i}{i+j-m} / \binom{m}{j} for i + j ≥ m, and \beta_{ij} = 0 otherwise. The overall blocking probability is the expectation over binomial distributions for i and j: P_B = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} f_i g_j \beta_{ij}, where f_i = \binom{n-1}{i} \lambda^i (1-\lambda)^{n-1-i} with \lambda the per-inlet occupancy, and similarly for g_j. This better captures correlations than Lee's independence assumption.

Both approximations rely on key assumptions: random routing of connection requests, uniform traffic distribution across inlets and outlets, and modeling of switches as loss systems governed by the Erlang-B formula B(k, a) = \frac{a^k / k!}{\sum_{i=0}^k a^i / i!}. These methods assume Poisson arrivals and exponential holding times, leading to binomial distributions for path availability. For illustration, consider a Clos network with n = 8 and m = 8 (underprovisioned relative to the strict nonblocking minimum of 15) and per-inlet occupancy ρ = 0.2 Erlangs, so that p = nρ/m = 0.2. The Lee approximation gives P_b ≈ [1 - (1 - 0.2)²]^8 = 0.36^8 ≈ 2.8 × 10⁻⁴, indicating very low blocking at this load. Despite their historical influence, these approximations have limitations: they can misestimate blocking under bursty or non-uniform patterns (typically underestimating it), as real-world loads violate the assumptions, and they ignore routing algorithms beyond random path selection. Modern analyses often favor simulations or exact Markov models for high-precision needs in large-scale networks.
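Both approximations are easy to evaluate numerically. The sketch below (assumed function and parameter names; the Jacobaeus term uses the hypergeometric form of β_ij given above) reproduces the worked example:

```python
from math import comb

def lee_blocking(p, m):
    """P_b ~ [1 - (1 - p)^2]^m: all m two-link paths busy, links independent."""
    return (1 - (1 - p) ** 2) ** m

def jacobaeus_blocking(n, m, lam):
    """Binomial-weighted conditional blocking, beta_ij = C(i, i+j-m) / C(m, j)."""
    total = 0.0
    for i in range(n):                 # busy inlets besides the new call
        for j in range(n):             # busy outlets besides the new call
            if i + j < m:
                continue               # some middle switch is surely free
            f_i = comb(n - 1, i) * lam**i * (1 - lam) ** (n - 1 - i)
            g_j = comb(n - 1, j) * lam**j * (1 - lam) ** (n - 1 - j)
            total += f_i * g_j * comb(i, i + j - m) / comb(m, j)
    return total

print(f"Lee:       {lee_blocking(0.2, 8):.2e}")      # ~2.8e-04
print(f"Jacobaeus: {jacobaeus_blocking(8, 8, 0.2):.2e}")
```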

Factors Influencing Blocking

In Clos networks, traffic patterns significantly impact blocking behavior. Uniform traffic, where connections are evenly distributed across inputs and outputs, typically results in lower blocking probabilities than nonuniform patterns such as hot-spot traffic, in which a disproportionate volume concentrates on specific outputs, leading to contention at middle-stage switches. Bursty traffic, characterized by intermittent high-intensity bursts followed by idle periods, exacerbates blocking even in overprovisioned networks by creating temporary overloads that overwhelm buffering or scheduling mechanisms, reducing overall throughput under real-world workloads. The symmetric structure of Clos topologies can amplify this effect due to the multiplicity of identical-length paths, which synchronize traffic fluctuations and increase contention at shared links.

Routing algorithms play a crucial role in mitigating blocking by influencing path selection and load distribution. Fixed or deterministic routing, which assigns predefined paths without considering current state, can lead to higher blocking under nonuniform traffic because it fails to balance loads across available middle-stage links. In contrast, random routing distributes connections probabilistically, offering better average load balance but potentially causing hotspots if randomness aligns poorly with traffic demands. Adaptive routing, which dynamically adjusts paths based on feedback, reduces blocking more effectively by rerouting around overloaded links, achieving near-nonblocking behavior in high-radix folded-Clos topologies even with faults or imbalances. For packet-switched Clos networks, techniques like deflection routing, in which packets are rerouted to alternative paths upon encountering contention, further minimize blocking in bufferless or low-buffer designs, though they are more commonly applied in specialized interconnects than in general-purpose fabrics.

Fault tolerance directly affects effective blocking rates in operational Clos networks. A single switch failure in any stage can elevate blocking by reducing path diversity, potentially degrading the network from rearrangeably nonblocking to partially blocking states, as lost links concentrate traffic on surviving paths. Redundancy strategies, such as deploying extra switches per stage or using multi-path routing with failover protocols, enhance resilience; for instance, adding one redundant module per stage allows the network to tolerate isolated failures without reconfiguration, maintaining low blocking under uniform loads. Engineered designs like the F10 fault-tolerant network demonstrate that proactive path recomputation upon failure can limit packet loss to under 0.1% for brief outages, trading minimal overhead for sustained throughput.

Oversubscription ratios represent a practical trade-off in Clos network deployment, particularly in cost-sensitive data centers. A common 3:1 oversubscription, in which aggregate leaf-to-spine bandwidth is one-third of server-to-leaf bandwidth, intentionally introduces potential blocking to reduce hardware costs, as fully nonblocking capacity would require excessive ports. This ratio balances performance and economics, with blocking remaining acceptable under typical workloads below 50% utilization, though it amplifies issues from bursty or hot-spot traffic. To evaluate these factors without relying on approximations like the Lee or Jacobaeus models, simulation tools employing Monte Carlo methods provide empirical blocking estimates by generating numerous random connection scenarios and computing outcomes directly.
These approaches are particularly useful for complex traffic patterns or fault scenarios, offering high-fidelity insights into real-world performance without analytical simplifications. In modern Clos fabrics, advanced analyses incorporate fluid-flow models or machine learning to predict blocking under bursty workloads, improving predictive accuracy as of 2023.
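A minimal Monte Carlo sketch of this kind (our own illustration, reusing the independent-link traffic model of the Lee approximation rather than a full call-level simulator) estimates blocking by random sampling instead of a closed-form expression:

```python
import random

def estimate_blocking(m, p, trials=200_000):
    """Empirical blocking probability with independently occupied links."""
    blocked = 0
    for _ in range(trials):
        in_busy = {mid for mid in range(m) if random.random() < p}
        out_busy = {mid for mid in range(m) if random.random() < p}
        if not set(range(m)) - in_busy - out_busy:   # no middle switch free
            blocked += 1
    return blocked / trials

random.seed(42)
print(estimate_blocking(m=8, p=0.2))   # ~3e-4, matching the Lee estimate
```

Replacing the independent-link model with recorded traces or bursty arrival processes is what gives simulation its advantage over the analytical approximations.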

Advanced Variants

Multi-Stage Extensions

The Clos network generalizes to multi-stage architectures beyond the three-stage base case through a recursive construction, in which the middle-stage switches of a lower-stage network are replaced by smaller Clos subnetworks of appropriate size, alternating between smaller and larger switch dimensions across stages. This approach allows scalable designs with an odd number of stages k = 2l + 1, where l is the recursion depth, enabling larger port counts while maintaining the potential for nonblocking operation. For instance, a five-stage Clos network is formed by substituting the middle stage of a three-stage Clos with another three-stage Clos subnetwork. In a symmetric k-stage Clos network with edge switch radix n, the total number of ports scales as N = n^{(k+1)/2} under optimal parameterization for balanced stages, though practical implementations adjust parameters for specific N. The nonblocking condition extends the three-stage case, requiring the number of middle-stage switches m to satisfy m ≥ (k-1)(n-1) + 1 for strict-sense nonblocking, ensuring paths can always be established without rearrangement regardless of existing connections. This condition arises from recursive application of the three-stage counting argument to the bipartite connection graphs at each stage. A representative example is a five-stage Clos network supporting N = 1024 ports with n = 16, which requires approximately 154,176 crosspoints compared to 193,536 crosspoints for an equivalent three-stage Clos network under similar nonblocking constraints, demonstrating reduced hardware complexity for large-scale systems. The recursive structure also lowers the overall crosspoint density relative to a single-stage crossbar (N² = 1,048,576 crosspoints), though the path diameter increases to five hops from three. Multi-stage extensions introduce challenges such as heightened control complexity, due to the need for coordinated routing across more levels, and increased latency from longer paths, often mitigated by self-routing algorithms that deterministically select paths based on destination addresses without central control. In optical implementations, wavelength-division multiplexing (WDM) integrates with multi-stage Clos topologies to achieve terabit-scale switching capacities; for example, hybrid electro-optical designs combine electronic edge stages with all-optical WDM middle stages to support aggregate throughputs exceeding 1 Tbps while preserving nonblocking properties.
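Under the idealized parameterization above, port capacity grows geometrically with recursion depth. A tiny sketch (simply evaluating the N = n^{(k+1)/2} closed form quoted above, which assumes balanced stages):

```python
# Port-count growth for recursive Clos extensions with odd stage count k,
# assuming the balanced-stage formula N = n^((k+1)/2).

def max_ports(n, k):
    assert k % 2 == 1, "recursive Clos extensions use an odd stage count"
    return n ** ((k + 1) // 2)

for k in (3, 5, 7):
    print(f"{k}-stage Clos with radix-16 edge switches: N = {max_ports(16, k):,}")
# 3-stage: N = 256; 5-stage: N = 4,096; 7-stage: N = 65,536
```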

Beneš Networks

The Beneš network is a rearrangeably nonblocking multistage interconnection network designed to connect 2^n inputs to 2^n outputs using 2×2 switching elements, ensuring that any permutation of inputs to outputs can be realized through reconfiguration of the switches. Introduced by V. E. Beneš in 1964, it achieves optimality in the number of stages, requiring exactly 2n - 1 stages for n = log₂ N, where N is the number of ports, which is the minimal depth for rearrangeable networks of this form. The recursive structure consists of two back-to-back n-stage butterfly networks sharing a central stage, allowing efficient permutation routing via algorithms that decompose the connection pattern into sub-permutations. In relation to Clos networks, the Beneš network is a specialized case within the broader family of multistage interconnection networks: a power-of-two variant of the three-stage Clos architecture in which all crosspoint switches are 2×2 and the middle stage is expanded recursively to achieve rearrangeably nonblocking behavior for permutations. Unlike the general Clos network, which uses larger k×k switches in the middle stage to meet nonblocking conditions (e.g., m ≥ n for rearrangeability), the Beneš design uses 2×2 switches exclusively, resulting in a more uniform but deeper fabric with 2n - 1 stages instead of three. This makes it a subtype of Clos networks tailored for permutations, with the recursive construction enabling scalability for large N while maintaining logarithmic depth. The key advantage of Beneš networks lies in their rearrangeable nonblocking property, where any conflict in an initial connection can be resolved by rearranging existing paths without disrupting the overall permutation, as proven through inductive construction on smaller subnetworks. Routing in Beneš networks typically employs the looping algorithm or its variants, which iteratively set switches in forward and backward passes to avoid cycles and ensure conflict-free paths; for example, in an 8×8 network (n = 3), the central stage handles 4×4 sub-permutations after the input and output butterflies are resolved. This efficiency has made Beneš networks influential in optical switching and parallel computing, though they require centralized control for rearrangement, in contrast to self-routing delta networks. Modern extensions, such as fault-tolerant Beneš variants, enhance reliability by adding redundancy while preserving the core recursive structure, with simulations for N = 64 showing gains of up to 20% without performance degradation. Overall, Beneš networks provide a foundational model for scalable, permutation-capable interconnects, bridging classical switching principles with contemporary optical and on-chip fabrics.
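The standard structural counts follow directly from the recursive construction; the short sketch below evaluates them (N/2 two-by-two elements per stage across 2·log₂N - 1 stages, a well-known consequence of the definition above):

```python
from math import log2

# Structural counts for an N-port Benes network of 2x2 elements (N a power
# of two), following the recursive construction described above.

def benes_stages(N):
    return 2 * int(log2(N)) - 1          # minimal depth for rearrangeability

def benes_elements(N):
    return (N // 2) * benes_stages(N)    # N/2 two-by-two switches per stage

for N in (8, 64, 1024):
    elems = benes_elements(N)
    print(f"N={N:5d}: {benes_stages(N):2d} stages, {elems:6,d} 2x2 switches, "
          f"{4 * elems:7,d} crosspoints (crossbar: {N * N:9,d})")
```

The crosspoint totals illustrate why the deeper Beneš fabric wins for large N: at N = 1024 it needs under 39,000 crosspoints against more than a million for a single crossbar.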

Modern Applications

Telecommunications Switching

Clos networks have played a pivotal role in telecommunications switching since their inception, initially serving as the foundation for circuit-switched systems in electromechanical exchanges. Developed in the mid-1950s for space-division switching, they enabled nonblocking connectivity for voice paths in large-scale exchanges, ensuring reliable call setup without reconfiguration under full load. This design minimized blocking in high-traffic environments while maintaining dedicated paths for electrical current transfer. In packet-switched telecommunications, Clos networks transitioned to asynchronous transfer mode (ATM) fabrics during the 1990s, forming the core of high-capacity routers and switches. Widely proposed for scalable fast-packet and ATM implementations, these multistage topologies used nonblocking modules to route cells efficiently, offering multiple paths between inputs and outputs to handle bursty data traffic in core networks. In the two-sided Clos configuration, they ensured m independent paths per connection, reducing contention in broadband ISDN deployments. Evolving further, Clos architectures underpin IP/MPLS routers in modern 5G backhaul, where they facilitate high-throughput aggregation from radio access networks to the core, supporting unified MPLS for low-latency slicing and scalability. Optical telecommunications leverage Clos networks in reconfigurable optical add-drop multiplexers (ROADMs) for wavelength routing, enabling dynamic management of dense wavelength-division multiplexing (DWDM) signals across fiber links. Next-generation Clos-based ROADM designs scale to large degrees with reduced cost and power consumption compared to traditional architectures, providing nonblocking route assignment under wavelength constraints. These structures integrate multiple optical switching elements, such as wavelength-selective switches, to minimize blocking while supporting mega-data-center interconnects and long-haul transport. In high-degree nodes, Clos optical cross-connects (OXCs) optimize functionality by distributing switching across stages, addressing scalability challenges in photonic-layer networks. Performance in telecommunications Clos networks emphasizes low latency and high throughput, critical for real-time services. Typical implementations achieve latencies under 1 ms due to fixed hop counts in multistage designs, ensuring predictable delays for voice and packet flows. Throughput scales to high aggregates in core routers, enabled by parallel paths and nonblocking properties that sustain high utilization under uniform traffic.

Data Center Fabrics

In modern data centers, Clos networks have been adapted into spine-leaf topologies, forming a two-tier architecture in which leaf switches connect directly to servers and endpoints, while spine switches provide full-mesh interconnections between all leaves to ensure nonblocking connectivity. This design supports oversubscription ratios such as 1:1 for fully nonblocking performance or 3:1 to balance cost and capacity, allowing efficient traffic distribution without hotspots. By leveraging commodity Ethernet switches, these fabrics scale horizontally by adding more spines or leaves, enabling support for clusters exceeding 100,000 servers while maintaining consistent low latency across the network. Hyperscalers like Google and Meta (formerly Facebook) have implemented Clos-based fabrics to handle massive-scale workloads, with Google's Jupiter network employing a multi-stage Clos topology for intra-data-center connectivity and Meta's F16 using a folded-Clos design optimized for high-throughput applications. As of 2025, these implementations increasingly incorporate 400G and 800G ports to meet demands from cloud-native and AI-driven services, with ports supporting QSFP-DD and OSFP form factors for dense, high-speed uplinks. Software-defined networking (SDN) controllers, such as those from Arista or Cisco, enable centralized traffic engineering and load balancing over these fabrics, facilitating ECMP (equal-cost multi-path) routing for even traffic spreading and adaptive path selection. The primary benefits of Clos-based data center fabrics include flat, predictable latency profiles (typically under a millisecond for east-west traffic) and seamless scaling without proprietary hardware, making them ideal for cloud environments. These topologies also enhance fault tolerance, as traffic can reroute around failed links via multiple paths, ensuring high availability for mission-critical applications. However, challenges arise from the dense deployment of high-speed switches in racks, leading to elevated power consumption and heat generation, particularly in AI-optimized variants tailored for training clusters. For instance, rail-optimized Clos derivatives, which prioritize GPU-to-GPU bandwidth over general-purpose connectivity, demand advanced cooling solutions to manage thermal loads from 800G interconnects in large-scale ML setups. For example, Arista switches such as the 7050 series, deployed in leaf-spine Clos configurations, deliver fully nonblocking throughput, with top-end data center platforms reaching 51.2 Tbps per switch, but require careful power budgeting to mitigate heat in hyperscale environments.
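The oversubscription figure discussed above is just the ratio of a leaf switch's downstream capacity to its uplink capacity. A back-of-the-envelope sketch with hypothetical port counts (not any particular vendor's configuration):

```python
# Leaf-spine oversubscription: server-facing capacity / spine-facing capacity.

def oversubscription(server_ports, server_gbps, uplink_ports, uplink_gbps):
    downstream = server_ports * server_gbps   # leaf capacity toward servers
    upstream = uplink_ports * uplink_gbps     # leaf capacity toward spines
    return downstream / upstream

# 48 x 25G server-facing ports with 4 x 100G uplinks: 1200 / 400 = 3:1
print(f"{oversubscription(48, 25, 4, 100):.0f}:1")
```

Adding uplinks (or faster uplink optics) drives the ratio toward the 1:1 nonblocking case, at proportionally higher spine and cabling cost.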
