
Intel Ultra Path Interconnect

The Intel Ultra Path Interconnect (UPI) is a cache-coherent, point-to-point interconnect technology developed by Intel for enabling high-bandwidth, low-latency communication between multiple processors in scalable server systems, primarily within the Intel Xeon Scalable processor family. It supports cache coherence across sockets, facilitating tasks such as memory access, I/O handling, and system configuration in multi-processor environments. Introduced on July 11, 2017, alongside the first-generation Xeon Scalable processors (codename Skylake-SP), UPI replaced the earlier QuickPath Interconnect (QPI) to provide improved scalability and efficiency for data center workloads. Subsequent generations have evolved the technology, with speeds increasing to 11.2 GT/s in Ice Lake-SP (third generation, 2021) and the introduction of UPI 2.0 in Sapphire Rapids (fourth generation, 2023) at up to 16 GT/s, further enhanced in Emerald Rapids (fifth generation, late 2023) at up to 20 GT/s and Granite Rapids (sixth generation, 2024) at up to 24 GT/s for broader platform compatibility.

UPI employs a layered architecture consisting of physical (PHY), link, routing, and protocol layers, with early versions utilizing 20 differential lanes per link direction and embedded clocking to minimize latency (UPI 2.0 uses 24 lanes). Key specifications include support for up to three UPI links per processor in multi-socket configurations (two for dual-socket systems), up to 46-bit physical addressing in early versions (52-bit in UPI 2.0), and a directory-based home-snoop coherency protocol. Bandwidth varies by version and processor: early implementations operate at 9.6 or 10.4 GT/s (approximately 19.2 or 20.8 GB/s per direction per link), while UPI 2.0 in Sapphire Rapids supports speeds up to 16 GT/s (approximately 32 GB/s per direction per link) for enhanced performance in demanding applications, with later enhancements to 24 GT/s. Advanced features in UPI 2.0 encompass 16-bit CRC for protocol protection, error detection and logging, viral error modes for reliability, optional inline protection for secure memory transactions including encryption, decryption, and integrity checking, and CXL compatibility.

Overview

Introduction

The Intel Ultra Path Interconnect (UPI) is a point-to-point, packetized, cache-coherent interconnect designed for multi-socket CPU systems, enabling efficient communication between processors in server environments. It serves as the primary mechanism for linking multiple processor dies, supporting shared address spaces and maintaining data consistency across sockets. UPI plays a crucial role in facilitating scalable shared-memory architectures for data center and high-performance computing applications, allowing systems to expand from two to eight sockets while preserving coherency. At its core, it operates via bidirectional serial links that transfer packets between dies, ensuring low-latency exchanges of cache lines and system requests. Introduced in 2017 as a replacement for the QuickPath Interconnect (QPI), UPI debuted with the first-generation Intel Xeon Scalable processors. Modern implementations support speeds up to 24 GT/s, enhancing inter-socket bandwidth for demanding workloads.

Key Features

The Intel Ultra Path Interconnect (UPI) exclusively employs a directory-based home snoop coherency protocol, enabling scalable multi-socket systems to maintain data consistency across processors in a shared address space without relying on broadcast snooping mechanisms. This approach contrasts with prior interconnects by focusing solely on directory-based operations for efficient coherency management in high-core-count environments. UPI features a redesigned packetization format that enhances data transfer efficiency through structured flits and flow control, supporting low-power states such as L0p to reduce power consumption during idle periods. These packets facilitate quick transaction completions and integrate with the link layer's credit-based flow control for optimized bandwidth utilization in coherent traffic.

A core architectural element of UPI is the integration of distributed Caching and Home Agents (CHAs) within each core and Last Level Cache (LLC) bank, allowing for scalable resource distribution across the processor's on-die interconnect. Each CHA combines caching agent functionality for request generation and snoop servicing with home agent logic for directory tracking and request ordering, minimizing bottlenecks in multi-socket configurations. UPI supports Sub-NUMA Clustering (SNC), which partitions the processor into localized NUMA domains for memory interleaving, mapping addresses to specific LLC slices to optimize latency-sensitive workloads. This feature enables finer-grained control over NUMA topology, improving performance by reducing remote access overhead within a socket.
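The SNC concept can be illustrated with a minimal Python sketch. The two-cluster split and the cache-line interleaving used below are hypothetical stand-ins; Intel does not publicly document the actual address-to-slice hashing, so this only conveys the idea of partitioning cores and LLC slices into localized domains.

```python
# Illustrative sketch only: models the *idea* of Sub-NUMA Clustering (SNC),
# not Intel's actual (undocumented) address-hashing scheme.

from dataclasses import dataclass

@dataclass
class SncDomain:
    domain_id: int
    cores: range          # cores local to this domain
    llc_slices: range     # LLC slices local to this domain

def build_snc_domains(num_cores: int, num_slices: int, clusters: int = 2):
    """Partition a socket's cores and LLC slices into SNC clusters."""
    cores_per = num_cores // clusters
    slices_per = num_slices // clusters
    return [
        SncDomain(d, range(d * cores_per, (d + 1) * cores_per),
                  range(d * slices_per, (d + 1) * slices_per))
        for d in range(clusters)
    ]

def home_domain(phys_addr: int, domains) -> SncDomain:
    """Hypothetical mapping: interleave 64-byte cache lines across SNC domains.
    Real hardware uses a proprietary hash over address bits."""
    line = phys_addr >> 6
    return domains[line % len(domains)]

if __name__ == "__main__":
    domains = build_snc_domains(num_cores=32, num_slices=32, clusters=2)
    addr = 0x1_2345_6780
    d = home_domain(addr, domains)
    print(f"address {addr:#x} homed in SNC domain {d.domain_id}, "
          f"LLC slices {d.llc_slices.start}-{d.llc_slices.stop - 1}")
```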

History and Development

Origins and Replacement of QPI

The Intel QuickPath Interconnect (QPI) was introduced in 2008 as part of Intel's Nehalem microarchitecture, marking the first implementation in 45 nm processors produced in the second half of that year. Designed as a high-speed, packetized, point-to-point interconnect, QPI replaced the traditional front-side bus (FSB) to enable a distributed shared-memory architecture with improved bandwidth and reduced latency in multi-processor systems. It supported flexible cache coherency protocols, including source snoop for low-latency operations in smaller configurations and home snoop with directory-based tracking for better scalability in larger systems, allowing targeted communication between caching agents rather than broadcasts.

Despite its advancements, QPI faced challenges in power consumption and efficiency as server demands grew, particularly in large-scale environments where multi-socket configurations amplified energy use and interconnect overhead. The protocol's support for multiple snoop modes, while versatile, could lead to increased traffic and power draw in expansive setups, limiting overall system optimization for data center deployments. By 2017, these limitations contributed to QPI reaching end-of-life, as Intel shifted focus to more efficient alternatives for enterprise workloads.

The development of the Intel Ultra Path Interconnect (UPI) was motivated by the need to address QPI's shortcomings through enhanced power efficiency and superior scalability tailored for data centers. Key improvements included the introduction of a low-power L0p state to reduce idle energy consumption and a streamlined directory-only coherency protocol, which minimized unnecessary snoop traffic by relying solely on home snoop with directory tracking, simplifying design and boosting performance in multi-socket systems. UPI's initial implementation supported up to three links per processor at 10.4 GT/s, enabling configurations from two to eight sockets with optimized bandwidth allocation. The transition from QPI to UPI began with the launch of the Skylake-SP-based Intel Xeon Scalable processors in July 2017, phasing out QPI entirely in new server platforms as UPI became the standard interconnect for subsequent generations. This shift aligned with Intel's broader strategy to support denser, more power-efficient computing infrastructures.

Introduction in First-Generation Xeon Scalable

The Ultra Path Interconnect (UPI) debuted in July 2017 with the first-generation Xeon Scalable processors, codenamed Skylake-SP, marking Intel's shift to a new high-speed, cache-coherent inter-processor interconnect designed for multi-socket server environments. This launch replaced the previous QuickPath Interconnect (QPI) to support enhanced scalability in data center workloads. UPI 1.0 operated at a transfer rate of 10.4 GT/s per link, with each processor supporting up to three links to enable flexible multi-socket topologies, though initial implementations emphasized two-socket systems for optimal performance. It was integrated across the Xeon Scalable lineup, including the Platinum, Gold, Silver, and Bronze series, and complemented the on-die mesh architecture by providing efficient off-die extensions for inter-core and inter-socket data movement. In typical two-socket configurations, UPI delivered 20.8 GB/s per direction per link.

Early adoption of UPI-equipped first-generation Xeon Scalable processors focused on data center deployments, where they powered cloud platforms and high-performance computing (HPC) applications by offering improved I/O and memory bandwidth for demanding enterprise tasks. Systems from major vendors such as Dell EMC, HPE, and Lenovo rapidly incorporated these processors, enabling up to 1.65x performance gains over prior generations in virtualization-heavy environments.

Evolution in Subsequent Generations

The second-generation Intel Xeon Scalable processors, codenamed Cascade Lake-SP and launched in 2019, retained the original Ultra Path Interconnect (UPI) 1.0 specification from the first generation, operating at 10.4 GT/s per link with support for up to three links per processor to maintain multi-socket scalability. This configuration ensured compatibility with existing platforms while introducing enhancements like support for Intel Optane DC persistent memory, which leveraged UPI for efficient data sharing across sockets in memory-intensive workloads.

In the third-generation Xeon Scalable processors, codenamed Ice Lake-SP and released in 2021, UPI increased link speeds to 11.2 GT/s for improved inter-socket bandwidth, with up to three links per processor. These advancements, built on a 10 nm process, also incorporated protocol optimizations for better efficiency without altering the core coherency model.

The fourth-generation Xeon Scalable processors, codenamed Sapphire Rapids and introduced in 2023, marked a significant upgrade with UPI 2.0, boosting link speeds to 16 GT/s across four links per processor to support up to eight-socket configurations. A key addition was an integrated security engine within the UPI module, enabling inline features such as data encryption and integrity checks for secure multi-socket communication in enterprise environments. This version also improved signaling integrity and error correction to handle higher data rates reliably.

Building on UPI 2.0, the fifth-generation Xeon Scalable processors, codenamed Emerald Rapids and launched later in 2023, raised link speeds to 20 GT/s while retaining four links per processor for consistent multi-socket performance. Integration with DDR5 memory channels complemented these UPI enhancements, allowing faster data movement between sockets and memory subsystems in bandwidth-sensitive applications.

The sixth-generation Xeon processors, including the performance-oriented Granite Rapids and efficiency-focused Sierra Forest variants launched in 2024 and early 2025, further advanced UPI to 24 GT/s per link with support for up to six links per processor, enabling enhanced scalability in two-socket systems for AI and HPC workloads. These improvements facilitate higher inter-socket throughput for distributed inference and training, where rapid data exchange between sockets is critical.

Across these generations, UPI has trended toward progressively higher link speeds (from 10.4 GT/s to 24 GT/s) and increased link counts for greater multi-socket bandwidth, alongside power efficiency gains through advanced low-power states and reduced overhead in coherency traffic. These evolutions have optimized UPI for denser core counts and emerging workloads like AI without disrupting platform compatibility across generations.

Technical Architecture

Coherency and Protocol

The Intel Ultra Path Interconnect (UPI) employs a directory-based home snoop coherency protocol to maintain cache consistency across multi-socket systems. This approach utilizes distributed home agents within the Caching and Home Agent (CHA) modules to track the state of cache lines in a directory structure, enabling targeted snoop probes rather than broadcasting to all sockets. By directing coherency traffic only to relevant agents, the protocol minimizes unnecessary inter-socket communication and reduces latency in scaled configurations.

The UPI protocol stack consists of a protocol layer and a link layer, each handling distinct aspects of transaction management and transmission. The protocol layer, implemented primarily in the CHA, PCIe root complex, and configuration agent (Ubox), manages high-level operations such as injecting, generating, and servicing transactions for memory access and interrupts. It processes message classes including requests (REQ), snoops (SNP), and responses (RSP) to enforce the MESIF (Modified, Exclusive, Shared, Invalid, Forward) cache states. The link layer, in contrast, operates below this to convert messages into fixed-size flits for transmission, incorporating error detection via CRC and retry mechanisms for corrupted packets.

Key transaction types in UPI include read and write requests issued by caching agents for data access, coherency probes (snoops) generated by home agents to query remote caches, and data responses that complete transfers with optional writebacks for dirty lines. Flow control is enforced through a credit-based system at the link layer, where virtual channels (e.g., VN0 for requests, VN1 for snoops) allocate buffers and prevent overflows by tracking available credits per peer. This ensures reliable ordering and prevents deadlocks in bidirectional traffic.

Unlike its predecessor, the QuickPath Interconnect (QPI), which supported multiple snoop modes (including no-snoop, early snoop, home snoop, and home snoop with directory), UPI is optimized exclusively for directory-based operation. This simplification eliminates the overhead of mode selection and preallocation, reducing protocol complexity and enabling better scalability in systems with many sockets by avoiding broadcast storms. The directory-only design lowers coherency latency in large-scale deployments, as home agents resolve states locally without probing unnecessary nodes.

The coherency overhead in UPI can be modeled conceptually: a directory lookup enables roughly constant-time resolution per transaction at the home agent, whereas traditional snooping scales with the number of sockets n. This efficiency arises because the directory maintains sharer and owner pointers, allowing snoops to fan out selectively rather than to all peers.

\text{Directory lookup time} \approx O(1) \quad \text{vs.} \quad \text{Snooping time} \approx O(n)

Such a model highlights UPI's advantage in multi-socket environments, where n grows beyond small configurations.
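The contrast can be made concrete with a small, purely conceptual sketch (not Intel's implementation): a directory-based home agent consults its sharer list and snoops only the sockets that actually hold a line, while broadcast snooping must probe every other socket regardless.

```python
# Conceptual sketch: directory-directed snooping vs. broadcast snooping.
# The data structures and method names are illustrative, not Intel's design.

class DirectoryHomeAgent:
    def __init__(self, num_sockets: int):
        self.num_sockets = num_sockets
        self.sharers = {}          # cache line -> set of sockets holding it

    def read_for_ownership(self, line: int, requester: int) -> list[int]:
        """Return only the sockets that must be snooped (invalidated)."""
        targets = self.sharers.get(line, set()) - {requester}
        self.sharers[line] = {requester}    # requester becomes sole owner
        return sorted(targets)              # cost tracks #sharers, not #sockets

def broadcast_snoop(line: int, requester: int, num_sockets: int) -> list[int]:
    """Broadcast snooping probes every other socket: O(n) messages."""
    return [s for s in range(num_sockets) if s != requester]

if __name__ == "__main__":
    ha = DirectoryHomeAgent(num_sockets=8)
    ha.sharers[0x40] = {2}                  # line 0x40 cached only on socket 2
    print("directory snoops:", ha.read_for_ownership(0x40, requester=0))       # [2]
    print("broadcast snoops:", broadcast_snoop(0x40, requester=0, num_sockets=8))  # 7 probes
```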
The physical layer of UPI implements high-speed point-to-point links designed for cache-coherent communication between processors. These links employ differential signaling using complementary positive (DP) and negative (DN) signal pairs to transmit data reliably over short distances within multi-socket platforms. Embedded clocking is integrated into the data stream, eliminating the need for separate clock lines and enabling synchronous operation at data rates up to 24 GT/s in recent generations, which simplifies board design and reduces pin count.

Data integrity on UPI links is maintained through encoding at the physical layer for reliable signaling and cyclic redundancy checks (CRC) at the link layer for error detection, with retry mechanisms to handle corrupted flits. Each link direction supports up to 24 lanes in UPI 2.0 implementations, with earlier versions using 20 lanes; these lanes operate in parallel to achieve aggregate bandwidth, where a full-width link aggregates the throughput of all active lanes after encoding efficiency is applied.

Link training occurs during system initialization to establish reliable communication, involving a sequence of reset, detection, speed negotiation, and equalization phases. The training logic negotiates the highest supported data rate (such as 10.4 GT/s in first-generation UPI or 24 GT/s in later versions) while applying adaptive equalization to compensate for signal degradation over board traces. This process ensures optimal alignment and margining for error-free operation across varying conditions.

In multi-socket configurations, each CPU typically supports 2 to 6 UPI links, depending on the processor generation and platform topology, with links configurable into various topologies for balanced interconnectivity. For instance, two-socket systems often use two links per CPU for direct connectivity, while larger four- or eight-socket setups employ up to six links to maintain low-latency paths in a non-blocking arrangement.

Power management in the UPI physical layer includes low-power states to optimize energy consumption during idle or light-load scenarios. The L0 state represents full active operation with maximum performance, while the L0p state provides a partial low-power mode by idling unused lanes to reduce active width without fully entering deeper sleep states, enabling quick reactivation for bursty workloads. These states integrate with overall power policies to balance performance and thermal constraints.
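Returning to the flit-level error handling described at the start of this subsection, the check-and-retry idea can be sketched minimally in Python. The zlib.crc32 checksum and the single-bit error model are stand-in assumptions; the real link layer uses UPI's own CRC and flit framing.

```python
# Illustrative only: models the link layer's CRC-check-and-retry idea using
# zlib.crc32 as a stand-in; UPI defines its own CRC polynomial and flit format.

import random
import zlib

def make_flit(payload: bytes) -> bytes:
    """Append a 32-bit CRC to the payload (UPI 2.0 actually uses a 16-bit CRC)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")

def check_flit(flit: bytes) -> bool:
    payload, crc = flit[:-4], int.from_bytes(flit[-4:], "little")
    return zlib.crc32(payload) == crc

def transmit(flit: bytes, error_rate: float) -> bytes:
    """Simulate a noisy lane flipping one bit with some probability."""
    if random.random() < error_rate:
        i = random.randrange(len(flit))
        flit = flit[:i] + bytes([flit[i] ^ 0x01]) + flit[i + 1:]
    return flit

def send_with_retry(payload: bytes, error_rate: float = 0.3, max_retries: int = 8) -> int:
    flit = make_flit(payload)
    for attempt in range(1, max_retries + 1):
        if check_flit(transmit(flit, error_rate)):
            return attempt          # delivered cleanly on this attempt
    raise RuntimeError("persistent errors: escalate to link retrain")

if __name__ == "__main__":
    print("delivered after", send_with_retry(b"cache line data" * 4), "attempt(s)")
```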

Integration with Processor Design

The Intel Ultra Path Interconnect (UPI) is deeply integrated into the processor's architecture, serving as the primary interface for coherent communication between sockets while leveraging the on-die mesh fabric for internal routing. In designs like the 4th Gen Xeon Scalable processors (codename Sapphire Rapids), UPI controllers are placed near dedicated mesh stops on the periphery of the tile-based structure, allowing efficient access to the overall system fabric without excessive hops. This placement ensures that UPI traffic integrates seamlessly with the processor's modular layout, where compute and I/O elements are connected via embedded multi-die interconnect bridges (EMIB).

Caching and Home Agents (CHAs) are distributed across the processor tiles to maintain balanced load distribution, with one CHA typically associated with each core and last-level cache (LLC) slice. This arrangement enables scalable handling of memory requests and snoops, as each CHA connects directly to the mesh interconnect, facilitating uniform access to shared resources like the LLC and memory controllers. In tile-based implementations, such as those in Sapphire Rapids, this per-tile CHA setup supports the logical monolithicity of multi-tile packages, where all cores can access global resources transparently.

On-die connectivity for UPI relies on the bidirectional mesh fabric, which routes packets from compute tiles to the tiles hosting the UPI links. In Sapphire Rapids' chiplet design, EMIB bridges extend the mesh across tiles, minimizing latency for UPI by treating the multi-die package as a unified interconnect domain. This mesh-based routing replaces earlier ring topologies, providing higher bandwidth and lower latency for intra-package traffic destined for inter-socket UPI transmission.

UPI's design enables scalability from 2 to 8 sockets through support for optimized topologies, such as the 8-socket 4-UPI configuration, and NUMA-aware routing that maps memory domains across sockets via UPI links. This allows processors to maintain coherent shared address spaces in large-scale systems, with the mesh distributing UPI-related traffic to prevent bottlenecks. In Granite Rapids-based processors, this extends to up to 6 UPI links per socket, further enhancing multi-socket connectivity in 1- to 8-socket setups.

The integration of UPI has evolved from monolithic die designs in Skylake-based processors, where the mesh directly interfaced UPI on a single silicon expanse, to multi-tile architectures in Sapphire Rapids, Emerald Rapids, and Granite Rapids. In these later generations, chiplet-based tiles comprising compute, memory, and I/O elements linked by EMIB allow UPI to scale with higher core counts while preserving low-latency access through extended mesh routing. This progression supports denser packaging and improved resource sharing without compromising coherency.

Performance and Specifications

Bandwidth Capabilities

The Intel Ultra Path Interconnect (UPI) provides high-speed, full-duplex data transfer between multi-socket processors, with bandwidth scaling across versions through increased transfer rates. In its initial UPI 1.0 implementation, each link operates at 10.4 GT/s, delivering approximately 20.8 GB/s of unidirectional throughput (41.6 GB/s bidirectional). This configuration supports symmetric upload and download speeds, enabling efficient cache-coherent communication in dual-socket systems. The third-generation Ice Lake-SP processors increased the per-link speed to 11.2 GT/s, yielding about 22.4 GB/s unidirectional (44.8 GB/s bidirectional), an improvement that enhanced inter-socket data movement without altering the fundamental link architecture.

UPI 2.0 further boosted performance, with operational speeds up to 16 GT/s in fourth-generation Sapphire Rapids processors, equating to roughly 32 GB/s unidirectional per link (64 GB/s bidirectional), and peak speeds of 24 GT/s in sixth-generation Xeon 6 processors (Granite Rapids, launched 2024), achieving approximately 48 GB/s unidirectional (96 GB/s bidirectional). These rates maintain full-duplex operation, ensuring balanced throughput in both directions across each link. Early UPI versions used 8b/10b encoding (80% efficiency), while UPI 2.0 employs 128b/130b encoding (approximately 98% efficiency) for reduced overhead.

Aggregate bandwidth depends on the number of links per socket, typically three in early generations, up to four in fourth-generation Sapphire Rapids, and six in sixth-generation Xeon 6. For example, UPI 1.0 systems with three links per socket provide 62.4 GB/s total unidirectional bandwidth (3 × 20.8 GB/s), sufficient for many workloads. In contrast, sixth-generation Xeon 6 processors with six UPI 2.0 links at 24 GT/s deliver around 288 GB/s aggregate unidirectional throughput (6 × 48 GB/s), significantly expanding multi-socket bandwidth. This scaling follows the formula for effective per-link bandwidth: (GT/s × 20 lanes × encoding efficiency) / 8 bits per byte.

UPI's throughput is influenced by encoding overhead, which minimizes wasted bandwidth while preserving signal integrity over differential pairs. This efficiency ensures that raw transfer rates translate closely to usable data rates, though actual performance can vary slightly based on protocol overhead and traffic patterns. Multi-link configurations also give UPI strong bandwidth density within x86 ecosystems relative to alternatives such as AMD's Infinity Fabric; detailed comparisons appear in a later section.
| UPI Version (Generation) | Max GT/s per Link | Unidirectional Bandwidth per Link (GB/s) | Typical Links per Socket | Aggregate Unidirectional (GB/s) |
|---|---|---|---|---|
| 1.0 (Skylake-SP) | 10.4 | 20.8 | 3 | 62.4 |
| 1.0 (Ice Lake-SP) | 11.2 | 22.4 | 3 | 67.2 |
| 2.0 (Sapphire Rapids) | 16 | 32 | 4 | 128 |
| 2.0 (Granite Rapids) | 24 | 48 | 6 | 288 |
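The table's nominal figures can be reproduced with the simplified per-link formula quoted above (20 lanes at 80% encoding efficiency, i.e., roughly 2 bytes of payload per transfer). The short sketch below applies that convention and ignores protocol overhead; it is a worked example, not a measurement.

```python
# Worked example reproducing the table above, using the article's convention of
# (GT/s x 20 lanes x 80% encoding efficiency) / 8 bits per byte, which works out
# to roughly 2 bytes per transfer. Figures are nominal; real throughput also
# loses some capacity to protocol overhead.

def per_link_gbps(gt_per_s: float, lanes: int = 20, efficiency: float = 0.8) -> float:
    """Unidirectional GB/s for one UPI link."""
    return gt_per_s * lanes * efficiency / 8

configs = [
    ("UPI 1.0 (Skylake-SP)",      10.4, 3),
    ("Ice Lake-SP",               11.2, 3),
    ("UPI 2.0 (Sapphire Rapids)", 16.0, 4),
    ("UPI 2.0 (Granite Rapids)",  24.0, 6),
]

for name, rate, links in configs:
    link_bw = per_link_gbps(rate)
    print(f"{name:28s} {link_bw:5.1f} GB/s per link x {links} links "
          f"= {link_bw * links:6.1f} GB/s aggregate unidirectional")
```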

Latency and Power Efficiency

The Intel Ultra Path Interconnect (UPI) is designed to minimize latency in multi-socket configurations, enabling efficient coherent memory access across processors. In a two-socket setup, the end-to-end latency for a remote memory access typically ranges from approximately 120-150 ns for a single hop, based on measurements of access patterns in NUMA systems. This encompasses transmission across the physical link, protocol processing, and queuing delays at the endpoints. The directory-based home snoop coherency protocol further optimizes this by localizing snoop traffic at the home agent, reducing unnecessary broadcasts and thus lowering overall remote access delays compared to broadcast-based alternatives.

UPI's power efficiency stems from its support for multiple low-power states that allow links to scale power dynamically with demand. The L0 state maintains full operational speed for active data transfer, while the L1 state shuts down the link entirely during idle periods, achieving significant energy savings with rapid entry and exit times on the order of microseconds to avoid latency penalties. In configurations like four-socket systems using first-generation Xeon Scalable processors (e.g., the Xeon Platinum 8180), inter-socket latencies are approximately 130-150 ns.

Compared to its predecessor, the QuickPath Interconnect (QPI), UPI delivers improved power efficiency through refined packet formats that reduce overhead and enhance transfer rates per wire, alongside low-power states such as L0p for better idle power control. Subsequent generations further enhance efficiency: third-generation Xeon Scalable processors raised transfer rates to 11.2 GT/s, and UPI 2.0 adds voltage and signaling optimizations that enable finer-grained power scaling without compromising coherency performance. These advancements reduce energy use for inter-socket communication, particularly in bandwidth-intensive server workloads, while maintaining latencies suitable for scalable shared-memory systems. In later iterations like fourth-generation Xeon (Sapphire Rapids), protocol enhancements and faster link speeds contribute to progressively lower effective latencies, approaching sub-150 ns in optimized two-socket topologies despite increasing core counts.
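A back-of-the-envelope model shows why keeping accesses local matters: blending an assumed ~90 ns local DRAM latency (an illustrative value, not from the text) with the ~140 ns single-hop remote figure cited above gives the average latency as the remote fraction grows.

```python
# Back-of-the-envelope sketch: how the share of remote (cross-UPI) accesses
# affects average memory latency in a two-socket system. The ~140 ns remote
# figure follows the text; the ~90 ns local figure is an assumed typical value.

LOCAL_NS = 90.0     # assumed local DRAM latency
REMOTE_NS = 140.0   # one UPI hop to the other socket (per the text, ~120-150 ns)

def average_latency(remote_fraction: float) -> float:
    return (1.0 - remote_fraction) * LOCAL_NS + remote_fraction * REMOTE_NS

for frac in (0.0, 0.1, 0.25, 0.5):
    print(f"{frac:4.0%} remote accesses -> ~{average_latency(frac):5.1f} ns average")
```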

Scalability Limits

Intel Ultra Path Interconnect (UPI) 2.0 supports configurations of up to eight sockets in high-end server platforms, such as those using 4th Generation Intel Xeon Scalable processors (Sapphire Rapids), where Platinum-series CPUs enable scaling to this maximum through multiple UPI links per processor. This limit arises from constraints in link connectivity and the coherency directory size, which tracks cache line states across sockets; beyond eight sockets, the protocol's efficiency diminishes due to increased overhead in maintaining coherence.

Topology constraints in multi-socket UPI systems, such as ring or partially connected mesh arrangements, impose limits on remote access latency as the number of sockets grows, with larger configurations like eight-socket setups potentially requiring up to four hops for data traversal between distant processors. In these topologies, each additional hop introduces latency, making full-mesh connectivity impractical beyond four sockets without specialized hardware. Key bottlenecks include the maximum of six UPI links per CPU in recent implementations, which restricts direct connectivity in dense configurations and can lead to contention on shared paths. Additionally, in systems exceeding four sockets, the directory-based coherency mechanism experiences increased thrashing and contention as more agents query the directory for cache line states, potentially degrading performance under high sharing workloads.

To mitigate these limits, system administrators employ NUMA tuning techniques, such as thread affinity and memory allocation policies, to minimize inter-socket traffic and favor local memory access. Sub-NUMA Clustering (SNC) modes further partition each socket into smaller domains, reducing coherency overhead by isolating memory controllers and caches within subsets of cores. These approaches improve effective scalability in multi-socket environments without altering hardware topology.
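As a minimal illustration of NUMA-aware tuning on Linux, the sketch below pins a process to one socket's cores with os.sched_setaffinity so that first-touch allocations tend to land in local memory and avoid extra UPI hops. The core-to-socket mapping is assumed for illustration and should be read from the real topology (for example with lscpu or numactl --hardware).

```python
# Minimal sketch of NUMA-aware thread placement on Linux. Pinning a process to
# the cores of one socket keeps first-touch memory allocations local to that
# socket, reducing traffic over the UPI links. Core lists below are assumptions.

import os

SOCKET_CORES = {
    0: set(range(0, 16)),    # assumed: cores 0-15 on socket 0
    1: set(range(16, 32)),   # assumed: cores 16-31 on socket 1
}

def pin_to_socket(socket_id: int) -> None:
    """Restrict this process to one socket's cores (Linux sched_setaffinity)."""
    os.sched_setaffinity(0, SOCKET_CORES[socket_id])

if __name__ == "__main__":
    pin_to_socket(0)
    print("now running on cores:", sorted(os.sched_getaffinity(0)))
```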

Applications and Implementations

Usage in Server Platforms

Intel Ultra Path Interconnect (UPI) primarily serves as the high-speed coherency fabric in multi-socket server platforms, enabling seamless communication between processors in configurations such as 2-socket (2S) and 4-socket (4S) systems deployed in data centers for high-performance computing (HPC), cloud infrastructure, and AI training workloads. In these environments, UPI facilitates scalable multi-processor designs, allowing servers to handle demanding tasks without bottlenecks in inter-socket data transfer.

Prominent examples of UPI integration include Dell PowerEdge servers and HPE systems equipped with Intel Xeon Scalable processors, where UPI links support up to three connections per socket to optimize performance in enterprise-grade multi-socket setups. In HPE servers, for instance, BIOS configurations allow tuning of UPI options to balance power and performance for specific server roles. These platforms leverage UPI's cache-coherent fabric to enable efficient memory sharing across sockets, providing key benefits for workloads like large-scale databases, where it reduces latency in transactional queries, and virtualization environments, where it supports higher virtual machine density through unified address spaces. As of 2025, UPI continues in fifth-generation Xeon Scalable processors such as Emerald Rapids, supporting advanced AI and HPC workloads with improved multi-socket scaling. As of the 2025 TOP500 list, UPI-based processors power 57% of the systems on the list of supercomputers, underscoring their widespread adoption in production HPC deployments.

To ensure reliability in these mission-critical settings, UPI incorporates built-in diagnostics and error handling mechanisms, such as dynamic link width reduction and corrupt data containment, which detect and mitigate link failures by adjusting connectivity without full system downtime. These features log events via the System Event Log (SEL) for proactive maintenance in server farms.

Configurations and Compatibility

The number of UPI links per processor varies by generation and model, typically two or three, with up to four in fourth-generation Sapphire Rapids and up to six in third-generation Cooper Lake parts intended for larger multi-socket systems. In multi-socket configurations, the effective number of links interconnecting sockets can reach six or more depending on the topology. Cost-optimized setups utilize 2-link modes, ideal for dual-socket platforms where interconnect bandwidth needs are moderate and cost efficiency is prioritized. Balanced configurations employ three links per socket, offering improved data throughput for multi-socket servers handling moderate to high workloads. High-bandwidth modes leverage up to six links per socket in advanced multi-processor systems, enabling robust scalability for demanding environments.

UPI maintains protocol-level backward compatibility with prior versions, facilitating evolution across processor generations while preserving coherency mechanisms. In multi-socket setups, all interconnected processors must operate at identical UPI speeds to ensure stable communication and avoid protocol mismatches. Configurations cannot mix UPI with the legacy QuickPath Interconnect (QPI), as UPI fully replaces it in Xeon Scalable designs. Certified platforms limit scalability to a maximum of eight sockets to maintain reliability and performance.

BIOS settings provide essential tuning for UPI operation, including speed selection among supported rates such as 10.4 GT/s, 11.2 GT/s, or higher depending on the generation. Lane reversal options allow flexible board routing by inverting signal lanes without hardware changes, simplifying multi-socket designs. For verification, UPI speeds and link counts are listed on Intel ARK product pages and in official processor datasheets.
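The identical-speed rule above can be expressed as a trivial configuration check; the sketch below uses hypothetical example values rather than anything read from real hardware.

```python
# Small illustrative check mirroring the rule that every processor in a
# multi-socket platform must run its UPI links at the same speed.
# The socket names and speeds are example inputs, not queried from a system.

def validate_upi_speeds(sockets: dict[str, float]) -> None:
    speeds = set(sockets.values())
    if len(speeds) > 1:
        raise ValueError(f"mismatched UPI speeds across sockets: {sockets}")
    print(f"all {len(sockets)} sockets linked at {speeds.pop()} GT/s")

validate_upi_speeds({"socket0": 16.0, "socket1": 16.0})    # OK
# validate_upi_speeds({"socket0": 16.0, "socket1": 11.2})  # would raise ValueError
```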

Comparisons to Other Interconnects

Intel's Ultra Path Interconnect (UPI) is primarily designed for coherent, low-latency CPU-to-CPU communication within multi-socket x86 systems, contrasting with AMD's Infinity Fabric (IF), which serves a broader role encompassing CPU-to-CPU, die-to-die, and I/O interconnects in a more integrated architecture. UPI emphasizes directory-based coherency optimized for homogeneous CPU environments, achieving lower inter-socket latency (typically 60-100 ns round-trip) suitable for latency-sensitive workloads such as databases and simulations. In comparison, Infinity Fabric provides higher peak bandwidth (up to approximately 100 GB/s aggregate in 2-socket configurations) but incurs higher latency (often 130-200 ns inter-socket) due to its scalable, fabric-based design that prioritizes flexibility across heterogeneous components.

Compared to NVIDIA's NVLink, UPI focuses on CPU-centric, coherent multi-socket scaling, while NVLink targets high-bandwidth GPU-to-GPU and GPU-to-CPU links in accelerated computing environments. NVLink delivers significantly higher bandwidth, with fifth-generation implementations offering up to 1,800 GB/s bidirectional per GPU via multiple links, enabling massive parallel data movement for AI and HPC tasks. However, standard NVLink is non-coherent, requiring extensions like NVLink-C2C for cache coherency, whereas UPI natively supports full coherency without additional protocols.

Relative to Intel's predecessor, the QuickPath Interconnect (QPI), UPI improves per-link efficiency and reduces power consumption through improved packetization and low-power states like L0p. QPI peaked at 9.6 GT/s, providing about 38.4 GB/s bidirectional per link, whereas UPI's initial 10.4 GT/s speed (rising to 16 GT/s and beyond in later versions) achieves roughly 41.6 GB/s bidirectional per link with enhanced efficiency. UPI excels in x86 ecosystems for reliable, low-latency multi-CPU scaling but offers less flexibility for heterogeneous compute compared with Infinity Fabric's broad integration or NVLink's GPU-centric bandwidth.
| Interconnect | Aggregate Bandwidth (2-Socket Setup, Bidirectional) | Typical Inter-Socket Latency (Round-Trip) |
|---|---|---|
| Intel UPI 2.0 | ~64 GB/s (2 links) | 60-100 ns |
| AMD Infinity Fabric | ~100 GB/s | 130-200 ns |
| NVIDIA NVLink 5.0 | ~1,800 GB/s (per GPU pair, multiple links) | <50 ns |
