Intel QuickPath Interconnect
The Intel® QuickPath Interconnect (QPI) is a high-speed, packetized, point-to-point interconnect architecture developed by Intel Corporation to enable efficient communication between processors, I/O hubs, and other components in multi-socket systems.[1] It replaced the front-side bus (FSB) architecture, providing significantly higher bandwidth—up to 25.6 GB/s per bidirectional link at 6.4 GT/s—and lower latency through direct cache-to-cache transfers in a distributed shared memory model.[2] Introduced in 2008 alongside the 45 nm Nehalem microarchitecture, QPI debuted in products such as the Intel® Core™ i7-900 series desktop processors and the Intel® Xeon® 5500 series server processors, marking a shift to integrated on-die memory controllers and scalable multi-core designs.[2]

The architecture features a five-layer protocol stack—physical, link, routing, transport, and protocol—supporting the MESIF (Modified, Exclusive, Shared, Invalid, Forward) cache coherency protocol with optimized snoop behaviors for low-latency source snooping and high-scalability home snooping.[2] Link speeds evolved from the initial 4.8 GT/s to 9.6 GT/s in later implementations, using differential signaling over 20 lanes per direction with per-flit CRC protection for reliability.[2][3] QPI incorporated robust reliability, availability, and serviceability (RAS) features, including link-level retry, self-healing capabilities, and clock failover, making it suitable for enterprise servers and high-performance computing.[2] It powered multiple generations of Intel processors, including the Westmere, Sandy Bridge-EP, Ivy Bridge-EP, Haswell-EP, and Broadwell-EP Xeon families, enabling up to four sockets in scalable configurations.[2][4] QPI was eventually succeeded by the Intel® Ultra Path Interconnect (UPI) starting with the Skylake-SP-based first-generation Intel® Xeon® Scalable processors in Q3 2017, which offered improved power efficiency and flexibility while building on many of QPI's architectural concepts.[4][5]
Overview
Definition and Purpose
The Intel QuickPath Interconnect (QPI) is a high-speed, packetized, point-to-point interconnect developed by Intel to facilitate data transfer between processors, I/O hubs, and memory controllers.[1] It employs a cache-coherent protocol to ensure data consistency across multiple processing units, enabling efficient high-bandwidth and low-latency communication in multi-processor environments.[6] QPI's primary purpose is to replace the traditional front-side bus (FSB), which suffered from bottlenecks in shared-memory systems due to its multi-drop architecture and limited scalability.[6] By shifting to a distributed shared memory model with integrated memory controllers per processor, QPI eliminates these constraints, supporting scalable multi-socket configurations initially up to four sockets and expanding to higher counts in subsequent generations.[7] This design enhances overall system performance in demanding workloads by reducing contention and improving bandwidth allocation.[1]

At its core, QPI relies on differential signaling transmitted over serial lanes, allowing for compact, high-speed connections with minimal pin count.[7] It operates in full-duplex mode, with unidirectional links forming bidirectional pairs for simultaneous data flow in both directions.[7] The interconnect supports both coherent transactions, such as those maintaining cache consistency via snoop protocols, and non-coherent transactions for I/O operations, providing flexibility for diverse system requirements.[1] QPI was initially targeted at server and high-end computing platforms, debuting in 2008 on Nehalem-based Intel Xeon processors and select high-end desktop systems, and later extending to Tukwila-based Itanium processors.[6] These implementations focused on mission-critical environments, where QPI's architecture supported robust error detection and recovery to ensure reliability.[7]
Key Characteristics
The Intel QuickPath Interconnect (QPI) employs a flexible lane structure in which a full-width link comprises 20 lanes in each direction (20 for transmission and 20 for reception), enabling high-bandwidth point-to-point communication between processors.[2] This design supports configurable widths, including half-width (10 lanes per direction) and quarter-width (5 lanes per direction), to accommodate varying system requirements and optimize resource allocation in multi-socket configurations.[2] QPI uses differential signaling with a forwarded clock for reliable data transmission across the lanes.[2] Error detection is integrated through cyclic redundancy check (CRC) mechanisms, featuring an 8-bit CRC per 80-bit flit and an optional 16-bit rolling CRC for enhanced reliability in high-speed environments.[2] Additionally, QPI provides full hardware cache coherence via the MESIF protocol, incorporating a directory-based approach for home snooping in multi-node systems to ensure scalability while minimizing latency in shared memory operations.[2] Power efficiency is addressed through multiple link states, including L0 for active full-performance operation, L0s for low-power idle that halts data transmission while preserving quick reactivation, and L1 for deeper sleep that powers down most circuitry to minimize consumption during prolonged inactivity.[8] Later generations of QPI, such as version 1.1, incorporate backward compatibility features that permit mixed-speed links, allowing integration with prior QPI 1.0 implementations without requiring uniform clock rates across all nodes.[8]
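The per-flit CRC described above can be illustrated with a short sketch. The example below appends an 8-bit CRC to a 72-bit payload to form an 80-bit flit and verifies it on receipt; the CRC-8 polynomial (0x07), bit layout, and function names are illustrative assumptions rather than the exact parameters defined in the QPI specification.

```python
# Illustrative sketch: protecting a 72-bit flit payload with an 8-bit CRC to
# form an 80-bit flit. The polynomial (x^8 + x^2 + x + 1, i.e. 0x07) and the
# payload/CRC layout are generic placeholders, not the actual QPI parameters.

def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 over a byte string (MSB-first, zero initial value)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc

def build_flit(payload_72bit: int) -> int:
    """Append an 8-bit CRC to a 72-bit payload, yielding an 80-bit flit."""
    payload_bytes = payload_72bit.to_bytes(9, "big")   # 72 bits = 9 bytes
    return (payload_72bit << 8) | crc8(payload_bytes)  # 80-bit result

def check_flit(flit_80bit: int) -> bool:
    """Recompute the CRC at the receiver and compare with the received value."""
    payload, received_crc = flit_80bit >> 8, flit_80bit & 0xFF
    return crc8(payload.to_bytes(9, "big")) == received_crc

if __name__ == "__main__":
    flit = build_flit(0x123456789ABCDEF012)
    assert check_flit(flit)
    assert not check_flit(flit ^ 0x100)  # a single flipped payload bit is detected
```

A mismatch at the receiver corresponds, in hardware, to the link-level retry behavior mentioned elsewhere in this article: the corrupted flit is retransmitted rather than passed to higher layers.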
History and Development
Introduction and Timeline
The Intel QuickPath Interconnect (QPI) was announced by Intel on September 18, 2007, during a presentation on the upcoming Nehalem microarchitecture, representing a fundamental shift from the shared front-side bus (FSB) architecture to a packetized, point-to-point interconnect combined with integrated memory controllers on the processor die.[9] This evolution addressed the limitations of the FSB in supporting increasing core counts and data throughput, enabling more efficient distributed shared-memory systems.[2] The development was motivated by the escalating bandwidth requirements of multi-core processors in server environments, driven by expanding data center workloads and the need for scalable, low-latency inter-processor communication.[6]

QPI first entered production with the Nehalem-based Intel Core i7 processors and X58 chipset in November 2008, marking its debut in desktop and entry-level server platforms.[10] In the server segment, it launched alongside the Intel Xeon 5500 series (Nehalem-EP) and the Intel 5520 I/O Hub (IOH) in March 2009, providing enhanced connectivity for I/O subsystems in multi-socket configurations.[11] These initial implementations established QPI 1.0 as a core component of Intel's high-performance computing strategy.

Subsequent milestones included its expansion to the Itanium processor line with the 9300 series (Tukwila) on February 8, 2010, broadening QPI's application to enterprise-class systems requiring robust reliability and scalability. By 2012, QPI saw wider integration in Xeon platforms, with version 1.1 introduced in the Sandy Bridge-EP architecture (Xeon E5-2600 series) launched on March 6, 2012, facilitating improved coherence and support for larger node counts in data centers. This timeline underscored QPI's role in enabling the transition to more interconnected, multi-socket server designs amid rising computational demands.
Versions and Generations
The Intel QuickPath Interconnect (QPI) evolved through three primary generations, each tied to advancements in Intel's server processor architectures, with progressive improvements in data rates, reliability, and power efficiency.[12]

The first generation, QPI 1.0 (Gen 1), was introduced in November 2008 with the first Nehalem-based processors and reached the server segment with the Xeon 5500 series in March 2009, operating at speeds from 4.8 GT/s up to 6.4 GT/s to enable point-to-point connections in multi-socket systems.[1] This version featured basic packetization for coherent data transfers, supporting up to 25.6 GB/s aggregate bandwidth per link pair in full-width configurations, and was also used in the Westmere processors (2010) and the Tukwila Itanium family (2010).

QPI 1.1 (Gen 2), released in 2012 with the Sandy Bridge-EP architecture in the Xeon E5-2600 series, increased maximum speeds to 8.0 GT/s while maintaining backward compatibility with Gen 1 systems. Key enhancements included faster link training sequences and improved link management for better reliability in dense server environments, allowing for up to 32 GB/s per link pair.[12] This generation extended support to Ivy Bridge-EP (Xeon E5-2600 v2, 2013), focusing on unified interconnect protocols across x86 and Itanium platforms.[13]

The final major iteration, QPI 2.0 (Gen 3), debuted in 2014 with the Haswell-EP-based Xeon E5-2600 v3 processors, achieving up to 9.6 GT/s for enhanced bandwidth of approximately 38.4 GB/s per link pair.[14] It introduced optimizations for error handling, such as advanced cyclic redundancy checks and power-efficient states, alongside support for higher socket densities in enterprise configurations.[15] This version carried over to the Broadwell-EP Xeon E5-2600 v4 series in 2016, marking the last significant deployment of QPI before its phase-out.[16]

Overall, Gen 1 provided foundational packet-based communication, Gen 2 refined management and compatibility, and Gen 3 emphasized scalability for high-performance computing.[12] QPI reached end-of-life with the introduction of the Intel Ultra Path Interconnect (UPI) in the Skylake-SP Xeon Scalable processors in 2017.
Technical Architecture
Physical Layer
The physical layer of the Intel QuickPath Interconnect (QPI) manages the electrical and signaling aspects of data transmission between processors, utilizing differential current-mode logic for reliable high-speed communication.[14] It supports bit rates up to 9.6 GT/s in later implementations, enabling aggregate bandwidths of up to 38.4 GB/s per full-duplex link pair at maximum speed.[14][17] The layer employs DC-coupled differential signaling with opposite-polarity pairs (DP and DN) for both data and clock, ensuring robust transmission over printed circuit board traces.[2]

The pinout for each QPI port includes 20 differential data lanes (QPI_DTX_DN/DP[19:0] for transmit and QPI_DRX_DN/DP[19:0] for receive) plus one differential forwarded clock lane per direction (QPI_CLKTX_DN/DP and QPI_CLKRX_DN/DP), totaling 84 signals per port to form a full-width link pair.[2] This 40-pin differential interface per unidirectional link supports point-to-point connections compatible with daisy-chain or mesh topologies for systems up to four nodes.[7] Routing lengths of 14 to 24 inches (approximately 0.35 to 0.6 meters) are supported with 0 to 2 connectors, using low-loss materials to minimize attenuation.[2]

Clocking operates in a source-synchronous manner, where the transmitter supplies a differential forwarded clock at half the data rate (e.g., 4.8 GHz for 9.6 GT/s data), avoiding separate global clock lines and reducing skew.[2] This approach includes clock fail-over mechanisms, allowing the clock to be remapped to a data lane if needed for reliability.[2] No encoding scheme like 8b/10b is used; instead, raw differential data transmission relies on the forwarded clock for synchronization, with double data rate (DDR) operation in all generations doubling the effective throughput relative to the clock frequency.[2]

Link training and initialization sequences establish reliable communication by performing lane and polarity reversal, deskew across lanes, and adaptive waveform equalization at the transmitter to open the data eye at the receiver.[7] These processes compensate for signal degradation due to frequency-dependent loss and crosstalk over the supported trace lengths, using discrete-time linear equalization with configurable tap coefficients.[2] Built-in self-test modes, including loopback variants, facilitate probe-less validation without external hardware.[7]
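The lane counts and source-synchronous clocking described above can be summarized with a small back-of-the-envelope calculation. The sketch below reproduces the 84-signal figure for a full-width port and the forwarded-clock frequencies implied by DDR operation; the constant and function names are illustrative assumptions, not part of any Intel tooling.

```python
# Back-of-the-envelope figures for one full-width QPI port, following the
# lane counts and clocking described in the text. This is an illustrative
# calculation, not a pin list.

DATA_LANES_PER_DIRECTION = 20    # QPI_DTX[19:0] / QPI_DRX[19:0]
CLOCK_LANES_PER_DIRECTION = 1    # one forwarded clock per direction
WIRES_PER_DIFFERENTIAL_LANE = 2  # DP and DN

def signals_per_port() -> int:
    """Total differential signals for a full-width bidirectional port."""
    lanes = (DATA_LANES_PER_DIRECTION + CLOCK_LANES_PER_DIRECTION) * 2  # both directions
    return lanes * WIRES_PER_DIFFERENTIAL_LANE                          # 42 lanes * 2 wires = 84

def forwarded_clock_ghz(transfer_rate_gts: float) -> float:
    """Forwarded clock runs at half the transfer rate because data is DDR."""
    return transfer_rate_gts / 2

if __name__ == "__main__":
    print(signals_per_port())        # 84
    print(forwarded_clock_ghz(9.6))  # 4.8 GHz
    print(forwarded_clock_ghz(6.4))  # 3.2 GHz
```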
Protocol Layers
The Intel QuickPath Interconnect (QPI) employs a layered protocol stack consisting of the physical layer for signaling and logical layers—including the link layer for flow control and reliability, the routing layer for path determination, an optional transport layer for end-to-end features, and the protocol layer for transaction handling—that together ensure reliable, ordered delivery of packets while supporting cache coherence in multi-socket systems.[2][18]

The Link Layer oversees flow control and reliability mechanisms to prevent buffer overflows and recover from transmission errors. It employs a credit-based flow control system, where receiving agents return credits to indicate available buffer space in specific virtual channels, allowing transmitting agents to proceed only when sufficient credits are held. This layer supports virtual channels dedicated to traffic types such as coherent requests, home agent operations, I/O transactions, and snoop responses, with configurations typically featuring 4 to 6 channels per link to enable prioritization—for instance, elevating coherence traffic over non-coherent I/O—and to avoid deadlocks via separation of resource dependencies. For error handling, the Link Layer performs cyclic redundancy check (CRC) validation on each transmitted unit, triggering link-level retries and acknowledgments to retransmit corrupted data without higher-layer involvement.[2][18]

The Protocol Layer defines the rules for transaction initiation, processing, and completion, encapsulating operations into packets that facilitate coherent and non-coherent communication. It handles diverse transactions, including memory read/write requests, snoop inquiries to maintain cache consistency, data responses, and coherence protocol messages, all structured as sequences of 80-bit flits—the basic data units for protocol-level transfer—where each flit is carried over the physical layer as multiple phits (20 bits per phit on a full-width link). Packets are classified into message classes such as snoop (SNP) for coherence probes, home (HOM) for directory-based tracking, data response (DRS) for payload delivery, non-data response (NDR) for acknowledgments, non-coherent standard (NCS) for I/O-like operations, and non-coherent bypass (NCB) for expedited non-cached transfers, with some classes (e.g., HOM) enforcing ordering while others permit unordered delivery to optimize latency.[2][18]

Virtual channels in QPI are integral to both layers, organized into up to three virtual networks: two independent networks (VN0 and VN1) plus a shared, adaptively buffered network (VNA). These can yield a maximum of 18 channels across message classes, though practical implementations often use 4 to 6 for deadlock avoidance by isolating traffic flows—such as dedicating channels to coherent versus non-coherent streams—and supporting priority-based scheduling to minimize contention in shared links.[2]

QPI's coherence protocol extends the MESI (Modified, Exclusive, Shared, Invalid) model to MESIF, incorporating a Forward (F) state for direct cache-to-cache data transfers that bypass the home agent, thereby reducing latency in two-hop scenarios. It employs directory-based tracking at home agents to monitor cache states across nodes, enabling scalable non-broadcast operation in multi-socket configurations; this supports source snooping for low-latency access in small systems and home snooping with directory intervention for larger setups, ensuring consistency without flooding the interconnect.[2][18]
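To make the credit-based flow control and message-class separation concrete, the sketch below models a single virtual channel whose sender may transmit only while it holds credits advertised by the receiver. The six message-class names follow the text above; the credit counts, queueing behavior, and method names are invented for illustration and do not mirror any real hardware interface.

```python
# Illustrative sketch of credit-based flow control on one QPI virtual channel.
# Class names follow the text; everything else is simplified for illustration.
from collections import deque
from enum import Enum

class MessageClass(Enum):
    SNP = "snoop"                  # coherence probes
    HOM = "home"                   # home-agent / directory traffic (ordered)
    DRS = "data response"          # cache-line payload delivery
    NDR = "non-data response"      # completions / acknowledgments
    NCS = "non-coherent standard"  # I/O-style requests
    NCB = "non-coherent bypass"    # expedited non-cached transfers

class CreditedChannel:
    """A virtual channel: the sender may transmit only while it holds credits."""
    def __init__(self, msg_class: MessageClass, credits: int):
        self.msg_class = msg_class
        self.credits = credits      # receiver buffer slots advertised to the sender
        self.in_flight = deque()

    def try_send(self, flit: str) -> bool:
        if self.credits == 0:
            return False            # back-pressure: sender stalls until a credit returns
        self.credits -= 1
        self.in_flight.append(flit)
        return True

    def receiver_drains_one(self) -> None:
        """Receiver frees a buffer slot and returns the credit to the sender."""
        self.in_flight.popleft()
        self.credits += 1

if __name__ == "__main__":
    hom = CreditedChannel(MessageClass.HOM, credits=2)
    assert hom.try_send("ReadReq A") and hom.try_send("ReadReq B")
    assert not hom.try_send("ReadReq C")  # no credits left, sender must wait
    hom.receiver_drains_one()             # credit returned by the receiver
    assert hom.try_send("ReadReq C")
```

Separating traffic into independently credited channels like this is what allows coherence messages to make forward progress even when, say, non-coherent I/O buffers are full, which is the deadlock-avoidance property described above.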
Specifications
Bandwidth and Frequencies
The Intel QuickPath Interconnect (QPI) operates at specific transfer rates, expressed in gigatransfers per second (GT/s), which vary by generation to balance performance and power efficiency in multi-socket processor configurations. First-generation QPI (Gen 1), introduced with Nehalem-based processors, supports frequencies of 4.8 GT/s, 5.86 GT/s, and 6.4 GT/s per link.[19] Second-generation QPI (Gen 2), used in Sandy Bridge architectures, extends these options to 6.4 GT/s, 7.2 GT/s, and 8.0 GT/s.[20] Third-generation QPI (Gen 3), implemented in Haswell-EP processors, provides the highest rates at 8.0 GT/s, 8.8 GT/s, and 9.6 GT/s, enabling greater scalability for demanding workloads.[21] The per-link-pair bandwidth implied by each rate can be derived as shown in the sketch following the table.

| Generation | Supported Frequencies (GT/s) |
|---|---|
| Gen 1 | 4.8, 5.86, 6.4 |
| Gen 2 | 6.4, 7.2, 8.0 |
| Gen 3 | 8.0, 8.8, 9.6 |
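As a rough guide to how the headline bandwidth figures cited earlier (for example, 25.6 GB/s at 6.4 GT/s) follow from these transfer rates, the sketch below multiplies each rate by 2 bytes of payload per transfer per direction and by two directions for a full-duplex link pair. This simplified conversion is an assumption for illustration and ignores protocol overhead such as CRC and header flits.

```python
# Converts the QPI transfer rates in the table above into per-link-pair
# bandwidth figures. Assumes 2 bytes of payload per transfer in each direction
# and a full-duplex link pair; protocol overhead is ignored.

BYTES_PER_TRANSFER = 2  # payload bytes carried per transfer, per direction
DIRECTIONS = 2          # a link pair carries traffic in both directions at once

def link_pair_bandwidth_gbs(transfer_rate_gts: float) -> float:
    """GB/s per full-duplex link pair for a given transfer rate in GT/s."""
    return transfer_rate_gts * BYTES_PER_TRANSFER * DIRECTIONS

if __name__ == "__main__":
    for gts in (4.8, 5.86, 6.4, 7.2, 8.0, 8.8, 9.6):
        print(f"{gts:>4} GT/s -> {link_pair_bandwidth_gbs(gts):.2f} GB/s per link pair")
```

At 6.4 GT/s this reproduces the 25.6 GB/s figure quoted in the lead section, and at 9.6 GT/s the approximately 38.4 GB/s associated with Gen 3.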