
Tianhe-2

Tianhe-2, also known as Milky Way-2, is a hybrid supercomputer developed by China's National University of Defense Technology (NUDT) and installed at the National Supercomputer Center in Guangzhou. It achieved a peak performance of 54.9 petaflops and a sustained performance of 33.86 petaflops on the Linpack benchmark, topping the TOP500 list of the world's fastest supercomputers from June 2013 until June 2016. The system comprises 16,000 compute nodes, each equipped with two Intel Ivy Bridge Xeon processors and three Xeon Phi coprocessor cards, interconnected via the proprietary TH Express-2 network; it relies heavily on U.S.-sourced components despite being engineered in China. The supercomputer's architecture marked a significant advancement in China's high-performance computing capabilities, enabling applications in scientific simulations and potentially military modeling, given its development by a defense-affiliated university. Its prolonged dominance on the rankings underscored China's rapid progress in supercomputing, surpassing U.S. systems like Titan during that period, though the use of Intel hardware highlighted ongoing dependence on foreign technology amid U.S. export controls imposed in 2015 targeting high-end computing exports to China. U.S. government assessments have raised concerns about its potential role in weapons-related simulations, reflecting broader geopolitical tensions over dual-use technologies.

Development and History

Origins and Design Phase

The Tianhe-2 supercomputer originated from the National University of Defense Technology (NUDT) in Changsha, Hunan, an institution with a history of supercomputing development spanning decades, beginning with the Galaxy-I system in 1983 and progressing through milestones such as China's first GFlops, TFlops, and PFlops machines. The project built directly on the success of prior NUDT systems like Tianhe-1A, which achieved first place on the TOP500 list in November 2010, motivating further scaling to address demands in simulation, analysis, national defense, meteorology, and research applications. Sponsored under China's 863 High Technology Program, along with funding from Guangdong province and Guangzhou city, the initiative sought to establish an open platform for research, education, and services tailored to southern China's needs, with an original completion target of 2015 that was accelerated to operational status by June 2013. NUDT led the effort in collaboration with Inspur for manufacturing, installation, and testing, emphasizing indigenous innovations in the interconnect and front-end processors despite reliance on commercial processors.

During the design phase, architects prioritized a heterogeneous structure to balance general-purpose and accelerated computing, selecting 16,000 compute nodes each equipped with two Intel Ivy Bridge Xeon processors (2.2 GHz, 12 cores each) for the CPU segment and three Xeon Phi coprocessors (1.1 GHz, 57 cores each) for accelerated workloads, yielding over 3 million cores and a theoretical peak of 54.9 PFlop/s. A frontend of 4,096 NUDT-designed FT-1500 CPUs (16 cores at 1.8 GHz) handled management tasks, while the TH Express-2 interconnect, a proprietary fat-tree topology with hybrid opto-electronic transport, delivered 2.56 Tbps aggregate throughput to minimize latency in large-scale parallelism. These choices reflected trade-offs: proven Intel hardware was used for reliability and density, augmented by NUDT's custom interconnect and front-end elements under power constraints, with total memory reaching about 1.4 PB and cooling via closed-loop chilled water systems rated at 80 kW each. The software stack incorporated Kylin Linux, SLURM for job scheduling, and compilers, prioritizing compatibility with existing HPC codes while supporting OpenMC models for hybrid execution.
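The 54.9 PFlop/s theoretical peak can be reproduced from the node counts and clock rates quoted above. The following is a minimal sketch, not an official derivation: the flops-per-cycle values (8 for the Ivy Bridge cores, 16 for the Xeon Phi cores, both double precision) are assumptions not stated in the text, and the helper name `peak_pflops` is purely illustrative.

```python
# Theoretical peak of the compute partition, using the figures quoted above.
# ASSUMPTION (not from the text): double-precision flops per core per cycle
# are 8 for Ivy Bridge (AVX) and 16 for Xeon Phi (512-bit fused multiply-add).

NODES = 16_000


def peak_pflops(chips_per_node, cores_per_chip, ghz, flops_per_cycle, nodes=NODES):
    """Return the theoretical peak in PFlop/s for one processor type."""
    cores = nodes * chips_per_node * cores_per_chip
    return cores * ghz * 1e9 * flops_per_cycle / 1e15


cpu_peak = peak_pflops(chips_per_node=2, cores_per_chip=12, ghz=2.2, flops_per_cycle=8)
phi_peak = peak_pflops(chips_per_node=3, cores_per_chip=57, ghz=1.1, flops_per_cycle=16)
total_peak = cpu_peak + phi_peak

print(f"CPU partition:      {cpu_peak:.2f} PFlop/s")
print(f"Xeon Phi partition: {phi_peak:.2f} PFlop/s")
print(f"Total peak:         {total_peak:.2f} PFlop/s")
```

Under these assumptions the coprocessors account for most of the peak, which is why the design is described as accelerator-heavy.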

Construction and Initial Deployment

Tianhe-2 was constructed by China's National University of Defense Technology (NUDT), primarily in Changsha, involving the assembly of over 3 million processor cores into a system with a theoretical peak performance of 54.9 petaflops. The hardware, including 16,000 nodes manufactured by Inspur using Intel Ivy Bridge processors and Xeon Phi coprocessors, was integrated under NUDT's design leadership. Originally projected for completion in 2015 as part of China's 863 High Technology Program, construction advanced rapidly, enabling the system's operational readiness by mid-2013. Initial deployment occurred at the National Supercomputer Center in Guangzhou, marking a shift from the development site in Changsha. The relocation process began on September 28, 2013, with the first batch of equipment transported to Guangzhou, where installation and testing followed to support open scientific computing access. By June 2013, prior to full site transfer, Tianhe-2 had demonstrated sufficient performance to claim the top position on the TOP500 list, achieving 33.86 petaflops on the Linpack benchmark. This early operational milestone underscored NUDT's accelerated engineering efforts, positioning the supercomputer for national research applications in fields such as weather modeling and seismic analysis.

Operational Timeline

Tianhe-2 was declared operational in June 2013, two years ahead of its projected 2015 completion date, following initial testing at NUDT and ahead of its installation at the National Supercomputer Center in Guangzhou. Upon entry to the TOP500 list that month, it recorded 33.86 petaflops of Linpack performance, securing the top global ranking. This position was maintained across subsequent biannual lists through November 2015, marking six consecutive first-place finishes. By the end of 2013, the system achieved full deployment and computing capacity at the center, enabling broader scientific applications. U.S. export restrictions on components, imposed in 2015, later prompted partial reliance on domestic alternatives. Tianhe-2 continued operations post-2015, supporting domains such as weather modeling, though it yielded the TOP500 lead to Sunway TaihuLight in June 2016. In 2017, an upgrade to Tianhe-2A commenced, replacing Xeon Phi coprocessors with indigenous Matrix-2000 accelerators to circumvent sanctions and boost peak performance to 94.3 petaflops; the process was reported as about 25% complete partway through and reached full functionality by November. The enhanced configuration sustained utility into the early 2020s for computational tasks, including AI-driven workloads.

Architecture and Specifications

Hardware Components

The Tianhe-2 supercomputer is structured around 16,000 compute nodes, each integrating two Intel Xeon E5-2692 v2 processors based on the Ivy Bridge architecture, operating at 2.2 GHz with 12 cores per processor. This configuration yields 32,000 CPU sockets and 384,000 CPU cores across the system. Each compute node also incorporates three Intel Xeon Phi 31S1P coprocessors, utilizing the Knights Corner many-integrated-core architecture with 57 cores per coprocessor clocked at 1.1 GHz, resulting in 48,000 coprocessor cards and an additional 2,736,000 cores, for a system total of 3,120,000 cores. The coprocessors provide vector processing capabilities, enhancing floating-point performance for parallel workloads. System memory totals 1,375 tebibytes (TiB), made up of DDR3 for the CPUs and GDDR5 for the coprocessors, distributed across the nodes with approximately 64 GiB per node for the CPUs and 24 GiB for the three coprocessors. Interconnects employ the proprietary TH Express-2 network, a fat-tree topology delivering up to 90 gigabits per second bidirectional per port via custom routers and network interfaces designed by the National University of Defense Technology (NUDT). Storage hardware includes local disks on compute nodes for temporary data, supplemented by PCI-e solid-state drives and parallel disk arrays configured in a Lustre file system, with thousands of disks providing petabyte-scale capacity for input/output operations. The overall architecture, assembled by Inspur, emphasizes hybrid CPU-accelerator parallelism to achieve peak performance exceeding 54 petaflops.
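The socket, card, and memory totals in this section can be cross-checked against the system-wide figure of 3,120,000 cores quoted elsewhere in the article. The sketch below is an illustrative consistency check rather than vendor data; the constant names are hypothetical, and it assumes the per-node memory is exactly 64 GiB plus 24 GiB.

```python
# Cross-check of the hardware totals against the 3,120,000-core system figure.
# ASSUMPTION: per-node memory is exactly 64 GiB (CPUs) + 24 GiB (coprocessors).

NODES = 16_000
CPUS_PER_NODE = 2
CORES_PER_CPU = 12
PHIS_PER_NODE = 3
TOTAL_CORES = 3_120_000  # system-wide core count quoted in the article

cpu_sockets = NODES * CPUS_PER_NODE        # CPU sockets in the system
cpu_cores = cpu_sockets * CORES_PER_CPU    # general-purpose cores
phi_cards = NODES * PHIS_PER_NODE          # coprocessor cards
phi_cores = TOTAL_CORES - cpu_cores        # cores left for the coprocessors

# Cores per coprocessor implied by the totals; a non-zero remainder would
# signal that the quoted figures are mutually inconsistent.
cores_per_phi, remainder = divmod(phi_cores, phi_cards)

node_mem_gib = 64 + 24
total_mem_tib = NODES * node_mem_gib / 1024

print(cpu_sockets, cpu_cores, phi_cards, phi_cores)
print(f"cores per coprocessor: {cores_per_phi} (remainder {remainder})")
print(f"total memory: {total_mem_tib:.0f} TiB")
```

The division comes out even, so the 3.12 million core total implies 57 cores per coprocessor card.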

Software Stack and Interconnects

The Tianhe-2 supercomputer employed Kylin Linux, a domestically developed operating system variant created by the National University of Defense Technology (NUDT), as its base OS across compute nodes. This environment supported standard operations while incorporating custom optimizations for HPC workloads. Resource management was handled by SLURM, enabling efficient job scheduling and allocation across the system's 16,000 compute nodes. Application development relied on compilers for C, C++, Fortran, and Java, integrated with support for parallel programming models including OpenMP for shared-memory tasks and a customized MPI 3.0 derived from MPICH version 3.0.4, enhanced by NUDT's Galaxy Express (GLEX) channel library for low-latency communication. The runtime environment featured fault-resilient extensions like Non-stop Resilient MPI (NR-MPI), which allowed applications to continue execution after a failure through runtime detection and state recovery without relaunching. These components formed a layered software stack prioritizing reliability for large-scale simulations, though the proprietary nature of GLEX limited portability compared to open standards like InfiniBand verbs.

Interconnects utilized the proprietary TH Express-2 network, a custom interconnect designed by NUDT to minimize latency and maximize bandwidth among CPU blades, nodes, and subsystems. This opto-electronic hybrid architecture supported non-blocking all-to-all communication at speeds up to 90 Gbps per port, with router and interface chips optimized for MPI operations. The network organized nodes into hierarchical domains connected via internal switches, scaling to the full system without bottlenecks, though it relied on domestic designs to circumvent export restrictions on foreign technologies like those from Mellanox. Performance metrics indicated sub-microsecond latency for small messages, enabling efficient scaling for applications like weather modeling.

Power and Efficiency Metrics

The Tianhe-2 system consumed 17,808 kilowatts of power during Linpack benchmark runs that achieved its record 33.86 petaflops of sustained performance, as reported in the November 2013 TOP500 list. This power draw, equivalent to approximately 17.8 megawatts, supported the system's dense configuration of 16,000 compute nodes but highlighted the challenges of scaling performance amid energy constraints. Energy efficiency for Tianhe-2 stood at 1.90 gigaflops per watt on the June 2013 list, ranking it 32nd among the world's most efficient supercomputers at the time and reflecting the balance struck by its hybrid CPU-coprocessor architecture using Intel Ivy Bridge processors and Xeon Phi coprocessors. Subsequent measurements in later rankings maintained similar efficiency levels, with no significant improvements reported until hardware upgrades. The system's per-node power profile contributed to this metric: each Xeon Phi coprocessor contributed roughly 1 teraflop of peak double-precision performance, while the front-end FT-1500 CPUs were rated at up to 144 gigaflops at around 65 watts under peak load. An upgrade to Tianhe-2A, deployed by 2017, increased measured power consumption to 18,482 kilowatts while boosting performance, yielding an efficiency exceeding 5 gigaflops per watt in projected configurations with proprietary accelerators replacing some foreign components. These metrics underscore Tianhe-2's role in advancing exascale pursuits, though its overall power demands necessitated substantial cooling infrastructure at the National Supercomputer Center in Guangzhou, estimated to add several megawatts beyond core compute usage.
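The 1.90 gigaflops-per-watt figure follows directly from the Linpack result and the measured power draw. As a minimal sketch (the function name `gflops_per_watt` is illustrative, not part of any benchmark tooling):

```python
def gflops_per_watt(rmax_pflops: float, power_kw: float) -> float:
    """Convert sustained PFlop/s and power in kW to GFlop/s per watt."""
    gflops = rmax_pflops * 1e6  # 1 PFlop/s = 1e6 GFlop/s
    watts = power_kw * 1e3      # 1 kW = 1e3 W
    return gflops / watts


# Figures quoted above: 33.86 PFlop/s sustained at 17,808 kW.
tianhe2_efficiency = gflops_per_watt(33.86, 17_808)
print(f"{tianhe2_efficiency:.2f} GFlop/s per watt")
```

The same helper can be applied to any system for which both a sustained Linpack score and a measured power figure are published.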

Performance and Benchmarks

TOP500 Achievements

Tianhe-2 first claimed the top position on the TOP500 list in June 2013, recording an Rmax performance of 33.86 petaflops (PFlop/s) on the High-Performance LINPACK benchmark, nearly doubling the 17.59 PFlop/s of the prior leader, Titan. This debut marked the second instance of a Chinese system topping the biannual ranking, following Tianhe-1A's brief hold in November 2010. The system retained the number-one ranking across six consecutive TOP500 lists, spanning from June 2013 to November 2015, a period during which no other system displaced it despite rapid global advancements in supercomputing. Its sustained dominance reflected the scale of its deployment, comprising 16,000 compute nodes powered by Intel Ivy Bridge processors and Xeon Phi coprocessors, though later scrutiny highlighted reliance on foreign components for peak performance claims. Tianhe-2's record ended in June 2016 when China's Sunway TaihuLight assumed the top spot with 93.01 PFlop/s, relegating Tianhe-2 to second place. Subsequent upgrades, rebranded as Tianhe-2A, extended the system's competitiveness, but the original configuration's achievements centered on that initial multi-year reign at 33.86 PFlop/s Rmax. By mid-2025, upgraded variants had fallen to lower ranks, such as 31st in June 2025, amid broader shifts toward domestic architectures in supercomputing.

Real-World Computational Output

Tianhe-2 facilitated high-throughput processing of large-scale genomic datasets through an interface that integrated Hadoop and related frameworks for biomedical applications. In a collaboration with the Beijing Genomics Institute (BGI), the system processed a 300 GB BAM-format genomic dataset using tools like SOAPGaea for filtering, alignment, and duplication removal, completing the pipeline in 1 hour 56 minutes on 250 nodes, compared to 3 hours 59 minutes on BGI's baseline system, demonstrating scalable acceleration for variant calling and downstream analyses. Similarly, GaeaDuplicate for read deduplication on the same dataset took 1.1 hours on 250 nodes versus 2 hours on BGI's system, highlighting Tianhe-2's efficiency in memory-intensive tasks costing approximately 2.4 RMB per node-hour.

In high-energy physics, Tianhe-2 supported the BESIII experiment at the Beijing Electron Positron Collider by running the offline software for simulations, event reconstruction, calibration, and data analysis. The system scaled to 15,000 parallel processes via an MPI-Python interface, yielding an 80% efficiency gain in computation time while maintaining data consistency, as validated by tests on distributions matching outputs from the Institute of High Energy Physics cluster. Outputs were optimized by in-memory buffering before file writes to mitigate I/O bottlenecks on the 12.4 PB storage system.

For atmospheric modeling, Tianhe-2 accelerated the Weather Research and Forecasting (WRF) model in hybrid CPU-MIC configurations, scaling mesoscale simulations to 6,144 nodes with near-ideal weak scaling efficiency and achieving over 8% of peak performance for regional weather predictions. It also enabled parallel implementations of the GRAPES model for national weather forecasting, processing large-scale ensemble predictions. In geophysics, seismic imaging algorithms for oil exploration utilized Tianhe-2's compute cores to invert subsurface structures from reflection data, supporting enhanced reservoir modeling.

Tianhe-2's capacity extended to virtual drug screening, evaluating 40 million known compounds against viral targets in days, a task equivalent to 40 years on a single CPU core, advancing computational drug discovery for pandemic response. These outputs underscored its role in production-scale simulations beyond synthetic benchmarks, though access was prioritized for state-approved projects at the National Supercomputer Center in Guangzhou.
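The genomics timings quoted earlier in this section translate into speedups and an approximate job cost as follows. This is a rough sketch under a stated assumption: that the 2.4 RMB per node-hour rate applies to the full wall-clock time on all 250 nodes, which the text does not confirm. The helper `minutes` is illustrative.

```python
def minutes(hours: int, mins: int = 0) -> int:
    """Convert an hours/minutes pair to total minutes."""
    return hours * 60 + mins


# Full pipeline: 3 h 59 min on the baseline vs 1 h 56 min on 250 Tianhe-2 nodes.
pipeline_speedup = minutes(3, 59) / minutes(1, 56)

# Deduplication step: 2 hours on the baseline vs 1.1 hours on Tianhe-2.
dedup_speedup = 2.0 / 1.1

# ASSUMPTION: the quoted rate is billed per node for the whole wall-clock time.
RATE_RMB_PER_NODE_HOUR = 2.4
node_hours = 250 * minutes(1, 56) / 60
pipeline_cost_rmb = node_hours * RATE_RMB_PER_NODE_HOUR

print(f"pipeline speedup: {pipeline_speedup:.2f}x")
print(f"dedup speedup:    {dedup_speedup:.2f}x")
print(f"node-hours:       {node_hours:.1f}")
print(f"estimated cost:   {pipeline_cost_rmb:.0f} RMB")
```

Both comparisons work out to roughly a twofold improvement over the baseline system.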

Comparative Analysis with Contemporaries

Tianhe-2 achieved the top position on the TOP500 list in June 2013 with an Rmax of 33.86 petaflops on the HPL benchmark, nearly doubling the performance of the second-ranked Titan at 17.59 petaflops and the third-ranked Sequoia at 17.17 petaflops. This lead persisted through November 2015, during which no other system surpassed its Linpack score, though competitors like Mira (fifth-ranked in June 2013 at approximately 8.6 petaflops) narrowed gaps in specific domains. In architecture, Tianhe-2 employed a hybrid design with Intel Ivy Bridge E5-2692 v2 CPUs and Xeon Phi coprocessors across 16,000 compute nodes, connected via a TH Express-2 fat-tree interconnect, enabling high theoretical peak performance of 54.9 petaflops but at the cost of elevated power draw. In contrast, Titan utilized a Cray XK7 platform with AMD Opteron CPUs and NVIDIA Kepler K20x GPUs in 18,688 nodes, optimizing for GPU acceleration in scientific workloads, while Sequoia relied on IBM BlueGene/Q's homogeneous PowerPC A2 cores without discrete accelerators, emphasizing simplicity and interconnect efficiency in its 1.57 million-core setup. Mira, based on BlueGene/Q like Sequoia, used the same design at roughly half the scale (786,432 cores) and focused on reliability for large-scale simulations. Power efficiency highlighted disparities: Tianhe-2 consumed approximately 17.8 megawatts, yielding an efficiency of about 1.9 gigaflops per watt, lower than Titan's 8.2 megawatts and 2.1 gigaflops per watt or Sequoia's 7.9 megawatts and 2.2 gigaflops per watt.

| System | Rmax (PFlop/s) | Power (MW) | Efficiency (GFlop/s/W) | Primary Architecture |
|---|---|---|---|---|
| Tianhe-2 | 33.86 | 17.8 | ~1.90 | Intel Xeon + Xeon Phi |
| Titan | 17.59 | 8.2 | ~2.14 | AMD Opteron + NVIDIA Kepler |
| Sequoia | 17.17 | 7.9 | ~2.17 | IBM BlueGene/Q PowerPC |
| Mira | 8.59 | ~3.9 | ~2.18 | IBM BlueGene/Q PowerPC |

These metrics underscore Tianhe-2's raw computational dominance on synthetic benchmarks but reveal trade-offs in energy use and architectural complexity compared to U.S. systems, which prioritized balanced scaling for sustained production runs.

Applications and Scientific Impact

Key Research Domains

Tianhe-2, hosted at the National Supercomputer Center in Guangzhou (NSCC-GZ), facilitated research across six core application domains: earth science and environmental engineering, materials science and engineering, life sciences, digital design and manufacturing, energy and related technologies, and smart city development. These domains aligned with China's national priorities under the Twelfth Five-Year Plan, which allocated funding to eight strategic supercomputing applications, including aircraft design, new drug research and development, rendering, large-scale engineering projects, complex simulations, and new materials design and performance analysis.

In the life sciences, Tianhe-2 enabled high-throughput virtual screening of 40 million molecules against viral targets, processing 800 TB of data in 20 hours with 84% efficiency using three coprocessors per node, and supported computer-aided simulations reaching 200-300 nanoseconds of simulated time, with goals for 1,000 nanoseconds per day. Genomics research leveraged the system for handling up to 2 PB of data, scaling toward 100 PB for advanced sequencing and analysis.

Digital design and manufacturing applications included computational fluid dynamics (CFD) for aircraft and vehicle design, such as hybrid Reynolds-averaged Navier-Stokes/large-eddy simulation (RANS/LES) of combustion on 998,400 cores simulating 26,880 million cells at 79% parallel efficiency, and large-scale simulations with 3.5 billion mesh points, projected to expand to 100 billion within five years. These efforts extended to major projects under national initiatives.

For energy and related technologies, seismic exploration processed 2.2 TB datasets from millions of shots, supporting resource discovery and simulations targeting 100 billion electrons in future runs. Earth science and environmental engineering benefited from these seismic capabilities alongside related modeling, though specific Tianhe-2 outputs emphasized geophysical prospecting. Materials science and engineering involved simulations for new materials design, performance prediction, and complex electromagnetic fields, integrating with broader engineering domains like rendering for visualization. Smart city initiatives applied Tianhe-2 to urban modeling over 200 km² areas with 1 km grids, refining to 100 m resolution for applications including healthcare.

Notable Computational Projects

Tianhe-2 supported the BESIII experiment in high-energy physics by hosting the deployment of the offline reconstruction software via containers, enabling efficient processing of large datasets for particle collision analysis, measurements of physical constants, and searches for beyond-Standard-Model phenomena in electron-positron collisions at the Beijing Spectrometer III. This integration leveraged the supercomputer's 3.12 million cores to handle the computational demands of reconstructing events from millions of collisions, reducing processing times from weeks on smaller clusters to days. In drug discovery, Tianhe-2 performed virtual screenings of approximately 40 million known drug-like molecules against protein targets from unknown viruses, completing the task in hours rather than months on conventional systems, which highlighted its utility for rapid response in infectious disease modeling and candidate identification. The system also contributed to quantum computing benchmarks through classical simulations of boson sampling, a problem posited as challenging for classical computers, using its sustained performance to model up to 50-60 modes and thereby quantifying resource thresholds for demonstrating quantum advantage over deterministic classical algorithms. Other applications included simulations for aircraft design, optimizing designs through high-fidelity modeling, and regional weather models that integrated atmospheric data for short-term prediction accuracy improvements over legacy methods. These projects underscored Tianhe-2's role in domains requiring petaflop-scale parallelism, though actual utilization was constrained by access restricted primarily to approved entities.

Utilization Statistics

Tianhe-2 demonstrated limited utilization during its initial years of operation, operating at approximately 34% capacity as of mid-2014 while supporting around 120 clients in fields such as railway design and earthquake simulation. This subdued usage stemmed primarily from inadequate software ecosystems, compelling users to develop custom applications (a task often spanning years), and substantial daily electricity costs of 400,000 to 600,000 yuan, which clients were required to fund directly. Broader assessments of Chinese supercomputing infrastructure, encompassing Tianhe-2, reported an average utilization rate of roughly 40% that year, contrasting sharply with the 90% rates common among U.S. systems; these discrepancies highlighted systemic challenges in software maturity and application readiness rather than hardware capability. Subsequent upgrades to Tianhe-2A and evolving project demands may have influenced later metrics, though specific post-2014 utilization data for the original configuration remains sparse in public records.

Controversies and Criticisms

Dependence on Foreign Technology

Tianhe-2 incorporated Intel Ivy Bridge central processing units and Xeon Phi coprocessors as its core computing components, sourced from the United States. The system featured 16,000 compute nodes, with each node containing two such processors and three Xeon Phi units, enabling a total of approximately 3.12 million cores. This configuration represented the largest assembly of Ivy Bridge and Xeon Phi processors deployed in a single system up to that point. The reliance on American semiconductor technology exposed Tianhe-2 to external risks and geopolitical constraints. In April 2015, the U.S. Department of Commerce imposed export restrictions prohibiting Intel from shipping high-performance processors and related components to Chinese supercomputer developers, including the National University of Defense Technology (NUDT), which was responsible for Tianhe-2, due to concerns over potential military end-use and national security threats. These controls specifically targeted upgrades that would have incorporated newer chips, such as Knights Landing variants of the Xeon Phi. Subsequent modifications to the system, rebranded as Tianhe-2A around 2017, partially mitigated this dependence by substituting U.S.-sourced accelerators with domestically produced Matrix-2000 digital signal processors developed by NUDT. However, the original architecture's heavy integration of foreign processors underscored broader challenges in China's semiconductor sector, where indigenous chip design lagged behind leading-edge foreign alternatives in performance and efficiency during the early 2010s.

Usability and Accessibility Challenges

Tianhe-2's usability was hampered by insufficient investment in software development and optimization, which restricted its effective deployment for diverse applications despite its hardware prowess. Researchers highlighted that while the system excelled in benchmark tests like HPL, the underdeveloped software ecosystem made it challenging to adapt codes and workflows efficiently, leading to suboptimal performance in real-world scenarios. The supercomputer's scale also presented practical hurdles, as its vast computational capacity often exceeded the requirements of standard scientific simulations, rendering it excessive for many users and complicating its use for smaller-scale tasks. This mismatch contributed to potential underutilization, with experts noting that the system's power might limit its routine applicability beyond specialized, high-intensity computations. Accessibility challenges arose from stringent user policies at the National Supercomputer Center in Guangzhou, which prioritized approved domestic researchers and imposed queuing priorities and enforcement mechanisms that could delay or restrict project submissions. International access remained limited, though selective invitations were extended, such as a call for scientists to collaborate on specific projects, underscoring the need for formal approvals and potential barriers in system documentation and support, predominantly available in Chinese.

Skepticism on Benchmark Results

Tianhe-2's reported performance of 33.86 petaflops on the High Performance LINPACK (HPL) benchmark, which secured its top ranking on the TOP500 list from June 2013 through November 2015, has faced scrutiny not for fabrication but for the benchmark's limitations in representing broader computational capabilities. The HPL test, which solves systems of linear equations, measures sustained floating-point operations but overlooks critical real-world factors such as data transfer latencies, irregular memory access patterns, and sustained performance under diverse workloads, potentially leading to systems optimized excessively for this narrow metric at the expense of versatility. IBM researchers, including Dr. Alessandro Curioni, argued that TOP500's reliance on HPL incentivizes such tuning, advocating for metrics better aligned with practical scientific simulations rather than isolated theoretical peaks. Experts like Prof. Alan Woodward highlighted that bottlenecks in inter-node communication could undermine Tianhe-2's effective throughput in production environments, despite its impressive HPL score, as the benchmark does not penalize inefficiencies in data movement that dominate actual high-performance computing (HPC) tasks. Erich Strohmaier, a TOP500 co-founder, acknowledged HPL's oversimplification of complex system behaviors, noting challenges in adopting more representative tests due to implementation difficulties. While the results were independently verified by TOP500 organizers, including a site visit by Jack Dongarra, critics contend that high HPL efficiencies, achieved through specialized libraries and hardware tuning, may mask underperformance on emerging benchmarks like HPCG, which emphasize operations more akin to real applications. Furthermore, analyses have questioned whether Tianhe-2's architecture, reliant on Intel Xeon processors and Xeon Phi coprocessors, translates HPL dominance into proportional gains for scientific codes, with software ecosystem deficiencies in China exacerbating utilization gaps; U.S. HPC centers estimated that equivalent hardware could be procured and scaled similarly with funding alone, diminishing claims of architectural superiority. This skepticism underscores broader concerns that TOP500 rankings, while verifiable, encourage "gaming" via benchmark-specific optimizations, as evidenced by historical precedents with earlier Chinese systems like Tianhe-1A, potentially inflating perceived leadership in HPC without commensurate advances in applied output.

Legacy and Current Status

Influence on Chinese Supercomputing

Tianhe-2's sustained position as the world's fastest supercomputer from June 2013 to June 2016, with a peak performance of 54.9 petaflops, underscored China's engineering expertise in scaling large heterogeneous systems, integrating 16,000 nodes powered by Intel Xeon processors and Xeon Phi coprocessors via a proprietary high-speed interconnect. This milestone, achieved by a team of over 1,300 researchers at the National University of Defense Technology, boosted national confidence and catalyzed increased state funding for high-performance computing (HPC) infrastructure, contributing to China's rapid expansion in the TOP500 list from fewer than 50 systems in 2013 to over 100 by mid-decade. The system's prominence, however, exposed dependencies on foreign components amid U.S. export controls imposed in April 2015, which restricted sales of high-end chips for supercomputing applications, thereby intensifying Beijing's push for technological self-reliance. This spurred parallel advancements in indigenous processors, culminating in the Sunway TaihuLight's deployment in 2016, which used entirely domestic SW26010 chips to achieve 93 petaflops without U.S. technology, and influencing subsequent architectures like the Phytium-based systems in later Tianhe iterations. Tianhe-2's legacy extended to policy frameworks, such as the 2014 national strategy emphasizing core HPC technologies, which prioritized R&D in matrix processors and interconnects, fostering a network of specialized centers and talent pipelines at institutions like NUDT. While critiques noted limited software progress relative to hardware gains, its operational deployment enabled early applications in weather modeling, laying groundwork for China's exascale ambitions and the proliferation of domestically oriented HPC clusters by the late 2010s.

Technological Limitations Exposed

The Tianhe-2 system, despite its peak performance of 54.9 petaflops, achieved only 33.9 petaflops on the High-Performance Linpack (HPL) benchmark, reflecting an HPL efficiency of approximately 62%, which underscored architectural constraints in fully utilizing its compute resources under standardized testing conditions. This gap highlighted limitations in the hybrid CPU-coprocessor design, reliant on Intel Ivy Bridge processors and Xeon Phi coprocessors across 16,000 nodes, where synchronization overheads and other bottlenecks reduced effective throughput. Power efficiency emerged as a critical shortfall, with the system drawing 17.8 megawatts to deliver its HPL performance, yielding a performance-to-power ratio of roughly 1.9 gigaflops per watt, far below targets for sustainable scaling toward exascale. Such consumption exposed vulnerabilities in the air-cooled, x86-based design, which prioritized raw compute over energy optimization, exacerbating cooling demands and operational costs in large-scale deployments. Projections based on this ratio indicated that an exaflop-scale system akin to Tianhe-2 would require over 500 megawatts, revealing the impracticality of extrapolating its design without fundamental shifts in processor efficiency and interconnect topology.

Post-deployment analyses further revealed weaknesses in real-world workload versatility, as evidenced by subdued performance on the High-Performance Conjugate Gradient (HPCG) benchmark, which stresses memory access and irregular computations more representative of scientific applications. Relative to its scale and power draw, Tianhe-2 underperformed in HPCG, indicating over-optimization for HPL's dense matrix operations at the expense of sparse, bandwidth-limited tasks common in simulations. Additionally, stability issues plagued the coprocessors, with reports of frequent failures necessitating custom fault-tolerance measures in the runtime environment to maintain uptime. The system's expansive node count amplified fault rates and I/O bottlenecks, as parallel communication latencies and storage hierarchies struggled under sustained loads, limiting scalability for applications beyond benchmark tuning. These exposed deficiencies in the interconnect fabric and hybrid storage layers (local disks, PCIe SSDs, and disk arrays) prompted subsequent Chinese efforts to prioritize domestic architectures with improved resilience and efficiency, as seen in transitions to matrix processors in later prototypes.
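The efficiency ratio and the exaflop power projection discussed in this section both reduce to simple arithmetic. The sketch below is a back-of-the-envelope calculation, not a formal projection; it assumes the 1.9 gigaflops-per-watt ratio stays constant at exaflop scale, which is the simplification the extrapolation itself relies on.

```python
RPEAK_PFLOPS = 54.9      # theoretical peak
RMAX_PFLOPS = 33.86      # sustained HPL result
GFLOPS_PER_WATT = 1.9    # measured performance-to-power ratio

# Fraction of theoretical peak actually delivered on HPL.
hpl_efficiency = RMAX_PFLOPS / RPEAK_PFLOPS

# ASSUMPTION: the efficiency ratio holds unchanged at exaflop scale.
EXAFLOP_IN_GFLOPS = 1e9  # 1 EFlop/s = 1e9 GFlop/s
exaflop_power_mw = EXAFLOP_IN_GFLOPS / GFLOPS_PER_WATT / 1e6

print(f"HPL efficiency: {hpl_efficiency:.1%}")
print(f"Power for 1 EFlop/s at {GFLOPS_PER_WATT} GFlop/s/W: {exaflop_power_mw:.0f} MW")
```

The result lands above the 500-megawatt mark, consistent with the projection cited above.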

Decommissioning and Successors

Tianhe-2 underwent significant upgrades starting in 2017, when the National University of Defense Technology (NUDT) integrated the Matrix-2000 processor and expanded the system to achieve a peak performance of approximately 95 petaflops, rebranded as Tianhe-2A. This enhanced configuration maintained operational status into the mid-2020s, ranking 24th on the TOP500 list as of November 2024 with sustained Linpack performance reflecting its aging architecture relative to newer systems. No official decommissioning date has been announced, though its displacement from top rankings and the deployment of advanced successors indicate a shift toward legacy utilization for less demanding tasks. The primary successor to Tianhe-2 is Tianhe-3, also referred to as Tianhe Xingyi, developed by NUDT and deployed at the National Supercomputer Center in Guangzhou on December 6, 2023. This exascale system employs domestically produced hybrid processors, such as the MT-3000 CPU combining ARM cores with accelerator units, achieving an estimated 1.7 exaflops peak and 1.3 exaflops sustained performance on Linpack benchmarks, far beyond Tianhe-2A's capabilities. Tianhe-3's architecture emphasizes indigenous technology to circumvent U.S. export restrictions on foreign components that previously affected Tianhe-2 upgrades, supporting applications in AI training, high-performance computing, and scientific simulations. Unlike its predecessor, Tianhe-3 has not appeared on public TOP500 rankings, consistent with China's reduced submissions amid geopolitical tensions, though independent analyses confirm its operational superiority.
